I am the proud owner of a Mini Cooper. The likely explanation for how and why I purchased exactly the car I did highlights some of the central problems advertisers face today. We can investigate what leads advertisers to believe certain forms of advertising affect someone like me, as well as whether the decisions they make with that information are reasonable. First, let us assume that the sequence of events prior to and including my purchase went something like the following.
Many years ago, I rented The Italian Job, which has a prominent getaway scene involving Mini Coopers—the first time I had ever heard of the cars. Nearly a decade later, I saw an advertisement for Minis while watching a video on YouTube, which triggered a memory of Mini Coopers being interesting. After viewing the video I started researching the car. After about a week’s worth of research, I decided that I wanted to purchase one. I did a quick search for Mini Coopers in the Bay Area and was directed to a local dealer’s website by a sponsored link that appeared on the side of the search results. I stopped by the dealer and later purchased my Mini.
In buying my car, I traversed what marketers call the sales funnel. At many points along the way I was exposed to various forms of advertising. These exposures, or touches in marketing terminology, may or may not have contributed to my purchasing decision. For instance, the marketing group at BMW could claim that the product placement in the movie informed me of the existence of the brand, and that the video ads on YouTube rekindled my interest. The dealer’s search engine marketer could claim that the sponsored search link delivered me straight to their door, rather than to some other dealership.
Determining which advertising actions affected my purchase is a question of attribution. There is a multi-billion dollar online advertising industry relentlessly devoted to figuring this out, because reasoning about how to employ this information allows us to engineer the most effective experiences for customers.
Historically, advertisers lamented an attribution problem due to data scarcity. Perhaps the most oft-heard expression of this is a quip attributed to John Wanamaker:
Half the money I spend on advertising is wasted; the trouble is I don’t know which half.
As the science of advertising progressed, different channels started to emerge, some more amenable to experimental analysis than others.
Fortunately, the abundance of rich data on the Internet has partially alleviated the scarcity problem near the end of the funnel—in many cases, we know what advertisements users were exposed to immediately before a potential purchase and what actions the users took. In fact, the distinction between campaigns at various points in the funnel is so stark that in online advertising we classify them into performance campaigns (near the end of the funnel) and brand campaigns (near the beginning of the funnel). The differences in the types of campaigns are also reflected in how the respective advertisers choose to pay for ad placement, which we discuss subsequently.
In performance advertising scenarios, largely due to the existence of the aforementioned data, we can accurately predict how likely someone is to make a purchase soon after they view an online advertisement. In my view, this is a Herculean task undertaken by some of the best experts in machine learning that I know. Some claim we have reached a sort of phase transition in marketing, from Mad Men to Math Men. It is not clear to me that we are quite there yet; many of the institutions from the prior era still exist today, and perhaps they have evolved or are yet to be disrupted. The history of ad agencies themselves is interesting, and there is a great infographic about the origins of various agencies, courtesy of VitaminTalent:
As we move further up the funnel, attribution modeling becomes more difficult. In particular, when users have been exposed to multiple advertisements, which ones did the trick? Figuring this out correctly is a very difficult problem and, in practice, often requires a well-designed experimentation system. As Randall Lewis and David Reiley have shown, even experts can err significantly in their analyses. For small advertisers without trained econometricians, the problem can be even more acute. The core of the problem is that when estimating the lift (gain in revenue) of a given advertisement using data from a control and a test group, great care must be taken so that the populations are comparable. Consider this simple example.
A restaurant owner wants to test the effect of handing out coupons to potential customers: $2 off a $10 meal (the only meal the restaurant offers). The restaurant happens to be the only restaurant situated along the right branch of a fork in a walking path. The owner stands a little way down the right path handing out coupons, and has another employee stand in the middle of the left path and simply observe the people walking by. At the end of the day, they tally their results. Of the 100 people walking down the right path (all of whom got coupons), 50 purchased a meal. Of the 100 people walking down the left path, 5 purchased meals (without coupons). The day before, 200 people also passed through the fork and 50 purchased meals.
Is it worth it for the restaurant owner to hand out coupons? Well, using the left branch as a control, we see that handing out coupons (right branch) gives a (50 − 5) / 5 = 900% lift in customer purchases. That amounts to 45 extra meals, or an expected $450 in extra gross revenue, at a coupon cost of 50 × $2 = $100 (every right-branch buyer redeemed one). A net gain of $350, for a total of $450 in net revenue for the day after accounting for the coupon costs. Well worth it! Or is it?
If we look at the previous day’s totals, we see 50 meals were purchased for a total of $500 in revenue. If both days were typical days, what could have gone wrong in the restaurant owner’s analysis? It is likely that the two populations of people (the ones going left and the ones going right) were fundamentally different. The ones going right may have taken that direction because the restaurant is on that branch. In effect, the owner was giving coupons to people who were already going to purchase meals at the restaurant—a net loss in revenue when compared to not using coupons. A better experiment would have been to randomly give coupons to travelers on both the left and right branches, tracking the population that did not receive a coupon and comparing it to the population that did. Of course, this is a clear example of how the owner got the analysis wrong and lost money in the process. Even so, these errors are easy to make and happen all of the time when attributing the effects of advertising. Unfortunately, designing a good experiment can be difficult and expensive, so many firms knowingly settle for flawed analyses.
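The owner’s bookkeeping can be checked with a few lines of arithmetic. A minimal sketch, using the numbers from the example above (the variable names are illustrative):

```python
# Numbers from the restaurant example: a $2 coupon off the $10 meal.
MEAL_PRICE, COUPON_VALUE = 10, 2

right_walkers, right_buyers = 100, 50  # everyone on the right branch got a coupon
left_walkers, left_buyers = 100, 5     # the left branch got none

# The owner's headline number: a 900% lift over the "control" branch.
lift = (right_buyers - left_buyers) / left_buyers

# Every right-branch buyer redeemed a coupon, so the day's revenue is
# 50 discounted meals plus 5 full-price meals.
coupon_day_revenue = right_buyers * (MEAL_PRICE - COUPON_VALUE) + left_buyers * MEAL_PRICE

# On a typical day, 50 people buy full-price meals.
typical_day_revenue = 50 * MEAL_PRICE

print(lift, coupon_day_revenue, typical_day_revenue)  # 9.0 450 500
```

Despite the measured 900% lift, the coupon day brings in $450 against a typical day’s $500: the selection bias in the two populations did all of the work.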
This article is not just about modeling attribution, however. This article investigates how we would use a model of attribution to improve the effectiveness of online advertising. What we have found is that successfully employing multiple-attribution models is every bit as difficult as coming up with the models themselves.
Employing Attribution Models
The previous section highlighted how important it is to correctly estimate the effects of advertising. This section focuses on the particulars of online advertising, specifically how we can incorporate multiple attribution into current pricing schemes. The figure below, produced by InfographicLabs, gives a nice illustration of the flow of information in online advertising, for those who are unfamiliar.
In recent years the online advertising industry has witnessed a shift from the more traditional pay-per-impression model to the pay-per-click (PPC) and more recently to the pay-per-conversion model. Historically, website publishers sold banner advertisements through direct sales on a cost-per-mille (CPM) basis. Remnant inventory was then sold to ad networks at, typically, a substantially lower price. The ad networks then bundled and resold the inventory to affiliated advertisers, allowing advertisers to get greater audience reach as the network’s inventory broadened.
In parallel, sponsored-search auctions arrived as a viable method for purchasing advertising space, conditioned on user intent. Initially, slots were sold on a CPM basis, but soon search engines allowed advertisers to pay only after a click occurred. These PPC revenue models facilitated an explosion of long-tail advertisers in addition to large advertising firms. No longer did individual advertisers have to reason about the rate at which an ad would be clicked; that risk was transferred to the search engines, which were in a much better position to model and mitigate it through the large number of auctions served.
Not to be outdone, display ad networks evolved into ad exchanges, such as the Right Media Exchange (RMX), which offered multiple models for payment. Advertisers can choose to pay per impression, per click, or per conversion (a sale or other advertiser-defined event). The amount an advertiser pays is determined by the exchange rules and whether or not a payment event occurred. In practice, which ads are rendered on a page, and how much the advertisers are charged for the opportunity to display them, are determined via an auction facilitated by the exchange. For contingent payments, the exchange mechanism accounts for the estimated rate at which payment occurs for an ad when determining which advertiser’s ad is shown on the page. When a payment event occurs, the exchange determines the auction associated with the payment (attribution) and the resulting transaction costs between the advertiser and publisher. For impression events, the attribution is simple: the associated auction (and publisher) is the auction where the ad was served to the user. For a click event, exchanges attribute the click to the last auction that served the ad to the particular user. This is called last-event attribution and seems, intuitively, to be a reasonable model: the likelihood that I click on an ad is not influenced by previous views of the ad. For conversion events, the situation is a bit more complicated.
Whether or not I make a purchase may be influenced by a sequence of views. How many times I have viewed the ad influences where I am in the funnel—how much more or less likely I am to purchase. This presents a problem for the exchange: how should payment be distributed when multiple events influence a purchase (conversion)? Incorrect attribution by the exchange can cause inefficiencies in the marketplace. We can illustrate this with a simple example:
[Consider] one pay-per-conversion advertiser that has a value of $1 per conversion. Assume a user sees the ad of this advertiser four times on average. The probability of converting after viewing the ad for the first time is 0.02, and after the second viewing this probability increases to 0.1. The third and the fourth viewing of the ad will not lead to any conversions. Also, assume that this ad always competes with a pay-per-impression ad with a bid of 4 cents per impression.
First, consider a system that simply computes the average conversion rate of the ad and allocates based on that. This method would estimate the conversion rate of the ad at (0.02 + 0.1 + 0 + 0)/4 = 0.03. Therefore, the ad’s effective bid per impression is 3 cents and the ad will always lose to the competitor. This is inefficient, since showing the ad twice gives an average expected value of 6 cents per impression, which is more than the competitor.
If we employ frequency capping and restrict the ad to be shown at most twice to each user, the above problem would be resolved, but another problem arises. In this case, the average conversion rate will be (0.02+0.1)/2 = 0.06, and the ad will win both impressions. This is indeed the efficient outcome, but let us look at this outcome from the perspective of the publishers (website owners). If the two impressions are on different publishers, the first publisher only gets 2 cents per impression in expectation, less than what the competitor pays. This is an unfair outcome, and means that this publisher would have an incentive not to accept this ad, thereby creating inefficiency.
Finally, note that even if the conversion rate is estimated accurately for each impression, the usual mechanism of allocating based on expected value per impression is still inefficient, since it will estimate the expected value of the first impression at 2 cents. This loses to the 4-cent competitor, and never gives the ad a chance to secure the second, more valuable, impression.
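The three failure modes above reduce to a handful of expected-value computations. A minimal sketch, using the numbers from the example (value of $1 per conversion, conversion probabilities of 0.02 and 0.1 on the first two views, and a 4-cent competing pay-per-impression bid):

```python
p = [0.02, 0.10, 0.0, 0.0]  # conversion probability after the i-th view
v = 1.00                    # advertiser's value per conversion, in dollars
c = 0.04                    # competing pay-per-impression bid

# 1. Average over all four views: a 3-cent effective bid that always
#    loses to the 4-cent competitor, even though showing the ad twice
#    would be worth 6 cents per impression on average.
naive_bid = v * sum(p) / len(p)

# 2. Frequency-capped at two views: a 6-cent effective bid wins both
#    impressions, but the first publisher is paid on a 2-cent expected
#    conversion rate -- less than the competitor would have paid.
capped_bid = v * sum(p[:2]) / 2
first_publisher_value = v * p[0]

# 3. Accurate per-impression rates: the first impression bids only its
#    immediate 2-cent value, loses to the competitor, and the valuable
#    second view never happens.
per_impression_bids = [v * p_i for p_i in p]
```

All three schemes fail for the same underlying reason: none of them accounts for the fact that the first, low-value impression is what unlocks the second, high-value one.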
By now it should be clear that the standard pay-per-conversion auction mechanism is not efficient. What can we do? First, we can simply accept this inefficiency, but that leaves open an opportunity for a clever arbitrageur to enter the market and profit. Second, the advertiser could just pay per impression, and the exchange’s problem would be solved. However, this assumes that a lone advertiser has the requisite information to correctly model multiple attribution, which is unlikely for the reasons discussed in the previous section. A third option involves the exchange, or some intermediary, purchasing the inventory per impression and selling it per conversion to the advertiser.
In an external paper (Patrick Jordan, Mohammad Mahdian, Sergei Vassilvitskii, and Erik Vee, 2011, 31-43), coauthors and I describe a novel approach to this problem. We develop a fairly general model to capture how a sequence of impressions can lead to a conversion, and solve the optimal ad allocation problem in this model. We show that this allocation can be supplemented with a payment scheme to obtain a mechanism that is incentive compatible for the advertiser and fair for the publishers.
The crux of the idea is that the intermediary must account both for the immediate value of the impression, as well as the expected future value given the prevailing market conditions. Determining the correct amount to bid involves a recursive formulation that can be solved with dynamic programming.
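The model in the paper is more general, but the flavor of the recursion can be sketched as follows. Let V(i) be the intermediary’s expected future profit once a user has already seen i impressions: each impression is worth its immediate expected conversion value plus the continuation value, and it is only worth buying when that total covers the market price. A simplified sketch under these assumptions (the function name and the fixed market price are illustrative, not from the paper):

```python
def optimal_bids(p, v, c):
    """Backward-induction sketch: p[i] is the conversion probability on the
    (i+1)-th view, v the advertiser's value per conversion, and c the
    prevailing per-impression market price. Returns (bids, values)."""
    n = len(p)
    V = [0.0] * (n + 1)  # V[n] = 0: no value once the view sequence ends
    bids = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Bid the immediate expected value plus the continuation value.
        bids[i] = p[i] * v + V[i + 1]
        # Buy the impression only when that bid covers the market price.
        V[i] = max(0.0, bids[i] - c)
    return bids, V

# Numbers from the running example: $1 per conversion, conversion
# probabilities 0.02 and 0.1 on the first two views, 4-cent competitor.
bids, V = optimal_bids([0.02, 0.10, 0.0, 0.0], v=1.00, c=0.04)
```

On the example, the first impression is bid at roughly 8 cents—its 2-cent immediate value plus the 6-cent option on the second, high-converting view—so the intermediary outbids the 4-cent competitor and earns about 4 cents of expected profit per user, recovering the efficient outcome the earlier schemes missed.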