Data driven models in conversion attribution.


This article was written as a final paper for the subject EGBM for Professor Dr. Koen Berden at the Global OneMBA, Rotterdam School of Management, January 2016.

1. Conversion Attribution and its Relevance in Marketing.

Online marketing is an ever larger part of overall marketing spend for many companies. In the US, total digital marketing spend overtook TV marketing spend in 2013 (Berman, 2015, p. 2). Globally, this is expected to happen in 2018 (Statista, 2016).

Quoting eMarketer, Li (2014, p.1) states that, according to a recent forecast, total U.S. spending on search marketing is slated to increase from $15 billion in 2011 via $28 billion by 2014 to $40 billion in 2019. The budget for display and video ads in 2016 actually overtook the search marketing spend for the first time and is expected to grow fast, to up to $44 billion by 2019.

One of the advantages of digital advertising over its traditional counterpart is that online campaigns are measurable in minute detail, both in what they cost and in the exposure and effects they generate with the target group. Google AdWords, for example, will tell you on a daily basis what you spent, how many times your ads were shown, what they generated in clicks and revenue, and even the average cost per acquisition and the return on investment. By contrast, offline campaigns have always required a relatively large leap of faith about what their effect really is. Individual offline marketing budgets are generally a multiple of those online, but what a commercial on radio or TV or an ad in print truly returns remains largely unclear.

Obviously, there are big differences in effect between different types of online campaigns as well. A branding campaign that uses display formats to convey its message will have a lot more difficulty getting a positive return on investment than a highly targeted search campaign on product level. There are a number of reasons for this.

First, there is the reward calculation model: when does the publisher get paid? It seems like a very basic question, but a plethora of different models is employed to calculate rewards for publishers. I will discuss the three most widely used ones here briefly.

CPM - cost per mille

This is one of the oldest online cost calculation models and charges an amount per 1000 impressions. Whenever a visitor is exposed to the ad, this counts as one impression, and the advertiser pays for a minimum number of impressions. Much like in offline advertising, the publisher is not responsible for clicks on the ad, for visits to the advertiser’s website or for realized revenue, just for showing the ad. Therefore, also like in traditional advertising, a lot hinges on trust between advertiser and publisher about the data on ad impressions.

CPC - cost per click

With the CPC model, the advertiser pays a certain amount for every click on their ad. If visitors don’t click, the advertiser pays nothing. As most advertisers use web analytics tools to track their online success, this model is easier for them to verify and therefore currently more widely used. Search advertising in particular has made this model commonplace. For publishers this model is obviously less attractive, as they run the risk of not getting paid for supplying their website real estate to advertisers. However, publishers always have the freedom to favor ads that perform better in terms of visitors clicking the ad, or ‘Click Through Rate’ (CTR).

CPA - cost per acquisition

The third widely used model is cost per acquisition, or CPA. In this model the publisher is only paid if the traffic they send to the advertiser subsequently converts. Most affiliate programs work with this model. They place a small text file (a so-called ‘cookie’) in the visitor’s internet browser that will usually stay active for 30 days and then expire. If the visitor converts on the advertiser’s website within those 30 days, the cookie allows the sale to be registered and the publisher gets a predefined percentage of the revenue.
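The three reward models can be compared side by side on a single campaign. The sketch below is only an illustration and not any ad platform's billing logic: the function name `publisher_payout`, the rates and the campaign figures are all hypothetical.

```python
def publisher_payout(model, impressions, clicks, conversion_revenue, rate):
    """Return what the advertiser owes the publisher under one reward model.

    `rate` means: price per 1000 impressions (CPM), price per click (CPC),
    or fraction of the conversion revenue (CPA).
    """
    if model == "CPM":
        return impressions / 1000 * rate
    if model == "CPC":
        return clicks * rate
    if model == "CPA":
        return conversion_revenue * rate
    raise ValueError(f"unknown model: {model}")


# Hypothetical campaign: 200,000 impressions, 400 clicks, 1,000.00 in revenue.
campaign = dict(impressions=200_000, clicks=400, conversion_revenue=1000.0)

cpm_cost = publisher_payout("CPM", rate=2.0, **campaign)  # 2.00 per 1000 impressions
cpc_cost = publisher_payout("CPC", rate=0.5, **campaign)  # 0.50 per click
cpa_cost = publisher_payout("CPA", rate=0.1, **campaign)  # 10% of the revenue

print(cpm_cost, cpc_cost, cpa_cost)
```

With these made-up numbers the same campaign costs the advertiser progressively less as the publisher takes on more of the performance risk, which is the trade-off described above.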

The CPA model is the only model here that incorporates the acquisition, the sale, in its calculation of payment to the publisher. With that comes one of the first conversion attribution models applied in online marketing, which is widely known as the “30 day cookie model”. As described above, the cookie in the browser of the visitor will communicate with a piece of code on the conversion page of the advertiser, registering a conversion. This will happen as long as no more than 30 days have passed between the moment the cookie was placed and the moment the conversion took place, irrespective of whether the visitor has interacted with any other campaigns.

Although it appears to be the least favorable model for publishers, the CPA model has what Ron Berman (2015, p. 3) aptly calls an intrinsic ‘moral hazard’: it invites affiliate partners to claim conversions to which they contributed very little. As long as they can place a cookie in the browser of the visitor within the 30 days before the conversion happens, they will get paid.

Advertisers, thus confronted with multiple affiliates claiming their affiliate fees for realizing a conversion, generally reacted by honoring only the final affiliate in the chain of marketing touch points leading up to the conversion. This second attribution model known as “Last Cookie Counts” is still widely used because of the simplicity of its design and wide availability of data and tools needed to analyse it.
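The difference between the two early models can be made concrete with a few lines of code. This is a minimal sketch under assumed data: the affiliate names, dates and function names are hypothetical, and real affiliate networks track cookies in more elaborate ways.

```python
from datetime import datetime, timedelta

COOKIE_LIFETIME = timedelta(days=30)


def is_valid(set_at, conversion_time):
    """A cookie counts if it was set at most 30 days before the conversion."""
    return timedelta(0) <= conversion_time - set_at <= COOKIE_LIFETIME


def claimants_30_day_cookie(touches, conversion_time):
    """Every affiliate with a still-valid cookie can claim the conversion."""
    return [a for a, set_at in touches if is_valid(set_at, conversion_time)]


def last_cookie_counts(touches, conversion_time):
    """Only the most recent affiliate with a valid cookie is rewarded."""
    valid = [(set_at, a) for a, set_at in touches if is_valid(set_at, conversion_time)]
    return max(valid)[1] if valid else None


conversion = datetime(2016, 1, 31)
touches = [
    ("affiliate_a", datetime(2015, 12, 20)),  # 42 days earlier: cookie expired
    ("affiliate_b", datetime(2016, 1, 5)),    # 26 days earlier
    ("affiliate_c", datetime(2016, 1, 29)),   # 2 days earlier
]

claimants = claimants_30_day_cookie(touches, conversion)
winner = last_cookie_counts(touches, conversion)
print(claimants, winner)
```

Under the 30 day cookie model two affiliates claim the same sale; Last Cookie Counts resolves the conflict by paying only the one closest to the conversion.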

Originally applied only to affiliates, the model was later applied more generally to other online marketing campaigns, not so much as a cost calculation model to reward partners, but as an ROI calculation model to understand the value created by campaigns relative to their cost. In this role, Last Cookie Counts no longer treated all conversions equally by assigning a set value to each of them (as was common with the CPA model). Instead, it looked at the true conversion value of each individual transaction and offset that against the related campaign cost.

Branding vs Performance Campaigns

The proliferation of this attribution model highlights the second reason why there may be big differences in return on investment between online marketing campaigns: while some campaigns, like the aforementioned branding campaign, focus on brand and product awareness among new prospects, more performance oriented campaigns focus on reeling in sales from prospects who are already aware and actively considering a purchase. Applying the Last Cookie Counts attribution model obviously favors the latter, leaving the former with the cost of advertising but without the revenue of the conversion.

As a result, when comparing the effect of both types of ads by applying the Last Cookie Counts model, marketers had a hard time defending their budgets for display marketing. Publishers, already faced with dwindling offline subscriptions and struggling to come up with successful online business models, cried foul as advertisers en masse replaced their display advertising with search ads. Although this strategy was highly successful for some advertisers, for most of them it did not immediately lead to an increase in total sales, and in the longer run it sometimes even led to a decrease in business.

This brings us to the concept of the consumer’s ‘conversion funnel’. I will go into more detail on this when I describe one of the data driven attribution models, but the general idea is that consumers, influenced over time by a multitude of touchpoints, become increasingly clear about what they are looking for. This shows in their behavior: the search terms they use, their interactions with display banners and their activity on advertisers’ websites.

Ignoring upper funnel marketing leaves advertisers in peril of losing important market share, because for a lot of products, consumers value brands and brand identity as much as good deals and low prices. Therefore the advertising industry went searching for attribution models that would grant a greater part of conversion value to upper funnel targeted campaigns than the Last Cookie Counts model is able to do.

2. Commonly Used Online Marketing Attribution Models.

2.1 Rule Based Conversion Attribution Models.

In the first chapter we have seen that as the online marketing business matured, the need for more comprehensive attribution models grew with it. Early models like the 30 day Cookie Model and Last Cookie Counts distinctly favored specific types of marketing and encouraged publishers to game the system in order to gain more ad revenue.

Technological progress brought more advanced models within reach of a larger audience, specifically when Google introduced attribution functionality in its free Analytics product. For the first time, it became possible to compare campaign outcomes under different models on the fly and even to put together one’s own.

These models are based on rules chosen by the marketers themselves. I will discuss the most widely used models here briefly.

First interaction

All value generated by the conversion is attributed to the first touchpoint in the conversion funnel. Like Last Cookie Counts, this is an extreme model that favors specific types of campaigns; its value lies mostly in validating the impression that some campaigns really do work to raise awareness.

Linear attribution

The value generated by the conversion gets evenly distributed over all campaigns that played a role in the conversion funnel of the customer, irrespective of position in the funnel, channel type or visit quality.

Time Decay

All touchpoints get a share of the conversion value, but the closer to the conversion moment the higher the percentage of value assigned to it.

U-shaped

Set percentages of the conversion value are assigned to the first and the last touchpoint, and the rest is evenly distributed over those in between.

Last non-direct click

Attribution is used to redistribute the conversion value generated by campaigns run in a specific period. As direct visits do not have any campaign assigned to them, these visits are generally regarded as the baseline segment (Berman, 2015, p. 4). In this segment we group all customers that would have purchased from us even if we had not applied any marketing. However, if a direct visit was the customer’s last means of reaching our website but they were touched by one of our campaigns before that, advertisers applying this model disregard the direct touchpoint and instead assign all value to the last campaign touchpoint.
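The rule based models above differ only in how they weight the touchpoints of a path, which a short sketch can show. The code below is an illustration under stated assumptions, not Google Analytics' implementation: the function names, the 7-day half-life for Time Decay and the 40% share for the outer touchpoints in the U-shaped model are hypothetical choices, and the example path is made up.

```python
def weights_for(path, model, half_life=7.0, end_share=0.4):
    """Return one normalized weight per touchpoint of a non-empty path.

    path: list of (channel, days_before_conversion), oldest touchpoint first.
    """
    n = len(path)
    if model == "first_interaction":
        w = [1.0] + [0.0] * (n - 1)
    elif model == "last_cookie_counts":
        w = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        w = [1.0] * n
    elif model == "time_decay":
        # Weight halves for every `half_life` days before the conversion.
        w = [0.5 ** (days / half_life) for _, days in path]
    elif model == "u_shaped":
        if n <= 2:
            w = [1.0] * n
        else:
            middle = (1.0 - 2 * end_share) / (n - 2)
            w = [end_share] + [middle] * (n - 2) + [end_share]
    elif model == "last_non_direct":
        campaigns = [i for i, (ch, _) in enumerate(path) if ch != "direct"]
        winner = campaigns[-1] if campaigns else n - 1
        w = [1.0 if i == winner else 0.0 for i in range(n)]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(w)
    return [x / total for x in w]


def attribute(path, value, model):
    """Split the conversion value of one path over its channels."""
    credit = {}
    for (channel, _), weight in zip(path, weights_for(path, model)):
        credit[channel] = credit.get(channel, 0.0) + weight * value
    return credit


path = [("display", 14), ("search", 7), ("email", 1), ("direct", 0)]
for model in ("first_interaction", "last_cookie_counts", "linear",
              "time_decay", "u_shaped", "last_non_direct"):
    print(model, attribute(path, 100.0, model))
```

Running the same path through every model shows how strongly the chosen rule, rather than the data, determines which channel looks valuable.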

Although the application of these rule based attribution models meant a big step forward in the understanding of the contribution of campaigns in conversion funnels, in fact all rule based attribution models are as arbitrary as the Last Cookie Counts model. If you value touchpoints at the beginning of the funnel more than those at the end, you will see better results for campaigns that play a role in that phase. If you place more value on campaigns that play a role later in the conversion funnel, you will see those campaigns performing better. Advertisers who are looking for ‘the real contribution’ of campaigns or ‘the truth’ behind the numbers will find little solace, as each model works with preconceived notions of value and each model will give back what you put in.

Best practice with rule based models is therefore to choose a number of attribution models, run them side by side and then pick one of them as your new truth. The 30 day cookie model is still widely used as a reward calculation model and performance indicator, but hardly as an attribution model. Last Cookie Counts is often the starting point against which other attribution models are compared, and First Interaction can then serve as the other extreme. A third model usually combines elements of the aforementioned models.

2.2 Data Driven Conversion Attribution models.

As we have seen in the previous paragraph, rule based attribution models take assumptions and preferences of marketers as a starting point to evaluate the value of each marketing touchpoint in the conversion funnel of customers.

By contrast, data driven conversion attribution models start from the data generated by visitors and apply statistical models to it. They use not just the touchpoints of converting visitors, but also those of non-converters, in order to eliminate the so-called ‘survivorship bias’ (Andale, 2016).

There is only a limited number of statistical models with which you can describe the problem of conversion attribution and I will name the ones that I have personally worked with.

When looking for an attribution model, Dalessandro et al cite a number of properties that attribution models need to incorporate: Fairness, Data Driven and Interpretability (Dalessandro et al, 2012, p. 2).

With this in mind, in 2014 my team and I created a toolkit that contains novel visualisations of conversion paths and two data driven attribution models. We christened it the Conversion Attribution ToolKit, or CATKit for short.

2.2.1 Logistic Regression Model.

The first data driven attribution model is based on the work of Shao et al (2011) and Dalessandro et al (2012). By applying logistic regression, it calculates the probability of conversion when a channel appears in the conversion funnel versus when it does not. The outcome is a multiplier of the probability of conversion for each channel.

Logistic Regression Results: to what extent did presence in the conversion path by the individual channel improve chances of conversion?

Our experience with this model is that, although it is fairly accurate and intuitive to interpret, the actionability of the outcomes is relatively low, because it oversimplifies reality in order to keep the amount of data manageable. On top of that, the predictive value of the results is limited, as results per channel vary greatly over time as a result of specific campaign strategies.
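To show the mechanics of such a model, here is a minimal sketch that fits a logistic regression on channel presence flags with plain gradient ascent. It is not the CATKit implementation: the channel names, the path counts and the fitting settings are all made up for illustration. Note that the exponentiated coefficient is strictly a multiplier of the odds of conversion, which approximates a multiplier of the probability only when conversion rates are low.

```python
import math

CHANNELS = ["display", "search", "email"]

# (channels present in the path, number of paths, number that converted).
# Synthetic counts; non-converting paths are included on purpose.
GROUPS = [
    ((), 20, 1),
    (("display",), 20, 2),
    (("search",), 20, 6),
    (("display", "search"), 20, 10),
    (("email",), 20, 2),
    (("search", "email"), 20, 7),
]


def fit_logistic(groups, steps=5000, learning_rate=0.5):
    """Gradient ascent on the mean log-likelihood; returns [intercept, *coefs]."""
    n_paths = sum(total for _, total, _ in groups)
    # Feature vector per group: 1.0 for the intercept, then one flag per channel.
    features = [[1.0] + [float(ch in seen) for ch in CHANNELS]
                for seen, _, _ in groups]
    beta = [0.0] * (len(CHANNELS) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, (_, total, converted) in zip(features, groups):
            z = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted conversion probability
            for j, xi in enumerate(x):
                grad[j] += (converted - total * p) * xi
        beta = [b + learning_rate * g / n_paths for b, g in zip(beta, grad)]
    return beta


beta = fit_logistic(GROUPS)
odds_multiplier = {ch: math.exp(b) for ch, b in zip(CHANNELS, beta[1:])}
for channel, multiplier in odds_multiplier.items():
    print(f"{channel}: odds of conversion x {multiplier:.2f}")
```

In this toy data set search comes out with the largest multiplier, simply because paths containing search convert most often in every comparison.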

We are currently working on refining this model so that it accounts for the funnel position of all touchpoints and for combinations of channels in conversion funnels.

2.2.2 The Hidden Markov Model.

A completely different approach is the application of the Hidden Markov model to the problem of attribution.

This model, also part of iProspect’s CATKit solution, was developed based on the work of Abhishek et al (2012). It proposes a limited number of possible mental states of the audience on the way to conversion, mimicking the conversion funnel: Dormant (unaware), Awareness, Consideration and Purchase. Interaction with marketing can move prospects both up and down the conversion funnel:

Hidden Markov - Three states in the Conversion Funnel.

The application of the Hidden Markov model is both elegant and effective, as it introduces the conversion funnel in the distribution of value and recognizes that certain channels may generate value higher up in that funnel, without finally leading to a conversion at all.

The ‘hidden’ part of the Markov model is the fact that we don’t really know the state of every visitor at each moment in time. The model calculates the probability that visitors are in each of those states and their probability of shifting to another state.
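How a model can reason about states it cannot observe is easiest to see in the forward (filtering) step of a Hidden Markov model. The sketch below is a simplified illustration, not the CATKit or Abhishek et al implementation: the start, transition and emission probabilities and the three engagement levels are invented, whereas an actual model would estimate such parameters from the data.

```python
STATES = ["Dormant", "Awareness", "Consideration", "Purchase"]

# Hypothetical probability of each state at the first visit.
START = [0.7, 0.2, 0.1, 0.0]

# TRANSITION[i][j]: hypothetical probability of moving from state i to j
# between two visits. Prospects can move down the funnel as well as up.
TRANSITION = [
    [0.6, 0.3, 0.1, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.1, 0.2, 0.5, 0.2],
    [0.0, 0.0, 0.0, 1.0],
]

# EMISSION[i][obs]: hypothetical probability of observing an engagement
# level during a visit, given that the visitor is in state i.
EMISSION = [
    {"low": 0.80, "medium": 0.15, "high": 0.05},
    {"low": 0.50, "medium": 0.40, "high": 0.10},
    {"low": 0.20, "medium": 0.50, "high": 0.30},
    {"low": 0.05, "medium": 0.15, "high": 0.80},
]


def filter_states(observations):
    """Forward algorithm: P(state at each visit | engagement seen so far)."""
    n = len(STATES)
    belief = START
    history = []
    for step, obs in enumerate(observations):
        if step > 0:
            # Predict: let the belief move through the transition matrix.
            belief = [sum(belief[i] * TRANSITION[i][j] for i in range(n))
                      for j in range(n)]
        # Update: weigh each state by how well it explains the observation.
        weighted = [b * EMISSION[j][obs] for j, b in enumerate(belief)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(dict(zip(STATES, belief)))
    return history


history = filter_states(["low", "medium", "high"])
for visit, belief in enumerate(history, start=1):
    print(visit, {state: round(p, 3) for state, p in belief.items()})
```

With these assumed numbers, a visitor whose engagement rises from low to high shifts from most likely Dormant to most likely Consideration, without the state ever being observed directly.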

A drawback of this model is that specific interactions of the users must be valued in order for the model to work. For instance, points can be assigned to viewing a number of pages, viewing specific pages or micro conversions, like signing up for a newsletter. The number of points scored indicates the progression of the visitors through the various conversion states. As the valuation of these interactions is arbitrary and specific to each advertiser, a certain bias has to be expected.

How many visitors do I have in each of the conversion stages and which channels were responsible for bringing them in?

The outcome of the Hidden Markov model is twofold. First we get a total conversion value assigned to each channel, much like with the other attribution models. Next to that, we get a visualisation of the number of times our campaigns are responsible for moving people into new conversion stages. This gives us the perfect overview of what campaigns play a role in what phase of the conversion funnel.

Hidden Markov Flowchart - Visitors moving from one state to the next and the tooltip shows the channels that were responsible for the transition.

3. Game Theory - general notions.

It is beyond the scope of this paper to go in depth into the workings of Game Theory, but I will mention one important difference between two types of it that is relevant here.

First we have Competitive Game Theory, the type that Georgios Chalkiadakis defines as “concerned with decision making in strategic settings, where you must factor the preferences and rational choices of other players into your decision to make the best choice for yourself.” (Chalkiadakis et al, 2012). Figure 6 depicts a typical Game Theory scheme that is also known as the Prisoners’ Dilemma.

Competitive Game theory - In search of the "Nash Equilibrium"

A Nash Equilibrium in competitive game theory is a combination of strategies in which every player, knowing the strategies of the others, has nothing to gain by changing only their own.

In the above example, the Nash Equilibrium consists of both players defecting: whatever the other player does, each player is better off defecting, so neither can improve their outcome by switching on their own.
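The equilibrium can be verified mechanically by checking, for every combination of actions, whether either player would gain by deviating alone. The payoffs below are hypothetical textbook values (years in prison written as negative numbers) and are not taken from the figure.

```python
from itertools import product

ACTIONS = ["cooperate", "defect"]

# PAYOFF[(a1, a2)] = (payoff of player 1, payoff of player 2); higher is better.
# Hypothetical values: years in prison expressed as negative numbers.
PAYOFF = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}


def is_nash(a1, a2):
    """True if neither player gains by unilaterally switching action."""
    best_for_1 = all(PAYOFF[(a1, a2)][0] >= PAYOFF[(alt, a2)][0] for alt in ACTIONS)
    best_for_2 = all(PAYOFF[(a1, a2)][1] >= PAYOFF[(a1, alt)][1] for alt in ACTIONS)
    return best_for_1 and best_for_2


equilibria = [pair for pair in product(ACTIONS, ACTIONS) if is_nash(*pair)]
print(equilibria)
```

Mutual cooperation would leave both players better off than mutual defection, but it is not an equilibrium because each player can improve their own payoff by defecting.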

By contrast there is Cooperative Game Theory. In this branch of the theory, all competitors have a part in a joint final result that is desirable for everyone involved.

But how does this really work? Brandenburger (2007) gives a telling example: imagine 3 players. Player 1 wants to sell an item that has a value of $4 to them; the other two players want to buy it. Player 2 has a budget of $9, player 3 has a budget of $11. By playing together, players 1 and 2 create a value of $5, the difference between the budget of player 2 and the cost of the item. Similarly, when players 1 and 3 play, that value is $7 (11-4). When players 2 and 3 come together no value is created, as neither wants to sell. When all 3 come together the value is still only $7, as player 1 has only 1 item to sell.

All have a ‘marginal contribution’ toward that final result and calculations are geared toward finding what that contribution has been for each of the competitors. Here ‘marginal contribution’ is defined as “... the amount by which the overall value would shrink if the player in question were to leave the game.” (Brandenburger, 2007)

The Shapley Value, first conceived in 1953, builds on this metric: it averages each player’s marginal contribution over all possible orders in which the players can join the game. This definition makes clear that applying the Shapley Value to conversion attribution seems like a natural match.
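Brandenburger's three-player example is small enough to work through in code. The sketch below computes both the marginal contribution as quoted above (what the total loses when a player leaves) and the Shapley value (the average added value over every order in which the players can join). The function and variable names are my own; only the numbers come from the example.

```python
from itertools import permutations

PLAYERS = (1, 2, 3)       # 1 = seller, 2 and 3 = buyers
COST = 4                  # value of the item to the seller
BUDGET = {2: 9, 3: 11}    # what each buyer is willing to pay


def value(coalition):
    """Value a group creates on its own: best buyer's budget minus the cost."""
    buyers = [p for p in coalition if p in BUDGET]
    if 1 not in coalition or not buyers:
        return 0  # no seller or no buyer means no trade
    return max(BUDGET[b] for b in buyers) - COST


def marginal_contribution(player):
    """How much the overall value shrinks if `player` leaves the game."""
    everyone = set(PLAYERS)
    return value(everyone) - value(everyone - {player})


def shapley_values():
    """Average each player's added value over every order of joining."""
    totals = {p: 0.0 for p in PLAYERS}
    orders = list(permutations(PLAYERS))
    for order in orders:
        joined = set()
        for p in order:
            totals[p] += value(joined | {p}) - value(joined)
            joined.add(p)
    return {p: t / len(orders) for p, t in totals.items()}


marginal = {p: marginal_contribution(p) for p in PLAYERS}
shapley = shapley_values()
print(marginal)  # seller 7, player 2 adds 0, player 3 adds 2
print(shapley)   # shares of 26/6, 5/6 and 11/6, summing to the total of 7
```

Player 2 has a marginal contribution of zero to the full game, yet receives a positive Shapley value, because in some orders of joining player 2 is the only buyer available to the seller.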

4. The Application of Game Theory in Conversion Attribution.

From the above definitions, it is clear that cooperative game theory fits the analysis of conversion attribution like a glove: the total conversion funnel represents the ‘game’ that is responsible for generating value, and each campaign is a player. For each player we want to know what its share is in the total value created by the string of touchpoints leading up to the conversion. We need to do this for the funnels of all converting and non-converting visitors; here too, the non-converting paths are included to eliminate the survivorship bias.

As the value of all combinations of players is known, namely the total conversion value, what we are looking for is the Shapley Value, which calculates the marginal contribution of each campaign within all individual conversion funnels.
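Translated to channels, the calculation could look like the sketch below. It is an illustration under assumptions, not a description of any vendor's method: the channel names and the value assigned to each combination of channels are invented, and how to value a combination in practice (for example as the average conversion value of all converting and non-converting paths containing exactly those channels) is itself a modelling choice.

```python
from itertools import combinations
from math import factorial

CHANNELS = ("display", "search", "email")

# Hypothetical value of each combination of channels.
COALITION_VALUE = {
    frozenset(): 0.0,
    frozenset({"display"}): 1.0,
    frozenset({"search"}): 4.0,
    frozenset({"email"}): 1.0,
    frozenset({"display", "search"}): 7.0,
    frozenset({"display", "email"}): 2.0,
    frozenset({"search", "email"}): 6.0,
    frozenset({"display", "search", "email"}): 10.0,
}


def shapley(channel):
    """Weighted average of the channel's added value over all coalitions."""
    n = len(CHANNELS)
    others = [c for c in CHANNELS if c != channel]
    total = 0.0
    for size in range(n):
        # Share of join orders in which exactly `size` others come first.
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += weight * (COALITION_VALUE[s | {channel}] - COALITION_VALUE[s])
    return total


credit = {c: shapley(c) for c in CHANNELS}
print(credit)
```

The credits always add up to the value of the full combination, which is what makes the Shapley value attractive as a 'fair' way to divide conversion value.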

Gandolfo Dominici (2011) shows in his paper that effective application of game theory in marketing is still scarce. Among other reasons, Dominici cites technical limitations:

“The reality of the market and the behavior of its players implies a number of possible strategic solutions that is too high to be summarized in a game”. (Dominici, 2011 p. 3527)

In their paper, Dalessandro et al distinguish between ‘causal attribution’ and what they call ‘channel importance attribution’. While causal attribution is more comprehensive and in theory more accurate, they too conclude that because of the sheer number of variables at play, applying it “in many cases (...) may be downright impossible.” (Dalessandro et al, 2012, p. 2)

Their ‘channel importance attribution’ is based on the work of Shao (2011) and for the first time incorporates elements we know from Game Theory. It stops short, however, of applying that model completely, again because of the sheer number of variables involved.

These scientists came to the same conclusion as we did when we started to model the data according to the literature describing the Shapley value: the number of variables is practically unlimited, and the amount of data therefore quickly became too big to handle.
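A quick calculation shows why an exact computation gets out of hand. Assuming every campaign is treated as one player, the sketch below counts how many combinations of players need a value and how many join orders a naive enumeration would visit; the function names are my own.

```python
from math import factorial


def coalition_count(n_players):
    """Number of non-empty combinations of players that need a value."""
    return 2 ** n_players - 1


def ordering_count(n_players):
    """Number of join orders when enumerating all permutations."""
    return factorial(n_players)


for n in (3, 10, 20, 30):
    print(f"{n} players: {coalition_count(n):,} coalitions, "
          f"{ordering_count(n):,} orderings")
```

Three players are trivial, but an advertiser with 20 campaigns already faces more than a million combinations, each of which needs enough observed paths to be valued reliably.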

An American software company called Abakus (www.abakus.me) however, claims to have cracked it and mapped the Shapley value to marketing attribution. Their solution is patented as “an advertising attribution system...that identifies and allocates conversion credit to advertisement appearing in different advertising modalities based on the modality’s contribution to the advertising campaign.” (Abakus 2014, p 8)

Repeated requests for information about how they set up their tool have unfortunately not led to a satisfactory explanation. It seems they want to keep their secret recipe safe from competitors.

Conclusions.

The goal of this paper has been to give managers more in-depth background on the workings of online marketing conversion attribution, the options available and their pros and cons. Game Theory applied to the problem of attributing value to campaigns in an online marketing conversion funnel received particular attention, because it holds the promise of being the holy grail in the field of attribution, yet precious little experience with it has been gathered. On the one hand, there is only one company that claims to apply it, and it does not give specifics about how; on the other hand, there are the scientists who are highly sceptical about its feasibility in real life situations. More research is needed to clarify how to apply it and what benefits it has in real life compared with the models discussed here.

References.

Brandenburger, Adam. "Cooperative game theory." Teaching Materials at New York University (2007).

Dominici, Gandolfo. “Game Theory as a Marketing Tool: uses and limitations” (2011)

Berman, Ron. "Beyond the last touch: Attribution in online advertising." Available at SSRN 2384211 (2015).

Statista - The Statistics Portal for Market Data http://www.statista.com/statistics/265717/distribution-of-advertising-spending-worldwide-by-medium/ (2016)

Andale. “Bias in Statistics: Definition, Selection Bias & Survivorship Bias” http://www.statisticshowto.com/ (2016)

Dalessandro, Brian, et al. "Causally motivated attribution for online advertising." Proceedings of the Sixth International Workshop on Data Mining for Online Advertising and Internet Economy. ACM, 2012.

Shao, Xuhui, and Lexin Li. "Data-driven multi-touch attribution models." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.

Abhishek, Vibhanshu, Peter Fader, and Kartik Hosanagar. "Media exposure through the funnel: A model of multi-stage attribution." Available at SSRN 2158421 (2012).

Chalkiadakis, Georgios, Edith Elkind, and Michael Wooldridge. "Cooperative game theory: Basic concepts and computational challenges." IEEE Intelligent Systems 3 (2012): 86-90.

Li, Hongshuang, and P. K. Kannan. "Attributing conversions in a multichannel online marketing environment: An empirical model and a field experiment." Journal of Marketing Research 51.1 (2014): 40-56.

Abakus - Hello Attribution - Goodbye Confusion - 2014

Rene Nijhuis, September 7, 2017