Getting Started With DataDrivenAttribution

tutorial
Author: Ben Denis Shaffer

Published: June 18, 2023

Loading Packages

First, let’s load the DataDrivenAttribution package along with packages for working with DataFrames. For installation instructions, please consult the package's GitHub page: https://github.com/bdshaff/DataDrivenAttribution.jl

using DataDrivenAttribution
using DataFramesMeta
using DataFrames


Path Data

For this tutorial we’ll use a simulated sample dataset. A smaller sample can be downloaded from this GitHub repo: https://github.com/bdshaff/simpathdata

DataDrivenAttribution expects the path data to follow this specific format:

  1. It has columns named path and conv.
  2. Each entry in the path column is a vector of strings.
  3. The conv column is binary and contains only 1 (converted) and -1 (did not convert).

Other columns are allowed but are currently not used by the attribution models.

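You can check these requirements on your own data before calling the package. The sketch below works on plain Julia vectors instead of a DataFrame, so it runs without extra packages. The helper name `validate_path_data` is hypothetical and is not part of DataDrivenAttribution. The sample output below displays conv as a String column, so the check also accepts the strings "1" and "-1".

```julia
# Hypothetical helper (not part of DataDrivenAttribution): checks the format
# requirements above on plain Julia vectors.
function validate_path_data(path::AbstractVector, conv::AbstractVector)
    # Plain vectors have no column names (requirement 1), but every path
    # still needs exactly one conv value
    length(path) == length(conv) ||
        error("path and conv must have the same length")
    # Requirement 2: every path is a vector of strings
    for (i, p) in enumerate(path)
        p isa AbstractVector{<:AbstractString} ||
            error("row $i: path must be a vector of strings, got $(typeof(p))")
    end
    # Requirement 3: conv holds only 1 (converted) or -1 (did not convert);
    # comparing string(c) accepts both integers and the strings "1"/"-1"
    bad = findfirst(c -> !(string(c) in ("1", "-1")), conv)
    bad === nothing || error("row $bad: conv must be 1 or -1, got $(conv[bad])")
    return true
end

path = [["Channel-6"], ["Channel-2", "Channel-7", "Channel-6"], ["Channel-3"]]
conv = [-1, 1, -1]
validate_path_data(path, conv)   # true
```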
first(path_df, 10)
10×2 DataFrame
 Row  path                                     conv
      Array…                                   String
   1  ["Channel-6"]                            -1
   2  ["Channel-6"]                            -1
   3  ["Channel-2"]                            -1
   4  ["Channel-2", "Channel-7", "Channel-6"]  -1
   5  ["Channel-7"]                            -1
   6  ["Channel-3"]                            -1
   7  ["Channel-6"]                            -1
   8  ["Channel-6"]                            -1
   9  ["Channel-6"]                            -1
  10  ["Channel-3"]                            -1

The dda_summary function provides a useful description of the path data. It returns a DataFrame with one row for each touchpoint found in the path data. For each touchpoint it computes:

  1. Exposures - number of times the touchpoint occurs across all paths.
  2. Reach - number of unique paths in which the touchpoint occurs at least once.
  3. Average Frequency - ratio of Exposures to Reach.
  4. Total Exposures - total number of touchpoint occurrences in the data.
  5. Percent Exposures - Exposures as a percentage of Total Exposures.
  6. Total Reach - number of paths (rows) in the data.
  7. Percent Reach - Reach as a percentage of Total Reach.
  8. Total Average Frequency - ratio of Total Exposures to Total Reach.
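  9. tid - id assigned to the touchpoint, intended for tying the summary to attribution results. In the outputs shown in this tutorial, the ids differ between dda_summary and the fitted model. For example, Channel-2 has tid 3 below but 2 in dda_default.touchpoints, so joining on the touchpoint name is the safer option.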
path_summary = dda_summary(path_df)
7×10 DataFrame
 Row  tid     Touchpoint  exposures  pct_exposures  reach    pct_reach  avg_freq  total_reach  total_exposurs  total_avg_freq
      String  String      Int64      Float64        Int64    Float64    Float64   Int64        Int64           Float64
   1  4       Channel-7   1280771    11.8054        1249420  11.5164    1.02509   8000000      10849066        1.35613
   2  3       Channel-2   3613255    33.3048        3353411  30.9097    1.07749   8000000      10849066        1.35613
   3  5       Channel-3   1184353    10.9166        1141278  10.5196    1.03774   8000000      10849066        1.35613
   4  2       Channel-6   2323915    21.4204        2314597  21.3345    1.00403   8000000      10849066        1.35613
   5  6       Channel-11  889494     8.19881        846093   7.79876    1.0513    8000000      10849066        1.35613
   6  8       Channel-4   447072     4.12083        446331   4.114      1.00166   8000000      10849066        1.35613
   7  7       Channel-8   1110206    10.2332        1081531  9.96889    1.02651   8000000      10849066        1.35613
Note

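Unlike pct_exposures, which sums to 100%, pct_reach is not expected to sum to 100%. A path that contains several different touchpoints counts toward the reach of each of them, so under the definition above the shares would sum to at least 100%. In the output above, however, pct_reach matches reach divided by total_exposurs rather than by total_reach. For Channel-7, 1249420 / 10849066 ≈ 11.52% matches the table, while 1249420 / 8000000 ≈ 15.62%. That is why the column sums to about 96% here.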

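The following toy example, on four hand-made paths, makes the summary definitions concrete. It follows the definitions as listed, with Percent Reach measured against the number of paths. This is a plain-Julia illustration, not the package's code, and the function name `summarize` is hypothetical.

```julia
# Toy version of the dda_summary definitions (plain Julia, not the package's
# code). Percent Reach follows the listed definition: reach / number of paths.
function summarize(paths)
    total_exposures = sum(length, paths)   # every touchpoint occurrence
    total_reach = length(paths)            # number of paths (rows)
    rows = map(sort(unique(reduce(vcat, paths)))) do tp
        exposures = sum(count(==(tp), p) for p in paths)
        reach = count(p -> tp in p, paths) # paths containing tp at least once
        (touchpoint = tp, exposures = exposures, reach = reach,
         pct_exposures = 100 * exposures / total_exposures,
         pct_reach = 100 * reach / total_reach,
         avg_freq = exposures / reach)
    end
    return rows, total_exposures, total_reach
end

paths = [["A"], ["A", "B", "A"], ["B"], ["C", "A"]]
rows, total_exposures, total_reach = summarize(paths)
total_exposures / total_reach        # total_avg_freq: 1.75
sum(r.pct_exposures for r in rows)   # ≈ 100
sum(r.pct_reach for r in rows)       # 150.0
```

Touchpoint A appears in three of the four paths and B in two. Their reach shares alone therefore already exceed 100%, because the shares overlap whenever a path contains more than one distinct touchpoint.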
Applying Attribution Models

The dda_model function is the main interface for all attribution modeling.

It has only one required argument:

  • path_df - the DataFrame with the paths and conversions in the format discussed above

Optional arguments, with their defaults, are:

  • model - the attribution model to apply. Currently markov, shapley and logistireg are supported; markov is the default.
  • markov_order - a vector of Markov model orders to fit. The default, [1], builds a single first-order model.
  • include_heuristics - a Bool, true by default, that adds the Last-Touch and First-Touch heuristic attribution models to the results.

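The two heuristic models enabled by include_heuristics follow simple rules. Last-Touch gives a converting path's full credit to its final touchpoint, and First-Touch gives it to the first. A minimal sketch of both rules on plain Julia vectors follows. The function name `heuristic_attribution` and its `touch` keyword are hypothetical, not the package's API, and conv is assumed to be coded as the integers 1 and -1.

```julia
# Sketch of Last-Touch / First-Touch credit (not the package's implementation).
# Only converting paths (conv == 1) earn credit.
function heuristic_attribution(paths, conv; touch::Symbol = :last)
    touch in (:last, :first) || throw(ArgumentError("touch must be :last or :first"))
    credit = Dict{String, Float64}()
    for (p, c) in zip(paths, conv)
        c == 1 || continue                        # skip non-converting paths
        tp = touch === :last ? last(p) : first(p)
        credit[tp] = get(credit, tp, 0.0) + 1.0   # whole conversion to one touchpoint
    end
    return credit
end

paths = [["A", "B"], ["B"], ["C", "A"], ["A"]]
conv  = [1, 1, -1, 1]
heuristic_attribution(paths, conv; touch = :last)    # B gets 2.0, A gets 1.0
heuristic_attribution(paths, conv; touch = :first)   # A gets 2.0, B gets 1.0
```

Both rules hand out exactly one unit of credit per converting path, so they always distribute the same total.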
Default Model

Let’s first run the default model and explore what it returns.

dda_default = dda_model(path_df);

The AttributionModel Type

The type of the returned object is

typeof(dda_default)
DataDrivenAttribution.MarkovAttributionModel

This is a subtype of

supertype(typeof(dda_default))
DataDrivenAttribution.AttributionModel

which is the supertype of all attribution models.

The fields contained in a MarkovAttributionModel are:

fieldnames(typeof(dda_default))
(:method, :paths, :result, :touchpoints, :markov_order, :transition_matrices)
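A stripped-down, hypothetical mirror of this hierarchy in plain Julia shows what the subtype relationship buys you. The field names come from the fieldnames output above. The field types, the constructor call and the helper `model_summary_line` are assumptions for illustration only, not the package's definitions.

```julia
# Hypothetical mirror of the type hierarchy (field names from the fieldnames
# output above; field types are assumptions)
abstract type AttributionModel end

struct MarkovAttributionModel <: AttributionModel
    method::String
    paths::Any                          # aggregated path table in the package
    result::Any                         # long-format attribution results
    touchpoints::Dict{String, String}   # touchpoint name => tid
    markov_order::Vector{Int}
    transition_matrices::Any
end

# Written once against the abstract supertype; works for any subtype that
# carries the shared `method` and `touchpoints` fields
model_summary_line(m::AttributionModel) =
    "$(m.method) model with $(length(m.touchpoints)) states"

m = MarkovAttributionModel("markov", nothing, nothing,
                           Dict("(start)" => "0", "Channel-2" => "2"), [1], nothing)

supertype(typeof(m))   # AttributionModel
model_summary_line(m)  # "markov model with 2 states"
```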

General AttributionModel Output

All AttributionModel subtypes have the fields below:

  1. method - method applied
  2. paths - path data used to build the model
  3. touchpoints - a mapping of unique touchpoints to tids
  4. result - the conversion table with attribution results
dda_default.method
"markov"

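You will notice that the path data returned for the Markov model is not the original dataset but an aggregated version of it. It has one row per unique path, written as touchpoints joined by ">", with the number of conversions (total_conversions) and non-conversions (total_null) observed for that path. This is because the Markov model itself is built by the dda_markov_model function, which takes the aggregated data as its input.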

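The aggregation step can be sketched in plain Julia. Identical paths are collapsed into one "A>B" string, and each row counts how many of those journeys converted (conv == 1) and how many did not. The function name `aggregate_paths` is hypothetical. The ">" separator and the column names are taken from the table below, but this is not the package's code.

```julia
# Sketch of the path aggregation step (plain Julia, not the package's code)
function aggregate_paths(paths, conv)
    counts = Dict{String, Tuple{Int, Int}}()   # "A>B" => (conversions, nulls)
    for (p, c) in zip(paths, conv)
        key = join(p, ">")
        conversions, nulls = get(counts, key, (0, 0))
        counts[key] = c == 1 ? (conversions + 1, nulls) : (conversions, nulls + 1)
    end
    rows = [(path = k, total_conversions = v[1], total_null = v[2]) for (k, v) in counts]
    # Most conversions first; ties broken by path name for a deterministic order
    return sort(rows; by = r -> (-r.total_conversions, r.path))
end

paths = [["Channel-2"], ["Channel-2"], ["Channel-2", "Channel-3"],
         ["Channel-2"], ["Channel-2", "Channel-3"]]
conv = [1, -1, 1, 1, -1]
aggregate_paths(paths, conv)
# [(path = "Channel-2", total_conversions = 2, total_null = 1),
#  (path = "Channel-2>Channel-3", total_conversions = 1, total_null = 1)]
```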
first(dda_default.paths, 15)
15×3 DataFrame
 Row  path                            total_conversions  total_null
      String                          Int64              Int64
   1  Channel-2                       2454               1969810
   2  Channel-6                       694                1901213
   3  Channel-7                       491                698382
   4  Channel-8                       463                522362
   5  Channel-3                       356                426183
   6  Channel-11                      182                197482
   7  Channel-2>Channel-3             167                202970
   8  Channel-2>Channel-11            163                158754
   9  Channel-4                       161                321348
  10  Channel-3>Channel-2             130                99807
  11  Channel-2>Channel-11>Channel-2  91                 72236
  12  Channel-11>Channel-2            89                 89204
  13  Channel-7>Channel-8             66                 72439
  14  Channel-2>Channel-8             61                 55924
  15  Channel-8>Channel-7             55                 98312
dda_default.touchpoints
Dict{String, String} with 10 entries:
  "Channel-7"  => "4"
  "(conv)"     => "1"
  "Channel-2"  => "2"
  "Channel-3"  => "6"
  "(drop)"     => "-1"
  "Channel-6"  => "3"
  "(start)"    => "0"
  "Channel-11" => "7"
  "Channel-4"  => "8"
  "Channel-8"  => "5"

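The bracketed entries (start), (conv) and (drop) are not channels. They appear to be the extra states of the Markov chain: every journey begins in (start) and ends in (conv) if it converted or in (drop) if it did not. The sketch below shows that expansion, using an excerpt of the dictionary above to map states to tids. `to_state_sequence` is a hypothetical helper, and the package's internal encoding may differ.

```julia
# Excerpt of dda_default.touchpoints from the output above
touchpoints = Dict("(start)" => "0", "(conv)" => "1", "(drop)" => "-1",
                   "Channel-2" => "2", "Channel-11" => "7")

# Hypothetical helper: wrap an aggregated "A>B" path between the (start) state
# and an end state, (conv) if the journey converted and (drop) otherwise
function to_state_sequence(path::AbstractString, converted::Bool)
    seq = ["(start)"]
    append!(seq, String.(split(path, ">")))
    push!(seq, converted ? "(conv)" : "(drop)")
    return seq
end

seq = to_state_sequence("Channel-2>Channel-11>Channel-2", true)
# ["(start)", "Channel-2", "Channel-11", "Channel-2", "(conv)"]

tids = [touchpoints[s] for s in seq]
# ["0", "2", "7", "2", "1"]

# A first-order model only looks at consecutive pairs of states
transitions = collect(zip(seq[1:end-1], seq[2:end]))
# [("(start)", "Channel-2"), ("Channel-2", "Channel-11"),
#  ("Channel-11", "Channel-2"), ("Channel-2", "(conv)")]
```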
Results come in a standard long format with the columns:

  • tid - touchpoint id
  • Touchpoint - touchpoint name
  • Conversions - conversions attributed
  • Model - attribution model
dda_default.result
21×4 DataFrame
 Row  tid      Touchpoint  Conversions  Model
      String?  String?     Float64      String
   1  2        Channel-2   2823.73      Markov_1
   2  6        Channel-3   763.823      Markov_1
   3  3        Channel-6   717.809      Markov_1
   4  5        Channel-8   706.433      Markov_1
   5  4        Channel-7   685.121      Markov_1
   6  7        Channel-11  645.945      Markov_1
   7  8        Channel-4   181.143      Markov_1
   8  7        Channel-11  461.0        LastTouch
   9  2        Channel-2   3091.0       LastTouch
  10  6        Channel-3   649.0        LastTouch
  11  8        Channel-4   188.0        LastTouch
  12  3        Channel-6   793.0        LastTouch
  13  4        Channel-7   673.0        LastTouch
  14  5        Channel-8   669.0        LastTouch
  15  7        Channel-11  349.0        FirstTouch
  16  2        Channel-2   3194.0       FirstTouch
  17  6        Channel-3   602.0        FirstTouch
  18  8        Channel-4   210.0        FirstTouch
  19  3        Channel-6   834.0        FirstTouch
  20  4        Channel-7   705.0        FirstTouch
  21  5        Channel-8   630.0        FirstTouch

Last-Touch and First-Touch are included by default. Markov models are named Markov_{order}, so the first-order model appears as Markov_1.

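The tutorial does not spell out how the Markov_1 credit is computed. A widely used approach for Markov-chain attribution is the removal effect. You compute the probability of reaching (conv) from (start), recompute it with one channel removed (its visits redirected to (drop)), and then share total conversions in proportion to each channel's relative drop. The sketch below applies that approach to a first-order chain built from aggregated (path, conversions, nulls) rows. Whether dda_markov_model works exactly this way is an assumption, so treat this as an illustration of the technique. The name `markov_removal_attribution` is hypothetical.

```julia
using LinearAlgebra

# Sketch of first-order Markov removal-effect attribution (a common approach;
# assumed, not confirmed, to match dda_markov_model).
# rows: aggregated (path "A>B", conversions, nulls) tuples.
function markov_removal_attribution(rows)
    # 1. Count transitions, adding (start) and the absorbing (conv)/(drop) states
    counts = Dict{Tuple{String, String}, Float64}()
    add!(a, b, w) = (counts[(a, b)] = get(counts, (a, b), 0.0) + w)
    for (path, conv, nulls) in rows
        seq = vcat(["(start)"], String.(split(path, ">")))
        for (a, b) in zip(seq[1:end-1], seq[2:end])
            add!(a, b, conv + nulls)          # every journey makes this step
        end
        add!(seq[end], "(conv)", conv)
        add!(seq[end], "(drop)", nulls)
    end

    # 2. Transition probabilities among transient states: (start) + channels
    channels = sort(unique(b for (_, b) in keys(counts) if b ∉ ("(conv)", "(drop)")))
    states = vcat(["(start)"], channels)
    idx = Dict(s => i for (i, s) in enumerate(states))
    n = length(states)
    out = zeros(n)                            # outgoing count per state
    for ((a, _), w) in counts
        out[idx[a]] += w
    end
    Q = zeros(n, n)                           # transient -> transient
    r = zeros(n)                              # transient -> (conv)
    for ((a, b), w) in counts
        i = idx[a]
        if b == "(conv)"
            r[i] += w / out[i]
        elseif b != "(drop)"
            Q[i, idx[b]] += w / out[i]
        end
    end

    # 3. P(conv | state) solves x = Q x + r, i.e. (I - Q) x = r
    p_conv(Qm, rm) = ((I - Qm) \ rm)[idx["(start)"]]
    base = p_conv(Q, r)

    # 4. Removal effect: visits to channel c now lead straight to (drop)
    effects = map(channels) do c
        Qc, rc = copy(Q), copy(r)
        Qc[idx[c], :] .= 0.0
        rc[idx[c]] = 0.0
        1 - p_conv(Qc, rc) / base
    end

    # 5. Split total conversions in proportion to the removal effects
    total = sum(row[2] for row in rows)
    return Dict(zip(channels, total .* effects ./ sum(effects)))
end

rows = [("A", 10, 90), ("A>B", 20, 30), ("B", 5, 95)]
markov_removal_attribution(rows)   # A ≈ 14.81, B ≈ 20.19 (sum 35)
```

Under this scheme the credits always add up to the total number of conversions. This matches the result table above, where Markov_1, LastTouch and FirstTouch each distribute the same 6,524 conversions.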
Visualizing & Interpreting Results

For convenience, some basic visualizations are available.

Conversion Volume

First, we can look at the volume of conversions that each model attributes to each channel/touchpoint. The variance in attributed volume is not huge, but certain channels gain a noticeable volume of conversions under the Markov model. In this case Channel-3 and Channel-11 stand out. Channel-11 receives about 646 conversions under Markov_1 versus 461 under Last-Touch and 349 under First-Touch. Channel-3 receives about 764 versus 649 and 602.

plot_conversion_volume(dda_default.result);

Conversion Volume Comparison: 1st Order Markov and Heuristics

If we look back at the 15 paths that drove the largest volume of conversions, this also makes intuitive sense. Apart from their individual contribution, Channel-3 appears in rows 7 and 10 of that table and Channel-11 in rows 8, 11 and 12. This suggests that these touchpoints contribute both through volume and through interaction effects. Channel-8 is another contender, but to a lesser extent. Channel-3 and Channel-11 interact with Channel-2, which is by far the largest driver of conversions.

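using DataFramesMeta
# Path-level conversion rate for the 15 paths with the most conversions
# (dda_default.paths is shown sorted by total_conversions above)
@chain dda_default.paths begin
  # Conversions per non-converting journey; at rates this small it is very
  # close to total_conversions / (total_conversions + total_null)
  @rtransform :pathCVR = :total_conversions/:total_null
  first(15)
  @rorderby -:pathCVR
end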
15×4 DataFrame
 Row  path                            total_conversions  total_null  pathCVR
      String                          Int64              Int64       Float64
   1  Channel-3>Channel-2             130                99807       0.00130251
   2  Channel-2>Channel-11>Channel-2  91                 72236       0.00125976
   3  Channel-2                       2454               1969810     0.00124581
   4  Channel-2>Channel-8             61                 55924       0.00109077
   5  Channel-2>Channel-11            163                158754      0.00102675
   6  Channel-11>Channel-2            89                 89204       0.000997713
   7  Channel-11                      182                197482      0.000921603
   8  Channel-7>Channel-8             66                 72439       0.000911111
   9  Channel-8                       463                522362      0.000886359
  10  Channel-3                       356                426183      0.000835322
  11  Channel-2>Channel-3             167                202970      0.000822782
  12  Channel-7                       491                698382      0.000703054
  13  Channel-8>Channel-7             55                 98312       0.000559443
  14  Channel-4                       161                321348      0.000501014
  15  Channel-6                       694                1901213     0.00036503

You can also see that among these 15 paths, paths with an interaction (more than one touchpoint) tend to have a higher path-level conversion rate (CVR). This is consistent with the Markov attribution model's ability to capture such interaction effects.

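You can check this tendency directly with the 15 rows printed above. The sketch below pools conversions and non-conversions separately for single-touch and multi-touch paths, using the same conversions-to-nulls ratio as pathCVR. The numbers are copied from the table, and the helper names are illustrative.

```julia
# (path, total_conversions, total_null) copied from the 15-row table above
rows = [
    ("Channel-3>Channel-2", 130, 99807),
    ("Channel-2>Channel-11>Channel-2", 91, 72236),
    ("Channel-2", 2454, 1969810),
    ("Channel-2>Channel-8", 61, 55924),
    ("Channel-2>Channel-11", 163, 158754),
    ("Channel-11>Channel-2", 89, 89204),
    ("Channel-11", 182, 197482),
    ("Channel-7>Channel-8", 66, 72439),
    ("Channel-8", 463, 522362),
    ("Channel-3", 356, 426183),
    ("Channel-2>Channel-3", 167, 202970),
    ("Channel-7", 491, 698382),
    ("Channel-8>Channel-7", 55, 98312),
    ("Channel-4", 161, 321348),
    ("Channel-6", 694, 1901213),
]

is_multi(path) = occursin('>', path)   # more than one touchpoint in the path

# Pooled ratio: all conversions over all nulls, the same ratio as pathCVR
pooled(rs) = sum(r[2] for r in rs) / sum(r[3] for r in rs)

multi  = pooled(filter(r -> is_multi(r[1]), rows))    # 822 / 849646   ≈ 0.000967
single = pooled(filter(r -> !is_multi(r[1]), rows))   # 4801 / 6036780 ≈ 0.000795
```

Pooled this way, multi-touch paths come out ahead (about 0.00097 versus 0.00080), but the pattern is not uniform. Channel-2 on its own beats most multi-touch paths, and Channel-8>Channel-7 (0.00056) trails Channel-7 alone (0.00070).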
To further compare the attribution models in our results table, we can look at the relative conversion rate (RCR) of each touchpoint/channel in the data. Computing a channel's CVR requires its reach, but the RCR only needs the conversion volumes. A channel's reach is the same under every model, so it cancels out.

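Because reach cancels, (conv_A / reach) / (conv_B / reach) = conv_A / conv_B. The sketch below assumes plot_rcr plots this ratio of the two models' attributed conversions for each channel; its exact definition is not shown here. Under that assumption, the values can be reproduced from the result table above in plain Julia.

```julia
# Markov_1 and LastTouch conversions copied from dda_default.result above
markov_1   = Dict("Channel-2" => 2823.73, "Channel-3" => 763.823, "Channel-6" => 717.809,
                  "Channel-8" => 706.433, "Channel-7" => 685.121, "Channel-11" => 645.945,
                  "Channel-4" => 181.143)
last_touch = Dict("Channel-2" => 3091.0, "Channel-3" => 649.0, "Channel-6" => 793.0,
                  "Channel-8" => 669.0, "Channel-7" => 673.0, "Channel-11" => 461.0,
                  "Channel-4" => 188.0)

# Reach cancels, so the relative conversion rate reduces to a ratio of volumes
rcr = Dict(ch => markov_1[ch] / last_touch[ch] for ch in keys(markov_1))

# Print channels from the largest to the smallest ratio
for (ch, v) in sort(collect(rcr); by = last, rev = true)
    println(rpad(ch, 11), round(v; digits = 3))
end
```

With these numbers, Channel-11 has the largest ratio (about 1.40), followed by Channel-3 (about 1.18). Channel-2, Channel-4 and Channel-6 fall below 1, meaning the Markov model credits them with fewer conversions than Last-Touch does.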
using DataFramesMeta
results_df = @chain dda_default.result begin
  unstack(:Model, :Conversions)
end

plot_rcr(results_df, "Markov_1", "LastTouch");

Relative Conversion Rate: 1st Order Markov vs Last-Touch

Looking at the RCRs is one way to gauge the strength of the interaction effect that a given touchpoint carries. The plot supports the discussion above and indicates that Channel-11 is doing more work behind the scenes to drive conversions.

Of course, attributing volume properly is just the first step. Optimizing media campaigns is not so simple: we would also need to evaluate cost, the value of a conversion, and our ability to scale a channel given available inventory. Exogenous factors such as seasonality, the competitive environment and the state of the economy also come into play.

versioninfo()
Julia Version 1.8.0-rc3
Commit 33f19bcbd25 (2022-07-13 19:10 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores