A Quick Look at Three Data Driven Attribution Methods

tutorial
Author

Ben Denis Shaffer

Published

June 18, 2023

In this post we will take a quick look at the three attribution methods currently available in the DataDrivenAttribution package: Markov-Chain, Shapley, and logistic regression attribution.

We won’t go into all the details of the implementation or the theory behind the models, but rather give a quick feel for how easily one can run and compare the different attribution results.

Loading Packages

using DataDrivenAttribution
using DataFrames
using Arrow


For this demo we’ll use the same sample dataset used in the getting started post.

first(path_df, 15)
15×2 DataFrame
Row path conv
Array… String
1 ["Channel-6"] -1
2 ["Channel-6"] -1
3 ["Channel-2"] -1
4 ["Channel-2", "Channel-7", "Channel-6"] -1
5 ["Channel-7"] -1
6 ["Channel-3"] -1
7 ["Channel-6"] -1
8 ["Channel-6"] -1
9 ["Channel-6"] -1
10 ["Channel-3"] -1
11 ["Channel-2", "Channel-3"] -1
12 ["Channel-11", "Channel-2", "Channel-7"] -1
13 ["Channel-2", "Channel-8"] -1
14 ["Channel-7"] -1
15 ["Channel-8", "Channel-2"] -1

Data Driven Attribution Methods

As mentioned, there are currently three attribution methods implemented. These are the three most commonly used approaches in the industry. Let’s start with Markov-Chain attribution, the default DDA method.

Markov-Chain Attribution

In the getting started post we saw the default model fit to the simulated dataset. There we looked at the results from the 1st order Markov-Chain model. For this example we’ll show how to fit a set of Markov models of various orders simultaneously. All we need to do is supply a vector of integers to the markov_order parameter of the dda_model function.

dda_markov = dda_model(path_df, model = "markov", markov_order = [1,2,3,4], include_heuristics = false);

The type of the object returned is identical to the object returned if only one model is fit.

typeof(dda_markov)
DataDrivenAttribution.MarkovAttributionModel

The elements of the model will also be the same; however, the transition_matrices element will be a vector containing more than one transition matrix.

fieldnames(typeof(dda_markov))
(:method, :paths, :result, :touchpoints, :markov_order, :transition_matrices)

First, let’s take a quick look at the results to verify that models of all orders were run.

dda_markov.result
28×4 DataFrame
3 rows omitted
Row tid Touchpoint Conversions Model
String String? Float64 String
1 2 Channel-2 2823.73 Markov_1
2 6 Channel-3 763.823 Markov_1
3 3 Channel-6 717.809 Markov_1
4 5 Channel-8 706.433 Markov_1
5 4 Channel-7 685.121 Markov_1
6 7 Channel-11 645.945 Markov_1
7 8 Channel-4 181.143 Markov_1
8 2 Channel-2 2224.75 Markov_2
9 6 Channel-3 1152.62 Markov_2
10 7 Channel-11 1092.88 Markov_2
11 5 Channel-8 773.078 Markov_2
12 4 Channel-7 724.26 Markov_2
13 3 Channel-6 419.697 Markov_2
17 6 Channel-3 1154.11 Markov_3
18 5 Channel-8 767.965 Markov_3
19 4 Channel-7 673.604 Markov_3
20 3 Channel-6 360.701 Markov_3
21 8 Channel-4 93.5759 Markov_3
22 2 Channel-2 1907.36 Markov_4
23 6 Channel-3 1335.06 Markov_4
24 7 Channel-11 1284.02 Markov_4
25 5 Channel-8 785.387 Markov_4
26 4 Channel-7 718.758 Markov_4
27 3 Channel-6 385.524 Markov_4
28 8 Channel-4 107.902 Markov_4

As you can see we have four different models here, and, as before, we can quickly plot the conversion volumes.

plot_conversion_volume(dda_markov.result);

[Conversion volume plot for the four Markov models]
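
To pull a single model’s numbers out of the stacked result, we can filter on the Model column. A minimal sketch with plain named tuples standing in for the result DataFrame (column names as in the output above):

```julia
# Toy rows standing in for dda_markov.result, which stacks all
# fitted orders in one table with a Model column.
rows = [(Touchpoint = "Channel-2", Conversions = 2823.73, Model = "Markov_1"),
        (Touchpoint = "Channel-2", Conversions = 2224.75, Model = "Markov_2")]

# Keep only the 1st order model's rows.
markov1 = filter(r -> r.Model == "Markov_1", rows)
```

With the real object, the same filter applies to dda_markov.result via DataFrames.jl.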

Now let’s take a quick look at the actual transition matrices returned. The 1st order transition matrix is a named matrix with a column and a row for each of the touchpoints in the data. More precisely, the touchpoint id (tid) is used for the names.

dda_markov.transition_matrices[1]
10×10 Named Matrix{Float64}
trans_from ╲ trans_to │           4            1  …            8            5
──────────────────────┼──────────────────────────────────────────────────────
4                     │         0.0  0.000525465  …   0.00201441     0.117988
1                     │         0.0          1.0             0.0          0.0
2                     │   0.0273238  0.000855461      0.00286418    0.0293522
6                     │   0.0207092  0.000547979      0.00177228    0.0222679
-1                    │         0.0          0.0             0.0          0.0
3                     │   0.0160445  0.000341235         0.01183    0.0121924
0                     │    0.113818          0.0       0.0500289    0.0967815
7                     │   0.0208456  0.000518272      0.00137494    0.0225319
8                     │   0.0141029  0.000420514             0.0   0.00901868
5                     │    0.166488  0.000602591  …   0.00279047          0.0
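
One useful sanity check on a matrix like this: a Markov transition matrix is row-stochastic, meaning each row’s outgoing probabilities sum to 1, and the absorbing conversion (1) and null (-1) states transition only to themselves. A toy 3-state sketch, not the package’s matrix:

```julia
# States: start (0), conversion (1), null (-1) — conversion and null
# are absorbing, so their rows are a single 1 on the diagonal.
T = [0.0  0.3  0.7;   # start → conversion with 0.3, null with 0.7
     0.0  1.0  0.0;   # conversion (absorbing)
     0.0  0.0  1.0]   # null (absorbing)

# Every row sums to 1 — the matrix is row-stochastic.
all(isapprox.(sum(T, dims=2), 1.0))
```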

For higher order models the transition matrix begins to grow very quickly. The matrix will always be square, but only those combinations of touchpoints that exist in the data will be present. The first order matrix was 10×10, and the 2nd order matrix here is 45×45.

dda_markov.transition_matrices[2]
45×45 Named Matrix{Float64}
trans_from ╲ trans_to │         2U6          2U7  …            1           -1
──────────────────────┼──────────────────────────────────────────────────────
2U6                   │         0.0          0.0  …  0.000555274     0.667031
2U7                   │         0.0          0.0     0.000513152     0.514227
6U2                   │    0.107856     0.109649     0.000881767     0.696865
7U2                   │    0.107229     0.110634     0.000758939     0.697328
4U5                   │         0.0          0.0     0.000582334     0.673509
2U5                   │         0.0          0.0     0.000669451     0.672091
5U4                   │         0.0          0.0     0.000432816       0.7657
4U2                   │    0.105853     0.109512     0.000959578     0.699007
5U2                   │     0.10683     0.110092     0.000795178      0.69759
2U4                   │         0.0          0.0     0.000496313      0.76651
2U3                   │         0.0          0.0     0.000410846     0.919008
⋮                                 ⋮            ⋮  ⋱            ⋮            ⋮
8U6                   │         0.0          0.0     0.000971345     0.669743
4U8                   │         0.0          0.0      0.00193798     0.794574
8U4                   │         0.0          0.0     0.000317209     0.768279
6U8                   │         0.0          0.0      0.00142925     0.800381
8U5                   │         0.0          0.0     0.000248016     0.675595
5U8                   │         0.0          0.0     0.000645578     0.793415
8U7                   │         0.0          0.0             0.0     0.518906
7U8                   │         0.0          0.0             0.0     0.805397
0                     │    0.155279     0.157635             0.0          0.0
1                     │         0.0          0.0             1.0          0.0
-1                    │         0.0          0.0  …          0.0          1.0

The 4th order matrix is 1327×1327, so things are already getting pretty large. In practice anything above order 4 might not make a whole lot of sense, but it’s still possible to model with dda_model.

size(dda_markov.transition_matrices[4])
(1327, 1327)
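
The growth is easy to bound: with n states, an order-k state is a sequence of k of them, so there are at most on the order of n^k states. In practice only sequences observed in the data become states, which is why the actual matrices above (10, 45, and 1327 states) stay well below the bound. A quick sketch:

```julia
# Upper bound on the number of order-k states for n base states.
# Real state counts are much smaller, since only sequences that
# actually occur in the path data appear in the matrix.
n = 10                      # states in the 1st order matrix above
bounds = [n^k for k in 1:4] # → [10, 100, 1000, 10000]
```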

Shapley Attribution

The Shapley algorithm is another popular approach to data-driven attribution modeling. The method works best when the number of distinct touchpoints contained in the data is not very large. The reason is simply the combinatorial nature of the method and the fact that factorials grow very quickly. Still, for smaller problems this may be a reasonable approach.
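
To put a number on that: with n touchpoints there are 2^n − 1 non-empty coalitions whose worth has to be evaluated, and the Shapley formula averages marginal contributions over orderings, which is where the factorials come in. A quick sketch (the 7 channels in this dataset give 127 coalitions, matching the output below):

```julia
# Number of non-empty coalitions for n touchpoints: 2^n - 1.
n_coalitions(n) = 2^n - 1

n_coalitions(7)   # the 7 channels here → 127 coalitions
```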

dda_shapley = dda_model(path_df, model = "shapley", include_heuristics = false);

You will see that the type is now a ShapleyAttributionModel

typeof(dda_shapley)
DataDrivenAttribution.ShapleyAttributionModel

and while the same attributes will be present in the object, we also get back a few other elements specific to the Shapley model

fieldnames(typeof(dda_shapley))
(:method, :paths, :result, :touchpoints, :coalitions, :shapley_df, :values)

We have the vector of all of the possible coalitions for which marginal contributions are computed.

dda_shapley.coalitions
127-element Vector{Vector{String}}:
 ["Channel-7"]
 ["Channel-2"]
 ["Channel-3"]
 ["Channel-6"]
 ["Channel-11"]
 ["Channel-4"]
 ["Channel-8"]
 ["Channel-7", "Channel-2"]
 ["Channel-7", "Channel-3"]
 ["Channel-7", "Channel-6"]
 ["Channel-7", "Channel-11"]
 ["Channel-7", "Channel-4"]
 ["Channel-7", "Channel-8"]
 ⋮
 ["Channel-2", "Channel-3", "Channel-6", "Channel-4", "Channel-8"]
 ["Channel-2", "Channel-3", "Channel-11", "Channel-4", "Channel-8"]
 ["Channel-2", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
 ["Channel-3", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
 ["Channel-7", "Channel-2", "Channel-3", "Channel-6", "Channel-11", "Channel-4"]
 ["Channel-7", "Channel-2", "Channel-3", "Channel-6", "Channel-11", "Channel-8"]
 ["Channel-7", "Channel-2", "Channel-3", "Channel-6", "Channel-4", "Channel-8"]
 ["Channel-7", "Channel-2", "Channel-3", "Channel-11", "Channel-4", "Channel-8"]
 ["Channel-7", "Channel-2", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
 ["Channel-7", "Channel-3", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
 ["Channel-2", "Channel-3", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
 ["Channel-7", "Channel-2", "Channel-3", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]

And we also have the Shapley values for each of the coalitions, stored in a dictionary.

dda_shapley.values
Dict{Any, Any} with 127 entries:
  Set(["Channel-7", "Channel-2", "Channel-3", "Channel-4"])       => 0.00348792
  Set(["Channel-4"])                                              => 0.000500764
  Set(["Channel-7", "Channel-6", "Channel-8"])                    => 0.00152873
  Set(["Channel-3", "Channel-6", "Channel-11", "Channel-4", "Cha… => 0.00338653
  Set(["Channel-7", "Channel-2", "Channel-6", "Channel-11"])      => 0.00302389
  Set(["Channel-7", "Channel-3", "Channel-11", "Channel-4"])      => 0.00278733
  Set(["Channel-2"])                                              => 0.00124426
  Set(["Channel-7", "Channel-2", "Channel-6", "Channel-8"])       => 0.00301627
  Set(["Channel-3", "Channel-6", "Channel-11", "Channel-8"])      => 0.00247106
  Set(["Channel-6", "Channel-11"])                                => 0.000939583
  Set(["Channel-3", "Channel-6", "Channel-8"])                    => 0.00160128
  Set(["Channel-7", "Channel-8"])                                 => 0.00133767
  Set(["Channel-2", "Channel-11"])                                => 0.00192556
  Set(["Channel-2", "Channel-6", "Channel-4", "Channel-8"])       => 0.00295792
  Set(["Channel-2", "Channel-3", "Channel-8"])                    => 0.00269993
  Set(["Channel-2", "Channel-11", "Channel-4"])                   => 0.00254572
  Set(["Channel-2", "Channel-11", "Channel-4", "Channel-8"])      => 0.00371187
  Set(["Channel-7", "Channel-3", "Channel-11", "Channel-4", "Cha… => 0.00418024
  Set(["Channel-7", "Channel-11", "Channel-4", "Channel-8"])      => 0.00282988
  Set(["Channel-6", "Channel-11", "Channel-8"])                   => 0.00161735
  Set(["Channel-2", "Channel-6", "Channel-11"])                   => 0.0021328
  Set(["Channel-7", "Channel-2", "Channel-4"])                    => 0.00239742
  Set(["Channel-2", "Channel-3", "Channel-11", "Channel-4"])      => 0.00366474
  Set(["Channel-6", "Channel-4", "Channel-8"])                    => 0.00140132
  Set(["Channel-7", "Channel-11", "Channel-4"])                   => 0.00180649
  ⋮                                                               => ⋮
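
To see what these values mean, here is a toy two-touchpoint Shapley computation — a sketch of the general idea, not the package’s implementation. The Shapley value of a touchpoint averages its marginal contribution to the coalition’s worth over all orders in which it can join:

```julia
# Worth v of each coalition in a two-touchpoint game.
v = Dict(Set(String[])   => 0.0,
         Set(["A"])      => 1.0,
         Set(["B"])      => 2.0,
         Set(["A", "B"]) => 4.0)

# Shapley value of A: average marginal contribution over the two
# possible join orders (A first, or A after B).
phi_A = 0.5 * (v[Set(["A"])] - v[Set(String[])]) +
        0.5 * (v[Set(["A", "B"])] - v[Set(["B"])])
# phi_A == 1.5
```

phi_B works out to 2.5, and the two values sum to v({A, B}) = 4 — the efficiency property that makes Shapley values usable as attribution shares.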

The shapley_df is essentially the DataFrame with the results, only here we also have the calculated Shapley value for each of the touchpoints available.

dda_shapley.shapley_df
7×6 DataFrame
Row tid tactic shapley_value Conversions Touchpoint Model
String String Float64 Float64 String String
1 4 Channel-7 0.001168 871.432 Channel-7 Shapley
2 2 Channel-2 0.00199216 1486.33 Channel-2 Shapley
3 6 Channel-3 0.00128906 961.754 Channel-3 Shapley
4 3 Channel-6 0.000582897 434.893 Channel-6 Shapley
5 7 Channel-11 0.00132958 991.984 Channel-11 Shapley
6 8 Channel-4 0.00105388 786.286 Channel-4 Shapley
7 5 Channel-8 0.00132869 991.321 Channel-8 Shapley

Since the results always come in the same format, we can plot the conversion volume as usual.

plot_conversion_volume(dda_shapley.result);

[Conversion volume plot for the Shapley model]

Logistic Attribution

Logistic regression is another common approach to the attribution problem. Given that the outcome of a path — converting or not — is binary (technically there may be multiple conversions, but that’s a separate topic), this approach makes sense. The data format requirements for dda_model are mainly influenced by the desire to have logistic regression as one of the available methods. Currently the regression only uses the paths and counts of touchpoints as features, and uses GLM.jl for fitting. In the future the ability to include interaction effects and exogenous variables will be supported. Furthermore, it is very likely that a transition to Flux.jl will be adopted. For now, all we need to do is specify the model parameter.

dda_logistic = dda_model(path_df, model = "logisticreg", include_heuristics = true);
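
The count-based feature encoding described above can be sketched in a few lines; the names and layout here are illustrative, not the package’s internal representation:

```julia
touchpoints = ["Channel-2", "Channel-6", "Channel-7"]
paths = [["Channel-6"],
         ["Channel-2", "Channel-7", "Channel-6"],
         ["Channel-2", "Channel-2"]]

# One row per path, one column per touchpoint, entries are counts
# of how many times that touchpoint appears in the path.
X = [count(==(tp), p) for p in paths, tp in touchpoints]
```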

You will notice that the type is now LogisticAttributionModel

typeof(dda_logistic)
DataDrivenAttribution.LogisticAttributionModel

And we get back two additional elements - the glm_fit and the attr_weights

fieldnames(typeof(dda_logistic))
(:method, :paths, :result, :touchpoints, :glm_fit, :attr_weights)

The glm_fit is the actual regression model object.

dda_logistic.glm_fit
GLM.GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Distributions.Poisson{Float64}, GLM.LogLink}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
─────────────────────────────────────────────────────────────────
       Coef.  Std. Error        z  Pr(>|z|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────
x1  -6.61763   0.0326831  -202.48    <1e-99   -6.68169   -6.55357
x2  -6.26388   0.016308   -384.10    <1e-99   -6.29585   -6.23192
x3  -6.03034   0.0311547  -193.56    <1e-99   -6.0914    -5.96928
x4  -7.60322   0.0324184  -234.53    <1e-99   -7.66676   -7.53968
x5  -5.47331   0.0345839  -158.26    <1e-99   -5.54109   -5.40552
x6  -7.19213   0.0642111  -112.01    <1e-99   -7.31798   -7.06628
x7  -6.34704   0.0329972  -192.35    <1e-99   -6.41172   -6.28237
─────────────────────────────────────────────────────────────────

The attr_weights are simply a transformation of the fitted coefficients, the magnitudes of which guide the volume of attributed conversions.

dda_logistic.attr_weights
1×7 Matrix{Float64}:
 0.0941608  0.35998  0.154743  0.0651031  0.200241  0.0189372  0.106835
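
Note that the weights printed above sum to 1 (to within rounding), so they can be read as each channel’s share of total conversions — a reading of this output, not documented behavior. A quick check:

```julia
# attr_weights copied from the output above.
w = [0.0941608, 0.35998, 0.154743, 0.0651031, 0.200241, 0.0189372, 0.106835]

isapprox(sum(w), 1.0; atol = 1e-5)   # the weights form shares of the total
```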

And as always, results can be viewed and plotted just the same. Because we set include_heuristics to true, we also get back the Last-Touch and First-Touch conversions.

plot_conversion_volume(dda_logistic.result);

[Conversion volume plot for the logistic model, including the First- and Last-Touch heuristics]

Comparing Results

conv_df = vcat(
  dda_markov.result,
  dda_shapley.result,
  dda_logistic.result
);

plot_conversion_volume(conv_df);

[Combined conversion volume plot across all models]