```julia
using DataDrivenAttribution
using DataFrames
using Arrow
```
Ben Denis Shaffer
June 18, 2023
In this post we will take a quick look at the three attribution methods currently available in the DataDrivenAttribution package:

- Markov
- Shapley
- Logistic Regression

We won't go into all the details of either the implementation or theory behind the models, but rather give a quick feel for how one can easily run and compare the different attribution results.
For this demo we'll use the same sample dataset used in the getting started post.
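Assuming the data was saved as an Arrow file (the file name `sample_paths.arrow` below is a placeholder; use whatever the getting-started post produced), a minimal way to load it is:

```julia
# Read the Arrow file into a DataFrame and preview the first 15 paths.
# "sample_paths.arrow" is a hypothetical file name.
df = DataFrame(Arrow.Table("sample_paths.arrow"))
first(df, 15)
```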
| Row | path | conv |
|---|---|---|
| | Array… | String |
| 1 | ["Channel-6"] | -1 |
| 2 | ["Channel-6"] | -1 |
| 3 | ["Channel-2"] | -1 |
| 4 | ["Channel-2", "Channel-7", "Channel-6"] | -1 |
| 5 | ["Channel-7"] | -1 |
| 6 | ["Channel-3"] | -1 |
| 7 | ["Channel-6"] | -1 |
| 8 | ["Channel-6"] | -1 |
| 9 | ["Channel-6"] | -1 |
| 10 | ["Channel-3"] | -1 |
| 11 | ["Channel-2", "Channel-3"] | -1 |
| 12 | ["Channel-11", "Channel-2", "Channel-7"] | -1 |
| 13 | ["Channel-2", "Channel-8"] | -1 |
| 14 | ["Channel-7"] | -1 |
| 15 | ["Channel-8", "Channel-2"] | -1 |
As mentioned, there are currently three attribution methods implemented. These are the three most commonly used approaches in the industry. Let's start with Markov-chain attribution, the DDA default.
In the getting started post we saw the default model fit to the simulated dataset, where we looked at the results from the 1st-order Markov-chain model. For this example we'll show how to fit a set of Markov models of various orders simultaneously. All we need to do is supply a vector of integers to the markov_order parameter of the dda_model function.
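Assuming `dda_model` takes the path DataFrame as its first argument (only the `markov_order` keyword is described here), the call might look like this:

```julia
# Fit 1st- through 4th-order Markov models in a single call.
markov_model = dda_model(df, markov_order = [1, 2, 3, 4])
```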
The type of the object returned is identical to the one returned when only one model is fit. The elements of the model will also be the same; however, the transition_matrices element will be a vector containing more than one transition matrix.
```
(:method, :paths, :result, :touchpoints, :markov_order, :transition_matrices)
```
First, let's take a quick look at the results to verify that models of all orders were run.
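Assuming field access on the returned object (its field names are listed above), the combined results live in `result`:

```julia
# One row per touchpoint per fitted model order.
markov_model.result
```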
| Row | tid | Touchpoint | Conversions | Model |
|---|---|---|---|---|
| | String | String? | Float64 | String |
| 1 | 2 | Channel-2 | 2823.73 | Markov_1 |
| 2 | 6 | Channel-3 | 763.823 | Markov_1 |
| 3 | 3 | Channel-6 | 717.809 | Markov_1 |
| 4 | 5 | Channel-8 | 706.433 | Markov_1 |
| 5 | 4 | Channel-7 | 685.121 | Markov_1 |
| 6 | 7 | Channel-11 | 645.945 | Markov_1 |
| 7 | 8 | Channel-4 | 181.143 | Markov_1 |
| 8 | 2 | Channel-2 | 2224.75 | Markov_2 |
| 9 | 6 | Channel-3 | 1152.62 | Markov_2 |
| 10 | 7 | Channel-11 | 1092.88 | Markov_2 |
| 11 | 5 | Channel-8 | 773.078 | Markov_2 |
| 12 | 4 | Channel-7 | 724.26 | Markov_2 |
| 13 | 3 | Channel-6 | 419.697 | Markov_2 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 17 | 6 | Channel-3 | 1154.11 | Markov_3 |
| 18 | 5 | Channel-8 | 767.965 | Markov_3 |
| 19 | 4 | Channel-7 | 673.604 | Markov_3 |
| 20 | 3 | Channel-6 | 360.701 | Markov_3 |
| 21 | 8 | Channel-4 | 93.5759 | Markov_3 |
| 22 | 2 | Channel-2 | 1907.36 | Markov_4 |
| 23 | 6 | Channel-3 | 1335.06 | Markov_4 |
| 24 | 7 | Channel-11 | 1284.02 | Markov_4 |
| 25 | 5 | Channel-8 | 785.387 | Markov_4 |
| 26 | 4 | Channel-7 | 718.758 | Markov_4 |
| 27 | 3 | Channel-6 | 385.524 | Markov_4 |
| 28 | 8 | Channel-4 | 107.902 | Markov_4 |
As you can see we have four different models here, and as before, we can quickly plot the conversion volumes.

Now let's take a quick look at the actual transition matrices returned. The 1st-order transition matrix is a named matrix with a column and a row for each of the touchpoints in the data. More precisely, the touchpoint id (tid) is used for the names.
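The first matrix can be pulled out of the `transition_matrices` vector (field name from the tuple shown earlier):

```julia
# 1st-order transition matrix, indexed by touchpoint id (tid).
markov_model.transition_matrices[1]
```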
```
10×10 Named Matrix{Float64}
trans_from ╲ trans_to │ 4 1 … 8 5
──────────────────────┼──────────────────────────────────────────────────────
4 │ 0.0 0.000525465 … 0.00201441 0.117988
1 │ 0.0 1.0 0.0 0.0
2 │ 0.0273238 0.000855461 0.00286418 0.0293522
6 │ 0.0207092 0.000547979 0.00177228 0.0222679
-1 │ 0.0 0.0 0.0 0.0
3 │ 0.0160445 0.000341235 0.01183 0.0121924
0 │ 0.113818 0.0 0.0500289 0.0967815
7 │ 0.0208456 0.000518272 0.00137494 0.0225319
8 │ 0.0141029 0.000420514 0.0 0.00901868
5 │ 0.166488 0.000602591 … 0.00279047 0.0
```
For higher-order models the transition matrix grows very quickly. The matrix will always be square, but only those unions of touchpoints that actually appear in the data will be present. The first-order matrix was 10x10, while the 2nd-order matrix here is 45x45.
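The 2nd-order matrix comes from the same vector:

```julia
# 2nd-order transition matrix; states are unions of touchpoint ids (e.g. "2U6").
markov_model.transition_matrices[2]
```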
```
45×45 Named Matrix{Float64}
trans_from ╲ trans_to │ 2U6 2U7 … 1 -1
──────────────────────┼──────────────────────────────────────────────────────
2U6 │ 0.0 0.0 … 0.000555274 0.667031
2U7 │ 0.0 0.0 0.000513152 0.514227
6U2 │ 0.107856 0.109649 0.000881767 0.696865
7U2 │ 0.107229 0.110634 0.000758939 0.697328
4U5 │ 0.0 0.0 0.000582334 0.673509
2U5 │ 0.0 0.0 0.000669451 0.672091
5U4 │ 0.0 0.0 0.000432816 0.7657
4U2 │ 0.105853 0.109512 0.000959578 0.699007
5U2 │ 0.10683 0.110092 0.000795178 0.69759
2U4 │ 0.0 0.0 0.000496313 0.76651
2U3 │ 0.0 0.0 0.000410846 0.919008
⋮ ⋮ ⋮ ⋱ ⋮ ⋮
8U6 │ 0.0 0.0 0.000971345 0.669743
4U8 │ 0.0 0.0 0.00193798 0.794574
8U4 │ 0.0 0.0 0.000317209 0.768279
6U8 │ 0.0 0.0 0.00142925 0.800381
8U5 │ 0.0 0.0 0.000248016 0.675595
5U8 │ 0.0 0.0 0.000645578 0.793415
8U7 │ 0.0 0.0 0.0 0.518906
7U8 │ 0.0 0.0 0.0 0.805397
0 │ 0.155279 0.157635 0.0 0.0
1 │ 0.0 0.0 1.0 0.0
-1 │ 0.0 0.0 … 0.0 1.0
```
The 4th-order matrix is 1327x1327, so we are already getting pretty large. In practice anything above 4 might not make a whole lot of sense, but it's still possible to model with dda_model.
The Shapley algorithm is another popular approach to data-driven attribution modeling. The method works best when the number of distinct touchpoints contained in the data is not very large. The reason is simply the combinatorial nature of the method and the fact that factorials grow very quickly. Still, for smaller problems this may be a reasonable approach.
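To fit it we switch the method via the `model` parameter; the `:shapley` value below is an assumption, since this keyword is only described for logistic regression later in the post:

```julia
# Fit the Shapley attribution model (:shapley keyword value is an assumption).
shapley_model = dda_model(df, model = :shapley)
```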
You will see that the type will now be a ShapleyAttributionModel, and while the same attributes will be present in the object, we'll also get back a few other elements specific to the Shapley model.
```
(:method, :paths, :result, :touchpoints, :coalitions, :shapley_df, :values)
```
We have the vector of all of the possible coalitions for which marginal contributions are computed.
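Assuming field access as before:

```julia
# All coalitions for which marginal contributions are computed.
shapley_model.coalitions
```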
```
127-element Vector{Vector{String}}:
["Channel-7"]
["Channel-2"]
["Channel-3"]
["Channel-6"]
["Channel-11"]
["Channel-4"]
["Channel-8"]
["Channel-7", "Channel-2"]
["Channel-7", "Channel-3"]
["Channel-7", "Channel-6"]
["Channel-7", "Channel-11"]
["Channel-7", "Channel-4"]
["Channel-7", "Channel-8"]
⋮
["Channel-2", "Channel-3", "Channel-6", "Channel-4", "Channel-8"]
["Channel-2", "Channel-3", "Channel-11", "Channel-4", "Channel-8"]
["Channel-2", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
["Channel-3", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
["Channel-7", "Channel-2", "Channel-3", "Channel-6", "Channel-11", "Channel-4"]
["Channel-7", "Channel-2", "Channel-3", "Channel-6", "Channel-11", "Channel-8"]
["Channel-7", "Channel-2", "Channel-3", "Channel-6", "Channel-4", "Channel-8"]
["Channel-7", "Channel-2", "Channel-3", "Channel-11", "Channel-4", "Channel-8"]
["Channel-7", "Channel-2", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
["Channel-7", "Channel-3", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
["Channel-2", "Channel-3", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
["Channel-7", "Channel-2", "Channel-3", "Channel-6", "Channel-11", "Channel-4", "Channel-8"]
```
And we also have all of the Shapley values for each of the coalitions in the values dictionary.
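Again assuming field access:

```julia
# Coalition => Shapley value lookup.
shapley_model.values
```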
```
Dict{Any, Any} with 127 entries:
Set(["Channel-7", "Channel-2", "Channel-3", "Channel-4"]) => 0.00348792
Set(["Channel-4"]) => 0.000500764
Set(["Channel-7", "Channel-6", "Channel-8"]) => 0.00152873
Set(["Channel-3", "Channel-6", "Channel-11", "Channel-4", "Cha… => 0.00338653
Set(["Channel-7", "Channel-2", "Channel-6", "Channel-11"]) => 0.00302389
Set(["Channel-7", "Channel-3", "Channel-11", "Channel-4"]) => 0.00278733
Set(["Channel-2"]) => 0.00124426
Set(["Channel-7", "Channel-2", "Channel-6", "Channel-8"]) => 0.00301627
Set(["Channel-3", "Channel-6", "Channel-11", "Channel-8"]) => 0.00247106
Set(["Channel-6", "Channel-11"]) => 0.000939583
Set(["Channel-3", "Channel-6", "Channel-8"]) => 0.00160128
Set(["Channel-7", "Channel-8"]) => 0.00133767
Set(["Channel-2", "Channel-11"]) => 0.00192556
Set(["Channel-2", "Channel-6", "Channel-4", "Channel-8"]) => 0.00295792
Set(["Channel-2", "Channel-3", "Channel-8"]) => 0.00269993
Set(["Channel-2", "Channel-11", "Channel-4"]) => 0.00254572
Set(["Channel-2", "Channel-11", "Channel-4", "Channel-8"]) => 0.00371187
Set(["Channel-7", "Channel-3", "Channel-11", "Channel-4", "Cha… => 0.00418024
Set(["Channel-7", "Channel-11", "Channel-4", "Channel-8"]) => 0.00282988
Set(["Channel-6", "Channel-11", "Channel-8"]) => 0.00161735
Set(["Channel-2", "Channel-6", "Channel-11"]) => 0.0021328
Set(["Channel-7", "Channel-2", "Channel-4"]) => 0.00239742
Set(["Channel-2", "Channel-3", "Channel-11", "Channel-4"]) => 0.00366474
Set(["Channel-6", "Channel-4", "Channel-8"]) => 0.00140132
Set(["Channel-7", "Channel-11", "Channel-4"]) => 0.00180649
⋮ => ⋮
```
The shapley_df is essentially the DataFrame with the results, only we also have the calculated Shapley value for each of the touchpoints available here too.
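It can be pulled out the same way:

```julia
# Per-touchpoint results, including the raw Shapley values.
shapley_model.shapley_df
```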
| Row | tid | tactic | shapley_value | Conversions | Touchpoint | Model |
|---|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | String | String |
| 1 | 4 | Channel-7 | 0.001168 | 871.432 | Channel-7 | Shapley |
| 2 | 2 | Channel-2 | 0.00199216 | 1486.33 | Channel-2 | Shapley |
| 3 | 6 | Channel-3 | 0.00128906 | 961.754 | Channel-3 | Shapley |
| 4 | 3 | Channel-6 | 0.000582897 | 434.893 | Channel-6 | Shapley |
| 5 | 7 | Channel-11 | 0.00132958 | 991.984 | Channel-11 | Shapley |
| 6 | 8 | Channel-4 | 0.00105388 | 786.286 | Channel-4 | Shapley |
| 7 | 5 | Channel-8 | 0.00132869 | 991.321 | Channel-8 | Shapley |
Since the results always come in the same format, we can plot the conversion volume as usual.

Logistic regression is another common approach to the attribution problem. Given that the outcome of a path is binary, converting or not (technically there may be multiple conversions, but that's a separate topic), this approach makes sense. The data format requirements for dda_model are mainly influenced by the desire to have logistic regression as one of the available methods. Currently the regression only uses the counts of touchpoints along each path as features, and uses the glm function from GLM.jl for fitting. In the future, the ability to include interaction effects and exogenous variables will be supported. Furthermore, it is very likely that a transition to the use of Flux.jl will be adopted. For now, all we need to do is specify the model parameter.
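A sketch of the call, assuming the `model` keyword accepts a `:logistic` value (hypothetical; only the parameter name is stated above) and setting the `include_heuristics` flag referenced at the end of this post:

```julia
# Fit the logistic-regression attribution model and also compute the
# first/last-touch heuristics (:logistic keyword value is an assumption).
logistic_model = dda_model(df, model = :logistic, include_heuristics = true)
```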
You will notice that the type is now LogisticAttributionModel, and we get back two additional elements: the glm_fit and the attr_weights.
The glm_fit is the actual regression model object.
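It can be inspected directly:

```julia
# The underlying GLM.jl model object.
logistic_model.glm_fit
```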
```
GLM.GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Distributions.Poisson{Float64}, GLM.LogLink}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
─────────────────────────────────────────────────────────────────
        Coef.  Std. Error        z  Pr(>|z|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────
x1   -6.61763   0.0326831  -202.48    <1e-99   -6.68169   -6.55357
x2   -6.26388   0.016308   -384.10    <1e-99   -6.29585   -6.23192
x3   -6.03034   0.0311547  -193.56    <1e-99   -6.0914    -5.96928
x4   -7.60322   0.0324184  -234.53    <1e-99   -7.66676   -7.53968
x5   -5.47331   0.0345839  -158.26    <1e-99   -5.54109   -5.40552
x6   -7.19213   0.0642111  -112.01    <1e-99   -7.31798   -7.06628
x7   -6.34704   0.0329972  -192.35    <1e-99   -6.41172   -6.28237
─────────────────────────────────────────────────────────────────
```
The attr_weights are simply a transformation of the fitted coefficients, the magnitudes of which guide the volume of attributed conversions.
```
1×7 Matrix{Float64}:
0.0941608 0.35998 0.154743 0.0651031 0.200241 0.0189372 0.106835
```
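One observation from the printed output: the weights sum to approximately one, so they can be read as shares of the attributed conversion volume.

```julia
# Sanity check: the printed weights sum to ≈ 1.0 (an observation from the
# output above, not documented behavior).
sum(logistic_model.attr_weights)
```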
And as always, the results can be viewed and plotted just the same. Because we set include_heuristics to true, we get back the Last-Touch and First-Touch conversions as well.

