
Multi-Touch Attribution Models for Marketing Analytics

Multi-Touch Attribution (MTA)

A Python library for multi-touch attribution modeling in marketing analytics. It implements ten attribution models, from simple heuristics to Markov chain, Shapley value, and survival-analysis approaches, to help marketers estimate how much each touchpoint in the customer journey contributes to conversions.

🎯 Features

Attribution Models Implemented

  • First Touch: 100% credit to the first interaction
  • Last Touch: 100% credit to the last interaction before conversion
  • Linear: Equal credit distribution across all touchpoints
  • Position-Based (U-Shaped): Customizable weights for first/last touch with remaining credit distributed to middle touches
  • Time Decay: Higher credit to more recent touchpoints
  • Markov Chain: Probabilistic model using transition matrices
  • Shapley Value: Game-theoretic fair allocation based on marginal contributions
  • Shao's Model: Probabilistic Shapley-equivalent approach
  • Logistic Regression: Machine learning-based ensemble attribution
  • Additive Hazard: Survival analysis-based attribution
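
To make the three simplest rules above concrete, here is a minimal standalone sketch of first-touch, last-touch, and linear credit for a single path. It does not use or reproduce the mta implementation; the function name `heuristic_credit` and its arguments are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def heuristic_credit(path, conversions, model="linear"):
    """Split `conversions` across the channels of one path.

    Hypothetical helper for illustration; not part of the mta API.
    """
    credit = defaultdict(float)
    if model == "first_touch":
        # All credit goes to the first interaction.
        credit[path[0]] += conversions
    elif model == "last_touch":
        # All credit goes to the last interaction before conversion.
        credit[path[-1]] += conversions
    elif model == "linear":
        # Every touchpoint gets an equal share.
        share = conversions / len(path)
        for channel in path:
            credit[channel] += share
    else:
        raise ValueError(f"unknown model: {model}")
    return dict(credit)


path = ["alpha", "beta", "gamma"]
first = heuristic_credit(path, 10, "first_touch")
last = heuristic_credit(path, 10, "last_touch")
linear = heuristic_credit(path, 12, "linear")
```

Summing these per-path results over all paths gives channel-level totals, which is the quantity the heuristic models report.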

📦 Installation

pip install mta

Or install from source:

git clone https://github.com/eeghor/mta.git
cd mta
pip install -e .

🚀 Quick Start

Basic Usage

from mta import MTA

# Initialize with your data
mta = MTA(data="your_data.csv", allow_loops=False, add_timepoints=True)

# Run a single attribution model
mta.linear(share="proportional", normalize=True)
mta.show()

# Chain multiple models
(mta.linear(share="proportional")
    .time_decay(count_direction="right")
    .markov(sim=False)
    .shapley()
    .show())

Using Configuration

from mta import MTA, MTAConfig

# Create custom configuration
config = MTAConfig(
    allow_loops=False,
    add_timepoints=True,
    sep=" > ",
    normalize_by_default=True
)

mta = MTA(data="data.csv", config=config)

Working with DataFrames

import pandas as pd
from mta import MTA

# Load your data
df = pd.read_csv("customer_journeys.csv")

# Initialize MTA with DataFrame
mta = MTA(data=df, allow_loops=False)

# Run attribution models
mta.first_touch().last_touch().linear().show()

📊 Data Format

Your input data should be a CSV file or pandas DataFrame with the following columns:

path,total_conversions,total_null,exposure_times
alpha > beta > gamma,10,5,2023-01-01 10:00:00 > 2023-01-01 11:00:00 > 2023-01-01 12:00:00
beta > gamma,5,3,2023-01-02 09:00:00 > 2023-01-02 10:00:00

Columns:

  • path: Customer journey as channel names separated by > (or custom separator)
  • total_conversions: Number of conversions for this path
  • total_null: Number of non-conversions for this path
  • exposure_times: Timestamps of channel exposures, separated the same way as path (optional; can be auto-generated with add_timepoints=True)
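
As a rough illustration of this layout, the sketch below parses the sample rows with the standard library only. It is not how mta reads its input; `parse_journeys` and the dictionary keys it returns are hypothetical names, and the timestamp format is assumed from the sample above.

```python
import csv
import io
from datetime import datetime

SAMPLE = """path,total_conversions,total_null,exposure_times
alpha > beta > gamma,10,5,2023-01-01 10:00:00 > 2023-01-01 11:00:00 > 2023-01-01 12:00:00
beta > gamma,5,3,2023-01-02 09:00:00 > 2023-01-02 10:00:00
"""


def parse_journeys(text, sep=">"):
    """Turn CSV text in the format above into a list of dicts (hypothetical helper)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Channel names and timestamps share the same separator.
        channels = [c.strip() for c in record["path"].split(sep)]
        times = [
            datetime.strptime(t.strip(), "%Y-%m-%d %H:%M:%S")
            for t in record["exposure_times"].split(sep)
        ]
        if len(channels) != len(times):
            raise ValueError(f"path/timestamp length mismatch: {record['path']}")
        rows.append({
            "channels": channels,
            "conversions": int(record["total_conversions"]),
            "nulls": int(record["total_null"]),
            "times": times,
        })
    return rows


journeys = parse_journeys(SAMPLE)
```

Each path must carry exactly one timestamp per channel, which is why the sketch rejects rows where the two lists differ in length.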

🎨 Advanced Usage

Position-Based Attribution with Custom Weights

# Give 30% to first touch, 30% to last touch, 40% distributed to middle
mta.position_based(first_weight=30, last_weight=30, normalize=True)
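
The arithmetic behind such a split can be sketched as follows. This is a standalone illustration, not the library's code: `position_based_credit` is a hypothetical name, and the handling of one- and two-touch paths (where there are no middle touches) is an assumption.

```python
def position_based_credit(path, conversions, first_weight=30, last_weight=30):
    """U-shaped credit for one path; weights are percentages (hypothetical helper)."""
    if not 0 < first_weight + last_weight <= 100:
        raise ValueError("first_weight + last_weight must be in (0, 100]")

    credit = {}

    def add(channel, amount):
        credit[channel] = credit.get(channel, 0.0) + amount

    if len(path) == 1:
        # Assumption: a single touch takes all the credit.
        add(path[0], conversions)
    elif len(path) == 2:
        # Assumption: with no middle touches, rescale the two weights to 100%.
        total = first_weight + last_weight
        add(path[0], conversions * first_weight / total)
        add(path[1], conversions * last_weight / total)
    else:
        middle = path[1:-1]
        middle_share = conversions * (100 - first_weight - last_weight) / 100
        add(path[0], conversions * first_weight / 100)
        add(path[-1], conversions * last_weight / 100)
        for channel in middle:
            add(channel, middle_share / len(middle))
    return credit


credit = position_based_credit(["a", "b", "c", "d"], 100)
two_touch = position_based_credit(["a", "b"], 100)
```

With the 30/30 weights, a four-touch path carrying 100 conversions gives 30 to each end and 20 to each of the two middle touches.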

Time Decay with Direction Control

# Count from left (earliest gets lowest credit)
mta.time_decay(count_direction="left")

# Count from right (latest gets highest credit - more common)
mta.time_decay(count_direction="right")
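
The direction only matters when a channel appears more than once in a path, since it decides which occurrence sets the channel's rank. The sketch below shows one plausible reading of that idea using simple rank-based weights. It is an assumption for illustration, not the library's actual weighting scheme, and `time_decay_credit` is a hypothetical name.

```python
def time_decay_credit(path, conversions, count_direction="right"):
    """Rank-based time decay for one path (hypothetical helper).

    "left":  a channel is ranked by its first appearance, scanning from the start.
    "right": a channel is ranked by its last appearance, scanning from the end.
    In both cases a later rank earns more credit.
    """
    if count_direction not in ("left", "right"):
        raise ValueError("count_direction must be 'left' or 'right'")

    scan = path if count_direction == "left" else list(reversed(path))
    ordered = list(dict.fromkeys(scan))  # unique channels in scan order
    if count_direction == "right":
        ordered.reverse()  # restore earliest-to-latest order

    # `ordered` now runs from lowest credit to highest credit.
    ranks = {channel: i + 1 for i, channel in enumerate(ordered)}
    total = sum(ranks.values())
    return {channel: conversions * rank / total for channel, rank in ranks.items()}


path = ["a", "b", "c", "b", "a", "c"]
from_left = time_decay_credit(path, 6, count_direction="left")
from_right = time_decay_credit(path, 6, count_direction="right")
```

For this path, counting from the left ranks the channels a, b, c, while counting from the right ranks them b, a, c, so channel c receives the most credit either way but a and b swap places.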

Markov Chain Attribution

# Analytical calculation (faster)
mta.markov(sim=False, normalize=True)

# Simulation-based (more flexible, handles complex scenarios)
mta.markov(sim=True, normalize=True)
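
Markov chain attribution is commonly based on the removal effect: how much the overall conversion probability drops when a channel is taken out of the chain. The sketch below is a standalone, pure-Python illustration of that idea using a simple fixed-point iteration. It is not the mta implementation, and all names in it are hypothetical. It assumes every path has at least one observation and that at least one path converts.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(journeys):
    """Estimate transition probabilities from (channels, conversions, nulls) tuples."""
    counts = defaultdict(lambda: defaultdict(float))
    for channels, conversions, nulls in journeys:
        visits = conversions + nulls
        states = [START] + list(channels)
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += visits
        counts[channels[-1]][CONV] += conversions
        counts[channels[-1]][NULL] += nulls
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def conversion_probability(probs, removed=None, tol=1e-12, max_iter=10_000):
    """Probability of reaching CONV from START; a removed channel never converts."""
    p = {state: 0.0 for state in probs}
    p[CONV], p[NULL] = 1.0, 0.0
    for _ in range(max_iter):
        delta = 0.0
        for state, dsts in probs.items():
            if state == removed:
                continue  # stays at 0.0, as if it led straight to NULL
            new = sum(prob * p[dst] for dst, prob in dsts.items())
            delta = max(delta, abs(new - p[state]))
            p[state] = new
        if delta < tol:
            break
    return p[START]


def markov_attribution(journeys):
    """Share total conversions in proportion to each channel's removal effect."""
    probs = transition_probs(journeys)
    base = conversion_probability(probs)
    channels = [state for state in probs if state != START]
    effects = {
        c: 1.0 - conversion_probability(probs, removed=c) / base for c in channels
    }
    total_effect = sum(effects.values())
    total_conversions = sum(conv for _, conv, _ in journeys)
    return {c: total_conversions * e / total_effect for c, e in effects.items()}


journeys = [
    (["alpha", "beta", "gamma"], 10, 5),
    (["beta", "gamma"], 5, 3),
]
attribution = markov_attribution(journeys)
```

In this data every converting path passes through beta and gamma, so removing either one drops the conversion probability to zero, while removing alpha still leaves the beta > gamma route open. Alpha therefore receives the smallest share.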

Shapley Value Attribution

# With custom coalition size
mta.shapley(max_coalition_size=3, normalize=True)
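
For intuition, the sketch below computes exact Shapley values by enumerating every coalition. It is a standalone illustration, not the mta implementation, and it does not model `max_coalition_size`. The characteristic function used here (conversions from journeys that use only channels in the coalition) is one common choice and is an assumption, as are the function names.

```python
from itertools import combinations
from math import factorial


def coalition_value(coalition, journeys):
    """Conversions from journeys whose channels all belong to `coalition`."""
    return sum(conv for channels, conv in journeys if set(channels) <= coalition)


def shapley_values(journeys):
    """Exact Shapley value per channel from (channels, conversions) pairs."""
    players = sorted({c for channels, _ in journeys for c in channels})
    n = len(players)
    values = {}
    for player in players:
        others = [p for p in players if p != player]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = frozenset(subset)
                gain = (
                    coalition_value(without | {player}, journeys)
                    - coalition_value(without, journeys)
                )
                phi += weight * gain
        values[player] = phi
    return values


journeys = [
    (["alpha", "beta", "gamma"], 10),
    (["beta", "gamma"], 5),
]
values = shapley_values(journeys)
```

The values always sum to the total number of conversions. The nested loops visit every subset of the other channels, so the cost roughly doubles with each added channel.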

Logistic Regression Ensemble

# Custom sampling and iteration parameters
mta.logistic_regression(
    test_size=0.25,
    sample_rows=0.5,
    sample_features=0.5,
    n_iterations=1000,
    normalize=True
)

Export Results

# Compare all models
results_df = mta.compare_models()

# Export to various formats
mta.export_results("attribution_results.csv", format="csv")
mta.export_results("attribution_results.json", format="json")
mta.export_results("attribution_results.xlsx", format="excel")

📈 Example: Complete Analysis Pipeline

from mta import MTA

# Load data
mta = MTA(
    data="customer_journeys.csv",
    allow_loops=False,  # Remove consecutive duplicate channels
    add_timepoints=True  # Auto-generate timestamps if missing
)

# Run all heuristic models
(mta
    .first_touch()
    .last_touch()
    .linear(share="proportional")
    .position_based(first_weight=40, last_weight=40)
    .time_decay(count_direction="right"))

# Run algorithmic models
(mta
    .markov(sim=False)
    .shapley(max_coalition_size=2)
    .shao()
    .logistic_regression(n_iterations=2000)
    .additive_hazard(epochs=20))

# Display and export results
results = mta.compare_models()
print(results)
mta.export_results("full_attribution_analysis.csv")

# Access specific model results
print(f"Markov Attribution: {mta.attribution['markov']}")
print(f"Shapley Attribution: {mta.attribution['shapley']}")

🔬 Model Comparison

Model               | Type             | Strengths                | Use Case
First/Last Touch    | Heuristic        | Simple, fast             | Quick baseline
Linear              | Heuristic        | Fair, interpretable      | Equal value assumption
Position-Based      | Heuristic        | Balances first/last      | Awareness + conversion focus
Time Decay          | Heuristic        | Recency-weighted         | When recent matters more
Markov Chain        | Algorithmic      | Considers path structure | Sequential dependency
Shapley Value       | Algorithmic      | Game-theoretic fairness  | Complex interactions
Logistic Regression | Machine Learning | Data-driven              | Large datasets
Additive Hazard     | Statistical      | Time-to-event modeling   | Survival analysis fans

🛠️ Requirements

  • Python >= 3.8
  • pandas >= 1.3.0
  • numpy >= 1.20.0
  • scikit-learn >= 0.24.0
  • arrow >= 1.0.0

📝 Citation

If you use this library in your research, please cite:

@software{mta2024,
  author = {Igor Korostil},
  title = {MTA: Multi-Touch Attribution Library},
  year = {2024},
  url = {https://github.com/eeghor/mta}
}

📚 References

This library implements models and techniques from the following research papers:

  1. Nisar, T. M., & Yeung, M. (2015)
    Purchase Conversions and Attribution Modeling in Online Advertising: An Empirical Investigation

  2. Shao, X., & Li, L. (2011)
    Data-driven Multi-touch Attribution Models
    Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

  3. Dalessandro, B., Perlich, C., Stitelman, O., & Provost, F. (2012)
    Causally Motivated Attribution for Online Advertising
    Proceedings of the Sixth International Workshop on Data Mining for Online Advertising

  4. Cano-Berlanga, S., Giménez-Gómez, J. M., & Vilella, C. (2017)
    Attribution Models and the Cooperative Game Theory
    Expert Systems with Applications, 87, 277-286

  5. Ren, K., Fang, Y., Zhang, W., Liu, S., Li, J., Zhang, Y., Yu, Y., & Wang, J. (2018)
    Learning Multi-touch Conversion Attribution with Dual-attention Mechanisms for Online Advertising
    Proceedings of the 27th ACM International Conference on Information and Knowledge Management

  6. Zhang, Y., Wei, Y., & Ren, J. (2014)
    Multi-Touch Attribution in Online Advertising with Survival Theory
    2014 IEEE International Conference on Data Mining

  7. Geyik, S. C., Saxena, A., & Dasdan, A. (2014)
    Multi-Touch Attribution Based Budget Allocation in Online Advertising
    Proceedings of the 8th International Workshop on Data Mining for Online Advertising

Model-to-Paper Mapping

  • Linear & Position-Based: Baseline models referenced across multiple papers
  • Time Decay: Nisar & Yeung (2015), Zhang et al. (2014)
  • Markov Chain: Shao & Li (2011), Dalessandro et al. (2012)
  • Shapley Value: Cano-Berlanga et al. (2017)
  • Logistic Regression: Dalessandro et al. (2012), Ren et al. (2018)
  • Additive Hazard: Zhang et al. (2014)

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Inspired by various academic papers on marketing attribution
  • Built with pandas, numpy, and scikit-learn
  • Special thanks to the open-source community

📧 Contact

Igor Korostil - eeghor@gmail.com

Project Link: https://github.com/eeghor/mta

🐛 Known Issues

  • Shapley value computation can be slow for large numbers of channels, because the number of possible coalitions grows exponentially with the channel count
  • The additive hazard model works best with evenly spaced time points
