Bayesian Marketing Mix Modeling framework with causal calibration, pressure testing, and an adaptive measurement loop
MMM Framework
A modular Marketing Mix Model framework built on PyMC-Marketing with full Bayesian uncertainty quantification, async model fitting, and interactive visualization.
Overview
This framework provides a production-ready implementation for marketing mix modeling that emphasizes methodological rigor over specification shopping. It handles variable-dimension MFF (Master Flat File) data, multiplicative specifications with Hill/logistic saturation, hierarchical panel structures, and offers fast frequentist alternatives for rapid iteration.
Why This Framework?
Traditional MMM practices often involve iterating on specifications until results "look right": adjusting lags, decay rates, and controls until coefficients achieve the desired signs and magnitudes. This approach inflates false positive rates, biases coefficients, and produces confidence intervals that do not reflect the actual uncertainty.
This framework is designed around different principles:
- Pre-specified analyses reduce researcher degrees of freedom
- Bayesian inference provides genuine uncertainty quantification through posterior distributions
- Hierarchical modeling enables partial pooling across geographies and products
- Experimental validation lets model predictions be tested against holdout results
Features
Core Modeling
- Bayesian MMM Engine: Full PyMC implementation with proper uncertainty quantification
- Flexible Saturation Functions: Logistic and Hill saturation with interpretable parameterization
- Geometric Adstock: Configurable carryover effects with multiple decay structures
- Hierarchical Effects: Partial pooling across geographies, products, or other dimensions
- Trend Modeling: Linear, piecewise, B-spline, and Gaussian Process trend options
- Fourier Seasonality: Configurable seasonal harmonics
Advanced Capabilities (mmm_extensions)
- Nested/Mediated Models: Causal pathways through intermediate outcomes (Media → Awareness → Sales)
- Multivariate Outcomes: Joint modeling of multiple KPIs with correlated errors
- Cross-Product Effects: Cannibalization, halo effects, and spillovers between products
- Partial Observation: Handle sparse mediator data (e.g., monthly surveys in a weekly model)
- Effect Decomposition: Separate direct vs. indirect effects with uncertainty
- Flexible Priors: Constrained effects (positive/negative) with configurable priors
Inference & Analysis
- Counterfactual Contributions: Proper contribution analysis with uncertainty bands
- Scenario Planning: Budget reallocation simulations with posterior predictive checks
- Prior/Posterior Comparison: Visualize how data updates beliefs
- Component Decomposition: Break down predictions into trend, seasonality, media, controls
Infrastructure
- FastAPI Backend: RESTful API with OpenAPI documentation
- Async Job Processing: Redis + ARQ for non-blocking model fitting
- Streamlit Frontend: Interactive dashboards for configuration, fitting, and analysis
- LangGraph Integration: AI-assisted model interpretation with multiple LLM providers
Architecture
┌──────────────────────────────────────────────────────────────────────┐
│                          Streamlit Frontend                          │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│  │   Data   │ │  Config  │ │  Model   │ │ Results  │ │   Chat   │    │
│  │  Upload  │ │ Builder  │ │ Fitting  │ │  Viewer  │ │Interface │    │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘    │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │ HTTP
┌───────────────────────────────────▼──────────────────────────────────┐
│                           FastAPI Backend                            │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────────────┐  │
│  │   /data/*    │  │  /configs/*  │  │         /models/*          │  │
│  │ Upload/List  │  │     CRUD     │  │     Fit/Status/Results     │  │
│  └──────────────┘  └──────────────┘  └────────────────────────────┘  │
└───────────────────────────────────┬──────────────────────────────────┘
                                    │
                 ┌──────────────────┼──────────────────┐
                 │                  │                  │
                 ▼                  ▼                  ▼
           ┌──────────┐       ┌──────────┐     ┌──────────────┐
           │  Redis   │──────►│   ARQ    │────►│  PyMC Model  │
           │  Queue   │       │  Worker  │     │   Fitting    │
           └──────────┘       └──────────┘     └──────────────┘
Installation
Prerequisites
- Python 3.12+
- Redis server
- uv (recommended) or pip
Quick Install
# Clone the repository
git clone https://github.com/redam94/mmm-framework.git
cd mmm-framework
# Install with uv (recommended)
uv sync
# Or with pip
pip install -e .
# Install app dependencies for Streamlit frontend
uv sync --group app
Development Install
# Install all dependencies including dev tools
uv sync --group dev --group app
Quick Start
1. Start Redis
redis-server
2. Start the API Server
cd api
uvicorn main:app --reload --host 0.0.0.0 --port 8000
3. Start the ARQ Worker
cd api
arq worker.WorkerSettings
4. Launch the Streamlit App
cd app
streamlit run Home.py
5. Access the Application
- Streamlit UI: http://localhost:8501
- API Documentation: http://localhost:8000/docs
- API Health Check: http://localhost:8000/health
Usage
Python API
from mmm_framework import (
MFFLoader,
ModelConfigBuilder,
MediaChannelConfigBuilder,
BayesianMMM,
)
# Load data in MFF format (mff_config is an MFF loader configuration defined beforehand)
loader = MFFLoader(config=mff_config)
panel_data = loader.load("data.csv")
# Build model configuration
config = (
ModelConfigBuilder()
.with_kpi("sales", log_transform=True)
.with_media_channel(
MediaChannelConfigBuilder()
.with_name("tv")
.with_adstock(alpha_prior=(1, 3), l_max=8)
.with_saturation(lam_prior=(1, 2))
.build()
)
.with_seasonality(yearly=True, n_fourier=2)
.with_trend(trend_type="linear")
.build()
)
# Fit the model
model = BayesianMMM(
X_media=panel_data.media,
y=panel_data.kpi,
channel_names=panel_data.channel_names,
config=config,
)
results = model.fit(
draws=2000,
tune=1000,
chains=4,
nuts_sampler="numpyro",  # JAX-based NumPyro NUTS; typically faster than PyMC's default sampler
)
# Get contributions with uncertainty
contributions = model.compute_contributions()
print(contributions.mean_contributions)
print(contributions.hdi_contributions)  # 94% highest density intervals (HDI)
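The adstock and saturation settings above (a decay alpha with maximum lag l_max, and a saturation rate lam) describe transforms applied to each media series before it enters the regression. As a hedged illustration, the sketch below implements the textbook geometric adstock and the logistic saturation curve as we understand them from PyMC-Marketing's geometric_adstock and logistic_saturation transforms, in plain Python. The function names are hypothetical and this is not claimed to be mmm_framework's internal code.

```python
import math


def geometric_adstock(x, alpha, l_max, normalize=False):
    """Carryover: y_t = sum over lag in [0, l_max) of alpha**lag * x[t - lag]."""
    weights = [alpha**lag for lag in range(l_max)]
    if normalize:
        # Each unit of spend then contributes a total weight of one across lags
        total = sum(weights)
        weights = [w / total for w in weights]
    out = []
    for t in range(len(x)):
        acc = 0.0
        for lag, w in enumerate(weights):
            if t - lag < 0:
                break  # no spend before the start of the series
            acc += w * x[t - lag]
        out.append(acc)
    return out


def logistic_saturation(x, lam):
    """Diminishing returns: (1 - exp(-lam*x)) / (1 + exp(-lam*x)), in [0, 1) for x >= 0."""
    return (1.0 - math.exp(-lam * x)) / (1.0 + math.exp(-lam * x))


spend = [100.0, 0.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(spend, alpha=0.5, l_max=3)
print(adstocked)  # [100.0, 50.0, 25.0, 0.0, 50.0]
# Scale spend before saturating so lam acts on a unitless input
transformed = [logistic_saturation(a / 100.0, lam=2.0) for a in adstocked]
print([round(v, 3) for v in transformed])
```

Note that the week-3 value is zero because l_max=3 only keeps lags 0, 1 and 2; a larger l_max lets the week-0 burst keep decaying instead of being cut off.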
Fluent Configuration API
The framework provides fluent builders for all configuration objects:
from mmm_framework import (
ModelConfigBuilder,
MediaChannelConfigBuilder,
AdstockConfigBuilder,
SaturationConfigBuilder,
HierarchicalConfigBuilder,
)
# Build a hierarchical model configuration
config = (
ModelConfigBuilder()
.with_kpi("transactions")
.with_hierarchical(
HierarchicalConfigBuilder()
.with_geo_dimension("dma")
.with_partial_pooling(True)
.build()
)
.with_media_channel(
MediaChannelConfigBuilder()
.with_name("digital")
.with_adstock(
AdstockConfigBuilder()
.with_type("geometric")
.with_alpha_prior("Beta", alpha=1, beta=3)
.with_l_max(4)
.build()
)
.with_saturation(
SaturationConfigBuilder()
.with_type("logistic")
.with_lam_prior("Gamma", alpha=2, beta=1)
.build()
)
.build()
)
.build()
)
Extended Models
For complex scenarios with mediated effects or multiple outcomes:
from mmm_framework.mmm_extensions import (
CombinedModelConfigBuilder,
awareness_mediator,
cannibalization_effect,
CombinedMMM,
)
# Build a model with awareness mediation and product cannibalization
config = (
CombinedModelConfigBuilder()
.with_mediator(awareness_mediator(decay=0.9))
.with_outcomes("single_pack", "multipack")
.with_cross_effect(
cannibalization_effect("multipack", "single_pack", promotion_column="multi_promo")
)
.build()
)
model = CombinedMMM(
X_media=X_media,
outcome_data={"single_pack": y1, "multipack": y2},
channel_names=channels,
config=config,
mediator_data={"awareness": survey_data},
promotion_data={"multi_promo": promo_flags},
)
MMM Extensions Module
The mmm_extensions subpackage provides advanced modeling capabilities for scenarios beyond standard single-outcome MMM. It supports nested/mediated causal pathways, multivariate outcomes with cross-effects, and combined models that handle both.
Module Architecture
mmm_framework/mmm_extensions/
├── config.py       # Dataclasses for all configuration objects
├── builders.py     # Fluent builders and factory functions
├── components.py   # PyMC/PyTensor building blocks (lazy-loaded)
└── models.py       # NestedMMM, MultivariateMMM, CombinedMMM classes
Nested/Mediated Models
Nested models capture causal pathways where media affects intermediate outcomes (mediators) which in turn drive final outcomes:
Media ──► Awareness ──► Sales
  │                       ▲
  └───────────────────────┘
       (direct effect)
This decomposes the total media effect into:
- Direct effect: Media → Sales (bypassing the mediator)
- Indirect effect: Media → Awareness → Sales
- Total effect: Direct + Indirect
from mmm_framework.mmm_extensions import (
MediatorConfigBuilder,
NestedModelConfigBuilder,
NestedMMM,
awareness_mediator,
)
# Method 1: Factory function for common configurations
awareness = awareness_mediator(
name="brand_awareness",
observation_noise=0.15, # Survey measurement error
)
# Method 2: Builder for full control
awareness_custom = (
MediatorConfigBuilder("brand_awareness")
.partially_observed(observation_noise=0.15)
.with_positive_media_effect(sigma=1.0)
.with_slow_adstock(l_max=12) # Brand metrics have long carryover
.with_direct_effect(sigma=0.3) # Allow some direct effect
.build()
)
# Build model configuration
nested_config = (
NestedModelConfigBuilder()
.add_mediator(awareness_custom)
.map_channels_to_mediator(
"brand_awareness",
["tv", "digital"] # Only these channels build awareness
)
.share_adstock(True)
.build()
)
# Fit the model
model = NestedMMM(
X_media=X_media,
y=sales,
channel_names=["tv", "digital", "social", "search"],
config=nested_config,
mediator_data={"brand_awareness": survey_data}, # Sparse observations OK
)
results = model.fit(draws=2000, tune=1000)
# Decompose effects
mediation_effects = model.get_mediation_effects()
for channel_effect in mediation_effects:
    print(f"{channel_effect.channel}:")
    print(f"  Direct: {channel_effect.direct_effect:.3f}")
    print(f"  Indirect via awareness: {channel_effect.indirect_effects['brand_awareness']:.3f}")
    print(f"  Proportion mediated: {channel_effect.proportion_mediated:.1%}")
Mediator Types
| Type | Description | Use Case |
|---|---|---|
| FULLY_OBSERVED | Complete time series available | Foot traffic, web visits |
| PARTIALLY_OBSERVED | Sparse observations (surveys) | Brand awareness, consideration |
| FULLY_LATENT | No direct observations | Latent brand equity |
Multivariate Outcome Models
Multivariate models jointly estimate effects across multiple outcomes, capturing correlations and cross-product effects:
from mmm_framework.mmm_extensions import (
OutcomeConfigBuilder,
CrossEffectConfigBuilder,
MultivariateModelConfigBuilder,
MultivariateMMM,
cannibalization_effect,
halo_effect,
)
# Define outcomes
single_pack = (
OutcomeConfigBuilder("single_pack", column="sales_single")
.with_positive_media_effects(sigma=0.5)
.include_trend()
.include_seasonality()
.build()
)
multipack = (
OutcomeConfigBuilder("multipack", column="sales_multi")
.with_positive_media_effects(sigma=0.5)
.include_trend()
.include_seasonality()
.build()
)
# Define cross-effects
# Cannibalization: multipack promotions steal from single-pack
cannib = (
CrossEffectConfigBuilder("multipack", "single_pack")
.cannibalization()
.modulated_by_promotion("multipack_promo")
.lagged() # Effect appears next period
.with_prior_sigma(0.3)
.build()
)
# Or use factory function
cannib_simple = cannibalization_effect(
source="multipack",
target="single_pack",
promotion_column="multipack_promo",
lagged=True,
)
# Build configuration
mv_config = (
MultivariateModelConfigBuilder()
.add_outcome(single_pack)
.add_outcome(multipack)
.add_cross_effect(cannib)
.with_lkj_eta(2.0) # Prior on correlation structure
.share_media_adstock(True)
.share_media_saturation(False) # Different saturation per product
.share_seasonality(True)
.build()
)
# Fit model
model = MultivariateMMM(
X_media=X_media,
outcome_data={"single_pack": y_single, "multipack": y_multi},
channel_names=channels,
config=mv_config,
promotion_data={"multipack_promo": promo_indicator},
)
results = model.fit()
# Analyze cross-effects
cross_effects = model.get_cross_effect_summary()
print(cross_effects)
# Get correlation matrix
corr = model.get_correlation_matrix()
Cross-Effect Types
| Type | Description | Prior Constraint |
|---|---|---|
| CANNIBALIZATION | Source steals from target | Negative effect |
| HALO | Source lifts target | Positive effect |
| SPILLOVER | Bidirectional relationship | Unconstrained |
Combined Models
For the most complex scenarios, CombinedMMM supports both nested pathways and multivariate outcomes:
from mmm_framework.mmm_extensions import (
CombinedModelConfigBuilder,
CombinedMMM,
)
# Full c-store scenario:
# - Media builds awareness (nested)
# - Awareness drives both product sales
# - Multipack promotions cannibalize single-pack (cross-effect)
# - Correlated errors across products
config = (
CombinedModelConfigBuilder()
# Nested component
.with_awareness_mediator("brand_awareness", observation_noise=0.15)
.map_channels_to_mediator("brand_awareness", ["tv", "digital"])
# Multivariate component
.with_outcomes("single_pack", "multipack")
.with_cannibalization("multipack", "single_pack", promotion_column="multi_promo")
# Link mediator to outcomes
.map_mediator_to_outcomes("brand_awareness", ["single_pack", "multipack"])
.with_lkj_eta(2.0)
.build()
)
model = CombinedMMM(
X_media=X_media,
outcome_data={"single_pack": y1, "multipack": y2},
channel_names=["tv", "digital", "social", "search"],
config=config,
mediator_data={"brand_awareness": survey_data},
promotion_data={"multi_promo": promo_flags},
)
results = model.fit(draws=2000, tune=1000, chains=4)
Factory Functions
Common configurations are available as factory functions:
from mmm_framework.mmm_extensions import (
awareness_mediator,
foot_traffic_mediator,
cannibalization_effect,
halo_effect,
)
# Awareness mediator with slow decay and partial observation
awareness = awareness_mediator(
name="brand_awareness",
observation_noise=0.15,
)
# Foot traffic mediator with full observation
traffic = foot_traffic_mediator(
name="store_visits",
observation_noise=0.05,
)
# Cross-effects
cannib = cannibalization_effect("product_b", "product_a", promotion_column="b_promo")
halo = halo_effect("premium", "value") # Premium brand lifts value brand
Results and Diagnostics
All extended models provide structured result containers:
# Mediation decomposition
effects = model.get_mediation_effects()
for e in effects:
    print(e.to_dict())
# Cross-effect summary with HDI
cross_df = model.get_cross_effect_summary()
# Returns: source, target, effect_type, mean, sd, hdi_3%, hdi_97%
# Correlation matrix for multivariate outcomes
corr_matrix = model.get_correlation_matrix()
# Standard ArviZ diagnostics
results.summary(var_names=["beta_media", "alpha"])
results.plot_trace(var_names=["beta_media"])
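The hdi_3% / hdi_97% columns (and the 94% intervals in the Python API example) are highest density interval bounds. To show what those bounds mean, here is a plain-Python sketch of the common sorted-draws approach for unimodal posteriors, similar in spirit to ArviZ's default. The hdi function is hypothetical and not part of mmm_framework, and the lognormal draws are synthetic.

```python
import math
import random


def hdi(draws, prob=0.94):
    """Narrowest interval spanning about `prob` of the sorted draws (unimodal case)."""
    s = sorted(draws)
    n = len(s)
    span = int(math.floor(prob * n))  # index offset between the interval ends
    if span < 1 or span >= n:
        raise ValueError("need more draws for this probability")
    n_candidates = n - span
    widths = [s[i + span] - s[i] for i in range(n_candidates)]
    best = min(range(n_candidates), key=widths.__getitem__)
    return s[best], s[best + span]


# Synthetic right-skewed "posterior" for one effect
rng = random.Random(42)
posterior_draws = [rng.lognormvariate(0.0, 0.5) for _ in range(4000)]
low, high = hdi(posterior_draws)
print(f"94% HDI: [{low:.2f}, {high:.2f}]")
```

For a skewed posterior like this one, the HDI shifts toward the mode compared with an equal-tailed 3%/97% percentile interval, which is why the column labels should not be read as percentiles.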
Variable Selection for Control Variables
The mmm_extensions module includes Bayesian variable selection priors for precision control variables. These methods provide principled shrinkage and selection, improving precision when many potential controls exist but only a few are truly relevant.
⚠️ CAUSAL WARNING: Variable selection should ONLY be applied to precision control variables (variables that affect the outcome but do NOT affect treatment assignment, i.e., media spending). Confounders (variables affecting both media and sales) must be EXCLUDED from selection and always included with standard priors. Shrinking a confounder's coefficient toward zero reintroduces the confounding bias that including it was meant to remove.
Variable Classification
Before applying variable selection, classify each control variable:
| Variable Type | Examples | Selection OK? | Reason |
|---|---|---|---|
| Precision Controls | Weather, gas prices, minor holidays, sports events | ✅ Yes | Affect outcome only |
| Confounders | Distribution/ACV, price, competitor media | ❌ No | Affect both media AND outcome |
| Core Components | Trend, seasonality | ❌ No | Fundamental model structure |
| Mediators | Brand awareness (if on causal path) | ❌ No | Conditioning on it blocks the indirect media effect |
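Writing the classification down as data before any model is fit makes it auditable. A minimal sketch, assuming a hypothetical Role enum and split_controls helper (neither is part of mmm_framework): only precision controls become eligible for selection priors, confounders and core components always get standard priors, and mediators are kept out of the control set when the target is the total media effect.

```python
from enum import Enum


class Role(Enum):
    PRECISION = "precision"    # affects outcome only: selection allowed
    CONFOUNDER = "confounder"  # affects media AND outcome: always standard prior
    CORE = "core"              # trend/seasonality: always in the model
    MEDIATOR = "mediator"      # on the causal path: keep out of the control set


def split_controls(roles):
    """Partition pre-classified variables by how they may enter the model."""
    selectable, standard, excluded = [], [], []
    for name, role in roles.items():
        if role is Role.PRECISION:
            selectable.append(name)
        elif role is Role.MEDIATOR:
            excluded.append(name)  # conditioning on it would block the indirect effect
        else:
            standard.append(name)
    return selectable, standard, excluded


roles = {
    "weather": Role.PRECISION,
    "gas_price": Role.PRECISION,
    "distribution": Role.CONFOUNDER,
    "price": Role.CONFOUNDER,
    "trend": Role.CORE,
    "brand_awareness": Role.MEDIATOR,
}
selectable, standard, excluded = split_controls(roles)
print(selectable)  # ['weather', 'gas_price']
print(standard)    # ['distribution', 'price', 'trend']
print(excluded)    # ['brand_awareness']
```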
Available Methods
| Method | Best For | Key Feature |
|---|---|---|
| Regularized Horseshoe | Sparse signals (few relevant controls) | Strong shrinkage of noise, preserves signals |
| Finnish Horseshoe | Same as regularized horseshoe | Emphasizes slab regularization |
| Spike-and-Slab | Explicit inclusion probabilities | Direct posterior P(included) for each variable |
| Bayesian LASSO | Many small effects | L1-like shrinkage in Bayesian framework |
Quick Start
from mmm_framework.mmm_extensions import (
# Builders
VariableSelectionConfigBuilder,
HorseshoeConfigBuilder,
# Factory functions
sparse_controls,
selection_with_inclusion_probs,
# Components
build_control_effects_with_selection,
summarize_variable_selection,
)
# Method 1: Factory function (simplest)
config = sparse_controls(
3,  # expected_nonzero
"distribution", "price", "competitor_media",  # Confounders to exclude
)
# Method 2: Builder with full control
config = (VariableSelectionConfigBuilder()
.regularized_horseshoe(expected_nonzero=3)
.with_slab_scale(2.0)
.exclude_confounders("distribution", "price", "competitor_media")
.build())
Configuration Classes
VariableSelectionConfig
The main configuration object:
from mmm_framework.mmm_extensions import (
VariableSelectionConfig,
VariableSelectionMethod,
HorseshoeConfig,
)
config = VariableSelectionConfig(
method=VariableSelectionMethod.REGULARIZED_HORSESHOE,
horseshoe=HorseshoeConfig(
expected_nonzero=3, # Prior belief: ~3 controls are relevant
slab_scale=2.0, # Max expected effect in std units
slab_df=4.0, # Slab tail weight
),
exclude_variables=("distribution", "price"), # Always include these
)
HorseshoeConfig
Controls the regularized horseshoe prior:
from mmm_framework.mmm_extensions import HorseshoeConfigBuilder
horseshoe = (HorseshoeConfigBuilder()
.with_expected_nonzero(5) # Expect 5 relevant controls
.with_slab_scale(2.5) # Allow effects up to 2.5 std
.with_heavy_tails() # slab_df=2.0 for larger effects
.with_aggressive_shrinkage() # Stronger shrinkage of noise
.build())
SpikeSlabConfig
For explicit inclusion probabilities:
from mmm_framework.mmm_extensions import SpikeSlabConfigBuilder
spike_slab = (SpikeSlabConfigBuilder()
.with_prior_inclusion(0.3) # 30% prior prob of inclusion
.with_sharp_selection() # Lower temperature, sharper selection
.continuous() # Required for NUTS sampling
.build())
Builder API
The VariableSelectionConfigBuilder provides a fluent interface:
from mmm_framework.mmm_extensions import VariableSelectionConfigBuilder
# Regularized horseshoe (recommended default)
config = (VariableSelectionConfigBuilder()
.regularized_horseshoe(expected_nonzero=3)
.with_slab_scale(2.0)
.with_slab_df(4.0)
.exclude_confounders("distribution", "price", "competitor_media")
.build())
# Spike-and-slab for inclusion probabilities
config = (VariableSelectionConfigBuilder()
.spike_slab(prior_inclusion=0.3)
.with_sharp_selection()
.exclude_confounders("distribution", "price")
.apply_only_to("weather", "gas_price", "minor_holiday") # Limit scope
.build())
# Bayesian LASSO for many small effects
config = (VariableSelectionConfigBuilder()
.bayesian_lasso(regularization=2.0)
.exclude_confounders("distribution", "price")
.build())
Factory Functions
For common configurations:
from mmm_framework.mmm_extensions import (
sparse_controls,
selection_with_inclusion_probs,
dense_controls,
)
# Sparse: expect few relevant controls
config = sparse_controls(3, "distribution", "price")
# Inclusion probs: want P(included) for each variable
config = selection_with_inclusion_probs(0.5, "distribution", "price")
# Dense: expect many small effects
config = dense_controls(1.0, "distribution", "price")
Integration with Models
Use build_control_effects_with_selection to handle the split between confounders and precision controls:
import pymc as pm
from mmm_framework.mmm_extensions import (
VariableSelectionConfigBuilder,
build_control_effects_with_selection,
)
# Define variable roles
all_controls = ["distribution", "price", "weather", "gas_price", "holiday"]
confounders = ["distribution", "price"]
# Configure selection (excludes confounders automatically)
selection_config = (VariableSelectionConfigBuilder()
.regularized_horseshoe(expected_nonzero=2)
.exclude_confounders(*confounders)
.build())
# Build model (X_controls, y, intercept and media_effect are assumed to be defined
# elsewhere: the control matrix, the observed KPI, and the model's intercept and media terms)
with pm.Model() as model:
    sigma = pm.HalfNormal("sigma", 0.5)
    # This handles the split automatically:
    # - Confounders get standard Normal priors
    # - Precision controls get horseshoe priors
    control_result = build_control_effects_with_selection(
        X_controls=X_controls,
        control_names=all_controls,
        n_obs=len(y),
        sigma=sigma,
        selection_config=selection_config,
        name_prefix="ctrl",
    )
    # Use in likelihood
    mu = intercept + media_effect + control_result.contribution
    pm.Normal("y", mu=mu, sigma=sigma, observed=y)
Interpreting Results
After fitting, analyze variable selection:
from mmm_framework.mmm_extensions import (
compute_inclusion_probabilities,
summarize_variable_selection,
)
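# Precision controls: every control that is not a confounder
precision_controls = [c for c in all_controls if c not in confounders]
# Get inclusion probabilities (idata is the InferenceData returned by sampling the model)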
inclusion = compute_inclusion_probabilities(
trace=idata,
config=selection_config,
name="ctrl_select",
)
print(f"Effective nonzero: {inclusion['effective_nonzero']:.2f}")
# Full summary table
summary = summarize_variable_selection(
trace=idata,
control_names=precision_controls,
config=selection_config,
name="ctrl_select",
)
print(summary)
# variable mean std hdi_3% hdi_97% inclusion_prob selected
# 0 weather 0.15 0.08 0.01 0.28 0.89 True
# 1 gas_price 0.02 0.05 -0.06 0.10 0.23 False
# 2 holiday -0.11 0.06 -0.21 -0.02 0.85 True
For horseshoe priors, shrinkage factors (κ) indicate selection:
- κ ≈ 1: Strongly shrunk toward zero (excluded)
- κ ≈ 0: Preserved at estimated magnitude (included)
# Access shrinkage factors directly
kappa = idata.posterior["ctrl_select_kappa"].mean(dim=["chain", "draw"]).values
for name, k in zip(precision_controls, kappa):
    status = "SHRUNK" if k > 0.5 else "PRESERVED"
    print(f"{name}: κ={k:.3f} ({status})")
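For intuition about what κ measures, the sketch below uses the standard horseshoe shrinkage-factor formula for standardized predictors, κ_j = 1 / (1 + N s_j² τ² λ_j² / σ²) with predictor variance s_j² = 1 (Piironen & Vehtari, 2017); for the regularized horseshoe, λ_j is replaced by the regularized λ̃_j. This is an interpretive assumption: ctrl_select_kappa may be computed differently inside mmm_framework, and shrinkage_factor is a hypothetical helper with toy inputs.

```python
def shrinkage_factor(tau, lam, sigma, n_obs, predictor_var=1.0):
    """Horseshoe shrinkage factor kappa in (0, 1]: 1 = fully shrunk, near 0 = unshrunk."""
    return 1.0 / (1.0 + n_obs * predictor_var * (tau * lam) ** 2 / sigma**2)


# Toy values: 104 weekly observations, global scale 0.05, unit residual SD
for lam in (0.01, 1.0, 100.0):
    kappa = shrinkage_factor(tau=0.05, lam=lam, sigma=1.0, n_obs=104)
    status = "SHRUNK" if kappa > 0.5 else "PRESERVED"
    print(f"lambda={lam:>6}: kappa={kappa:.3f} ({status})")
```

Under this formula, the 0.5 cutoff used above corresponds to N τ² λ_j² = σ², the point where the prior scale and the likelihood's information about β_j carry equal weight.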
Mathematical Specification
This section provides the formal mathematical specification for the variable selection priors available in mmm_extensions. For the complete technical document with proofs and implementation details, see variable_selection_specification.pdf.
Causal Constraints
Variable selection priors should only be applied to precision control variables, i.e., variables that affect outcome $Y$ but do not affect treatment $X$ (media spending).
Proposition (Bias from Shrinkage on Confounder Coefficients): Consider a confounder $C$ affecting both media $X$ and outcome $Y$:
$$X = \delta C + \nu, \quad Y = \beta X + \gamma C + \epsilon$$
where $\beta$ is the true causal effect of media. If we shrink the coefficient on $C$ by factor $s \in [0,1]$ (yielding $\tilde{\gamma} = s \cdot \gamma$), the estimated media effect satisfies:
$$\hat{\beta} \xrightarrow{p} \beta + (1-s) \cdot \gamma \cdot \frac{\text{Cov}(X, C)}{\text{Var}(X)}$$
Key implications:
- Complete shrinkage ($s=0$): Full omitted variable bias
- Partial shrinkage ($s=0.1$, 90% shrinkage): Still leaves 90% of the bias
- No shrinkage ($s=1$): Bias eliminated
The bias depends on $\text{Cov}(X,C)/\text{Var}(X)$, the slope from regressing the confounder on treatment, which is fixed by how media and the confounder co-move in the data-generating process. It does not depend on the estimator: shrinking $\gamma$ leaves this dependence untouched and only removes the adjustment for it.
Example: Distribution (ACV) with $\gamma = 0.1$ (small direct effect) but $\text{Cov}(X,C)/\text{Var}(X) = 2.0$ (strong association with media). The omitted variable bias is $0.1 \times 2.0 = 0.2$ (20% of the media effect when $\beta = 1$). A horseshoe prior that sees only the small $\gamma = 0.1$ would shrink it toward zero, introducing nearly the full 0.2 bias into $\hat{\beta}$.
Regularized Horseshoe (Piironen & Vehtari, 2017)
The regularized horseshoe prior is:
$$\beta_j = z_j \cdot \tau \cdot \tilde{\lambda}_j$$
where:
- $z_j \sim \mathcal{N}(0, 1)$ โ standardized coefficient
- $\tau \sim \text{Half-}t_{\nu_g}(\tau_0)$ โ global shrinkage
- $\lambda_j \sim \text{Half-}t_{\nu_l}(1)$ โ local shrinkage
- $\tilde{\lambda}_j = \frac{c \cdot \lambda_j}{\sqrt{c^2 + \tau^2 \lambda_j^2}}$ โ regularized local shrinkage
- $c^2 \sim \text{Inv-Gamma}(\nu_s/2, \nu_s s^2/2)$ โ slab regularization
The global shrinkage scale is calibrated as:
$$\tau_0 = \frac{D_0}{D - D_0} \cdot \frac{\sigma}{\sqrt{N}}$$
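where $D_0$ is the expected number of nonzero coefficients, $D$ is the total number of candidate coefficients, $\sigma$ is the residual noise scale, and $N$ is the number of observations; $s$ and $\nu_s$ are the slab scale and slab degrees of freedom.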
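The two formulas above can be evaluated directly. This plain-Python sketch (hypothetical helper names, toy inputs such as 10 candidate controls and 104 weeks) computes $\tau_0$ and the effective prior scale $\tau \tilde{\lambda}_j$, showing how the slab caps large coefficients.

```python
import math


def global_scale(expected_nonzero, n_coefs, sigma, n_obs):
    """tau_0 = D0 / (D - D0) * sigma / sqrt(N)."""
    if not 0 < expected_nonzero < n_coefs:
        raise ValueError("need 0 < D0 < D")
    return expected_nonzero / (n_coefs - expected_nonzero) * sigma / math.sqrt(n_obs)


def regularized_local_scale(lam, tau, c):
    """lambda_tilde = c * lambda / sqrt(c^2 + tau^2 * lambda^2)."""
    return c * lam / math.sqrt(c**2 + tau**2 * lam**2)


tau0 = global_scale(expected_nonzero=3, n_coefs=10, sigma=1.0, n_obs=104)
print(f"tau_0 = {tau0:.4f}")
for lam in (0.5, 5.0, 500.0):
    effective = tau0 * regularized_local_scale(lam, tau0, c=2.0)
    print(f"lambda={lam:>6}: effective prior scale={effective:.4f}")
```

For small $\lambda_j$ the effective scale stays close to $\tau \lambda_j$ (plain horseshoe behaviour); as $\lambda_j$ grows it approaches $c$, so the slab limits how large an unshrunk coefficient's prior scale can become.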
Spike-and-Slab (Continuous Relaxation)
For NUTS-compatible sampling:
$$\beta_j = \gamma_j \cdot \beta_{\text{slab},j} + (1 - \gamma_j) \cdot \beta_{\text{spike},j}$$
where:
- $\gamma_j = \text{sigmoid}(\text{logit}_{\gamma_j} / T)$ โ soft inclusion indicator
- $\text{logit}_{\gamma_j} \sim \mathcal{N}(\text{logit}(\pi), 1)$
- $\beta_{\text{slab},j} \sim \mathcal{N}(0, \sigma_{\text{slab}})$
- $\beta_{\text{spike},j} \sim \mathcal{N}(0, \sigma_{\text{spike}})$ with $\sigma_{\text{spike}} \ll \sigma_{\text{slab}}$
- $T$ is temperature (lower = sharper selection)
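To see the effect of the temperature $T$, the sketch below draws from the prior defined by these equations in plain Python, interpreting $\mathcal{N}(0, \sigma)$ as a standard deviation of $\sigma$ (an assumption). The helper names and hyperparameter values are hypothetical, not mmm_framework's implementation.

```python
import math
import random


def soft_inclusion(logit_value, temperature):
    """gamma = sigmoid(logit / T), computed without overflow."""
    z = logit_value / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def draw_prior_coefficient(rng, prior_inclusion, sigma_slab, sigma_spike, temperature):
    """One prior draw of beta_j from the continuous spike-and-slab mixture."""
    logit_pi = math.log(prior_inclusion / (1.0 - prior_inclusion))
    gamma = soft_inclusion(rng.gauss(logit_pi, 1.0), temperature)
    beta_slab = rng.gauss(0.0, sigma_slab)
    beta_spike = rng.gauss(0.0, sigma_spike)
    return gamma * beta_slab + (1.0 - gamma) * beta_spike, gamma


def fraction_undecided(temperature, n=10_000, seed=1):
    """Share of prior draws whose soft indicator lies strictly between 0.1 and 0.9."""
    rng = random.Random(seed)
    gammas = [draw_prior_coefficient(rng, 0.3, 1.0, 0.01, temperature)[1] for _ in range(n)]
    return sum(0.1 < g < 0.9 for g in gammas) / n


for t in (1.0, 0.5, 0.1):
    print(f"T={t}: share of undecided indicators={fraction_undecided(t):.2f}")
```

Lowering $T$ pushes the soft indicators toward 0 or 1, which is what "sharper selection" means here; a very low $T$ also makes the posterior geometry harder for NUTS, so the temperature trades selection sharpness against sampling efficiency.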
Best Practices
- Pre-specify variable classification before seeing any results
- Document rationale for each variable's classification as confounder vs. precision
- Never tune hyperparameters based on fit metrics (this is specification shopping)
- Report inclusion probabilities alongside point estimates
- Show sensitivity to the expected_nonzero specification
- When uncertain about a variable's causal role, use standard priors (don't apply selection)
Mathematical Specification: Extended Models
This section provides the formal mathematical specification and statistical justification for the nested, multivariate, and combined models in the mmm_extensions module.
Nested/Mediated Models
Nested models estimate causal pathways where media affects intermediate outcomes (mediators) which in turn affect the final outcome. This is essential when the business question is not just "does media work?" but "how does media work?"
Core Structure
The nested model specifies a system of equations:
Stage 1 (Media → Mediator):
$$M_t = \alpha_M + \sum_{c=1}^{C} \beta^{(M)}_c \cdot f_c(x_{c,t}) + \epsilon^{(M)}_t$$
Stage 2 (Mediator → Outcome, with direct effects):
$$y_t = \alpha_y + \gamma \cdot M_t + \sum_{c=1}^{C} \beta^{(D)}_c \cdot f_c(x_{c,t}) + \epsilon^{(y)}_t$$
where:
- $M_t$ is the mediator value at time $t$ (e.g., brand awareness)
- $f_c(x_{c,t})$ is the transformed media input (adstock + saturation) for channel $c$
- $\beta^{(M)}_c$ is the media → mediator effect for channel $c$
- $\gamma$ is the mediator → outcome effect
- $\beta^{(D)}_c$ is the direct media → outcome effect (bypassing the mediator)
Effect Decomposition
The total effect of media channel $c$ on the outcome decomposes as:
$$\text{Total Effect}_c = \underbrace{\beta^{(D)}_c}_{\text{Direct}} + \underbrace{\beta^{(M)}_c \cdot \gamma}_{\text{Indirect}}$$
The proportion mediated quantifies how much of the total effect flows through the mediator:
$$\text{Proportion Mediated}_c = \frac{\beta^{(M)}_c \cdot \gamma}{\beta^{(D)}_c + \beta^{(M)}_c \cdot \gamma}$$
This decomposition is identified under the standard mediation assumptions (sequential ignorability), which require no unmeasured confounding of the (1) media–mediator, (2) mediator–outcome, or (3) media–outcome relationships, and (4) no mediator–outcome confounder that is itself affected by media.
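Because the indirect effect is a product of two uncertain parameters, it is usually computed draw by draw from the posterior rather than from posterior means, so uncertainty in both stages propagates into the decomposition. A minimal sketch, assuming posterior draws are available as plain lists; the three-draw inputs are toy numbers and decompose_effects is hypothetical, not mmm_framework's get_mediation_effects.

```python
from statistics import mean


def decompose_effects(direct, media_to_mediator, mediator_to_outcome):
    """Per-draw direct / indirect / total effects and proportion mediated."""
    indirect = [b_m * g for b_m, g in zip(media_to_mediator, mediator_to_outcome)]
    total = [d + i for d, i in zip(direct, indirect)]
    # Proportion mediated is unstable when the total effect is near zero
    proportion = [i / t if t != 0 else float("nan") for i, t in zip(indirect, total)]
    return {"direct": direct, "indirect": indirect, "total": total,
            "proportion_mediated": proportion}


effects = decompose_effects(
    direct=[0.10, 0.20, 0.15],               # beta^(D)_c draws (toy values)
    media_to_mediator=[0.50, 0.40, 0.60],    # beta^(M)_c draws
    mediator_to_outcome=[0.40, 0.50, 0.30],  # gamma draws
)
for key, draws in effects.items():
    print(f"{key}: mean={mean(draws):.3f}")
```

The posterior mean of the proportion mediated is generally not the ratio of the posterior mean effects, which is one reason to summarize it draw by draw with an interval.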
Mediator Types and Observation Models
The framework supports three mediator types, each with different observation models that determine how the latent mediator state relates to observed data.
FULLY_OBSERVED Mediators
For mediators with complete time series observations (e.g., website traffic, store visits from sensors):
$$M^{obs}_t = M_t + \nu_t, \quad \nu_t \sim \mathcal{N}(0, \sigma^2_\nu)$$
Likelihood:
$$M^{obs}_t \sim \mathcal{N}(M_t, \sigma^2_\nu) \quad \forall t$$
The measurement noise $\sigma_\nu$ captures sensor error or sampling variation. With complete observations, the latent mediator trajectory is tightly constrained by the data.
Use cases: Daily website sessions, hourly foot traffic counts, real-time social mentions.
PARTIALLY_OBSERVED Mediators
For mediators observed only at sparse intervals (e.g., monthly brand tracking surveys in a weekly model):
$$M^{obs}_{t_k} = M_{t_k} + \nu_{t_k}, \quad \nu_{t_k} \sim \mathcal{N}(0, \sigma^2_\nu)$$
where $\{t_1, t_2, \ldots, t_K\} \subset \{1, 2, \ldots, T\}$ are the observation times.
Likelihood (partial):
$$M^{obs}_{t_k} \sim \mathcal{N}(M_{t_k}, \sigma^2_\nu) \quad \text{only for } t_k \in \{t_1, \ldots, t_K\}$$
At non-observed times, the mediator is inferred from:
- The structural model (media effects)
- Interpolation/smoothing from observed points
- The prior distribution
This is a state-space model where the observation equation applies only at observed times. The framework handles this by masking the likelihood:
# Only observed times contribute to likelihood
pm.Normal("M_observed", mu=M_latent[mask], sigma=sigma_obs, observed=M_data[mask])
Justification: Brand metrics like awareness evolve continuously but are measured infrequently via surveys. The structural model (media → awareness) provides information about the trajectory between survey waves, while the observations anchor the estimates at measured points.
Use cases: Monthly brand tracking, quarterly NPS surveys, periodic market research.
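The masking idea can be illustrated without PyMC. In the sketch below (a hypothetical helper with toy numbers), weeks without a survey are None and contribute nothing to the Gaussian log-likelihood, so moving the latent mediator at those weeks leaves the likelihood unchanged; only the structural model and priors inform those points.

```python
import math


def masked_gaussian_loglik(latent, observed, sigma):
    """Sum of Normal(observed | latent, sigma) log-densities over observed weeks only."""
    total = 0.0
    for m, obs in zip(latent, observed):
        if obs is None:  # no survey wave this week: no likelihood term
            continue
        z = (obs - m) / sigma
        total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return total


latent = [0.30, 0.32, 0.35, 0.37, 0.40]  # weekly latent awareness
survey = [0.31, None, None, None, 0.38]  # two survey waves
alt_latent = [0.30, 0.90, 0.10, 0.55, 0.40]  # differs only at unobserved weeks

print(masked_gaussian_loglik(latent, survey, sigma=0.15))
print(masked_gaussian_loglik(alt_latent, survey, sigma=0.15))  # same value
```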
FULLY_LATENT Mediators
For mediators that are never directly observed but are theoretically important:
$$M_t = \alpha_M + \sum_{c} \beta^{(M)}_c \cdot f_c(x_{c,t}) + \epsilon^{(M)}_t$$
No observation likelihood: the mediator is identified purely through:
- Its structural relationship with media inputs
- Its effect on the observed outcome
- Prior distributions on parameters
Identification requirements: Fully latent mediators require strong assumptions:
- The functional form relating media to the mediator must be correctly specified
- The mediator must have a non-zero effect on the outcome ($\gamma \neq 0$)
- Sufficient variation in media inputs to identify $\beta^{(M)}$
Bayesian regularization: Informative priors on $\beta^{(M)}$ and $\gamma$ are critical. Without observed mediator data, the posterior is heavily influenced by priors.
Use cases: Latent brand equity, unobserved consideration sets, theoretical constructs without direct measurement.
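One reason priors dominate for fully latent mediators is that, without observations, the mediator's scale is not pinned down. A short derivation from the structural equations above, assuming Gaussian structural noise $\epsilon^{(M)}_t \sim \mathcal{N}(0, \sigma_M^2)$ (the symbol $\sigma_M$ is introduced here for illustration): for any $k \neq 0$, define the rescaled mediator $M'_t = k M_t$. Then

$$\begin{aligned}
M'_t &= k\alpha_M + \sum_{c} \left(k\beta^{(M)}_c\right) f_c(x_{c,t}) + k\epsilon^{(M)}_t, \qquad k\epsilon^{(M)}_t \sim \mathcal{N}\left(0, k^2\sigma_M^2\right) \\
\gamma \cdot M_t &= \frac{\gamma}{k} \cdot M'_t
\end{aligned}$$

so the outcome equation is unchanged when $(\alpha_M, \beta^{(M)}_c, \sigma_M, \gamma)$ is replaced by $(k\alpha_M, k\beta^{(M)}_c, |k|\sigma_M, \gamma/k)$. Only products such as $\beta^{(M)}_c \cdot \gamma$ are informed by the outcome data; fixing the mediator's scale or using informative priors resolves the ambiguity.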
Priors for Nested Models
The framework uses weakly informative priors with optional constraints:
| Parameter | Default Prior | Constraint Options |
|---|---|---|
| $\beta^{(M)}_c$ (media → mediator) | $\mathcal{N}^+(0, 1)$ | Positive (media builds awareness) |
| $\gamma$ (mediator → outcome) | $\mathcal{N}(0, 1)$ | None (can be negative) |
| $\beta^{(D)}_c$ (direct effect) | $\mathcal{N}(0, 0.5)$ | None |
| $\sigma_\nu$ (observation noise) | $\text{HalfNormal}(0.1)$ | Positive |
The positive constraint on $\beta^{(M)}$ reflects the prior belief that media spending increases (not decreases) brand awareness. This can be relaxed if theoretically justified.
Multivariate Outcome Models
When modeling multiple outcomes simultaneously (e.g., sales of different products), a multivariate model captures correlations and cross-product effects that univariate models miss.
Core Structure
For $K$ outcomes $\mathbf{y}_t = (y_{1,t}, \ldots, y_{K,t})^\top$ :
$$\mathbf{y}_t = \boldsymbol{\alpha} + \mathbf{B} \cdot \mathbf{f}(x_t) + \boldsymbol{\Psi} \cdot \mathbf{y}_t + \boldsymbol{\epsilon}_t$$
where:
- $\boldsymbol{\alpha} \in \mathbb{R}^K$ is the intercept vector
- $\mathbf{B} \in \mathbb{R}^{K \times C}$ is the media effect matrix (outcome × channel)
- $\mathbf{f}(x_t) \in \mathbb{R}^C$ is the transformed media vector
- $\boldsymbol{\Psi} \in \mathbb{R}^{K \times K}$ is the cross-effect matrix (diagonal = 0)
- $\boldsymbol{\epsilon}_t \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ is the error with covariance $\boldsymbol{\Sigma}$
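Because $\mathbf{y}_t$ appears on both sides, the system is simultaneous; when $\mathbf{I} - \boldsymbol{\Psi}$ is invertible it has the reduced form $\mathbf{y}_t = (\mathbf{I} - \boldsymbol{\Psi})^{-1}(\boldsymbol{\alpha} + \mathbf{B}\mathbf{f}(x_t) + \boldsymbol{\epsilon}_t)$. A minimal two-outcome sketch, assuming the convention that row $k$ of $\boldsymbol{\Psi}$ holds the effects of the other outcomes on outcome $k$ (so `psi[k][j]` plays the role of $\psi_{jk}$ in the cross-effect notation used below); the helper and the numbers are hypothetical.

```python
def solve_two_outcomes(a, psi):
    """Solve y = a + psi @ y for two outcomes, i.e. (I - psi) y = a."""
    m00, m01 = 1.0 - psi[0][0], -psi[0][1]
    m10, m11 = -psi[1][0], 1.0 - psi[1][1]
    det = m00 * m11 - m01 * m10
    if abs(det) < 1e-12:
        raise ValueError("I - psi is singular; the system has no unique solution")
    y0 = (m11 * a[0] - m01 * a[1]) / det
    y1 = (m00 * a[1] - m10 * a[0]) / det
    return y0, y1


# a = alpha + B f(x_t) + eps_t for (single_pack, multipack), toy values
a = (10.0, 5.0)
# multipack lowers single_pack (psi = -0.2); no effect in the other direction
psi = [[0.0, -0.2],
       [0.0, 0.0]]
print(solve_two_outcomes(a, psi))  # (9.0, 5.0)
```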
Multivariate Normal Likelihood with LKJ Prior
The error covariance captures residual correlation across outcomes:
$$\boldsymbol{\epsilon}_t \sim \mathcal{N}_K(\mathbf{0}, \boldsymbol{\Sigma})$$
We decompose $\boldsymbol{\Sigma} = \mathbf{D} \mathbf{R} \mathbf{D}$ where:
- $\mathbf{D} = \text{diag}(\sigma_1, \ldots, \sigma_K)$ contains outcome-specific scales
- $\mathbf{R}$ is the correlation matrix
LKJ Prior on Correlations:
$$\mathbf{R} \sim \text{LKJCorr}(\eta)$$
The LKJ distribution is the standard prior for correlation matrices:
- $\eta = 1$: Uniform over valid correlation matrices
- $\eta = 2$: Mild shrinkage toward independence (recommended default)
- $\eta > 2$: Stronger shrinkage toward $\mathbf{R} = \mathbf{I}$
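For two outcomes, the LKJ($\eta$) density $\propto (1 - r^2)^{\eta - 1}$ on the single correlation $r$ is equivalent to $(r + 1)/2 \sim \text{Beta}(\eta, \eta)$, which gives a quick way to see how $\eta$ controls shrinkage. The sketch below is restricted to $K = 2$ as a simplifying assumption; it draws $r$, builds $\boldsymbol{\Sigma} = \mathbf{D}\mathbf{R}\mathbf{D}$, and samples correlated errors. In a PyMC model one would use its built-in LKJ distributions (for example `pm.LKJCholeskyCov`) rather than this.

```python
import numpy as np

def sample_lkj_corr_2d(eta: float, size: int, rng: np.random.Generator) -> np.ndarray:
    # For K = 2, LKJ(eta) implies (r + 1) / 2 ~ Beta(eta, eta)
    return 2.0 * rng.beta(eta, eta, size=size) - 1.0

def build_cov(sigmas: np.ndarray, r: float) -> np.ndarray:
    D = np.diag(sigmas)                      # outcome-specific scales
    R = np.array([[1.0, r], [r, 1.0]])       # correlation matrix
    return D @ R @ D                         # Sigma = D R D

rng = np.random.default_rng(42)
for eta in (1.0, 2.0, 8.0):
    r = sample_lkj_corr_2d(eta, size=20_000, rng=rng)
    print(f"eta={eta}: sd of r = {r.std():.3f}")   # shrinks toward 0 as eta grows

Sigma = build_cov(np.array([1.5, 0.8]), r=0.4)
errors = rng.multivariate_normal(np.zeros(2), Sigma, size=5)
```

The standard deviation of $r$ is $1/\sqrt{2\eta + 1}$ here, so moving from $\eta = 1$ to $\eta = 2$ is a mild change, in line with the "recommended default" above.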
Justification: Outcomes like product sales are correlated due to shared drivers (weather, holidays, economic conditions) not captured by the model. The multivariate likelihood:
- Improves efficiency by borrowing strength across outcomes
- Provides valid inference that accounts for correlation
- Enables testing hypotheses about outcome relationships
Cross-Effects
Cross-effects model how one outcome causally affects another, beyond shared correlation.
Cannibalization
When product $j$'s sales reduce product $k$'s sales (substitution):
$$y_{k,t} = \ldots + \psi_{jk} \cdot y_{j,t} + \ldots, \quad \psi_{jk} < 0$$
Promotion-modulated cannibalization:
$$y_{k,t} = \ldots + \psi_{jk} \cdot P_{j,t} \cdot y_{j,t} + \ldots$$
where $P_{j,t} \in \{0, 1\}$ indicates whether product $j$ is on promotion. This captures the intuition that cannibalization is strongest during promotions.
Prior: $\psi_{jk} \sim \mathcal{N}^-(0, 0.3)$ (half-normal, negative)
Halo Effects
When product $j$'s sales increase product $k$'s sales (complementarity):
$$y_{k,t} = \ldots + \psi_{jk} \cdot y_{j,t} + \ldots, \quad \psi_{jk} > 0$$
Prior: $\psi_{jk} \sim \mathcal{N}^+(0, 0.3)$ (half-normal, positive)
Use cases: Premium brand lifts value brand, flagship product drives accessory sales.
Lagged Cross-Effects
For effects that manifest with delay:
$$y_{k,t} = \ldots + \psi_{jk} \cdot y_{j,t-1} + \ldots$$
This avoids simultaneity issues and may better reflect consumer behavior (e.g., stockpiling from a promotion reduces next-period purchases).
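The three cross-effect specifications differ only in which series multiplies $\psi_{jk}$. A small NumPy sketch with made-up sales figures; setting the first lagged term to zero is an assumption of this sketch (a real model might drop that period or use a pre-sample value instead):

```python
import numpy as np

def cross_effect_terms(y_j, promo_j, psi_jk):
    """Contribution of outcome j to outcome k under three specifications.

    y_j: (T,) sales of product j; promo_j: (T,) 0/1 promotion flag P_{j,t}.
    """
    y_j = np.asarray(y_j, dtype=float)
    contemporaneous = psi_jk * y_j                     # psi_jk * y_{j,t}
    promo_modulated = psi_jk * promo_j * y_j           # psi_jk * P_{j,t} * y_{j,t}
    lagged = np.zeros_like(y_j)
    lagged[1:] = psi_jk * y_j[:-1]                     # psi_jk * y_{j,t-1}; none at t = 0
    return contemporaneous, promo_modulated, lagged

y_j = np.array([100.0, 120.0, 90.0, 110.0])
promo = np.array([0, 1, 0, 0])
same, promo_only, lag = cross_effect_terms(y_j, promo, psi_jk=-0.1)   # cannibalization
print(same, promo_only, lag)
```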
Identification of Cross-Effects
Cross-effects face identification challenges:
- Simultaneity: $y_j$ and $y_k$ are jointly determined
- Confounding: Shared drivers affect both outcomes
Strategies implemented:
- Lagged effects break simultaneity
- Promotion modulation provides exogenous variation
- Informative priors regularize toward zero
- The multivariate error structure captures residual correlation separately from causal effects
Combined Models
The CombinedMMM integrates nested pathways and multivariate outcomes into a unified framework.
Full Specification
Mediator equations (for each mediator $m$):
$$M_{m,t} = \alpha_m + \sum_{c \in \mathcal{C}_m} \beta^{(M)}_{mc} \cdot f_c(x_{c,t}) + \epsilon^{(M)}_{m,t}$$
where $\mathcal{C}_m$ is the set of channels affecting mediator $m$.
Outcome equations (for each outcome $k$):
$$y_{k,t} = \alpha_k + \underbrace{\sum_{c=1}^{C} \beta^{(D)}_{kc} \cdot f_c(x_{c,t})}_{\text{Direct media effects}} + \underbrace{\sum_{m \in \mathcal{M}_k} \gamma_{km} \cdot M_{m,t}}_{\text{Mediator effects}} + \underbrace{\sum_{j \neq k} \psi_{jk} \cdot y_{j,t}}_{\text{Cross-effects}} + \epsilon_{k,t}$$
where $\mathcal{M}_k$ is the set of mediators affecting outcome $k$.
Joint error distribution:
$$\boldsymbol{\epsilon}_t = (\epsilon_{1,t}, \ldots, \epsilon_{K,t})^\top \sim \mathcal{N}_K(\mathbf{0}, \boldsymbol{\Sigma})$$
Effect Decomposition in Combined Models
For channel $c$ affecting outcome $k$, the total effect decomposes as:
$$\text{Total}_{ck} = \underbrace{\beta^{(D)}_{kc}}_{\text{Direct}} + \underbrace{\sum_{m \in \mathcal{M}_k} \beta^{(M)}_{mc} \cdot \gamma_{km}}_{\text{Indirect via mediators}} + \underbrace{\sum_{j \neq k} \psi_{jk} \cdot \text{Total}_{cj}}_{\text{Cross-outcome spillover}}$$
The last term captures how media affecting outcome $j$ spills over to outcome $k$ through cross-effects. This requires solving a system of equations when cross-effects are bidirectional.
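Stacking the decomposition over outcomes for a fixed channel $c$ gives a linear system: with $\boldsymbol{\psi}$ holding $\psi_{jk}$ in row $j$, column $k$, the totals satisfy $(\mathbf{I} - \boldsymbol{\psi}^\top)\,\text{Total}_{c} = \text{Direct}_c + \text{Indirect}_c$. The sketch below solves that system with NumPy; the function name and array layout are illustrative assumptions, not the `CombinedMMM` API. It also checks that the spectral radius of $\boldsymbol{\psi}$ is below 1, so the spillover series $\sum_n (\boldsymbol{\psi}^\top)^n$ converges and the "spillover of spillover" reading makes sense.

```python
import numpy as np

def total_effects(beta_D, beta_M, gamma, psi, channel):
    """Total effect of one channel on each of K outcomes.

    beta_D: (K, C) direct effects beta^(D)_{kc}
    beta_M: (M, C) media -> mediator effects beta^(M)_{mc}
    gamma:  (K, M) mediator -> outcome effects gamma_{km} (0 where m is not in M_k)
    psi:    (K, K) with psi[j, k] = effect of outcome j on outcome k, zero diagonal
    """
    base = beta_D[:, channel] + gamma @ beta_M[:, channel]   # direct + indirect
    if np.max(np.abs(np.linalg.eigvals(psi))) >= 1.0:
        raise ValueError("spillover series does not converge (spectral radius >= 1)")
    # Total_k = base_k + sum_j psi[j, k] * Total_j  <=>  (I - psi^T) Total = base
    return np.linalg.solve(np.eye(psi.shape[0]) - psi.T, base)

beta_D = np.array([[1.0, 0.3], [0.5, 0.2]])
beta_M = np.array([[0.4, 0.0]])                 # one mediator, driven by channel 0
gamma = np.array([[0.5], [0.0]])                # mediator affects outcome 0 only
psi = np.array([[0.0, 0.1], [-0.2, 0.0]])       # 0 -> 1 halo, 1 -> 0 cannibalization
totals = total_effects(beta_D, beta_M, gamma, psi, channel=0)
print(totals)
```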
DAG Representation
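The combined model implies the following causal graph. Strictly, it is a directed acyclic graph (DAG) only when cross-effects run in one direction or are lagged; contemporaneous cross-effects in both directions form a feedback loop, which is why the total-effect decomposition above requires solving a system of equations.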
- Media Channels → Mediator 1, Mediator 2, and Direct Effects
- Mediator 1, Mediator 2, and Direct Effects → Outcomes
- Within Outcomes: Y₁ ↔ Y₂ (cross-effects)
- Outcomes share Correlated Errors: Σ (LKJ prior)
When to Use Combined Models
| Scenario | Model Choice |
|---|---|
| Single outcome, no mediators | Standard BayesianMMM |
| Single outcome, mediators present | NestedMMM |
| Multiple outcomes, no mediators | MultivariateMMM |
| Multiple outcomes with mediators | CombinedMMM |
| Products with cross-effects | MultivariateMMM or CombinedMMM |
| Full brand funnel (awareness → consideration → purchase) | CombinedMMM with cascading mediators |
Computational Considerations
Identifiability
Extended models have more parameters and thus higher risk of weak identification:
| Model | Key Identification Requirements |
|---|---|
| Nested (fully observed) | Variation in media, complete mediator data |
| Nested (partially observed) | Sufficient survey observations, informative priors |
| Nested (fully latent) | Strong priors, non-zero mediator effect |
| Multivariate | Independent variation across outcomes |
| Cross-effects | Exogenous variation (promotions), lagged structure |
Diagnostics: Always check:
- $\hat{R}$ (convergence): Should be < 1.01 for all parameters
- ESS (effective sample size): Should be > 400 for reliable inference
- Prior-posterior overlap: Wide overlap suggests weak identification
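To make the $\hat{R}$ threshold concrete, the sketch below implements the classic split-$\hat{R}$ of Gelman et al. (2013) in NumPy. It is an illustration only: ArviZ's `az.summary` reports a rank-normalized variant of $\hat{R}$ together with bulk and tail ESS, and those are the numbers to check in practice.

```python
import numpy as np

def split_rhat(draws: np.ndarray) -> float:
    """Classic split R-hat for one scalar parameter; draws has shape (chains, samples)."""
    half = draws.shape[1] // 2
    # Splitting each chain in two also catches drift within a chain
    split = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = split.shape[1]
    within = split.var(axis=1, ddof=1).mean()          # W: mean within-chain variance
    between = n * split.mean(axis=1).var(ddof=1)        # B: between-chain variance
    pooled = (n - 1) / n * within + between / n         # marginal posterior variance estimate
    return float(np.sqrt(pooled / within))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                 # four chains exploring the same distribution
stuck = mixed + np.arange(4)[:, None]              # each chain centred somewhere else
for name, draws in [("mixed", mixed), ("stuck", stuck)]:
    r = split_rhat(draws)
    print(f"{name}: R-hat = {r:.3f}, below 1.01: {r < 1.01}")
```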
Scaling
| Model Component | Computational Cost |
|---|---|
| Additional mediator | ~1.3x per mediator |
| Additional outcome | ~1.5x per outcome |
| Cross-effects | ~1.1x per effect |
| Partial observation | ~1.2x (masking overhead) |
For complex models, use `nuts_sampler="numpyro"` for a 4-10x speedup.
Mathematical Specification: Variable Selection Priors
This section provides the formal mathematical specification for the variable selection priors available in mmm_extensions. For the complete technical document with proofs and implementation details, see variable_selection_specification.pdf.
Causal Constraints
Variable selection priors should only be applied to precision control variables: variables that affect outcome $Y$ but do not affect treatment $X$ (media spending).
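Bias from Confounder Exclusion: Suppose $Y = \beta X + \gamma C + \epsilon$, where the confounder $C$ affects both media $X$ and outcome $Y$. Omitting $C$ gives
$$\hat{\beta}_{OLS} \xrightarrow{p} \beta + \gamma \cdot \frac{\text{Cov}(X, C)}{\text{Var}(X)}$$
The bias is the product of the confounder's effect $\gamma$ and its association with treatment, $\text{Cov}(X, C)/\text{Var}(X)$, so even a confounder with a modest $\gamma$ can cause large bias when it is strongly correlated with media. Shrinking the estimated $\gamma$ toward zero reproduces the omission and therefore reintroduces this bias rather than removing it.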
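A short simulation makes the omitted-variable formula tangible. The data-generating process below is an illustrative assumption (a confounder that drives both spend and sales, such as seasonal demand); it compares the slope from a regression that omits $C$ with the formula's prediction and with the regression that includes $C$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta, gamma = 2.0, 1.5
C = rng.normal(size=n)                      # confounder
X = 0.8 * C + rng.normal(size=n)            # media spend responds to C
Y = beta * X + gamma * C + rng.normal(size=n)

# Short regression: C omitted (equivalent to shrinking its coefficient to zero)
beta_short = np.polyfit(X, Y, 1)[0]
# Long regression: C included as a control
design = np.column_stack([np.ones(n), X, C])
beta_long = np.linalg.lstsq(design, Y, rcond=None)[0][1]
# Omitted-variable bias formula: beta + gamma * Cov(X, C) / Var(X)
predicted = beta + gamma * np.cov(X, C)[0, 1] / X.var(ddof=1)
print(f"short={beta_short:.3f}  predicted={predicted:.3f}  long={beta_long:.3f}")
```

Here the true effect is 2.0, but dropping $C$ inflates the media coefficient by roughly a third, exactly as the formula predicts.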
Regularized Horseshoe Prior
The regularized horseshoe (Piironen & Vehtari, 2017) provides adaptive shrinkage with a regularized slab.
Model Specification:
$$\gamma_j = z_j \cdot \tau \cdot \tilde{\lambda}_j$$
where:
- $z_j \sim \mathcal{N}(0, 1)$: standardized coefficient
- $\tau \sim \text{Half-}t_{\nu_\tau}(0, \tau_0)$: global shrinkage
- $\lambda_j \sim \text{Half-}t_{\nu_\lambda}(0, 1)$: local shrinkage
- $c^2 \sim \text{Inv-Gamma}(\nu_s/2, \nu_s s^2/2)$: slab variance
Regularized Local Shrinkage:
$$\tilde{\lambda}_j = \frac{c \cdot \lambda_j}{\sqrt{c^2 + \tau^2 \lambda_j^2}}$$
Global Shrinkage Calibration:
$$\tau_0 = \frac{D_0}{D - D_0} \cdot \frac{\sigma}{\sqrt{N}}$$
where $D_0$ is the expected number of nonzero coefficients.
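The specification can be simulated directly to see what the prior implies. This NumPy sketch draws coefficients from the regularized horseshoe; the defaults for $s$, $\nu_s$, and $\nu_\lambda$ follow the hyperparameter table, while $\nu_\tau = 1$ (a half-Cauchy global scale) is an assumption because the table does not list it. It is not the mmm_extensions implementation.

```python
import numpy as np

def tau0(d0: int, d: int, sigma: float, n_obs: int) -> float:
    """Global scale: tau_0 = D0 / (D - D0) * sigma / sqrt(N)."""
    return d0 / (d - d0) * sigma / np.sqrt(n_obs)

def draw_reg_horseshoe(d, d0, sigma, n_obs, nu_tau=1.0, nu_lambda=5.0,
                       nu_s=4.0, s=2.0, n_draws=1000, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_draws, d))                                  # z_j ~ N(0, 1)
    tau = tau0(d0, d, sigma, n_obs) * np.abs(rng.standard_t(nu_tau, size=(n_draws, 1)))
    lam = np.abs(rng.standard_t(nu_lambda, size=(n_draws, d)))         # Half-t local scales
    # c^2 ~ Inv-Gamma(nu_s / 2, nu_s s^2 / 2), drawn as 1 / Gamma(shape, rate)
    c2 = 1.0 / rng.gamma(nu_s / 2, 1.0 / (nu_s * s**2 / 2), size=(n_draws, 1))
    lam_tilde = np.sqrt(c2) * lam / np.sqrt(c2 + tau**2 * lam**2)      # regularized local scale
    return z * tau * lam_tilde, tau, lam, lam_tilde, c2

gamma_draws, tau, lam, lam_tilde, c2 = draw_reg_horseshoe(d=20, d0=3, sigma=1.0, n_obs=100)
print(tau0(3, 20, 1.0, 100), np.percentile(np.abs(gamma_draws), [50, 99]))
```

The regularization caps $\tau\tilde{\lambda}_j$ at $c$: however large $\lambda_j$ gets, $\tilde{\lambda}_j$ never exceeds $c/\tau$, which is what keeps large coefficients from drifting into implausibly heavy tails.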
Shrinkage Factor:
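$$\kappa_j = \frac{1}{1 + \tau^2 \lambda_j^2}$$
This is the normal-means form (unit noise variance, one observation per coefficient). For regression with $N$ observations, noise scale $\sigma$, and standardized predictors, Piironen & Vehtari (2017) use $\kappa_j = 1/(1 + N\sigma^{-2}\tau^2\lambda_j^2)$, the scaling under which the $\tau_0$ calibration above targets roughly $D_0$ effective nonzero coefficients.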
- $\kappa_j \approx 1$: Strong shrinkage (coefficient → 0)
- $\kappa_j \approx 0$: Minimal shrinkage (coefficient preserved)
Effective Number of Nonzero:
$$m_{\text{eff}} = \sum_{j=1}^{D} (1 - \kappa_j)$$
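Both quantities are one-liners once posterior draws of $\tau$ and $\lambda_j$ are available. In this sketch, `scale=1.0` reproduces the formula as written, and passing `scale = N / sigma**2` gives the regression form of Piironen & Vehtari (2017); the function names are illustrative.

```python
import numpy as np

def shrinkage_factors(tau, lam, scale=1.0):
    """kappa_j = 1 / (1 + scale * tau^2 * lambda_j^2)."""
    return 1.0 / (1.0 + scale * tau**2 * np.asarray(lam, dtype=float) ** 2)

def effective_nonzero(kappa):
    """m_eff = sum_j (1 - kappa_j), summed over the last (coefficient) axis."""
    return np.sum(1.0 - kappa, axis=-1)

lam = np.array([0.1, 1.0, 100.0])
kappa = shrinkage_factors(tau=0.1, lam=lam)
print(kappa.round(4), effective_nonzero(kappa).round(3))   # kappa ≈ [0.9999, 0.9901, 0.0099]
```

With a small global scale, only the coefficient with a very large local scale escapes shrinkage, so $m_{\text{eff}}$ is about 1. Applied to arrays of posterior draws, `effective_nonzero` returns one value per draw, giving a posterior distribution for $m_{\text{eff}}$.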
Spike-and-Slab Prior
The spike-and-slab provides explicit posterior inclusion probabilities.
Continuous Relaxation (for NUTS compatibility):
$$\gamma_j = \eta_j \cdot \beta_{\text{slab},j} + (1 - \eta_j) \cdot \beta_{\text{spike},j}$$
where:
- $\eta_j = \text{sigmoid}(\omega_j / T)$ โ soft inclusion indicator
- $\omega_j \sim \mathcal{N}(\text{logit}(\pi), 1)$
- $\beta_{\text{slab},j} \sim \mathcal{N}(0, \sigma_{\text{slab}}^2)$
- $\beta_{\text{spike},j} \sim \mathcal{N}(0, \sigma_{\text{spike}}^2)$ with $\sigma_{\text{spike}} \ll \sigma_{\text{slab}}$
- $T$ = temperature (lower = sharper selection)
Posterior Inclusion Probability:
$$\Pr(\text{included}_j | \mathbf{y}) = \mathbb{E}[\eta_j | \mathbf{y}]$$
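A NumPy sketch of the relaxation and of the inclusion-probability estimate. The slab and spike scales (1.0 and 0.01) and the stand-in "posterior" draws of $\omega_j$ are illustrative assumptions; in practice the draws would come from the fitted model's trace.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def soft_inclusion(omega, temperature=0.1):
    # eta_j = sigmoid(omega_j / T); a lower T pushes eta_j toward 0 or 1
    return sigmoid(np.asarray(omega, dtype=float) / temperature)

def relaxed_coefficient(eta, beta_slab, beta_spike):
    # gamma_j = eta_j * beta_slab_j + (1 - eta_j) * beta_spike_j
    return eta * beta_slab + (1.0 - eta) * beta_spike

def inclusion_probability(omega_draws, temperature=0.1):
    # Pr(included_j | y) = E[eta_j | y], averaged over draws (rows)
    return soft_inclusion(omega_draws, temperature).mean(axis=0)

rng = np.random.default_rng(5)
pi, n_draws = 0.5, 4000
# Prior draws: omega_j ~ N(logit(pi), 1)
omega_prior = rng.normal(np.log(pi / (1 - pi)), 1.0, size=(n_draws, 3))
eta = soft_inclusion(omega_prior)
gamma = relaxed_coefficient(eta, rng.normal(0, 1.0, size=(n_draws, 3)),
                            rng.normal(0, 0.01, size=(n_draws, 3)))
prior_pip = inclusion_probability(omega_prior)   # close to pi = 0.5

# Stand-in posterior draws for illustration: variable 0 favoured, variable 1 not
omega_post = np.column_stack([rng.normal(2.0, 0.5, n_draws), rng.normal(-2.0, 0.5, n_draws)])
post_pip = inclusion_probability(omega_post)
print(prior_pip.round(2), post_pip.round(3))
```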
Bayesian LASSO
The Bayesian LASSO places Laplace priors on coefficients via a scale mixture representation.
Scale Mixture Representation:
$$\gamma_j | \tau_j \sim \mathcal{N}(0, \tau_j), \quad \tau_j \sim \text{Exponential}\left(\frac{\lambda^2}{2}\right)$$
where $\tau_j$ is a variance and the exponential distribution is parameterized by its rate.
This is equivalent to:
$$\gamma_j \sim \text{Laplace}\left(0, \frac{1}{\lambda}\right)$$
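The equivalence is easy to confirm by Monte Carlo: draw $\tau_j$ from the exponential, then $\gamma_j$ from the conditional normal, and compare with direct Laplace draws. NumPy parameterizes the exponential by its scale, so the rate $\lambda^2/2$ becomes a scale of $2/\lambda^2$.

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n = 2.0, 400_000
tau = rng.exponential(scale=2.0 / lam**2, size=n)   # tau_j ~ Exponential(rate = lam^2 / 2)
gamma = rng.normal(0.0, np.sqrt(tau))               # gamma_j | tau_j ~ N(0, variance tau_j)
laplace = rng.laplace(0.0, 1.0 / lam, size=n)       # direct Laplace(0, 1 / lam) draws

# Laplace(0, b) has E|gamma| = b and Var(gamma) = 2 b^2
print(np.mean(np.abs(gamma)), np.mean(np.abs(laplace)), 1.0 / lam)
print(gamma.var(), laplace.var(), 2.0 / lam**2)
```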
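Shrinkage Properties: Unlike the horseshoe, the LASSO applies a single penalty $\lambda$ to every coefficient: its posterior mode soft-thresholds each coefficient by roughly the same amount, so large effects are shrunk nearly as much as small ones.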
Method Selection Guide
| Scenario | Recommended Prior | Reason |
|---|---|---|
| Few large effects, many zeros | Regularized Horseshoe | Adaptive shrinkage preserves signals |
| Many small effects | Bayesian LASSO | Uniform shrinkage appropriate |
| Need inclusion probabilities | Spike-and-Slab | Direct interpretation |
| Unknown sparsity structure | Regularized Horseshoe | Most robust |
Hyperparameter Guidance
| Parameter | Symbol | Default | Selection Guidance |
|---|---|---|---|
| Expected nonzero | $D_0$ | 3 | Domain knowledge; err toward larger |
| Slab scale | $s$ | 2.0 | Max plausible effect in std units |
| Slab df | $\nu_s$ | 4.0 | Lower = heavier tails |
| Local df | $\nu_\lambda$ | 5.0 | Lower = heavier tails |
| Prior inclusion | $\pi$ | 0.5 | 0.5 = maximum uncertainty |
| Temperature | $T$ | 0.1 | Lower = sharper selection |
| LASSO penalty | $\lambda$ | 1.0 | Higher = more shrinkage |
Diagnostics
Inclusion Probability (Horseshoe): $$\Pr(\text{included}_j | \mathbf{y}) \approx \mathbb{E}[1 - \kappa_j | \mathbf{y}]$$
Inclusion Probability (Spike-Slab): $$\Pr(\text{included}_j | \mathbf{y}) = \mathbb{E}[\eta_j | \mathbf{y}]$$
Report for each analysis:
- Variable classification (confounder vs precision)
- Selection method and all hyperparameters
- Posterior inclusion probabilities
- Effective number of nonzero ($m_{\text{eff}}$)
- Sensitivity to hyperparameter choices
Data Format
The framework expects data in Master Flat File (MFF) format: a fully normalized long-format structure with 8 columns:
| Column | Description |
|---|---|
| Period | Time period identifier (date or week number) |
| Geography | Geographic unit (DMA, region, store, etc.) |
| Product | Product or brand identifier |
| Campaign | Campaign or flight identifier |
| Outlet | Media outlet or channel |
| Creative | Creative execution identifier |
| VariableName | Name of the metric (e.g., "Sales", "TV_Spend", "Price") |
| VariableValue | Numeric value for that metric |
Example MFF Data
| Period | Geography | Product | Campaign | Outlet | Creative | VariableName | VariableValue |
|---|---|---|---|---|---|---|---|
| 2023-01-01 | DMA_001 | SKU_A | Q1_Brand | TV | Hero_30s | TV_Spend | 50000 |
| 2023-01-01 | DMA_001 | SKU_A | Q1_Brand | TV | Hero_30s | TV_GRPs | 125 |
| 2023-01-01 | DMA_001 | SKU_A | Q1_Brand | Digital | Banner_A | Digital_Spend | 25000 |
| 2023-01-01 | DMA_001 | SKU_A | | | | Sales | 12500 |
| 2023-01-01 | DMA_001 | SKU_A | | | | Price | 4.99 |
| 2023-01-01 | DMA_001 | SKU_A | | | | Distribution | 0.85 |
| 2023-01-01 | DMA_002 | SKU_A | Q1_Brand | TV | Hero_30s | TV_Spend | 30000 |
| ... | ... | ... | ... | ... | ... | ... | ... |
This normalized structure supports:
- Multiple granularities: National media (Geography = "National") alongside geo-specific data
- Campaign attribution: Track spend and response by campaign/flight
- Creative-level analysis: Compare performance across creative executions
- Flexible aggregation: Roll up from creative → outlet → campaign as needed
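To see how the long format rolls up, here is a sketch using plain pandas rather than the framework's `data_loader`. It rebuilds a few rows of the example table and pivots them to one row per Period × Geography × Product, summing across campaigns, outlets, and creatives. Summing suits spend and sales; a variable like Price would need a different aggregation (for example a mean), which is a per-variable choice this sketch does not make.

```python
import pandas as pd

cols = ["Period", "Geography", "Product", "Campaign", "Outlet", "Creative",
        "VariableName", "VariableValue"]
rows = [
    ("2023-01-01", "DMA_001", "SKU_A", "Q1_Brand", "TV", "Hero_30s", "TV_Spend", 50000),
    ("2023-01-01", "DMA_001", "SKU_A", "Q1_Brand", "TV", "Hero_30s", "TV_GRPs", 125),
    ("2023-01-01", "DMA_001", "SKU_A", "Q1_Brand", "Digital", "Banner_A", "Digital_Spend", 25000),
    ("2023-01-01", "DMA_001", "SKU_A", None, None, None, "Sales", 12500),
    ("2023-01-01", "DMA_001", "SKU_A", None, None, None, "Price", 4.99),
    ("2023-01-01", "DMA_002", "SKU_A", "Q1_Brand", "TV", "Hero_30s", "TV_Spend", 30000),
]
mff = pd.DataFrame(rows, columns=cols)

# One row per Period x Geography x Product, one column per variable
wide = mff.pivot_table(
    index=["Period", "Geography", "Product"],
    columns="VariableName",
    values="VariableValue",
    aggfunc="sum",
)
print(wide)
```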
Configuration for MFF Structure
```python
from mmm_framework import MFFConfigBuilder

mff_config = (
    MFFConfigBuilder()
    .with_date_column("Period", format="%Y-%m-%d")
    .with_dimension("Geography", type="geo")
    .with_dimension("Product", type="product")
    .with_dimension("Campaign", type="campaign")
    .with_dimension("Outlet", type="outlet")
    .with_dimension("Creative", type="creative")
    .with_variable_column("VariableName")
    .with_value_column("VariableValue")
    .with_kpi("Sales")
    .with_media_variables(["TV_Spend", "Digital_Spend", "Social_Spend"])
    .with_control_variables(["Price", "Distribution"])
    .build()
)
```
Model Specification
Additive Model
The default additive specification:
$$y_{jt} = \alpha_j + \sum_{m=1}^{M}\beta_m f_m(x_{m,jt}) + \sum_{c=1}^{C}\gamma_c z_{c,jt} + \text{Trend}_t + \text{Seasonality}_t + \epsilon_{jt}$$
where $f_m(x)$ composes adstock and saturation transformations.
Multiplicative Model
For elasticity interpretation:
$$\log(y_{jt}) = \log(\beta_0) + \sum_m \beta_m \log(f_m(x_{m,jt})) + \gamma' Z_{jt} + \epsilon_{jt}$$
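Coefficients represent elasticities with respect to the transformed media: $\beta_m$ is the percent change in sales per percent change in $f_m(x)$. It equals the elasticity with respect to raw media only when $f_m$ is the identity; otherwise it is scaled by $d \log f_m / d \log x$.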
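A worked check of the elasticity reading, with illustrative values $\beta_0 = 1000$ and $\beta_m = 0.2$: under $\log y = \log \beta_0 + \beta_m \log f$, a 1% increase in $f$ multiplies $y$ by $1.01^{\beta_m}$, which is about a $\beta_m$% increase at every level of $f$.

```python
import numpy as np

beta0, beta_m = 1000.0, 0.2               # illustrative values
f = np.array([10.0, 20.0, 40.0])          # transformed media f_m(x)
y = beta0 * f**beta_m                     # log y = log beta0 + beta_m log f

# Percent change in y after a 1% increase in f
pct_change = (beta0 * (1.01 * f) ** beta_m / y - 1.0) * 100
print(pct_change)                         # the same value (about 0.2) at every level of f
```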
Saturation Functions
Logistic Saturation (recommended for numerical stability):
$$f(x) = \frac{1 - e^{-\lambda x}}{1 + e^{-\lambda x}}$$
Hill Saturation:
$$f(x) = \frac{x^s}{x^s + K^s}$$
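Both saturation curves in NumPy, written directly from the formulas above (the function names are illustrative). The logistic form equals $\tanh(\lambda x / 2)$, so for $x \ge 0$ it rises from 0 toward 1 with no threshold parameter. The Hill form reaches 0.5 at $x = K$ (the half-saturation point) and is S-shaped when $s > 1$.

```python
import numpy as np

def logistic_saturation(x, lam):
    """f(x) = (1 - exp(-lam x)) / (1 + exp(-lam x)), identical to tanh(lam x / 2)."""
    e = np.exp(-lam * np.asarray(x, dtype=float))
    return (1.0 - e) / (1.0 + e)

def hill_saturation(x, K, s):
    """f(x) = x^s / (x^s + K^s); K is the half-saturation point, s the shape."""
    x = np.asarray(x, dtype=float)
    return x**s / (x**s + K**s)

x = np.array([0.0, 0.5, 1.0, 2.0, 5.0])
print(logistic_saturation(x, lam=1.0))
print(hill_saturation(x, K=1.0, s=2.0))   # 0.5 at x = K
```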
Adstock (Carryover Effects)
Geometric Adstock:
$$A_t = x_t + \alpha A_{t-1}$$
where $\alpha \in [0, 1)$ controls decay rate.
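A direct implementation of the recursion, plus one way to compose it with logistic saturation into $f_m(x)$. Applying adstock before saturation is an assumption of this sketch (the model section only says $f_m$ composes the two). Note that the recursion has unbounded memory and no weight normalization; pymc-marketing's own adstock transformer truncates the carryover at a maximum lag and can optionally normalize the weights.

```python
import numpy as np

def geometric_adstock(x, alpha):
    """A_t = x_t + alpha * A_{t-1}, starting from A_{-1} = 0."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    out = np.empty(len(x), dtype=float)
    carry = 0.0
    for t, value in enumerate(x):
        carry = value + alpha * carry
        out[t] = carry
    return out

def media_transform(x, alpha, lam):
    """f(x): adstock first, then logistic saturation tanh(lam * A / 2)."""
    return np.tanh(lam * geometric_adstock(x, alpha) / 2.0)

pulse = np.array([1.0, 0.0, 0.0, 0.0])
print(geometric_adstock(pulse, alpha=0.5))   # 1, 0.5, 0.25, 0.125: decay by alpha each period
```

For constant spend $x$, the adstocked series approaches $x / (1 - \alpha)$, so $\alpha = 0.5$ doubles the steady-state exposure that then enters the saturation curve.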
API Reference
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check with Redis and worker status |
| POST | /data/upload | Upload MFF data file |
| GET | /data | List uploaded datasets |
| GET | /data/{id} | Get dataset details |
| POST | /configs | Create model configuration |
| GET | /configs | List configurations |
| GET | /configs/{id} | Get configuration details |
| POST | /models/fit | Start async model fitting |
| GET | /models/{id}/status | Get fitting progress |
| GET | /models/{id}/results | Get fitted model results |
| GET | /models/{id}/contributions | Get channel contributions |
| POST | /models/{id}/predict | Generate predictions |
Example: Fit a Model via API
```bash
# Upload data
curl -X POST "http://localhost:8000/data/upload" \
  -F "file=@data.csv"

# Create configuration
curl -X POST "http://localhost:8000/configs" \
  -H "Content-Type: application/json" \
  -d @config.json

# Start fitting
curl -X POST "http://localhost:8000/models/fit" \
  -H "Content-Type: application/json" \
  -d '{"data_id": "abc123", "config_id": "xyz789"}'

# Check status
curl "http://localhost:8000/models/{model_id}/status"

# Get results
curl "http://localhost:8000/models/{model_id}/results"
```
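Because fitting is asynchronous, a client typically polls the status endpoint until the job finishes. The standard-library sketch below does that. The shape of the status payload is an assumption: it presumes a `"status"` field whose terminal values are `"completed"` and `"failed"`, so check the actual response schema before relying on it. The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"
# ASSUMPTION: the status payload has a "status" field with these terminal values
TERMINAL_STATES = {"completed", "failed"}

def get_json(path: str) -> dict:
    with urllib.request.urlopen(f"{BASE_URL}{path}") as response:
        return json.load(response)

def wait_for_fit(model_id, fetch=get_json, poll_seconds=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll /models/{id}/status until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(f"/models/{model_id}/status")
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model {model_id} still running after {timeout} s")
        sleep(poll_seconds)

# Usage against a running server (YOUR_MODEL_ID is a placeholder):
# status = wait_for_fit("YOUR_MODEL_ID")
# if status.get("status") == "completed":
#     results = get_json("/models/YOUR_MODEL_ID/results")
```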
Methodological Foundation
This framework is built on established statistical methodology and addresses known problems in marketing mix modeling practice.
The Problem with Specification Shopping
When analysts test multiple model specifications and select based on results (statistical significance, expected signs, "reasonable" ROIs), they invalidate standard statistical inference:
- Testing 20 independent specifications at α = 0.05 gives about a 64% probability ($1 - 0.95^{20} \approx 0.64$) of at least one false positive
- Coefficients selected for significance are systematically biased upward (winner's curse)
- Confidence intervals no longer have nominal coverage
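The 64% figure is simple arithmetic under independence, and a simulation of null p-values (uniform on [0, 1]) reproduces it. Specifications fit to the same data are correlated, so the exact rate in practice differs; the selection bias on the reported coefficient remains either way.

```python
import numpy as np

alpha, k = 0.05, 20
analytic = 1 - (1 - alpha) ** k            # P(at least one false positive), independent tests

rng = np.random.default_rng(3)
# 100,000 simulated analysts, each running 20 tests on pure noise
p_values = rng.uniform(size=(100_000, k))
simulated = np.mean((p_values < alpha).any(axis=1))
print(f"analytic={analytic:.3f}, simulated={simulated:.3f}")
```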
Bayesian Approach Benefits
- Genuine uncertainty quantification: Posterior distributions reflect actual uncertainty from data limitations
- Prior incorporation: External evidence from experiments or meta-analyses can be formally included
- Hierarchical modeling: Partial pooling across sparse groups improves estimation
- Decision-theoretic integration: Posteriors integrate naturally with business decision analysis
Identification Considerations
The framework explicitly addresses common identification problems:
- National media with geo-level data: Random effects on national media cannot be interpreted as differential causal response, since national media provides no geo-level exposure variation
- Saturation-scaling confounding: When saturation functions are applied, geo-level exposure scaling becomes unidentified because the scaling and saturation parameters are confounded
Recommended Workflow
- Pre-specify the model before looking at results
- Use prior predictive checks to confirm that the priors imply plausible outcomes
- Fit with full uncertainty using Bayesian inference
- Validate against experiments where feasible
- Report credible intervals, not just point estimates
Performance
Inference Speed Comparison
| Method | Time (100 obs) | Uncertainty |
|---|---|---|
| Ridge/NNLS | <10 ms | Bootstrap only |
| CVXPY constrained | 10-100 ms | Bootstrap only |
| PyMC ADVI | 10-30 sec | Approximate posterior |
| PyMC + NumPyro | 30 sec - 2 min | Full posterior |
| PyMC CPU | 2-20 min | Full posterior |
Recommendations
- Development iteration: Use `Ridge(positive=True)` with differential evolution for transformation search
- Production models: Use `nuts_sampler="numpyro"` for a 4-10x speedup over CPU PyMC
- GPU acceleration: Additional 4x gains are available with JAX on GPU
Project Structure
```text
mmm-framework/
├── src/mmm_framework/          # Core modeling library
│   ├── __init__.py             # Package exports
│   ├── config.py               # Configuration enums and dataclasses
│   ├── builders.py             # Fluent configuration builders
│   ├── data_loader.py          # MFF parsing and validation
│   ├── model.py                # BayesianMMM implementation
│   ├── jobs.py                 # Async job management
│   └── mmm_extensions/         # Extended model capabilities
│       ├── __init__.py         # Lazy imports for heavy dependencies
│       ├── config.py           # Mediator, Outcome, CrossEffect configs
│       ├── builders.py         # Fluent builders + factory functions
│       ├── components.py       # PyMC/PyTensor building blocks
│       └── models.py           # NestedMMM, MultivariateMMM, CombinedMMM
├── api/                        # FastAPI backend
│   ├── main.py                 # Application factory
│   ├── routes/                 # API route handlers
│   ├── schemas.py              # Pydantic models
│   ├── redis_service.py        # Redis connection management
│   └── worker.py               # ARQ worker settings
├── app/                        # Streamlit frontend
│   ├── Home.py                 # Main entry point
│   ├── pages/                  # Multipage app pages
│   ├── api_client.py           # HTTP client for backend
│   └── components/             # Reusable UI components
├── examples/                   # Usage examples
│   └── ex_extensions.py        # Extended model examples
├── tests/                      # Test suite
│   └── mmm_extensions/         # Extension module tests
├── pyproject.toml              # Project configuration
└── README.md
```
Dependencies
Core
- `pymc>=5.26`: Probabilistic programming
- `pymc-marketing`: MMM components (saturation, adstock)
- `numpyro>=0.19`: JAX-based NUTS sampler
- `nutpie>=0.16`: Fast NUTS implementation
- `pandas>=2.3`: Data manipulation
- `numpy>=2.3`: Numerical computing
Backend
- `fastapi>=0.124`: API framework
- `redis>=7.1`: Queue backend
- `arq>=0.25`: Async job queue
- `pydantic>=2.12`: Data validation
- `uvicorn>=0.38`: ASGI server
Frontend
- `streamlit>=1.52`: Web application
- `plotly>=6.5`: Interactive visualization
- `httpx>=0.28`: HTTP client
References
Bayesian Methods
- Gelman, A., et al. (2013). Bayesian Data Analysis (3rd ed.)
- McElreath, R. (2020). Statistical Rethinking (2nd ed.)
Marketing Mix Modeling
- Jin, Y., et al. (2017). Bayesian methods for media mix modeling with carryover and shape effects. Google Research.
- Chan, D., & Perry, M. (2017). Challenges and opportunities in media mix modeling. Google Research.
Specification Shopping & Replication
- Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology. Psychological Science, 22(11), 1359-1366.
- Silberzahn, R., et al. (2018). Many analysts, one data set. Advances in Methods and Practices in Psychological Science, 1(3), 337-356.
- Camerer, C. F., et al. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433-1436.
Variable Selection
- Piironen, J., & Vehtari, A. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2), 5018-5051.
- Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97(2), 465-480.
- George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. JASA, 88(423), 881-889.
- Park, T., & Casella, G. (2008). The Bayesian Lasso. JASA, 103(482), 681-686.
Contributing
Contributions are welcome. Please ensure:
- All new features include tests
- Code follows the existing style (run `ruff check` and `ruff format`)
- Documentation is updated for API changes
- Commit messages are descriptive
Author
Matthew Reda (m.reda94@gmail.com)
Download files
File details
Details for the file mmm_framework-0.1.0.tar.gz.
File metadata
- Download URL: mmm_framework-0.1.0.tar.gz
- Upload date:
- Size: 930.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 801ae4002742b52e0fd5a1f294d5c7038321a98cf248e149cf62cb0a3a11ba78 |
| MD5 | 1ecc2b3d07deb4eff6a67c7984e0e3f2 |
| BLAKE2b-256 | 3c21dc74a489c38b126de64668359f51956df59c0be3cc8bbf4faaaa1bc56b26 |
File details
Details for the file mmm_framework-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mmm_framework-0.1.0-py3-none-any.whl
- Upload date:
- Size: 1.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4901692ac160c49427b169ab1968787ff9901e37c1d2eb05d97d4d9c912b1a07 |
| MD5 | a8e1f2f0ee7ce4c4121532480333c7b0 |
| BLAKE2b-256 | 242be5acd9254ce0da60d9fe7bb754dde8263d11d82d72b383337ebd34152ff7 |