Exploratory data analysis and transformation toolkit for Marketing Mix Modeling (MMM)
🦉 OwlMix
OwlMix is a comprehensive Python package for Exploratory Data Analysis (EDA) and data transformation tailored for Marketing Mix Modeling (MMM) workflows. It provides automated report generation, statistical analysis, and data transformation utilities to accelerate MMM projects.
🚀 Key Features
📊 Data Analysis & Reporting
- Automated EDA Reports: Generate professional HTML and JSON reports with comprehensive statistics and visualizations
- Correlation Analysis: Matrix correlations, lag correlations, and ACF/PACF analysis for time series
- VIF Calculation: Variance Inflation Factor detection for multicollinearity assessment
- Causality Testing: Granger causality tests to check whether past values of one series help predict another
- Categorical Analysis: Distribution analysis for categorical variables
- KPI vs Features: Analyze relationships between KPIs and marketing features by time period
- Time Series Decomposition: Seasonal decomposition and trend analysis
- Outlier Detection: Visual identification and analysis of outliers
🔧 Data Transformation
- Adstock Effect: Apply advertising carryover effects to media spend data
- Lag Generation: Create lagged features for time series modeling
- Saturation Transformation: Apply saturation curves (Hill, Logistic, Logit) to media variables
- Data Cleanup: Automated data quality checks and handling (missing values, duplicates, etc.)
- Transformation Pipeline: Chainable pipeline for complex data workflows
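As an illustration of what adstock and lag features mean, the sketch below uses a simple geometric carryover in plain Python. It is an assumption-laden example, not OwlMix's code: `geometric_adstock` and `make_lag` are hypothetical helpers, and the package's own functions may use a different formula or signature.

```python
def geometric_adstock(spend, decay_rate):
    """Carry a fraction of each period's effect into the next period."""
    if not 0 <= decay_rate < 1:
        raise ValueError("decay_rate must be in [0, 1)")
    adstocked = []
    carryover = 0.0
    for value in spend:
        # Effect this period = new spend + decayed effect from earlier periods.
        carryover = value + decay_rate * carryover
        adstocked.append(carryover)
    return adstocked


def make_lag(values, lag, fill=None):
    """Shift a series forward by `lag` periods, padding the start with `fill`."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    shifted = [fill] * lag + list(values)
    return shifted[: len(values)]


tv_spend = [100, 0, 0, 0]  # made-up spend burst in the first period
print(geometric_adstock(tv_spend, decay_rate=0.5))
print(make_lag(tv_spend, lag=1))
```

With a decay rate of 0.5, half of each period's effect survives into the next one, so a single burst of spend fades out gradually instead of disappearing immediately.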
🎨 Visual & Export Options
- Multiple HTML Templates: Light and dark theme templates for reports
- Interactive Charts: Distribution plots, time series, correlation heatmaps, outlier charts
- JSON Export: Raw report data for programmatic access
- Chart Storage: Automatic chart generation and storage in outputs/charts/
⚙️ Flexible Configuration
- Fine-grained control over analyses to include/exclude
- Customizable precision, date formats, and aggregation frequencies
- Column-specific configurations for targeted analysis
- Template customization support
📦 Installation
pip install owl-mix
Requirements:
- Python >= 3.12
- pandas >= 1.5
- matplotlib >= 3.7
- seaborn >= 0.12
- statsmodels >= 0.14.6
- scipy >= 1.10
- Jinja2 >= 3.1
⚡ Quick Start
Basic EDA Report Generation
import pandas as pd
from owlmix.report import OwlMixReport
# Load your data
df = pd.read_csv("your_data.csv")
# Create and generate report
report = OwlMixReport(
    df=df,
    target="sales",  # Target variable for analysis
    date_column="date",  # Date column for time series analysis
    template_name="custom_eda_template.html"  # Optional: use "custom_eda_template_dark.html" for dark theme
)
# Generate HTML and JSON reports
report.run(
    json_file_name="eda_report.json",
    html_file_name="eda_report.html"
)
Output:
- eda_report.json: Structured analysis data in JSON format
- eda_report.html: Interactive HTML report with charts and statistics
- outputs/charts/: Generated visualization files
🛠️ Advanced Configuration
Customize Analyses
import pandas as pd
from owlmix.report import OwlMixReport
from owlmix.typing.enums import Period, ComparisonType, PlotMode
df = pd.read_csv("your_data.csv")
report = OwlMixReport(
    df=df,
    target="sales",
    date_column="date"
)
# Configure time aggregation
report.config.set_time_aggregator_config(
    freq="ME",  # Month-end aggregation
    precision=4  # Decimal precision
)
# Configure categorical columns
report.config.set_categorical_columns_config(
    columns=["product_type", "region", "channel"]
)
# Configure time-based comparison
report.config.set_time_comparison_config(
    value_columns=["tv_spend", "digital_spend", "radio_spend", "sales"],
    comparison_type=ComparisonType.YoY,  # String values such as "yoy" or "mom" are also valid
    precision=2,
)
# Configure KPI vs Feature analysis
report.config.set_kpi_vs_feature_config(
    columns=["tv_spend", "digital_spend", "radio_spend"],
    date_format="%Y-%m",
    period=Period.MONTHLY
)
# Configure VIF (Variance Inflation Factor) analysis
report.config.set_vif_config(
    features=["tv_spend", "digital_spend", "radio_spend"],
    precision=2
)
# Configure ACF/PACF analysis
report.config.set_acf_pacf_config(
    columns=["sales", "digital_spend"],
    n_lags=20
)
# Configure causality testing
report.config.set_causality_test_config(
    max_lag=5,
    error_threshold=0.15
)
report.run(
    json_file_name="report.json",
    html_file_name="report.html"
)
Time-Based Comparison Table and Chart
⚠️ Important Notes
- Week-level YoY comparisons (yoy_week) need care:
  - Some ISO years have 53 weeks, others have 52
  - ISO week numbering does not align exactly with calendar dates
  - The same week number can cover slightly different date ranges in different years
  - As a result, YoY week comparisons can show minor inconsistencies
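The week-numbering caveats above can be checked with Python's standard `datetime` module. This is a small standalone sketch; the helper names `iso_weeks_in_year` and `iso_week_range` are made up for this example and are not part of OwlMix.

```python
from datetime import date, timedelta


def iso_weeks_in_year(year):
    """Number of ISO weeks in a year; 28 December always falls in the last one."""
    return date(year, 12, 28).isocalendar()[1]


def iso_week_range(year, week):
    """Monday and Sunday of a given ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)


# Some ISO years have 53 weeks, others 52.
for year in (2020, 2021):
    print(year, iso_weeks_in_year(year))

# ISO week 1 starts on a different calendar date each year.
for year in (2021, 2023, 2024):
    start, end = iso_week_range(year, 1)
    print(year, start.isoformat(), end.isoformat())
```

Because week 1 can start anywhere between 29 December and 4 January, comparing "week N" across years compares date ranges that are shifted by a few days, and week 53 has no counterpart in a 52-week year.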
📊 Supported Comparison Types
- yoy_year
  - Granularity: Year
  - Comparison: Current year vs previous year
- mom
  - Granularity: Month (YYYY-MM)
  - Comparison: Current month vs previous month
- wow
  - Granularity: Week (week start date)
  - Comparison: Current week vs previous week
- qoq
  - Granularity: Quarter (YYYYQX)
  - Comparison: Current quarter vs previous quarter
- yoy_month
  - Granularity: Month
  - Comparison: Same month across years (e.g., Jan 2024 vs Jan 2023)
- yoy_quarter
  - Granularity: Quarter
  - Comparison: Same quarter across years (e.g., Q1 2024 vs Q1 2023)
- yoy_week
  - Granularity: ISO Week
  - Comparison: Same week number across years
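The difference between a sequential comparison such as mom and a same-period comparison such as yoy_month can be sketched in plain Python. The functions and the percentage-change formula below are assumptions for illustration; the source does not state how OwlMix computes its comparison values.

```python
def pct_change(current, previous):
    """Percentage change, or None when there is no usable baseline."""
    if previous in (None, 0):
        return None
    return (current - previous) / previous * 100


def mom_change(monthly_totals):
    """Each (year, month) total vs the immediately preceding month."""
    changes = {}
    for (year, month), current in sorted(monthly_totals.items()):
        prev_key = (year, month - 1) if month > 1 else (year - 1, 12)
        change = pct_change(current, monthly_totals.get(prev_key))
        if change is not None:
            changes[(year, month)] = change
    return changes


def yoy_month_change(monthly_totals):
    """Each (year, month) total vs the same month one year earlier."""
    changes = {}
    for (year, month), current in sorted(monthly_totals.items()):
        change = pct_change(current, monthly_totals.get((year - 1, month)))
        if change is not None:
            changes[(year, month)] = change
    return changes


# Made-up monthly sales totals keyed by (year, month).
monthly_sales = {(2023, 1): 100.0, (2023, 2): 80.0, (2024, 1): 120.0, (2024, 2): 60.0}
print(mom_change(monthly_sales))
print(yoy_month_change(monthly_sales))
```

Periods without a baseline (for example January 2024 in the mom case, because December 2023 is missing) are simply skipped in this sketch.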
Data Transformation Pipeline
from owlmix.transform import MMMTransformPipeline
# Create transformation pipeline
pipeline = MMMTransformPipeline(df, date_column="date")
# This feature is being developed.
Configuration Management with File Resolver
The ConfigFileResolver utility simplifies managing configuration files by automatically resolving file references in JSON configs. This is useful for keeping configuration data organized across multiple files.
from owlmix.file_resolver import ConfigFileResolver
# Create a resolver with a JSON config file
resolver = ConfigFileResolver(config="config.json")
# Resolve *_file keys to their actual content
resolved_config = resolver.resolve()
# Save the resolved config
resolver.save("resolved_config.json")
# Get as Python dictionary string
python_dict_string = resolver.to_python_string()
print(python_dict_string)
# Print formatted output
resolver.print()
How it works:
- Any JSON key ending with _file is automatically resolved to the file's content
- Supports any file type (HTML, TXT, MD, JSON, etc.)
- Works recursively through nested dictionaries and lists
- Includes built-in caching for efficiency
Example Configuration:
{
"report_template": {
"description_file": "templates/report_description.html",
"title": "Analysis Report",
"metadata_file": "config/metadata.json"
}
}
After resolution, the description_file key becomes description with the HTML file's content, and the metadata_file key becomes metadata with the JSON content.
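The resolution behaviour described here can be approximated in a few lines of standard-library Python. This is only a sketch of the idea, not the ConfigFileResolver source: `resolve_file_keys` is a hypothetical function, the choice to parse .json files into objects is an assumption, and caching is left out.

```python
import json
import tempfile
from pathlib import Path


def resolve_file_keys(node, base_dir="."):
    """Recursively replace `<name>_file` keys with `<name>` holding the file content."""
    if isinstance(node, list):
        return [resolve_file_keys(item, base_dir) for item in node]
    if not isinstance(node, dict):
        return node
    resolved = {}
    for key, value in node.items():
        if key.endswith("_file") and isinstance(value, str):
            path = Path(base_dir) / value
            text = path.read_text(encoding="utf-8")
            # Assumption: JSON files are parsed, everything else stays as text.
            content = json.loads(text) if path.suffix == ".json" else text
            resolved[key[: -len("_file")]] = content
        else:
            resolved[key] = resolve_file_keys(value, base_dir)
    return resolved


# Demo with throwaway files (file names are made up for this example).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "description.html").write_text("<p>Weekly MMM report</p>", encoding="utf-8")
    Path(tmp, "metadata.json").write_text('{"owner": "analytics"}', encoding="utf-8")
    config = {
        "report_template": {
            "description_file": "description.html",
            "title": "Analysis Report",
            "metadata_file": "metadata.json",
        }
    }
    resolved = resolve_file_keys(config, base_dir=tmp)

print(json.dumps(resolved, indent=2))
```

Keys that do not end with _file, such as title, pass through unchanged, while nested dictionaries and lists are walked recursively.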
📊 Report Sections
The generated HTML report includes comprehensive sections:
| Section | Description |
|---|---|
| Dataset Overview | Basic information, data types, missing values, memory usage |
| Summary Statistics | Descriptive statistics (mean, std, min, max, quantiles) |
| Data Quality | Missing value patterns, duplicate analysis |
| Distributions | Histograms and density plots for all numeric variables |
| Outlier Analysis | Box plots and outlier identification |
| Correlation Matrix | Pairwise correlations with heatmap visualization |
| Lag Correlations | Time-lagged correlation analysis for time series |
| VIF Analysis | Multicollinearity detection using Variance Inflation Factor |
| ACF/PACF | Autocorrelation and partial autocorrelation for seasonality detection |
| Causality Tests | Granger causality tests for predictive (lead-lag) relationships |
| Time Comparisons | Period-over-period comparisons (YoY, MoM) |
| KPI vs Features | Relationship between target and marketing features over time |
| Categorical Distributions | Distribution analysis for categorical variables |
🔧 Core Modules
owlmix.eda
Exploratory Data Analysis module with:
- SummaryBuilder: Comprehensive summary generation
- OwlMixEDA: Main EDA orchestrator
Features:
- Correlation analysis (matrix, lag, causality)
- VIF calculation for multicollinearity
- ACF/PACF analysis for seasonality
- Categorical and distribution analysis
- Outlier detection and visualization
owlmix.transform
Data transformation module for MMM preprocessing:
- adstock(): Apply advertising carryover effects
- create_lags(): Generate lagged features
- saturation(): Apply saturation curves (Hill, Logistic, Logit)
- cleanup_data(): Data quality utilities
- MMMTransformPipeline: Chainable pipeline for complex workflows
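To show what a saturation curve does to media spend, here is a minimal sketch of one common form of the Hill function. It is not taken from OwlMix: the function name `hill_saturation` and its parameters `half_saturation` and `shape` are assumptions, and the package's saturation() may be parameterized differently.

```python
def hill_saturation(x, half_saturation, shape=1.0):
    """Map non-negative spend to a response in [0, 1) with diminishing returns."""
    if x < 0 or half_saturation <= 0 or shape <= 0:
        raise ValueError("x must be >= 0; half_saturation and shape must be > 0")
    x_s = x ** shape
    # The response reaches 0.5 exactly when x equals half_saturation.
    return x_s / (x_s + half_saturation ** shape)


# Made-up spend levels: each extra unit of spend adds less response.
for spend in (0, 50, 100, 200, 400):
    print(spend, round(hill_saturation(spend, half_saturation=100, shape=2.0), 3))
```

The half-saturation point controls where the curve bends, and the shape parameter controls how sharply it does so, which is why both are usually tuned per media channel.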
owlmix.report
Report generation module:
- OwlMixReport: Main report generator
- HTML template rendering with customizable themes
- JSON data export
- Chart generation and storage
📈 Example Use Cases
Marketing Mix Modeling Workflow
import pandas as pd
from owlmix.report import OwlMixReport
from owlmix.transform import MMMTransformPipeline
# Load raw data
df = pd.read_csv("mmm_data.csv")
# Step 1: Transform data
pipeline = MMMTransformPipeline(df, date_column="date")
pipeline.adstock(columns=["tv", "digital", "radio"], decay_rate=0.5)
pipeline.create_lags(columns=["sales"], lags=[1, 4, 13])
df_transformed = pipeline.get_data()
# Step 2: Analyze with EDA
report = OwlMixReport(
    df=df_transformed,
    target="sales",
    date_column="date"
)
report.config.set_vif_config(
    features=["tv", "digital", "radio"],
    precision=3
)
report.run(
    json_file_name="mmm_eda.json",
    html_file_name="mmm_eda.html"
)
💡 Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues on GitHub.
📄 License
MIT License - see LICENSE file for details
Author: Sarbadal Pal (sarbadal@gmail.com)
Repository: github.com/sarbadal/owl-mix
📚 Documentation
Detailed documentation is available in the docs/ folder:
- docs/eda.md → EDA module details
- docs/transform.md → Data transformation features
- docs/saturation.md → Saturation modeling
🧪 Examples
Ready-to-run examples in the examples/ folder:
- eda_basic.py - Basic EDA report generation
- eda_full_workflow.py - Complete workflow example
- mmm_workflow_example.py - Marketing Mix Modeling example
🧠 Use Case: Marketing Mix Modeling
OwlMix is designed for MMM workflows where you need to:
- Explore relationships between marketing spend and sales
- Identify multicollinearity issues with VIF
- Analyze time-based patterns and correlations
- Generate professional reports for stakeholders
It is well suited to preprocessing data before building an MMM.
OwlMix is particularly useful for:
- Preprocessing marketing data
- Feature engineering for MMM
- Understanding lagged media effects
- Generating EDA reports before modeling
🔧 Roadmap
Planned enhancements:
- Visualization support (plots, heatmaps)
- HTML report generation
- Automated MMM diagnostics
- CLI support
🤝 Contributing
Contributions are welcome!
Feel free to:
- Open issues
- Suggest features
- Submit pull requests
⭐ Support
If you find this project useful, consider giving it a star ⭐ on GitHub!