PyAutoCausal

Automated causal inference pipelines for data scientists

Why Causal Inference Matters in Tech

As data scientists, we're often asked to go beyond correlation and answer causal questions:

"Did our new recommendation algorithm actually increase user engagement, or was it just seasonal trends?"
"What's the true impact of our premium subscription tier on customer retention?"
"How much did our marketing campaign increase conversions versus organic growth?"
"Did our product redesign cause the drop in user activity, or was it market conditions?"

Standard predictive models can't answer these questions, and the usual alternative, an A/B test, is often off the table because real-world constraints prevent randomized experiments:

Ethical concerns: Can't randomly deny users important features
Business constraints: Can't risk revenue on large-scale experiments
Natural experiments: Sometimes changes happen organically (competitor exits, policy changes)
Historical analysis: Need to evaluate past decisions without experimental data

The Challenge of Observational Data

When working with observational data (logs, user behavior, historical metrics), we face fundamental challenges:

Confounding: Users who adopt premium features might be inherently more engaged
Selection bias: Treatment assignment isn't random
Time-varying effects: Impact changes over time
Heterogeneous effects: Different user segments respond differently

Traditional ML models are built for prediction, not causal inference. They'll happily exploit confounders and selection bias to maximize accuracy, giving you precisely wrong answers to causal questions.
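
The confounding problem is easy to reproduce. The simulation below uses plain NumPy rather than PyAutoCausal, and the variable names and effect sizes are made up for illustration: already-engaged users are more likely to adopt a "premium" treatment whose true effect is 1.0. The naive difference in means is badly inflated, while a regression that controls for the confounder recovers the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Confounder: baseline engagement drives both adoption and the outcome
engagement = rng.normal(size=n)

# Non-random assignment: more engaged users are more likely to adopt premium
p_adopt = 1 / (1 + np.exp(-2 * engagement))
premium = rng.binomial(1, p_adopt)

# Outcome depends on the treatment (true effect 1.0) and on the confounder
true_effect = 1.0
retention = true_effect * premium + 3.0 * engagement + rng.normal(size=n)

# Naive comparison: difference in mean outcomes, adopters vs non-adopters
naive = retention[premium == 1].mean() - retention[premium == 0].mean()

# Regression adjustment: OLS of outcome on [intercept, treatment, confounder]
X = np.column_stack([np.ones(n), premium, engagement])
coef, *_ = np.linalg.lstsq(X, retention, rcond=None)
adjusted = coef[1]

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, true: {true_effect}")
```

The adjustment works here only because the confounder is observed and the linear model is correctly specified. When key confounders are unobserved, designs that exploit variation over time, such as difference-in-differences or synthetic controls, are typically used instead.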

PyAutoCausal: Causal Inference Made Practical

PyAutoCausal automates the complex decision tree of modern causal inference methods. Instead of manually implementing and choosing between dozens of estimators, PyAutoCausal:

Analyzes your data structure to understand treatment timing, units, and available controls
Selects appropriate methods based on your data characteristics
Validates assumptions and warns about potential violations
Executes analysis with proper statistical inference
Exports results in formats ready for stakeholder communication

Quick Example: Measuring Feature Impact

from pyautocausal.pipelines.example_graph import causal_pipeline
import pandas as pd

# Your product data with treatment (feature rollout) and outcome (engagement):
# one row per user per time period; replace each [...] with your own values
data = pd.DataFrame({
    'id_unit': [...],        # User identifier
    't': [...],              # Time periods
    'treat': [...],          # 1 if user has feature, 0 otherwise
    'y': [...],              # Your KPI (DAU, sessions, revenue, etc.)
    'x1': [...],             # User characteristics
    'x2': [...]              # Additional controls
})

# PyAutoCausal automatically:
# - Detects this is panel data with staggered treatment
# - Chooses modern DiD methods (e.g., Callaway-Sant'Anna)
# - Handles heterogeneous treatment effects
# - Produces event study plots

pipeline = causal_pipeline(output_path="./feature_impact_analysis")
pipeline.fit(df=data)

# Results include:
# - Average treatment effect with confidence intervals
# - Dynamic effects over time since treatment
# - Heterogeneity analysis across user segments
# - Diagnostic plots and assumption checks
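
To try the pipeline end to end without your own data, a toy staggered-adoption panel with the same column names can be simulated. The function `simulate_staggered_panel` and its data-generating process are assumptions for illustration, not part of PyAutoCausal, and how the pipeline routes this particular dataset is not guaranteed.

```python
import numpy as np
import pandas as pd


def simulate_staggered_panel(n_units=200, n_periods=10, effect=2.0, seed=0):
    """Hypothetical toy panel using the column names from the quick example.

    Assumptions (illustrative only): about a third of units are never treated,
    the rest adopt the feature at a random period >= 3 and stay treated,
    and the true effect is a constant `effect` after adoption.
    """
    rng = np.random.default_rng(seed)
    units = np.arange(n_units)

    # Adoption period per unit; np.inf marks never-treated units
    adopt = rng.integers(3, n_periods, size=n_units).astype(float)
    adopt[rng.random(n_units) < 1 / 3] = np.inf

    # One row per unit and period
    df = pd.DataFrame(
        [(u, t) for u in units for t in range(n_periods)], columns=["id_unit", "t"]
    )
    # Absorbing treatment: 1 from the adoption period onward
    df["treat"] = (df["t"] >= adopt[df["id_unit"]]).astype(int)
    df["x1"] = rng.normal(size=len(df))
    df["x2"] = rng.normal(size=len(df))

    # Outcome: unit effect + common trend + covariate + treatment effect + noise
    unit_fe = rng.normal(size=n_units)[df["id_unit"]]
    df["y"] = (
        unit_fe
        + 0.1 * df["t"]
        + 0.5 * df["x1"]
        + effect * df["treat"]
        + rng.normal(scale=0.5, size=len(df))
    )
    return df[["id_unit", "t", "treat", "y", "x1", "x2"]]


data = simulate_staggered_panel()
print(data.head())
```

The resulting `data` can be passed to `pipeline.fit(df=data)` as in the example above. Because the true effect is known (2.0 by default), it offers a quick sanity check on the reported estimates.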

Real Tech Applications

Product & Feature Analysis

Feature rollout impact: Measure true lift from new features beyond selection effects
UI/UX changes: Isolate design impact from user self-selection
Pricing changes: Estimate elasticity when users choose their plans
Platform migrations: Quantify the causal effect of moving users to new systems

Marketing & Growth

Campaign effectiveness: Separate campaign impact from organic trends
Channel attribution: Understand true incremental value of marketing channels
Retention interventions: Measure causal impact of win-back campaigns
Geographic expansions: Estimate market entry effects using synthetic controls

Business Operations

Policy changes: Evaluate impact of new policies on user behavior
Competitive effects: Measure how competitor actions affect your metrics
Seasonal adjustments: Separate true treatment effects from seasonality
Long-term impacts: Track how effects evolve over months/years

Why Automation Matters

Modern causal inference has seen an explosion of methods in recent years. Choosing correctly requires deep knowledge of:

Parallel trends assumptions
Staggered treatment timing
Heterogeneous treatment effects
Two-way fixed effects bias
Synthetic control construction

PyAutoCausal encodes this expertise, automatically routing your analysis through the appropriate methods while maintaining transparency about assumptions and limitations.
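
Most of these methods build on the canonical 2×2 difference-in-differences comparison, which identifies the effect only under parallel trends. A minimal pandas sketch follows; the function name `did_2x2` and its column names are illustrative and not part of PyAutoCausal's API.

```python
import pandas as pd


def did_2x2(df, group="treated_group", post="post", outcome="y"):
    """Canonical 2x2 difference-in-differences from four group means.

    Valid only under parallel trends: absent treatment, both groups'
    average outcomes would have changed by the same amount.
    """
    means = df.groupby([group, post])[outcome].mean()
    treated_change = means.loc[(1, 1)] - means.loc[(1, 0)]
    control_change = means.loc[(0, 1)] - means.loc[(0, 0)]
    return treated_change - control_change


# Toy data: both groups trend up by 3; the treated group gets +5 extra after treatment
toy = pd.DataFrame({
    "treated_group": [0, 0, 1, 1],
    "post":          [0, 1, 0, 1],
    "y":             [10, 13, 20, 28],
})
print(did_2x2(toy))  # → 5.0
```

With staggered adoption and effects that change over time, pooling many such comparisons in a single two-way fixed effects regression can put negative weights on some of them. That is the bias estimators such as Callaway-Sant'Anna are designed to avoid.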

Installation

pip install pyautocausal

Or for development:

git clone https://github.com/yourusername/pyautocausal.git
cd pyautocausal
poetry install

Core Concepts

Graph-Based Pipeline Architecture

PyAutoCausal organizes causal analysis as directed graphs of computational nodes:

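import pandas as pd

from pyautocausal.orchestration.graph import ExecutableGraph
from pyautocausal.persistence.output_config import OutputConfig, OutputType  # output format configuration (not needed below)

# cross_sectional_estimator and panel_estimator stand for your own
# estimator functions, defined elsewhere.

# Build custom pipelines using the graph API
graph = (ExecutableGraph()
    .configure_runtime(output_path="./outputs")
    .create_input_node("data", input_dtype=pd.DataFrame)
    # Decision node: True if the data covers more than one time period
    .create_decision_node("has_multiple_periods",
                         condition=lambda df: df['t'].nunique() > 1,
                         predecessors=["data"])
    .create_node("cross_sectional_analysis",
                cross_sectional_estimator,
                predecessors=["has_multiple_periods"])
    .create_node("panel_analysis",
                panel_estimator,
                predecessors=["has_multiple_periods"])
    # Cross-sectional analysis when the condition is False, panel analysis when True
    .when_false("has_multiple_periods", "cross_sectional_analysis")
    .when_true("has_multiple_periods", "panel_analysis")
)

# your_dataframe: your own pandas DataFrame with a 't' (time period) column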
graph.fit(data=your_dataframe)
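
The branching idea behind the decision node can be illustrated without any dependencies. The toy below is not how `ExecutableGraph` is implemented: the class `ToyGraph` and its methods are hypothetical and only mirror the `create_decision_node` / `when_true` / `when_false` pattern, evaluating the condition once and running only the selected branch.

```python
from typing import Any, Callable, Dict

import pandas as pd


class ToyGraph:
    """Hypothetical mini-executor: one decision routing to one of two branches."""

    def __init__(self, condition: Callable[[Any], bool]):
        self.condition = condition
        self.branches: Dict[bool, Callable[[Any], Any]] = {}

    def when_true(self, fn: Callable[[Any], Any]) -> "ToyGraph":
        self.branches[True] = fn
        return self

    def when_false(self, fn: Callable[[Any], Any]) -> "ToyGraph":
        self.branches[False] = fn
        return self

    def fit(self, data: Any) -> Any:
        # Evaluate the condition once, then run only the selected branch
        return self.branches[bool(self.condition(data))](data)


graph = (
    ToyGraph(condition=lambda df: df["t"].nunique() > 1)
    .when_true(lambda df: "panel_analysis")
    .when_false(lambda df: "cross_sectional_analysis")
)

print(graph.fit(pd.DataFrame({"t": [0, 0, 1, 1]})))  # → panel_analysis
print(graph.fit(pd.DataFrame({"t": [0, 0, 0, 0]})))  # → cross_sectional_analysis
```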

Automated Method Selection

The framework automatically routes your data through appropriate causal inference methods:

Cross-sectional (single time period) → OLS with robust inference
Panel with single treated unit → Synthetic control methods
Panel with multiple treatment timing → Modern DiD estimators
Staggered treatment adoption → Callaway-Sant'Anna, Goodman-Bacon decomposition
Large datasets → Double/debiased machine learning approaches
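
Rules like these can be expressed as a few checks on the data. `suggest_method` below is a hypothetical sketch, not PyAutoCausal's selection logic: it reuses the column names from the quick example, assumes the split between standard and modern DiD is common versus staggered treatment timing, and leaves out the large-dataset rule.

```python
import pandas as pd


def suggest_method(df, unit="id_unit", time="t", treat="treat"):
    """Hypothetical rule-of-thumb router loosely mirroring the list above.

    Not PyAutoCausal's actual selection logic; labels are illustrative.
    """
    if df[time].nunique() == 1:
        return "cross-sectional: OLS with robust standard errors"

    # Units treated in at least one period
    ever_treated = df.groupby(unit)[treat].max()
    n_treated_units = int(ever_treated.sum())
    if n_treated_units == 0:
        raise ValueError("no treated units found")
    if n_treated_units == 1:
        return "single treated unit: synthetic control"

    # First treated period of each ever-treated unit
    first_treated = df[df[treat] == 1].groupby(unit)[time].min()
    if first_treated.nunique() > 1:
        return "staggered adoption: Callaway-Sant'Anna, Goodman-Bacon diagnostics"
    return "common treatment timing: standard DiD"


panel = pd.DataFrame({
    "id_unit": [1, 1, 2, 2, 3, 3],
    "t":       [0, 1, 0, 1, 0, 1],
    "treat":   [0, 1, 0, 1, 0, 0],
})
print(suggest_method(panel))  # → common treatment timing: standard DiD
```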

Built-in Validation

Every analysis includes:

Data quality checks: Missing values, duplicates, proper formatting
Assumption testing: Parallel trends, common support, balance
Robustness checks: Alternative specifications and estimators
Diagnostic plots: Visual assumption validation
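
Simple versions of the data-quality and balance checks can be written directly in pandas. `basic_checks` below is a hypothetical sketch, not PyAutoCausal's validation code. The balance diagnostic is the standardized mean difference (SMD); an absolute value above about 0.1 is a common rule of thumb for imbalance.

```python
import numpy as np
import pandas as pd


def basic_checks(df, unit="id_unit", time="t", treat="treat", covariates=("x1", "x2")):
    """Hypothetical pre-flight checks in the spirit of the list above."""
    report = {
        # Data quality
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_unit_periods": int(df.duplicated([unit, time]).sum()),
        "treat_is_binary": bool(df[treat].dropna().isin([0, 1]).all()),
    }
    # Balance: standardized mean difference of each covariate, treated vs control
    treated, control = df[df[treat] == 1], df[df[treat] == 0]
    for col in covariates:
        pooled_sd = np.sqrt((treated[col].var() + control[col].var()) / 2)
        diff = treated[col].mean() - control[col].mean()
        # NaN when the pooled SD is zero or undefined (fewer than 2 rows per group)
        report[f"smd_{col}"] = float(diff / pooled_sd) if pooled_sd > 0 else float("nan")
    return report


example = pd.DataFrame({
    "id_unit": [1, 1, 2, 2, 2],
    "t":       [0, 1, 0, 1, 1],          # (2, 1) appears twice
    "treat":   [0, 1, 1, 0, 0],
    "x1":      [0, 1, 3, 1, 2],
    "x2":      [2, 0, 4, np.nan, 1],     # one missing value
})
print(basic_checks(example))
```

In a panel, balance is usually more informative as ever-treated versus never-treated units in pre-treatment periods, for example by passing an ever-treated indicator column as `treat` and keeping only pre-treatment rows.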

Project Structure

pyautocausal/
├── orchestration/          # Core graph execution framework
│   ├── graph.py            # ExecutableGraph class and execution logic
│   ├── nodes.py            # Node types (standard, decision, input)
│   └── ...
├── pipelines/              # Pre-built causal inference workflows
│   ├── library/            # Reusable causal analysis components
│   │   ├── specifications.py  # Treatment/outcome specifications
│   │   ├── estimators.py      # Statistical estimators
│   │   ├── conditions.py      # Data characteristic detectors
│   │   ├── plots.py           # Visualization functions
│   │   └── ...
│   └── example_graph.py    # Main causal inference pipeline
├── causal_methods/         # Core statistical methods
│   └── double_ml.py        # DoubleML implementation
├── persistence/            # Output handling and export
│   ├── notebook_export.py  # Jupyter notebook generation
│   ├── output_config.py    # Output format configuration
│   └── ...
└── utils/                  # Utility functions

Next Steps

📖 Getting Started Guide - Step-by-step tutorial
📊 Causal Methods Reference - All available estimators
🔧 Pipeline Development - Building custom workflows
📋 Data Requirements - Input formats and validation
💡 Examples - Real-world case studies

Contributing

We welcome contributions! Please see our contributing guidelines for details.

License

MIT License

Citation

If you use PyAutoCausal in your research, please cite:

@software{pyautocausal,
  title={PyAutoCausal: Automated Causal Inference Pipelines},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/pyautocausal}
}

Release history

0.1.1 (this version): Aug 24, 2025
0.1.0: Aug 17, 2025

Download files


Source Distribution

pyautocausal-0.1.1.tar.gz (1.7 MB)

Uploaded Aug 24, 2025 Source

Built Distribution

pyautocausal-0.1.1-py3-none-any.whl (1.8 MB)

Uploaded Aug 24, 2025 Python 3

File details

Details for the file pyautocausal-0.1.1.tar.gz.

File metadata

Download URL: pyautocausal-0.1.1.tar.gz
Upload date: Aug 24, 2025
Size: 1.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.5 CPython/3.10.16 Darwin/24.4.0

File hashes

Hashes for pyautocausal-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`03b0f4104bddaae95c8e7d1f962406c932d5b6601a2be8d3f08dbfdd38321642`
MD5	`7f277a4b2c43100c972a9ef2551cd85b`
BLAKE2b-256	`f139a31e17d4925c9518c65a7e63547caa4fd47455d1d66950ca5fca0dfd7115`


File details

Details for the file pyautocausal-0.1.1-py3-none-any.whl.

File metadata

Download URL: pyautocausal-0.1.1-py3-none-any.whl
Upload date: Aug 24, 2025
Size: 1.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.5 CPython/3.10.16 Darwin/24.4.0

File hashes

Hashes for pyautocausal-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`32135b58272309e7319f27aea3480fb73b4c03a11401f1018de2317deb6b3fbe`
MD5	`3468aced1f93e68dc85da27e923d801c`
BLAKE2b-256	`99fc65f90c335a2291b31dbc68f800d71258e8fc2dfb4b197a23d59ebc883b1e`
