
Causal inference toolkit for observational data and treatment-effect diagnostics.


CausalLens

Diagnostics-first causal inference for Python. Estimate treatment effects, inspect overlap and balance, and compare estimators—all with publication-ready diagnostics and plots.

Why CausalLens?

Most causal inference software focuses on fitting models quickly. CausalLens instead focuses on understanding whether your results are trustworthy.

  • Diagnostics bundled with every result: Overlap, balance improvement, effective sample size, and sensitivity checks are returned alongside estimates—not as optional afterthoughts
  • Specification transparency: Compare regression, matching, IPW, and doubly robust estimates side-by-side to reveal model sensitivity and build confidence through agreement (or diagnose problems when estimates diverge)
  • Publication-ready in one command: Exports formatted benchmark tables, Love plots, propensity histograms, sensitivity curves, and subgroup summaries—ready for manuscript submission
  • Falsification tests integrated: Placebo tests and Rosenbaum bounds are CLI-integrated, not separate scripts, so stress-testing your assumptions becomes the default workflow
  • Python + pandas native: Designed for scikit-learn-compatible workflows and seamless pandas integration

Example:

# Load data, run estimation, inspect diagnostics—all in one object
import pandas as pd
from causal_lens import DoublyRobustEstimator

data = pd.read_csv("observational_study.csv")  # your tabular observational data
confounders = ["age", "severity", "baseline_score"]

dr = DoublyRobustEstimator("treatment", "outcome", confounders)
result = dr.fit(data)
print(result.summary())  # Shows effect, CI, p-value, AND overlap/balance diagnostics

Compare this to typical workflows where you fit a model, then manually write separate scripts to check overlap, balance, and sensitivity. CausalLens makes diagnostics inseparable from estimation.

Snapshot

  • Lane: Observational data causal inference and treatment-effect diagnostics
  • Domain: Tabular observational data with binary treatments
  • Stack: Python (pandas, scikit-learn, statsmodels, matplotlib)
  • Estimators: regression adjustment, propensity matching, IPW, doubly robust, 2SLS, difference-in-differences, synthetic control, plus heterogeneous effects and sensitivity analysis
  • Publication-oriented: Exports benchmark tables, charts, and sensitivity summaries optimized for peer review

Overview

CausalLens packages core causal-inference workflows for observational tabular data into a small, testable Python library. The initial release is designed around practical treatment-effect estimation rather than theory-heavy experimentation: estimate treatment effects, inspect overlap and balance, and compare estimators with consistent result objects.

The repository uses four complementary evidence tracks:

  • a fixed public-safe observational intervention sample under data/ for reproducible article figures and tests
  • public benchmark datasets drawn from the causal inference literature for externally recognizable evaluation
  • synthetic known-effect data for correctness-oriented validation of estimator behavior
  • a formal Monte Carlo simulation study evaluating estimator bias, RMSE, coverage, and SE calibration across five data-generating processes
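The synthetic known-effect track rests on a simple idea: simulate data where the true treatment effect is known, then check that an adjusted estimator recovers it while the naive comparison does not. A minimal standalone sketch of that idea in plain NumPy (not the library's own data generator):

```python
# Simulate confounded data with a known treatment effect, then compare the
# naive difference in means against a regression-adjusted estimate.
import numpy as np

rng = np.random.default_rng(0)
n, true_effect = 5000, 2.0

x = rng.normal(size=n)                    # confounder
p = 1 / (1 + np.exp(-x))                  # treatment probability depends on x
t = rng.binomial(1, p)
y = true_effect * t + 1.5 * x + rng.normal(size=n)

# Naive difference in means is biased upward by confounding...
naive = y[t == 1].mean() - y[t == 0].mean()

# ...while OLS on [1, t, x] recovers the true effect.
X = np.column_stack([np.ones(n), t, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = beta[1]
print(f"naive={naive:.2f}  adjusted={adjusted:.2f}  truth={true_effect}")
```

CausalLens's validation tests apply the same pattern to each estimator against synthetic data with a known ground truth.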

What It Demonstrates

  • Propensity-score estimation with a scikit-learn logistic model on standardized covariates
  • Regression-adjustment treatment effects with statsmodels OLS
  • Nearest-neighbor propensity matching with optional calipers and Abadie-Imbens analytic standard errors
  • Inverse probability weighting with stabilized weights and weight capping for ATE and ATT targets
  • Doubly robust estimation that combines outcome and propensity models with weight trimming
  • Cross-fitted doubly robust estimation (DML/AIPW style) with 5-fold out-of-fold nuisance estimates to avoid overfitting bias
  • Flexible doubly robust estimation using gradient boosting outcome models for nonlinear confounding
  • T-learner and S-learner meta-learners for conditional average treatment effect (CATE) estimation with optional GBM
  • Analytic standard errors from OLS (regression), Hajek sandwich variance (IPW), Abadie-Imbens matched-pair variance (matching), and semiparametric influence functions (doubly robust)
  • Covariate-balance summaries using standardized mean differences and variance ratios
  • Kish effective sample size for weighted estimators to detect unstable weights
  • Common-support and overlap diagnostics for positivity review
  • Additive-bias sensitivity summaries for explain-away analysis on the outcome scale
  • E-values for unmeasured confounding (VanderWeele & Ding 2017) quantifying the minimum confounder strength to explain away the effect
  • Rosenbaum sensitivity bounds for matched-pair designs quantifying hidden-bias tolerance
  • Placebo/falsification tests on pre-treatment outcomes for specification validation
  • Subgroup treatment-effect summaries for quick heterogeneous-effect review
  • A small command-line demo that exports a reproducible causal report
  • A real-style observational intervention fixture for stable estimator-comparison tests
  • Publication-oriented methodology notes explaining why the initial estimator set is justified
  • Reference parity tests against manual formulas and direct statistical-model fits
  • Paper-ready chart and table exports for estimator comparison, balance, sensitivity, and subgroup effects
  • Love plots showing covariate-level balance before and after adjustment with standard |SMD| thresholds
  • Propensity-score overlap histograms for visual positivity assessment
  • Manuscript drafting docs, figure captions, and cross-dataset benchmark tables for the software-paper path
  • Packaged public benchmarks based on Lalonde and NHEFS so installed users can reproduce the evidence stack without a source checkout
  • Literature comparison table showing CausalLens results match published reference values from Dehejia & Wahba (1999) and Hernán & Robins (2020)
  • Repeated-run stability analysis across seeds, bootstrap counts, and caliper settings
  • External comparison script verifying CausalLens matches manual sklearn/statsmodels implementations to machine precision
  • Difference-in-differences estimator with regression-based ATT, cluster-robust standard errors, and a parallel-trends pre-test
  • Synthetic control method with constrained least-squares donor weights and placebo inference via leave-one-out permutation
  • Two-stage least squares (2SLS) instrumental variables estimator with proper IV variance, first-stage F-statistic, and weak-instrument detection
  • Monte Carlo simulation framework with five DGPs (linear, nonlinear outcome, nonlinear propensity, double nonlinear, strong confounding) evaluating bias, RMSE, coverage, and SE calibration ratio
  • IPW standard errors corrected for propensity-score estimation uncertainty via the Lunceford & Davidian (2004) stacked estimating equations adjustment
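Two of the diagnostics listed above have simple closed forms. As an illustration (a standalone sketch, not CausalLens internals):

```python
# Standardized mean difference (balance) and Kish effective sample size
# (weight stability), implemented directly from their textbook formulas.
import numpy as np

def standardized_mean_difference(x_treated, x_control):
    """SMD = (mean_t - mean_c) / pooled SD; |SMD| < 0.1 is a common balance threshold."""
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd

def kish_ess(weights):
    """Kish ESS = (sum w)^2 / sum w^2; equals n for uniform weights,
    and shrinks when a few extreme weights dominate the estimate."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

rng = np.random.default_rng(1)
print(standardized_mean_difference(rng.normal(0.2, 1, 300), rng.normal(0.0, 1, 300)))
print(kish_ess(np.ones(100)))                # uniform weights: ESS = n
print(kish_ess(rng.exponential(size=100)))   # skewed weights: ESS < n
```

A low ESS relative to n is the classic warning sign that an IPW or doubly robust estimate is being driven by a handful of observations with extreme weights.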

Current Output

The default command writes outputs/causal_report.json with:

  • a fixed real-style observational dataset section with estimator comparisons
  • a Lalonde benchmark section with public observational training-program data, using light propensity-overlap trimming for the weighting estimators
  • an NHEFS benchmark section with public smoking-cessation observational data
  • a synthetic validation dataset section with known-effect comparisons
  • overlap summary and propensity score range checks
  • covariate balance before/after weighting, variance ratios, and effective sample sizes
  • lightweight bootstrap intervals for the selected estimate
  • analytic standard errors and p-values from influence functions (DR, IPW) and OLS (regression)
  • additive-bias sensitivity summaries with E-values for the primary doubly robust estimate
  • subgroup treatment-effect estimates
  • placebo/falsification test results on pre-treatment outcomes
  • Rosenbaum sensitivity bounds for matched-pair designs
  • external comparison and stability-analysis summaries for the exported benchmark artifacts

It also writes paper-oriented artifacts under outputs/charts/ and outputs/tables/ including:

  • estimator comparison charts with confidence intervals
  • balance before/after summary charts
  • sensitivity curves
  • subgroup effect charts
  • estimator summary tables in CSV and Markdown
  • external_comparison.csv showing parity against manual sklearn/statsmodels implementations
  • stability_raw.csv and stability_summary.csv capturing repeated-run variability across benchmark settings
  • placebo_test.csv showing falsification test results on pre-treatment outcomes
  • rosenbaum_bounds.csv showing matched-pair sensitivity to hidden bias at each Gamma level
  • Love plots and propensity-score overlap histograms for each benchmark dataset

Next Upgrade Path

  • add article figures, benchmark tables, and formal estimator-comparison writeups for DiD, synthetic control, and IV
  • add regression discontinuity design (RDD) and bunching estimators
  • expand simulation study to additional sample sizes and publish summary tables

All cross-sectional estimators, panel-data methods, IV, and simulation infrastructure are now in place.

Installation

pip install .

Or in development mode:

pip install -e .

For review or replication work:

pip install -e .[dev]

Quick Start

from causal_lens import (
    generate_synthetic_observational_data,
    RegressionAdjustmentEstimator,
    DoublyRobustEstimator,
    CrossFittedDREstimator,
    DifferenceInDifferences,
    TwoStageLeastSquares,
    run_quick_simulation,
    summarize_simulation,
)

# --- Cross-sectional estimators ---
data = generate_synthetic_observational_data(rows=600, seed=42)
confounders = ["age", "severity", "baseline_score"]

reg = RegressionAdjustmentEstimator("treatment", "outcome", confounders)
result_reg = reg.fit(data)

dr = CrossFittedDREstimator("treatment", "outcome", confounders)
result_dr = dr.fit(data)

for r in [result_reg, result_dr]:
    print(f"{r.method:35s}  effect={r.effect:.2f}  SE={r.se:.3f}  p={r.p_value:.4f}")

# --- Panel data: Difference-in-Differences ---
import pandas as pd
panel = pd.DataFrame({"unit": [1,1,2,2], "period": [0,1,0,1],
                      "treat": [1,1,0,0], "y": [3.0,7.0,2.0,4.0]})
did = DifferenceInDifferences("unit", "period", "treat", "y")
result_did = did.fit(panel)
print(f"DiD ATT={result_did.att:.2f}  SE={result_did.se:.3f}")

# --- Monte Carlo simulation study ---
raw = run_quick_simulation()
summary = summarize_simulation(raw)
print(summary[["dgp", "estimator", "bias", "rmse", "coverage"]].to_string(index=False))
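For the 2×2 toy panel in the snippet above, the DiD point estimate can be verified by hand from the DiD identity:

```python
# ATT = (treated post - treated pre) - (control post - control pre)
treated_pre, treated_post = 3.0, 7.0
control_pre, control_post = 2.0, 4.0
att = (treated_post - treated_pre) - (control_post - control_pre)
print(att)  # 2.0
```

With only four cells the regression-based estimator reduces exactly to this double difference; the regression form matters once covariates, multiple periods, or cluster-robust standard errors enter.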

Real-World Use Case: Complete Workflow

Here is how CausalLens handles a complete real-world analysis from data loading through diagnostics, estimation, and result inspection:

import pandas as pd
import matplotlib.pyplot as plt
from causal_lens import (
    RegressionAdjustmentEstimator,
    PropensityMatcher,
    IPWEstimator,
    DoublyRobustEstimator,
    export_propensity_overlap,
    export_balance_summary,
)

# Load real observational data (e.g., NHEFS smoking cessation study)
data = pd.read_csv("data/nhefs_complete.csv")
print(f"Dataset: {len(data)} observations, {len(data.columns)} variables")

# Define analysis parameters
treatment_col = "treatment"      # Binary treatment assignment (1=quit smoking, 0=continue)
outcome_col = "weight_change"    # Outcome: change in weight (kg)
confounders = [
    "age", "sex", "race", "education",
    "baseline_weight", "baseline_smoking_intensity"
]

# Inspect the data
print(f"\nTreatment: {(data[treatment_col] == 1).sum()} treated, {(data[treatment_col] == 0).sum()} control")
print(f"Outcome mean (treated): {data[data[treatment_col]==1][outcome_col].mean():.2f} kg")
print(f"Outcome mean (control): {data[data[treatment_col]==0][outcome_col].mean():.2f} kg")
print(f"Raw difference: {data[data[treatment_col]==1][outcome_col].mean() - data[data[treatment_col]==0][outcome_col].mean():.2f} kg")

# --- Method 1: Regression Adjustment (fast, transparent) ---
print("\n" + "="*60)
print("1. REGRESSION ADJUSTMENT")
print("="*60)
reg = RegressionAdjustmentEstimator(treatment_col, outcome_col, confounders, bootstrap_repeats=100)
result_reg = reg.fit(data)
print(result_reg.summary())  # Human-readable output with diagnostics

# --- Method 2: Propensity Matching (robust to model misspecification) ---
print("\n" + "="*60)
print("2. PROPENSITY MATCHING (caliper=0.01)")
print("="*60)
matcher = PropensityMatcher(
    treatment_col, 
    outcome_col, 
    confounders, 
    caliper=0.01,
    bootstrap_repeats=100
)
result_match = matcher.fit(data)
print(result_match.summary())
print(f"\nMatched pairs: {result_match.treated_count}")

# --- Method 3: IPW with propensity trimming (efficient but variance-sensitive) ---
print("\n" + "="*60)
print("3. INVERSE PROBABILITY WEIGHTING (propensity trim: 0.05-0.95)")
print("="*60)
ipw = IPWEstimator(
    treatment_col, 
    outcome_col, 
    confounders,
    propensity_trim_bounds=(0.05, 0.95),  # Exclude extreme propensity scores
    bootstrap_repeats=100
)
result_ipw = ipw.fit(data)
print(result_ipw.summary())
print(f"Effective sample size (treated): {result_ipw.diagnostics.ess_treated:.0f}")
print(f"Effective sample size (control): {result_ipw.diagnostics.ess_control:.0f}")

# --- Method 4: Doubly Robust (combines outcome + propensity strengths) ---
print("\n" + "="*60)
print("4. DOUBLY ROBUST (AIPW style)")
print("="*60)
dr = DoublyRobustEstimator(
    treatment_col, 
    outcome_col, 
    confounders,
    propensity_trim_bounds=(0.05, 0.95),
    bootstrap_repeats=100
)
result_dr = dr.fit(data)
print(result_dr.summary())

# --- DIAGNOSTICS & QUALITY CHECKS ---
print("\n" + "="*60)
print("DIAGNOSTIC SUMMARY ACROSS METHODS")
print("="*60)

results = [result_reg, result_match, result_ipw, result_dr]
summary_table = pd.DataFrame([
    {
        "Method": r.method,
        "Effect": f"{r.effect:.2f}",
        "95% CI": f"[{r.ci_low:.2f}, {r.ci_high:.2f}]" if r.ci_low is not None else "N/A",
        "p-value": f"{r.p_value:.4f}" if r.p_value is not None else "N/A",
        "Overlap": "✓" if r.diagnostics.overlap_ok else "✗",
        "Balance (before)": f"{sum(r.diagnostics.balance_before.values())/len(r.diagnostics.balance_before):.4f}",
        "Balance (after)": f"{sum(r.diagnostics.balance_after.values())/len(r.diagnostics.balance_after):.4f}",
    }
    for r in results
])
print(summary_table.to_string(index=False))

# --- SENSITIVITY ANALYSIS ---
print("\n" + "="*60)
print("SENSITIVITY ANALYSIS (primary doubly robust)")
print("="*60)
sensitivity = dr.sensitivity_analysis(data, steps=6)
print("\nBias scenarios:")
for scenario in sensitivity.scenarios:
    print(
        f"  Bias={scenario.bias:.2f}: adjusted effect={scenario.adjusted_effect:.2f}, "
        f"CI=[{scenario.adjusted_ci_low:.2f}, {scenario.adjusted_ci_high:.2f}]"
    )
print(f"\nE-value: {sensitivity.e_value:.2f} (minimum confounder strength to explain away effect)")

# --- SUBGROUP ANALYSIS ---
print("\n" + "="*60)
print("HETEROGENEOUS TREATMENT EFFECTS (by sex)")
print("="*60)
subgroups = dr.subgroup_analysis(data, subgroup_col="sex")
for sg in subgroups:
    print(
        f"  {sg.subgroup}: effect={sg.effect:.2f}, CI=[{sg.ci_low:.2f}, {sg.ci_high:.2f}], "
        f"n={sg.rows} ({sg.treated_count} treated)"
    )

# --- VISUALIZATION ---
print("\n" + "="*60)
print("GENERATING PUBLICATION-READY FIGURES")
print("="*60)

# 1. Propensity score overlap check
export_propensity_overlap(dr, data, output_path="nhefs_propensity_overlap.png")
print("✓ Propensity overlap histogram: nhefs_propensity_overlap.png")

# 2. Balance summary (before/after adjustment)
export_balance_summary(dr, data, output_path="nhefs_balance_summary.png")
print("✓ Balance summary plot: nhefs_balance_summary.png")

# 3. Estimator comparison with CIs
from causal_lens.reporting import export_estimator_comparison
estimates_dict = {
    "Regression": (result_reg.effect, result_reg.ci_low, result_reg.ci_high),
    "Matching": (result_match.effect, result_match.ci_low, result_match.ci_high),
    "IPW": (result_ipw.effect, result_ipw.ci_low, result_ipw.ci_high),
    "Doubly Robust": (result_dr.effect, result_dr.ci_low, result_dr.ci_high),
}
export_estimator_comparison(estimates_dict, output_path="nhefs_estimator_comparison.png")
print("✓ Estimator comparison plot: nhefs_estimator_comparison.png")

# --- CONCLUSION ---
print("\n" + "="*60)
print("ANALYSIS COMPLETE")
print("="*60)
print(f"Primary estimate (doubly robust): {result_dr.effect:.2f} kg")
print(f"95% CI: [{result_dr.ci_low:.2f}, {result_dr.ci_high:.2f}]")
print(f"p-value: {result_dr.p_value:.4e}")
print(f"\nEstimator agreement: effects range from {min([r.effect for r in results]):.2f} to {max([r.effect for r in results]):.2f} kg")
print("This suggests moderate specification robustness.")
print("\nAll diagnostics, sensitivity analyses, and figures are generated above.")
print("Exported plots are publication-ready (high-DPI, no label overlap).")

This example demonstrates:

  • Loading & inspecting real data: dataset size, sample composition, raw associations
  • Trying multiple methods: regression, matching, IPW, doubly robust—each with different parameter choices
  • Diagnostic outputs: overlap checks, balance improvement, effective sample sizes, p-values
  • Result inspection: human-readable summary() method showing all key metrics at once
  • Sensitivity analysis: bias scenarios and E-values quantifying confounding robustness
  • Subgroup analysis: heterogeneous treatment effects by covariate
  • Visualization: exportable publication-ready figures (propensity overlap, balance before/after, estimator comparison)
  • Interpretation: estimator agreement as a specification-robustness signal

Users can adapt this template to their own datasets by changing column names, confounders, and parameter choices (e.g., caliper, propensity trimming, bootstrap repeats).
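The E-value reported by the sensitivity analysis has a simple closed form (VanderWeele & Ding, 2017). A minimal standalone sketch of the formula, independent of CausalLens's own sensitivity_analysis API:

```python
# E-value for an observed risk ratio: the minimum strength of association,
# on the risk-ratio scale, that an unmeasured confounder would need with
# both treatment and outcome to fully explain away the observed effect.
import math

def e_value(rr):
    if rr < 1:
        rr = 1 / rr          # protective effects: invert first
    return rr + math.sqrt(rr * (rr - 1))

print(e_value(2.0))  # 2 + sqrt(2) ≈ 3.41
```

A larger E-value means the estimate is harder to explain away; an E-value near 1 means even weak unmeasured confounding could account for the result.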

Run the test suite:

pytest

Regenerate the default report and paper-oriented artifacts:

causal-lens

This writes the JSON report plus tracked charts and tables under outputs/charts/ and outputs/tables/.

Submission-Facing Assets

  • README.md provides installation, scope, and reviewer-facing reproduction commands.
  • CITATION.cff provides machine-readable citation metadata.
  • LICENSE provides the repository license.
  • docs/methodology.md, docs/reference-validation.md, and docs/limitations-and-assumptions.md provide manuscript-supporting narrative.
  • outputs/charts/ and outputs/tables/ contain the tracked benchmark artifacts used in the current evidence stack.

Documentation

  • docs/architecture.md: design notes
  • docs/methodology.md: assumptions, reasoning, and estimator justification
  • docs/public-benchmarks.md: public dataset choices and benchmark rationale
  • docs/benchmark-interpretation.md: results-oriented reading of the current benchmark artifacts
  • docs/reference-validation.md: executable validation logic tied to the future journal article
  • docs/limitations-and-assumptions.md: paper-ready limitations section

Citation

Citation metadata is available in CITATION.cff.
