Causal inference toolkit for observational data and treatment-effect diagnostics.
CausalLens
Diagnostics-first causal inference for Python. Estimate treatment effects, inspect overlap and balance, and compare estimators—all with publication-ready diagnostics and plots.
Why CausalLens?
Most causal inference software focuses on fitting models quickly. CausalLens instead focuses on understanding whether your results are trustworthy.
- ✅ Diagnostics bundled with every result: Overlap, balance improvement, effective sample size, and sensitivity checks are returned alongside estimates—not as optional afterthoughts
- ✅ Specification transparency: Compare regression, matching, IPW, and doubly robust estimates side-by-side to reveal model sensitivity and build confidence through agreement (or diagnose problems when estimates diverge)
- ✅ Publication-ready in one command: Exports formatted benchmark tables, Love plots, propensity histograms, sensitivity curves, and subgroup summaries—ready for manuscript submission
- ✅ Falsification tests integrated: Placebo tests and Rosenbaum bounds are CLI-integrated, not separate scripts, so stress-testing your assumptions becomes the default workflow
- ✅ Python + pandas native: Designed for scikit-learn-compatible workflows and seamless pandas integration
Example:
# Load data, run estimation, inspect diagnostics—all in one object
from causal_lens import DoublyRobustEstimator
confounders = ["age", "severity", "baseline_score"]  # example confounder columns
dr = DoublyRobustEstimator("treatment", "outcome", confounders)
result = dr.fit(data)
print(result.summary()) # Shows effect, CI, p-value, AND overlap/balance diagnostics
Compare this to typical workflows where you fit a model, then manually write separate scripts to check overlap, balance, and sensitivity. CausalLens makes diagnostics inseparable from estimation.
Snapshot
- Lane: Observational data causal inference and treatment-effect diagnostics
- Domain: Tabular observational data with binary treatments
- Stack: Python (pandas, scikit-learn, statsmodels, matplotlib)
- Estimators: regression adjustment, propensity matching, IPW, doubly robust, 2SLS, difference-in-differences, synthetic control, plus heterogeneous effects and sensitivity analysis
- Publication-oriented: Exports benchmark tables, charts, and sensitivity summaries optimized for peer review
Overview
CausalLens packages core causal-inference workflows for observational tabular data into a small, testable Python library. The initial release is designed around practical treatment-effect estimation rather than theory-heavy experimentation: estimate treatment effects, inspect overlap and balance, and compare estimators with consistent result objects.
The current repository now uses four complementary evidence tracks:
- a fixed public-safe observational intervention sample under data/ for reproducible article figures and tests
- public benchmark datasets drawn from the causal inference literature for externally recognizable evaluation
- synthetic known-effect data for correctness-oriented validation of estimator behavior
- a formal Monte Carlo simulation study evaluating estimator bias, RMSE, coverage, and SE calibration across five data-generating processes
What It Demonstrates
- Propensity score estimation with a scikit-learn logistic model with standardized covariates
- Regression-adjustment treatment effects with statsmodels OLS
- Nearest-neighbor propensity matching with optional calipers and Abadie-Imbens analytic standard errors
- Inverse probability weighting with stabilized weights and weight capping for ATE and ATT targets
- Doubly robust estimation that combines outcome and propensity models with weight trimming
- Cross-fitted doubly robust estimation (DML/AIPW style) with 5-fold out-of-fold nuisance estimates to avoid overfitting bias
- Flexible doubly robust estimation using gradient boosting outcome models for nonlinear confounding
- T-learner and S-learner meta-learners for conditional average treatment effect (CATE) estimation with optional GBM
- Analytic standard errors from OLS (regression), Hajek sandwich variance (IPW), Abadie-Imbens matched-pair variance (matching), and semiparametric influence functions (doubly robust)
- Covariate-balance summaries using standardized mean differences and variance ratios
- Kish effective sample size for weighted estimators to detect unstable weights
- Common-support and overlap diagnostics for positivity review
- Additive-bias sensitivity summaries for explain-away analysis on the outcome scale
- E-values for unmeasured confounding (VanderWeele & Ding 2017) quantifying the minimum confounder strength to explain away the effect
- Rosenbaum sensitivity bounds for matched-pair designs quantifying hidden-bias tolerance
- Placebo/falsification tests on pre-treatment outcomes for specification validation
- Subgroup treatment-effect summaries for quick heterogeneous-effect review
- A small command-line demo that exports a reproducible causal report
- A real-style observational intervention fixture for stable estimator-comparison tests
- Publication-oriented methodology notes explaining why the initial estimator set is justified
- Reference parity tests against manual formulas and direct statistical-model fits
- Paper-ready chart and table exports for estimator comparison, balance, sensitivity, and subgroup effects
- Love plots showing covariate-level balance before and after adjustment with standard |SMD| thresholds
- Propensity-score overlap histograms for visual positivity assessment
- Manuscript drafting docs, figure captions, and cross-dataset benchmark tables for the software-paper path
- Packaged public benchmarks based on Lalonde and NHEFS so installed users can reproduce the evidence stack without a source checkout
- Literature comparison table showing CausalLens results match published reference values from Dehejia & Wahba (1999) and Hernán & Robins (2020)
- Repeated-run stability analysis across seeds, bootstrap counts, and caliper settings
- External comparison script verifying CausalLens matches manual sklearn/statsmodels implementations to machine precision
- Difference-in-differences estimator with regression-based ATT, cluster-robust standard errors, and a parallel-trends pre-test
- Synthetic control method with constrained least-squares donor weights and placebo inference via leave-one-out permutation
- Two-stage least squares (2SLS) instrumental variables estimator with proper IV variance, first-stage F-statistic, and weak-instrument detection
- Monte Carlo simulation framework with five DGPs (linear, nonlinear outcome, nonlinear propensity, double nonlinear, strong confounding) evaluating bias, RMSE, coverage, and SE calibration ratio
- IPW standard errors corrected for propensity-score estimation uncertainty via the Lunceford & Davidian (2004) stacked estimating equations adjustment
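Several of the diagnostics listed above reduce to short, standard formulas. Here is a self-contained sketch of three of them — pooled-SD standardized mean difference, Kish effective sample size, and the VanderWeele & Ding (2017) E-value — written in plain numpy rather than against CausalLens's internal implementations (the sample data is illustrative):

```python
import numpy as np

def smd(x_t, x_c):
    """Standardized mean difference with a pooled-SD denominator (a common convention)."""
    pooled = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled

def kish_ess(w):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

def e_value(rr):
    """E-value for a risk ratio rr > 1 (VanderWeele & Ding 2017)."""
    return rr + np.sqrt(rr * (rr - 1))

rng = np.random.default_rng(1)
x_t, x_c = rng.normal(0.3, 1, 200), rng.normal(0.0, 1, 300)
print(f"SMD = {smd(x_t, x_c):.2f}")
print(f"ESS of 100 equal weights: {kish_ess(np.ones(100)):.0f}")   # 100
print(f"E-value for RR=2: {e_value(2.0):.2f}")                     # 3.41
```

Equal weights give ESS equal to the raw sample size; highly unequal weights drive ESS down, which is exactly the instability signal the weighted estimators report.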
Current Output
The default command writes outputs/causal_report.json with:
- a fixed real-style observational dataset section with estimator comparisons
- a Lalonde benchmark section with public observational training-program data, using light propensity-overlap trimming for the weighting estimators
- an NHEFS benchmark section with public smoking-cessation observational data
- a synthetic validation dataset section with known-effect comparisons
- overlap summary and propensity score range checks
- covariate balance before/after weighting, variance ratios, and effective sample sizes
- lightweight bootstrap intervals for the selected estimate
- analytic standard errors and p-values from influence functions (DR, IPW) and OLS (regression)
- additive-bias sensitivity summaries with E-values for the primary doubly robust estimate
- subgroup treatment-effect estimates
- placebo/falsification test results on pre-treatment outcomes
- Rosenbaum sensitivity bounds for matched-pair designs
- external comparison and stability-analysis summaries for the exported benchmark artifacts
It also writes paper-oriented artifacts under outputs/charts/ and outputs/tables/ including:
- estimator comparison charts with confidence intervals
- balance before/after summary charts
- sensitivity curves
- subgroup effect charts
- estimator summary tables in CSV and Markdown
- external_comparison.csv showing parity against manual sklearn/statsmodels implementations
- stability_raw.csv and stability_summary.csv capturing repeated-run variability across benchmark settings
- placebo_test.csv showing falsification test results on pre-treatment outcomes
- rosenbaum_bounds.csv showing matched-pair sensitivity to hidden bias at each Gamma level
- Love plots and propensity-score overlap histograms for each benchmark dataset
Next Upgrade Path
- add article figures, benchmark tables, and formal estimator-comparison writeups for DiD, synthetic control, and IV
- add regression discontinuity design (RDD) and bunching estimators
- expand simulation study to additional sample sizes and publish summary tables
All cross-sectional estimators, panel-data methods, IV, and the simulation infrastructure are already in place; the items above extend the publication-facing evidence rather than the estimator set.
Installation
pip install .
Or in development mode:
pip install -e .
For review or replication work:
pip install -e .[dev]
Quick Start
from causal_lens import (
    generate_synthetic_observational_data,
    RegressionAdjustmentEstimator,
    DoublyRobustEstimator,
    CrossFittedDREstimator,
    DifferenceInDifferences,
    TwoStageLeastSquares,
    run_quick_simulation,
    summarize_simulation,
)
# --- Cross-sectional estimators ---
data = generate_synthetic_observational_data(rows=600, seed=42)
confounders = ["age", "severity", "baseline_score"]
reg = RegressionAdjustmentEstimator("treatment", "outcome", confounders)
result_reg = reg.fit(data)
dr = CrossFittedDREstimator("treatment", "outcome", confounders)
result_dr = dr.fit(data)
for r in [result_reg, result_dr]:
    print(f"{r.method:35s} effect={r.effect:.2f} SE={r.se:.3f} p={r.p_value:.4f}")
# --- Panel data: Difference-in-Differences ---
import pandas as pd
panel = pd.DataFrame({"unit": [1, 1, 2, 2], "period": [0, 1, 0, 1],
                      "treat": [1, 1, 0, 0], "y": [3.0, 7.0, 2.0, 4.0]})
did = DifferenceInDifferences("unit", "period", "treat", "y")
result_did = did.fit(panel)
print(f"DiD ATT={result_did.att:.2f} SE={result_did.se:.3f}")
# --- Monte Carlo simulation study ---
raw = run_quick_simulation()
summary = summarize_simulation(raw)
print(summary[["dgp", "estimator", "bias", "rmse", "coverage"]].to_string(index=False))
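As a sanity check on the toy panel above: under parallel trends, the 2×2 DiD ATT is the treated group's pre/post change minus the control group's change, (7 − 3) − (4 − 2) = 2. The same arithmetic in plain pandas, independent of the library:

```python
import pandas as pd

panel = pd.DataFrame({"unit": [1, 1, 2, 2], "period": [0, 1, 0, 1],
                      "treat": [1, 1, 0, 0], "y": [3.0, 7.0, 2.0, 4.0]})

# Mean outcome by treatment group and period
means = panel.groupby(["treat", "period"])["y"].mean()
change_treated = means[1, 1] - means[1, 0]   # 7 - 3 = 4
change_control = means[0, 1] - means[0, 0]   # 4 - 2 = 2
att = change_treated - change_control
print(f"manual DiD ATT = {att:.2f}")         # 2.00
```

The library's DifferenceInDifferences estimator recovers the same quantity via an interaction regression, which additionally yields cluster-robust standard errors and the parallel-trends pre-test.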
Real-World Use Case: Complete Workflow
Here is how CausalLens handles a complete real-world analysis from data loading through diagnostics, estimation, and result inspection:
import pandas as pd
import matplotlib.pyplot as plt
from causal_lens import (
    RegressionAdjustmentEstimator,
    PropensityMatcher,
    IPWEstimator,
    DoublyRobustEstimator,
    export_propensity_overlap,
    export_balance_summary,
)
# Load real observational data (e.g., NHEFS smoking cessation study)
data = pd.read_csv("data/nhefs_complete.csv")
print(f"Dataset: {len(data)} observations, {len(data.columns)} variables")
# Define analysis parameters
treatment_col = "treatment" # Binary treatment assignment (1=quit smoking, 0=continue)
outcome_col = "weight_change" # Outcome: change in weight (kg)
confounders = [
    "age", "sex", "race", "education",
    "baseline_weight", "baseline_smoking_intensity",
]
# Inspect the data
print(f"\nTreatment: {data[treatment_col].sum()} treated, {(1-data[treatment_col]).sum()} control")
print(f"Outcome mean (treated): {data[data[treatment_col]==1][outcome_col].mean():.2f} kg")
print(f"Outcome mean (control): {data[data[treatment_col]==0][outcome_col].mean():.2f} kg")
print(f"Raw difference: {data[data[treatment_col]==1][outcome_col].mean() - data[data[treatment_col]==0][outcome_col].mean():.2f} kg")
# --- Method 1: Regression Adjustment (fast, transparent) ---
print("\n" + "="*60)
print("1. REGRESSION ADJUSTMENT")
print("="*60)
reg = RegressionAdjustmentEstimator(treatment_col, outcome_col, confounders, bootstrap_repeats=100)
result_reg = reg.fit(data)
print(result_reg.summary()) # Human-readable output with diagnostics
# --- Method 2: Propensity Matching (robust to model misspecification) ---
print("\n" + "="*60)
print("2. PROPENSITY MATCHING (caliper=0.01)")
print("="*60)
matcher = PropensityMatcher(
    treatment_col,
    outcome_col,
    confounders,
    caliper=0.01,
    bootstrap_repeats=100,
)
result_match = matcher.fit(data)
print(result_match.summary())
print(f"\nMatched pairs: {result_match.treated_count}")
# --- Method 3: IPW with propensity trimming (efficient but variance-sensitive) ---
print("\n" + "="*60)
print("3. INVERSE PROBABILITY WEIGHTING (propensity trim: 0.05-0.95)")
print("="*60)
ipw = IPWEstimator(
    treatment_col,
    outcome_col,
    confounders,
    propensity_trim_bounds=(0.05, 0.95),  # Exclude extreme propensity scores
    bootstrap_repeats=100,
)
result_ipw = ipw.fit(data)
print(result_ipw.summary())
print(f"Effective sample size (treated): {result_ipw.diagnostics.ess_treated:.0f}")
print(f"Effective sample size (control): {result_ipw.diagnostics.ess_control:.0f}")
# --- Method 4: Doubly Robust (combines outcome + propensity strengths) ---
print("\n" + "="*60)
print("4. DOUBLY ROBUST (AIPW style)")
print("="*60)
dr = DoublyRobustEstimator(
    treatment_col,
    outcome_col,
    confounders,
    propensity_trim_bounds=(0.05, 0.95),
    bootstrap_repeats=100,
)
result_dr = dr.fit(data)
print(result_dr.summary())
# --- DIAGNOSTICS & QUALITY CHECKS ---
print("\n" + "="*60)
print("DIAGNOSTIC SUMMARY ACROSS METHODS")
print("="*60)
results = [result_reg, result_match, result_ipw, result_dr]
summary_table = pd.DataFrame([
    {
        "Method": r.method,
        "Effect": f"{r.effect:.2f}",
        # Use explicit None checks so a legitimate bound/p-value of 0 is not dropped
        "95% CI": f"[{r.ci_low:.2f}, {r.ci_high:.2f}]" if r.ci_low is not None else "N/A",
        "p-value": f"{r.p_value:.4f}" if r.p_value is not None else "N/A",
        "Overlap": "✓" if r.diagnostics.overlap_ok else "✗",
        "Balance (before)": f"{sum(r.diagnostics.balance_before.values())/len(r.diagnostics.balance_before):.4f}",
        "Balance (after)": f"{sum(r.diagnostics.balance_after.values())/len(r.diagnostics.balance_after):.4f}",
    }
    for r in results
])
print(summary_table.to_string(index=False))
# --- SENSITIVITY ANALYSIS ---
print("\n" + "="*60)
print("SENSITIVITY ANALYSIS (primary doubly robust)")
print("="*60)
sensitivity = dr.sensitivity_analysis(data, steps=6)
print("\nBias scenarios:")
for scenario in sensitivity.scenarios:
    print(
        f"  Bias={scenario.bias:.2f}: adjusted effect={scenario.adjusted_effect:.2f}, "
        f"CI=[{scenario.adjusted_ci_low:.2f}, {scenario.adjusted_ci_high:.2f}]"
    )
print(f"\nE-value: {sensitivity.e_value:.2f} (minimum confounder strength to explain away effect)")
# --- SUBGROUP ANALYSIS ---
print("\n" + "="*60)
print("HETEROGENEOUS TREATMENT EFFECTS (by sex)")
print("="*60)
subgroups = dr.subgroup_analysis(data, subgroup_col="sex")
for sg in subgroups:
    print(
        f"  {sg.subgroup}: effect={sg.effect:.2f}, CI=[{sg.ci_low:.2f}, {sg.ci_high:.2f}], "
        f"n={sg.rows} ({sg.treated_count} treated)"
    )
# --- VISUALIZATION ---
print("\n" + "="*60)
print("GENERATING PUBLICATION-READY FIGURES")
print("="*60)
# 1. Propensity score overlap check
export_propensity_overlap(dr, data, output_path="nhefs_propensity_overlap.png")
print("✓ Propensity overlap histogram: nhefs_propensity_overlap.png")
# 2. Balance summary (before/after adjustment)
export_balance_summary(dr, data, output_path="nhefs_balance_summary.png")
print("✓ Balance summary plot: nhefs_balance_summary.png")
# 3. Estimator comparison with CIs
from causal_lens.reporting import export_estimator_comparison
estimates_dict = {
    "Regression": (result_reg.effect, result_reg.ci_low, result_reg.ci_high),
    "Matching": (result_match.effect, result_match.ci_low, result_match.ci_high),
    "IPW": (result_ipw.effect, result_ipw.ci_low, result_ipw.ci_high),
    "Doubly Robust": (result_dr.effect, result_dr.ci_low, result_dr.ci_high),
}
export_estimator_comparison(estimates_dict, output_path="nhefs_estimator_comparison.png")
print("✓ Estimator comparison plot: nhefs_estimator_comparison.png")
# --- CONCLUSION ---
print("\n" + "="*60)
print("ANALYSIS COMPLETE")
print("="*60)
print(f"Primary estimate (doubly robust): {result_dr.effect:.2f} kg")
print(f"95% CI: [{result_dr.ci_low:.2f}, {result_dr.ci_high:.2f}]")
print(f"p-value: {result_dr.p_value:.4e}")
print(f"\nEstimator agreement: effects range from {min([r.effect for r in results]):.2f} to {max([r.effect for r in results]):.2f} kg")
print("This suggests moderate specification robustness.")
print("\nAll diagnostics, sensitivity analyses, and figures are generated above.")
print("Exported plots are publication-ready (high-DPI, no label overlap).")
This example demonstrates:
- Loading & inspecting real data: dataset size, sample composition, raw associations
- Trying multiple methods: regression, matching, IPW, doubly robust—each with different parameter choices
- Diagnostic outputs: overlap checks, balance improvement, effective sample sizes, p-values
- Result inspection: human-readable summary() method showing all key metrics at once
- Sensitivity analysis: bias scenarios and E-values quantifying confounding robustness
- Subgroup analysis: heterogeneous treatment effects by covariate
- Visualization: exportable publication-ready figures (propensity overlap, balance before/after, estimator comparison)
- Interpretation: estimator agreement as a specification-robustness signal
Users can adapt this template to their own datasets by changing column names, confounders, and parameter choices (e.g., caliper, propensity trimming, bootstrap repeats).
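For readers who want to see the arithmetic behind the IPW step, here is a self-contained sketch of stabilized weighting with propensity clipping, written directly against sklearn and numpy rather than IPWEstimator. The DGP and coefficients are illustrative, not taken from the library:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=(n, 2))                          # confounders
p_true = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
t = rng.binomial(1, p_true)                          # treatment assignment
y = 2.0 * t + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)

# Estimated propensity scores, clipped to avoid extreme weights
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]
ps = np.clip(ps, 0.05, 0.95)

# Stabilized ATE weights: P(T=t) / P(T=t | X)
pt = t.mean()
w = np.where(t == 1, pt / ps, (1 - pt) / (1 - ps))

# Hajek (ratio) estimator: difference of weighted arm means
ate = (np.average(y[t == 1], weights=w[t == 1])
       - np.average(y[t == 0], weights=w[t == 0]))
print(f"IPW ATE estimate: {ate:.2f}  (true effect 2.0)")
```

The stabilizing constants cancel inside the Hajek ratio but are shown because they matter for the Horvitz-Thompson form and for weight diagnostics such as effective sample size.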
Run the test suite:
pytest
Regenerate the default report and paper-oriented artifacts:
causal-lens
This writes the JSON report plus tracked charts and tables under outputs/charts/ and outputs/tables/.
Submission-Facing Assets
- README.md provides installation, scope, and reviewer-facing reproduction commands.
- CITATION.cff provides machine-readable citation metadata.
- LICENSE provides the repository license.
- docs/methodology.md, docs/reference-validation.md, and docs/limitations-and-assumptions.md provide manuscript-supporting narrative.
- outputs/charts/ and outputs/tables/ contain the tracked benchmark artifacts used in the current evidence stack.
Documentation
See docs/architecture.md for the design notes. See docs/methodology.md for assumptions, reasoning, and estimator justification. See docs/public-benchmarks.md for the public dataset choices and benchmark rationale. See docs/benchmark-interpretation.md for a results-oriented reading of the current benchmark artifacts. See docs/reference-validation.md for executable validation logic tied to the future journal article. See docs/limitations-and-assumptions.md for a paper-ready limitations section.
Citation
Citation metadata is available in CITATION.cff.