Causal inference toolkit for observational data and treatment-effect diagnostics.
CausalLens
Diagnostics-first causal inference for Python. Estimate treatment effects, inspect overlap and balance, and compare estimators—all with publication-ready diagnostics and plots.
Why CausalLens?
Most causal inference software focuses on fitting models quickly. CausalLens instead focuses on understanding whether your results are trustworthy.
- ✅ Diagnostics bundled with every result: Overlap, balance improvement, effective sample size, and sensitivity checks are returned alongside estimates—not as optional afterthoughts
- ✅ Specification transparency: Compare regression, matching, IPW, and doubly robust estimates side-by-side to reveal model sensitivity and build confidence through agreement (or diagnose problems when estimates diverge)
- ✅ Publication-ready in one command: Exports formatted benchmark tables, Love plots, propensity histograms, sensitivity curves, and subgroup summaries—ready for manuscript submission
- ✅ Falsification tests integrated: Placebo tests and Rosenbaum bounds are CLI-integrated, not separate scripts, so stress-testing your assumptions becomes the default workflow
- ✅ Python + pandas native: Designed for scikit-learn-compatible workflows and seamless pandas integration
Example:
# Load data, run estimation, inspect diagnostics—all in one object
from causal_lens import DoublyRobustEstimator
confounders = ["age", "severity", "baseline_score"]  # example confounder columns
dr = DoublyRobustEstimator("treatment", "outcome", confounders)
result = dr.fit(data)
print(result.summary()) # Shows effect, CI, p-value, AND overlap/balance diagnostics
Compare this to typical workflows where you fit a model, then manually write separate scripts to check overlap, balance, and sensitivity. CausalLens makes diagnostics inseparable from estimation.
Snapshot
- Lane: Observational data causal inference and treatment-effect diagnostics
- Domain: Tabular observational data with binary treatments
- Stack: Python (pandas, scikit-learn, statsmodels, matplotlib)
- Estimators: regression adjustment, propensity matching, IPW, doubly robust, 2SLS, difference-in-differences, synthetic control, plus heterogeneous effects and sensitivity analysis
- Publication-oriented: Exports benchmark tables, charts, and sensitivity summaries optimized for peer review
Overview
CausalLens packages core causal-inference workflows for observational tabular data into a small, testable Python library. The initial release is designed around practical treatment-effect estimation rather than theory-heavy experimentation: estimate treatment effects, inspect overlap and balance, and compare estimators with consistent result objects.
The current repository now uses four complementary evidence tracks:
- a fixed public-safe observational intervention sample under data/ for reproducible article figures and tests
- public benchmark datasets drawn from the causal inference literature for externally recognizable evaluation
- synthetic known-effect data for correctness-oriented validation of estimator behavior
- a formal Monte Carlo simulation study evaluating estimator bias, RMSE, coverage, and SE calibration across five data-generating processes
What It Demonstrates
- Propensity score estimation with a scikit-learn logistic model with standardized covariates
- Regression-adjustment treatment effects with statsmodels OLS
- Nearest-neighbor propensity matching with optional calipers and Abadie-Imbens analytic standard errors
- Inverse probability weighting with stabilized weights and weight capping for ATE and ATT targets
- Doubly robust estimation that combines outcome and propensity models with weight trimming
- Cross-fitted doubly robust estimation (DML/AIPW style) with 5-fold out-of-fold nuisance estimates to avoid overfitting bias
- Flexible doubly robust estimation using gradient boosting outcome models for nonlinear confounding
- T-learner and S-learner meta-learners for conditional average treatment effect (CATE) estimation with optional GBM
- Analytic standard errors from OLS (regression), Hajek sandwich variance (IPW), Abadie-Imbens matched-pair variance (matching), and semiparametric influence functions (doubly robust)
- Covariate-balance summaries using standardized mean differences and variance ratios
- Kish effective sample size for weighted estimators to detect unstable weights
- Common-support and overlap diagnostics for positivity review
- Additive-bias sensitivity summaries for explain-away analysis on the outcome scale
- E-values for unmeasured confounding (VanderWeele & Ding 2017) quantifying the minimum confounder strength to explain away the effect
- Rosenbaum sensitivity bounds for matched-pair designs quantifying hidden-bias tolerance
- Placebo/falsification tests on pre-treatment outcomes for specification validation
- Subgroup treatment-effect summaries for quick heterogeneous-effect review
- A small command-line demo that exports a reproducible causal report
- A real-style observational intervention fixture for stable estimator-comparison tests
- Publication-oriented methodology notes explaining why the initial estimator set is justified
- Reference parity tests against manual formulas and direct statistical-model fits
- Paper-ready chart and table exports for estimator comparison, balance, sensitivity, and subgroup effects
- Love plots showing covariate-level balance before and after adjustment with standard |SMD| thresholds
- Propensity-score overlap histograms for visual positivity assessment
- Manuscript drafting docs, figure captions, and cross-dataset benchmark tables for the software-paper path
- Packaged public benchmarks based on Lalonde and NHEFS so installed users can reproduce the evidence stack without a source checkout
- Literature comparison table showing CausalLens results match published reference values from Dehejia & Wahba (1999) and Hernán & Robins (2020)
- Repeated-run stability analysis across seeds, bootstrap counts, and caliper settings
- External comparison script verifying CausalLens matches manual sklearn/statsmodels implementations to machine precision
- Difference-in-differences estimator with regression-based ATT, cluster-robust standard errors, and a parallel-trends pre-test
- Synthetic control method with constrained least-squares donor weights and placebo inference via leave-one-out permutation
- Two-stage least squares (2SLS) instrumental variables estimator with proper IV variance, first-stage F-statistic, and weak-instrument detection
- Monte Carlo simulation framework with five DGPs (linear, nonlinear outcome, nonlinear propensity, double nonlinear, strong confounding) evaluating bias, RMSE, coverage, and SE calibration ratio
- IPW standard errors corrected for propensity-score estimation uncertainty via the Lunceford & Davidian (2004) stacked estimating equations adjustment
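Several of the diagnostics listed above reduce to short, standard formulas. Here is a self-contained sketch of three of them — pooled-SD standardized mean difference, Kish effective sample size, and the VanderWeele & Ding (2017) E-value — written in plain numpy rather than against CausalLens's internal implementations (the sample data is illustrative):

```python
import numpy as np

def smd(x_t, x_c):
    """Standardized mean difference with a pooled-SD denominator (a common convention)."""
    pooled = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled

def kish_ess(w):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

def e_value(rr):
    """E-value for a risk ratio rr > 1 (VanderWeele & Ding 2017)."""
    return rr + np.sqrt(rr * (rr - 1))

rng = np.random.default_rng(1)
x_t, x_c = rng.normal(0.3, 1, 200), rng.normal(0.0, 1, 300)
print(f"SMD = {smd(x_t, x_c):.2f}")
print(f"ESS of 100 equal weights: {kish_ess(np.ones(100)):.0f}")   # 100
print(f"E-value for RR=2: {e_value(2.0):.2f}")                     # 3.41
```

Equal weights give ESS equal to the raw sample size; highly unequal weights drive ESS down, which is exactly the instability signal the weighted estimators report.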
Current Output
The default command writes outputs/causal_report.json with:
- a fixed real-style observational dataset section with estimator comparisons
- a Lalonde benchmark section with public observational training-program data, using light propensity-overlap trimming for the weighting estimators
- an NHEFS benchmark section with public smoking-cessation observational data
- a synthetic validation dataset section with known-effect comparisons
- overlap summary and propensity score range checks
- covariate balance before/after weighting, variance ratios, and effective sample sizes
- lightweight bootstrap intervals for the selected estimate
- analytic standard errors and p-values from influence functions (DR, IPW) and OLS (regression)
- additive-bias sensitivity summaries with E-values for the primary doubly robust estimate
- subgroup treatment-effect estimates
- placebo/falsification test results on pre-treatment outcomes
- Rosenbaum sensitivity bounds for matched-pair designs
- external comparison and stability-analysis summaries for the exported benchmark artifacts
It also writes paper-oriented artifacts under outputs/charts/ and outputs/tables/ including:
- estimator comparison charts with confidence intervals
- balance before/after summary charts
- sensitivity curves
- subgroup effect charts
- estimator summary tables in CSV and Markdown
- external_comparison.csv showing parity against manual sklearn/statsmodels implementations
- stability_raw.csv and stability_summary.csv capturing repeated-run variability across benchmark settings
- placebo_test.csv showing falsification test results on pre-treatment outcomes
- rosenbaum_bounds.csv showing matched-pair sensitivity to hidden bias at each Gamma level
- Love plots and propensity-score overlap histograms for each benchmark dataset
Next Upgrade Path
- add article figures, benchmark tables, and formal estimator-comparison writeups for DiD, synthetic control, and IV
- add regression discontinuity design (RDD) and bunching estimators
- expand simulation study to additional sample sizes and publish summary tables
All cross-sectional estimators, panel-data methods, IV, and the simulation infrastructure are already in place; the items above extend the publication-facing evidence rather than the estimator set.
Installation
pip install .
Or in development mode:
pip install -e .
For review or replication work:
pip install -e .[dev]
Quick Start
from causal_lens import (
    generate_synthetic_observational_data,
    RegressionAdjustmentEstimator,
    DoublyRobustEstimator,
    CrossFittedDREstimator,
    DifferenceInDifferences,
    TwoStageLeastSquares,
    run_quick_simulation,
    summarize_simulation,
)
# --- Cross-sectional estimators ---
data = generate_synthetic_observational_data(rows=600, seed=42)
confounders = ["age", "severity", "baseline_score"]
reg = RegressionAdjustmentEstimator("treatment", "outcome", confounders)
result_reg = reg.fit(data)
dr = CrossFittedDREstimator("treatment", "outcome", confounders)
result_dr = dr.fit(data)
for r in [result_reg, result_dr]:
    print(f"{r.method:35s} effect={r.effect:.2f} SE={r.se:.3f} p={r.p_value:.4f}")
# --- Panel data: Difference-in-Differences ---
import pandas as pd
panel = pd.DataFrame({"unit": [1, 1, 2, 2], "period": [0, 1, 0, 1],
                      "treat": [1, 1, 0, 0], "y": [3.0, 7.0, 2.0, 4.0]})
did = DifferenceInDifferences("unit", "period", "treat", "y")
result_did = did.fit(panel)
print(f"DiD ATT={result_did.att:.2f} SE={result_did.se:.3f}")
# --- Monte Carlo simulation study ---
raw = run_quick_simulation()
summary = summarize_simulation(raw)
print(summary[["dgp", "estimator", "bias", "rmse", "coverage"]].to_string(index=False))
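As a sanity check on the toy panel above: under parallel trends, the 2×2 DiD ATT is the treated group's pre/post change minus the control group's change, (7 − 3) − (4 − 2) = 2. The same arithmetic in plain pandas, independent of the library:

```python
import pandas as pd

panel = pd.DataFrame({"unit": [1, 1, 2, 2], "period": [0, 1, 0, 1],
                      "treat": [1, 1, 0, 0], "y": [3.0, 7.0, 2.0, 4.0]})

# Mean outcome by treatment group and period
means = panel.groupby(["treat", "period"])["y"].mean()
change_treated = means[1, 1] - means[1, 0]   # 7 - 3 = 4
change_control = means[0, 1] - means[0, 0]   # 4 - 2 = 2
att = change_treated - change_control
print(f"manual DiD ATT = {att:.2f}")         # 2.00
```

The library's DifferenceInDifferences estimator recovers the same quantity via an interaction regression, which additionally yields cluster-robust standard errors and the parallel-trends pre-test.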
Real-World Use Case: Complete Workflow
Here is how CausalLens handles a complete real-world analysis from data loading through diagnostics, estimation, and result inspection:
import pandas as pd
import matplotlib.pyplot as plt
from causal_lens import (
    RegressionAdjustmentEstimator,
    PropensityMatcher,
    IPWEstimator,
    DoublyRobustEstimator,
    export_propensity_overlap,
    export_balance_summary,
)
# Load real observational data (e.g., NHEFS smoking cessation study)
data = pd.read_csv("data/nhefs_complete.csv")
print(f"Dataset: {len(data)} observations, {len(data.columns)} variables")
# Define analysis parameters
treatment_col = "treatment" # Binary treatment assignment (1=quit smoking, 0=continue)
outcome_col = "weight_change" # Outcome: change in weight (kg)
confounders = [
    "age", "sex", "race", "education",
    "baseline_weight", "baseline_smoking_intensity",
]
# Inspect the data
print(f"\nTreatment: {data[treatment_col].sum()} treated, {(1-data[treatment_col]).sum()} control")
print(f"Outcome mean (treated): {data[data[treatment_col]==1][outcome_col].mean():.2f} kg")
print(f"Outcome mean (control): {data[data[treatment_col]==0][outcome_col].mean():.2f} kg")
print(f"Raw difference: {data[data[treatment_col]==1][outcome_col].mean() - data[data[treatment_col]==0][outcome_col].mean():.2f} kg")
# --- Method 1: Regression Adjustment (fast, transparent) ---
print("\n" + "="*60)
print("1. REGRESSION ADJUSTMENT")
print("="*60)
reg = RegressionAdjustmentEstimator(treatment_col, outcome_col, confounders, bootstrap_repeats=100)
result_reg = reg.fit(data)
print(result_reg.summary()) # Human-readable output with diagnostics
# --- Method 2: Propensity Matching (robust to model misspecification) ---
print("\n" + "="*60)
print("2. PROPENSITY MATCHING (caliper=0.01)")
print("="*60)
matcher = PropensityMatcher(
    treatment_col,
    outcome_col,
    confounders,
    caliper=0.01,
    bootstrap_repeats=100,
)
result_match = matcher.fit(data)
print(result_match.summary())
print(f"\nMatched pairs: {result_match.treated_count}")
# --- Method 3: IPW with propensity trimming (efficient but variance-sensitive) ---
print("\n" + "="*60)
print("3. INVERSE PROBABILITY WEIGHTING (propensity trim: 0.05-0.95)")
print("="*60)
ipw = IPWEstimator(
    treatment_col,
    outcome_col,
    confounders,
    propensity_trim_bounds=(0.05, 0.95),  # Exclude extreme propensity scores
    bootstrap_repeats=100,
)
result_ipw = ipw.fit(data)
print(result_ipw.summary())
print(f"Effective sample size (treated): {result_ipw.diagnostics.ess_treated:.0f}")
print(f"Effective sample size (control): {result_ipw.diagnostics.ess_control:.0f}")
# --- Method 4: Doubly Robust (combines outcome + propensity strengths) ---
print("\n" + "="*60)
print("4. DOUBLY ROBUST (AIPW style)")
print("="*60)
dr = DoublyRobustEstimator(
    treatment_col,
    outcome_col,
    confounders,
    propensity_trim_bounds=(0.05, 0.95),
    bootstrap_repeats=100,
)
result_dr = dr.fit(data)
print(result_dr.summary())
# --- DIAGNOSTICS & QUALITY CHECKS ---
print("\n" + "="*60)
print("DIAGNOSTIC SUMMARY ACROSS METHODS")
print("="*60)
results = [result_reg, result_match, result_ipw, result_dr]
summary_table = pd.DataFrame([
    {
        "Method": r.method,
        "Effect": f"{r.effect:.2f}",
        # Use explicit None checks so a legitimate bound/p-value of 0 is not dropped
        "95% CI": f"[{r.ci_low:.2f}, {r.ci_high:.2f}]" if r.ci_low is not None else "N/A",
        "p-value": f"{r.p_value:.4f}" if r.p_value is not None else "N/A",
        "Overlap": "✓" if r.diagnostics.overlap_ok else "✗",
        "Balance (before)": f"{sum(r.diagnostics.balance_before.values())/len(r.diagnostics.balance_before):.4f}",
        "Balance (after)": f"{sum(r.diagnostics.balance_after.values())/len(r.diagnostics.balance_after):.4f}",
    }
    for r in results
])
print(summary_table.to_string(index=False))
# --- SENSITIVITY ANALYSIS ---
print("\n" + "="*60)
print("SENSITIVITY ANALYSIS (primary doubly robust)")
print("="*60)
sensitivity = dr.sensitivity_analysis(data, steps=6)
print("\nBias scenarios:")
for scenario in sensitivity.scenarios:
    print(
        f"  Bias={scenario.bias:.2f}: adjusted effect={scenario.adjusted_effect:.2f}, "
        f"CI=[{scenario.adjusted_ci_low:.2f}, {scenario.adjusted_ci_high:.2f}]"
    )
print(f"\nE-value: {sensitivity.e_value:.2f} (minimum confounder strength to explain away effect)")
# --- SUBGROUP ANALYSIS ---
print("\n" + "="*60)
print("HETEROGENEOUS TREATMENT EFFECTS (by sex)")
print("="*60)
subgroups = dr.subgroup_analysis(data, subgroup_col="sex")
for sg in subgroups:
    print(
        f"  {sg.subgroup}: effect={sg.effect:.2f}, CI=[{sg.ci_low:.2f}, {sg.ci_high:.2f}], "
        f"n={sg.rows} ({sg.treated_count} treated)"
    )
# --- VISUALIZATION ---
print("\n" + "="*60)
print("GENERATING PUBLICATION-READY FIGURES")
print("="*60)
# 1. Propensity score overlap check
export_propensity_overlap(dr, data, output_path="nhefs_propensity_overlap.png")
print("✓ Propensity overlap histogram: nhefs_propensity_overlap.png")
# 2. Balance summary (before/after adjustment)
export_balance_summary(dr, data, output_path="nhefs_balance_summary.png")
print("✓ Balance summary plot: nhefs_balance_summary.png")
# 3. Estimator comparison with CIs
from causal_lens.reporting import export_estimator_comparison
estimates_dict = {
    "Regression": (result_reg.effect, result_reg.ci_low, result_reg.ci_high),
    "Matching": (result_match.effect, result_match.ci_low, result_match.ci_high),
    "IPW": (result_ipw.effect, result_ipw.ci_low, result_ipw.ci_high),
    "Doubly Robust": (result_dr.effect, result_dr.ci_low, result_dr.ci_high),
}
export_estimator_comparison(estimates_dict, output_path="nhefs_estimator_comparison.png")
print("✓ Estimator comparison plot: nhefs_estimator_comparison.png")
# --- CONCLUSION ---
print("\n" + "="*60)
print("ANALYSIS COMPLETE")
print("="*60)
print(f"Primary estimate (doubly robust): {result_dr.effect:.2f} kg")
print(f"95% CI: [{result_dr.ci_low:.2f}, {result_dr.ci_high:.2f}]")
print(f"p-value: {result_dr.p_value:.4e}")
print(f"\nEstimator agreement: effects range from {min([r.effect for r in results]):.2f} to {max([r.effect for r in results]):.2f} kg")
print("This suggests moderate specification robustness.")
print("\nAll diagnostics, sensitivity analyses, and figures are generated above.")
print("Exported plots are publication-ready (high-DPI, no label overlap).")
This example demonstrates:
- Loading & inspecting real data: dataset size, sample composition, raw associations
- Trying multiple methods: regression, matching, IPW, doubly robust—each with different parameter choices
- Diagnostic outputs: overlap checks, balance improvement, effective sample sizes, p-values
- Result inspection: human-readable summary() method showing all key metrics at once
- Sensitivity analysis: bias scenarios and E-values quantifying confounding robustness
- Subgroup analysis: heterogeneous treatment effects by covariate
- Visualization: exportable publication-ready figures (propensity overlap, balance before/after, estimator comparison)
- Interpretation: estimator agreement as a specification-robustness signal
Users can adapt this template to their own datasets by changing column names, confounders, and parameter choices (e.g., caliper, propensity trimming, bootstrap repeats).
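For readers who want to see the arithmetic behind the IPW step, here is a self-contained sketch of stabilized weighting with propensity clipping, written directly against sklearn and numpy rather than IPWEstimator. The DGP and coefficients are illustrative, not taken from the library:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=(n, 2))                          # confounders
p_true = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))
t = rng.binomial(1, p_true)                          # treatment assignment
y = 2.0 * t + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)

# Estimated propensity scores, clipped to avoid extreme weights
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]
ps = np.clip(ps, 0.05, 0.95)

# Stabilized ATE weights: P(T=t) / P(T=t | X)
pt = t.mean()
w = np.where(t == 1, pt / ps, (1 - pt) / (1 - ps))

# Hajek (ratio) estimator: difference of weighted arm means
ate = (np.average(y[t == 1], weights=w[t == 1])
       - np.average(y[t == 0], weights=w[t == 0]))
print(f"IPW ATE estimate: {ate:.2f}  (true effect 2.0)")
```

The stabilizing constants cancel inside the Hajek ratio but are shown because they matter for the Horvitz-Thompson form and for weight diagnostics such as effective sample size.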
Run the test suite:
pytest
Regenerate the default report and paper-oriented artifacts:
causal-lens
This writes the JSON report plus tracked charts and tables under outputs/charts/ and outputs/tables/.
Submission-Facing Assets
- README.md provides installation, scope, and reviewer-facing reproduction commands.
- CITATION.cff provides machine-readable citation metadata.
- LICENSE provides the repository license.
- docs/methodology.md, docs/reference-validation.md, and docs/limitations-and-assumptions.md provide manuscript-supporting narrative.
- outputs/charts/ and outputs/tables/ contain the tracked benchmark artifacts used in the current evidence stack.
Documentation
See docs/architecture.md for the design notes. See docs/methodology.md for assumptions, reasoning, and estimator justification. See docs/public-benchmarks.md for the public dataset choices and benchmark rationale. See docs/benchmark-interpretation.md for a results-oriented reading of the current benchmark artifacts. See docs/reference-validation.md for executable validation logic tied to the future journal article. See docs/limitations-and-assumptions.md for a paper-ready limitations section.
Citation
Citation metadata is available in CITATION.cff.