diff-diff
A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs.
Installation
pip install diff-diff
Or install from source:
git clone https://github.com/igerber/diff-diff.git
cd diff-diff
pip install -e .
Quick Start
import pandas as pd
from diff_diff import DifferenceInDifferences
# Create sample data
data = pd.DataFrame({
'outcome': [10, 11, 15, 18, 9, 10, 12, 13],
'treated': [1, 1, 1, 1, 0, 0, 0, 0],
'post': [0, 0, 1, 1, 0, 0, 1, 1]
})
# Fit the model
did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')
# View results
print(results) # DiDResults(ATT=3.5000*, SE=1.2583, p=0.0367)
results.print_summary()
Output:
======================================================================
Difference-in-Differences Estimation Results
======================================================================
Observations: 8
Treated units: 4
Control units: 4
R-squared: 0.9123
----------------------------------------------------------------------
Parameter Estimate Std. Err. t-stat P>|t|
----------------------------------------------------------------------
ATT 3.5000 1.2583 2.782 0.0367
----------------------------------------------------------------------
95% Confidence Interval: [0.3912, 6.6088]
Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
======================================================================
Features
- sklearn-like API: Familiar fit() interface with get_params() and set_params()
- Pythonic results: Easy access to coefficients, standard errors, and confidence intervals
- Multiple interfaces: Column names or R-style formulas
- Robust inference: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
- Panel data support: Two-way fixed effects estimator for panel designs
Usage
Basic DiD with Column Names
from diff_diff import DifferenceInDifferences
did = DifferenceInDifferences(robust=True, alpha=0.05)
results = did.fit(
data,
outcome='sales',
treatment='treated',
time='post_policy'
)
# Access results
print(f"ATT: {results.att:.4f}")
print(f"Standard Error: {results.se:.4f}")
print(f"P-value: {results.p_value:.4f}")
print(f"95% CI: {results.conf_int}")
print(f"Significant: {results.is_significant}")
Using Formula Interface
# R-style formula syntax
results = did.fit(data, formula='outcome ~ treated * post')
# Explicit interaction syntax
results = did.fit(data, formula='outcome ~ treated + post + treated:post')
# With covariates
results = did.fit(data, formula='outcome ~ treated * post + age + income')
Including Covariates
results = did.fit(
data,
outcome='outcome',
treatment='treated',
time='post',
covariates=['age', 'income', 'education']
)
Fixed Effects
Use fixed_effects for low-dimensional categorical controls (creates dummy variables):
# State and industry fixed effects
results = did.fit(
data,
outcome='sales',
treatment='treated',
time='post',
fixed_effects=['state', 'industry']
)
# Access fixed effect coefficients
state_coefs = {k: v for k, v in results.coefficients.items() if k.startswith('state_')}
Use absorb for high-dimensional fixed effects (more efficient, uses within-transformation):
# Absorb firm-level fixed effects (efficient for many firms)
results = did.fit(
data,
outcome='sales',
treatment='treated',
time='post',
absorb=['firm_id']
)
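The efficiency gain from absorb comes from the within-transformation: demeaning each variable within its group sweeps out the fixed effect, so no dummy columns are needed. A minimal sketch of the idea (the data and column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'firm_id': ['a', 'a', 'b', 'b'],
    'sales':   [10.0, 12.0, 20.0, 26.0],
})

# Subtract each firm's mean: the firm-level fixed effect is swept out,
# leaving only within-firm variation for the regression.
df['sales_within'] = df['sales'] - df.groupby('firm_id')['sales'].transform('mean')
print(df['sales_within'].tolist())  # [-1.0, 1.0, -3.0, 3.0]
```

With many firms this avoids building (and inverting) a design matrix with thousands of dummy columns.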
Combine covariates with fixed effects:
results = did.fit(
data,
outcome='sales',
treatment='treated',
time='post',
covariates=['size', 'age'], # Linear controls
fixed_effects=['industry'], # Low-dimensional FE (dummies)
absorb=['firm_id'] # High-dimensional FE (absorbed)
)
Cluster-Robust Standard Errors
did = DifferenceInDifferences(cluster='state')
results = did.fit(
data,
outcome='outcome',
treatment='treated',
time='post'
)
Two-Way Fixed Effects (Panel Data)
from diff_diff.estimators import TwoWayFixedEffects
twfe = TwoWayFixedEffects()
results = twfe.fit(
panel_data,
outcome='outcome',
treatment='treated',
time='year',
unit='firm_id'
)
Working with Results
Export Results
# As dictionary
results.to_dict()
# {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...}
# As DataFrame
df = results.to_dataframe()
Check Significance
if results.is_significant:
    print(f"Effect is significant at {did.alpha} level")
# Get significance stars
print(f"ATT: {results.att}{results.significance_stars}")
# ATT: 3.5000*
Access Full Regression Output
# All coefficients
results.coefficients
# {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5}
# Variance-covariance matrix
results.vcov
# Residuals and fitted values
results.residuals
results.fitted_values
# R-squared
results.r_squared
Checking Assumptions
Parallel Trends
Simple slope-based test:
from diff_diff.utils import check_parallel_trends
trends = check_parallel_trends(
data,
outcome='outcome',
time='period',
treatment_group='treated'
)
print(f"Treated trend: {trends['treated_trend']:.4f}")
print(f"Control trend: {trends['control_trend']:.4f}")
print(f"Difference p-value: {trends['p_value']:.4f}")
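The intuition behind the slope test can be sketched directly with numpy: fit a linear trend to each group's pre-treatment outcomes and compare the slopes (the data and column names here are illustrative, not the library's internals):

```python
import numpy as np
import pandas as pd

# Pre-treatment periods only, for both groups
df = pd.DataFrame({
    'period':  [0, 1, 2, 0, 1, 2],
    'treated': [1, 1, 1, 0, 0, 0],
    'outcome': [10.0, 11.0, 12.1, 8.0, 9.1, 10.0],
})

slopes = {}
for g, sub in df.groupby('treated'):
    # OLS slope of outcome on period within each group
    slopes[g] = np.polyfit(sub['period'], sub['outcome'], 1)[0]

# A small gap between the two slopes is consistent with parallel pre-trends
print(slopes[1] - slopes[0])
```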
Robust distributional test (Wasserstein distance):
from diff_diff.utils import check_parallel_trends_robust
results = check_parallel_trends_robust(
data,
outcome='outcome',
time='period',
treatment_group='treated',
unit='firm_id', # Unit identifier for panel data
pre_periods=[2018, 2019], # Pre-treatment periods
n_permutations=1000 # Permutations for p-value
)
print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}")
print(f"Wasserstein p-value: {results['wasserstein_p_value']:.4f}")
print(f"KS test p-value: {results['ks_p_value']:.4f}")
print(f"Parallel trends plausible: {results['parallel_trends_plausible']}")
The Wasserstein (Earth Mover's) distance compares the full distribution of outcome changes, not just means. This is more robust to:
- Non-normal distributions
- Heterogeneous effects across units
- Outliers
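The underlying comparison can be sketched with scipy: collect each unit's pre-period outcome change per group and measure the distance between the two distributions of changes (the arrays here are illustrative):

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Per-unit pre-period outcome changes (e.g. 2019 minus 2018), by group
treated_changes = np.array([1.0, 1.2, 0.9, 1.1])
control_changes = np.array([1.0, 1.1, 1.0, 0.9])

# 0 means identical distributions of changes; larger values
# indicate diverging pre-trends somewhere in the distribution
dist = wasserstein_distance(treated_changes, control_changes)
print(round(dist, 3))
```

A permutation p-value, as in check_parallel_trends_robust, would then reshuffle group labels and ask how often a distance this large arises by chance.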
Equivalence testing (TOST):
from diff_diff.utils import equivalence_test_trends
results = equivalence_test_trends(
data,
outcome='outcome',
time='period',
treatment_group='treated',
unit='firm_id',
equivalence_margin=0.5 # Define "practically equivalent"
)
print(f"Mean difference: {results['mean_difference']:.4f}")
print(f"TOST p-value: {results['tost_p_value']:.4f}")
print(f"Trends equivalent: {results['equivalent']}")
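TOST flips the usual null hypothesis: it supports equivalence only if the trend difference is significantly greater than -margin and significantly less than +margin. A minimal sketch of the two one-sided t-tests with scipy (data and margin are illustrative; this is not the library's implementation):

```python
import numpy as np
from scipy import stats

# Per-unit differences in pre-period trends (treated minus control)
diffs = np.array([0.10, -0.05, 0.08, 0.02, -0.01, 0.04])
margin = 0.5  # "practically equivalent" means |mean difference| < 0.5

n = len(diffs)
se = diffs.std(ddof=1) / np.sqrt(n)
# Test 1 -- H0: mean <= -margin (reject if mean is well above -margin)
p_lower = stats.t.sf((diffs.mean() + margin) / se, df=n - 1)
# Test 2 -- H0: mean >= +margin (reject if mean is well below +margin)
p_upper = stats.t.cdf((diffs.mean() - margin) / se, df=n - 1)
# Equivalence requires rejecting both nulls, so take the larger p-value
tost_p = max(p_lower, p_upper)
print(tost_p < 0.05)
```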
API Reference
DifferenceInDifferences
DifferenceInDifferences(
robust=True, # Use HC1 robust standard errors
cluster=None, # Column for cluster-robust SEs
alpha=0.05 # Significance level for CIs
)
Methods:
| Method | Description |
|---|---|
| `fit(data, outcome, treatment, time, ...)` | Fit the DiD model |
| `summary()` | Get formatted summary string |
| `print_summary()` | Print summary to stdout |
| `get_params()` | Get estimator parameters (sklearn-compatible) |
| `set_params(**params)` | Set estimator parameters (sklearn-compatible) |
fit() Parameters:
| Parameter | Type | Description |
|---|---|---|
| `data` | DataFrame | Input data |
| `outcome` | str | Outcome variable column name |
| `treatment` | str | Treatment indicator column (0/1) |
| `time` | str | Post-treatment indicator column (0/1) |
| `formula` | str | R-style formula (alternative to column names) |
| `covariates` | list | Linear control variables |
| `fixed_effects` | list | Categorical FE columns (creates dummies) |
| `absorb` | list | High-dimensional FE (within-transformation) |
DiDResults
Attributes:
| Attribute | Description |
|---|---|
| `att` | Average treatment effect on the treated |
| `se` | Standard error of the ATT |
| `t_stat` | t-statistic |
| `p_value` | p-value for H0: ATT = 0 |
| `conf_int` | Tuple of (lower, upper) confidence bounds |
| `n_obs` | Number of observations |
| `n_treated` | Number of treated units |
| `n_control` | Number of control units |
| `r_squared` | R-squared of the regression |
| `coefficients` | Dictionary of all coefficients |
| `is_significant` | Boolean for significance at alpha |
| `significance_stars` | String of significance stars |
Methods:
| Method | Description |
|---|---|
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dict()` | Convert to dictionary |
| `to_dataframe()` | Convert to pandas DataFrame |
Requirements
- Python >= 3.9
- numpy >= 1.20
- pandas >= 1.3
- scipy >= 1.7
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black diff_diff tests
ruff check diff_diff tests
License
MIT License