diff-diff

A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs.

Installation

pip install diff-diff

Or install from source:

git clone https://github.com/igerber/diff-diff.git
cd diff-diff
pip install -e .

Quick Start

import pandas as pd
from diff_diff import DifferenceInDifferences

# Create sample data
data = pd.DataFrame({
    'outcome': [10, 11, 15, 18, 9, 10, 11, 13],
    'treated': [1, 1, 1, 1, 0, 0, 0, 0],
    'post': [0, 0, 1, 1, 0, 0, 1, 1]
})

# Fit the model
did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')

# View results
print(results)  # DiDResults(ATT=3.5000*, SE=1.2583, p=0.0367)
results.print_summary()

Output:

======================================================================
          Difference-in-Differences Estimation Results
======================================================================

Observations:                        8
Treated units:                       4
Control units:                       4
R-squared:                      0.9123

----------------------------------------------------------------------
Parameter         Estimate     Std. Err.     t-stat      P>|t|
----------------------------------------------------------------------
ATT                 3.5000       1.2583      2.782      0.0367
----------------------------------------------------------------------

95% Confidence Interval: [0.3912, 6.6088]

Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
======================================================================

Features

  • sklearn-like API: Familiar fit() interface with get_params() and set_params()
  • Pythonic results: Easy access to coefficients, standard errors, and confidence intervals
  • Multiple interfaces: Column names or R-style formulas
  • Robust inference: Heteroskedasticity-robust (HC1) and cluster-robust standard errors
  • Panel data support: Two-way fixed effects estimator for panel designs
  • Multi-period analysis: Event-study style DiD with period-specific treatment effects
  • Staggered adoption: Callaway-Sant'Anna (2021) estimator for heterogeneous treatment timing
  • Synthetic DiD: Combined DiD with synthetic control for improved robustness
  • Event study plots: Publication-ready visualization of treatment effects
  • Parallel trends testing: Multiple methods including equivalence tests
  • Data prep utilities: Helper functions for common data preparation tasks

Data Preparation

diff-diff provides utility functions to help prepare your data for DiD analysis. These functions handle common data transformation tasks like creating treatment indicators, reshaping panel data, and validating data formats.

Generate Sample Data

Create synthetic data with a known treatment effect for testing and learning:

from diff_diff import generate_did_data, DifferenceInDifferences

# Generate panel data with 100 units, 4 periods, and a treatment effect of 5
data = generate_did_data(
    n_units=100,
    n_periods=4,
    treatment_effect=5.0,
    treatment_fraction=0.5,  # 50% of units are treated
    treatment_period=2,       # Treatment starts at period 2
    seed=42
)

# Verify the estimator recovers the treatment effect
did = DifferenceInDifferences()
results = did.fit(data, outcome='outcome', treatment='treated', time='post')
print(f"Estimated ATT: {results.att:.2f} (true: 5.0)")

Create Treatment Indicators

Convert categorical variables or numeric thresholds to binary treatment indicators:

from diff_diff import make_treatment_indicator

# From categorical variable
df = make_treatment_indicator(
    data,
    column='state',
    treated_values=['CA', 'NY', 'TX']  # These states are treated
)

# From numeric threshold (e.g., firms above median size)
df = make_treatment_indicator(
    data,
    column='firm_size',
    threshold=data['firm_size'].median()
)

# Treat units below threshold
df = make_treatment_indicator(
    data,
    column='income',
    threshold=50000,
    above_threshold=False  # Units with income < 50000 are treated
)

Create Post-Treatment Indicators

Convert time/date columns to binary post-treatment indicators:

from diff_diff import make_post_indicator

# From specific post-treatment periods
df = make_post_indicator(
    data,
    time_column='year',
    post_periods=[2020, 2021, 2022]
)

# From treatment start date
df = make_post_indicator(
    data,
    time_column='year',
    treatment_start=2020  # All years >= 2020 are post-treatment
)

# Works with datetime columns
df = make_post_indicator(
    data,
    time_column='date',
    treatment_start='2020-01-01'
)

Reshape Wide to Long Format

Convert wide-format data (one row per unit, multiple time columns) to long format:

from diff_diff import wide_to_long

# Wide format: columns like sales_2019, sales_2020, sales_2021
wide_df = pd.DataFrame({
    'firm_id': [1, 2, 3],
    'industry': ['tech', 'retail', 'tech'],
    'sales_2019': [100, 150, 200],
    'sales_2020': [110, 160, 210],
    'sales_2021': [120, 170, 220]
})

# Convert to long format for DiD
long_df = wide_to_long(
    wide_df,
    value_columns=['sales_2019', 'sales_2020', 'sales_2021'],
    id_column='firm_id',
    time_name='year',
    value_name='sales',
    time_values=[2019, 2020, 2021]
)
# Result: 9 rows (3 firms × 3 years), columns: firm_id, year, sales, industry
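Under the hood this is a standard pandas reshape; a minimal plain-pandas equivalent of the example above (a sketch, not the library's implementation) would be:

```python
import pandas as pd

wide_df = pd.DataFrame({
    'firm_id': [1, 2, 3],
    'industry': ['tech', 'retail', 'tech'],
    'sales_2019': [100, 150, 200],
    'sales_2020': [110, 160, 210],
    'sales_2021': [120, 170, 220],
})

# Melt the sales_* columns into (year, sales) pairs, keeping id columns
long_df = wide_df.melt(
    id_vars=['firm_id', 'industry'],
    value_vars=['sales_2019', 'sales_2020', 'sales_2021'],
    var_name='year',
    value_name='sales',
)
# Strip the 'sales_' prefix to recover numeric years
long_df['year'] = long_df['year'].str.replace('sales_', '', regex=False).astype(int)
print(long_df.shape)  # (9, 4)
```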

Balance Panel Data

Ensure all units have observations for all time periods:

from diff_diff import balance_panel

# Keep only units with complete data (drop incomplete units)
balanced = balance_panel(
    data,
    unit_column='firm_id',
    time_column='year',
    method='inner'
)

# Include all unit-period combinations (creates NaN for missing)
balanced = balance_panel(
    data,
    unit_column='firm_id',
    time_column='year',
    method='outer'
)

# Fill missing values
balanced = balance_panel(
    data,
    unit_column='firm_id',
    time_column='year',
    method='fill',
    fill_value=0  # Or None for forward/backward fill
)

Validate Data

Check that your data meets DiD requirements before fitting:

from diff_diff import validate_did_data

# Validate and get informative error messages
result = validate_did_data(
    data,
    outcome='sales',
    treatment='treated',
    time='post',
    unit='firm_id',      # Optional: for panel-specific validation
    raise_on_error=False  # Return dict instead of raising
)

if result['valid']:
    print("Data is ready for DiD analysis!")
    print(f"Summary: {result['summary']}")
else:
    print("Issues found:")
    for error in result['errors']:
        print(f"  - {error}")

for warning in result['warnings']:
    print(f"Warning: {warning}")

Summarize Data by Groups

Get summary statistics for each treatment-time cell:

from diff_diff import summarize_did_data

summary = summarize_did_data(
    data,
    outcome='sales',
    treatment='treated',
    time='post'
)
print(summary)

Output:

                        n      mean       std       min       max
Control - Pre        250  100.5000   15.2340   65.0000  145.0000
Control - Post       250  105.2000   16.1230   68.0000  152.0000
Treated - Pre        250  101.2000   14.8900   67.0000  143.0000
Treated - Post       250  115.8000   17.5600   72.0000  165.0000
DiD Estimate           -    9.9000         -         -         -
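The DiD Estimate row is simply the double difference of the four cell means; computed by hand from the table above:

```python
# Cell means taken from the summary table above
treated_pre, treated_post = 101.2, 115.8
control_pre, control_post = 100.5, 105.2

# DiD estimate: change for the treated minus change for the controls
did_estimate = (treated_post - treated_pre) - (control_post - control_pre)
print(round(did_estimate, 4))  # 9.9
```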

Create Event Time for Staggered Designs

For designs where treatment occurs at different times:

from diff_diff import create_event_time

# Add event-time column relative to treatment timing
df = create_event_time(
    data,
    time_column='year',
    treatment_time_column='treatment_year'
)
# Result: event_time = -2, -1, 0, 1, 2 relative to treatment
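Conceptually this helper is a single subtraction; a plain-pandas sketch of the same operation (using the column names from the example above):

```python
import pandas as pd

data = pd.DataFrame({
    'year': [2018, 2019, 2020, 2021],
    'treatment_year': [2020, 2020, 2020, 2020],
})

# Event time: periods elapsed relative to each unit's treatment start
data['event_time'] = data['year'] - data['treatment_year']
print(data['event_time'].tolist())  # [-2, -1, 0, 1]
```

Never-treated units (however they are coded in the treatment-time column) need separate handling, which is where the helper earns its keep.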

Aggregate to Cohort Means

Aggregate unit-level data for visualization:

from diff_diff import aggregate_to_cohorts

cohort_data = aggregate_to_cohorts(
    data,
    unit_column='firm_id',
    time_column='year',
    treatment_column='treated',
    outcome='sales'
)
# Result: mean outcome by treatment group and period

Rank Control Units

Select the best control units for DiD or Synthetic DiD analysis by ranking them based on pre-treatment outcome similarity:

from diff_diff import rank_control_units, generate_did_data

# Generate sample data
data = generate_did_data(n_units=50, n_periods=6, seed=42)

# Rank control units by their similarity to treated units
ranking = rank_control_units(
    data,
    unit_column='unit',
    time_column='period',
    outcome_column='outcome',
    treatment_column='treated',
    n_top=10  # Return top 10 controls
)

print(ranking[['unit', 'quality_score', 'pre_trend_rmse']])

Output:

   unit  quality_score  pre_trend_rmse
0    35         1.0000          0.4521
1    42         0.9234          0.5123
2    28         0.8876          0.5892
...

With covariates for matching:

# Add covariate-based matching
ranking = rank_control_units(
    data,
    unit_column='unit',
    time_column='period',
    outcome_column='outcome',
    treatment_column='treated',
    covariates=['size', 'age'],  # Match on these too
    outcome_weight=0.7,          # 70% weight on outcome trends
    covariate_weight=0.3         # 30% weight on covariate similarity
)

Filter data for SyntheticDiD using top controls:

from diff_diff import SyntheticDiD

# Get top control units
top_controls = ranking['unit'].tolist()

# Filter data to treated + top controls
filtered_data = data[
    (data['treated'] == 1) | (data['unit'].isin(top_controls))
]

# Fit SyntheticDiD with selected controls
sdid = SyntheticDiD()
results = sdid.fit(
    filtered_data,
    outcome='outcome',
    treatment='treated',
    unit='unit',
    time='period',
    post_periods=[3, 4, 5]
)

Usage

Basic DiD with Column Names

from diff_diff import DifferenceInDifferences

did = DifferenceInDifferences(robust=True, alpha=0.05)
results = did.fit(
    data,
    outcome='sales',
    treatment='treated',
    time='post_policy'
)

# Access results
print(f"ATT: {results.att:.4f}")
print(f"Standard Error: {results.se:.4f}")
print(f"P-value: {results.p_value:.4f}")
print(f"95% CI: {results.conf_int}")
print(f"Significant: {results.is_significant}")

Using Formula Interface

# R-style formula syntax
results = did.fit(data, formula='outcome ~ treated * post')

# Explicit interaction syntax
results = did.fit(data, formula='outcome ~ treated + post + treated:post')

# With covariates
results = did.fit(data, formula='outcome ~ treated * post + age + income')

Including Covariates

results = did.fit(
    data,
    outcome='outcome',
    treatment='treated',
    time='post',
    covariates=['age', 'income', 'education']
)

Fixed Effects

Use fixed_effects for low-dimensional categorical controls (creates dummy variables):

# State and industry fixed effects
results = did.fit(
    data,
    outcome='sales',
    treatment='treated',
    time='post',
    fixed_effects=['state', 'industry']
)

# Access fixed effect coefficients
state_coefs = {k: v for k, v in results.coefficients.items() if k.startswith('state_')}

Use absorb for high-dimensional fixed effects (more efficient, uses within-transformation):

# Absorb firm-level fixed effects (efficient for many firms)
results = did.fit(
    data,
    outcome='sales',
    treatment='treated',
    time='post',
    absorb=['firm_id']
)
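The within-transformation behind absorb can be illustrated in plain pandas: demean the outcome and regressors within each absorbed group, then run OLS on the demeaned data. A minimal sketch on synthetic data (an illustration of the idea, not the library's internals):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_firms, n_periods = 50, 4

df = pd.DataFrame({
    'firm_id': np.repeat(np.arange(n_firms), n_periods),
    'x': rng.normal(size=n_firms * n_periods),
})
# Outcome with firm fixed effects plus a true slope of 2 on x
firm_fe = np.repeat(rng.normal(scale=3.0, size=n_firms), n_periods)
df['y'] = firm_fe + 2.0 * df['x'] + rng.normal(scale=0.1, size=len(df))

# Within-transformation: subtract each firm's mean from y and x
demeaned = df[['y', 'x']] - df.groupby('firm_id')[['y', 'x']].transform('mean')

# OLS on the demeaned data recovers the slope without any firm dummies
slope = (demeaned['x'] @ demeaned['y']) / (demeaned['x'] @ demeaned['x'])
print(round(slope, 3))  # close to the true slope of 2
```

This is why absorb scales to thousands of units: no dummy columns are ever materialized.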

Combine covariates with fixed effects:

results = did.fit(
    data,
    outcome='sales',
    treatment='treated',
    time='post',
    covariates=['size', 'age'],           # Linear controls
    fixed_effects=['industry'],            # Low-dimensional FE (dummies)
    absorb=['firm_id']                     # High-dimensional FE (absorbed)
)

Cluster-Robust Standard Errors

did = DifferenceInDifferences(cluster='state')
results = did.fit(
    data,
    outcome='outcome',
    treatment='treated',
    time='post'
)

Two-Way Fixed Effects (Panel Data)

from diff_diff.estimators import TwoWayFixedEffects

twfe = TwoWayFixedEffects()
results = twfe.fit(
    panel_data,
    outcome='outcome',
    treatment='treated',
    time='year',
    unit='firm_id'
)

Multi-Period DiD (Event Study)

For settings with multiple pre- and post-treatment periods:

from diff_diff import MultiPeriodDiD

# Fit with multiple time periods
did = MultiPeriodDiD()
results = did.fit(
    panel_data,
    outcome='sales',
    treatment='treated',
    time='period',
    post_periods=[3, 4, 5],      # Periods 3-5 are post-treatment
    reference_period=0           # Reference period for comparison
)

# View period-specific treatment effects
for period, effect in results.period_effects.items():
    print(f"Period {period}: {effect.effect:.3f} (SE: {effect.se:.3f})")

# View average treatment effect across post-periods
print(f"Average ATT: {results.avg_att:.3f}")
print(f"Average SE: {results.avg_se:.3f}")

# Full summary with all period effects
results.print_summary()

Output:

================================================================================
            Multi-Period Difference-in-Differences Estimation Results
================================================================================

Observations:                      600
Pre-treatment periods:             3
Post-treatment periods:            3

--------------------------------------------------------------------------------
Average Treatment Effect
--------------------------------------------------------------------------------
Average ATT       5.2000       0.8234      6.315      0.0000
--------------------------------------------------------------------------------
95% Confidence Interval: [3.5862, 6.8138]

Period-Specific Effects:
--------------------------------------------------------------------------------
Period            Effect     Std. Err.     t-stat      P>|t|
--------------------------------------------------------------------------------
3                 4.5000       0.9512      4.731      0.0000***
4                 5.2000       0.8876      5.858      0.0000***
5                 5.9000       0.9123      6.468      0.0000***
--------------------------------------------------------------------------------

Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
================================================================================

Staggered Difference-in-Differences (Callaway-Sant'Anna)

When treatment is adopted at different times by different units, the traditional two-way fixed effects (TWFE) estimator can be biased because already-treated units implicitly serve as controls for later adopters. The Callaway-Sant'Anna estimator avoids these comparisons by estimating group-time effects ATT(g, t) against clean control groups, giving valid estimates under staggered adoption.
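The estimator's building block, ATT(g, t), is just a 2x2 DiD comparing cohort g to never-treated units between the base period g - 1 and period t. A hand-rolled pandas sketch on a toy panel (column names are illustrative, not the library's internals):

```python
import pandas as pd

# Toy long panel: unit 1 is treated from period 2 on (first_treat == 2);
# units 2 and 3 are never treated (first_treat == 0)
panel = pd.DataFrame({
    'unit':        [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'year':        [0, 1, 2, 0, 1, 2, 0, 1, 2],
    'first_treat': [2, 2, 2, 0, 0, 0, 0, 0, 0],
    'sales':       [5.0, 6.0, 9.5, 4.0, 5.0, 6.0, 3.0, 4.0, 5.0],
})

def att_gt(df, g, t):
    """ATT(g, t): 2x2 DiD of cohort g vs. never-treated, base period g - 1."""
    cohort = df[df['first_treat'] == g]
    never = df[df['first_treat'] == 0]
    d_cohort = (cohort.loc[cohort['year'] == t, 'sales'].mean()
                - cohort.loc[cohort['year'] == g - 1, 'sales'].mean())
    d_never = (never.loc[never['year'] == t, 'sales'].mean()
               - never.loc[never['year'] == g - 1, 'sales'].mean())
    return d_cohort - d_never

print(att_gt(panel, g=2, t=2))  # 2.5
```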

from diff_diff import CallawaySantAnna

# Panel data with staggered treatment
# 'first_treat' = period when unit was first treated (0 if never treated)
cs = CallawaySantAnna()
results = cs.fit(
    panel_data,
    outcome='sales',
    unit='firm_id',
    time='year',
    first_treat='first_treat',  # 0 for never-treated, else first treatment year
    aggregate='event_study'      # Compute event study effects
)

# View results
results.print_summary()

# Access group-time effects ATT(g,t)
for (group, time), effect in results.group_time_effects.items():
    print(f"Cohort {group}, Period {time}: {effect['effect']:.3f}")

# Event study effects (averaged by relative time)
for rel_time, effect in results.event_study_effects.items():
    print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})")

# Convert to DataFrame
df = results.to_dataframe(level='event_study')

Output:

=====================================================================================
          Callaway-Sant'Anna Staggered Difference-in-Differences Results
=====================================================================================

Total observations:                     600
Treated units:                           35
Control units:                           15
Treatment cohorts:                        3
Time periods:                             8
Control group:                never_treated

-------------------------------------------------------------------------------------
                  Overall Average Treatment Effect on the Treated
-------------------------------------------------------------------------------------
Parameter         Estimate     Std. Err.     t-stat      P>|t|   Sig.
-------------------------------------------------------------------------------------
ATT                 2.5000       0.3521       7.101     0.0000   ***
-------------------------------------------------------------------------------------

95% Confidence Interval: [1.8099, 3.1901]

-------------------------------------------------------------------------------------
                          Event Study (Dynamic) Effects
-------------------------------------------------------------------------------------
Rel. Period       Estimate     Std. Err.     t-stat      P>|t|   Sig.
-------------------------------------------------------------------------------------
0                   2.1000       0.4521       4.645     0.0000   ***
1                   2.5000       0.4123       6.064     0.0000   ***
2                   2.8000       0.5234       5.349     0.0000   ***
-------------------------------------------------------------------------------------

Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
=====================================================================================

When to use Callaway-Sant'Anna vs TWFE:

Scenario                                      Recommended estimator
All units treated at the same time            TWFE
Staggered adoption, homogeneous effects       Either (TWFE remains unbiased)
Staggered adoption, heterogeneous effects     Callaway-Sant'Anna
Need event study with staggered timing        Callaway-Sant'Anna
Fewer than ~20 treated units                  Depends on design

Parameters:

CallawaySantAnna(
    control_group='never_treated',  # or 'not_yet_treated'
    anticipation=0,                  # Periods before treatment with effects
    estimation_method='dr',          # 'dr', 'ipw', or 'reg'
    alpha=0.05,                      # Significance level
    cluster=None,                    # Column for cluster SEs
    n_bootstrap=0,                   # Must be 0 (bootstrap not yet implemented)
    seed=None                        # Random seed
)

Current limitations:

  • Bootstrap inference (n_bootstrap > 0) is not yet implemented
  • Covariate adjustment for conditional parallel trends is not yet implemented

Event Study Visualization

Create publication-ready event study plots:

from diff_diff import plot_event_study, MultiPeriodDiD, CallawaySantAnna

# From MultiPeriodDiD
did = MultiPeriodDiD()
results = did.fit(data, outcome='y', treatment='treated',
                  time='period', post_periods=[3, 4, 5])
plot_event_study(results, title="Treatment Effects Over Time")

# From CallawaySantAnna (with event study aggregation)
cs = CallawaySantAnna()
results = cs.fit(data, outcome='y', unit='unit', time='period',
                 first_treat='first_treat', aggregate='event_study')
plot_event_study(results, title="Staggered DiD Event Study")

# From a DataFrame
df = pd.DataFrame({
    'period': [-2, -1, 0, 1, 2],
    'effect': [0.1, 0.05, 0.0, 2.5, 2.8],
    'se': [0.3, 0.25, 0.0, 0.4, 0.45]
})
plot_event_study(df, reference_period=0)

# With customization
ax = plot_event_study(
    results,
    title="Dynamic Treatment Effects",
    xlabel="Years Relative to Treatment",
    ylabel="Effect on Sales ($1000s)",
    color="#2563eb",
    marker="o",
    shade_pre=True,           # Shade pre-treatment region
    show_zero_line=True,      # Horizontal line at y=0
    show_reference_line=True, # Vertical line at reference period
    figsize=(10, 6),
    show=False                # Don't call plt.show(), return axes
)

Synthetic Difference-in-Differences

Synthetic DiD combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units' pre-treatment outcomes.
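The unit-weighting idea can be sketched directly: pick non-negative control weights that sum to one and minimize the pre-treatment fit error. A simplified illustration with scipy (the actual SDiD objective also includes regularization and an intercept term):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Pre-treatment outcomes: rows are periods, columns are control units
Y_control = rng.normal(size=(10, 4))
# Treated unit's pre-treatment path: a mix of controls 0 and 1 plus noise
y_treated = 0.7 * Y_control[:, 0] + 0.3 * Y_control[:, 1] + rng.normal(scale=0.01, size=10)

n_controls = Y_control.shape[1]

def pre_fit_loss(w):
    # Squared error between the weighted control average and the treated path
    return np.sum((Y_control @ w - y_treated) ** 2)

res = minimize(
    pre_fit_loss,
    x0=np.full(n_controls, 1.0 / n_controls),      # start from uniform weights
    bounds=[(0.0, 1.0)] * n_controls,              # non-negative weights
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum to one
    method="SLSQP",
)
weights = res.x
print(np.round(weights, 2))  # weight concentrates on controls 0 and 1
```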

from diff_diff import SyntheticDiD

# Fit Synthetic DiD model
sdid = SyntheticDiD()
results = sdid.fit(
    panel_data,
    outcome='gdp_growth',
    treatment='treated',
    unit='state',
    time='year',
    post_periods=[2015, 2016, 2017, 2018]
)

# View results
results.print_summary()
print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})")

# Examine unit weights (which control units matter most)
weights_df = results.get_unit_weights_df()
print(weights_df.head(10))

# Examine time weights
time_weights_df = results.get_time_weights_df()
print(time_weights_df)

Output:

===========================================================================
         Synthetic Difference-in-Differences Estimation Results
===========================================================================

Observations:                      500
Treated units:                       1
Control units:                      49
Pre-treatment periods:               6
Post-treatment periods:              4
Regularization (lambda):        0.0000
Pre-treatment fit (RMSE):       0.1234

---------------------------------------------------------------------------
Parameter         Estimate     Std. Err.     t-stat      P>|t|
---------------------------------------------------------------------------
ATT                 2.5000       0.4521      5.530      0.0000
---------------------------------------------------------------------------

95% Confidence Interval: [1.6139, 3.3861]

---------------------------------------------------------------------------
                   Top Unit Weights (Synthetic Control)
---------------------------------------------------------------------------
  Unit state_12: 0.3521
  Unit state_5: 0.2156
  Unit state_23: 0.1834
  Unit state_8: 0.1245
  Unit state_31: 0.0892
  (8 units with weight > 0.001)

Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
===========================================================================

When to Use Synthetic DiD Over Vanilla DiD

Use Synthetic DiD instead of standard DiD when:

  1. Few treated units: When you have only one or a small number of treated units (e.g., a single state passed a policy), standard DiD averages across all controls equally. Synthetic DiD finds the optimal weighted combination of controls.

    # Example: California passed a policy, want to estimate its effect
    # Standard DiD would compare CA to the average of all other states
    # Synthetic DiD finds states that together best match CA's pre-treatment trend
    
  2. Parallel trends is questionable: When treated and control groups have different pre-treatment levels or trends, Synthetic DiD can construct a better counterfactual by matching the pre-treatment trajectory.

    # Example: A tech hub city vs rural areas
    # Rural areas may not be a good comparison on average
    # Synthetic DiD can weight urban/suburban controls more heavily
    
  3. Heterogeneous control units: When control units are very different from each other, equal weighting (as in standard DiD) is suboptimal.

    # Example: Comparing a treated developing country to other countries
    # Some control countries may be much more similar economically
    # Synthetic DiD upweights the most comparable controls
    
  4. You want transparency: Synthetic DiD provides explicit unit weights showing which controls contribute most to the comparison.

    # See exactly which units are driving the counterfactual
    print(results.get_unit_weights_df())
    

Key differences from standard DiD:

Aspect               Standard DiD            Synthetic DiD
Control weighting    Equal (1/N)             Optimized to match pre-treatment
Time weighting       Equal across periods    Can emphasize informative periods
N treated required   Can be many             Works with 1 treated unit
Parallel trends      Assumed                 Partially relaxed via matching
Interpretability     Simple average          Explicit weights

Parameters:

SyntheticDiD(
    lambda_reg=0.0,     # Regularization toward uniform weights (0 = no reg)
    zeta=1.0,           # Time weight regularization (higher = more uniform)
    alpha=0.05,         # Significance level
    n_bootstrap=200,    # Bootstrap iterations for SE (0 = placebo-based)
    seed=None           # Random seed for reproducibility
)

Working with Results

Export Results

# As dictionary
results.to_dict()
# {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...}

# As DataFrame
df = results.to_dataframe()

Check Significance

if results.is_significant:
    print(f"Effect is significant at {did.alpha} level")

# Get significance stars
print(f"ATT: {results.att}{results.significance_stars}")
# ATT: 3.5000*

Access Full Regression Output

# All coefficients
results.coefficients
# {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5}

# Variance-covariance matrix
results.vcov

# Residuals and fitted values
results.residuals
results.fitted_values

# R-squared
results.r_squared

Checking Assumptions

Parallel Trends

Simple slope-based test:

from diff_diff.utils import check_parallel_trends

trends = check_parallel_trends(
    data,
    outcome='outcome',
    time='period',
    treatment_group='treated'
)

print(f"Treated trend: {trends['treated_trend']:.4f}")
print(f"Control trend: {trends['control_trend']:.4f}")
print(f"Difference p-value: {trends['p_value']:.4f}")

Robust distributional test (Wasserstein distance):

from diff_diff.utils import check_parallel_trends_robust

results = check_parallel_trends_robust(
    data,
    outcome='outcome',
    time='period',
    treatment_group='treated',
    unit='firm_id',              # Unit identifier for panel data
    pre_periods=[2018, 2019],    # Pre-treatment periods
    n_permutations=1000          # Permutations for p-value
)

print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}")
print(f"Wasserstein p-value: {results['wasserstein_p_value']:.4f}")
print(f"KS test p-value: {results['ks_p_value']:.4f}")
print(f"Parallel trends plausible: {results['parallel_trends_plausible']}")

The Wasserstein (Earth Mover's) distance compares the full distribution of outcome changes, not just means. This is more robust to:

  • Non-normal distributions
  • Heterogeneous effects across units
  • Outliers
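A minimal version of this check can be rolled by hand with scipy, using a permutation test on the Wasserstein distance between the two groups' outcome changes (a sketch of the idea, not the library's implementation):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(42)

# Per-unit pre-treatment outcome changes for each group (toy data drawn
# from the same distribution, so trends should look parallel)
treated_changes = rng.normal(loc=0.5, scale=1.0, size=40)
control_changes = rng.normal(loc=0.5, scale=1.0, size=60)

observed = wasserstein_distance(treated_changes, control_changes)

# Permutation test: shuffle group labels and recompute the distance
pooled = np.concatenate([treated_changes, control_changes])
n_treated = len(treated_changes)
perm_stats = np.empty(1000)
for i in range(1000):
    rng.shuffle(pooled)
    perm_stats[i] = wasserstein_distance(pooled[:n_treated], pooled[n_treated:])

# Share of permuted distances at least as large as the observed one
p_value = float(np.mean(perm_stats >= observed))
print(f"distance={observed:.4f}, p={p_value:.3f}")
```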

Equivalence testing (TOST):

from diff_diff.utils import equivalence_test_trends

results = equivalence_test_trends(
    data,
    outcome='outcome',
    time='period',
    treatment_group='treated',
    unit='firm_id',
    equivalence_margin=0.5       # Define "practically equivalent"
)

print(f"Mean difference: {results['mean_difference']:.4f}")
print(f"TOST p-value: {results['tost_p_value']:.4f}")
print(f"Trends equivalent: {results['equivalent']}")

API Reference

DifferenceInDifferences

DifferenceInDifferences(
    robust=True,      # Use HC1 robust standard errors
    cluster=None,     # Column for cluster-robust SEs
    alpha=0.05        # Significance level for CIs
)

Methods:

Method                                     Description
fit(data, outcome, treatment, time, ...)   Fit the DiD model
summary()                                  Get formatted summary string
print_summary()                            Print summary to stdout
get_params()                               Get estimator parameters (sklearn-compatible)
set_params(**params)                       Set estimator parameters (sklearn-compatible)

fit() Parameters:

Parameter       Type        Description
data            DataFrame   Input data
outcome         str         Outcome variable column name
treatment       str         Treatment indicator column (0/1)
time            str         Post-treatment indicator column (0/1)
formula         str         R-style formula (alternative to column names)
covariates      list        Linear control variables
fixed_effects   list        Categorical FE columns (creates dummies)
absorb          list        High-dimensional FE (within-transformation)

DiDResults

Attributes:

Attribute            Description
att                  Average treatment effect on the treated
se                   Standard error of ATT
t_stat               T-statistic
p_value              P-value for H0: ATT = 0
conf_int             Tuple of (lower, upper) confidence bounds
n_obs                Number of observations
n_treated            Number of treated units
n_control            Number of control units
r_squared            R-squared of regression
coefficients         Dictionary of all coefficients
is_significant       Boolean for significance at alpha
significance_stars   String of significance stars

Methods:

Method                 Description
summary(alpha)         Get formatted summary string
print_summary(alpha)   Print summary to stdout
to_dict()              Convert to dictionary
to_dataframe()         Convert to pandas DataFrame

MultiPeriodDiD

MultiPeriodDiD(
    robust=True,      # Use HC1 robust standard errors
    cluster=None,     # Column for cluster-robust SEs
    alpha=0.05        # Significance level for CIs
)

fit() Parameters:

Parameter          Type        Description
data               DataFrame   Input data
outcome            str         Outcome variable column name
treatment          str         Treatment indicator column (0/1)
time               str         Time period column (multiple values)
post_periods       list        List of post-treatment period values
covariates         list        Linear control variables
fixed_effects      list        Categorical FE columns (creates dummies)
absorb             list        High-dimensional FE (within-transformation)
reference_period   any         Omitted period for time dummies

MultiPeriodDiDResults

Attributes:

Attribute        Description
period_effects   Dict mapping periods to PeriodEffect objects
avg_att          Average ATT across post-treatment periods
avg_se           Standard error of average ATT
avg_t_stat       T-statistic for average ATT
avg_p_value      P-value for average ATT
avg_conf_int     Confidence interval for average ATT
n_obs            Number of observations
pre_periods      List of pre-treatment periods
post_periods     List of post-treatment periods

Methods:

Method                 Description
get_effect(period)     Get PeriodEffect for specific period
summary(alpha)         Get formatted summary string
print_summary(alpha)   Print summary to stdout
to_dict()              Convert to dictionary
to_dataframe()         Convert to pandas DataFrame

PeriodEffect

Attributes:

Attribute            Description
period               Time period identifier
effect               Treatment effect estimate
se                   Standard error
t_stat               T-statistic
p_value              P-value
conf_int             Confidence interval
is_significant       Boolean for significance at 0.05
significance_stars   String of significance stars

SyntheticDiD

SyntheticDiD(
    lambda_reg=0.0,     # L2 regularization for unit weights
    zeta=1.0,           # Regularization for time weights
    alpha=0.05,         # Significance level for CIs
    n_bootstrap=200,    # Bootstrap iterations for SE
    seed=None           # Random seed for reproducibility
)

fit() Parameters:

Parameter      Type        Description
data           DataFrame   Panel data
outcome        str         Outcome variable column name
treatment      str         Treatment indicator column (0/1)
unit           str         Unit identifier column
time           str         Time period column
post_periods   list        List of post-treatment period values
covariates     list        Covariates to residualize out

SyntheticDiDResults

Attributes:

Attribute           Description
att                 Average treatment effect on the treated
se                  Standard error (bootstrap or placebo-based)
t_stat              T-statistic
p_value             P-value
conf_int            Confidence interval
n_obs               Number of observations
n_treated           Number of treated units
n_control           Number of control units
unit_weights        Dict mapping control unit IDs to weights
time_weights        Dict mapping pre-treatment periods to weights
pre_periods         List of pre-treatment periods
post_periods        List of post-treatment periods
pre_treatment_fit   RMSE of synthetic vs treated in pre-period
placebo_effects     Array of placebo effect estimates

Methods:

Method                  Description
summary(alpha)          Get formatted summary string
print_summary(alpha)    Print summary to stdout
to_dict()               Convert to dictionary
to_dataframe()          Convert to pandas DataFrame
get_unit_weights_df()   Get unit weights as DataFrame
get_time_weights_df()   Get time weights as DataFrame

Data Preparation Functions

generate_did_data

generate_did_data(
    n_units=100,          # Number of units
    n_periods=4,          # Number of time periods
    treatment_effect=5.0, # True ATT
    treatment_fraction=0.5,  # Fraction treated
    treatment_period=2,   # First post-treatment period
    unit_fe_sd=2.0,       # Unit fixed effect std dev
    time_trend=0.5,       # Linear time trend
    noise_sd=1.0,         # Idiosyncratic noise std dev
    seed=None             # Random seed
)

Returns DataFrame with columns: unit, period, treated, post, outcome, true_effect.

make_treatment_indicator

make_treatment_indicator(
    data,                 # Input DataFrame
    column,               # Column to create treatment from
    treated_values=None,  # Value(s) indicating treatment
    threshold=None,       # Numeric threshold for treatment
    above_threshold=True, # If True, >= threshold is treated
    new_column='treated'  # Output column name
)
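The threshold variant corresponds to a one-liner in plain pandas, which is useful for sanity-checking the helper's output (an illustration with made-up data, not the library's implementation):

```python
import pandas as pd

df = pd.DataFrame({"state": ["CA", "TX", "NY"], "min_wage": [15.5, 7.25, 14.2]})

# Roughly what make_treatment_indicator(df, "min_wage", threshold=10.0) produces:
# units at or above the threshold are coded 1 (above_threshold=True).
df["treated"] = (df["min_wage"] >= 10.0).astype(int)
print(df["treated"].tolist())  # [1, 0, 1]
```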

make_post_indicator

make_post_indicator(
    data,                  # Input DataFrame
    time_column,           # Time/period column
    post_periods=None,     # Specific post-treatment period(s)
    treatment_start=None,  # First post-treatment period
    new_column='post'      # Output column name
)

wide_to_long

wide_to_long(
    data,                  # Wide-format DataFrame
    value_columns,         # List of time-varying columns
    id_column,             # Unit identifier column
    time_name='period',    # Name for time column
    value_name='value',    # Name for value column
    time_values=None       # Values for time periods
)
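This is a standard reshape; a plain-pandas sketch of the same transformation, with hypothetical column names (the library's handling of time_values may differ in detail):

```python
import pandas as pd

wide = pd.DataFrame({
    "unit": ["A", "B"],
    "sales_2019": [10, 20],
    "sales_2020": [12, 25],
})

# Roughly equivalent to wide_to_long(wide, ["sales_2019", "sales_2020"],
#     id_column="unit", time_name="period", value_name="value")
long = wide.melt(id_vars="unit", value_vars=["sales_2019", "sales_2020"],
                 var_name="period", value_name="value")
long["period"] = long["period"].str.replace("sales_", "", regex=False).astype(int)
print(long.sort_values(["unit", "period"]).to_string(index=False))
```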

balance_panel

balance_panel(
    data,                  # Panel DataFrame
    unit_column,           # Unit identifier column
    time_column,           # Time period column
    method='inner',        # 'inner', 'outer', or 'fill'
    fill_value=None        # Value for filling (if method='fill')
)
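Balancing to the full unit-by-period grid can be sketched with a MultiIndex reindex; the snippet below approximates the 'fill' behavior on made-up data, whereas method='inner' would instead drop units not observed in every period:

```python
import pandas as pd

panel = pd.DataFrame({
    "unit":   ["A", "A", "B"],   # unit B is missing period 2
    "period": [1, 2, 1],
    "y":      [1.0, 2.0, 3.0],
})

# Full unit x period grid.
full = pd.MultiIndex.from_product(
    [panel["unit"].unique(), panel["period"].unique()],
    names=["unit", "period"],
)
balanced = (
    panel.set_index(["unit", "period"])
    .reindex(full, fill_value=0.0)   # like method='fill', fill_value=0.0
    .reset_index()
)
print(len(balanced))  # 4
```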

validate_did_data

validate_did_data(
    data,                  # DataFrame to validate
    outcome,               # Outcome column name
    treatment,             # Treatment column name
    time,                  # Time/post column name
    unit=None,             # Unit column (for panel validation)
    raise_on_error=True    # Raise ValueError or return dict
)

Returns dict with valid, errors, warnings, and summary keys.

summarize_did_data

summarize_did_data(
    data,                  # Input DataFrame
    outcome,               # Outcome column name
    treatment,             # Treatment column name
    time,                  # Time/post column name
    unit=None              # Unit column (optional)
)

Returns DataFrame with summary statistics by treatment-time cell.
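The core of such a summary is a groupby over the 2x2 treatment-time cells. A minimal approximation using the Quick Start data (the library's actual output likely includes more statistics):

```python
import pandas as pd

df = pd.DataFrame({
    "outcome": [10, 11, 15, 18, 9, 10, 12, 13],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    "post":    [0, 0, 1, 1, 0, 0, 1, 1],
})

# One row per treatment-time cell with count, mean, and std of the outcome.
cells = (
    df.groupby(["treated", "post"])["outcome"]
    .agg(["count", "mean", "std"])
    .reset_index()
)
print(cells)
```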

create_event_time

create_event_time(
    data,                  # Panel DataFrame
    time_column,           # Calendar time column
    treatment_time_column, # Column with treatment timing
    new_column='event_time' # Output column name
)
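Event time is calendar time recentered on each unit's treatment date, so the helper's core is a subtraction (a sketch on made-up data, not the library's code):

```python
import pandas as pd

df = pd.DataFrame({
    "unit":           ["A", "A", "B", "B"],
    "year":           [2019, 2020, 2019, 2020],
    "treatment_year": [2020, 2020, 2019, 2019],
})

# Roughly what create_event_time(df, "year", "treatment_year") computes:
# 0 marks the first treated period, negative values are pre-treatment.
df["event_time"] = df["year"] - df["treatment_year"]
print(df["event_time"].tolist())  # [-1, 0, 0, 1]
```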

aggregate_to_cohorts

aggregate_to_cohorts(
    data,                  # Unit-level panel data
    unit_column,           # Unit identifier column
    time_column,           # Time period column
    treatment_column,      # Treatment indicator column
    outcome,               # Outcome variable column
    covariates=None        # Additional columns to aggregate
)

rank_control_units

rank_control_units(
    data,                          # Panel data in long format
    unit_column,                   # Unit identifier column
    time_column,                   # Time period column
    outcome_column,                # Outcome variable column
    treatment_column=None,         # Treatment indicator column (0/1)
    treated_units=None,            # Explicit list of treated unit IDs
    pre_periods=None,              # Pre-treatment periods (default: first half)
    covariates=None,               # Covariate columns for matching
    outcome_weight=0.7,            # Weight for outcome trend similarity (0-1)
    covariate_weight=0.3,          # Weight for covariate distance (0-1)
    exclude_units=None,            # Units to exclude from control pool
    require_units=None,            # Units that must appear in output
    n_top=None,                    # Return only top N controls
    suggest_treatment_candidates=False,  # Identify treatment candidates
    n_treatment_candidates=5,      # Number of treatment candidates
    lambda_reg=0.0                 # Regularization for synthetic weights
)

Returns DataFrame with columns: unit, quality_score, outcome_trend_score, covariate_score, synthetic_weight, pre_trend_rmse, is_required.
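The pre_trend_rmse column compares a control's pre-period outcome path against the treated-group average; the metric itself is a root-mean-squared gap, sketched here on hypothetical paths:

```python
import numpy as np

treated_mean = np.array([10.0, 11.0, 12.0])  # treated-group pre-period path
control_path = np.array([10.5, 10.5, 12.5])  # one candidate control's path

# RMSE between the control's pre-trend and the treated average;
# smaller values indicate a better-matched control.
pre_trend_rmse = np.sqrt(np.mean((control_path - treated_mean) ** 2))
print(pre_trend_rmse)  # 0.5
```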

Requirements

  • Python >= 3.9
  • numpy >= 1.20
  • pandas >= 1.3
  • scipy >= 1.7

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black diff_diff tests
ruff check diff_diff tests

References

This library implements methods from the following scholarly works:

Difference-in-Differences

  • Ashenfelter, O., & Card, D. (1985). "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs." The Review of Economics and Statistics, 67(4), 648-660. https://doi.org/10.2307/1924810

  • Card, D., & Krueger, A. B. (1994). "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." The American Economic Review, 84(4), 772-793. https://www.jstor.org/stable/2118030

  • Angrist, J. D., & Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press. Chapter 5: Differences-in-Differences.

Two-Way Fixed Effects

  • Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.

  • Imai, K., & Kim, I. S. (2021). "On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data." Political Analysis, 29(3), 405-415. https://doi.org/10.1017/pan.2020.33

Robust Standard Errors

  • White, H. (1980). "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica, 48(4), 817-838. https://doi.org/10.2307/1912934

  • MacKinnon, J. G., & White, H. (1985). "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." Journal of Econometrics, 29(3), 305-325. https://doi.org/10.1016/0304-4076(85)90158-7

  • Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011). "Robust Inference With Multiway Clustering." Journal of Business & Economic Statistics, 29(2), 238-249. https://doi.org/10.1198/jbes.2010.07136

Synthetic Control Method

  • Abadie, A., & Gardeazabal, J. (2003). "The Economic Costs of Conflict: A Case Study of the Basque Country." The American Economic Review, 93(1), 113-132. https://doi.org/10.1257/000282803321455188

  • Abadie, A., Diamond, A., & Hainmueller, J. (2010). "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." Journal of the American Statistical Association, 105(490), 493-505. https://doi.org/10.1198/jasa.2009.ap08746

  • Abadie, A., Diamond, A., & Hainmueller, J. (2015). "Comparative Politics and the Synthetic Control Method." American Journal of Political Science, 59(2), 495-510. https://doi.org/10.1111/ajps.12116

Synthetic Difference-in-Differences

  • Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). "Synthetic Difference-in-Differences." American Economic Review, 111(12), 4088-4118. https://doi.org/10.1257/aer.20190159

Parallel Trends and Pre-Trend Testing

Multi-Period and Staggered Adoption

General Causal Inference

  • Imbens, G. W., & Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.

  • Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/

License

MIT License
