Coarsened Exact Matching for Causal Inference

cem: Coarsened Exact Matching for Causal Inference

cem is a lightweight library for Coarsened Exact Matching (CEM). CEM is a matching technique that reduces covariate imbalance between treated and control groups; left uncorrected, such imbalance makes treatment effect estimates sensitive to model specification. CEM temporarily coarsens each covariate into bins, matches treated and control observations that fall into the same combination of bins, and removes or reweights the rest. The resulting treatment effect estimates can be more stable than those from other matching techniques such as propensity score matching. The L1 and L2 multivariate imbalance measures are implemented as described in [2].
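To make the L1 measure concrete, here is a minimal pure-Python sketch that is independent of the cem package. It assumes the definition given in [2]: cross-tabulate the coarsened covariates, compute the relative frequency of treated and of control units in each cell, and take half the sum of the absolute differences. The function name `l1_imbalance` and the toy data are hypothetical, and this is not the library's implementation.

```python
from collections import defaultdict


def l1_imbalance(strata, treated, weights=None):
    """Half the sum of absolute differences between treated and control
    relative frequencies over all strata (0 = identical, 1 = no overlap).

    Assumes both groups contain at least one unit with positive weight.
    """
    if weights is None:
        weights = [1.0] * len(strata)
    freq = {True: defaultdict(float), False: defaultdict(float)}
    total = {True: 0.0, False: 0.0}
    for s, t, w in zip(strata, treated, weights):
        freq[bool(t)][s] += w
        total[bool(t)] += w
    cells = set(freq[True]) | set(freq[False])
    return 0.5 * sum(
        abs(freq[True][c] / total[True] - freq[False][c] / total[False])
        for c in cells
    )


# Hypothetical coarsened data: each stratum is a tuple of binned covariates.
strata = [
    ("low", "young"), ("low", "young"), ("high", "old"),   # treated units
    ("low", "young"), ("high", "old"), ("high", "young"),  # control units
]
treated = [1, 1, 1, 0, 0, 0]

print(round(l1_imbalance(strata, treated), 3))  # 0.333
```

A value of 0 means the treated and control groups are distributed identically across the cells; a value of 1 means they share no cell at all.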

Usage

Load the data

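import pandas as pd
import statsmodels.api as sm

from cem.match import match
from cem.coarsen import coarsen
from cem.imbalance import L1

# load_boston is not imported in this snippet; it is assumed to return the
# Boston housing data as a pandas DataFrame with the columns shown below
boston = load_boston()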

O = "MEDV"  # outcome variable
T = "CHAS"  # treatment variable

y = boston[O]
X = boston.drop(columns=O)

      CRIM  ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  LSTAT  MEDV
0  0.00632  18   2.31     0  0.538  6.575  65.2    4.09    1  296     15.3   396.9   4.98    24
1  0.02731   0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8   396.9   9.14  21.6
2  0.02729   0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83   4.03  34.7
3  0.03237   0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63   2.94  33.4
4  0.06905   0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7   396.9   5.33  36.2

Automatic Coarsening

First we coarsen the data in an automatic fashion to get a baseline imbalance. Be sure to drop the column containing your outcome variable prior to coarsening/matching. coarsen optionally takes a list of columns you'd like to auto-coarsen, ignoring the rest.

# coarsen predictor variables
X_coarse = coarsen(X, T, "l1")

# match observations
weights = match(X_coarse, T)

# calculate weighted imbalance
L1(X_coarse, weights)
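The matching step can be sketched in plain Python. The example below is not cem's code: it uses the weighting rule commonly given in the CEM literature, where unmatched units get weight 0, matched treated units get weight 1, and matched controls in stratum s get (m_C / m_T) * (m_T^s / m_C^s). Whether `match` applies exactly this normalization is an assumption, and the function name `cem_weights` is hypothetical.

```python
from collections import Counter


def cem_weights(strata, treated):
    """Return one weight per unit: 0 for unmatched units, 1 for matched
    treated units, and a stratum-balancing weight for matched controls."""
    n_t = Counter(s for s, t in zip(strata, treated) if t)
    n_c = Counter(s for s, t in zip(strata, treated) if not t)
    # A stratum is matched only if it holds both treated and control units.
    matched = {s for s in n_t if n_c[s] > 0}
    m_t = sum(n_t[s] for s in matched)  # matched treated units overall
    m_c = sum(n_c[s] for s in matched)  # matched control units overall
    weights = []
    for s, t in zip(strata, treated):
        if s not in matched:
            weights.append(0.0)
        elif t:
            weights.append(1.0)
        else:
            weights.append((m_c / m_t) * (n_t[s] / n_c[s]))
    return weights


# Hypothetical strata labels, one per unit, after coarsening.
strata = ["a", "a", "a", "b", "b", "c"]
treated = [1, 1, 0, 1, 0, 0]

w = cem_weights(strata, treated)
print([round(v, 3) for v in w])  # [1.0, 1.0, 1.333, 1.0, 0.667, 0.0]
```

Stratum "c" contains only a control unit, so that unit is dropped (weight 0). The control in stratum "a" is up-weighted because it stands in for two treated units.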

Informed Coarsening

It's recommended to coarsen using pandas.cut and pandas.qcut, but you are free to coarsen your predictor variables however you wish.
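The two pandas functions coarsen differently, which matters for skewed covariates. As a small illustration on hypothetical data (not the Boston columns), pandas.cut builds equal-width bins while pandas.qcut builds bins with roughly equal counts:

```python
import pandas as pd

# Hypothetical skewed covariate: seven small values and one large outlier.
x = pd.Series([1, 2, 3, 4, 5, 6, 7, 100], name="x")

# pd.cut: four equal-width bins spanning the range of x.
equal_width = pd.cut(x, bins=4, labels=False)

# pd.qcut: four bins holding roughly equal numbers of observations.
equal_freq = pd.qcut(x, q=4, labels=False)

print(equal_width.tolist())  # [0, 0, 0, 0, 0, 0, 0, 3]
print(equal_freq.tolist())   # [0, 0, 1, 1, 2, 2, 3, 3]
```

With equal-width bins the outlier pushes almost every observation into one bin, so matching on that covariate barely constrains anything; quantile bins spread the observations evenly.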

# coarsen predictor variables
schema = {
   'CRIM': (pd.cut, {'bins': 4}),
   'ZN': (pd.qcut, {'q': 4}),
   'INDUS': (pd.qcut, {'q': 4}),
   'NOX': (pd.cut, {'bins': 5}),
   'RM': (pd.cut, {'bins': 5}),
   'AGE': (pd.cut, {'bins': 5}),
   'DIS': (pd.cut, {'bins': 5}),
   'RAD': (pd.cut, {'bins': 6}),
   'TAX': (pd.cut, {'bins': 5}),
   'PTRATIO': (pd.cut, {'bins': 6}),
   'B': (pd.cut, {'bins': 5}),
   'LSTAT': (pd.cut, {'bins': 5})
}

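def apply_schema(col):
    # apply the binning function registered for this column, if any;
    # columns not in the schema (such as the treatment, CHAS) pass through
    if col.name in schema:
        func, kwargs = schema[col.name]
        return func(col, **kwargs)
    return col

X_coarse = X.apply(apply_schema)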

# match observations
weights = match(X_coarse, T)

# calculate weighted imbalance
L1(X_coarse, weights)

# perform weighted regression
model = sm.WLS(y, sm.add_constant(X), weights=weights)
results = model.fit()
print(results.summary())
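As a cross-check on a weighted regression, note that a weighted least squares fit with only a constant and a binary treatment indicator gives a treatment coefficient equal to the weighted difference in mean outcomes. The pure-Python sketch below computes that difference directly. The function names, outcomes, and weights are all hypothetical and are not produced by cem.

```python
def weighted_mean(values, weights):
    """Weighted average of values; assumes the weights sum to more than 0."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_difference_in_means(y, treated, weights):
    """Weighted mean outcome of treated units minus that of control units."""
    y_t = [v for v, t in zip(y, treated) if t]
    w_t = [w for w, t in zip(weights, treated) if t]
    y_c = [v for v, t in zip(y, treated) if not t]
    w_c = [w for w, t in zip(weights, treated) if not t]
    return weighted_mean(y_t, w_t) - weighted_mean(y_c, w_c)


# Hypothetical outcomes, treatment flags, and matching weights.
y = [10.0, 12.0, 8.0, 20.0, 15.0, 30.0]
treated = [1, 1, 0, 1, 0, 0]
weights = [1.0, 1.0, 4 / 3, 1.0, 2 / 3, 0.0]

effect = weighted_difference_in_means(y, treated, weights)
print(round(effect, 3))  # 3.667
```

The last unit has weight 0, so its outcome of 30 does not influence the estimate. Adding covariates to the regression, as in the snippet above, adjusts for imbalance that remains within the coarsened strata.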

References

[1] Porro, G., King, G., and Iacus, S. CEM: Software for coarsened exact matching. Journal of Statistical Software 30 (2009). doi:10.18637/jss.v030.i09.

[2] Iacus, S. M., King, G., and Porro, G. Multivariate matching methods that are monotonic imbalance bounding. Journal of the American Statistical Association 106, 493 (2011), 345–361.

[3] Iacus, S. M., King, G., and Porro, G. Causal inference without balance checking: Coarsened exact matching. Political Analysis 20, 1 (2012), 1–24.

[4] King, G., and Zeng, L. The dangers of extreme counterfactuals. Political Analysis 14 (2006), 131–159.

[5] Ho, D., Imai, K., King, G., and Stuart, E. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15 (2007), 199–236.
