cem · PyPI

A Python implementation of Coarsened Exact Matching for causal inference

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language

Project description

cem is a lightweight library for Coarsened Exact Matching (CEM), essentially a poor man’s version of the original R package [1]. CEM is a matching technique that reduces covariate imbalance between treated and control groups; left uncorrected, that imbalance makes treatment effect estimates sensitive to model specification. CEM coarsens each covariate into bins, matches observations exactly on the binned values, and then removes and/or reweights observations, which can yield treatment effect estimates that are more stable than those from other matching techniques such as propensity score matching. The L1 and L2 multivariate imbalance measures are implemented as described in [2]. I make no claim to originality and thank the authors for their research.
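To make the matching and reweighting step concrete, here is a minimal sketch in plain pandas. It is not taken from the cem source: the helper name `cem_weights`, the toy data, and the equal-width binning are all assumptions made for illustration. The control-unit weight follows the formula given in the CEM literature, (treated in stratum / controls in stratum) × (matched controls / matched treated).

```python
import pandas as pd


def cem_weights(df, treatment, covariates, bins=3):
    """Hypothetical helper: return one CEM weight per row (0 if unmatched)."""
    # 1. Coarsen every covariate into equal-width bins (integer bin codes).
    coarse = pd.DataFrame(
        {col: pd.cut(df[col], bins=bins, labels=False) for col in covariates},
        index=df.index,
    )
    # 2. Rows sharing the same combination of bin codes form one stratum.
    stratum = coarse.groupby(list(covariates)).ngroup()

    is_treated = df[treatment] == 1
    n_treated = is_treated.astype(int).groupby(stratum).transform("sum")
    n_control = (~is_treated).astype(int).groupby(stratum).transform("sum")

    # 3. A stratum is matched only if it holds both treated and control rows.
    matched = (n_treated > 0) & (n_control > 0)
    total_treated = (is_treated & matched).sum()
    total_control = (~is_treated & matched).sum()

    # 4. Treated rows get weight 1; control rows are rescaled per stratum.
    weights = pd.Series(0.0, index=df.index)
    weights[is_treated & matched] = 1.0
    ctrl = ~is_treated & matched
    weights[ctrl] = (n_treated[ctrl] / n_control[ctrl]) * (total_control / total_treated)
    return weights


# Toy data: column names "t" and "x" are made up for this example.
toy = pd.DataFrame({"t": [1, 0, 0, 1, 0, 0], "x": [1.0, 1.2, 1.1, 9.0, 8.8, 5.0]})
weights = cem_weights(toy, "t", ["x"], bins=3)
print(weights.tolist())  # [1.0, 0.75, 0.75, 1.0, 1.5, 0.0]
```

The last row falls in a bin with no treated unit, so it is dropped (weight 0). The control weights sum to the number of matched controls.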

Get the code, read the docs, or contribute!

Usage

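# load_boston was removed in scikit-learn 1.2, so this example needs an older release
from sklearn.datasets import load_boston

from cem import CEM

boston = load_boston()
...
df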

+----+---------+------+---------+--------+-------+-------+-------+--------+-------+-------+-----------+--------+---------+--------+
|    |    CRIM |   ZN |   INDUS |   CHAS |   NOX |    RM |   AGE |    DIS |   RAD |   TAX |   PTRATIO |      B |   LSTAT |   MEDV |
+====+=========+======+=========+========+=======+=======+=======+========+=======+=======+===========+========+=========+========+
|  0 | 0.00632 |   18 |    2.31 |      0 | 0.538 | 6.575 |  65.2 | 4.09   |     1 |   296 |      15.3 | 396.9  |    4.98 |   24   |
+----+---------+------+---------+--------+-------+-------+-------+--------+-------+-------+-----------+--------+---------+--------+
|  1 | 0.02731 |    0 |    7.07 |      0 | 0.469 | 6.421 |  78.9 | 4.9671 |     2 |   242 |      17.8 | 396.9  |    9.14 |   21.6 |
+----+---------+------+---------+--------+-------+-------+-------+--------+-------+-------+-----------+--------+---------+--------+
|  2 | 0.02729 |    0 |    7.07 |      0 | 0.469 | 7.185 |  61.1 | 4.9671 |     2 |   242 |      17.8 | 392.83 |    4.03 |   34.7 |
+----+---------+------+---------+--------+-------+-------+-------+--------+-------+-------+-----------+--------+---------+--------+
|  3 | 0.03237 |    0 |    2.18 |      0 | 0.458 | 6.998 |  45.8 | 6.0622 |     3 |   222 |      18.7 | 394.63 |    2.94 |   33.4 |
+----+---------+------+---------+--------+-------+-------+-------+--------+-------+-------+-----------+--------+---------+--------+
|  4 | 0.06905 |    0 |    2.18 |      0 | 0.458 | 7.147 |  54.2 | 6.0622 |     3 |   222 |      18.7 | 396.9  |    5.33 |   36.2 |
+----+---------+------+---------+--------+-------+-------+-------+--------+-------+-------+-----------+--------+---------+--------+

c = CEM(df, "CHAS", "MEDV")

# A schema is a dict whose keys are column names and whose values are tuples of (pandas cut function name, function kwargs)
schema = {
    'CRIM': ('cut', {'bins': 4}),
    'ZN': ('cut', {'bins': 4}),
    'INDUS': ('cut', {'bins': 4}),
    'NOX': ('cut', {'bins': 5}),
    'RM': ('cut', {'bins': 5}),
    'AGE': ('cut', {'bins': 5}),
    'DIS': ('cut', {'bins': 5}),
    'RAD': ('cut', {'bins': 6}),
    'TAX': ('cut', {'bins': 5}),
    'PTRATIO': ('cut', {'bins': 6}),
    'B': ('cut', {'bins': 5}),
    'LSTAT': ('cut', {'bins': 5}),
}
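One plausible reading of such a schema is that each tuple names a pandas binning function and the keyword arguments to call it with. The sketch below applies a schema that way; the `coarsen` helper is hypothetical, and the assumption that `'qcut'` is accepted alongside `'cut'` is mine, not something this page confirms.

```python
import pandas as pd


def coarsen(df, schema):
    """Hypothetical helper: bin each listed column with the named pandas function."""
    out = df.copy()
    for column, (func_name, kwargs) in schema.items():
        binning = getattr(pd, func_name)  # resolves to pd.cut or pd.qcut
        out[column] = binning(out[column], **kwargs)
    return out


# Toy frame reusing two column names from the Boston data.
toy = pd.DataFrame({"RM": [4.0, 5.5, 6.0, 7.5, 8.0], "AGE": [10, 30, 50, 70, 90]})
toy_schema = {"RM": ("cut", {"bins": 2}), "AGE": ("qcut", {"q": 2})}

coarse = coarsen(toy, toy_schema)
print(coarse)
```

Each column now holds interval categories instead of raw values, so rows can be compared for exact equality on the coarsened data.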

# Check the multidimensional (L1) imbalance before and after matching
c.imbalance() # 0.96
c.imbalance(schema) # 0.60
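For intuition about these numbers, the L1 measure in [2] is half the summed absolute difference between the treated and control relative frequencies over the cells of the cross-tabulated, coarsened covariates. It is 0 for identical distributions and 1 for complete separation. The sketch below computes it on toy data; `l1_imbalance` is a hypothetical helper and need not match how cem computes the values above.

```python
import pandas as pd


def l1_imbalance(coarse, treated):
    """Hypothetical helper: L1 distance between treated and control cell frequencies."""
    # Give every distinct combination of coarsened values a cell id.
    cell = coarse.groupby(list(coarse.columns)).ngroup()
    # Relative frequency of each cell within the treated and control groups.
    f = cell[treated == 1].value_counts(normalize=True)
    g = cell[treated == 0].value_counts(normalize=True)
    # Cells missing from one group count as frequency 0 there.
    return 0.5 * f.sub(g, fill_value=0.0).abs().sum()


# Already-coarsened toy covariates (integer bin codes) and a 0/1 treatment flag.
coarse = pd.DataFrame({"a": [0, 0, 1, 1], "b": [0, 0, 0, 1]})
treated = pd.Series([1, 0, 1, 0])

l1 = l1_imbalance(coarse, treated)
print(l1)  # 0.5
```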

# Get the weights for each example after matching using the coarsening schema
weights = c.match(schema)
weights[weights > 0]

+-----+-----------+
|     |   weights |
+=====+===========+
|   1 |  1.25     |
+-----+-----------+
|   2 |  2.5      |
+-----+-----------+
|  96 |  1.25     |
+-----+-----------+
| 142 |  1        |
+-----+-----------+
| 143 |  0.625    |
+-----+-----------+
| 144 |  0.625    |
+-----+-----------+
| 147 |  0.625    |
+-----+-----------+
| 148 |  0.625    |
+-----+-----------+
| 150 |  2.5      |
+-----+-----------+
| 151 |  2.5      |
+-----+-----------+
...


# ... then run a weighted regression or a weighted difference of means to estimate your treatment effect
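As a rough illustration of that last step, a weighted difference of means can be computed with NumPy alone. The outcome, treatment, and weight arrays below are invented toy values, and `weighted_mean_difference` is a hypothetical helper rather than part of cem.

```python
import numpy as np


def weighted_mean_difference(outcome, treated, weights):
    """Hypothetical helper: weighted treated mean minus weighted control mean."""
    is_treated = treated == 1
    mean_treated = np.average(outcome[is_treated], weights=weights[is_treated])
    mean_control = np.average(outcome[~is_treated], weights=weights[~is_treated])
    return mean_treated - mean_control


# Toy values; a weight of 0 marks an observation dropped by matching.
outcome = np.array([10.0, 8.0, 9.0, 20.0, 17.0, 12.0])
treated = np.array([1, 0, 0, 1, 0, 0])
weights = np.array([1.0, 0.75, 0.75, 1.0, 1.5, 0.0])

effect = weighted_mean_difference(outcome, treated, weights)
print(effect)  # 2.25
```

Unmatched observations carry zero weight and so drop out of both means. A weighted regression of the outcome on the treatment indicator gives the same point estimate and also allows further covariate adjustment.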

References

[1] Porro, G., King, G., and Iacus, S. CEM: Software for Coarsened Exact Matching. Journal of Statistical Software 30 (2009). doi:10.18637/jss.v030.i09.

[2] Iacus, S. M., King, G., and Porro, G. Multivariate matching methods that are monotonic imbalance bounding. Journal of the American Statistical Association 106, 493 (2011), 345–361.

[3] Iacus, S. M., King, G., and Porro, G. Causal inference without balance checking: Coarsened exact matching. Political Analysis 20, 1 (2012), 1–24.

[4] King, G., and Zeng, L. The dangers of extreme counterfactuals. Political Analysis 14 (2006), 131–159.

[5] Ho, D., Imai, K., King, G., and Stuart, E. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15 (2007), 199–236.


Release history

- 1.1.0 (Oct 12, 2023)
- 1.0.0 (Oct 7, 2023)
- 0.1.9 (Sep 18, 2023)
- 0.1.8 (Sep 18, 2023)
- 0.1.7 (Sep 17, 2023)
- 0.1.6 (Sep 17, 2023)
- 0.1.5 (Apr 3, 2021), this version
- 0.1.4 (Mar 31, 2021)
- 0.1.3 (Mar 18, 2021)
- 0.1.2 (May 22, 2020)
- 0.1.1 (May 18, 2020)
- 0.1.0 (May 18, 2020)

Download files


Source Distribution

cem-0.1.5.tar.gz (11.4 kB)

Uploaded Apr 3, 2021 Source

Built Distribution

cem-0.1.5-py2.py3-none-any.whl (7.8 kB)

Uploaded Apr 3, 2021 Python 2 Python 3

Hashes for cem-0.1.5.tar.gz

Algorithm	Hash digest
SHA256	`517a91fae25b8e0c0f2624976e27e516325163bc61edd792cb82e976fce9b8ed`
MD5	`f214447b4115c02772b3f296f84ce633`
BLAKE2b-256	`b9f8ff9e5fe6d3156541188f3e2cece63eb185b892a26d86ba5b6738ef0350bc`

Hashes for cem-0.1.5-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`d2ca32a5431c5a53417cb1feefc7fc243c805c42f8702c5012ac237ca0bf0b78`
MD5	`3b58623aca0c35b9a0d09acd678a6d1a`
BLAKE2b-256	`bbd1d4a844adb6a522f47293884ad306adee5d451d2bb7c65b5a180d1d1d8eaf`