Program to Calculate Optimal Propensity Score

Propensity Score Calculator

Estimate the Propensity Score in Python following Imbens and Rubin (2015a). The following additional methods are incorporated:

Strata based on the estimated propensity score (Imbens and Rubin, 2015a)
Suggested maximum and minimum values of the propensity score for trimming, to maintain covariate balance (Imbens and Rubin, 2015b)
Matching (with or without replacement) based on the estimated propensity score (Imbens and Rubin, 2015c)

This package has been constructed with the end user (likely a social science researcher) in mind. Several notable features make this package unique:

Testing for important covariates/features in the propensity score is done in parallel (optional; enabled by default), which greatly reduces the time required for completion. Testing features is not required to use the matching and stratifying methods; see below for information about fitting the propensity score without testing any variables.
The matching algorithm has been optimized with a recursive binary search, so finding the closest potential control for a treated unit takes O(log N) time instead of a scan over all controls. In testing, matching with replacement for 100k treated units and 500k control units ran in under 5 seconds on my personal laptop with 8 cores. Matching without replacement for the same set took 1.75 minutes.
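The binary search idea described above can be sketched as follows. This is a minimal illustration and not the package's implementation: the function names nearest_control and match_with_replacement are hypothetical, and the sketch uses Python's bisect module on a sorted list of control scores in place of a recursive search.

```python
from bisect import bisect_left


def nearest_control(sorted_scores, target):
    """Return the position in sorted_scores whose value is closest to target."""
    pos = bisect_left(sorted_scores, target)
    if pos == 0:
        return 0
    if pos == len(sorted_scores):
        return len(sorted_scores) - 1
    # The closest value is one of the two neighbours of the insertion point.
    before, after = sorted_scores[pos - 1], sorted_scores[pos]
    return pos if after - target < target - before else pos - 1


def match_with_replacement(treated_scores, control_scores):
    """Map each treated index to the index of its nearest control."""
    # Sort the controls once and remember each score's original position.
    order = sorted(range(len(control_scores)), key=control_scores.__getitem__)
    sorted_scores = [control_scores[i] for i in order]
    return {t: order[nearest_control(sorted_scores, s)]
            for t, s in enumerate(treated_scores)}


controls = [0.10, 0.80, 0.35, 0.55]
treated = [0.30, 0.60, 0.95]
matches = match_with_replacement(treated, controls)
print(matches)  # {0: 2, 1: 3, 2: 1}
```

Sorting the controls costs O(N log N) once; each treated unit is then matched with an O(log N) lookup.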

New Features in Most Recent Version

Faster solver for generating matches (see above).
An optional caliper, so that the user can require matches to fall within some bandwidth in propensity score or log-odds units.
Optional automatic standardization of non-binary variables.
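To illustrate why the choice of caliper units matters, here is a small sketch of a caliper check in both scales. The helper names log_odds and within_caliper and the units value "logodds" are hypothetical; the only value this document shows for the package's caliper_param argument is 'propscore'.

```python
import math


def log_odds(p):
    """Convert a propensity score in (0, 1) to log-odds units."""
    return math.log(p / (1.0 - p))


def within_caliper(p_treated, p_control, caliper, units="propscore"):
    """Check whether a candidate pair is close enough to count as a match."""
    if units == "logodds":  # hypothetical label for the log-odds scale
        gap = abs(log_odds(p_treated) - log_odds(p_control))
    else:
        gap = abs(p_treated - p_control)
    return gap <= caliper


# Two scores that are close in probability but far apart in log-odds.
print(within_caliper(0.98, 0.985, caliper=0.01))                   # True
print(within_caliper(0.98, 0.985, caliper=0.01, units="logodds"))  # False
```

Near 0 or 1, a small difference in propensity score corresponds to a large difference in log-odds, so the same numeric caliper is much stricter on the log-odds scale in the tails.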

Installation

Use pip to install:

pip install propensityscore

Description

This package allows one to estimate the propensity score (the probability of being in the treated group) following the general methodology laid out in Imbens and Rubin (2015a).

Support currently exists for first and second order terms. Estimation proceeds in three steps. The first is done by the user; the remaining two are done by the code.

Step 1

Choose which covariates you think are relevant and should always be included in the propensity score equation. These will be in main_vars in the code.

Step 2

Add additional linear terms to be tested, specified in test_vars. These are selected one at a time: at each step, the candidate that gives the largest gain in the log-likelihood of the propensity score model is added, provided that gain exceeds a predetermined cutoff (the default is 1). Once no remaining variable gives a gain above the cutoff, the linear stage terminates.
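The selection loop in this step can be sketched as below. This is an illustration under stated assumptions, not the package's code: forward_select is a hypothetical name, the gain is assumed to be the likelihood ratio statistic (twice the increase in log-likelihood), and the toy log-likelihood function stands in for refitting a logistic regression on each candidate set.

```python
def forward_select(base_terms, candidates, log_likelihood, cutoff=1.0):
    """Greedily add the best candidate until no gain exceeds the cutoff."""
    selected = list(base_terms)
    remaining = list(candidates)
    current = log_likelihood(selected)
    while remaining:
        # Assumed gain: likelihood ratio statistic for adding each candidate.
        gains = {c: 2.0 * (log_likelihood(selected + [c]) - current)
                 for c in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= cutoff:
            break
        selected.append(best)
        remaining.remove(best)
        current = log_likelihood(selected)
    return selected


# Toy stand-in for model fitting: each term adds a fixed amount.
contribution = {"x1": 3.0, "x2": 0.3, "x3": 1.2}


def toy_log_likelihood(terms):
    return -100.0 + sum(contribution.get(t, 0.0) for t in terms)


chosen = forward_select(["x0"], ["x1", "x2", "x3"], toy_log_likelihood)
print(chosen)  # ['x0', 'x1', 'x3']
```

Here x1 and x3 clear the cutoff of 1 and are added in order of gain, while x2 (gain 0.6) is left out.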

Step 3

Quadratic and interaction terms are automatically generated from main_vars and test_vars. These are tested in the same way as the linear terms, except that the gain must exceed a separate cutoff (the default is 2.71).
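As a rough sketch of how the second order candidates might be enumerated, the snippet below builds every quadratic and pairwise interaction term from a list of variable names. The function second_order_terms and its exclude argument are hypothetical; the exclusion simply mimics the effect of the exclude_vars_ord2 argument shown in the example code, and the package may build its candidate list differently.

```python
from itertools import combinations_with_replacement


def second_order_terms(variables, exclude=()):
    """List quadratic (a, a) and interaction (a, b) pairs, skipping excluded names."""
    usable = [v for v in variables if v not in exclude]
    return list(combinations_with_replacement(usable, 2))


terms = second_order_terms(["x002", "x003", "x010"], exclude=["x002"])
print(terms)
# [('x003', 'x003'), ('x003', 'x010'), ('x010', 'x010')]
```

With k usable variables there are k(k+1)/2 candidate terms, which is why parallel testing and a stricter cutoff are helpful at this stage.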

Tips and Notes

To use all of the modules in the propensity score class without testing any variables, fit the model with only main_vars and set test_second_order=False. Alternatively, test second order combinations of all main_vars by setting that argument to True. Similarly, it is possible to test only first order variables by specifying them in test_vars.
You must have enough control units to match without replacement; the program will warn you otherwise.
If you want to employ a hybrid matching strategy (whereby units are first matched exactly on a particular covariate and then, within each group sharing that covariate value, the best match on the propensity score is chosen), pass an additional Series or DataFrame with the same index as the original data as match_covs in the matching module. If you specify a list, the values in this list must have been searched over in the original propensity score fit. Only do this with categorical data; hybrid matching is not possible with continuous data.
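The hybrid strategy in the last tip can be pictured with the following sketch: match exactly on a categorical covariate, then pick the nearest propensity score within that category. The function hybrid_match and the tuple layout are hypothetical and chosen for illustration; the sketch matches with replacement and uses a simple linear scan within each category.

```python
from collections import defaultdict


def hybrid_match(treated, controls):
    """treated/controls: lists of (unit_id, category, score).

    Returns {treated_id: control_id}, or None when no control shares the category.
    """
    by_category = defaultdict(list)
    for unit_id, category, score in controls:
        by_category[category].append((score, unit_id))

    matches = {}
    for unit_id, category, score in treated:
        pool = by_category.get(category)
        if not pool:
            matches[unit_id] = None
            continue
        # Nearest propensity score among controls in the same category.
        _, best_id = min(pool, key=lambda c: abs(c[0] - score))
        matches[unit_id] = best_id
    return matches


treated = [("t1", "A", 0.40), ("t2", "B", 0.40), ("t3", "C", 0.50)]
controls = [("c1", "A", 0.90), ("c2", "B", 0.45),
            ("c3", "A", 0.55), ("c4", "B", 0.10)]
print(hybrid_match(treated, controls))
# {'t1': 'c3', 't2': 'c2', 't3': None}
```

Note that t1 is matched to c3 even though c2 has a closer score, because c2 is in a different category. The same logic shows why a continuous covariate does not work here: exact equality between a treated and a control value would almost never occur.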

Example Use

The following is sample code to illustrate the use:

from propensityscore import PropensityScore
import sklearn.datasets
import numpy as np
import pandas as pd
X, y = sklearn.datasets.make_classification(n_samples=30000, n_features=3000,
                                            n_informative=100, n_redundant=0, n_repeated=0)

df = pd.DataFrame(X).iloc[:,:100]
df.columns=['x{}'.format(str(ii).zfill(3)) for ii in range(100)]

main_vars = ['x002','x003'] # we think these are important
test_vars = [x for x in df.columns if x not in main_vars]

df.loc[:,'treated'] = y

# Optionally, take a sample of the treated units and all of the control units (for matching without replacement).
# DataFrame.append was removed in pandas 2.0, so pd.concat is used here:
# df = pd.concat([df.loc[df.treated.eq(1),:].sample(1000), df.loc[df.treated.eq(0),:]])

################################################################################
## Normal Use: Testing Potential Features
################################################################################

# initialize the Class
output = PropensityScore(outcome='treated', df=df, test_vars=test_vars,
                      main_vars=main_vars, test_second_order=True)

# Fit the model.
# We specify cutoffs of 4 and 8 for first and second order terms (in log-likelihood ratio improvements)
# and exclude higher order terms involving variable x002.
# The estimated propensity scores are stored in the pandas Series output.propscore.
output.fit(cutoff_ord1=4,cutoff_ord2=8,exclude_vars_ord2=['x002'])

# To run each test on all cores in the underlying scikit-learn logistic regression, specify:
output.fit(cutoff_ord1=4,cutoff_ord2=8,exclude_vars_ord2=['x002'],n_jobs=-1)


# Alternatively, one could initialize and standardize all non-binary variables with
output = PropensityScore(outcome='treated', df=df, test_vars=test_vars,
                      main_vars=main_vars, test_second_order=True,standardize=True)
output.fit(cutoff_ord1=5,cutoff_ord2=1,exclude_vars_ord2=['x002'],solver='sag') # the sag solver can be faster.


################################################################################
## A Model that tests no variables so that additional tools can be utilized.
################################################################################

# The user may want to estimate a model by testing nothing so that they can still use the matching features.
# This is possible by specifying a list of covariates in main_vars, no test variables, and setting test_second_order=False.
# alternatively, one could test linear but not second order terms, or vice-versa.

model2 = PropensityScore(outcome='treated', df=df, test_vars=None, main_vars=main_vars, test_second_order=False)
model2.fit()

################################################################################
## Matching/Stratifying Modules
################################################################################

# Stratify on the estimated propensity score; the resulting strata are stored in output.strata
output.stratify()

# To trim, we can simply run
output.trim()

# Finally, we can match as follows (this requests two matches for each treated unit)
output.match(n_matches=2,replacement=True)

# An optional caliper can be specified in the following way:
output.match(replacement=True,caliper=.01,caliper_param='propscore')
# This ensures that matched treated and control units differ by no more than .01 in propensity score.


# Imagine there is a covariate we want to additionally match on (multiple accepted)
cov = pd.Series(np.random.randint(0,4,size=len(df)),index=df.index)
output.match(n_matches=2,replacement=True,match_covs=cov)

References

Imbens, G., & Rubin, D. (2015a). Estimating the Propensity Score. In Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (pp. 281-308). Cambridge: Cambridge University Press. doi:10.1017/CBO9781139025751.014

Imbens, G., & Rubin, D. (2015b). Trimming to Improve Balance in Covariate Distributions. In Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (pp. 359-374). Cambridge: Cambridge University Press. doi:10.1017/CBO9781139025751.017

Imbens, G., & Rubin, D. (2015c). Matching to Improve Balance in Covariate Distributions. In Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (pp. 337-358). Cambridge: Cambridge University Press. doi:10.1017/CBO9781139025751.016

Release history

1.0.1 (this version): Dec 30, 2025
1.0.0: Jun 15, 2023

