Propensity score matching for Python and graphical plots

Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

`PsmPy`

Matching techniques for epidemiological observational studies, implemented in Python. Propensity score matching is a statistical matching technique used with observational data to assess whether it is valid to conclude that there is a potential causal link between a treatment or intervention and one or more outcomes of interest. It does so by accounting for a set of covariates across a binary treatment state (as in a randomized controlled trial, where each subject either received the intervention or did not), and by controlling for potential confounding by those covariates when comparing outcome measures, such as death or length of stay, between the treatment and control groups. Applying this technique to observational data gives insight into the effects, or lack thereof, of an intervention.

Citing this work:

A. Kline and Y. Luo, PsmPy: A Package for Retrospective Cohort Matching in Python, 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2022, pp. 1354-1357, doi: 10.1109/EMBC48229.2022.9871333.

Features:
- Integration with Jupyter Notebooks
- Additional plotting functionality to assess covariate balance before and after matching
- A more modular, user-specified matching process
- Ability to define 1:1 or 1:many matching

Installation

Install the package through pip:

$ pip install psmpy

Installation
Data Preparation
Predict Scores
Matching algorithm
Graphical Outputs
Extra Attributes
Cohen D Function
Conclusion

Data Prep

Import psmpy class and functions

# import relevant libraries
import pandas as pd
import seaborn as sns
from psmpy import PsmPy
from psmpy.functions import cohenD
from psmpy.plotting import *
sns.set(rc={'figure.figsize': (10, 8)}, font_scale=1.3)

# read in your data (path: location of your CSV file)
df = pd.read_csv(path)

Initialize PsmPy Class

Initialize the PsmPy class:

psm = PsmPy(df, treatment='treatment', indx='pat_id', exclude = [])

Note:

PsmPy - the class. It uses all covariates in the dataset unless they are explicitly excluded via the exclude argument.
df - the dataframe passed to the class
treatment - name of the column holding the binary treatment/intervention indicator
exclude - (optional) list of strings naming covariates (columns) to ignore during model fitting. It is not necessary to include the unique index column here; it is handled internally once the index column is specified.
indx - required parameter naming the column that holds a unique ID for each case in the dataset.
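The covariate-selection rule above can be pictured with a small standard-library sketch. The helper `select_covariates` and the example column names are hypothetical, not part of PsmPy; the sketch only mirrors the documented behaviour that every column except the treatment column, the index column and anything listed in `exclude` is used as a covariate.

```python
def select_covariates(columns, treatment, indx, exclude=None):
    """Return the columns that would be used as covariates.

    Hypothetical helper illustrating the documented rule: every column is a
    covariate except the treatment column, the index column and any column
    named in `exclude`.
    """
    excluded = set(exclude or [])
    excluded.update({treatment, indx})
    # Preserve the original column order.
    return [col for col in columns if col not in excluded]


columns = ["pat_id", "treatment", "age", "sex", "hypertension", "admit_date"]
print(select_covariates(columns, treatment="treatment", indx="pat_id",
                        exclude=["admit_date"]))
# ['age', 'sex', 'hypertension']
```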

Predict Scores

Calculate logistic propensity scores/logits:

psm.logistic_ps(balance = True)

Note:

balance - Whether the logistic regression will run in a balanced fashion, default = True.

There is often a significant class imbalance in the data, i.e. the majority group has many more records than the minority group; the software detects this automatically. We account for it by setting balance=True when calling psm.logistic_ps(). This tells PsmPy to sample from the majority group when fitting the logistic regression model so that the groups are of equal size. The process is repeated until every entry of the majority class has been regressed on the minority class in equal partitions. Both the logistic propensity scores and the logits are calculated for each entry.
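The balancing step can be sketched as follows: shuffle the majority-class IDs and cut them into consecutive partitions the size of the minority class, so each logistic model would be fit on a 1:1 sample. This is a hypothetical illustration, not PsmPy's internal code; the function name is invented, and keeping a final partition that is smaller than the minority class is an assumption about how leftovers are handled.

```python
import random


def balanced_partitions(majority_ids, minority_ids, seed=0):
    """Split majority-class IDs into chunks the size of the minority class.

    Each chunk would be paired with the full minority class to fit one
    logistic regression, so every majority record is used exactly once.
    The last chunk may be smaller if the sizes do not divide evenly.
    """
    ids = list(majority_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    size = len(minority_ids)
    return [ids[start:start + size] for start in range(0, len(ids), size)]


majority = list(range(100, 110))  # 10 control records
minority = [1, 2, 3, 4]           # 4 treated records
parts = balanced_partitions(majority, minority)
print([len(p) for p in parts])    # [4, 4, 2]
```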

Review values in dataframe:

psm.predicted_data
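The propensity score and the propensity logit are two views of the same fitted model: the logit is the linear predictor and the score is its sigmoid, so logit = ln(p / (1 - p)). The standard-library sketch below fits a plain logistic regression by gradient descent purely to show that relationship. The function names and toy data are hypothetical, and PsmPy's own model fitting (including its balanced partitioning) is not reproduced here.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression by batch gradient descent (illustrative only).

    X: list of covariate rows; y: list of 0/1 treatment labels.
    Returns weights where weights[0] is the intercept.
    """
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, target in zip(X, y):
            logit = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
            error = 1.0 / (1.0 + math.exp(-logit)) - target
            grad[0] += error
            for j, x in enumerate(row):
                grad[j + 1] += error * x
        # Step along the average gradient of the log-loss.
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


def score_and_logit(weights, row):
    """Return (propensity_score, propensity_logit) for one record."""
    logit = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1.0 / (1.0 + math.exp(-logit)), logit


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]  # one toy covariate
y = [0, 0, 1, 0, 1, 1]                          # 1 = treated, 0 = control
weights = fit_logistic(X, y)
for row in X:
    score, logit = score_and_logit(weights, row)
    # The logit can be recovered from the score as ln(p / (1 - p)).
    print(row[0], round(score, 3), round(logit, 3),
          round(math.log(score / (1 - score)), 3))
```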

Matching algorithm - version 1

Perform KNN matching.

psm.kdtree_matched(matcher='propensity_logit', replacement=False, caliper=None, drop_unmatched=True)

Note:

matcher - propensity_logit (default), generated in the previous step; the alternative is propensity_score. Specifies the column on which matching is performed.
replacement - False (default). Determines whether matching happens with or without replacement; when replacement is False, matching is 1:1.
caliper - None (default). The user can specify a caliper size, relative to the standard deviation of the control sample, which restricts eligible neighbors to those within that distance.
drop_unmatched - True (default). If some indices have no match because of the caliper size, they are removed from 'df_matched', 'matched_ids' and the subsequent effect-size calculations.
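The options above can be illustrated with a brute-force, standard-library sketch of greedy 1:1 nearest-neighbour matching on the logit, without replacement and with an optional caliper. It is a hypothetical illustration: the function name and the greedy processing order are assumptions, and PsmPy's method name suggests a KD-tree search rather than the linear scan used here. In one dimension both find the same nearest neighbour, but greedy results can depend on the order in which minority records are processed.

```python
import statistics


def match_one_to_one(minor, major, caliper=None):
    """Greedy 1:1 nearest-neighbour matching without replacement (sketch).

    minor, major: dicts mapping record ID -> propensity logit.
    caliper: optional multiple of the standard deviation of the major-class
    (control) logits; candidates farther away than that are not eligible.
    Returns (pairs, unmatched): pairs is a list of (minor_id, major_id).
    """
    max_dist = None
    if caliper is not None:
        max_dist = caliper * statistics.stdev(list(major.values()))
    available = dict(major)  # controls not yet used
    pairs, unmatched = [], []
    for minor_id, value in minor.items():
        if not available:
            unmatched.append(minor_id)
            continue
        # Nearest remaining control by absolute logit distance.
        best_id = min(available, key=lambda k: abs(available[k] - value))
        if max_dist is not None and abs(available[best_id] - value) > max_dist:
            unmatched.append(minor_id)  # would be dropped if drop_unmatched=True
            continue
        pairs.append((minor_id, best_id))
        del available[best_id]  # without replacement: each control used once
    return pairs, unmatched


minor = {"t1": 0.10, "t2": 0.50, "t3": 3.00}               # treated logits
major = {"c1": 0.12, "c2": 0.45, "c3": 0.60, "c4": -0.20}  # control logits
pairs, unmatched = match_one_to_one(minor, major, caliper=0.5)
print(pairs)      # [('t1', 'c1'), ('t2', 'c2')]
print(unmatched)  # ['t3']
```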

Matching algorithm - version 2

Perform KNN matching 1:many

psm.kdtree_matched_12n(matcher='propensity_logit', how_many=1)

Note:

matcher - propensity_logit (default), generated in the previous step; the alternative is propensity_score. Specifies the column on which matching is performed.
how_many - 1 (default). Performs 1:n matching, where n is specified by the user: each minor-class record is matched to n major-class records.
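A minimal standard-library sketch of 1:n matching is shown below: each minority record is paired with its `how_many` nearest majority records by logit distance. The function name is hypothetical, and allowing the same control to be reused across different minority records (matching with replacement) is an assumption of this sketch, not a documented property of PsmPy.

```python
def match_one_to_many(minor, major, how_many=1):
    """Match each minor-class record to its `how_many` nearest major records.

    minor, major: dicts mapping record ID -> propensity logit.
    Controls may be reused across different minor records (assumption:
    this sketch matches with replacement).
    Returns a dict mapping minor_id -> list of major_ids, nearest first.
    """
    if how_many > len(major):
        raise ValueError("how_many exceeds the number of major-class records")
    matches = {}
    for minor_id, value in minor.items():
        ranked = sorted(major, key=lambda k: abs(major[k] - value))
        matches[minor_id] = ranked[:how_many]
    return matches


minor = {"t1": 0.0, "t2": 1.0}
major = {"c1": 0.1, "c2": -0.3, "c3": 0.9, "c4": 2.0}
print(match_one_to_many(minor, major, how_many=2))
# {'t1': ['c1', 'c2'], 't2': ['c3', 'c1']}
```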

Graphical Outputs

Plot the propensity score or propensity logits

Plot the distribution of the propensity scores (or logits) for the two groups side by side. Note that here the names are coded as 'treatment' and 'control' under the assumption that the majority class you are sampling from is the control group. If this is not the case you will need to flip the order of these.

psm.plot_match(Title='Side by side matched controls', Ylabel='Number of patients', Xlabel='Propensity logit', names=['treatment', 'control'], colors=['#E69F00', '#56B4E9'], save=True)

Note:

Title - 'Side by side matched controls' (default), plot title
Ylabel - 'Number of patients' (default), string, label for the y-axis
Xlabel - 'Propensity logit' (default), string, label for the x-axis
names - ['treatment', 'control'] (default), list of strings for the legend
colors - ['#E69F00', '#56B4E9'] (default), plotting colors
save - False (default), saves the generated figure to the current working directory if True

Plot the effect sizes

psm.effect_size_plot(title='Standardized Mean differences across covariates before and after matching', before_color='#FCB754', after_color='#3EC8FB', save=False)

Note:

title - title of the plot
before_color - color (hex) for the before-matching effect size
after_color - color (hex) for the after-matching effect size
save - False (default), saves the generated figure to the current working directory if True

Extra Attributes

Other attributes available to user:

Matched IDs

psm.matched_ids

matched_ids - returns a dataframe of indices from the minor class and their associated matched indices from the major class.

Major_ID	Minor_ID
6781	9432
3264	7624

Note: not all matches will be unique if replacement=False.

Matched Dataframe

psm.df_matched

df_matched - returns a subset of the original dataframe using indices that were matched. This works regardless of which matching protocol is used.
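Conceptually, building the matched subset amounts to keeping every row whose ID appears in a matched pair. The standard-library sketch below uses plain dicts in place of dataframe rows; the function name and toy ages are hypothetical (the IDs follow the matched_ids example above), and de-duplicating a control that was matched more than once is an assumption that may differ from PsmPy's own behaviour.

```python
def subset_matched(records, matched_pairs, indx="pat_id"):
    """Return the rows whose ID appears in any matched pair (sketch).

    records: list of dicts, one per row of the original data.
    matched_pairs: (major_id, minor_id) tuples, like rows of matched_ids.
    Original row order is kept; a control matched more than once appears
    once here, which may differ from PsmPy's own behaviour.
    """
    keep = {record_id for pair in matched_pairs for record_id in pair}
    return [row for row in records if row[indx] in keep]


# Toy rows; IDs follow the matched_ids example, ages are made up.
records = [
    {"pat_id": 6781, "treatment": 0, "age": 64},
    {"pat_id": 9432, "treatment": 1, "age": 61},
    {"pat_id": 3264, "treatment": 0, "age": 58},
    {"pat_id": 7624, "treatment": 1, "age": 57},
    {"pat_id": 5555, "treatment": 0, "age": 80},
]
matched = subset_matched(records, [(6781, 9432), (3264, 7624)])
print([row["pat_id"] for row in matched])  # [6781, 9432, 3264, 7624]
```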

Effect sizes per variable

psm.effect_size

effect_size - returns dataframe with columns 'variable', 'matching' (before or after), and 'effect_size'

variable	matching	effect_size
hypertension	before	0.5
hypertension	after	0.01

Note: The thresholds for small, medium and large effect sizes were characterized by Cohen in: J. Cohen, "A Power Primer", Psychological Bulletin, vol. 112, no. 1, pp. 155-159, 1992.

Relative size	Effect size (Cohen's d)
small	0.2
medium	0.5
large	0.8
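Cohen's benchmarks are reference points rather than bands. A common convention turns them into bands for labelling, sketched below; the banding and the function name are assumptions for illustration and are not taken from PsmPy.

```python
def effect_size_label(d):
    """Label an effect size using a common banding of Cohen's benchmarks.

    Assumed bands (a convention, not taken from PsmPy):
    |d| < 0.2 negligible, 0.2 <= |d| < 0.5 small,
    0.5 <= |d| < 0.8 medium, |d| >= 0.8 large.
    """
    magnitude = abs(d)
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


# Example values from the effect_size table (hypertension).
print(effect_size_label(0.5))   # medium
print(effect_size_label(0.01))  # negligible
```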

Cohen D Function

A function to calculate effect size (Cohen's d) can be imported on its own should the user need it. It returns a float: the standardized difference in the mean of a variable between the two groups of a binary treatment/intervention.

from psmpy.functions import cohenD

cohenD(df, treatment, metricName)

df - dataframe with data under investigation
treatment - name of binary treatment/intervention under investigation
metricName - variable user wishes to check the influence of on treatment/intervention
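For readers who want to check the number by hand, a standard-library sketch of a standardized mean difference follows. It assumes rows are plain dicts with a 0/1 treatment column and uses the square root of the average of the two group variances as the denominator, a form often used for balance diagnostics. PsmPy's cohenD may pool the variances differently, so treat this as an approximation of it rather than its source; the function name and toy data are hypothetical.

```python
import math
import statistics


def cohen_d(rows, treatment, metric_name):
    """Standardized mean difference of `metric_name` between treatment groups.

    rows: list of dicts; row[treatment] is 1 (treated) or 0 (untreated).
    Denominator: square root of the average of the two sample variances
    (an assumed pooling; PsmPy's cohenD may differ).
    """
    treated = [row[metric_name] for row in rows if row[treatment] == 1]
    untreated = [row[metric_name] for row in rows if row[treatment] == 0]
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(untreated)) / 2
    )
    return (statistics.mean(treated) - statistics.mean(untreated)) / pooled_sd


rows = [
    {"treatment": 1, "age": 60}, {"treatment": 1, "age": 62}, {"treatment": 1, "age": 64},
    {"treatment": 0, "age": 50}, {"treatment": 0, "age": 52}, {"treatment": 0, "age": 54},
]
print(cohen_d(rows, "treatment", "age"))  # 5.0
```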

Conclusion

This package offers a user-friendly propensity score matching protocol for the Python environment. In it we have tried to provide automatic figure generation, contextualization of the results, and flexibility in the matching and modeling protocol to serve a wide user base.

Release history

Version	Release date
0.3.16 (this version)	Nov 12, 2025
0.3.15	Nov 12, 2025
0.3.14	Jul 18, 2025
0.3.13	Jan 6, 2023
0.3.12	Jan 6, 2023
0.3.11	Jan 6, 2023
0.3.10	Jan 6, 2023
0.3.9	Dec 31, 2022
0.3.8	Dec 6, 2022
0.3.7	Dec 2, 2022
0.3.6	Nov 23, 2022
0.3.5	Aug 6, 2022
0.3.4	Aug 6, 2022
0.3.3	Aug 4, 2022
0.3.2	Jul 4, 2022
0.3.1	Jul 4, 2022
0.3.0	Jul 4, 2022
0.2.9	Jun 30, 2022
0.2.8	Apr 23, 2022
0.2.7	Apr 23, 2022
0.2.6	Apr 23, 2022
0.2.5	Mar 21, 2022
0.2.4	Mar 1, 2022
0.2.3	Mar 1, 2022
0.2.2	Mar 1, 2022
0.2.1	Mar 1, 2022
0.2.0	Mar 1, 2022
0.1.9	Jan 30, 2022
0.1.8	Jan 28, 2022
0.1.7	Jan 26, 2022
0.1.6	Jan 26, 2022
0.1.5	Jan 25, 2022
0.1.4	Jan 25, 2022
0.1.3	Jan 25, 2022
0.1.2	Jan 24, 2022
0.1.1	Jan 24, 2022

Download files

Source Distribution

psmpy-0.3.16.tar.gz (16.5 kB)

Uploaded Nov 12, 2025 Source

Built Distribution

psmpy-0.3.16-py3-none-any.whl (13.9 kB)

Uploaded Nov 12, 2025 Python 3

File details

Details for the file psmpy-0.3.16.tar.gz.

File metadata

Download URL: psmpy-0.3.16.tar.gz
Upload date: Nov 12, 2025
Size: 16.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for psmpy-0.3.16.tar.gz
Algorithm	Hash digest
SHA256	`67628f909cc7b72629a611ea09e88802342793c92b7dbd44b32e0afb535e7aa4`
MD5	`a8de0d1386696b8971dfdc7a4ce98315`
BLAKE2b-256	`6eee22c59aedbda61cc4d9c2e27212529849b20765fc177a0c156a5c111a2dfe`

File details

Details for the file psmpy-0.3.16-py3-none-any.whl.

File metadata

Download URL: psmpy-0.3.16-py3-none-any.whl
Upload date: Nov 12, 2025
Size: 13.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for psmpy-0.3.16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fbddf4572ac2b0ecc0a1215be3a45868e43030d6f56448b875af9b22c40b2e79`
MD5	`19d7350e452eec5df631c31287c7c8e9`
BLAKE2b-256	`14a56b8d2520f55a9b1a20ed15810e4175adfaed1a6b631f87868ce5ec18cdad`
