Fast and simple probabilistic data matching package
healmatcher
- healmatcher is a simple but fast probabilistic data matching package developed by NYULH HEAL Lab.
- The package is optimized for matching healthcare databases (e.g. EHR), as it was designed to link Medicaid and Client Database System data.
- The Splink package is used extensively to run the core linkage processes.
- Currently, the model supports 4 variables (sex, date of birth, last 4 digits of SSN, and first 2 letters of last name) to run the linkage process.
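The four variables usually have to be derived from raw fields before matching. The sketch below shows one way to do that with the standard library only. The helper names (`last4_ssn`, `first2_last_name`, `iso_dob`) and the raw input values are hypothetical, and the README does not state which data types the package expects (the example below uses integers for `ssn`), so treat the string outputs here as an assumption.

```python
import re
from datetime import date


def last4_ssn(ssn: str) -> str:
    """Return the last four digits of an SSN, ignoring separators."""
    digits = re.sub(r"\D", "", ssn)
    return digits[-4:]


def first2_last_name(name: str) -> str:
    """Return the first two letters of a last name, lower-cased."""
    letters = re.sub(r"[^a-z]", "", name.lower())
    return letters[:2]


def iso_dob(year: int, month: int, day: int) -> str:
    """Format a date of birth as an ISO string (YYYY-MM-DD)."""
    return date(year, month, day).isoformat()


# Hypothetical raw record reduced to the four matching variables.
record = {
    "sex": 1,
    "dob": iso_dob(2012, 1, 1),
    "ssn": last4_ssn("123-45-1111"),
    "ln": first2_last_name("O'Neil"),
}
```

With pandas, the same helpers could be applied column-wise, for example `df["ssn_full"].map(last4_ssn)`, where `ssn_full` is again an illustrative column name.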
How to install
pip install healmatcher
How to use (example)
# Install package (the leading "!" is Jupyter notebook syntax; drop it in a shell)
!pip install healmatcher
# Load packages
import pandas as pd
from healmatcher import hm
# Create example datasets
testa = pd.DataFrame({
'sex':[1,2,1,2,1,2,1,2,1,2],
'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
'sex':[2,2,1,1,1,2,1,2,1,1],
'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
# Run matching
hm(
df_a = testa,
df_b = testb,
col_a=['sex','dob','ssn','ln'],
col_b=['sex','dob','ssn','ln'],
match_prob_threshold = 0.001,
iteration = 20,
model2 = True,
blocking_rule_for_training_input = 'PROVIDER_NUMBER',
onetoone = True,
match_summary = True
)
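To give some intuition for `match_prob_threshold`, here is a minimal sketch of how a Fellegi-Sunter style model (the approach Splink is based on) turns field agreements into a match probability. This is not healmatcher's or Splink's code. The function name and the `prior`, `m`, and `u` values are assumptions chosen for illustration; a real model estimates them during training.

```python
def match_probability(prior, comparisons):
    """Combine field-level evidence into a match probability.

    prior: assumed probability that a random record pair is a match.
    comparisons: list of (agrees, m, u) tuples, where
        m = P(field agrees | records match)
        u = P(field agrees | records do not match)
    """
    odds = prior / (1 - prior)
    for agrees, m, u in comparisons:
        # Each field multiplies the odds by its Bayes factor.
        odds *= (m / u) if agrees else ((1 - m) / (1 - u))
    return odds / (1 + odds)


# Illustrative (made-up) m and u values for the four variables.
params = {
    "sex": (0.99, 0.50),
    "dob": (0.95, 0.001),
    "ssn": (0.95, 0.0001),
    "ln": (0.90, 0.01),
}

all_agree = match_probability(
    0.01, [(True, m, u) for m, u in params.values()]
)
none_agree = match_probability(
    0.01, [(False, m, u) for m, u in params.values()]
)

# A pair is kept when its probability is at or above the threshold.
threshold = 0.001
keep_all_agree = all_agree >= threshold
keep_none_agree = none_agree >= threshold
```

A low threshold such as the 0.001 used in the example call keeps most candidate pairs, which leaves more of the filtering to later steps such as one-to-one assignment.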
Updates
- use_save_model=True: load a pre-trained model to run matching
- save_model_path=PATH: path of the model to load (JSON format)
- export_model=True: save the current model
- export_model_path=PATH: path where the current model is saved
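The four arguments above pair up: one flag and one path for loading, one flag and one path for exporting. As a rough sketch, a small helper can build them so that a flag is never set without its path. The helper `model_io_kwargs` and the file name `model.json` are hypothetical, and whether the training arguments are still required when a saved model is loaded is not stated here.

```python
def model_io_kwargs(load_path=None, export_path=None):
    """Build the model load/export keyword arguments listed above."""
    kwargs = {}
    if load_path is not None:
        kwargs["use_save_model"] = True
        kwargs["save_model_path"] = load_path
    if export_path is not None:
        kwargs["export_model"] = True
        kwargs["export_model_path"] = export_path
    return kwargs


# First run: train and export the model. Later run: load it again.
export_kwargs = model_io_kwargs(export_path="model.json")
load_kwargs = model_io_kwargs(load_path="model.json")

# Intended use (requires healmatcher and the example DataFrames):
# hm(df_a=testa, df_b=testb,
#    col_a=['sex','dob','ssn','ln'], col_b=['sex','dob','ssn','ln'],
#    **load_kwargs)
```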
Follow up
- Please visit our repo if you have any questions.