Fast and simple probabilistic data matching package

Project description

healmatcher

  • healmatcher is a simple, fast probabilistic data matching package developed by the NYULH HEAL Lab.
  • The package is optimized for matching healthcare databases (e.g. EHR), as it was designed to link Medicaid and Client Database System data.
  • The core linkage process is run with the Splink package.
  • Currently, the model supports 4 variables for the linkage process: sex, date of birth, last 4 digits of SSN, and first 2 letters of last name.
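
Raw records rarely arrive with these four fields already in this shape. The sketch below shows one way to derive them with pandas. The raw column names (birth_date, ssn_full, last_name), the helper prepare_linkage_columns, and the chosen output formats (ISO date strings, SSN digits kept as strings, lowercase name prefix) are assumptions made for illustration, not something healmatcher is documented to require.

```python
import pandas as pd

# Hypothetical raw records; these column names are illustrative only.
raw = pd.DataFrame({
    "sex": [1, 2],
    "birth_date": ["2012-01-01", "1984-01-01"],
    "ssn_full": ["123-45-1111", "987656666"],
    "last_name": ["Asher", " walker"],
})


def prepare_linkage_columns(df):
    """Derive the four linkage variables from raw columns (assumed formats)."""
    out = pd.DataFrame(index=df.index)
    out["sex"] = df["sex"]
    # Normalize dates to one string format so equal dates compare equal.
    out["dob"] = pd.to_datetime(df["birth_date"]).dt.strftime("%Y-%m-%d")
    # Strip separators, then keep the last 4 digits as a string
    # so that leading zeros are not lost.
    digits = df["ssn_full"].astype(str).str.replace(r"\D", "", regex=True)
    out["ssn"] = digits.str[-4:]
    # First 2 letters of the last name, trimmed and lowercased.
    out["ln"] = df["last_name"].astype(str).str.strip().str.lower().str[:2]
    return out


prepared = prepare_linkage_columns(raw)
print(prepared)
```

Applying the same preparation to both datasets before matching keeps the compared values consistent on each side.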

How to install

pip install healmatcher

How to use (example)

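# Install package (the leading "!" is notebook syntax; in a terminal run: pip install healmatcher)
!pip install healmatcher

# Load packages
import pandas as pd
from healmatcher import hm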

# create example dataset
testa = pd.DataFrame({
    'sex':[1,2,1,2,1,2,1,2,1,2],
    'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex':[2,2,1,1,1,2,1,2,1,1],
    'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})

# Run matching
hm(
    df_a = testa,
    df_b = testb,
    col_a=['sex','dob','ssn','ln'],
    col_b=['sex','dob','ssn','ln'],
    match_prob_threshold = 0.001,
    iteration = 20,
    model2 = True,
    blocking_rule_for_training_input = 'PROVIDER_NUMBER',
    onetoone = True,
    match_summary = True
)

Updates

  • use_save_model=True : load a pre-trained model to run matching
  • save_model_path=PATH : path of the model to load (JSON format)
  • export_model=True : save the current model
  • export_model_path=PATH : path where the current model is saved
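
As a rough sketch of how these options might be combined with the call shown in the example, the helper below only assembles keyword arguments and does not call healmatcher itself. The four model-related argument names come from the list above; the helper build_hm_kwargs, its parameter names, and the assumption that these options can be passed alongside df_a, df_b, col_a and col_b are illustrative and not confirmed by the package documentation.

```python
def build_hm_kwargs(df_a, df_b, cols, load_model_from=None, save_model_to=None):
    """Assemble keyword arguments for hm() (hypothetical helper)."""
    kwargs = {
        "df_a": df_a,
        "df_b": df_b,
        "col_a": list(cols),
        "col_b": list(cols),
    }
    if load_model_from is not None:
        # Reuse a previously trained model stored as JSON.
        kwargs["use_save_model"] = True
        kwargs["save_model_path"] = load_model_from
    if save_model_to is not None:
        # Export the current model for later runs.
        kwargs["export_model"] = True
        kwargs["export_model_path"] = save_model_to
    return kwargs


# "model.json" is a placeholder path; None stands in for real DataFrames.
export_kwargs = build_hm_kwargs(
    None, None, ["sex", "dob", "ssn", "ln"], save_model_to="model.json"
)
print(sorted(export_kwargs))
```

With real DataFrames in place of None, the result would be passed as hm(**export_kwargs), possibly together with the other arguments from the example above (such as match_prob_threshold), if the function requires them.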

Follow up

  • Please visit our repo if you have any questions.

Webpage

Download files

Source Distribution

healmatcher-0.0.35.tar.gz (5.8 kB view details)

Uploaded Source

Built Distribution

healmatcher-0.0.35-py3-none-any.whl (6.0 kB view details)

Uploaded Python 3

File details

Details for the file healmatcher-0.0.35.tar.gz.

File metadata

  • Download URL: healmatcher-0.0.35.tar.gz
  • Upload date:
  • Size: 5.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.35.tar.gz
Algorithm Hash digest
SHA256 888e7c752b923c194617990c81601104ae4b2fd990b22b410b9dcdfa82e07803
MD5 d9299ac72ec386a6590414566214a706
BLAKE2b-256 b45f8ba0f3f3f1c2a93cc72642cd3722688fd59f0dd28168beb537a3e1b77809

File details

Details for the file healmatcher-0.0.35-py3-none-any.whl.

File hashes

Hashes for healmatcher-0.0.35-py3-none-any.whl
Algorithm Hash digest
SHA256 a0240ddb9d38dc0489814185cc5c86935020bc773a2cab788279e80805a5ff8b
MD5 9689b98f32074bcc2e8dde9c9cc67c56
BLAKE2b-256 0e66653e6eb3f2cb8d7ef67b1ce8e9ae5ab7acb01073d0d1099e5c80e993b3b9
