Fast and simple probabilistic data matching package

Project description

healmatcher

  • healmatcher is a simple but fast probabilistic data matching package developed by the NYULH HEAL Lab.
  • The package is optimized for matching healthcare databases (e.g. EHR), as it was designed to link Medicaid and Client Database System data.
  • The core linkage processes rely extensively on the Splink package.
  • Currently, the model supports 4 variables (sex, date of birth, last 4 digits of SSN, and first 2 letters of last name) to run the linkage process.
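
The four variables above can be derived from fuller records before matching. The following is a minimal sketch, assuming the raw data holds a full SSN and a full last name. `to_match_fields` and its argument names are hypothetical and not part of healmatcher; the keys `sex`, `dob`, `ssn`, and `ln` follow the example in this description. Whether the package expects the SSN digits as text or as integers is not stated here, so the sketch keeps them as text (an assumption) so that leading zeros survive.

```python
def to_match_fields(sex, dob, full_ssn, last_name):
    """Reduce one raw record to the four fields used for linkage.

    Hypothetical helper for illustration; not part of healmatcher.
    """
    # Keep only digits so inputs such as '123-45-6789' are handled.
    digits = "".join(ch for ch in str(full_ssn) if ch.isdigit())
    return {
        "sex": sex,
        "dob": dob,  # passed through unchanged
        "ssn": digits[-4:],  # last 4 digits, kept as text (assumption)
        "ln": str(last_name).strip()[:2].lower(),  # first 2 letters of last name
    }


record = to_match_fields(1, "1999-1-1", "123-45-6789", "Smith")
print(record)
```

A list of such dictionaries can be passed to `pd.DataFrame` to build inputs shaped like the example datasets.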

How to install

pip install healmatcher

How to use (example)

# Install package
!pip install healmatcher

# Load packages
import pandas as pd
from healmatcher import hm

# create example dataset
testa = pd.DataFrame({
    'sex':[1,2,1,2,1,2,1,2,1,2],
    'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex':[2,2,1,1,1,2,1,2,1,1],
    'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})

# Run matching
hm(
    df_a = testa,
    df_b = testb,
    col_a=['sex','dob','ssn','ln'],
    col_b=['sex','dob','ssn','ln'],
    match_prob_threshold = 0.001,
    iteration = 20,
    model2 = True,
    blocking_rule_for_training_input = 'PROVIDER_NUMBER',
    onetoone = True,
    match_summary = True
)

Updates

  • use_save_model=True : load a pre-trained model to run matching
  • save_model_path=PATH : path of the model to load (JSON format)
  • export_model=True : save the current model
  • export_model_path=PATH : path where the current model is saved
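
The four arguments above pair up: one pair loads a model, the other exports one. The following is a small sketch of assembling them, assuming they are passed to `hm` as keyword arguments alongside the ones shown in the usage example. `model_io_kwargs` and the file name `linkage_model.json` are hypothetical and used for illustration only.

```python
def model_io_kwargs(load_path=None, export_path=None):
    """Build the save/load keyword arguments listed above.

    Hypothetical helper for illustration; not part of healmatcher.
    """
    kwargs = {}
    if load_path is not None:
        # Load a pre-trained model (JSON) instead of training a new one.
        kwargs["use_save_model"] = True
        kwargs["save_model_path"] = str(load_path)
    if export_path is not None:
        # Save the current model to the given path.
        kwargs["export_model"] = True
        kwargs["export_model_path"] = str(export_path)
    return kwargs


export_kwargs = model_io_kwargs(export_path="linkage_model.json")
load_kwargs = model_io_kwargs(load_path="linkage_model.json")
print(export_kwargs)
print(load_kwargs)
```

Either dictionary can then be unpacked into the call, for example `hm(df_a=testa, df_b=testb, ..., **export_kwargs)`. Which of the other arguments are still required when a saved model is loaded is not documented here.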

Follow up

  • Please visit our repo if you have any questions.

Webpage


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

healmatcher-0.0.45.tar.gz (7.1 kB view details)

Uploaded Source

Built Distribution

healmatcher-0.0.45-py3-none-any.whl (7.3 kB view details)

Uploaded Python 3

File details

Details for the file healmatcher-0.0.45.tar.gz.

File metadata

  • Download URL: healmatcher-0.0.45.tar.gz
  • Upload date:
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.45.tar.gz
Algorithm Hash digest
SHA256 928d9dedb0fe3591d0b2a8ee23ca629c60b20151e4e0d5b98aba233575592284
MD5 e5003ec85b15d2f62d572694f9304273
BLAKE2b-256 d2beb6f6ceda6fde88cc5d7443cbab2dcf4ad02d572ebc3f6f13bff56c88509d

See more details on using hashes here.
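
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the file is assumed to have been downloaded already.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for healmatcher-0.0.45.tar.gz
EXPECTED_SDIST_SHA256 = (
    "928d9dedb0fe3591d0b2a8ee23ca629c60b20151e4e0d5b98aba233575592284"
)
```

With the source distribution saved in the current directory, `matches_published_hash("healmatcher-0.0.45.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` for an intact download.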

File details

Details for the file healmatcher-0.0.45-py3-none-any.whl.

File hashes

Hashes for healmatcher-0.0.45-py3-none-any.whl
Algorithm Hash digest
SHA256 f2c465bd716b6e1ccf023d32d820ff78544d95aa7662d7dfc247b140d10d5f38
MD5 fc89a88b59a53149c515892309acc79a
BLAKE2b-256 dbe555950b5dfa84eb3857905f481b5b4407634ed17982f2bae3830a5ee2006e

See more details on using hashes here.
