Fast and simple probabilistic data matching package

Project description

healmatcher

  • healmatcher is a simple but fast probabilistic data matching package developed by the NYULH HEAL Lab.
  • The package is optimized for matching healthcare databases (e.g. EHR), as it was designed to link Medicaid and Client Database System data.
  • The core linkage process relies heavily on the Splink package.
  • Currently, the model supports 4 variables (sex, date of birth, last 4 digits of SSN, and first 2 letters of last name) for the linkage process.
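
The four supported variables usually have to be derived from rawer fields first. The following is a minimal sketch of that preparation step, not part of healmatcher itself. The helper names (`last4_ssn`, `last_name_prefix`, `to_match_keys`) and the raw field name `last_name` are hypothetical, and the output keys `sex`, `dob`, `ssn`, `ln` simply mirror the column names used in the usage example. Whether the package expects strings or integers for `ssn` is not stated in the source, so strings are an assumption here.

```python
import re


def last4_ssn(ssn):
    """Return the last 4 digits of an SSN as a string, or None if unavailable."""
    if ssn is None:
        return None
    digits = re.sub(r"\D", "", str(ssn))  # drop dashes, spaces, etc.
    return digits[-4:] if len(digits) >= 4 else None


def last_name_prefix(last_name):
    """Return the first 2 letters of a last name, lowercased, or None if unavailable."""
    if last_name is None:
        return None
    letters = re.sub(r"[^A-Za-z]", "", str(last_name))  # drop apostrophes, hyphens
    return letters[:2].lower() if len(letters) >= 2 else None


def to_match_keys(record):
    """Reduce a raw record (hypothetical layout) to the four matching variables."""
    return {
        "sex": record["sex"],
        "dob": record["dob"],
        "ssn": last4_ssn(record["ssn"]),
        "ln": last_name_prefix(record["last_name"]),
    }


raw = {"sex": 1, "dob": "1999-1-1", "ssn": "123-45-6789", "last_name": "O'Neil"}
keys = to_match_keys(raw)
print(keys)
```

The same helpers can be applied column-wise to a pandas DataFrame (for example with `Series.map`) before passing the result to the matcher.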

How to install

pip install healmatcher

How to use (example)

# Install package
!pip install healmatcher

# Load packages
import pandas as pd
from hm import hm

# Create example datasets
testa = pd.DataFrame({
    'sex':[1,2,1,2,1,2,1,2,1,2],
    'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex':[2,2,1,1,1,2,1,2,1,1],
    'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVID

# Run matching
hm(
    df_a=testa,
    df_b=testb,
    col_a=['sex', 'dob', 'ssn', 'ln'],
    col_b=['sex', 'dob', 'ssn', 'ln'],
    match_prob_threshold=0.001,
    iteration=20,
    model2=True,
    blocking_rule_for_input='PROVIDER_NUMBER',
    onetoone=True,
    match_summary=True
)
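
The call above combines a match probability threshold with a one-to-one option. As a rough illustration of what those two settings generally mean in record linkage, the sketch below filters scored candidate pairs by a threshold and then greedily keeps the best pair for each record on either side. This is an assumption-laden sketch, not healmatcher's or Splink's implementation: the function `one_to_one`, the record IDs, and the probabilities are all hypothetical, and the source does not say how the package resolves competing matches.

```python
def one_to_one(pairs, threshold):
    """Greedily keep the highest-probability pair per record on both sides.

    pairs: iterable of (id_a, id_b, match_probability) tuples (hypothetical format).
    threshold: pairs scoring below this value are discarded.
    """
    kept, used_a, used_b = [], set(), set()
    # Visit candidate pairs from most to least probable.
    for id_a, id_b, prob in sorted(pairs, key=lambda p: p[2], reverse=True):
        if prob < threshold:
            break  # all remaining pairs score lower still
        if id_a in used_a or id_b in used_b:
            continue  # one side is already linked to a better match
        kept.append((id_a, id_b, prob))
        used_a.add(id_a)
        used_b.add(id_b)
    return kept


# Made-up candidate pairs for illustration only.
candidates = [
    ("a1", "b1", 0.98),
    ("a1", "b2", 0.40),
    ("a2", "b1", 0.75),
    ("a2", "b3", 0.60),
    ("a3", "b4", 0.0005),
]
links = one_to_one(candidates, threshold=0.001)
print(links)
```

Greedy resolution is only one possible strategy; an optimal assignment method could produce different links when several candidates compete for the same record.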

Follow up

  • Please visit our repo if you have any questions.

Webpage

Download files

Source Distribution

healmatcher-0.0.14.tar.gz (2.4 kB view details)

Uploaded Source

Built Distribution

healmatcher-0.0.14-py3-none-any.whl (2.1 kB view details)

Uploaded Python 3

File details

Details for the file healmatcher-0.0.14.tar.gz.

File metadata

  • Download URL: healmatcher-0.0.14.tar.gz
  • Size: 2.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.14.tar.gz
Algorithm Hash digest
SHA256 3029547f29a8343d834618f8b716132ef761726ad3e7aa3191782225eedd3eb1
MD5 5c7e2709f236b0ef0e53232480879ea4
BLAKE2b-256 e50610156eaeeaaa116cde2ee379c9afe9f16f6bc57b710d7c2c2170a3964616

File details

Details for the file healmatcher-0.0.14-py3-none-any.whl.

File hashes

Hashes for healmatcher-0.0.14-py3-none-any.whl
Algorithm Hash digest
SHA256 ef574b51b37249e1c5287d15663af0c74b8f802e95c935f774662ae750960f19
MD5 0a654d55fafbe7adc8a0ccce5c016060
BLAKE2b-256 5a32fedcad44117bf90f5d1a3fc0612dfd483424d7bc33d2b8d41776c3b2f5cc
