Skip to main content

Fast and simple probabilistic data matching package

Project description

healmatcher

  • healmatcher is a simple but fast probabilistic matching package developed by NYULH HEAL Lab.
  • The package is best optimized for matching healthcare database (e.g. EHR) as it has designed to link Medicaid and Client Database System data.
  • Splink package is extensively being used to run core linkage processes.
  • Currently, the model supports 4 variables (sex, date of birth, last 4 digits of ssn, and first 2 letters of last name) to run the linkage process.

How to install

pip install healmatcher

How to use (example)

# Install package
!pip install healmatcher

# Load package
from hm import hm

# create example dataset
testa = pd.DataFrame({
    'sex':[1,2,1,2,1,2,1,2,1,2],
    'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex':[2,2,1,1,1,2,1,2,1,1],
    'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVID

# Run matching
hm(
    df_a = testa,
    df_b = testb,
    col_a=['sex','dob','ssn','ln'],
    col_b=['sex','dob','ssn','ln'],
    match_prob_threshold = 0.001,
    iteration = 20,
    model2 = True,
    blocking_rule_for_input = 'PROVIDER_NUMBER',
    onetoone = True,
    match_summary = True
)

Follow up

  • Please visit our repo if you have any questions.

Webpage

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

healmatcher-0.0.8.tar.gz (2.4 kB view details)

Uploaded Source

Built Distribution

healmatcher-0.0.8-py3-none-any.whl (2.1 kB view details)

Uploaded Python 3

File details

Details for the file healmatcher-0.0.8.tar.gz.

File metadata

  • Download URL: healmatcher-0.0.8.tar.gz
  • Upload date:
  • Size: 2.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.8.tar.gz
Algorithm Hash digest
SHA256 d4fca909c83a988a3abde86d890b0116bdd690f78491f8e1965ad6b7489ce328
MD5 87d4eab7ab0ca7e28258d657cdb7b20d
BLAKE2b-256 fb20f7e572d8a74dc2d2241eac40c7a24750f7a2443bc7e3a1d84c4d5a0224f1

See more details on using hashes here.

File details

Details for the file healmatcher-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: healmatcher-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 2.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 16cb3828ccd3f4b752ff985da923cf6f6a1a34ee809fe04c27aeb12fa893f495
MD5 7572deb5b9fd91b63ecc136d3a37a3d4
BLAKE2b-256 911bb47cee232e7ae3be51deeed5139d4d182a37593a0d4862262c645d002553

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page