Skip to main content

Fast and simple probabilistic data matching package

Project description

healmatcher

  • healmatcher is a simple but fast probabilistic matching package developed by NYULH HEAL Lab.
  • The package is best optimized for matching healthcare database (e.g. EHR) as it has designed to link Medicaid and Client Database System data.
  • Splink package is extensively being used to run core linkage processes.
  • Currently, the model supports 4 variables (sex, date of birth, last 4 digits of ssn, and first 2 letters of last name) to run the linkage process.

How to install

pip install healmatcher

How to use (example)

# Install package
!pip install healmatcher

# Load package
from hm import hm

# create example dataset
testa = pd.DataFrame({
    'sex':[1,2,1,2,1,2,1,2,1,2],
    'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex':[2,2,1,1,1,2,1,2,1,1],
    'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVID

# Run matching
hm(
    df_a = testa,
    df_b = testb,
    col_a=['sex','dob','ssn','ln'],
    col_b=['sex','dob','ssn','ln'],
    match_prob_threshold = 0.001,
    iteration = 20,
    model2 = True,
    blocking_rule_for_input = 'PROVIDER_NUMBER',
    onetoone = True,
    match_summary = True
)

Follow up

  • Please visit our repo if you have any questions.

Webpage

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

healmatcher-0.0.5.tar.gz (2.5 kB view details)

Uploaded Source

Built Distribution

healmatcher-0.0.5-py3-none-any.whl (2.3 kB view details)

Uploaded Python 3

File details

Details for the file healmatcher-0.0.5.tar.gz.

File metadata

  • Download URL: healmatcher-0.0.5.tar.gz
  • Upload date:
  • Size: 2.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.5.tar.gz
Algorithm Hash digest
SHA256 61f2a93256892ec37496fd97e4bd9128e631cebd196a0bf6f9ed0ee70dd0e69d
MD5 a68eaa9cd6f71adf7081350e4f5515eb
BLAKE2b-256 03ca0a432c33485fa0d11869ca80d8dabff89c7fe1716cb0ca6fa4462497dece

See more details on using hashes here.

File details

Details for the file healmatcher-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: healmatcher-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 2.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 7d030d09324ca5854f6f1faf4a36c5885ba56f587a8abbce85d18fceebe72ec5
MD5 06a1eeb92377be21c6f9e24c503e4b13
BLAKE2b-256 f11a4e1b5329072952b687d65462f662e8d0d1a7f4c5315c09f34bd91cb33746

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page