Skip to main content

Fast and simple probabilistic data matching package

Project description

healmatcher

  • healmatcher is a simple but fast probabilistic data matching package developed by NYULH HEAL Lab.
  • The package is best optimized for matching healthcare database (e.g. EHR) as it has designed to link Medicaid and Client Database System data.
  • Splink package is extensively being used to run core linkage processes.
  • Currently, the model supports 4 variables (sex, date of birth, last 4 digits of ssn, and first 2 letters of last name) to run the linkage process.

How to install

pip install healmatcher

How to use (example)

# Install package
!pip install healmatcher

# Load package
from hm import hm

# create example dataset
testa = pd.DataFrame({
    'sex':[1,2,1,2,1,2,1,2,1,2],
    'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex':[2,2,1,1,1,2,1,2,1,1],
    'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVID

# Run matching
hm(
    df_a = testa,
    df_b = testb,
    col_a=['sex','dob','ssn','ln'],
    col_b=['sex','dob','ssn','ln'],
    match_prob_threshold = 0.001,
    iteration = 20,
    model2 = True,
    blocking_rule_for_input = 'PROVIDER_NUMBER',
    onetoone = True,
    match_summary = True
)

Follow up

  • Please visit our repo if you have any questions.

Webpage

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

healmatcher-0.0.19.tar.gz (2.3 kB view details)

Uploaded Source

Built Distribution

healmatcher-0.0.19-py3-none-any.whl (2.1 kB view details)

Uploaded Python 3

File details

Details for the file healmatcher-0.0.19.tar.gz.

File metadata

  • Download URL: healmatcher-0.0.19.tar.gz
  • Upload date:
  • Size: 2.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.19.tar.gz
Algorithm Hash digest
SHA256 43891e2e807d0f10d8627166c037f8266dd242c3a33a8bf9dd20e250105c83de
MD5 583e67205e54959288d917a94b63c9a7
BLAKE2b-256 9833f57cb6aae094dd1fd5262929e0d70a126a9269e865a6ad5113c7c3154ea1

See more details on using hashes here.

File details

Details for the file healmatcher-0.0.19-py3-none-any.whl.

File metadata

File hashes

Hashes for healmatcher-0.0.19-py3-none-any.whl
Algorithm Hash digest
SHA256 1a4a774369bb873c05d8a37f4b91bbb1552032df570bad00b0e397471dab15d2
MD5 03b7a0eda31b12a1aa949dbf6502e8ab
BLAKE2b-256 475925886d823ad5db0d0b104014b998c2cce5e71d65de83b54cc83c7da9b096

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page