
Fast and simple probabilistic data matching package

Project description

healmatcher

  • healmatcher is a simple but fast probabilistic matching package developed by the NYULH HEAL Lab.
  • Currently, the model uses four variables for linkage: sex, date of birth, the last 4 digits of the SSN, and the first 2 letters of the last name.
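The description does not say how the four variables are compared or weighted. Below is a minimal sketch that assumes exact agreement on each normalized field and classic Fellegi-Sunter weighting. The normalization rules, `FIELDS`, and the `M_PROBS`/`U_PROBS` values are illustrative assumptions, not values taken from healmatcher.

```python
import math
from datetime import date

# Field names follow the README example; everything else here is an assumption.
FIELDS = ("sex", "dob", "ssn", "ln")


def normalize(record):
    """Bring one record's four linkage fields into a comparable form."""
    year, month, day = (int(part) for part in str(record["dob"]).split("-"))
    return {
        "sex": int(record["sex"]),
        "dob": date(year, month, day),                 # '2012-1-1' -> date(2012, 1, 1)
        "ssn": str(record["ssn"]).zfill(4)[-4:],       # last 4 digits of the SSN
        "ln": str(record["ln"]).strip().lower()[:2],   # first 2 letters of last name
    }


def comparison_vector(rec_a, rec_b):
    """1 where a field agrees exactly, 0 where it disagrees, in FIELDS order."""
    a, b = normalize(rec_a), normalize(rec_b)
    return tuple(int(a[f] == b[f]) for f in FIELDS)


# Hypothetical m = P(agree | same person) and u = P(agree | different people).
M_PROBS = (0.95, 0.90, 0.90, 0.90)
U_PROBS = (0.50, 0.01, 0.001, 0.01)


def match_weight(gamma, m=M_PROBS, u=U_PROBS):
    """Fellegi-Sunter weight: sum of per-field log2 likelihood ratios."""
    total = 0.0
    for g, mk, uk in zip(gamma, m, u):
        total += math.log2(mk / uk) if g else math.log2((1 - mk) / (1 - uk))
    return total
```

For the first rows of the example data, `{"sex": 1, "dob": "2012-1-1", "ssn": 1111, "ln": "as"}` and `{"sex": 2, "dob": "2012-1-1", "ssn": 1111, "ln": "as"}` give the vector `(0, 1, 1, 1)`. Its weight is strongly positive because three rarely-coincident fields agree, even though sex does not.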

How to use (example)

# Install package (the leading "!" runs pip from a Jupyter notebook; omit it in a shell)
!pip install healmatcher

# Load packages (pandas is needed to build the example DataFrames)
import pandas as pd
from hm import hm

# Create example datasets
testa = pd.DataFrame({
    'sex':[1,2,1,2,1,2,1,2,1,2],
    'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex':[2,2,1,1,1,2,1,2,1,1],
    'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVIDER_NUMBER':[...]  # values truncated in the original listing
})

# Run matching
hm(
    df_a = testa,
    df_b = testb,
    col_a=['sex','dob','ssn','ln'],
    col_b=['sex','dob','ssn','ln'],
    match_prob_threshold = 0.001,
    iteration = 20,
    model2 = True,
    blocking_rule_for_input = 'PROVIDER_NUMBER',
    onetoone = True,
    match_summary = True
)
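The README does not document what happens inside `hm()`. As a rough mental model only, the stdlib sketch below shows a common probabilistic linkage pipeline that maps loosely onto its parameters. Candidate pairs are restricted to records sharing a blocking value (compare `blocking_rule_for_input`). Fellegi-Sunter m/u probabilities are then estimated by EM for a fixed number of passes (compare `iteration`). Pairs below a probability cut-off are dropped (compare `match_prob_threshold`), and a greedy pass keeps each record in at most one pair (compare `onetoone`). These correspondences are assumptions. The function names (`candidate_pairs`, `em_match_probs`, `link`), the starting values, and the greedy one-to-one rule are hypothetical and are not healmatcher's API. Records are plain dicts rather than DataFrames so the sketch has no dependencies.

```python
from collections import defaultdict

# Hypothetical sketch of blocking + EM + threshold + one-to-one; not healmatcher's code.


def candidate_pairs(df_a, df_b, block_key):
    """Blocking: pair records only when they share the same block_key value."""
    rows_b_by_block = defaultdict(list)
    for j, rec in enumerate(df_b):
        rows_b_by_block[rec[block_key]].append(j)
    return [(i, j)
            for i, rec in enumerate(df_a)
            for j in rows_b_by_block.get(rec[block_key], [])]


def agreement(rec_a, rec_b, col_a, col_b):
    """1 where the paired columns hold equal values, else 0."""
    return [int(rec_a[ca] == rec_b[cb]) for ca, cb in zip(col_a, col_b)]


def _clip(x, eps=1e-6):
    """Keep probabilities strictly inside (0, 1) so no factor becomes zero."""
    return min(max(x, eps), 1 - eps)


def _posteriors(vectors, p, m, u):
    """E-step: P(match | agreement vector), assuming fields are independent."""
    out = []
    for g in vectors:
        a, b = p, 1 - p
        for gk, mk, uk in zip(g, m, u):
            a *= mk if gk else 1 - mk
            b *= uk if gk else 1 - uk
        out.append(a / (a + b))
    return out


def em_match_probs(vectors, iteration=20, p=0.1, m0=0.9, u0=0.1):
    """Estimate m/u probabilities by EM, then return each pair's match probability."""
    k = len(vectors[0])
    m, u = [m0] * k, [u0] * k
    for _ in range(iteration):
        w = _posteriors(vectors, p, m, u)
        sw = max(sum(w), 1e-12)
        snw = max(len(w) - sum(w), 1e-12)
        # M-step: re-estimate match prevalence and per-field agreement rates
        p = _clip(sw / len(w))
        m = [_clip(sum(wi * g[f] for wi, g in zip(w, vectors)) / sw) for f in range(k)]
        u = [_clip(sum((1 - wi) * g[f] for wi, g in zip(w, vectors)) / snw) for f in range(k)]
    return _posteriors(vectors, p, m, u)


def link(df_a, df_b, col_a, col_b, block_key, threshold=0.5, iteration=20, onetoone=True):
    """Return (row_a, row_b, match_probability) tuples at or above the threshold."""
    pairs = candidate_pairs(df_a, df_b, block_key)
    if not pairs:
        return []
    vectors = [agreement(df_a[i], df_b[j], col_a, col_b) for i, j in pairs]
    probs = em_match_probs(vectors, iteration)
    scored = sorted(((pr, i, j) for pr, (i, j) in zip(probs, pairs) if pr >= threshold),
                    reverse=True)
    if not onetoone:
        return [(i, j, pr) for pr, i, j in scored]
    used_a, used_b, kept = set(), set(), []
    for pr, i, j in scored:  # greedy: highest probability first, each row used once
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            kept.append((i, j, pr))
    return kept
```

With the example DataFrames, `testa.to_dict("records")` and `testb.to_dict("records")` produce the list-of-dicts input this sketch expects. Because agreement here is exact equality, values such as `'2012-1-1'` and `'2012-01-01'` count as a disagreement unless they are normalized first. Greedy one-to-one assignment is the simplest option. An optimal assignment (for example, the Hungarian algorithm) can keep a different set of pairs when probabilities are close.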

Webpage

healmatcher

Download files

Source Distribution

healmatcher-0.0.3.tar.gz (2.2 kB)

Built Distribution

healmatcher-0.0.3-py3-none-any.whl (2.1 kB)

File details

Details for the file healmatcher-0.0.3.tar.gz.

File metadata

  • Download URL: healmatcher-0.0.3.tar.gz
  • Upload date:
  • Size: 2.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.6

File hashes

Hashes for healmatcher-0.0.3.tar.gz
Algorithm     Hash digest
SHA256        41b32563ed87e109715e1f9249e860ee44360080e9e5da0b0e3c3b1d7d51d814
MD5           39e327f390400b453c5d90a3da33f6bb
BLAKE2b-256   2cab6294727899b7a3a0888f998dbb0f68244f39ae121df3430310861bcfdf2d
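To check a downloaded copy against the SHA256 digest listed for healmatcher-0.0.3.tar.gz, Python's standard `hashlib` module is enough. This sketch assumes the archive has been saved to the current directory. `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest published on this page for healmatcher-0.0.3.tar.gz
EXPECTED_SHA256 = "41b32563ed87e109715e1f9249e860ee44360080e9e5da0b0e3c3b1d7d51d814"


def sha256_of(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

`verify("healmatcher-0.0.3.tar.gz")` should return `True` for an untampered copy. pip can enforce the same check at install time in hash-checking mode. To use it, list `healmatcher==0.0.3` with `--hash=sha256:<digest>` entries in a requirements file and install with `--require-hashes`.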

File details

Details for the file healmatcher-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: healmatcher-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 2.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.6

File hashes

Hashes for healmatcher-0.0.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        d94a396a556fab49056f340cc7619ef89b2d76e4ae8eadca9a501cd6102da26b
MD5           ad3453ac948884a1d050697e0d3814e9
BLAKE2b-256   34dd415826860f0d01334138cc62bac7de0c668eaa4ef6424970a6ce4eec5f10
