Fast and simple probabilistic data matching package

Project description

healmatcher

  • healmatcher is a simple, fast probabilistic matching package developed by the NYULH HEAL Lab.
  • The model currently supports four linkage variables: sex, date of birth, the last 4 digits of the SSN, and the first 2 letters of the last name.
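
The four variables listed above usually have to be derived from raw fields before linkage. A minimal sketch of that preparation, using only the standard library, might look like the following. The helper name `make_match_keys`, its argument names, and the normalization choices (lower-casing, stripping punctuation, keeping the SSN as a string so leading zeros survive) are assumptions for illustration and are not part of healmatcher. The usage example stores `ssn` as integers, and which form the package expects is not stated here.

```python
from datetime import date


def make_match_keys(sex, dob, ssn, last_name):
    """Build the four linkage variables from raw values (hypothetical helper).

    sex       -- kept as given (the coding, e.g. 1/2, is an assumption)
    dob       -- a datetime.date, rendered as ISO 'YYYY-MM-DD'
    ssn       -- full SSN as text; only the last 4 digits are kept
    last_name -- only the first 2 letters are kept, lower-cased
    """
    digits = "".join(ch for ch in str(ssn) if ch.isdigit())
    letters = "".join(ch for ch in last_name if ch.isalpha())
    return {
        "sex": sex,
        "dob": dob.isoformat(),
        "ssn": digits[-4:],
        "ln": letters[:2].lower(),
    }


row = make_match_keys(1, date(2012, 1, 1), "123-45-1111", "Ashford")
print(row)
```

A list of such dictionaries can be passed to `pd.DataFrame(...)` to obtain a table with the `sex`, `dob`, `ssn`, and `ln` columns used in the example.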

How to use (example)

pip install healmatcher

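# Install package (the leading "!" runs a shell command from a Jupyter notebook)
!pip install healmatcher

# Load packages
import pandas as pd  # needed to build the example DataFrames below
from hm import hm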

# create example dataset
testa = pd.DataFrame({
    'sex':[1,2,1,2,1,2,1,2,1,2],
    'dob':['2012-1-1','2011-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ss','zz','rr','ww','wa','tr','tt','hh','gq'],
    'PROVIDER_NUMBER':[2,1,1,1,1,1,1,1,2,1]
})
testb = pd.DataFrame({
    'sex':[2,2,1,1,1,2,1,2,1,1],
    'dob':['2012-1-1','2001-12-1','1999-1-1','1998-11-1','2012-11-1','1984-1-1','1982-1-1','1975-1-1','1967-1-1','1954-1-1'],
    'ssn':[1111,2222,3333,4444,5555,6666,7777,8888,9999,1010],
    'ln':["as",'ls','zz','rr','wb','wa','tr','tt','ha','gq'],
    'PROVID

# Run matching
hm(
    df_a = testa,
    df_b = testb,
    col_a=['sex','dob','ssn','ln'],
    col_b=['sex','dob','ssn','ln'],
    match_prob_threshold = 0.001,
    iteration = 20,
    model2 = True,
    blocking_rule_for_input = 'PROVIDER_NUMBER',
    onetoone = True,
    match_summary = True
)
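
The model behind `hm` is not described here. As general background on probabilistic linkage, and not as a description of healmatcher's internals, a Fellegi-Sunter style score combines per-field agreement into a match probability, which can then be compared with a cutoff such as `match_prob_threshold`. In the sketch below, the function name, the m and u probabilities, and the prior are all made-up illustrative values.

```python
def match_probability(agreements, m, u, prior):
    """Posterior match probability under a Fellegi-Sunter style model.

    agreements -- dict field -> True/False (do the two records agree?)
    m[field]   -- P(agree | records are a true match)   (assumed values)
    u[field]   -- P(agree | records are not a match)    (assumed values)
    prior      -- P(match) before comparing any field   (assumed value)
    """
    odds = prior / (1.0 - prior)
    for field, agrees in agreements.items():
        if agrees:
            odds *= m[field] / u[field]
        else:
            odds *= (1.0 - m[field]) / (1.0 - u[field])
    return odds / (1.0 + odds)


# Illustrative parameters only; real values would be estimated from the data.
m = {"sex": 0.98, "dob": 0.95, "ssn": 0.97, "ln": 0.90}
u = {"sex": 0.50, "dob": 0.01, "ssn": 0.001, "ln": 0.02}

all_agree = match_probability(dict.fromkeys(m, True), m, u, prior=0.01)
none_agree = match_probability(dict.fromkeys(m, False), m, u, prior=0.01)
print(all_agree, none_agree)
```

With these assumed numbers, a pair that agrees on every field scores close to 1 and a pair that disagrees on every field scores close to 0, so a low threshold keeps many borderline pairs while a high one keeps only near-certain matches.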

Webpage

healmatcher


Source Distribution

healmatcher-0.0.4.tar.gz (2.3 kB)

Uploaded Source

Built Distribution

healmatcher-0.0.4-py3-none-any.whl (2.1 kB)

Uploaded Python 3

File details

Details for the file healmatcher-0.0.4.tar.gz.

File metadata

  • Download URL: healmatcher-0.0.4.tar.gz
  • Size: 2.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.4.tar.gz
Algorithm Hash digest
SHA256 f4bffadb9dd9e1fe3d9ab96cfc52003398689a2ef5430bc4b428624cde3ef3ac
MD5 7d39312c64dfb42dd3ab4fbd2bc7d19a
BLAKE2b-256 63286c11914bae4f0fc0adf26ad46ff5df91db161228e069779316d38e9d4438

File details

Details for the file healmatcher-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: healmatcher-0.0.4-py3-none-any.whl
  • Size: 2.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.0

File hashes

Hashes for healmatcher-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 b2cb1014fe612df5d1ab4c52eb16442ef661c4265b0bea456154ce4b73b04bb0
MD5 ae3d49c91ec5580714b5397cb492812c
BLAKE2b-256 4fe78c6a0a02f833e8dcc404431b0d1a33412686e8e3f16f7a0eb6eb112d7801
