
graphical and continuous representations of ICD-9 and ICD-10 codes

Project description


This is experimental software; a stable API is not expected until version 1.0.

What is it?

A Python library for building vector representations of ICD-9 and ICD-10 codes. Because it takes advantage of the hierarchical structure of ICD codes, it also exposes these hierarchies as networkx graphs.
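To make the idea of a code hierarchy concrete, here is a minimal plain-Python sketch (no networkx) of a directed parent-to-child tree over a tiny, hand-picked fragment of ICD-9. The node names, the toy edges, and the helpers `ancestors` and `tree_distance` are illustrative assumptions, not part of the icdcodex API; the library itself returns its hierarchies as networkx graphs.

```python
# Hypothetical toy fragment of the ICD-9 hierarchy as directed parent -> children edges.
# Only a handful of codes are included; the real hierarchy is far larger.
CHILDREN = {
    "root": ["001-139", "140-239"],
    "001-139": ["001", "002"],  # Infectious and parasitic diseases
    "001": ["0010", "0011"],    # Cholera
    "002": ["0020"],
    "140-239": ["140"],
}

# Invert the edges so each node knows its single parent (it is a tree).
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def ancestors(code):
    """Return the path from `code` up to the root, starting with `code` itself."""
    path = [code]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Number of edges between two nodes, measured via their lowest common ancestor."""
    path_a = ancestors(a)
    depth_in_b = {node: i for i, node in enumerate(ancestors(b))}
    for i, node in enumerate(path_a):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError("nodes are not in the same tree")


print(ancestors("0010"))              # ['0010', '001', '001-139', 'root']
print(tree_distance("0010", "0011"))  # 2 (siblings under 001)
print(tree_distance("0010", "140"))   # 5 (they only meet at the root)
```

Tree distance of this kind is one natural notion of "how related are two codes", and it is the structure that graph embeddings try to preserve in vector form.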

Motivation

icdcodex was the first-prize winner in the Data Driven Healthcare Track of Johns Hopkins' MedHacks 2020. It was hacked together to address the problem of ICD miscoding, which is a major issue for health insurance in the United States. ICD coding is tedious and labour-intensive, yet it is not obvious how to automate because the output space is enormous: ICD-10-CM (clinical modification), for example, has over 70,000 codes, and the number keeps growing.

There are many target-encoding strategies that address this problem. icdcodex offers two features that make ICD classification more amenable to modeling:

  • Access to a networkx tree representation of the ICD-9 and ICD-10 hierarchies
  • Vector embeddings of ICD codes using the node2vec algorithm (including pre-computed embeddings and an interface to create new embeddings)
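node2vec learns embeddings in two stages: it samples biased second-order random walks over the graph, then trains a skip-gram model on those walks as if they were sentences. The sketch below shows only the walk stage, in plain Python, on a hypothetical toy fragment of the ICD-9 tree treated as undirected. `node2vec_walk` is an illustrative helper, not an icdcodex function, and icdcodex's own implementation may differ. The return parameter `p` and in-out parameter `q` follow the usual node2vec weighting: stepping back to the previous node has weight 1/p, stepping to a node that is also adjacent to the previous one has weight 1, and stepping further away has weight 1/q.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style biased random walk.

    adj maps each node to the set of its neighbours (unweighted, undirected).
    p is the return parameter and q the in-out parameter.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move outward, away from prev
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical toy fragment of the ICD-9 tree, stored as undirected adjacency sets.
edges = [("root", "001-139"), ("001-139", "001"), ("001", "0010"), ("001", "0011")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

rng = random.Random(0)  # fixed seed for reproducibility
walks = [node2vec_walk(adj, node, length=5, p=1.0, q=0.5, rng=rng) for node in adj]
for walk in walks:
    print(walk)
```

With q < 1 the walks favour moving outward (depth-first-like exploration), while q > 1 keeps them near the start (breadth-first-like). A tree has no triangles, so on an ICD hierarchy only the 1/p and 1/q cases actually arise. The walks would then be passed to a skip-gram trainer (for instance gensim's Word2Vec), which is not shown here.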

Example Code

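from icdcodex import icd2vec, hierarchy

# Fit 64-dimensional node2vec embeddings on the ICD-9 hierarchy
embedder = icd2vec.Icd2Vec(num_embedding_dimensions=64)
embedder.fit(*hierarchy.icd9())

# X = get_patient_covariates()  # hypothetical: load your own patient features here
y = embedder.to_vec(["0010"])  # Cholera due to vibrio cholerae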

In this case, y is a 64-dimensional vector that lies close to the embeddings of other codes in the Infectious and Parasitic Diseases chapter.
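Because codes live in a shared vector space, a model can regress onto an embedding, and its prediction can then be mapped back to a code by nearest-neighbour search. The sketch below does this with cosine similarity in plain Python. The three-dimensional vectors are made-up placeholders, not real icdcodex embeddings, and `cosine` and `nearest_code` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_code(prediction, embeddings):
    """Return the code whose embedding is most cosine-similar to `prediction`."""
    return max(embeddings, key=lambda code: cosine(prediction, embeddings[code]))


# Placeholder embeddings for illustration only (real ones would be 64-dimensional).
embeddings = {
    "0010": [0.9, 0.1, 0.0],
    "0011": [0.8, 0.2, 0.1],
    "1400": [0.0, 0.1, 0.9],
}

print(nearest_code([0.85, 0.12, 0.02], embeddings))  # 0010
```

A linear scan is fine for a sketch; with tens of thousands of codes, a vectorised or approximate nearest-neighbour index would be the more practical choice.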

Related Work

The Hackathon Team

  • Jeremy Fisher (Maintainer)
  • Alhusain Abdalla
  • Natasha Nehra
  • Tejas Patel
  • Hamrish Saravanakumar

Documentation

See the full documentation: https://icd-codex.readthedocs.io/en/latest/

Contributions

Contributions are always welcome!

History

0.4.9, 0.5.0 and 0.5.1 (2024-01-08)

  • Added 2024 ICD-10-CM
  • Fixed a bug where the node descriptions were being discarded
  • Regenerated the hierarchy files with the descriptions

0.4.8 (2023-05-02)

  • Use the newer scikit-learn PyPI location
  • Deprecate support for Python 3.6
  • Add 2023 ICD-10-CM

0.4.6 (2022-10-13)

  • Add 2022 ICD-10-CM (thank you @keithcallenberg!)

0.4.4 and 0.4.5 (2020-10-18)

  • Add the code descriptions for ICD-9
  • Add usage notes on how to reproduce the functionality of sirrice/icd9
  • Make the hierarchy directed to allow simpler and more intuitive traversal
  • Fix issue where edges were not being formed between "Diseases Of The Blood And Blood-Forming Organs" and "Congenital Anomalies" and their children

0.4.3 (2020-10-04)

  • Fix issue where the hierarchy JSON files were not being shipped with the PyPI distribution

0.4.2 (2020-10-03)

  • Add support for Python <= 3.8 in the hierarchy module by using the importlib.resources backport

0.4.1 (2020-09-11)

  • Update PyPI metadata

0.4.0 (2020-09-11)

  • ICD-10-CM (2019 to 2020) codes are now fully present (whereas the hackathon version missed certain codes)
  • Versions of the ICD-9 and ICD-10-CM hierarchies are now cached in the data module
  • Changed the hierarchy API: hierarchy.icd9hierarchy() is now hierarchy.icd9(). Ditto for ICD-10-CM.

0.3.0 (2020-09-05)

  • Refine the API so that it is consistent between the documentation and the implementation

0.1.0 (2020-09-04)

  • First release on PyPI, testing the waters during hackathon



Download files


Source Distribution

icdcodex-0.5.1.tar.gz (28.8 MB)

Built Distribution

icdcodex-0.5.1-py2.py3-none-any.whl (5.8 MB)

File details

Details for the file icdcodex-0.5.1.tar.gz.

File metadata

  • Download URL: icdcodex-0.5.1.tar.gz
  • Upload date:
  • Size: 28.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for icdcodex-0.5.1.tar.gz
Algorithm Hash digest
SHA256 64c6ef286f7daf60a4937a2f1e3678c6e8b92b977ac67b04445aa217dd02fe0a
MD5 80ec276922e2ab1365f75c2b4c3f3715
BLAKE2b-256 1ca0b47d3a7f855fe4875af28d06498166719fb71d155cd9144e9600257405fe
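These digests let you confirm that a downloaded archive has not been corrupted or tampered with. Below is a minimal sketch using Python's standard hashlib module, assuming the source distribution has been saved locally; the helper names `sha256_of` and `verify` are illustrative, and `EXPECTED` is the SHA256 digest listed above for icdcodex-0.5.1.tar.gz.

```python
import hashlib

# SHA256 digest published above for icdcodex-0.5.1.tar.gz
EXPECTED = "64c6ef286f7daf60a4937a2f1e3678c6e8b92b977ac67b04445aa217dd02fe0a"


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in chunks and return its hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Return True if the file's SHA-256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the archive is in the current directory:
# verify("icdcodex-0.5.1.tar.gz")
```

pip's hash-checking mode (`--hash=sha256:...` entries in a requirements file) performs the same check automatically at install time.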


File details

Details for the file icdcodex-0.5.1-py2.py3-none-any.whl.

File metadata

  • Download URL: icdcodex-0.5.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 5.8 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for icdcodex-0.5.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 f3d39d6a054db76d8fe65ff31f0b16209ed48f71181f58383729b49cbd14d02f
MD5 881a9e3fc516c60b1e52450ff73022a8
BLAKE2b-256 186fcf46612df8b2708b9b2ee814883c1abbc74bfc9d44b02adeeb0b415656a4

