graphical and continuous representations of ICD-9 and ICD-10 codes
Project description
icdcodex was the first prize winner in the Data Driven Healthcare Track of Johns Hopkins' MedHacks 2020.
This is experimental software, and a stable API is not expected until version 1.0.
Motivation
Thousands of Americans are misquoted on their health insurance every year because of ICD miscodes. ICD coding is manual and laborious, yet it is difficult to automate with machine learning because the output space is enormous. For example, ICD-10-CM (Clinical Modification) has over 70,000 codes, and the number is growing. Many label-embedding strategies address this problem.
icdcodex has two features that make ICD classification more amenable to modeling:
- Access to a networkx tree representation of the ICD-9 and ICD-10 hierarchies
- Vector embeddings of ICD codes (including pre-computed embeddings and an interface to create new embeddings)
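To illustrate why a tree representation helps, the sketch below computes the number of edges between two codes in a small hierarchy. It is a minimal, dependency-free illustration: the parent map is hand-written and simplified, the node labels (including "root") are assumptions, and the helper names are hypothetical. It does not use the networkx graph that icdcodex provides.

```python
# Toy parent map: child -> parent. Hand-written and simplified for
# illustration only; it is NOT the hierarchy shipped with icdcodex.
PARENT = {
    "001-139": "root",      # Infectious And Parasitic Diseases
    "001-009": "001-139",
    "001": "001-009",
    "001.0": "001",
    "001.1": "001",
    "002": "001-009",
    "002.0": "002",
    "140-239": "root",      # Neoplasms
    "140": "140-239",
    "140.0": "140",
}


def path_to_root(code):
    """Return the list of nodes from `code` up to and including the root."""
    path = [code]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Number of edges on the unique path between two nodes of the tree."""
    path_a = path_to_root(a)
    path_b = path_to_root(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    # Walk up from `a` until we reach the first ancestor shared with `b`.
    for steps_from_a, node in enumerate(path_a):
        if node in depth_in_b:
            return steps_from_a + depth_in_b[node]
    raise ValueError("nodes are not in the same tree")


sibling_distance = tree_distance("001.0", "001.1")  # same three-digit category
cousin_distance = tree_distance("001.0", "002.0")   # same sub-chapter
far_distance = tree_distance("001.0", "140.0")      # different chapters
```

Codes that share a close ancestor end up a short distance apart, which is the kind of structure an embedding of the hierarchy can try to preserve.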
Example Code
from icdcodex import icd2vec, hierarchy

# Fit a 64-dimensional embedding on the ICD-9 hierarchy
embedder = icd2vec.Icd2Vec(num_embedding_dimensions=64)
embedder.fit(*hierarchy.icd9())

X = get_patient_covariates()  # placeholder for your own feature-loading function
y = embedder.to_vec(["001.0"])  # Cholera due to vibrio cholerae
In this case, y is a 64-dimensional vector close to other Infectious And Parasitic Diseases codes.
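Once codes live in a vector space, a model can predict a vector and the prediction can be mapped back to the nearest code. The sketch below shows one possible way to do that with cosine similarity. It is an illustration under stated assumptions: the 3-dimensional vectors are made up, the helper names are hypothetical, and this is not necessarily how icdcodex itself decodes vectors.

```python
import math

# Made-up 3-dimensional vectors for illustration only; real embeddings
# would have more dimensions (64 in the example above).
TOY_EMBEDDINGS = {
    "001.0": [0.9, 0.1, 0.0],
    "001.1": [0.8, 0.2, 0.1],
    "140.0": [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_code(vector, embeddings):
    """Return the code whose embedding is most similar to `vector`."""
    return max(
        embeddings,
        key=lambda code: cosine_similarity(vector, embeddings[code]),
    )


# Pretend this vector came out of a regression model.
predicted = [0.85, 0.1, 0.05]
decoded = nearest_code(predicted, TOY_EMBEDDINGS)
```

Framing the task as regression onto a low-dimensional vector, followed by a nearest-neighbor lookup, avoids training a classifier with tens of thousands of output classes.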
Related Work
- node2vec: Paper, Website, Code, Alternate Code
- Learning Low-Dimensional Representations of Medical Concepts: Paper, Code
- Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Paper
The Hackathon Team
- Jeremy Fisher (Maintainer)
- Alhusain Abdalla
- Natasha Nehra
- Tejas Patel
- Hamrish Saravanakumar
Documentation
See the full documentation: https://icd-codex.readthedocs.io/en/latest/
Contributions
Contributions are always welcome!
History
0.4.1 (2020-09-11)
- Update PyPI metadata
0.4.0 (2020-09-11)
- ICD-10-CM (2019 to 2020) codes are now fully present (whereas the hackathon version missed certain codes)
- Versions of the ICD-9 and ICD-10-CM hierarchies are now cached to the data module
- Changed the hierarchy API: hierarchy.icd9hierarchy() is now hierarchy.icd9(). Ditto for ICD-10-CM.
0.3.0 (2020-09-05)
- Finesse API, now consistent between documentation and implementation
0.1.0 (2020-09-04)
- First release on PyPI, testing the waters during hackathon