
Concept annotation tool for Electronic Health Records


Medical Concept Annotation Tool


MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. Paper on arXiv.

Official Docs here

Discussion Forum discourse

Available Models

We have 4 public models available:

  1. UMLS Small: a modelpack containing a subset of UMLS (disorders, symptoms, medications, ...), trained on MIMIC-III.
  2. SNOMED International: the full SNOMED modelpack, trained on MIMIC-III.
  3. UMLS Dutch v1.10: a modelpack provided by UMC Utrecht containing UMLS entities with Dutch names, trained on Dutch medical Wikipedia articles, plus a negation detection model (repository/paper) trained on the EMC Dutch Clinical Corpus.
  4. UMLS Full: more than 4 million concepts, trained self-supervised on MIMIC-III, using v2022AA of UMLS.

To download any of these models, please follow this link and sign into your NIH profile / UMLS license. You will then be redirected to the MedCAT model download form. Please complete this form and you will be provided a download link.


Installation

To install the latest version of MedCAT run the following command:

pip install medcat

A standard installation of MedCAT installs the GPU-enabled build of torch and all relevant dependencies (such as CUDA). This can require as much as 10 GB of additional disk space, which is unnecessary for CPU-only usage.

To install the latest version of MedCAT without torch GPU support run the following command:

pip install medcat --extra-index-url https://download.pytorch.org/whl/cpu/
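
Once MedCAT is installed and a model pack has been downloaded, annotating a document might look roughly like the sketch below. It assumes the MedCAT 1.x interface (`CAT.load_model_pack` and `get_entities`) and assumes that the result is a dictionary whose `entities` values carry `cui`, `pretty_name`, `start` and `end` keys; check the official docs for the version you have installed. The helper names `annotate` and `summarize`, and the model pack path in the usage comment, are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def annotate(model_pack_path: str, text: str) -> Dict[str, Any]:
    """Load a model pack and annotate a single document."""
    # Imported lazily so that summarize() works without MedCAT installed.
    from medcat.cat import CAT

    cat = CAT.load_model_pack(model_pack_path)
    return cat.get_entities(text)


def summarize(result: Dict[str, Any]) -> List[Tuple[str, str, int, int]]:
    """Flatten a result into (cui, name, start, end) tuples ordered by start offset.

    Assumes each entity dict has 'cui', 'pretty_name', 'start' and 'end' keys.
    """
    rows = [
        (ent["cui"], ent["pretty_name"], ent["start"], ent["end"])
        for ent in result.get("entities", {}).values()
    ]
    return sorted(rows, key=lambda row: row[2])


# Hypothetical usage; the path is a placeholder for a downloaded model pack:
# rows = summarize(annotate("path/to/modelpack.zip", "Patient has a history of diabetes."))
```

Loading a model pack is the slow step, so in practice the loaded `CAT` object would usually be created once and reused across documents rather than reloaded per call as in this sketch.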

Demo

A demo application is available at MedCAT. Its model was trained on MIMIC-III and all of SNOMED-CT. Note: the link can take a long time to load on first use, because the machine spins up on demand and spins down when inactive.

Tutorials

A guide on how to use MedCAT is available at MedCAT Tutorials. Read more about MedCAT on Towards Data Science.

Logging

Since MedCAT is primarily a library, logging is effectively disabled by default. The user of a library should be able to choose what, where, and how to log the information it produces.

Users can therefore modify the logging behaviour of either the entire library or a chosen set of modules within it. We provide a convenience method, medcat.add_default_log_handlers, that adds default handlers logging to the console as well as to medcat.log.

Some details as to how one can configure the logging are described in the MedCAT Tutorials.
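
As a rough illustration of configuring logging by hand, the sketch below uses only Python's standard `logging` module. It assumes that MedCAT's loggers live under the `medcat` name and follow the usual module-path naming (for example `medcat.cat`); that naming and the helper `configure_medcat_logging` are assumptions made for this example, not part of MedCAT.

```python
import logging
import sys


def configure_medcat_logging(level: int = logging.INFO, stream=sys.stderr) -> logging.Logger:
    """Attach a console handler to the package-level 'medcat' logger."""
    logger = logging.getLogger("medcat")  # assumed top-level logger name
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Log INFO and above for the whole library...
configure_medcat_logging(logging.INFO)
# ...and more detail for one module (assumed logger name).
logging.getLogger("medcat.cat").setLevel(logging.DEBUG)
```

Because child loggers propagate to their parents, a single handler on `medcat` is enough to receive records from every module under it, while per-module levels control how much each one emits.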

Acknowledgements

Entity extraction was trained on MedMentions. In total, it has ~35K entities from UMLS.

The vocabulary was compiled from Wiktionary. In total, it has ~800K unique words.

Powered By

A big thank you goes to spaCy and Hugging Face, who made life a million times easier.

Citation

@ARTICLE{Kraljevic2021-ln,
  title="Multi-domain clinical natural language processing with {MedCAT}: The Medical Concept Annotation Toolkit",
  author="Kraljevic, Zeljko and Searle, Thomas and Shek, Anthony and Roguski, Lukasz and Noor, Kawsar and Bean, Daniel and Mascio, Aurelie and Zhu, Leilei and Folarin, Amos A and Roberts, Angus and Bendayan, Rebecca and Richardson, Mark P and Stewart, Robert and Shah, Anoop D and Wong, Wai Keong and Ibrahim, Zina and Teo, James T and Dobson, Richard J B",
  journal="Artif. Intell. Med.",
  volume=117,
  pages="102083",
  month=jul,
  year=2021,
  issn="0933-3657",
  doi="10.1016/j.artmed.2021.102083"
}


