Concept annotation tool for Electronic Health Records
Project description
Medical Concept Annotation Tool
MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. Paper on arXiv.
Official Docs here
Discussion Forum discourse
News
- New Downloader [15. March 2022]: You can now download the latest SNOMED-CT and UMLS model packs via UMLS user authentication.
- New Feature and Tutorial [7. December 2021]: Exploring Electronic Health Records with MedCAT and Neo4j
- New Minor Release [20. October 2021]: Introducing model packs, new faster multiprocessing for large datasets (100M+ documents) and improved MetaCAT.
- New Release [1. August 2021]: Upgraded MedCAT to use spaCy v3. New scispaCy models have to be downloaded; all old CDBs (compatible with MedCAT v1) will work without any changes.
- New Feature and Tutorial [8. July 2021]: Integrating 🤗 Transformers with MedCAT for biomedical NER+L
- General [1. April 2021]: MedCAT is upgraded to v1. Unfortunately, this introduces breaking changes with older models (MedCAT v0.4), as well as potential problems with all code that used the MedCAT package. MedCAT v0.4 is available on the legacy branch and will be supported with bug fixes until 1. July 2021; after that it will remain available but will no longer be updated.
- Paper: What’s in a Summary? Laying the Groundwork for Advances in Hospital-Course Summarization
Demo
A demo application is available at MedCAT. This was trained on MIMIC-III and all of SNOMED-CT.
Tutorials
A guide on how to use MedCAT is available at MedCAT Tutorials. Read more about MedCAT on Towards Data Science.
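As a quick orientation before the tutorials, here is a minimal sketch of annotating one document. It assumes the package is importable as `medcat`, that a model pack has already been downloaded, and that `CAT.load_model_pack` and `CAT.get_entities` behave as in MedCAT v1.x. The output shape used by `summarize` (an `entities` mapping whose values carry `cui`, `pretty_name`, `start` and `end`) is likewise an assumption; the helper names `annotate` and `summarize` are hypothetical and not part of the library.

```python
from typing import Any, Dict, List, Tuple


def annotate(model_pack_path: str, text: str) -> Dict[str, Any]:
    """Load a model pack and return MedCAT's raw annotation output."""
    # Imported lazily so that summarize() can be used without MedCAT installed.
    from medcat.cat import CAT

    cat = CAT.load_model_pack(model_pack_path)
    return cat.get_entities(text)


def summarize(output: Dict[str, Any]) -> List[Tuple[str, str, int, int]]:
    """Flatten the assumed output shape into (cui, name, start, end) rows,
    ordered by where each entity starts in the text."""
    rows = [
        (ent["cui"], ent["pretty_name"], ent["start"], ent["end"])
        for ent in output.get("entities", {}).values()
    ]
    return sorted(rows, key=lambda row: row[2])
```

To use it, pass the path of a downloaded model pack and a piece of clinical text to `annotate`, then pass the result to `summarize`. Each row pairs an ontology concept identifier with the character span in which it was found.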
Available Models
Available models here
Acknowledgements
Entity extraction was trained on MedMentions. In total it has ~35K entities from UMLS.
The vocabulary was compiled from Wiktionary. In total it has ~800K unique words.
Powered By
A big thank you goes to spaCy and Hugging Face, who made life a million times easier.
Citation
@ARTICLE{Kraljevic2021-ln,
title="Multi-domain clinical natural language processing with {MedCAT}: The Medical Concept Annotation Toolkit",
author="Kraljevic, Zeljko and Searle, Thomas and Shek, Anthony and Roguski, Lukasz and Noor, Kawsar and Bean, Daniel and Mascio, Aurelie and Zhu, Leilei and Folarin, Amos A and Roberts, Angus and Bendayan, Rebecca and Richardson, Mark P and Stewart, Robert and Shah, Anoop D and Wong, Wai Keong and Ibrahim, Zina and Teo, James T and Dobson, Richard J B",
journal="Artif. Intell. Med.",
volume=117,
pages="102083",
month=jul,
year=2021,
issn="0933-3657",
doi="10.1016/j.artmed.2021.102083"
}
Built Distributions
Hashes for zensols.medcat-1.3.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e68d8187590cc94256ff01b9c36124a2157730e83f736585e47cd7641a6cea36
MD5 | 5d22e4c52795b3e6415ad4a0f5e7659e
BLAKE2b-256 | 1e35eec323066deca3ad9a2be3d335ac7a0b77785ae3d77eee312344aae91453
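The digests above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the Python standard library; the function names `sha256_of` and `matches` are hypothetical, and the local file path is whatever location the wheel was saved to.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for zensols.medcat-1.3.0-py3-none-any.whl.
EXPECTED_SHA256 = "e68d8187590cc94256ff01b9c36124a2157730e83f736585e47cd7641a6cea36"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected one, ignoring letter case."""
    return sha256_of(path) == expected.lower()
```

Calling `matches(Path("zensols.medcat-1.3.0-py3-none-any.whl"))` on the saved file returns `True` only if its contents produce the listed SHA256 digest. Reading in chunks keeps memory use flat regardless of file size.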