Concept annotation tool for Electronic Health Records
Medical Concept Annotation Tool
A tool for extracting diseases, drugs, symptoms and other concepts from Electronic Health Records or any other free text, and linking them to biomedical ontologies (UMLS, SNOMED, ...). Paper on arXiv.
Demo
A demo application is available at MedCAT. Please note that this was trained on MedMentions and contains a very small portion of UMLS (<1%).
Tutorial
A guide on how to use MedCAT is available in the tutorial folder. Read more about MedCAT on Towards Data Science.
Install using PIP
- Install MedCAT
pip install --upgrade medcat
- Get the scispacy models:
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_md-0.2.4.tar.gz
- Download the Vocabulary and CDB from the Models section below
- Quickstart:
from medcat.cat import CAT
from medcat.utils.vocab import Vocab
from medcat.cdb import CDB
vocab = Vocab()
# Load the vocab model you downloaded
vocab.load_dict('<path to the vocab file>')
# Load the cdb model you downloaded
cdb = CDB()
cdb.load_dict('<path to the cdb file>')
# create cat
cat = CAT(cdb=cdb, vocab=vocab)
# Test it
text = "My simple document with kidney failure"
doc_spacy = cat(text)
# Print detected entities
print(doc_spacy.ents)
# Or to get a json
import json
doc_json = json.loads(cat.get_json(text))
print(doc_json)
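The parsed JSON can then be filtered like any other Python dictionary. The sketch below is only an illustration: it assumes that doc_json has an "entities" list whose items carry "cui", "source_value" and "acc" keys, which may differ between MedCAT versions, so compare it with the output of print(doc_json) first. The helper confident_entities and the sample dictionary (including its placeholder CUIs) are hypothetical and are not MedCAT output.

```python
def confident_entities(doc_json, min_acc=0.5):
    """Return (cui, source_value, acc) tuples for entities with acc >= min_acc.

    Assumes each entity is a dict with 'cui', 'source_value' and 'acc' keys.
    float() is applied because the score may be serialised as a string.
    """
    results = []
    for ent in doc_json.get("entities", []):
        acc = float(ent.get("acc", 0.0))
        if acc >= min_acc:
            results.append((ent.get("cui"), ent.get("source_value"), acc))
    return results


# Hand-written stand-in for doc_json; the CUIs are placeholders, not real UMLS codes.
sample = {
    "text": "My simple document with kidney failure",
    "entities": [
        {"cui": "C0000001", "source_value": "kidney failure", "acc": "0.9"},
        {"cui": "C0000002", "source_value": "document", "acc": "0.2"},
    ],
}

print(confident_entities(sample, min_acc=0.5))
```

With a real model loaded, the same call would be confident_entities(doc_json), with min_acc tuned to the data at hand.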
Models
A basic trained model is made public for the vocabulary and CDB. It is trained on the ~35K concepts available in MedMentions. It is quite limited, so the performance might not be the best.
Vocabulary Download - Built from MedMentions
CDB Download - Built from MedMentions
(Note: This was compiled from MedMentions and does not include any data from NLM, as that data is not publicly available.)
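Before passing the downloaded files to load_dict in the Quickstart, it can help to confirm that the paths actually point at files. This is a minimal sketch using only the standard library; the helper name missing_model_files and the example file names are hypothetical, so substitute the paths where you saved the downloads.

```python
from pathlib import Path


def missing_model_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


# Example file names only; replace with your own download locations.
missing = missing_model_files(["models/vocab.dat", "models/cdb.dat"])
if missing:
    print("Missing model files:", missing)
```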
Acknowledgement
Entity extraction was trained on MedMentions. In total it has ~35K entities from UMLS.
The vocabulary was compiled from Wiktionary. In total it has ~800K unique words.