Skip to main content

Concept annotation tool for Electronic Health Records

Project description

Medical oncept Annotation Tool

MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT and UMLS. Preprint research-gate.

Demo [currently not available]

A demo application is available at MedCAT. Please note that this was trained on MedMentions and contains a very small portion of UMLS (<1%).

Tutorial

A guide on how to use MedCAT is available in the tutorial folder. Read more about MedCAT on Towards Data Science.

Papers that use MedCAT

Related Projects

  • MedCATtrainer - an interface for building, improving and customising a given Named Entity Recognition and Linking (NER+L) model (MedCAT) for biomedical domain text.
  • MedCATservice - implements the MedCAT NLP application as a service behind a REST API.
  • iCAT - A docker container for CogStack/MedCAT/HuggingFace development in isolated environments.

Install using PIP (Requires Python 3.6.1+)

  1. Install MedCAT

pip install --upgrade medcat

  1. Get the scispacy models:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_md-0.2.4.tar.gz

  1. Downlad the Vocabulary and CDB from the Models section bellow

  2. Quickstart:

from medcat.cat import CAT
from medcat.utils.vocab import Vocab
from medcat.cdb import CDB 

vocab = Vocab()
# Load the vocab model you downloaded
vocab.load_dict('<path to the vocab file>')

# Load the cdb model you downloaded
cdb = CDB()
cdb.load_dict('<path to the cdb file>') 

# create cat
cat = CAT(cdb=cdb, vocab=vocab)

# Test it
text = "My simple document with kidney failure"
doc_spacy = cat(text)
# Print detected entities
print(doc_spacy.ents)

# Or to get an array of entities, this will return much more information
#and usually easier to use unless you know a lot about spaCy
doc = cat.get_entities(text)
print(doc)

Models

A basic trained model is made public for the vocabulary and CDB. It is trained for the ~ 35K concepts available in MedMentions. It is quite limited so the performance might not be the best.

Vocabulary Download - Built from MedMentions

CDB Download - Built from MedMentions

(Note: This is was compiled from MedMentions and does not have any data from NLM as that data is not publicaly available.)

SNOMED-CT and UMLS

If you have access to UMLS or SNOMED-CT and can provide some proof (a screenshot of the UMLS profile page is perfect, feel free to redact all information you do not want to share), contact us - we are happy to share the pre-built CDB and Vocab for those databases.

Acknowledgement

Entity extraction was trained on MedMentions In total it has ~ 35K entites from UMLS

The vocabulary was compiled from Wiktionary In total ~ 800K unique words

Powered By

A big thank you goes to spaCy and Hugging Face - who made life a million times easier.

Citation

@misc{kraljevic2019medcat,
    title={MedCAT -- Medical Concept Annotation Tool},
    author={Zeljko Kraljevic and Daniel Bean and Aurelie Mascio and Lukasz Roguski and Amos Folarin and Angus Roberts and Rebecca Bendayan and Richard Dobson},
    year={2019},
    eprint={1912.10166},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

medcat-0.3.9.9.2.tar.gz (60.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

medcat-0.3.9.9.2-py3-none-any.whl (70.4 kB view details)

Uploaded Python 3

File details

Details for the file medcat-0.3.9.9.2.tar.gz.

File metadata

  • Download URL: medcat-0.3.9.9.2.tar.gz
  • Upload date:
  • Size: 60.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.0

File hashes

Hashes for medcat-0.3.9.9.2.tar.gz
Algorithm Hash digest
SHA256 737c6b4521b645de191f8ee721dded123f713e10ba2bd940836cad54670b9812
MD5 24efcbdae7c2db89ffac3e9011431884
BLAKE2b-256 54b9db57f398e758f74c16b86eb93678c0e5b47177dd955589a43170b7773759

See more details on using hashes here.

File details

Details for the file medcat-0.3.9.9.2-py3-none-any.whl.

File metadata

  • Download URL: medcat-0.3.9.9.2-py3-none-any.whl
  • Upload date:
  • Size: 70.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.0

File hashes

Hashes for medcat-0.3.9.9.2-py3-none-any.whl
Algorithm Hash digest
SHA256 db5824c2a869a0b7a789e6ac02a6f0354d38c088a052263555d85b7918248c49
MD5 9fbcf0ab5e1e592c74bac82ef47fc68f
BLAKE2b-256 75708781a9c5968f0d0869d380e93bd4a8e5c0e9006ede858926f0a6066d7fef

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page