
Concept annotation tool for Electronic Health Records

Project description

Medical Concept Annotation Tool

A tool for extracting diseases, drugs, symptoms, etc. from Electronic Health Records or any other free text and linking them to concept databases (UMLS, SNOMED, ...). Paper on arXiv.

Demo

A demo application is available at MedCAT. Please note that the demo model was trained on MedMentions and covers only a very small portion of UMLS (<1%).

Tutorial

A guide on how to use MedCAT is available in the tutorial folder. Read more about MedCAT on Towards Data Science.

Papers

Treatment with ACE-inhibitors is not associated with early severe SARS-Covid-19 infection in a multi-site UK acute Hospital Trust

Install using PIP

  1. Install MedCAT:

pip install --upgrade medcat

  2. Get the scispacy model:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_md-0.2.4.tar.gz

  3. Download the vocabulary and CDB from the Models section below.

  4. Quickstart:

import json

from medcat.cat import CAT
from medcat.utils.vocab import Vocab
from medcat.cdb import CDB

# Load the vocabulary model you downloaded
vocab = Vocab()
vocab.load_dict('<path to the vocab file>')

# Load the CDB (concept database) model you downloaded
cdb = CDB()
cdb.load_dict('<path to the cdb file>')

# Create the CAT pipeline from the CDB and the vocabulary
cat = CAT(cdb=cdb, vocab=vocab)

# Annotate a test document; the result is a spaCy document
text = "My simple document with kidney failure"
doc_spacy = cat(text)

# Print the detected entities
print(doc_spacy.ents)

# Alternatively, get the annotations as JSON and parse them into a dict
doc_json = json.loads(cat.get_json(text))
print(doc_json)
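The entities printed by the quickstart come from a spaCy document, so, assuming they are ordinary spaCy `Span` objects, each one exposes the standard `text`, `start_char` and `end_char` attributes that locate the match in the input string. The sketch below collects these into tuples. The helper name `entity_offsets` is hypothetical and not part of MedCAT, and the example uses a stand-in span type so it runs without MedCAT installed.

```python
from collections import namedtuple
from typing import Iterable, List, Tuple


def entity_offsets(ents: Iterable) -> List[Tuple[str, int, int]]:
    """Return (text, start_char, end_char) for each span-like entity.

    Hypothetical helper: assumes each item behaves like a spaCy Span,
    i.e. exposes .text, .start_char and .end_char.
    """
    return [(ent.text, ent.start_char, ent.end_char) for ent in ents]


if __name__ == "__main__":
    # Stand-in for spaCy spans so the sketch runs without MedCAT or spaCy.
    FakeSpan = namedtuple("FakeSpan", ["text", "start_char", "end_char"])

    text = "My simple document with kidney failure"
    mention = "kidney failure"
    start = text.index(mention)
    fake_ents = [FakeSpan(mention, start, start + len(mention))]

    for ent_text, s, e in entity_offsets(fake_ents):
        # Character offsets slice back into the original string.
        assert text[s:e] == ent_text
        print(ent_text, s, e)
```

With MedCAT installed, the same helper would be called on the quickstart result as `entity_offsets(doc_spacy.ents)`.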

Models

A basic trained model for the vocabulary and CDB is publicly available. It was trained on the ~35K concepts available in MedMentions. Because its coverage is limited to those concepts, its performance may not be the best.

Vocabulary Download - Built from MedMentions

CDB Download - Built from MedMentions

(Note: This model was compiled from MedMentions and does not contain any data from the NLM, as that data is not publicly available.)

Acknowledgement

Entity extraction was trained on MedMentions, which contains ~35K entities from UMLS in total.

The vocabulary was compiled from Wiktionary and contains ~800K unique words in total.


Download files


Source Distribution

medcat-0.3.2.8.tar.gz (45.2 kB)

Uploaded Source

Built Distribution

medcat-0.3.2.8-py3-none-any.whl (56.0 kB)

Uploaded Python 3

File details

Details for the file medcat-0.3.2.8.tar.gz.

File metadata

  • Download URL: medcat-0.3.2.8.tar.gz
  • Size: 45.2 kB
  • Tags: Source
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.0

File hashes

Hashes for medcat-0.3.2.8.tar.gz
Algorithm Hash digest
SHA256 0edbb025e9834e58d45c8fb34b55a056d057ad3a345eb0ec5e3bab635d0d1a99
MD5 b731b4860edfe48543d4b39716cdfab8
BLAKE2b-256 03a1c76908c7c86ada664ac6177198b8834a27452a4d459f3ee27c1b4deeb56c


File details

Details for the file medcat-0.3.2.8-py3-none-any.whl.

File metadata

  • Download URL: medcat-0.3.2.8-py3-none-any.whl
  • Size: 56.0 kB
  • Tags: Python 3
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.0

File hashes

Hashes for medcat-0.3.2.8-py3-none-any.whl
Algorithm Hash digest
SHA256 739b1ac5bc3df81570a14c7392851a21f6d8454bfb0688f2e513525f84ce80d5
MD5 66774869806104c1822079760e875e17
BLAKE2b-256 9bef3a035fe879e314ec41cbbcaff7d9219cb627e5498621fc84eb56635b9939
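To check that a downloaded file matches the SHA256 digest published in the hash tables above, the file can be hashed locally with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of_file` and `verify_sha256` are illustrative, and it assumes the release files were downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until EOF so large files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 digests as published in the File hashes tables for release 0.3.2.8.
    expected = {
        "medcat-0.3.2.8.tar.gz":
            "0edbb025e9834e58d45c8fb34b55a056d057ad3a345eb0ec5e3bab635d0d1a99",
        "medcat-0.3.2.8-py3-none-any.whl":
            "739b1ac5bc3df81570a14c7392851a21f6d8454bfb0688f2e513525f84ce80d5",
    }
    for name, digest in expected.items():
        p = Path(name)
        if p.exists():
            print(name, "OK" if verify_sha256(p, digest) else "MISMATCH")
        else:
            print(name, "not found in current directory")
```

pip can also enforce digests itself: listing `medcat==0.3.2.8 --hash=sha256:<digest>` in a requirements file and installing with `pip install --require-hashes -r requirements.txt` makes the install fail on a mismatch. In that mode every requirement, including dependencies, must be pinned with a hash.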

