Medical Concept Annotation Toolkit (v2)

Medical Concept Annotation Tool (version 2)

MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT, UMLS, or HPO (and potentially other ontologies). Original paper for v1 on arXiv.

Why MedCAT v2?

MedCAT v2 is a comprehensive refactor designed to improve modularity, flexibility, and maintainability. The core library is now lightweight, with optional extras (spaCy tokenization, MetaCAT, DeID, RelCAT) available as separate installable features—allowing you to install only what you need. This modular approach reduces dependencies, enables smaller installs, and provides better separation of concerns. Additionally, v2 reduces internal coupling with spaCy, allowing for alternative tokenizers and greater extensibility. The new architecture makes it easier to create custom components and addons, while improving code maintainability and preparing the foundation for future enhancements. For most users, single-threaded inference APIs remain unchanged, ensuring a smooth transition.

There are a number of breaking changes in MedCAT v2 compared to v1. When moving from v1 to v2, please refer to the migration guide. Details on the breaking changes are outlined here.

Official Docs here

Discussion Forum discourse

Available Models

We have 2 public v2 models available:

  1. SnomedCT UK Clinical edition 39.0 (Oct 2024) and UK Drug Extension 39.0 (July 2024) based model enriched with UMLS 2024AA; trained only on MIMIC-IV
  2. SnomedCT UK Clinical edition 40.2 (June 2025) and UK Drug Extension 40.3 (July 2024) based model enriched with UMLS 2024AA; trained only on MIMIC-IV

There are also a number of MedCAT v1 models available that can automatically be converted if required.

To download any of these models, please follow this link (or this link for API key based download) and sign into your NIH profile / UMLS license. You will then be redirected to the MedCAT model download form. Please complete this form and you will be provided a download link.

While we encourage you to use MedCAT v2 and models in its native format, MedCAT v2 can also load an older (v1) model and convert it on the fly. However, loading takes considerably longer in those cases.

If you wish, you can also convert v1 models into the v2 format ahead of time (see tutorial).

from medcat.utils.legacy import legacy_converter
from medcat.storage.serialisers import AvailableSerialisers

# Path to the existing v1 model and the directory for the converted v2 model
old_model_path = '<path to old v1 model>'
new_model_dir = '<dir to place new model in>'
legacy_converter.do_conversion(old_model_path, new_model_dir, AvailableSerialisers.dill)

OR, from a Jupyter/IPython notebook cell:

model_path = "models/medcat1_model_pack.zip"
new_model_folder = "models"  # file in this folder
! python -m medcat.utils.legacy.legacy_converter $model_path $new_model_folder --verbose
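Once a model pack has been downloaded or converted, running it over text might look like the sketch below. It assumes that `CAT.load_model_pack` and `CAT.get_entities` keep their v1 shape in v2 (in line with the note above that single-threaded inference APIs remain unchanged) and that the result is a dictionary with an `entities` mapping whose values carry `cui`, `pretty_name`, `start` and `end`. The helper names `annotate` and `summarise` are illustrative only; check the official docs for the authoritative output format.

```python
from typing import Any


def annotate(model_pack_path: str, text: str) -> dict[str, Any]:
    """Load a model pack and return the raw entity dictionary for `text`."""
    # Imported lazily so that `summarise` below works without MedCAT installed.
    from medcat.cat import CAT

    cat = CAT.load_model_pack(model_pack_path)
    return cat.get_entities(text)


def summarise(result: dict[str, Any]) -> list[tuple[str, str, int, int]]:
    """Flatten the assumed output shape into (cui, name, start, end) tuples."""
    rows = []
    for ent in result.get("entities", {}).values():
        rows.append((ent["cui"], ent.get("pretty_name", ""), ent["start"], ent["end"]))
    # Order by position in the text so the summary reads left to right.
    return sorted(rows, key=lambda row: row[2])
```

Typical use would be `summarise(annotate("<path to v2 model pack>", "Patient has diabetes."))`, which yields one tuple per linked concept, ordered by where it occurs in the text.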

News

  • New public 2024 and 2025 Snomed models were uploaded and made available 7. October 2025.
  • MedCAT 2.0.0 was released 18. August 2025.

Installation

MedCAT v2 has had its first full release. Install it with:

pip install medcat

Note that this installs only the core MedCAT v2. It does not include the dependencies needed for spaCy-based tokenizing, MetaCAT, or DeID. However, all of those are supported as well. You can install them as follows:

pip install "medcat[spacy]" # for spacy-based tokenizer
pip install "medcat[meta-cat]"  # for MetaCAT
pip install "medcat[deid]"  # for DeID models
pip install "medcat[spacy,meta-cat,deid,rel-cat,dict-ner]"  # for all of the above plus the rel-cat and dict-ner extras
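Because the extras are optional, it can be handy to check which of them are usable in the current environment before building a pipeline. The sketch below does this with the standard library only. The mapping from extra name to a top-level module is an assumption made for illustration (the extras' real dependency lists live in the package metadata), and `available_extras` is a hypothetical helper, not part of MedCAT.

```python
from importlib.util import find_spec

# ASSUMPTION: one representative top-level module per extra. Adjust to match
# the dependencies that the extras actually declare.
ASSUMED_EXTRA_MODULES = {
    "spacy": "spacy",
    "meta-cat": "torch",
    "deid": "transformers",
}


def available_extras(mapping: dict[str, str] = ASSUMED_EXTRA_MODULES) -> dict[str, bool]:
    """Report, per extra, whether its representative module can be imported."""
    # find_spec returns None for a missing top-level module without importing it.
    return {extra: find_spec(module) is not None for extra, module in mapping.items()}


if __name__ == "__main__":
    for extra, present in available_extras().items():
        print(f"{extra}: {'installed' if present else 'missing'}")
```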

Installing plugins

MedCAT v2 supports external plugins that can provide new components (e.g. alternative NER models, addons, tokenizers) via Python entry points.

  • Curated plugins: The medcat.plugins.catalog module ships with a curated plugin catalog that can be updated from a remote JSON file.
  • Installer: The medcat.plugins.installer.PluginInstallationManager wraps a pip-based installer and knows how to resolve a compatible plugin version for your current MedCAT version.
  • CLI: You can install curated plugins directly from the command line:
python -m medcat plugins install medcat-gliner

This will:

  • look up medcat-gliner in the curated catalog,
  • resolve a version compatible with your installed MedCAT,
  • and install it using pip.
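The three steps above can be pictured with a small, self-contained sketch. It is not MedCAT's installer: the catalog layout, the `example-plugin` entry and its version ranges, and the helpers `resolve` and `pip_command` are all hypothetical, and versions are assumed to be plain `X.Y.Z` strings.

```python
import sys

# HYPOTHETICAL catalog: plugin name -> releases with the MedCAT range each supports.
CATALOG = {
    "example-plugin": [
        {"version": "0.1.0", "min_medcat": "2.0.0", "max_medcat": "2.4.99"},
        {"version": "0.2.0", "min_medcat": "2.5.0", "max_medcat": "2.99.99"},
    ],
}


def parse(version: str) -> tuple[int, ...]:
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(plugin: str, medcat_version: str, catalog: dict = CATALOG) -> str:
    """Look the plugin up and pick its newest release compatible with MedCAT."""
    releases = catalog.get(plugin)
    if releases is None:
        raise KeyError(f"{plugin!r} is not in the catalog")
    current = parse(medcat_version)
    compatible = [
        r for r in releases
        if parse(r["min_medcat"]) <= current <= parse(r["max_medcat"])
    ]
    if not compatible:
        raise LookupError(f"no release of {plugin!r} supports MedCAT {medcat_version}")
    return max(compatible, key=lambda r: parse(r["version"]))["version"]


def pip_command(plugin: str, version: str) -> list[str]:
    """Build the pip invocation; printing it instead of running it is a dry run."""
    return [sys.executable, "-m", "pip", "install", f"{plugin}=={version}"]


if __name__ == "__main__":
    chosen = resolve("example-plugin", "2.8.6")
    print(" ".join(pip_command("example-plugin", chosen)))
```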

You can also:

  • pass --dry-run to show what would be installed without making changes:

    python -m medcat plugins install --dry-run medcat-gliner
    
  • override the version/ref explicitly (e.g. when testing a branch or tag):

    python -m medcat plugins install medcat-gliner --force-version main
    

If a plugin requires authentication (for example, private Git repositories), MedCAT will log a warning and the installer will surface pip’s error messages if credentials are missing or incorrect.

Version / update checking

MedCAT can now check for newer versions of itself on PyPI (or a local mirror of it), so that users do not fall too far behind with older versions of the software. The check is configurable through environment variables so that system administrators (e.g. for JupyterHub) can choose the settings they want. Version checks are done once a week and the results are cached.

Below is a table of the environment variables that govern the version checking and their defaults.

| Variable | Default | Description |
| --- | --- | --- |
| MEDCAT_DISABLE_VERSION_CHECK | (unset) | When set to true, yes or disable, disables the version update check entirely. Useful for CI environments, offline setups, or deployments where external network access is restricted. |
| MEDCAT_PYPI_URL | https://pypi.org/pypi | Base URL used to query package metadata. Can be changed to a PyPI mirror or internal repository that exposes the /pypi/{pkg}/json API. |
| MEDCAT_MINOR_UPDATE_THRESHOLD | 3 | Number of newer minor versions (e.g. 1.4.x, 1.5.x) that must exist before MedCAT emits a “newer version available” log message. |
| MEDCAT_PATCH_UPDATE_THRESHOLD | 3 | Number of newer patch versions (e.g. 1.3.1, 1.3.2, 1.3.3) on the same minor line required before emitting an informational update message. |
| MEDCAT_VERSION_UPDATE_LOG_LEVEL | INFO | Logging level used when reporting available newer versions (minor/patch thresholds). Accepts any valid logging level string (DEBUG, INFO, WARNING, ERROR, CRITICAL). |
| MEDCAT_VERSION_UPDATE_YANKED_LOG_LEVEL | WARNING | Logging level used when reporting that the current version has been yanked on PyPI. Accepts the same values as above. |
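To make the interplay of these settings concrete, here is a rough sketch of how they could be interpreted. It is one reading of the descriptions above, not MedCAT's actual code: the function names are hypothetical, the disable values are matched case-insensitively, a threshold is treated as "at least this many", only versions within the same major line are counted, and versions are assumed to be plain `X.Y.Z` strings.

```python
import os
from collections.abc import Mapping


def version_check_disabled(env: Mapping[str, str] = os.environ) -> bool:
    """True when the documented opt-out variable holds one of the disable values."""
    value = env.get("MEDCAT_DISABLE_VERSION_CHECK", "")
    return value.strip().lower() in {"true", "yes", "disable"}


def metadata_url(package: str = "medcat", env: Mapping[str, str] = os.environ) -> str:
    """Build the JSON metadata URL from the configured base URL."""
    base = env.get("MEDCAT_PYPI_URL", "https://pypi.org/pypi").rstrip("/")
    return f"{base}/{package}/json"


def parse(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def update_messages(current: str, released: list[str],
                    env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the update notices that the thresholds would trigger."""
    minor_threshold = int(env.get("MEDCAT_MINOR_UPDATE_THRESHOLD", "3"))
    patch_threshold = int(env.get("MEDCAT_PATCH_UPDATE_THRESHOLD", "3"))
    cur = parse(current)
    versions = {parse(v) for v in released}
    # Distinct newer minor lines within the same major version.
    newer_minors = {v[:2] for v in versions if v[0] == cur[0] and v[1] > cur[1]}
    # Newer patch releases on the current minor line.
    newer_patches = {v for v in versions if v[:2] == cur[:2] and v > cur}
    messages = []
    if len(newer_minors) >= minor_threshold:
        messages.append(f"{len(newer_minors)} newer minor versions available")
    if len(newer_patches) >= patch_threshold:
        messages.append(f"{len(newer_patches)} newer patch versions available")
    return messages
```

With the defaults, a user on 1.3.0 would see a patch notice once 1.3.1, 1.3.2 and 1.3.3 exist, and a minor notice once three newer minor lines exist. Setting `MEDCAT_DISABLE_VERSION_CHECK=true` in the environment before MedCAT starts skips the check entirely.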

Demo

The MedCAT v2 demo web app is available here.

Key Concepts

  • Components: The building blocks of MedCAT (NER, Entity Linking, preprocessing, etc.)
  • Addons: Components that extend the core NER+EL pipeline with additional processing stages
  • Plugins: External packages that provide new component implementations or other functionality via entry points

See Architecture Documentation for detailed information.

Tutorials

A guide on how to use MedCAT v2 is available on the MedCAT documentation page at docs.cogstack.org.

Acknowledgements

Entity extraction was trained on MedMentions. In total it has ~35K entities from UMLS.

The vocabulary was compiled from Wiktionary. In total it has ~800K unique words.

Powered By

A big thank you goes to spaCy and Hugging Face - who made life a million times easier.
