OpenMed delivers state-of-the-art biomedical and clinical LLMs that rival proprietary enterprise stacks, unifying model discovery, advanced extractions, and one-line orchestration.

OpenMed

Production-ready medical NLP toolkit powered by state-of-the-art transformers

Transform clinical text into structured insights with a single line of code. OpenMed delivers enterprise-grade entity extraction, assertion detection, and medical reasoning—no vendor lock-in, no compromise on accuracy.

License: Apache-2.0 | Python 3.10+ | arXiv:2508.01630

from openmed import analyze_text

result = analyze_text(
    "Patient started on imatinib for chronic myeloid leukemia.",
    model_name="disease_detection_superclinical"
)

for entity in result.entities:
    print(f"{entity.label:<12} {entity.text:<35} {entity.confidence:.2f}")
# DISEASE      chronic myeloid leukemia            0.98
# DRUG         imatinib                            0.95

✨ Why OpenMed?

  • Specialized Models: 12+ curated medical NER models outperforming proprietary solutions
  • HIPAA-Compliant PII Detection: Smart de-identification with all 18 Safe Harbor identifiers
  • One-Line Deployment: From prototype to production in minutes
  • Dockerized REST API: FastAPI endpoints for service deployments
  • Batch Processing: Multi-file workflows with progress tracking
  • Production-Ready: Configuration profiles, profiling tools, and medical-aware tokenization
  • Zero Lock-In: Apache 2.0 licensed, runs on your infrastructure

Quick Start

Installation

# From a local checkout of this repo:
# Install with Hugging Face support
uv pip install -e ".[hf]"

# Or include REST service dependencies
uv pip install -e ".[hf,service]"

Apple Silicon acceleration in Python:

uv pip install -e ".[mlx]"

Swift apps on macOS and iOS use OpenMedKit. In 1.2.0, that means:

  • MLX on Apple Silicon macOS and real iPhone/iPad hardware for supported OpenMed PII, OpenAI Privacy Filter, and experimental GLiNER-family artifacts
  • CoreML when you already have a bundled Apple model package or want the fallback Apple path

Add the Swift package like this:

dependencies: [
    .package(url: "https://github.com/maziyarpanahi/openmed.git", from: "1.2.0"),
]

OpenMedKit is public and now supports native MLX runtime paths for PII token classification, Privacy Filter, and experimental GLiNER-family zero-shot tasks. The broader OpenMed model-packaging flow is still being hardened across the full collection, so treat conversion as active work rather than a fully universal public release surface.

For published releases, the editable install examples above can be replaced with plain uv pip install "openmed[...]".

Three Ways to Use OpenMed

1️⃣ Python API — One-liner for scripts and notebooks

from openmed import analyze_text

result = analyze_text(
    "Patient received 75mg clopidogrel for NSTEMI.",
    model_name="pharma_detection_superclinical"
)

2️⃣ REST API Service — FastAPI endpoints for app backends

uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080

3️⃣ Batch Processing — Programmatic multi-document workflows

from openmed import BatchProcessor

processor = BatchProcessor(
    model_name="disease_detection_superclinical",
    confidence_threshold=0.55,
    group_entities=True,
)

result = processor.process_texts([
    "Patient started metformin for type 2 diabetes.",
    "Imatinib started for chronic myeloid leukemia.",
])
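The confidence_threshold argument above suggests that low-scoring entities are dropped before results are returned. A minimal sketch of that idea follows, using plain dictionaries rather than OpenMed objects. The helper name filter_by_confidence, the inclusive comparison, and the confidence values are assumptions made for illustration and are not taken from the library.

```python
def filter_by_confidence(entities: list[dict], threshold: float) -> list[dict]:
    """Keep entities whose confidence is at or above the threshold."""
    # Assumption: the cut-off is inclusive; the library may behave differently.
    return [entity for entity in entities if entity["confidence"] >= threshold]


# Made-up scores, for illustration only.
entities = [
    {"label": "DISEASE", "text": "type 2 diabetes", "confidence": 0.97},
    {"label": "DRUG", "text": "metformin", "confidence": 0.52},
]

kept = filter_by_confidence(entities, threshold=0.55)
print([entity["text"] for entity in kept])
```

Raising the threshold trades recall for precision, so a value such as 0.55 is best tuned against a labeled sample of your own notes.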

Key Features

Core Capabilities

  • Curated Model Registry: Metadata-rich catalog with 12+ specialized medical NER models
  • PII Detection & De-identification: HIPAA-compliant de-identification with smart entity merging
  • Medical-Aware Tokenization: Clean handling of clinical patterns (COVID-19, CAR-T, IL-6)
  • Advanced NER Processing: Confidence filtering, entity grouping, and span alignment
  • Multiple Output Formats: Dict, JSON, HTML, CSV for any downstream system
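To illustrate what medical-aware tokenization is guarding against, here is a small standard-library sketch that keeps hyphenated clinical terms such as COVID-19, CAR-T, and IL-6 as single tokens. The regular expression and the tokenize helper are hypothetical and are not OpenMed's tokenizer; they only show the kind of pattern handling the feature refers to.

```python
import re

# Hypothetical pattern, not OpenMed's implementation: a token is either a run
# of ASCII alphanumerics joined by internal hyphens, or a single symbol.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|[^\sA-Za-z0-9]")


def tokenize(text: str) -> list[str]:
    """Split text into tokens, keeping hyphenated clinical terms whole."""
    return TOKEN_RE.findall(text)


tokens = tokenize("COVID-19 patient on CAR-T therapy; IL-6 elevated.")
print(tokens)
# ['COVID-19', 'patient', 'on', 'CAR-T', 'therapy', ';', 'IL-6', 'elevated', '.']
```

A naive split on punctuation would break COVID-19 into three pieces, which is the kind of fragmentation that makes entity spans harder to align downstream.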

Production Tools (v1.2.0)

  • Batch Processing: Multi-text and multi-file workflows with progress tracking
  • Configuration Profiles: dev/prod/test/fast presets with flexible overrides
  • Performance Profiling: Built-in inference timing and bottleneck analysis
  • Dockerized REST API: GET /health, POST /analyze, POST /pii/extract, POST /pii/deidentify
  • Service Reliability Hardening: request validation, shared pipeline preload, and timeout/error envelopes

Documentation

Comprehensive guides available at openmed.life/docs

REST API

OpenMed includes a Docker-friendly FastAPI service with reliability hardening:

  • GET /health
  • POST /analyze
  • POST /pii/extract
  • POST /pii/deidentify

Run locally

uv pip install -e ".[hf,service]"
uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080

Optional shared model warm-up:

OPENMED_SERVICE_PRELOAD_MODELS=disease_detection_superclinical,OpenMed/OpenMed-PII-SuperClinical-Small-44M-v1 \
uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080
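The warm-up variable above holds a comma-separated list that mixes a registry name with a full Hugging Face model ID. As a rough sketch of how such a value can be split into individual model names, assuming simple comma separation with optional whitespace (the service's own parsing is not shown here and may differ), with parse_preload_models as a hypothetical helper:

```python
from typing import Optional


def parse_preload_models(raw: Optional[str]) -> list[str]:
    """Split a comma-separated model list, ignoring blanks and whitespace."""
    if not raw:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]


raw = "disease_detection_superclinical,OpenMed/OpenMed-PII-SuperClinical-Small-44M-v1"
models = parse_preload_models(raw)
print(models)
```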

Run with Docker

docker build -t openmed:1.2.0 .
docker run --rm -p 8080:8080 -e OPENMED_PROFILE=prod openmed:1.2.0

Example request

curl -X POST http://127.0.0.1:8080/pii/extract \
  -H "Content-Type: application/json" \
  -d '{"text":"Paciente: Maria Garcia, DNI: 12345678Z","lang":"es"}'

See the full service guide at REST Service docs.
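The same request can be issued from Python with only the standard library. The sketch below builds the POST request shown in the curl example; build_pii_extract_request is a hypothetical helper, and the response schema is not assumed, so the sending step is left as a comment.

```python
import json
import urllib.request


def build_pii_extract_request(base_url: str, text: str, lang: str) -> urllib.request.Request:
    """Build a POST /pii/extract request with a JSON body."""
    payload = json.dumps({"text": text, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/pii/extract",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_pii_extract_request(
    "http://127.0.0.1:8080",
    "Paciente: Maria Garcia, DNI: 12345678Z",
    "es",
)

# To send it against a running service:
# with urllib.request.urlopen(request, timeout=30) as response:
#     body = json.loads(response.read())
```

Building the request separately from sending it makes the payload easy to inspect in tests without a live service.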

Non-2xx responses now use a unified envelope:

{
  "error": {
    "code": "validation_error",
    "message": "Request validation failed",
    "details": [
      {
        "field": "body.text",
        "message": "Text must not be blank",
        "type": "value_error"
      }
    ]
  }
}
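A client can flatten this envelope into a single log line. The following is a minimal sketch based only on the fields shown above; summarize_error is a hypothetical helper, and treating details as optional is an assumption.

```python
import json


def summarize_error(body: str) -> str:
    """Turn an error envelope into a one-line message."""
    error = json.loads(body)["error"]
    parts = [f"{error['code']}: {error['message']}"]
    # Assumption: "details" may be missing for non-validation errors.
    for detail in error.get("details", []):
        parts.append(f"{detail['field']} - {detail['message']}")
    return "; ".join(parts)


envelope = """
{
  "error": {
    "code": "validation_error",
    "message": "Request validation failed",
    "details": [
      {"field": "body.text", "message": "Text must not be blank", "type": "value_error"}
    ]
  }
}
"""

summary = summarize_error(envelope)
print(summary)
# validation_error: Request validation failed; body.text - Text must not be blank
```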

Models

OpenMed includes a curated registry of 12+ specialized medical NER models:

Model                           | Specialization          | Entity Types                           | Size
disease_detection_superclinical | Disease & Conditions    | DISEASE, CONDITION, DIAGNOSIS          | 434M
pharma_detection_superclinical  | Drugs & Medications     | DRUG, MEDICATION, TREATMENT            | 434M
pii_detection_superclinical     | PII & De-identification | NAME, DATE, SSN, PHONE, EMAIL, ADDRESS | 434M
anatomy_detection_electramed    | Anatomy & Body Parts    | ANATOMY, ORGAN, BODY_PART              | 109M
gene_detection_genecorpus       | Genes & Proteins        | GENE, PROTEIN                          | 109M

📖 Full Model Catalog


Advanced Usage

PII Detection & De-identification (v0.5.0)

from openmed import extract_pii, deidentify

# Sample note reused by the calls below
text = "Patient: John Doe, DOB: 01/15/1970, SSN: 123-45-6789"

# Extract PII entities with smart merging (default)
result = extract_pii(
    text,
    model_name="pii_detection_superclinical",
    use_smart_merging=True  # Prevents entity fragmentation
)

# De-identify with multiple methods
masked = deidentify(text, method="mask")        # [NAME], [DATE]
removed = deidentify(text, method="remove")     # Complete removal
replaced = deidentify(text, method="replace")   # Synthetic data
hashed = deidentify(text, method="hash")        # Cryptographic hashing
shifted = deidentify(text, method="shift_dates", date_shift_days=180)

Smart Entity Merging (NEW in v0.5.0): Fixes tokenization fragmentation by merging split entities like dates (01/15/1970 instead of 01 + /15/1970), ensuring production-ready de-identification.
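The merging idea can be pictured as joining neighbouring spans that share a label. The sketch below is an illustration under that assumption and is not OpenMed's algorithm; Span, merge_adjacent, and max_gap are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Span:
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def merge_adjacent(spans: list[Span], max_gap: int = 0) -> list[Span]:
    """Merge same-label spans separated by at most max_gap characters."""
    merged: list[Span] = []
    for span in sorted(spans, key=lambda s: s.start):
        last = merged[-1] if merged else None
        if last is not None and last.label == span.label and span.start - last.end <= max_gap:
            last.end = max(last.end, span.end)  # extend the previous span
        else:
            merged.append(Span(span.label, span.start, span.end))
    return merged


text = "DOB: 01/15/1970"
fragments = [Span("DATE", 5, 7), Span("DATE", 7, 15)]  # "01" and "/15/1970"
merged = merge_adjacent(fragments)
merged_text = [text[s.start:s.end] for s in merged]
print(merged_text)  # ['01/15/1970']
```

Merging before de-identification matters because masking only the "01" fragment would leave most of the date in the output.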

HIPAA Compliance: Covers all 18 Safe Harbor identifiers with configurable confidence thresholds.
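To make the mask, hash, and shift_dates methods listed earlier in this section more concrete, here is a standard-library sketch of what each transformation could look like once entity offsets are known. The helper names, the MM/DD/YYYY date format, and the truncated SHA-256 digest are assumptions for illustration; OpenMed's own output formats may differ.

```python
import hashlib
from datetime import datetime, timedelta


def mask(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, label) span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def shift_date(value: str, days: int, fmt: str = "%m/%d/%Y") -> str:
    """Shift a date string by a fixed number of days, keeping its format."""
    return (datetime.strptime(value, fmt) + timedelta(days=days)).strftime(fmt)


def hash_value(value: str, salt: str = "") -> str:
    """Return a short, deterministic SHA-256 token for a value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


text = "Patient: John Doe, DOB: 01/15/1970"
spans = [(9, 17, "NAME"), (24, 34, "DATE")]

masked = mask(text, spans)
shifted = shift_date("01/15/1970", 180)
print(masked)   # Patient: [NAME], DOB: [DATE]
print(shifted)  # 07/14/1970
```

Shifting every date in a record by the same offset preserves intervals between events, which masking does not.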

📓 Complete PII Notebook | 📖 Documentation

Multilingual PII (9 Languages)

OpenMed now supports multilingual PII extraction and de-identification across en, fr, de, it, es, nl, hi, te, and pt. French, German, Italian, and Spanish expose the full 35-model family; Portuguese ships 31 public API-visible models; Dutch, Hindi, and Telugu currently ship one flagship public model each, bringing the total PII catalog to 210 models.

uv pip install "openmed[hf]" && python -c "from openmed import extract_pii; print([(e.label,e.text) for e in extract_pii('Dr. Pedro Almeida, CPF: 123.456.789-09, email: pedro@hospital.pt, tel: +351 912 345 678', lang='pt').entities])"

from openmed import extract_pii

portuguese = extract_pii(
    "Paciente: Pedro Almeida, CPF: 123.456.789-09, email: pedro@hospital.pt, telefone: +351 912 345 678",
    lang="pt",
    model_name="OpenMed/OpenMed-PII-Portuguese-SnowflakeMed-Large-568M-v1",
    use_smart_merging=True,
)

dutch = extract_pii(
    "Patiënt: Eva de Vries, geboortedatum: 15 januari 1984, BSN: 123456782, telefoon: +31 6 12345678",
    lang="nl",
    model_name="OpenMed/OpenMed-PII-Dutch-SuperClinical-Large-434M-v1",
    use_smart_merging=True,
)

hindi = extract_pii(
    "रोगी: अनीता शर्मा, जन्मतिथि: 15 जनवरी 1984, फोन: +91 9876543210, पता: 12 गली संख्या 5, नई दिल्ली 110001",
    lang="hi",
    model_name="OpenMed/OpenMed-PII-Hindi-SuperClinical-Large-434M-v1",
    use_smart_merging=True,
)

telugu = extract_pii(
    "రోగి: సితా రెడ్డి, జన్మ తేదీ: 15 జనవరి 1984, ఫోన్: +91 9876543210, చిరునామా: 12 వీధి 5, హైదరాబాద్ 500001",
    lang="te",
    model_name="OpenMed/OpenMed-PII-Telugu-SuperClinical-Large-434M-v1",
    use_smart_merging=True,
)

print([(e.label, e.text) for e in portuguese.entities])
print([(e.label, e.text) for e in dutch.entities])
print([(e.label, e.text) for e in hindi.entities])
print([(e.label, e.text) for e in telugu.entities])

Batch Processing

from openmed import BatchProcessor, OpenMedConfig

config = OpenMedConfig.from_profile("prod")
processor = BatchProcessor(
    model_name="disease_detection_superclinical",
    config=config,
    group_entities=True,
)

result = processor.process_texts([
    "Metastatic breast cancer treated with trastuzumab.",
    "Acute lymphoblastic leukemia diagnosed.",
])

Configuration Profiles

from openmed import analyze_text

text = "Patient started on imatinib for chronic myeloid leukemia."

# Apply a profile programmatically
result = analyze_text(
    text,
    model_name="disease_detection_superclinical",
    config_profile="prod"  # High confidence, grouped entities
)

Performance Profiling

from openmed import analyze_text, profile_inference

text = "Patient started on imatinib for chronic myeloid leukemia."

with profile_inference() as profiler:
    result = analyze_text(text, model_name="disease_detection_superclinical")

print(profiler.summary())  # Inference time, bottlenecks, recommendations
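For readers curious how this style of profiling works in general, the sketch below times named stages with a context manager and reports the slowest one. It is a generic standard-library illustration, not the implementation behind profile_inference; timed_block and the stage names are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_block(timings: dict[str, float], name: str):
    """Accumulate the wall-clock time of the enclosed block under a name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


timings: dict[str, float] = {}

with timed_block(timings, "tokenize"):
    tokens = "Patient started on imatinib".split()

with timed_block(timings, "inference"):
    time.sleep(0.01)  # stand-in for a model call

slowest = max(timings, key=timings.get)
print(f"slowest stage: {slowest}")
```

Recording the time in a finally clause means a stage is still measured when it raises, which helps when a slow call also fails.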

📖 More Examples


Contributing

We welcome contributions, whether bug reports, feature requests, or pull requests.


License

OpenMed is released under the Apache-2.0 License.


Citation

If you use OpenMed in your research, please cite:

@misc{panahi2025openmedneropensourcedomainadapted,
      title={OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets},
      author={Maziyar Panahi},
      year={2025},
      eprint={2508.01630},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.01630},
}


Built with ❤️ by the OpenMed team
