Skip to main content

PRISM annotation of medical texts using LLMs and TANL parsing

Project description

prism-annotator

A CLI tool for automatic PRISM annotation of medical/clinical texts using LLMs.

PRISM (Problem-oriented, Real-time, Informatics-based, Structured, Medical record) defines a schema for annotating medical entities (diseases, symptoms, anatomical parts, tests, medications, etc.) and their relations (temporal, spatial, causal) in clinical text.

prism-annotator uses the TANL (Typed Augmented Natural Language) inline annotation format with LLMs to extract structured PRISM annotations from free-text medical documents.

Installation

pip install prism-annotator

Or with uv:

uv add prism-annotator

Quick Start

1. Scaffold a new project

prism init my-project --language en
cd my-project

This creates:

  • config.yaml — extraction configuration
  • prompts/ — system prompts and few-shot examples (customise these)
  • data/ — place your input texts here
  • .env.example — API key template

2. Add your input data

Place .txt files in data/, or point config.yaml at a CSV file:

data:
  input_path: "data/"          # directory of .txt files
  # input_path: "data/notes.csv"  # or a CSV file
  # text_column: "text"           # CSV column name

3. Add few-shot examples

Edit prompts/entity_examples.yaml with domain-specific examples:

- input: "Chest CT showed ground-glass opacity in the right lung."
  output: "[Chest CT | t-test(+)] showed [ground-glass opacity | d(+)] in the [right lung | a]."

4. Set your API key and run

export OPENROUTER_API_KEY=sk-...
prism extract --config config.yaml

CLI Commands

prism extract       Run entity or relation extraction
prism merge         Merge entity + relation results
prism visualise     Generate interactive HTML viewer
prism to-xml        Convert to PRISM inline XML
prism validate      Validate results against PRISM schema
prism init          Scaffold a new project

Multi-phase pipeline

PRISM annotation runs in three phases:

# Phase 1: Entity extraction
prism extract --config entity.yaml

# Phase 2a: Medical relation extraction
prism extract --config medical_rel.yaml --entity-results output/entity/results.json

# Phase 2b: Temporal relation extraction
prism extract --config time_rel.yaml --entity-results output/entity/results.json

# Phase 3: Merge
prism merge --entity output/entity/results.json \
            --medical-relation output/medical/results.json \
            --time-relation output/time/results.json \
            -o output/merged

# Generate viewer
prism visualise output/merged

Configuration

See config.yaml generated by prism init for all options. Key settings:

Section Field Description
data.input_path Path Directory of .txt files or a .csv file
data.text_column String CSV column containing document text
model.model_id String LLM model ID (OpenRouter format)
model.base_url URL API endpoint (OpenRouter, OpenAI, etc.)
prompts.language ja/en Language for built-in system prompts
prompts.prompts_dir Path Custom prompts directory

Custom Prompts

The prompt fallback chain:

  1. prompts_dir in config (if set)
  2. prompts/ in working directory
  3. Built-in defaults (Japanese or English)

Each phase has a system prompt (.md) and few-shot examples (.yaml):

Phase System prompt Examples
Entity entity_system.md entity_examples.yaml
Medical relation medical_relation_system.md medical_relation_examples.yaml
Time relation time_relation_system.md time_relation_examples.yaml

Output Formats

  • JSON (results.json) — primary structured output
  • XML (results.xml) — PRISM inline-annotated XML
  • HTML (viewer.html) — interactive browser-based viewer
  • Statistics (stats.json) — entity/relation distribution

PRISM Schema (v8)

13 entity types, 10 relation types. See the PRISM Annotation Guidelines v8 for full specification:

The PRISM annotation scheme was originally proposed in the following works:

Shuntaro Yada, Ayami Joh, Ribeka Tanaka, Fei Cheng, Eiji Aramaki, and Sadao Kurohashi. 2020. Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases. In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC), pages 4565–4572, Marseille, France. European Language Resources Association. [ACL Anthology]

矢田 竣太郎, 田中 リベカ, Fei Cheng, 荒牧 英治, 黒橋 禎夫. 2022. 汎用的な臨床医学テキストアノテーション仕様およびガイドラインの策定:重篤肺疾患ドメインに着目して. 自然言語処理, 29(4), pp. 1165–1197. [J-STAGE]

The TANL inline annotation format used by this tool is adapted from:

Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, and Stefano Soatto. 2021. Structured Prediction as Translation between Augmented Natural Languages. In Proceedings of the Ninth International Conference on Learning Representations (ICLR). [OpenReview]

Supported LLM Providers

Any OpenAI-compatible API endpoint works. Configure in config.yaml:

model:
  model_id: "anthropic/claude-sonnet-4"   # OpenRouter
  base_url: "https://openrouter.ai/api/v1"
  api_key_env: "OPENROUTER_API_KEY"
model:
  model_id: "gpt-4o"                        # OpenAI direct
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"

Citation

If you use this tool, please cite the original PRISM annotation works:

@inproceedings{yada-etal-2020-towards,
    title = "Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases",
    author = "Yada, Shuntaro and Joh, Ayami and Tanaka, Ribeka and Cheng, Fei and Aramaki, Eiji and Kurohashi, Sadao",
    booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2020.lrec-1.561/",
    pages = "4565--4572",
    isbn = "979-10-95546-34-4",
}

@article{yada-etal-2022-prism,
    title = "汎用的な臨床医学テキストアノテーション仕様およびガイドラインの策定:重篤肺疾患ドメインに着目して",
    author = "矢田, 竣太郎 and 田中, リベカ and Cheng, Fei and 荒牧, 英治 and 黒橋, 禎夫",
    journal = "自然言語処理",
    volume = "29",
    number = "4",
    pages = "1165--1197",
    year = "2022",
    url = "https://www.jstage.jst.go.jp/article/jnlp/29/4/29_1165/_article/-char/ja/",
}

See also CITATION.cff for machine-readable citation metadata.

Licence

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prism_annotator-0.2.0.tar.gz (59.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

prism_annotator-0.2.0-py3-none-any.whl (48.5 kB view details)

Uploaded Python 3

File details

Details for the file prism_annotator-0.2.0.tar.gz.

File metadata

  • Download URL: prism_annotator-0.2.0.tar.gz
  • Upload date:
  • Size: 59.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for prism_annotator-0.2.0.tar.gz
Algorithm Hash digest
SHA256 687f86fd0371b1b7a9837a9dee4c7b17c63b5d7ae9e86db07f0523c47de79876
MD5 ebafdee21a133d716c58ecae3e09e799
BLAKE2b-256 7b3e1a18ce177d9ff0a64e08ce2be2da5103a3c78c19a938a2acf21aa8bdc71a

See more details on using hashes here.

Provenance

The following attestation bundles were made for prism_annotator-0.2.0.tar.gz:

Publisher: publish.yml on sociocom/prism-annotator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file prism_annotator-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for prism_annotator-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 776922c95932f6dd7c940e6149b58576d872a3d5239ca7cc1a9ce84bbc26f8e7
MD5 51b8470073e5bd6cf6bdeb9e1903d3bd
BLAKE2b-256 d1fe609a8ed4076a0bb5330aae7e1b7434f5609f8735731f9b2cd224f3e54a79

See more details on using hashes here.

Provenance

The following attestation bundles were made for prism_annotator-0.2.0-py3-none-any.whl:

Publisher: publish.yml on sociocom/prism-annotator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page