PRISM annotation of medical texts using LLMs and TANL parsing
Project description
prism-annotator
A CLI tool for automatic PRISM annotation of medical/clinical texts using LLMs.
PRISM (Problem-oriented, Real-time, Informatics-based, Structured, Medical record) defines a schema for annotating medical entities (diseases, symptoms, anatomical parts, tests, medications, etc.) and their relations (temporal, spatial, causal) in clinical text.
prism-annotator uses the TANL (Translation between Augmented Natural Languages; Paolini+, ICLR 2021) inline annotation format with LLMs to extract structured PRISM annotations from free-text medical documents.
Installation
pip install prism-annotator
Or with uv:
uv add prism-annotator
Quick Start
1. Scaffold a new project
prism init my-project --language en
cd my-project
This creates:
config.yaml— extraction configurationprompts/— system prompts and few-shot examples (customise these)data/— place your input texts here.env.example— API key template
2. Add your input data
Place .txt files in data/, or point config.yaml at a CSV file:
data:
input_path: "data/" # directory of .txt files
# input_path: "data/notes.csv" # or a CSV file
# text_column: "text" # CSV column name
3. Add few-shot examples
Edit prompts/entity_examples.yaml with domain-specific examples:
- input: "Chest CT showed ground-glass opacity in the right lung."
output: "[Chest CT | t-test(+)] showed [ground-glass opacity | d(+)] in the [right lung | a]."
4. Set your API key and run
export OPENROUTER_API_KEY=sk-...
prism extract --config config.yaml
CLI Commands
prism extract Run entity or relation extraction
prism merge Merge entity + relation results
prism visualise Generate interactive HTML viewer
prism to-xml Convert to PRISM inline XML
prism validate Validate results against PRISM schema
prism init Scaffold a new project
Multi-phase pipeline
PRISM annotation runs in three phases:
# Phase 1: Entity extraction
prism extract --config entity.yaml
# Phase 2a: Medical relation extraction
prism extract --config medical_rel.yaml --entity-results output/entity/results.json
# Phase 2b: Temporal relation extraction
prism extract --config time_rel.yaml --entity-results output/entity/results.json
# Phase 3: Merge
prism merge --entity output/entity/results.json \
--medical-relation output/medical/results.json \
--time-relation output/time/results.json \
-o output/merged
# Generate viewer
prism visualise output/merged
Configuration
See config.yaml generated by prism init for all options. Key settings:
| Section | Field | Description |
|---|---|---|
data.input_path |
Path | Directory of .txt files or a .csv file |
data.text_column |
String | CSV column containing document text |
model.model_id |
String | LLM model ID (OpenRouter format) |
model.base_url |
URL | API endpoint (OpenRouter, OpenAI, etc.) |
prompts.language |
ja/en |
Language for built-in system prompts |
prompts.prompts_dir |
Path | Custom prompts directory |
Custom Prompts
The prompt fallback chain:
prompts_dirin config (if set)prompts/in working directory- Built-in defaults (Japanese or English)
Each phase has a system prompt (.md) and few-shot examples (.yaml):
| Phase | System prompt | Examples |
|---|---|---|
| Entity | entity_system.md |
entity_examples.yaml |
| Medical relation | medical_relation_system.md |
medical_relation_examples.yaml |
| Time relation | time_relation_system.md |
time_relation_examples.yaml |
Output Formats
- JSON (
results.json) — primary structured output - XML (
results.xml) — PRISM inline-annotated XML - HTML (
viewer.html) — interactive browser-based viewer - Statistics (
stats.json) — entity/relation distribution
PRISM Schema (v8)
13 entity types, 10 relation types. See the PRISM Annotation Guidelines v8 for full specification:
The PRISM annotation scheme was originally proposed in the following works:
Shuntaro Yada, Ayami Joh, Ribeka Tanaka, Fei Cheng, Eiji Aramaki, and Sadao Kurohashi. 2020. Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases. In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC), pages 4565–4572, Marseille, France. European Language Resources Association. [ACL Anthology]
矢田 竣太郎, 田中 リベカ, Fei Cheng, 荒牧 英治, 黒橋 禎夫. 2022. 汎用的な臨床医学テキストアノテーション仕様およびガイドラインの策定:重篤肺疾患ドメインに着目して. 自然言語処理, 29(4), pp. 1165–1197. [J-STAGE]
Note that our "PRISM" acronym (i.e. Problem-oriented, Real-time, Informatics-based, Structured, Medical record) dereived from a research funding scheme, called PRISM (Public/Private R&D Investment Strategic Expansion PrograM), by which our research project above was originally supported.
The TANL inline annotation format used by this tool is adapted from:
Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, and Stefano Soatto. 2021. Structured Prediction as Translation between Augmented Natural Languages. In Proceedings of the Ninth International Conference on Learning Representations (ICLR). [OpenReview]
Supported LLM Providers
Any OpenAI-compatible API endpoint works. Configure in config.yaml:
model:
model_id: "anthropic/claude-sonnet-4" # OpenRouter
base_url: "https://openrouter.ai/api/v1"
api_key_env: "OPENROUTER_API_KEY"
model:
model_id: "gpt-4o" # OpenAI direct
base_url: "https://api.openai.com/v1"
api_key_env: "OPENAI_API_KEY"
Citation
If you use this tool, please cite the original PRISM annotation works:
@inproceedings{yada-etal-2020-towards,
title = "Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases",
author = "Yada, Shuntaro and Joh, Ayami and Tanaka, Ribeka and Cheng, Fei and Aramaki, Eiji and Kurohashi, Sadao",
booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2020.lrec-1.561/",
pages = "4565--4572",
isbn = "979-10-95546-34-4",
}
@article{yada-etal-2022-prism,
title = "汎用的な臨床医学テキストアノテーション仕様およびガイドラインの策定:重篤肺疾患ドメインに着目して",
author = "矢田, 竣太郎 and 田中, リベカ and Cheng, Fei and 荒牧, 英治 and 黒橋, 禎夫",
journal = "自然言語処理",
volume = "29",
number = "4",
pages = "1165--1197",
year = "2022",
url = "https://www.jstage.jst.go.jp/article/jnlp/29/4/29_1165/_article/-char/ja/",
}
See also CITATION.cff for machine-readable citation metadata.
Changelog
0.2.1
- Fix: viewer entity highlights now align to the original source text instead of using TANL-stripped offsets, which caused the first character(s) of entities to be excluded or misaligned.
0.2.0
- Initial release.
Licence
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file prism_annotator-0.2.1.tar.gz.
File metadata
- Download URL: prism_annotator-0.2.1.tar.gz
- Upload date:
- Size: 59.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3c86b72defe986540518e67337ea2ede7f9f32ecbd87177560aa5089d132cdaf
|
|
| MD5 |
9b58a817d8ad91053c700870fd29e7e2
|
|
| BLAKE2b-256 |
2f49338cfd4f22cc0cc057e51cad7a85f8f6084da264404dc7da1ba05c8f102d
|
Provenance
The following attestation bundles were made for prism_annotator-0.2.1.tar.gz:
Publisher:
publish.yml on sociocom/prism-annotator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
prism_annotator-0.2.1.tar.gz -
Subject digest:
3c86b72defe986540518e67337ea2ede7f9f32ecbd87177560aa5089d132cdaf - Sigstore transparency entry: 1252132727
- Sigstore integration time:
-
Permalink:
sociocom/prism-annotator@6e2a38e4ce4f563be2fb92f7fa7b4fa29fc64e95 -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/sociocom
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@6e2a38e4ce4f563be2fb92f7fa7b4fa29fc64e95 -
Trigger Event:
release
-
Statement type:
File details
Details for the file prism_annotator-0.2.1-py3-none-any.whl.
File metadata
- Download URL: prism_annotator-0.2.1-py3-none-any.whl
- Upload date:
- Size: 48.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
033ec8e4a8a01ad56fcb46fbf61dae6a88888c69a8ce428a6733a8d4c1edd396
|
|
| MD5 |
4677d1fad74ee7243219190f19946dd1
|
|
| BLAKE2b-256 |
2efcb86e231e29814f7952f1949bf6a9d266c33cb7941686b28f4e0b51c5ba72
|
Provenance
The following attestation bundles were made for prism_annotator-0.2.1-py3-none-any.whl:
Publisher:
publish.yml on sociocom/prism-annotator
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
prism_annotator-0.2.1-py3-none-any.whl -
Subject digest:
033ec8e4a8a01ad56fcb46fbf61dae6a88888c69a8ce428a6733a8d4c1edd396 - Sigstore transparency entry: 1252132735
- Sigstore integration time:
-
Permalink:
sociocom/prism-annotator@6e2a38e4ce4f563be2fb92f7fa7b4fa29fc64e95 -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/sociocom
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@6e2a38e4ce4f563be2fb92f7fa7b4fa29fc64e95 -
Trigger Event:
release
-
Statement type: