The Digital Registrar — a schema-first framework for multi-cancer, privacy-preserving pathology abstraction via local LLMs.
Digital Registrar transforms free-text surgical pathology reports into machine-readable registry records using a College of American Pathologists (CAP)-aligned clinical ontology, encoded as strictly-typed DSPy signatures. The system covers 10 major cancer types across 193 registry fields — including complex variable-length structures like lymph-node groups and surgical margins — and is model-agnostic: any local LLM can serve as the inference engine. Designed for on-premise deployment on a single 48 GB GPU, it keeps sensitive clinical text inside the institution.
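To make the idea of a strictly typed record with variable-length structures concrete, here is a minimal sketch using only standard-library dataclasses. It is not the project's actual schema: the class names (`Margin`, `LymphNodeGroup`, `CaseRecord`), the field names, and the allowed status values are all hypothetical stand-ins for the CAP-aligned types that the package encodes as DSPy signatures.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional, get_args

# Hypothetical value set; the real ontology defines its own enumerations.
MarginStatus = Literal["negative", "positive", "cannot_be_assessed"]


@dataclass
class Margin:
    name: str                          # e.g. "deep" or "superior"
    status: str                        # must be one of MarginStatus
    distance_mm: Optional[float] = None

    def __post_init__(self):
        # Reject values outside the closed vocabulary.
        if self.status not in get_args(MarginStatus):
            raise ValueError(f"unknown margin status: {self.status!r}")
        if self.distance_mm is not None and self.distance_mm < 0:
            raise ValueError("distance_mm must be non-negative")


@dataclass
class LymphNodeGroup:
    site: str
    examined: int
    involved: int

    def __post_init__(self):
        # Involved nodes can never exceed examined nodes.
        if not 0 <= self.involved <= self.examined:
            raise ValueError("involved must be between 0 and examined")


@dataclass
class CaseRecord:
    organ: str
    histologic_type: Optional[str] = None
    # Variable-length structures: a report may list any number of each.
    margins: List[Margin] = field(default_factory=list)
    lymph_node_groups: List[LymphNodeGroup] = field(default_factory=list)

    def node_totals(self):
        """Return (involved, examined) summed over all node groups."""
        involved = sum(group.involved for group in self.lymph_node_groups)
        examined = sum(group.examined for group in self.lymph_node_groups)
        return involved, examined


record = CaseRecord(
    organ="breast",
    margins=[Margin("deep", "negative", 3.0)],
    lymph_node_groups=[
        LymphNodeGroup("sentinel", examined=3, involved=1),
        LymphNodeGroup("axillary", examined=12, involved=0),
    ],
)
print(record.node_totals())  # (1, 15)
```

Validating in `__post_init__` means a malformed extraction fails when the record is constructed rather than surfacing later in the registry.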
Highlights
- Schema-first architecture — the clinical ontology is the durable contribution; LLMs are interchangeable engines.
- CAP-aligned, registry-grade — 10 cancer types, 193 fields, validated against gold-standard human annotations.
- Privacy-preserving by design — local LLMs only, single 48 GB GPU, no cloud round-trip required.
- Validated generalizability — 94.3 % mean exact-match on 893 internal reports; 92.4 % on the external TCGA cohort of 150 reports (preprint).
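As an illustration of what a mean exact-match figure measures, the sketch below scores each report as the fraction of gold fields reproduced exactly and then averages over reports. This is one plausible definition and an assumption on our part; the preprint and the package's own evaluation helpers may aggregate differently. The function names and the example field names are hypothetical.

```python
def field_exact_match(predicted, gold):
    """Fraction of gold fields whose predicted value equals the gold value."""
    if not gold:
        raise ValueError("gold record has no fields to score")
    hits = sum(1 for key, value in gold.items() if predicted.get(key) == value)
    return hits / len(gold)


def mean_exact_match(pairs):
    """Average the per-report exact-match score over (predicted, gold) pairs."""
    scores = [field_exact_match(predicted, gold) for predicted, gold in pairs]
    if not scores:
        raise ValueError("no reports to score")
    return sum(scores) / len(scores)


# Two toy reports: the first misses one of three fields, the second is perfect.
pairs = [
    ({"grade": "2", "pt": "T2", "pn": "N0"}, {"grade": "2", "pt": "T2", "pn": "N1"}),
    ({"grade": "3"}, {"grade": "3"}),
]
print(round(mean_exact_match(pairs), 4))  # 0.8333
```

A field absent from the prediction counts as a miss here, because `predicted.get(key)` returns `None`. A gold value that is itself `None` would match an absent prediction, so a stricter scorer might treat those two cases separately.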
Quickstart (end users)
The toolkit ships as four pip-installable packages. Pick the apps you need:
# Inference GUI — paste a report, see the structured extraction
pip install digital-registrar-gui
registrar-infer-gui # opens http://localhost:8502
# Annotation tool — review pipeline output against gold
pip install digital-registrar-annotator
registrar-annotate-workspace
# Schema editor — curate the CAP-aligned per-organ schema
pip install digital-registrar-schema-editor
registrar-schema-gui
# Core only (CLI + Python API) — for pipelines, scripts, and downstream tools
pip install digital-registrar
registrar-pipeline --input <folder>
Each app depends on digital-registrar (the core), so installing any of the apps brings the pipeline along automatically.
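For scripted use, the core CLI can be driven from Python. The sketch below is a thin wrapper that relies only on the `registrar-pipeline --input <folder>` invocation shown above; the helper names `build_command` and `run_pipeline_cli` are hypothetical, and no other flags are assumed.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(input_dir):
    """Build the argv list for the documented CLI call."""
    folder = Path(input_dir)
    if not folder.is_dir():
        raise NotADirectoryError(f"not a folder: {folder}")
    return ["registrar-pipeline", "--input", str(folder)]


def run_pipeline_cli(input_dir):
    """Run the CLI on a folder, failing early if it is not installed."""
    command = build_command(input_dir)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(
            "registrar-pipeline is not on PATH; is digital-registrar installed?"
        )
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with folder names that contain spaces.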
Audience
Built for cancer registrars, pathology informatics teams, and clinical researchers who need registry-grade structured extraction from narrative pathology reports without sending PHI off-premise.
Repository layout
drr-next/
├── src/digital_registrar/ ← THE core (pipeline, schemas, signatures, eval, paths)
├── apps/
│ ├── infer-gui/ ← digital-registrar-gui (Streamlit inference)
│ ├── schema-editor/ ← digital-registrar-schema-editor
│ └── annotator/ ← digital-registrar-annotator
├── attic/ ← research scaffolding (benchmarks, ablations, baselines, obfuscator)
├── packaging/ ← release pipeline (PyInstaller, Docker, hosted demo)
├── workspace/ ← gitignored runtime data (data, results, runs)
├── examples/ ← small read-only fixtures
├── tests/ ← core tests
└── docs/ ← architecture, API, eval, release
Dev install (cloners)
git clone https://github.com/kblab2024/digitalregistrar.git drr-next
cd drr-next
make install-dev # installs core + 3 apps + dev tooling
make test # core + app test suites
make lint # ruff
make install-dev runs three steps in order: it installs the vendored tnmhelper wheel, then the core with pip install -e ., then each of the three downstream apps with pip install -e apps/<each>. Only pip is required (uv is not), so a fresh clone installs with one command.
Public Python API
from digital_registrar import (
run_pipeline, setup_pipeline, # extraction
load_pydantic_model, load_json_schema, # schemas
list_organs, CASE_MODELS, build_case_model,
build_extraction_signatures, ExtractionStep, # signatures
field_metrics, nested_field_metrics, # eval
pairwise_compare, completeness, score_case,
WORKSPACE_ROOT, workspace_root, results_root, # paths
)
See docs/api.md for the full reference.
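The call signatures of these names are not shown here, so rather than guess them, the sketch below inspects whatever is installed. The helper `describe_api` is hypothetical; it uses only the standard library and returns an empty dict when the package is not importable.

```python
import importlib
import inspect

_MISSING = object()


def describe_api(module_name, names):
    """Map each public name to its signature, or to its type if not callable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {}  # package not installed in this environment
    summary = {}
    for name in names:
        obj = getattr(module, name, _MISSING)
        if obj is _MISSING:
            summary[name] = "<missing>"
        elif callable(obj):
            try:
                summary[name] = str(inspect.signature(obj))
            except (TypeError, ValueError):
                summary[name] = "<signature unavailable>"
        else:
            # Constants such as WORKSPACE_ROOT are reported by type.
            summary[name] = type(obj).__name__
    return summary


# Names taken from the import list above.
for name, info in describe_api(
    "digital_registrar",
    ["run_pipeline", "setup_pipeline", "list_organs", "WORKSPACE_ROOT"],
).items():
    print(f"{name}: {info}")
```

docs/api.md remains the authoritative reference; this is only a quick way to check what a given installed version exposes.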
Documentation
| Topic | Where |
|---|---|
| Pipeline architecture (v1 legacy, v2 factory) | docs/architecture/pipeline.md |
| Three-layer schema architecture | docs/architecture/schemas.md |
| AJCC TNM staging via tnmhelper | docs/architecture/staging.md |
| DSPy deep dive | docs/architecture/dspy_deep_dive.md |
| Annotation workflow | docs/workflows/annotation.md |
| Eval (prediction vs annotation) | docs/eval/index.md |
| Public Python API | docs/api.md |
| Release pipeline (PyPI / hosted demo / bundles / Docker) | docs/release.md |
| Research scaffolding (benchmarks, ablations, obfuscator) | attic/README.md |
Releasing
The project supports three distribution paths for non-specialist users (see docs/release.md):
- PyPI — pip install digital-registrar-gui for Python users.
- Hosted Streamlit demo — public URL for paper reviewers / casual visitors. Safety checklist in docs/release.md.
- Native bundles + Docker — .dmg / .exe / Docker images for non-technical end users, built via make bundle and make docker-build.
Citation
If you use the Digital Registrar in your research, please cite:
Chow N-H, Chang H, Chen H-K, et al. Digital Registrar: A Schema-First Framework for Multi-Cancer Privacy-Preserving Pathology Abstraction via Local LLMs. medRxiv 2026. doi: 10.1101/2025.10.21.25338475
(Preprint; the citation will be updated to the published-journal version when available. Machine-readable metadata in CITATION.cff.)
License
MIT — see LICENSE.
Download files
File details
Details for the file digital_registrar-0.2.0b1.tar.gz.
File metadata
- Download URL: digital_registrar-0.2.0b1.tar.gz
- Upload date:
- Size: 340.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52ef320c855738bcae673b93b9029b63ee700731403a01b5d19f7d17570aa763 |
| MD5 | 439764a15c10573eb563cb9fea6c84b8 |
| BLAKE2b-256 | a37dddb77242aeb7553acd7219f0cda9bb07548e02d6774dc5f0249c87448a14 |
File details
Details for the file digital_registrar-0.2.0b1-py3-none-any.whl.
File metadata
- Download URL: digital_registrar-0.2.0b1-py3-none-any.whl
- Upload date:
- Size: 419.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 18fa42bc08b72582c5b8fbfe9e0d4b899efbd84fa384d16e7215e7b9454129f7 |
| MD5 | 218e29246424559522f5d0e22618dedf |
| BLAKE2b-256 | 38300ce2b16124a65b19d3431176005e421986c897b4bc79013ab56c3b2c82d1 |
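A downloaded file can be checked against the published SHA256 digests before installation. The sketch below uses only `hashlib` and `hmac` from the standard library; the helper names and the local file path in the usage comment are hypothetical.

```python
import hashlib
import hmac

# SHA256 digest listed above for digital_registrar-0.2.0b1.tar.gz.
SDIST_SHA256 = "52ef320c855738bcae673b93b9029b63ee700731403a01b5d19f7d17570aa763"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Usage, assuming the sdist was saved to the current directory:
# matches_published_hash("digital_registrar-0.2.0b1.tar.gz", SDIST_SHA256)
```

pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).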