Medical cOmputational Suite for Advanced Intelligent eXtraction
MOSAICX: Structure first. Insight follows.
MOSAICX turns unstructured clinical documents into validated, structured data—locally, privately, reproducibly. It supports:
- Schema generation from natural language (Pydantic v2)
- Extraction from PDFs/text using the generated schema
- Summarization of radiology reports (single or multi-report per patient) → critical timeline + one-paragraph executive summary as JSON
Local LLMs via Ollama (OpenAI-compatible). PDF text via Docling. Rich terminal UI.
🚀 Quick Start
1) Requirements
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model that behaves well with JSON
ollama pull llama3.1:8b-instruct # or: qwen2.5:7b-instruct, gpt-oss:120b
2) Install MOSAICX
pip install mosaicx
# or (faster resolver)
uv add mosaicx
3) Smoke test
mosaicx --help
✨ New: Summarize (Timeline + JSON)
Goal: give clinicians an at-a-glance patient trajectory from one or more radiology reports (same patient), without reading everything.
- Input: one or many reports (.pdf or .txt) for a single patient
- Logic: radiology-first prompt (modality-adaptive), concise, no recommendations/differentials
- Output:
  - Terminal: header + timeline + executive summary (Rich)
  - JSON: standardized object (patient, timeline[], overall)
Example
mosaicx summarize \
--report P001_CT_2025-08-01.pdf \
--report P001_CT_2025-09-10.pdf \
--patient P001 \
--model llama3.1:8b-instruct \
--json-out out/summary_P001.json
Or point it at a directory of reports:
mosaicx summarize \
--dir ./patient_directory \
--json-out ./longitudinal_summary.json \
--model gpt-oss:120b
Summary JSON (shape)
{
"patient": {
"patient_id": "P001",
"dob": null,
"sex": null,
"last_updated": "2025-09-19T12:34:56Z"
},
"timeline": [
{ "date": "2025-08-01", "source": "CT 2025-08-01", "note": "Baseline nodal disease; R ext-iliac LN short-axis 12 mm" },
{ "date": "2025-09-10", "source": "CT 2025-09-10", "note": "R ext-iliac LN 12→16 mm — progression; no visceral mets" }
],
"overall": "Nodal-only disease with interval progression of the right external iliac node [CT 2025-09-10]; baseline nodal disease without visceral metastases [CT 2025-08-01]."
}
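To consume this output downstream, a minimal Python sketch might load the JSON and print the timeline in date order. The field names follow the shape above; the helper name `format_timeline` and the inline sample (abbreviated from the example above and deliberately out of order) are illustrative assumptions, not part of MOSAICX.

```python
import json

# Abbreviated sample following the summary shape shown above.
SAMPLE = """
{
  "patient": {"patient_id": "P001", "dob": null, "sex": null,
              "last_updated": "2025-09-19T12:34:56Z"},
  "timeline": [
    {"date": "2025-09-10", "source": "CT 2025-09-10", "note": "Progression; no visceral mets"},
    {"date": "2025-08-01", "source": "CT 2025-08-01", "note": "Baseline nodal disease"}
  ],
  "overall": "Nodal-only disease with interval progression."
}
"""


def format_timeline(summary):
    """Return one display line per timeline entry, oldest first."""
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    entries = sorted(summary["timeline"], key=lambda e: e["date"])
    return [f'{e["date"]}  [{e["source"]}]  {e["note"]}' for e in entries]


if __name__ == "__main__":
    summary = json.loads(SAMPLE)
    print(f'Patient: {summary["patient"]["patient_id"]}')
    for line in format_timeline(summary):
        print(line)
    print(f'Overall: {summary["overall"]}')
```

In practice you would replace `SAMPLE` with the contents of the file written by `--json-out`.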
Under the hood (robust fallbacks)
- Instructor JSON → Pydantic → ✅
- Raw JSON extraction → Pydantic → ✅
- Heuristic timeline/summary → ✅
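The three stages read as a fallback chain: each one is tried only if the previous one fails. The sketch below illustrates that pattern with the standard library only; it is not MOSAICX's actual implementation. The function names are hypothetical, a plain key check stands in for Pydantic validation, and the stage-3 heuristic (one timeline note per non-empty line) is an assumption made purely for illustration.

```python
import json
import re

REQUIRED_KEYS = {"patient", "timeline", "overall"}


def validate(obj):
    """Stand-in for Pydantic validation: require the three top-level keys."""
    if not isinstance(obj, dict) or not REQUIRED_KEYS.issubset(obj):
        raise ValueError("summary object is missing required keys")
    return obj


def parse_strict(text):
    """Stage 1: the whole reply must be a single JSON document."""
    return validate(json.loads(text))


def parse_embedded(text):
    """Stage 2: pull the outermost {...} span out of surrounding chatter."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found")
    return validate(json.loads(match.group(0)))


def parse_heuristic(text):
    """Stage 3: last resort, never raises. One note per non-empty line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return {
        "patient": {"patient_id": None},
        "timeline": [{"date": None, "source": None, "note": ln} for ln in lines],
        "overall": " ".join(lines),
    }


def parse_summary(text):
    """Try each stage in order and report which one succeeded."""
    for name, stage in (("strict", parse_strict), ("embedded", parse_embedded)):
        try:
            return name, stage(text)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            continue
    return "heuristic", parse_heuristic(text)
```

Returning the stage name alongside the result makes it easy to log how often the weaker fallbacks are being hit for a given model.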
Core Workflows
1) Generate a schema (from plain English)
mosaicx generate \
--desc "Echocardiography with patient_id, exam_date, EF %, valve grades (Normal/Mild/Moderate/Severe), impression" \
--model gpt-oss:120b
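For orientation, the description above might map to a model along these lines. MOSAICX generates Pydantic v2 classes; the sketch below uses standard-library dataclasses instead so that it runs without extra dependencies. Every class and field name in it (`EchoReport`, `ValveGrade`, `ef_percent`, the two valve fields) is a guess at what a generated schema could contain, not actual tool output, and the sample values are made up.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ValveGrade(str, Enum):
    """The four grades named in the schema description."""
    NORMAL = "Normal"
    MILD = "Mild"
    MODERATE = "Moderate"
    SEVERE = "Severe"


@dataclass
class EchoReport:
    patient_id: str
    exam_date: date
    ef_percent: float
    mitral_valve: ValveGrade
    aortic_valve: ValveGrade
    impression: str

    def __post_init__(self):
        # Coerce raw JSON values and reject anything out of range.
        if isinstance(self.exam_date, str):
            self.exam_date = date.fromisoformat(self.exam_date)
        self.mitral_valve = ValveGrade(self.mitral_valve)  # ValueError if unknown
        self.aortic_valve = ValveGrade(self.aortic_valve)
        self.ef_percent = float(self.ef_percent)
        if not 0.0 <= self.ef_percent <= 100.0:
            raise ValueError(f"EF must be 0-100 %, got {self.ef_percent}")


# Made-up values, as they might arrive from a parsed JSON object.
report = EchoReport(
    patient_id="P001",
    exam_date="2025-09-19",
    ef_percent=55,
    mitral_valve="Mild",
    aortic_valve="Normal",
    impression="Preserved systolic function.",
)
print(report.exam_date.year, report.mitral_valve.value)
```

The point of a closed set of grades is that a reply such as "mild-to-moderate" fails validation instead of silently entering the dataset.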
2) Extract structured data with that schema (PDF → JSON)
mosaicx extract \
--pdf echo_report.pdf \
--schema EchocardiographyReport_20250919_143022 \
--model gpt-oss:120b \
--save out/echo_001.json
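To run the same extraction over a folder of PDFs, a small wrapper can assemble one `mosaicx extract` call per file. The sketch below uses only the flags shown above (`--pdf`, `--schema`, `--model`, `--save`); the helper name, the directory layout, and the choice to name each output after its PDF are assumptions. It builds the argument lists and leaves execution to the caller.

```python
from pathlib import Path


def build_extract_commands(pdf_dir, schema, model, out_dir):
    """Return one `mosaicx extract` argument list per PDF in pdf_dir."""
    out_dir = Path(out_dir)
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        commands.append([
            "mosaicx", "extract",
            "--pdf", str(pdf),
            "--schema", schema,
            "--model", model,
            "--save", str(out_dir / f"{pdf.stem}.json"),
        ])
    return commands


if __name__ == "__main__":
    import shlex

    for cmd in build_extract_commands(
        "reports", "EchocardiographyReport_20250919_143022", "gpt-oss:120b", "out"
    ):
        # Print instead of executing; swap in subprocess.run(cmd, check=True) to run.
        print(shlex.join(cmd))
```

Passing arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.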
3) Summarize radiology reports (timeline + JSON)
# Multiple inputs for the same patient
mosaicx summarize \
--dir ./reports/P001 \
--patient P001 \
--model gpt-oss:120b \
--json-out out/summary_P001.json
CLI options (summarize)
- --report … (repeatable)
- --dir (recursively picks .pdf, .txt)
- --patient PSEUDONYM
- --json-out path.json and --print-json
- --model, --base-url, --api-key, --temperature
Tips for Great Results
- Models: prefer llama3.1:8b-instruct or qwen2.5:7b-instruct for clean JSON.
- Prompts (summarize): MOSAICX uses a conciseness-first prompt, modality-adaptive, no DDx or recommendations.
- PDFs: If scanned (no text), run OCR before MOSAICX or add an OCR pre-step in your pipeline.
Troubleshooting
- Connection refused / model not found: start Ollama, run ollama list, and pull your model.
- Empty summary: try a more JSON-obedient model; lower --temperature (0.0–0.2).
- PDF yields no text: the PDF likely has no text layer; OCR it first.
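The first check can be scripted. Ollama lists locally pulled models at `GET /api/tags`; the sketch below queries that endpoint with the standard library and reports whether a given model is present. The default URL `http://localhost:11434` is Ollama's usual local address and is an assumption about your setup; the function names are illustrative.

```python
import json
import urllib.error
import urllib.request


def model_names(payload):
    """Extract model names from a parsed /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def check_ollama(model, base_url="http://localhost:11434", timeout=3.0):
    """Return 'ok', 'model not found', or 'connection refused'."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError):
        return "connection refused"  # server not running or unreachable
    return "ok" if model in model_names(payload) else "model not found"


if __name__ == "__main__":
    print(check_ollama("llama3.1:8b-instruct"))
```

The comparison is an exact string match, so the name passed in must include the tag exactly as `ollama list` prints it.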
Why MOSAICX (one paragraph)
MOSAICX is infrastructure for clinical data: schema-driven, validated, local, and reproducible. Structure reports once, then reuse the same schemas and summarizers across departments and time—enabling longitudinal analysis, cross-modal integration, and downstream intelligence without sending data to the cloud.
License
AGPL-3.0. See LICENSE.
Contact
DIGIT-X Lab · Department of Radiology · LMU Klinikum
lalith.shiyam@med.uni-muenchen.de