Medical cOmputational Suite for Advanced Intelligent eXtraction

MOSAICX: Structure first. Insight follows.

MOSAICX turns unstructured clinical documents into validated, structured data—locally, privately, reproducibly. It supports:

  • Schema generation from natural language (Pydantic v2)
  • Extraction from PDFs/text using the generated schema
  • Summarization of radiology reports (single or multi-report per patient) → critical timeline + one-paragraph executive summary as JSON

Local LLMs via Ollama (OpenAI-compatible). PDF text via Docling. Rich terminal UI.


🚀 Quick Start

1) Requirements

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model that behaves well with JSON
ollama pull llama3.1:8b-instruct         # or: qwen2.5:7b-instruct, gpt-oss:120b

2) Install MOSAICX

pip install mosaicx
# or, inside a uv-managed project (faster resolver)
uv add mosaicx

3) Smoke test

mosaicx --help

✨ New: Summarize (Timeline + JSON)

Goal: give clinicians an at-a-glance patient trajectory from one or more radiology reports (same patient), without reading everything.

  • Input: one or many reports (.pdf or .txt) for a single patient
  • Logic: radiology-first prompt (modality-adaptive), concise, no recommendations/differentials
  • Output:
    • Terminal: header + timeline + executive summary (Rich)
    • JSON: standardized object (patient, timeline[], overall)

Example

mosaicx summarize \
  --report P001_CT_2025-08-01.pdf \
  --report P001_CT_2025-09-10.pdf \
  --patient P001 \
  --model llama3.1:8b-instruct \
  --json-out out/summary_P001.json

or

mosaicx summarize \
  --dir ./patient_directory \
  --json-out ./longitudinal_summary.json \
  --model gpt-oss:120b

Summary JSON (shape)

{
  "patient": {
    "patient_id": "P001",
    "dob": null,
    "sex": null,
    "last_updated": "2025-09-19T12:34:56Z"
  },
  "timeline": [
    { "date": "2025-08-01", "source": "CT 2025-08-01", "note": "Baseline nodal disease; R ext-iliac LN short-axis 12 mm" },
    { "date": "2025-09-10", "source": "CT 2025-09-10", "note": "R ext-iliac LN 12→16 mm — progression; no visceral mets" }
  ],
  "overall": "Nodal-only disease with interval progression of the right external iliac node [CT 2025-09-10]; baseline nodal disease without visceral metastases [CT 2025-08-01]."
}
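
To consume the summary downstream, a minimal stdlib-only sketch could parse the JSON, check the three top-level keys shown above, and sort the timeline by date. The helper name load_summary and its checks are illustrative assumptions, not part of the MOSAICX API.

```python
import json
from datetime import date

# Top-level keys taken from the summary shape shown above.
REQUIRED_KEYS = ("patient", "timeline", "overall")


def load_summary(text: str) -> dict:
    """Parse a summary JSON string and return it with the timeline sorted by date.

    Hypothetical helper: raises ValueError when the documented shape is not met.
    """
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"summary is missing keys: {missing}")
    for entry in data["timeline"]:
        # Each entry carries an ISO date, a source label, and a free-text note.
        date.fromisoformat(str(entry.get("date", "")))  # ValueError if malformed
        if not entry.get("source") or not entry.get("note"):
            raise ValueError(f"incomplete timeline entry: {entry}")
    # ISO dates sort correctly as plain strings.
    data["timeline"].sort(key=lambda entry: entry["date"])
    return data


if __name__ == "__main__":
    with open("out/summary_P001.json", encoding="utf-8") as handle:
        summary = load_summary(handle.read())
    for entry in summary["timeline"]:
        print(entry["date"], "|", entry["source"], "|", entry["note"])
    print(summary["overall"])
```

The file path in the usage section matches the --json-out value from the first example; adjust it to wherever the summary was written.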

Under the hood (robust fallbacks)

  1. Instructor JSON → Pydantic → ✅
  2. Raw JSON extraction → Pydantic → ✅
  3. Heuristic timeline/summary → ✅
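
The three-stage fallback can be sketched roughly as follows. This illustrates the pattern only and is not MOSAICX's implementation: every function name is hypothetical, stage 1 uses plain json.loads as a stand-in for the Instructor call, and a simple key check stands in for Pydantic validation.

```python
import json
import re


def validate(obj: object) -> dict:
    """Stand-in for Pydantic validation: require the three documented keys."""
    if not isinstance(obj, dict) or not {"patient", "timeline", "overall"} <= obj.keys():
        raise ValueError("not a summary object")
    return obj


def strict_json(reply: str) -> dict:
    """Stage 1 stand-in: the whole reply is expected to be a JSON object."""
    return validate(json.loads(reply))


def embedded_json(reply: str) -> dict:
    """Stage 2: pull the outermost {...} span out of a chatty reply."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found")
    return validate(json.loads(match.group(0)))


def heuristic(reply: str) -> dict:
    """Stage 3: never raises; keeps the first non-empty line as the summary."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    return {"patient": {}, "timeline": [], "overall": lines[0] if lines else ""}


def parse_summary(reply: str) -> tuple[dict, str]:
    """Try each stage in order and report which one produced the result."""
    for stage in (strict_json, embedded_json, heuristic):
        try:
            return stage(reply), stage.__name__
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            continue
    raise RuntimeError("unreachable: the heuristic stage never raises")
```

Returning the stage name alongside the result makes it easy to log how often the model needed a fallback, which is a useful signal when comparing models.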

Core Workflows

1) Generate a schema (from plain English)

mosaicx generate \
  --desc "Echocardiography with patient_id, exam_date, EF %, valve grades (Normal/Mild/Moderate/Severe), impression" \
  --model gpt-oss:120b

2) Extract structured data with that schema (PDF → JSON)

mosaicx extract \
  --pdf echo_report.pdf \
  --schema EchocardiographyReport_20250919_143022 \
  --model gpt-oss:120b \
  --save out/echo_001.json

3) Summarize radiology reports (timeline + JSON)

# Multiple inputs for the same patient
mosaicx summarize \
  --dir ./reports/P001 \
  --patient P001 \
  --model gpt-oss:120b \
  --json-out out/summary_P001.json

CLI options (summarize)

  • --report … (repeatable)
  • --dir (recursively picks .pdf, .txt)
  • --patient PSEUDONYM
  • --json-out path.json and --print-json
  • --model, --base-url, --api-key, --temperature
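
For batch runs it can help to assemble the command in Python. The sketch below uses only the flags listed above; the helper name build_summarize_cmd is hypothetical, and it assumes the mosaicx executable is on PATH.

```python
import shlex
import subprocess


def build_summarize_cmd(reports, patient, model, json_out, temperature=None):
    """Return the argv list for `mosaicx summarize` (hypothetical helper)."""
    cmd = ["mosaicx", "summarize"]
    for report in reports:
        cmd += ["--report", str(report)]  # --report is repeatable
    cmd += ["--patient", patient, "--model", model, "--json-out", str(json_out)]
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    return cmd


if __name__ == "__main__":
    cmd = build_summarize_cmd(
        ["P001_CT_2025-08-01.pdf", "P001_CT_2025-09-10.pdf"],
        patient="P001",
        model="llama3.1:8b-instruct",
        json_out="out/summary_P001.json",
        temperature=0.0,
    )
    print(shlex.join(cmd))           # show the exact command line
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Passing a list to subprocess.run avoids shell quoting problems with file names that contain spaces.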

Tips for Great Results

  • Models: prefer llama3.1:8b-instruct or qwen2.5:7b-instruct for clean JSON.
  • Prompts (summarize): the built-in prompt is concise and modality-adaptive, and it omits differential diagnoses (DDx) and recommendations.
  • PDFs: If scanned (no text), run OCR before MOSAICX or add an OCR pre-step in your pipeline.

Troubleshooting

  • Connection refused / model not found: start Ollama (ollama serve), check installed models with ollama list, then pull the one you need with ollama pull.
  • Empty summary: try a more JSON-obedient model; lower --temperature (0.0–0.2).
  • PDF yields no text: the PDF likely has no text layer; OCR it first.
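
The first check can also be scripted. The sketch below queries Ollama's /api/tags endpoint, which lists locally installed models, and assumes the default address http://localhost:11434. The helper names are hypothetical, and the model name is matched exactly, tag included.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default address


def installed_models(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the list of local models from a running Ollama server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def diagnose(model: str, fetch=fetch_tags) -> str:
    """Return a one-line hint for the two most common setup problems."""
    try:
        names = installed_models(fetch())
    except OSError:  # URLError (connection refused, timeout) subclasses OSError
        return "Ollama is not reachable; start it first."
    if model not in names:
        return f"{model} is not installed; run: ollama pull {model}"
    return f"{model} is available."


if __name__ == "__main__":
    print(diagnose("llama3.1:8b-instruct"))
```

The fetch parameter exists only so the check can be exercised without a running server.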

Why MOSAICX (one paragraph)

MOSAICX is infrastructure for clinical data: schema-driven, validated, local, and reproducible. Structure reports once, then reuse the same schemas and summarizers across departments and time—enabling longitudinal analysis, cross-modal integration, and downstream intelligence without sending data to the cloud.


License

AGPL-3.0. See LICENSE.

Contact

DIGIT-X Lab · Department of Radiology · LMU Klinikum
lalith.shiyam@med.uni-muenchen.de
