
Medical cOmputational Suite for Advanced Intelligent eXtraction - Intelligent radiology report extraction using local LLMs


MOSAICX

Medical cOmputational Suite for Advanced Intelligent eXtraction

Python 3.11+ License: Apache 2.0

MOSAICX is an intelligent radiology report extraction tool that uses local Large Language Models (LLMs) to extract structured data from medical reports. It supports both PDF and text inputs, provides configurable output formats, and offers both programmatic and command-line interfaces.

Features

🔬 Intelligent Extraction: Uses local LLMs (Ollama) for context-aware data extraction
📄 Advanced Document Processing: Powered by Docling for superior PDF and document parsing
⚙️ Configurable Schemas: Define custom extraction schemas with interactive brainstorming
📊 Flexible Outputs: Export to JSON, CSV, or custom formats
🔄 Multi-Report Analysis: Process multiple reports for patient history synthesis
🖥️ Dual Interface: Use as Python library or CLI tool
🏠 Local Processing: All processing happens locally using Ollama - no cloud dependencies
Fast Development: Built with uv for lightning-fast dependency management

Quick Start

Installation

pip install mosaicx

For Development (with uv - recommended):

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/LalithShiyam/MOSAICX.git
cd MOSAICX
uv sync --dev
uv run pre-commit install

Basic Usage

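Command Line Interface

The uv run prefix executes the command inside the uv-managed development environment; with a plain pip install, call mosaicx directly.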

# Extract from a single PDF report  
uv run mosaicx extract report.pdf --config extraction_config.yaml --output results.json

# Interactive schema building
uv run mosaicx brainstorm --report sample_report.pdf --schema-output custom_schema.yaml

# Batch processing multiple reports
uv run mosaicx extract-batch reports/ --config config.yaml --output-dir results/

Python Library

from mosaicx import ReportExtractor, ExtractionConfig

# Initialize extractor
extractor = ReportExtractor()

# Extract from PDF
config = ExtractionConfig.from_file('config.yaml')
results = extractor.extract_from_pdf('report.pdf', config)

# Extract from text
text_content = "Patient shows signs of pneumonia..."
results = extractor.extract_from_text(text_content, config)

# Multi-report analysis
patient_reports = ['report1.pdf', 'report2.pdf', 'report3.pdf']
timeline = extractor.analyze_patient_history(patient_reports, config)
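MOSAICX handles the LLM calls internally, and its prompts are not documented here. For readers curious about the general pattern, the following is a minimal sketch (not the library's implementation) of asking a local Ollama server for schema-shaped JSON through its /api/generate endpoint. The field descriptions, the prompt wording, and the helper names build_prompt, build_payload and extract are hypothetical. The temperature and max_tokens defaults mirror the llm section of the configuration example below, with max_tokens mapped to Ollama's num_predict option.

```python
import json
import urllib.request

# Hypothetical field descriptions for illustration only (not MOSAICX's schema format).
FIELDS = {
    "primary_diagnosis": "Main diagnosis from the report",
    "severity": "One of: mild, moderate, severe",
    "follow_up_required": "true or false",
}


def build_prompt(report_text: str, fields: dict[str, str]) -> str:
    """Ask the model to return only a JSON object with the given keys."""
    spec = "\n".join(f'- "{name}": {desc}' for name, desc in fields.items())
    return (
        "Extract the following fields from the radiology report below.\n"
        f"Return a single JSON object with exactly these keys:\n{spec}\n\n"
        f"Report:\n{report_text}"
    )


def build_payload(report_text: str, model: str = "llama2",
                  temperature: float = 0.1, max_tokens: int = 1000) -> dict:
    """Build a request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": build_prompt(report_text, FIELDS),
        "format": "json",  # ask Ollama to constrain the output to valid JSON
        "stream": False,   # return one response object instead of a stream of chunks
        "options": {"temperature": temperature, "num_predict": max_tokens},
    }


def extract(report_text: str, host: str = "http://localhost:11434") -> dict:
    """Send the request to a local Ollama server and parse the JSON answer."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(report_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # With stream=False, the generated text is returned in the "response" field.
    return json.loads(body["response"])
```

Calling extract("Patient shows signs of pneumonia...") needs a running Ollama server with llama2 pulled. Even with format set to "json", the model may return keys or values outside the schema, so results should be checked before use.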

Configuration

Create a YAML configuration file to define extraction schemas:

schema:
  findings:
    - field: "primary_diagnosis"
      type: "string"
      description: "Main diagnosis from the report"
    - field: "severity"
      type: "enum"
      options: ["mild", "moderate", "severe"]
    - field: "follow_up_required"
      type: "boolean"

output:
  format: "json"
  include_confidence: true
  include_source_text: true

llm:
  model: "llama2"
  temperature: 0.1
  max_tokens: 1000
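How MOSAICX checks extracted values against this schema is not described here. To illustrate what the schema expresses, here is a hypothetical validator for a single extracted record. It assumes the YAML has already been parsed (for example with PyYAML's yaml.safe_load), so that schema.findings becomes a list of dicts; the validate function and its error messages are assumptions for this sketch.

```python
from typing import Any

# The "findings" list from the YAML example above, as it would look after parsing.
SCHEMA = [
    {"field": "primary_diagnosis", "type": "string",
     "description": "Main diagnosis from the report"},
    {"field": "severity", "type": "enum",
     "options": ["mild", "moderate", "severe"]},
    {"field": "follow_up_required", "type": "boolean"},
]


def validate(record: dict[str, Any], schema: list[dict[str, Any]]) -> list[str]:
    """Hypothetical check: return a list of problems; empty means the record conforms."""
    errors = []
    for spec in schema:
        name, kind = spec["field"], spec["type"]
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        if kind == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected string, got {type(value).__name__}")
        elif kind == "boolean" and not isinstance(value, bool):
            errors.append(f"{name}: expected boolean, got {type(value).__name__}")
        elif kind == "enum" and value not in spec["options"]:
            errors.append(f"{name}: {value!r} not in {spec['options']}")
    return errors
```

A record with severity "critical" and no follow_up_required key yields two problems, one per violated field, while a fully conforming record yields an empty list.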


Development

MOSAICX is developed by the DIGITX Lab at the Department of Radiology, LMU Munich University Hospital.

Requirements

  • Python 3.11+
  • Ollama installed locally
  • A local LLM pulled into Ollama (e.g., Llama2, CodeLlama)
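Before running an extraction, it can help to confirm that Ollama is up and the configured model is present. The sketch below queries Ollama's GET /api/tags endpoint on its default port 11434. The helpers list_local_models and has_model are hypothetical and not part of MOSAICX; the ":latest" matching rule reflects Ollama's default tag for models pulled without an explicit tag.

```python
import json
import urllib.request


def list_local_models(host: str = "http://localhost:11434") -> list[str]:
    """Return model names reported by a running Ollama server (GET /api/tags)."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


def has_model(available: list[str], wanted: str) -> bool:
    """Match 'llama2' against 'llama2:latest'; an explicit tag must match exactly."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in available
```

If has_model(list_local_models(), "llama2") returns False, running ollama pull llama2 downloads the model. A connection error (an OSError such as urllib.error.URLError) from list_local_models usually means the Ollama server is not running.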

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

Apache License 2.0 - see LICENSE for details.

Authors

Lalith Kumar Shiyam Sundar, PhD
DIGITX Lab, Department of Radiology
LMU Munich University Hospital
📧 lalith.shiyam@med.uni-muenchen.de

Citation

If you use MOSAICX in your research, please cite:

@software{mosaicx2024,
  title={MOSAICX: Medical cOmputational Suite for Advanced Intelligent eXtraction},
  author={Sundar, Lalith Kumar Shiyam},
  year={2024},
  institution={DIGITX Lab, Department of Radiology, LMU Munich University Hospital},
  url={https://github.com/LalithShiyam/MOSAICX}
}

Download files

Download the file for your platform.

Source Distribution

mosaicx-1.0.2.tar.gz (229.5 kB)

Uploaded Source

Built Distribution


mosaicx-1.0.2-py3-none-any.whl (46.9 kB)

Uploaded Python 3

File details

Details for the file mosaicx-1.0.2.tar.gz.

File metadata

  • Download URL: mosaicx-1.0.2.tar.gz
  • Upload date:
  • Size: 229.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for mosaicx-1.0.2.tar.gz
Algorithm     Hash digest
SHA256        6ef43991f1828252b7322a74e3ef6884d57232cba0e8a81107df50710c92a7ef
MD5           007920ed088ce6d111ca86dcddbf25ea
BLAKE2b-256   57a9a32206b0e646b672b6f04672d316248802a9fd47291f888de5a798ca3cd3


File details

Details for the file mosaicx-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: mosaicx-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 46.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for mosaicx-1.0.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        6f97e2018869f7bbe1a8045bb3f966504719d2639acddf33394b8635ec8f4ebe
MD5           0b02cfca90672ff4cc236d2224c8ff1e
BLAKE2b-256   ccf418ff67293034ee011c2a581e297835deec288d9a6ba6a7ee0864250943c8
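To check that a downloaded file matches its published digest, the SHA256 can be recomputed locally with Python's standard hashlib module. The helper names sha256_of and verify are illustrative; the digests are copied from the hash tables on this page.

```python
import hashlib

# Published SHA256 digests from the "File hashes" tables on this page.
EXPECTED = {
    "mosaicx-1.0.2.tar.gz":
        "6ef43991f1828252b7322a74e3ef6884d57232cba0e8a81107df50710c92a7ef",
    "mosaicx-1.0.2-py3-none-any.whl":
        "6f97e2018869f7bbe1a8045bb3f966504719d2639acddf33394b8635ec8f4ebe",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare the local digest with a published one (case-insensitive hex)."""
    return sha256_of(path) == expected.lower()
```

For example, verify("mosaicx-1.0.2.tar.gz", EXPECTED["mosaicx-1.0.2.tar.gz"]) should return True for an intact download. pip can enforce the same check at install time in hash-checking mode, using a requirements line such as mosaicx==1.0.2 --hash=sha256:<digest> and pip install --require-hashes -r requirements.txt.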

