MOSAICX

Medical cOmputational Suite for Advanced Intelligent eXtraction

Python 3.11+ | License: Apache 2.0

MOSAICX extracts structured data from radiology reports using local Large Language Models (LLMs). It accepts both PDF and plain-text inputs, writes configurable output formats, and can be used either as a Python library or from the command line.

Features

🔬 Intelligent Extraction: Uses local LLMs (Ollama) for context-aware data extraction
📄 Advanced Document Processing: Powered by Docling for superior PDF and document parsing
⚙️ Configurable Schemas: Define custom extraction schemas with interactive brainstorming
📊 Flexible Outputs: Export to JSON, CSV, or custom formats
🔄 Multi-Report Analysis: Process multiple reports for patient history synthesis
🖥️ Dual Interface: Use as Python library or CLI tool
🏠 Local Processing: All processing happens locally using Ollama - no cloud dependencies
⚡ Fast Development: Built with uv for fast dependency management

Quick Start

Installation

pip install mosaicx

For Development (with uv - recommended):

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/LalithShiyam/MOSAICX.git
cd MOSAICX
uv sync --dev
uv run pre-commit install

Basic Usage

Command Line Interface

# Extract from a single PDF report
uv run mosaicx extract report.pdf --config extraction_config.yaml --output results.json

# Interactive schema building
uv run mosaicx brainstorm --report sample_report.pdf --schema-output custom_schema.yaml

# Batch processing multiple reports
uv run mosaicx extract-batch reports/ --config config.yaml --output-dir results/
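
If you prefer to drive the single-report command from a script (for example, to log or retry each report separately), the loop can be sketched in Python. This is a minimal sketch, not part of MOSAICX: the helper name `build_extract_commands` is hypothetical, and it only assembles the `extract` invocation shown above with its `--config` and `--output` flags.

```python
import shlex
from pathlib import Path


def build_extract_commands(report_dir, config_path, output_dir):
    """Return one `mosaicx extract` argument list per PDF in report_dir.

    Hypothetical helper: it only builds the command lines documented above
    and does not call into MOSAICX itself.
    """
    report_dir = Path(report_dir)
    output_dir = Path(output_dir)
    commands = []
    for pdf in sorted(report_dir.glob("*.pdf")):
        # Name each output file after its report, e.g. report.pdf -> report.json
        output_file = output_dir / (pdf.stem + ".json")
        commands.append([
            "uv", "run", "mosaicx", "extract", str(pdf),
            "--config", str(config_path),
            "--output", str(output_file),
        ])
    return commands


if __name__ == "__main__":
    for argv in build_extract_commands("reports", "config.yaml", "results"):
        # Printed for inspection; run with subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Building argument lists rather than shell strings avoids quoting problems when report file names contain spaces.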

Python Library

from mosaicx import ReportExtractor, ExtractionConfig

# Initialize extractor
extractor = ReportExtractor()

# Extract from PDF
config = ExtractionConfig.from_file('config.yaml')
results = extractor.extract_from_pdf('report.pdf', config)

# Extract from text
text_content = "Patient shows signs of pneumonia..."
results = extractor.extract_from_text(text_content, config)

# Multi-report analysis
patient_reports = ['report1.pdf', 'report2.pdf', 'report3.pdf']
timeline = extractor.analyze_patient_history(patient_reports, config)
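
The return type of the extraction calls is not documented in this README. Assuming each result can be reduced to a flat dictionary keyed by schema field name (an assumption, not a documented guarantee), exporting to the JSON and CSV formats listed under Features could look like the following sketch. The function names and the sample records are illustrative only.

```python
import csv
import io
import json


def results_to_json(records):
    """Serialize a list of flat result dicts to a JSON string."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def results_to_csv(records):
    """Serialize a list of flat result dicts to CSV text.

    Columns are collected in first-seen order, so records that lack a
    field still line up and get an empty cell.
    """
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up records standing in for extraction results.
    sample = [
        {"primary_diagnosis": "pneumonia", "severity": "moderate",
         "follow_up_required": True},
        {"primary_diagnosis": "no acute findings"},
    ]
    print(results_to_json(sample))
    print(results_to_csv(sample))
```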

Configuration

Create a YAML configuration file to define extraction schemas:

schema:
  findings:
    - field: "primary_diagnosis"
      type: "string"
      description: "Main diagnosis from the report"
    - field: "severity"
      type: "enum"
      options: ["mild", "moderate", "severe"]
    - field: "follow_up_required"
      type: "boolean"

output:
  format: "json"
  include_confidence: true
  include_source_text: true

llm:
  model: "llama2"
  temperature: 0.1
  max_tokens: 1000
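
To make the three field types in the schema concrete, the sketch below checks a candidate record against them. It is an illustration of how `string`, `enum`, and `boolean` fields might be interpreted, not MOSAICX's own validation logic. `SCHEMA_FIELDS` mirrors the `findings` list above as plain Python data, and `validate_record` is a hypothetical helper.

```python
# The `findings` entries from the YAML above, written as Python data.
SCHEMA_FIELDS = [
    {"field": "primary_diagnosis", "type": "string"},
    {"field": "severity", "type": "enum",
     "options": ["mild", "moderate", "severe"]},
    {"field": "follow_up_required", "type": "boolean"},
]


def validate_record(record, fields=SCHEMA_FIELDS):
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for spec in fields:
        name, kind = spec["field"], spec["type"]
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        if kind == "string" and not isinstance(value, str):
            problems.append(f"{name}: expected a string")
        elif kind == "enum" and value not in spec["options"]:
            problems.append(f"{name}: expected one of {spec['options']}")
        elif kind == "boolean" and not isinstance(value, bool):
            problems.append(f"{name}: expected true or false")
    return problems


if __name__ == "__main__":
    candidate = {"primary_diagnosis": "pneumonia", "severity": "critical"}
    for problem in validate_record(candidate):
        print(problem)
```

A check like this is useful after extraction because LLM output is not guaranteed to respect the enum options or boolean type requested in the schema.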

Documentation

Development

MOSAICX is developed by the DIGITX Lab at the Department of Radiology, LMU Munich University Hospital.

Requirements

  • Python 3.11+
  • Ollama installed locally
  • Local LLM model (e.g., Llama2, CodeLlama)
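
A quick way to confirm the first two requirements before installing is a small preflight script. The sketch below uses only the standard library. `check_requirements` is a hypothetical helper, and it checks only that an `ollama` executable is on `PATH`, not that a model has been pulled.

```python
import shutil
import sys


def check_requirements(min_python=(3, 11)):
    """Return a list of unmet requirements; empty means the checks passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted}+ required, found {found}")
    if shutil.which("ollama") is None:
        problems.append("ollama executable not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print(issue)
    # Non-zero exit status signals a failed preflight check to the shell.
    sys.exit(1 if issues else 0)
```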

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

Apache License 2.0 - see LICENSE for details.

Authors

Lalith Kumar Shiyam Sundar, PhD
DIGITX Lab, Department of Radiology
LMU Munich University Hospital
📧 lalith.shiyam@med.uni-muenchen.de

Citation

If you use MOSAICX in your research, please cite:

@software{mosaicx2024,
  title={MOSAICX: Medical cOmputational Suite for Advanced Intelligent eXtraction},
  author={Sundar, Lalith Kumar Shiyam},
  year={2024},
  institution={DIGITX Lab, Department of Radiology, LMU Munich University Hospital},
  url={https://github.com/LalithShiyam/MOSAICX}
}
