Medical cOmputational Suite for Advanced Intelligent eXtraction - Intelligent radiology report extraction using local LLMs
MOSAICX
Medical cOmputational Suite for Advanced Intelligent eXtraction
MOSAICX is a radiology report extraction tool that uses local Large Language Models (LLMs) to turn free-text medical reports into structured data. It accepts PDF and plain-text input, supports configurable output formats, and can be used as a Python library or from the command line.
Features
🔬 Intelligent Extraction: Uses local LLMs (Ollama) for context-aware data extraction
📄 Advanced Document Processing: Uses Docling to parse PDFs and other document formats
⚙️ Configurable Schemas: Define custom extraction schemas with interactive brainstorming
📊 Flexible Outputs: Export to JSON, CSV, or custom formats
🔄 Multi-Report Analysis: Process multiple reports for patient history synthesis
🖥️ Dual Interface: Use as Python library or CLI tool
🏠 Local Processing: All processing runs locally through Ollama, with no cloud dependencies
⚡ Fast Development: Uses uv for fast dependency management
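Local processing means extraction requests go to an Ollama server on your own machine rather than to a cloud API. As a hedged illustration of what such a request can look like, here is a minimal sketch against Ollama's REST endpoint `/api/generate` on its default port 11434. It is not MOSAICX's internal code: the prompt wording, the field names (borrowed from the example configuration), and the helpers `build_payload` and `extract_locally` are assumptions made for illustration.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumption: standard port 11434).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(report_text: str, model: str = "llama2", temperature: float = 0.1) -> dict:
    """Build a non-streaming /api/generate request that asks for JSON output."""
    # Hypothetical prompt; the field names mirror the example YAML schema.
    prompt = (
        "Extract primary_diagnosis, severity (mild/moderate/severe) and "
        "follow_up_required (true/false) from this radiology report. "
        "Answer with a JSON object only.\n\n" + report_text
    )
    return {
        "model": model,
        "prompt": prompt,
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
        "stream": False,   # return one response object instead of a stream
        "options": {"temperature": temperature},
    }


def extract_locally(report_text: str, url: str = OLLAMA_URL) -> dict:
    """POST the payload to the local server and parse the model's JSON reply."""
    data = json.dumps(build_payload(report_text)).encode("utf-8")
    request = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # With stream=False, the generated text is returned in the "response" field.
    return json.loads(body["response"])
```

With a server running and `llama2` pulled, `extract_locally("Patient shows signs of pneumonia...")` returns the parsed dictionary; without a server, `urlopen` raises `URLError`. The low temperature matches the example configuration and keeps extraction output more deterministic.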
Quick Start
Installation
pip install mosaicx
For development (using uv, recommended):
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and setup
git clone https://github.com/LalithShiyam/MOSAICX.git
cd MOSAICX
uv sync --dev
uv run pre-commit install
Basic Usage
Command Line Interface
# Extract from a single PDF report
uv run mosaicx extract report.pdf --config extraction_config.yaml --output results.json
# Interactive schema building
uv run mosaicx brainstorm --report sample_report.pdf --schema-output custom_schema.yaml
# Batch processing multiple reports
uv run mosaicx extract-batch reports/ --config config.yaml --output-dir results/
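For per-report control (one output file per PDF, stopping at the first failure), the documented `extract` command can also be driven from a script. This is a minimal sketch, assuming `mosaicx` is on your `PATH` after `pip install mosaicx` (in a uv checkout, prefix the command with `uv run`). It uses only the `--config` and `--output` flags shown above; `build_extract_command` and `extract_each` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def build_extract_command(report: Path, config: Path, output: Path) -> list[str]:
    """Assemble the documented `mosaicx extract` call for one report."""
    # Assumes the `mosaicx` executable is on PATH; in a uv checkout use
    # ["uv", "run", "mosaicx", ...] instead.
    return ["mosaicx", "extract", str(report), "--config", str(config), "--output", str(output)]


def extract_each(report_dir: Path, config: Path, out_dir: Path) -> list[Path]:
    """Run `mosaicx extract` once per PDF, writing one JSON file per report."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for report in sorted(report_dir.glob("*.pdf")):
        output = out_dir / f"{report.stem}.json"
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_extract_command(report, config, output), check=True)
        written.append(output)
    return written
```

`extract-batch` remains the simpler choice for whole directories; a loop like this is mainly useful when you want to name outputs yourself or handle failures report by report.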
Python Library
from mosaicx import ReportExtractor, ExtractionConfig
# Initialize extractor
extractor = ReportExtractor()
# Extract from PDF
config = ExtractionConfig.from_file('config.yaml')
results = extractor.extract_from_pdf('report.pdf', config)
# Extract from text
text_content = "Patient shows signs of pneumonia..."
results = extractor.extract_from_text(text_content, config)
# Multi-report analysis
patient_reports = ['report1.pdf', 'report2.pdf', 'report3.pdf']
timeline = extractor.analyze_patient_history(patient_reports, config)
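The feature list mentions CSV export, but the exact API for it is not shown here. If you want one CSV row per report from the Python API, a small wrapper around the documented `extract_from_pdf(path, config)` call is enough. This is a minimal sketch, assuming the call returns a flat `dict` of field names to values (not confirmed by the documentation); `extract_to_csv` and the `source_file` column are hypothetical names.

```python
import csv
from pathlib import Path
from typing import Any, Iterable, Protocol


class PdfExtractor(Protocol):
    # Matches the call shown above: extractor.extract_from_pdf(path, config)
    def extract_from_pdf(self, path: str, config: Any) -> dict: ...


def extract_to_csv(extractor: PdfExtractor, pdf_paths: Iterable[str], config: Any, csv_path: str) -> int:
    """Extract every PDF and write one CSV row per report; returns the row count."""
    rows = []
    for path in pdf_paths:
        # Assumed: extract_from_pdf returns a flat dict of field -> value.
        result = dict(extractor.extract_from_pdf(path, config))
        result["source_file"] = Path(path).name
        rows.append(result)
    # Union of keys across reports, so a field missing from one report is left blank.
    fieldnames = sorted({key for row in rows for key in row})
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Because the helper is typed against a `Protocol`, it accepts a real `ReportExtractor` or a test stub, so the CSV logic can be exercised without a local model.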
Configuration
Create a YAML configuration file to define extraction schemas:
schema:
  findings:
    - field: "primary_diagnosis"
      type: "string"
      description: "Main diagnosis from the report"
    - field: "severity"
      type: "enum"
      options: ["mild", "moderate", "severe"]
    - field: "follow_up_required"
      type: "boolean"
output:
  format: "json"
  include_confidence: true
  include_source_text: true
llm:
  model: "llama2"
  temperature: 0.1
  max_tokens: 1000
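LLM output can drift from the schema, for example by returning a severity outside the enum or omitting a boolean. After the YAML has been parsed into Python objects (e.g., with a YAML library), a check like the following can flag such records. This is a hypothetical post-processing helper, not part of MOSAICX, and it handles only the three field types used in the example (`string`, `enum`, `boolean`).

```python
from typing import Any

# The `findings` list from the YAML above, as it would look after parsing.
FIELDS = [
    {"field": "primary_diagnosis", "type": "string", "description": "Main diagnosis from the report"},
    {"field": "severity", "type": "enum", "options": ["mild", "moderate", "severe"]},
    {"field": "follow_up_required", "type": "boolean"},
]


def validate_record(record: dict[str, Any], fields: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems = []
    for spec in fields:
        name, kind = spec["field"], spec["type"]
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if kind == "string" and not isinstance(value, str):
            problems.append(f"{name}: expected string, got {type(value).__name__}")
        elif kind == "boolean" and not isinstance(value, bool):
            problems.append(f"{name}: expected boolean, got {type(value).__name__}")
        elif kind == "enum" and value not in spec.get("options", []):
            problems.append(f"{name}: {value!r} not in {spec.get('options', [])}")
    return problems
```

A record such as `{"primary_diagnosis": "pneumonia", "severity": "critical"}` yields two problems: the out-of-range severity and the missing `follow_up_required` field.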
Documentation
Development
MOSAICX is developed by the DIGITX Lab at the Department of Radiology, LMU Munich University Hospital.
Requirements
- Python 3.11+
- Ollama installed locally
- A local LLM pulled through Ollama (e.g., Llama 2, Code Llama)
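Before a first run, a short script can confirm the requirements above. This is a minimal sketch using only the standard library. It assumes Ollama's default address `http://localhost:11434` and queries Ollama's `/api/tags` endpoint, which lists locally pulled models; `check_requirements` is a hypothetical helper.

```python
import json
import shutil
import sys
import urllib.request


def check_requirements(ollama_url: str = "http://localhost:11434") -> dict[str, bool]:
    """Report which of the listed requirements appear to be satisfied."""
    status = {
        "python_3_11_plus": sys.version_info >= (3, 11),
        # The Ollama CLI is normally installed as an `ollama` executable.
        "ollama_on_path": shutil.which("ollama") is not None,
    }
    try:
        # /api/tags lists the models already pulled into the local Ollama store.
        with urllib.request.urlopen(f"{ollama_url}/api/tags", timeout=2) as response:
            models = json.load(response).get("models", [])
        status["ollama_server_reachable"] = True
        status["model_available"] = len(models) > 0
    except (OSError, ValueError):
        # OSError covers refused connections, URLError and timeouts;
        # ValueError covers a non-JSON reply from something else on that port.
        status["ollama_server_reachable"] = False
        status["model_available"] = False
    return status
```

Calling `print(check_requirements())` shows which item is missing; a reachable server with no models usually means a model still needs to be pulled with `ollama pull`.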
Contributing
We welcome contributions! Please see our Contributing Guide for details.
License
Apache License 2.0 - see LICENSE for details.
Authors
Lalith Kumar Shiyam Sundar, PhD
DIGITX Lab, Department of Radiology
LMU Munich University Hospital
📧 lalith.shiyam@med.uni-muenchen.de
Citation
If you use MOSAICX in your research, please cite:
@software{mosaicx2024,
title={MOSAICX: Medical cOmputational Suite for Advanced Intelligent eXtraction},
author={Sundar, Lalith Kumar Shiyam},
year={2024},
institution={DIGITX Lab, Department of Radiology, LMU Munich University Hospital},
url={https://github.com/LalithShiyam/MOSAICX}
}
Download files
Source Distribution: mosaicx-1.0.1.tar.gz
Built Distribution: mosaicx-1.0.1-py3-none-any.whl
File details
Details for the file mosaicx-1.0.1.tar.gz.
File metadata
- Download URL: mosaicx-1.0.1.tar.gz
- Upload date:
- Size: 225.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ac89b863276aa1d3a0bcbff1763afbf36a043a82afe9d9a3640728f61463e90 |
| MD5 | 95cbc40f13ee1ec5af4991e82fd97ad3 |
| BLAKE2b-256 | b05d2869cdf60ee16a47ef7fbff5ea14c4759e70d7978e37b6a0fe913102d0ca |
File details
Details for the file mosaicx-1.0.1-py3-none-any.whl.
File metadata
- Download URL: mosaicx-1.0.1-py3-none-any.whl
- Upload date:
- Size: 42.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d85c8c89fd5cb674705639a121db38b1115a386aafd3b463e14170356d4e348 |
| MD5 | 9c28d76a8fd90170c4da526482a6cd23 |
| BLAKE2b-256 | 40be051d73b30cd344aa2ad8fa787931e144c7951326e020192ef888394f5217 |
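To confirm that a downloaded file matches the published digests, compare its SHA256 hash. This is a minimal sketch with `hashlib`; the `EXPECTED` table copies the SHA256 values listed above for the 1.0.1 files, and `sha256_of` and `verify` are hypothetical helpers. pip can also enforce this itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests published for the 1.0.1 release files.
EXPECTED = {
    "mosaicx-1.0.1.tar.gz": "9ac89b863276aa1d3a0bcbff1763afbf36a043a82afe9d9a3640728f61463e90",
    "mosaicx-1.0.1-py3-none-any.whl": "3d85c8c89fd5cb674705639a121db38b1115a386aafd3b463e14170356d4e348",
}


def verify(path: str) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

`verify("mosaicx-1.0.1-py3-none-any.whl")` returns `True` only for an unmodified copy of the wheel; files with other names always return `False`.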