Parse, structure, and export radiology free-text reports to FHIR

radreport-parser

Parse radiology free-text reports into structured data. No ML. No GPU. No dependencies.

Radiology reports come out as free-text PDFs. Downstream systems — EMRs, telehealth portals, billing platforms, research pipelines — need structured data. This library bridges that gap.

Three things it does well:

  1. Parse — splits any free-text report into labeled sections, extracts measurements, links findings to anatomy
  2. Detect — flags critical/urgent findings with negation awareness (no false alerts for "no pneumothorax")
  3. Export — outputs FHIR R4 DiagnosticReport resources ready for any EMR

Install

pip install radreport-parser

Zero required dependencies. Works on Python 3.9+.


Quick Start

from radreport_parser import ReportParser, CriticalFindingsDetector, FHIRExporter
import json

report_text = """
INDICATION: Chest pain, rule out PE.

FINDINGS:
Lungs: Filling defect in the right main pulmonary artery consistent with
pulmonary embolism. No pneumothorax.

IMPRESSION:
Pulmonary embolism, right main pulmonary artery. Urgent correlation recommended.
"""

# 1. Parse
parser = ReportParser()
report = parser.parse(report_text, modality="CT")

print(report.impression)
# → "Pulmonary embolism, right main pulmonary artery. Urgent correlation recommended."

# 2. Detect critical findings
detector = CriticalFindingsDetector()
report = detector.detect(report)

for cf in report.critical_findings:
    if not cf.negated:
        print(f"[{cf.severity.upper()}] {cf.term} ({cf.category})")
        print(f"  Context: {cf.context}")
# → [CRITICAL] pulmonary embolism (pulmonary)
#     Context: Filling defect in the right main pulmonary artery consistent with pulmonary embolism.

# 3. Export to FHIR
exporter = FHIRExporter()
fhir = exporter.export(report, patient_id="pt-001")
print(json.dumps(fhir, indent=2))

CLI

After installation, the radreport command is available for single-file and batch processing:

# Parse a single report to JSON
radreport report.txt

# Parse with critical findings detection
radreport report.txt --critical

# Export as FHIR DiagnosticReport
radreport report.txt --fhir --patient-id pt-001 --modality CT

# Batch process multiple files → JSON array
radreport reports/*.txt --critical -o batch.json

# Specify modality for all files
radreport *.txt --modality MRI --fhir -o fhir_batch.json

Flags:

Flag             Short   Description
--modality MOD   -m      CT, MRI, XR, US, NM, PET …
--critical       -c      Run critical findings detection
--fhir           -f      Export as FHIR R4 DiagnosticReport (implies --critical)
--patient-id ID  (none)  FHIR Patient resource ID
--output FILE    -o      Write output to file instead of stdout

Parsing

Sections

The parser recognizes standard radiology report sections regardless of formatting style:

Section key     Matched headers
indication      Indication, Clinical Indication, History, Reason for Exam
technique       Technique, Procedure, Protocol
comparison      Comparison, Prior Study, Previous
findings        Findings, Observations
impression      Impression, Conclusion, Assessment, Diagnosis
recommendation  Recommendation, Follow-up, Advised

report = parser.parse(text, modality="MRI")

findings = report.get_section("findings")
print(findings.raw_text)

impression = report.get_section("impression")
print(impression.raw_text)
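
The header matching described above can be pictured as a lookup from header aliases to section keys. The following is a minimal sketch of that idea, not the library's actual implementation: `HEADER_ALIASES`, `split_sections`, and the assumption that every header starts a line and ends with a colon are all hypothetical.

```python
import re

# Hypothetical alias table, copied from the section list above.
HEADER_ALIASES = {
    "indication": ["Indication", "Clinical Indication", "History", "Reason for Exam"],
    "technique": ["Technique", "Procedure", "Protocol"],
    "comparison": ["Comparison", "Prior Study", "Previous"],
    "findings": ["Findings", "Observations"],
    "impression": ["Impression", "Conclusion", "Assessment", "Diagnosis"],
    "recommendation": ["Recommendation", "Follow-up", "Advised"],
}

# Map each lowercase alias to its section key.
_ALIAS_TO_KEY = {
    alias.lower(): key for key, aliases in HEADER_ALIASES.items() for alias in aliases
}

# Longest aliases first, so multi-word headers are tried before shorter ones.
_ALTERNATIVES = "|".join(
    re.escape(alias) for alias in sorted(_ALIAS_TO_KEY, key=len, reverse=True)
)
# Assumption: a header sits at the start of a line and is followed by a colon.
_HEADER_RE = re.compile(
    rf"^[ \t]*({_ALTERNATIVES})[ \t]*:[ \t]*", re.IGNORECASE | re.MULTILINE
)


def split_sections(text: str) -> dict:
    """Split report text into {section_key: body}, collapsing whitespace."""
    matches = list(_HEADER_RE.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section body runs from the end of its header to the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        key = _ALIAS_TO_KEY[match.group(1).lower()]
        body = " ".join(text[match.end():end].split())
        # If a section key appears twice, keep the first occurrence.
        sections.setdefault(key, body)
    return sections
```

Lines such as "Lungs:" are not in the alias table, so they stay inside the enclosing section body instead of starting a new section.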

Measurements

All measurements are extracted and normalized to millimeters:

for m in report.all_measurements:
    print(f"  Raw: {m.raw}")
    print(f"  Normalized (mm): {m.dimensions_mm}")
    print(f"  Largest dimension: {m.largest_dimension_mm} mm")

# Raw: 2.3 x 1.8 cm
# Normalized (mm): [23.0, 18.0]
# Largest dimension: 23.0 mm

Handles: 1.2 x 0.8 cm, 12mm, 1.2cm, 12 x 8 x 5 mm, 1.2 x 0.8 x 0.5 cm
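
To make the normalization concrete, here is a small sketch of how measurements in the formats listed above could be converted to millimeters with a regular expression. It illustrates the technique only; `extract_measurements_mm` and the pattern are hypothetical and are not taken from the library's source.

```python
import re

_NUM = r"\d+(?:\.\d+)?"
# One to three numbers separated by "x", followed by a single trailing unit.
_MEASUREMENT_RE = re.compile(
    rf"({_NUM}(?:\s*x\s*{_NUM}){{0,2}})\s*(mm|cm)\b", re.IGNORECASE
)
_TO_MM = {"mm": 1.0, "cm": 10.0}


def extract_measurements_mm(text: str) -> list:
    """Return one list of dimensions (in mm) per measurement found in text."""
    results = []
    for match in _MEASUREMENT_RE.finditer(text):
        factor = _TO_MM[match.group(2).lower()]
        # Round to absorb floating-point noise from the unit conversion.
        dims = [
            round(float(n) * factor, 3) for n in re.findall(_NUM, match.group(1))
        ]
        results.append(dims)
    return results


for dims in extract_measurements_mm("Nodule measures 2.3 x 1.8 cm."):
    print(dims, "largest:", max(dims))
```

The largest dimension is then just `max(dims)` for each measurement.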

Findings by anatomy

findings_section = report.get_section("findings")
for finding in findings_section.findings:
    print(f"Anatomy: {finding.anatomy or 'unspecified'}")
    print(f"Text: {finding.text}")

Batch processing

reports = parser.parse_batch(list_of_texts, modality="CT")
# Returns list[ParsedReport | None] — None for empty/unparseable inputs
active = [r for r in reports if r is not None]
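
When some inputs fail, it is often useful to know which ones. The helper below is a sketch that assumes the returned list is position-aligned with the input list, which the `None` placeholders suggest but the text above does not state outright. `partition_batch` is a hypothetical name, not part of the library.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def partition_batch(texts: Sequence[str], results: Sequence[Optional[T]]):
    """Return (parsed, failed_indices) for a position-aligned batch result.

    parsed is a list of (input_index, result) pairs for successful parses;
    failed_indices lists the input positions whose result was None.
    """
    if len(texts) != len(results):
        raise ValueError("inputs and results must have the same length")
    parsed = [(i, r) for i, r in enumerate(results) if r is not None]
    failed = [i for i, r in enumerate(results) if r is None]
    return parsed, failed
```

With the library, this would be called as `partition_batch(list_of_texts, reports)`, and the failed indices can be logged or retried.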

JSON serialization

report = parser.parse(text, modality="CT")

# As dict
d = report.to_dict()

# As JSON string (shorthand)
json_str = report.to_json()
json_str = report.to_json(indent=4)

Critical Findings Detection

Rule-based. Fully auditable. No black boxes.

Covers 45+ terms across 8 categories:

Category Examples
vascular aortic dissection, DVT, aortic aneurysm
pulmonary pulmonary embolism, PE, pneumothorax, hemothorax
neuro subdural hematoma, midline shift, intracranial hemorrhage
abdominal free air, bowel perforation, appendicitis
cardiac cardiac tamponade, pericardial effusion
spinal cord compression, cervical fracture
oncologic malignancy, metastasis, carcinoma

Negation awareness

# "No pneumothorax identified" → negated=True, won't trigger alert
# "Pneumothorax present" → negated=False, triggers alert

active = [cf for cf in report.critical_findings if not cf.negated]
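
For intuition, a rule-based negation check can be as simple as looking for a trigger phrase shortly before the term in the same sentence. The sketch below shows that general approach; the trigger list, the five-word window, and `is_negated` are assumptions made for illustration and do not describe the library's actual rules.

```python
import re

# Hypothetical trigger list; a production rule set would be longer.
NEGATION_TRIGGERS = ("no", "without", "negative for", "no evidence of", "free of")
_TRIGGER_RE = re.compile(
    r"\b(?:"
    + "|".join(
        re.escape(t) for t in sorted(NEGATION_TRIGGERS, key=len, reverse=True)
    )
    + r")\b",
    re.IGNORECASE,
)


def is_negated(sentence: str, term: str, window: int = 5) -> bool:
    """True if a negation trigger occurs within `window` words before `term`."""
    match = re.search(re.escape(term), sentence, re.IGNORECASE)
    if match is None:
        raise ValueError(f"{term!r} not found in sentence")
    # Only the words immediately preceding the term are inspected.
    preceding_words = sentence[: match.start()].split()[-window:]
    return _TRIGGER_RE.search(" ".join(preceding_words)) is not None
```

This sketch ignores negation that follows the term ("pneumothorax is absent") and has no notion of scope-ending words such as "but", so it is a starting point for understanding the idea, not a substitute for the built-in detector.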

Severity levels

  • critical — requires immediate action (PE, subdural hematoma, pneumothorax)
  • urgent — requires same-day follow-up (DVT, bowel obstruction, appendicitis)
  • significant — requires follow-up (malignancy, metastasis)
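
One way to act on these levels is to sort active findings so the most urgent surface first. The sketch below uses a stand-in `Finding` dataclass with the attribute names shown in the Quick Start. The `SEVERITY_RANK` table and `triage` function are hypothetical helpers, not library API.

```python
from dataclasses import dataclass

# Lower rank means more urgent; the levels come from the list above.
SEVERITY_RANK = {"critical": 0, "urgent": 1, "significant": 2}


@dataclass
class Finding:
    """Stand-in for the library's finding objects (attribute names only)."""
    term: str
    category: str
    severity: str
    negated: bool = False


def triage(findings):
    """Return active (non-negated) findings, most urgent first."""
    active = [f for f in findings if not f.negated]
    # sorted() is stable, so findings of equal severity keep report order.
    return sorted(active, key=lambda f: SEVERITY_RANK[f.severity])
```

Because the function only reads `severity` and `negated`, it should also accept the objects in `report.critical_findings`, assuming they expose those attributes as the Quick Start suggests.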

Extending the term list

from radreport_parser.critical_findings import CRITICAL_TERMS

CRITICAL_TERMS["tension pneumothorax"] = ("pulmonary", "critical")
CRITICAL_TERMS["septic emboli"] = ("vascular", "urgent")

FHIR Export

Outputs a valid FHIR R4 DiagnosticReport resource.

from datetime import datetime, timezone

fhir = exporter.export(
    report,
    patient_id="pt-001",       # Optional: links to FHIR Patient resource
    report_id="rpt-20240315",   # Optional: custom resource ID
    issued_dt=datetime.now(timezone.utc),  # Optional: defaults to UTC now
)

What's included

  • resourceType: DiagnosticReport
  • status: final
  • code: LOINC code matched to modality (CT, MRI, US, etc.)
  • conclusion: impression text
  • presentedForm: full report text as base64 attachment
  • contained: FHIR Observations for each active (non-negated) critical finding
  • extension: structured sections for downstream parsing
  • subject: patient reference (when patient_id provided)
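
To show roughly what such a resource looks like, here is a hand-built sketch of a few of the fields listed above, using only the standard library. It is not the exporter's code: `build_diagnostic_report` is a hypothetical function, the LOINC code is supplied by the caller instead of being looked up by modality, and the `contained` and `extension` entries are omitted.

```python
import base64
from datetime import datetime, timezone


def build_diagnostic_report(
    report_text, impression, loinc_code, patient_id=None, issued=None
):
    """Assemble a minimal FHIR R4 DiagnosticReport as a plain dict."""
    # Pass a timezone-aware datetime; a naive one yields a timestamp
    # with no UTC offset.
    issued = issued or datetime.now(timezone.utc)
    resource = {
        "resourceType": "DiagnosticReport",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org", "code": loinc_code}]},
        "issued": issued.isoformat(),
        "conclusion": impression,
        # The full report travels as a base64-encoded text attachment.
        "presentedForm": [
            {
                "contentType": "text/plain",
                "data": base64.b64encode(report_text.encode("utf-8")).decode("ascii"),
            }
        ],
    }
    if patient_id is not None:
        # subject is only present when a patient ID is given.
        resource["subject"] = {"reference": f"Patient/{patient_id}"}
    return resource
```

A consumer can recover the original report text by base64-decoding `presentedForm[0]["data"]`.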

Full Pipeline Example

import json
from radreport_parser import ReportParser, CriticalFindingsDetector, FHIRExporter

parser   = ReportParser()
detector = CriticalFindingsDetector()
exporter = FHIRExporter()

def process_report(text: str, modality: str, patient_id: str) -> dict:
    report = parser.parse(text, modality=modality)
    report = detector.detect(report)

    active_criticals = [cf for cf in report.critical_findings if not cf.negated]
    if active_criticals:
        print(f"WARNING: {len(active_criticals)} critical finding(s) detected")

    return exporter.export(report, patient_id=patient_id)

fhir_json = process_report(report_text, modality="CT", patient_id="pt-001")
print(json.dumps(fhir_json, indent=2))

See full_pipeline.py for a runnable end-to-end example.


Design Principles

No dependencies. The library installs with no third-party packages. This matters in hospital environments where every dependency goes through security review.

Rule-based, not ML-based. Every decision the library makes is traceable to a specific rule. No model weights, no GPU, no probabilistic outputs. Clinical teams can audit exactly why a finding was flagged.

Negation-aware. A library that can't distinguish "no pneumothorax" from "pneumothorax" is dangerous in clinical contexts. Negation detection is built into the core.

FHIR-first output. Most modern EMRs expose FHIR interfaces. The export format is designed to drop into existing FHIR R4 integrations without further transformation.


Running Tests

# Quote the extra so shells such as zsh do not expand the brackets
pip install "radreport-parser[dev]"
pytest tests/ -v

Roadmap

  • Done: CLI tool for single-file and batch processing (radreport command)
  • Done: parse_batch() API for processing lists of reports
  • Done: to_json() convenience method on ParsedReport
  • Planned: Template matching for common report types (Chest XR, CT Abdomen, MRI Brain)
  • Planned: Structured output for follow-up recommendations
  • Planned: Additional FHIR resource types (ImagingStudy, Condition)
  • Planned: CSV export mode for research/analytics workflows

Disclaimer

This library is a developer tool for structuring report text. It is not a medical device and is not intended for direct clinical decision-making. Critical findings detection is designed to assist human review workflows, not replace radiologist judgment.


License

MIT
