Skip to main content

Convert FHIR JSON bundles into flat, LLM-friendly XLSX spreadsheets.

Project description

flat-fhir

Convert FHIR JSON bundles into flat, LLM-friendly XLSX spreadsheets.

Machines speak FHIR. Humans and LLMs get spreadsheets.

Why

FHIR is the right canonical format for medical interoperability. But nested JSON with cross-references is terrible for:

  • Doctors who want to eyeball trends in a spreadsheet
  • Patients who want to carry their records on a USB stick
  • LLMs that reason far better over flat tabular data than deeply nested JSON
  • Rural clinics that have Excel but not a FHIR parser

flat-fhir bridges that gap. FHIR stays the source of truth. XLSX becomes the universal export.

Install

pip install flat-fhir

Quick Start

from flat_fhir import flatten

# From a FHIR Bundle JSON file
flatten("patient_bundle.json", output="patient_record.xlsx")

# From a Python dict
import json
with open("patient_bundle.json") as f:
    bundle = json.load(f)

flatten(bundle, output="patient_record.xlsx")

What It Produces

A multi-sheet XLSX workbook. Every row follows the same structure:

primary_key | performer (who) | timestamp (when) | ...all the whats
  • primary_key — deterministic hash, unique per row
  • performer — the doctor, lab, or pathologist who recorded it
  • timestamp — when the observation/event happened (ISO 8601)
  • ...whats — the actual clinical data, flattened into columns

If a blood culture and liver ultrasound are ordered on the same day by different doctors, they produce two separate rows. One row per performer per event. Sparse by design — most cells are empty because you only measure a few things per visit.

Sheets

Sheet Description Example Columns
patient Demographics (single row) family_name, given_name, birth_date, gender, phone, email, address
vitals Physical measurements blood_pressure_systolic_mmhg, blood_pressure_diastolic_mmhg, heart_rate_bpm, body_temperature_c, oxygen_saturation_pct, respiratory_rate_bpm, body_weight_kg, body_height_cm, bmi_kg_m2
labs Laboratory results hba1c_pct, blood_glucose_fasting_mg_dl, total_cholesterol_mg_dl, creatinine_mg_dl, egfr_ml_min, tsh_miu_l, hemoglobin_g_dl, wbc_k_ul, troponin_t_hs_ng_l
medications Prescriptions medication_name, rxnorm_code, dose, dose_unit, frequency, route, status, prescriber
conditions Diagnoses condition_name, icd_code, snomed_code, clinical_status, verification_status, severity, category, onset_date, abatement_date
allergies Allergy & intolerance records substance, reaction, severity, criticality, type, category, clinical_status
procedures Surgical & clinical procedures procedure_name, snomed_code, body_site, outcome, status
imaging Imaging studies modality, body_site, dicom_uid, findings
immunizations Vaccination records vaccine_name, cvx_code, dose, dose_unit, site, lot_number, status
encounters Visit metadata class, type, facility, reason, discharge_disposition, status, period_start, period_end
diagnostic_reports Panel-level lab groupings report_name, loinc_code, category, status, conclusion, result_references

Common Columns

Every sheet (except patient) includes these columns in this order:

Column Description
primary_key Deterministic hash — unique row identifier
performer Who recorded it (doctor, lab, pathologist)
timestamp When — ISO 8601 datetime
encounter_id Links rows back to a specific visit
source_resource_id Original FHIR resource ID for traceability
notes Free text annotations

Built-in LOINC Registry

flat-fhir ships with 100+ LOINC code mappings covering the standard FHIR R4 profiles:

  • Vital signs — blood pressure panel (85354-9), heart rate, SpO2, temperature, weight, height, BMI, head circumference
  • CBC — WBC, RBC, hemoglobin, hematocrit, MCV, MCH, MCHC, RDW, platelets
  • BMP/CMP — glucose, BUN, creatinine, sodium, potassium, chloride, CO2, calcium, ALT, AST, ALP, bilirubin, albumin, total protein
  • Lipid panel — total cholesterol, HDL, LDL (direct & calculated), triglycerides
  • Thyroid — TSH, free T4, free T3
  • HbA1c — standard and IFCC
  • Renal — eGFR, creatinine
  • Coagulation — PT, INR, aPTT, fibrinogen, D-dimer
  • Cardiac markers — troponin T/I (standard & high-sensitivity), BNP, NT-proBNP
  • Blood gases — pH, pCO2, pO2, bicarbonate, base excess, lactate
  • Inflammatory — CRP (standard & high-sensitivity), ESR
  • Vitamins — vitamin D, B12, folate, iron, ferritin
  • Urinalysis — specific gravity, pH, protein, glucose, ketones, blood, nitrite, leukocyte esterase

Naming Convention

All column headers use pythonic snake_case with the unit appended:

blood_pressure_systolic_mmhg
body_weight_kg
hba1c_pct
blood_glucose_fasting_mg_dl
oxygen_saturation_pct
body_temperature_c
troponin_t_hs_ng_l

The mapping from FHIR codes (LOINC, SNOMED, RxNorm, CVX) to human-readable column names is maintained in a built-in registry.

CLI

# Basic conversion
flat-fhir convert patient_bundle.json -o patient_record.xlsx

# Only specific sheets
flat-fhir convert patient_bundle.json -o record.xlsx --sheets vitals,labs,medications

# From a FHIR server (auto-appends $everything for Patient URLs)
flat-fhir fetch https://fhir.example.com/Patient/123 -o record.xlsx

# Merge multiple bundles (e.g., from different providers)
flat-fhir merge hospital_a.json hospital_b.json clinic.json -o merged.xlsx

# Reverse: XLSX back to FHIR Bundle JSON
flat-fhir unflatten patient_record.xlsx -o patient_bundle.json

Python API

from flat_fhir import flatten, unflatten, merge, FlatFHIR

# Basic flatten
flatten("bundle.json", output="record.xlsx")

# Fine-grained control
ff = FlatFHIR("bundle.json")

# Access individual sheets as pandas DataFrames
vitals_df = ff.vitals
labs_df = ff.labs

# Filter and query
recent_bp = ff.vitals[ff.vitals["timestamp"] > "2025-01-01"][
    ["timestamp", "blood_pressure_systolic_mmhg", "blood_pressure_diastolic_mmhg"]
]

# Plot directly
ff.labs.plot(x="timestamp", y="hba1c_pct", title="HbA1c Over Time")

# Export specific sheets
ff.to_xlsx("record.xlsx", sheets=["vitals", "labs"])

# Export to other formats
ff.to_csv("vitals.csv", sheet="vitals")
ff.to_parquet("record.parquet")
ff.to_json("record_flat.json")  # flat JSON, not FHIR

# Round-trip back to FHIR
fhir_bundle = ff.to_fhir()
unflatten("record.xlsx", output="bundle.json")

# Merge records from multiple providers
merged = merge(["hospital_a.json", "clinic_b.json"])
merged.to_xlsx("complete_record.xlsx")

Custom Column Registry

The built-in registry maps LOINC/SNOMED codes to snake_case column names. You can extend it:

from flat_fhir import registry

# Add a custom mapping
registry.add("85354-9", "blood_pressure_systolic_mmhg", sheet="vitals")

# Or load a custom registry file
registry.load("my_mappings.json")

# See all registered mappings
print(registry.list())

LLM Integration

The whole point. Flat FHIR spreadsheets are purpose-built for LLM consumption:

from flat_fhir import FlatFHIR

ff = FlatFHIR("bundle.json")

# Generate a compact text summary for an LLM prompt
prompt_context = ff.to_prompt()

# Or get a specific sheet as markdown table
vitals_table = ff.vitals.to_markdown()

# Token-aware truncation — keeps the most recent N rows
prompt_context = ff.to_prompt(max_tokens=2000)

Example output of ff.to_prompt():

## Patient: Amir Khan, M, DOB 1990-05-15

## Active Conditions
- Essential hypertension (diagnosed 2024-09-12)
- Type 2 diabetes mellitus without complications (diagnosed 2024-01-10)

## Allergies
- Penicillin (causes Skin rash)

## Active Medications
| medication | dose | frequency |
|---|---|---|
| Lisinopril 10 MG Oral Tablet | 10 mg | once daily |
| Metformin 500 MG Oral Tablet | 500 mg | twice daily |

## Vitals (last 2)
| timestamp | bp_sys | bp_dia | temp_c | weight_kg | hr | spo2 |
|---|---|---|---|---|---|---|
| 2024-12-01 | 115 | 74 | 36.5 | 75.2 | 64 | 99 |
| 2024-09-12 | 118 | 76 | — | 75.8 | 66 | — |

## Labs (last 1)
| timestamp | hba1c | glucose_fasting | cholesterol | triglycerides |
|---|---|---|---|---|
| 2024-12-01 | 6.3 | 102 | 180 | 125 |

Round-Trip Fidelity

flat-fhir preserves FHIR resource IDs and code systems so you can go back:

FHIR Bundle JSON → flatten → XLSX → unflatten → FHIR Bundle JSON

The source_resource_id column on every sheet ensures nothing gets lost. LOINC, SNOMED, ICD, RxNorm, and CVX codes are stored alongside human-readable names so the round-trip is lossless.

Testing

178 tests covering every extractor, output format, CLI command, and round-trip path. Tested against 13 external FHIR samples from HL7, IHE, SMART on FHIR, and Synthea.

# Run tests
pytest

# Run with coverage
pytest --cov=flat_fhir --cov-report=term-missing

# Current coverage: 99%

Registry Codegen

The LOINC registry is auto-generated from the FHIR R4 spec. To regenerate after a spec update:

python scripts/sync_fhir_spec.py --synthea

This downloads the FHIR R4 definitions, parses vital signs LOINC codes, scans Synthea sample bundles for real-world codes, and regenerates flat_fhir/registry.py.

Roadmap

  • Core flatten/unflatten for all major FHIR resource types
  • CLI with convert, fetch, merge, unflatten
  • Built-in LOINC/SNOMED/CVX → snake_case column registry (150+ codes)
  • to_prompt() for LLM-ready output
  • FHIR server direct fetch (flat-fhir fetch <url>)
  • Merge records from multiple providers
  • DiagnosticReport support
  • Parquet export
  • Auto-codegen from FHIR R4 spec (sync_fhir_spec.py)
  • 99% test coverage with external FHIR fixtures
  • Google Sheets export
  • Streaming flatten for massive bundles
  • SMART on FHIR auth for direct patient access
  • Deduplication across merged bundles

Project Philosophy

  1. FHIR is the source of truth. We don't replace it. We make it accessible.
  2. Snake case everything. No camelCase, no abbreviation soup. blood_pressure_systolic_mmhg not BPSys.
  3. Doctor-first rows. Every row is primary_key | who | when | ...whats. One row per performer per event.
  4. Sparse is fine. A 200-column sheet that's 95% empty is better than nested JSON nobody can read.
  5. Round-trip or bust. If you can't go back to FHIR, the flattening is lossy and therefore broken.
  6. LLMs are first-class consumers. Every design decision asks "can an LLM reason over this?"

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flat_fhir-0.1.1.tar.gz (38.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

flat_fhir-0.1.1-py3-none-any.whl (24.9 kB view details)

Uploaded Python 3

File details

Details for the file flat_fhir-0.1.1.tar.gz.

File metadata

  • Download URL: flat_fhir-0.1.1.tar.gz
  • Upload date:
  • Size: 38.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flat_fhir-0.1.1.tar.gz
Algorithm Hash digest
SHA256 de6ab74a3d2a97f790a6ccc3ae6e5426f8845d72fcffc516b43027026a8ec0a5
MD5 6efb438ea072d755c1f78586f2a30131
BLAKE2b-256 4416185b0a7e34a2fedf7e6c143c3aacb75604f65a56616bcb4d8f8ada198d43

See more details on using hashes here.

Provenance

The following attestation bundles were made for flat_fhir-0.1.1.tar.gz:

Publisher: publish.yml on Ghost---Shadow/flat-fhir

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file flat_fhir-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: flat_fhir-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 24.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for flat_fhir-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 247a5e63af8ad09daedb08024d33f4bffe97f1b7fc2cd126f296b3ec32c8440d
MD5 8b9d99f69f87cb76bb2fb2de579a7539
BLAKE2b-256 e8bae161664546b726fa99118fdef090e5d8e21285894a4c61a020dd6d833a62

See more details on using hashes here.

Provenance

The following attestation bundles were made for flat_fhir-0.1.1-py3-none-any.whl:

Publisher: publish.yml on Ghost---Shadow/flat-fhir

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page