Convert FHIR JSON bundles into flat, LLM-friendly XLSX spreadsheets.
Project description
flat-fhir
Convert FHIR JSON bundles into flat, LLM-friendly XLSX spreadsheets.
Machines speak FHIR. Humans and LLMs get spreadsheets.
Why
FHIR is the right canonical format for medical interoperability. But nested JSON with cross-references is terrible for:
- Doctors who want to eyeball trends in a spreadsheet
- Patients who want to carry their records on a USB stick
- LLMs that reason far better over flat tabular data than deeply nested JSON
- Rural clinics that have Excel but not a FHIR parser
flat-fhir bridges that gap. FHIR stays the source of truth. XLSX becomes the universal export.
Install
pip install flat-fhir
Quick Start
from flat_fhir import flatten
# From a FHIR Bundle JSON file
flatten("patient_bundle.json", output="patient_record.xlsx")
# From a Python dict
import json
with open("patient_bundle.json") as f:
bundle = json.load(f)
flatten(bundle, output="patient_record.xlsx")
What It Produces
A multi-sheet XLSX workbook. Every row follows the same structure:
primary_key | performer (who) | timestamp (when) | ...all the whats
- primary_key — deterministic hash, unique per row
- performer — the doctor, lab, or pathologist who recorded it
- timestamp — when the observation/event happened (ISO 8601)
- ...whats — the actual clinical data, flattened into columns
If a blood culture and liver ultrasound are ordered on the same day by different doctors, they produce two separate rows. One row per performer per event. Sparse by design — most cells are empty because you only measure a few things per visit.
Sheets
| Sheet | Description | Example Columns |
|---|---|---|
patient |
Demographics (single row) | family_name, given_name, birth_date, gender, phone, email, address |
vitals |
Physical measurements | blood_pressure_systolic_mmhg, blood_pressure_diastolic_mmhg, heart_rate_bpm, body_temperature_c, oxygen_saturation_pct, respiratory_rate_bpm, body_weight_kg, body_height_cm, bmi_kg_m2 |
labs |
Laboratory results | hba1c_pct, blood_glucose_fasting_mg_dl, total_cholesterol_mg_dl, creatinine_mg_dl, egfr_ml_min, tsh_miu_l, hemoglobin_g_dl, wbc_k_ul, troponin_t_hs_ng_l |
medications |
Prescriptions | medication_name, rxnorm_code, dose, dose_unit, frequency, route, status, prescriber |
conditions |
Diagnoses | condition_name, icd_code, snomed_code, clinical_status, verification_status, severity, category, onset_date, abatement_date |
allergies |
Allergy & intolerance records | substance, reaction, severity, criticality, type, category, clinical_status |
procedures |
Surgical & clinical procedures | procedure_name, snomed_code, body_site, outcome, status |
imaging |
Imaging studies | modality, body_site, dicom_uid, findings |
immunizations |
Vaccination records | vaccine_name, cvx_code, dose, dose_unit, site, lot_number, status |
encounters |
Visit metadata | class, type, facility, reason, discharge_disposition, status, period_start, period_end |
diagnostic_reports |
Panel-level lab groupings | report_name, loinc_code, category, status, conclusion, result_references |
Common Columns
Every sheet (except patient) includes these columns in this order:
| Column | Description |
|---|---|
primary_key |
Deterministic hash — unique row identifier |
performer |
Who recorded it (doctor, lab, pathologist) |
timestamp |
When — ISO 8601 datetime |
encounter_id |
Links rows back to a specific visit |
source_resource_id |
Original FHIR resource ID for traceability |
notes |
Free text annotations |
Built-in LOINC Registry
flat-fhir ships with 100+ LOINC code mappings covering the standard FHIR R4 profiles:
- Vital signs — blood pressure panel (85354-9), heart rate, SpO2, temperature, weight, height, BMI, head circumference
- CBC — WBC, RBC, hemoglobin, hematocrit, MCV, MCH, MCHC, RDW, platelets
- BMP/CMP — glucose, BUN, creatinine, sodium, potassium, chloride, CO2, calcium, ALT, AST, ALP, bilirubin, albumin, total protein
- Lipid panel — total cholesterol, HDL, LDL (direct & calculated), triglycerides
- Thyroid — TSH, free T4, free T3
- HbA1c — standard and IFCC
- Renal — eGFR, creatinine
- Coagulation — PT, INR, aPTT, fibrinogen, D-dimer
- Cardiac markers — troponin T/I (standard & high-sensitivity), BNP, NT-proBNP
- Blood gases — pH, pCO2, pO2, bicarbonate, base excess, lactate
- Inflammatory — CRP (standard & high-sensitivity), ESR
- Vitamins — vitamin D, B12, folate, iron, ferritin
- Urinalysis — specific gravity, pH, protein, glucose, ketones, blood, nitrite, leukocyte esterase
Naming Convention
All column headers use pythonic snake_case with the unit appended:
blood_pressure_systolic_mmhg
body_weight_kg
hba1c_pct
blood_glucose_fasting_mg_dl
oxygen_saturation_pct
body_temperature_c
troponin_t_hs_ng_l
The mapping from FHIR codes (LOINC, SNOMED, RxNorm, CVX) to human-readable column names is maintained in a built-in registry.
CLI
# Basic conversion
flat-fhir convert patient_bundle.json -o patient_record.xlsx
# Only specific sheets
flat-fhir convert patient_bundle.json -o record.xlsx --sheets vitals,labs,medications
# From a FHIR server (auto-appends $everything for Patient URLs)
flat-fhir fetch https://fhir.example.com/Patient/123 -o record.xlsx
# Merge multiple bundles (e.g., from different providers)
flat-fhir merge hospital_a.json hospital_b.json clinic.json -o merged.xlsx
# Reverse: XLSX back to FHIR Bundle JSON
flat-fhir unflatten patient_record.xlsx -o patient_bundle.json
Python API
from flat_fhir import flatten, unflatten, merge, FlatFHIR
# Basic flatten
flatten("bundle.json", output="record.xlsx")
# Fine-grained control
ff = FlatFHIR("bundle.json")
# Access individual sheets as pandas DataFrames
vitals_df = ff.vitals
labs_df = ff.labs
# Filter and query
recent_bp = ff.vitals[ff.vitals["timestamp"] > "2025-01-01"][
["timestamp", "blood_pressure_systolic_mmhg", "blood_pressure_diastolic_mmhg"]
]
# Plot directly
ff.labs.plot(x="timestamp", y="hba1c_pct", title="HbA1c Over Time")
# Export specific sheets
ff.to_xlsx("record.xlsx", sheets=["vitals", "labs"])
# Export to other formats
ff.to_csv("vitals.csv", sheet="vitals")
ff.to_parquet("record.parquet")
ff.to_json("record_flat.json") # flat JSON, not FHIR
# Round-trip back to FHIR
fhir_bundle = ff.to_fhir()
unflatten("record.xlsx", output="bundle.json")
# Merge records from multiple providers
merged = merge(["hospital_a.json", "clinic_b.json"])
merged.to_xlsx("complete_record.xlsx")
Custom Column Registry
The built-in registry maps LOINC/SNOMED codes to snake_case column names. You can extend it:
from flat_fhir import registry
# Add a custom mapping
registry.add("85354-9", "blood_pressure_systolic_mmhg", sheet="vitals")
# Or load a custom registry file
registry.load("my_mappings.json")
# See all registered mappings
print(registry.list())
LLM Integration
The whole point. Flat FHIR spreadsheets are purpose-built for LLM consumption:
from flat_fhir import FlatFHIR
ff = FlatFHIR("bundle.json")
# Generate a compact text summary for an LLM prompt
prompt_context = ff.to_prompt()
# Or get a specific sheet as markdown table
vitals_table = ff.vitals.to_markdown()
# Token-aware truncation — keeps the most recent N rows
prompt_context = ff.to_prompt(max_tokens=2000)
Example output of ff.to_prompt():
## Patient: Amir Khan, M, DOB 1990-05-15
## Active Conditions
- Essential hypertension (diagnosed 2024-09-12)
- Type 2 diabetes mellitus without complications (diagnosed 2024-01-10)
## Allergies
- Penicillin (causes Skin rash)
## Active Medications
| medication | dose | frequency |
|---|---|---|
| Lisinopril 10 MG Oral Tablet | 10 mg | once daily |
| Metformin 500 MG Oral Tablet | 500 mg | twice daily |
## Vitals (last 2)
| timestamp | bp_sys | bp_dia | temp_c | weight_kg | hr | spo2 |
|---|---|---|---|---|---|---|
| 2024-12-01 | 115 | 74 | 36.5 | 75.2 | 64 | 99 |
| 2024-09-12 | 118 | 76 | — | 75.8 | 66 | — |
## Labs (last 1)
| timestamp | hba1c | glucose_fasting | cholesterol | triglycerides |
|---|---|---|---|---|
| 2024-12-01 | 6.3 | 102 | 180 | 125 |
Round-Trip Fidelity
flat-fhir preserves FHIR resource IDs and code systems so you can go back:
FHIR Bundle JSON → flatten → XLSX → unflatten → FHIR Bundle JSON
The source_resource_id column on every sheet ensures nothing gets lost. LOINC, SNOMED, ICD, RxNorm, and CVX codes are stored alongside human-readable names so the round-trip is lossless.
Testing
178 tests covering every extractor, output format, CLI command, and round-trip path. Tested against 13 external FHIR samples from HL7, IHE, SMART on FHIR, and Synthea.
# Run tests
pytest
# Run with coverage
pytest --cov=flat_fhir --cov-report=term-missing
# Current coverage: 99%
Registry Codegen
The LOINC registry is auto-generated from the FHIR R4 spec. To regenerate after a spec update:
python scripts/sync_fhir_spec.py --synthea
This downloads the FHIR R4 definitions, parses vital signs LOINC codes, scans Synthea sample bundles for real-world codes, and regenerates flat_fhir/registry.py.
Roadmap
- Core flatten/unflatten for all major FHIR resource types
- CLI with
convert,fetch,merge,unflatten - Built-in LOINC/SNOMED/CVX → snake_case column registry (150+ codes)
-
to_prompt()for LLM-ready output - FHIR server direct fetch (
flat-fhir fetch <url>) - Merge records from multiple providers
- DiagnosticReport support
- Parquet export
- Auto-codegen from FHIR R4 spec (
sync_fhir_spec.py) - 99% test coverage with external FHIR fixtures
- Google Sheets export
- Streaming flatten for massive bundles
- SMART on FHIR auth for direct patient access
- Deduplication across merged bundles
Project Philosophy
- FHIR is the source of truth. We don't replace it. We make it accessible.
- Snake case everything. No camelCase, no abbreviation soup.
blood_pressure_systolic_mmhgnotBPSys. - Doctor-first rows. Every row is
primary_key | who | when | ...whats. One row per performer per event. - Sparse is fine. A 200-column sheet that's 95% empty is better than nested JSON nobody can read.
- Round-trip or bust. If you can't go back to FHIR, the flattening is lossy and therefore broken.
- LLMs are first-class consumers. Every design decision asks "can an LLM reason over this?"
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file flat_fhir-0.1.1.tar.gz.
File metadata
- Download URL: flat_fhir-0.1.1.tar.gz
- Upload date:
- Size: 38.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
de6ab74a3d2a97f790a6ccc3ae6e5426f8845d72fcffc516b43027026a8ec0a5
|
|
| MD5 |
6efb438ea072d755c1f78586f2a30131
|
|
| BLAKE2b-256 |
4416185b0a7e34a2fedf7e6c143c3aacb75604f65a56616bcb4d8f8ada198d43
|
Provenance
The following attestation bundles were made for flat_fhir-0.1.1.tar.gz:
Publisher:
publish.yml on Ghost---Shadow/flat-fhir
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
flat_fhir-0.1.1.tar.gz -
Subject digest:
de6ab74a3d2a97f790a6ccc3ae6e5426f8845d72fcffc516b43027026a8ec0a5 - Sigstore transparency entry: 1153974325
- Sigstore integration time:
-
Permalink:
Ghost---Shadow/flat-fhir@d56e752697b8ae382baf579176ce3611cae1e7a8 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/Ghost---Shadow
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d56e752697b8ae382baf579176ce3611cae1e7a8 -
Trigger Event:
push
-
Statement type:
File details
Details for the file flat_fhir-0.1.1-py3-none-any.whl.
File metadata
- Download URL: flat_fhir-0.1.1-py3-none-any.whl
- Upload date:
- Size: 24.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
247a5e63af8ad09daedb08024d33f4bffe97f1b7fc2cd126f296b3ec32c8440d
|
|
| MD5 |
8b9d99f69f87cb76bb2fb2de579a7539
|
|
| BLAKE2b-256 |
e8bae161664546b726fa99118fdef090e5d8e21285894a4c61a020dd6d833a62
|
Provenance
The following attestation bundles were made for flat_fhir-0.1.1-py3-none-any.whl:
Publisher:
publish.yml on Ghost---Shadow/flat-fhir
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
flat_fhir-0.1.1-py3-none-any.whl -
Subject digest:
247a5e63af8ad09daedb08024d33f4bffe97f1b7fc2cd126f296b3ec32c8440d - Sigstore transparency entry: 1153974327
- Sigstore integration time:
-
Permalink:
Ghost---Shadow/flat-fhir@d56e752697b8ae382baf579176ce3611cae1e7a8 -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/Ghost---Shadow
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d56e752697b8ae382baf579176ce3611cae1e7a8 -
Trigger Event:
push
-
Statement type: