
Unified platform for longitudinal EHR experiments across ML, DL, and LLM agents


OneEHR

Python 3.12+ License: MIT Docs

OneEHR is a unified Python platform for longitudinal EHR experiments across ML, DL, and LLM agents. It provides common infrastructure for preprocessing, modeling, testing, and analysis, all built on one shared run contract. It is the first toolkit to bridge classical machine learning, deep learning, and agentic AI for clinical prediction.

Key Features

  • 38 model architectures — tabular ML, recurrent/non-recurrent DL, irregular-time, KG-enhanced, and survival models
  • Unified ML/DL/LLM comparison — all predictions in one predictions.parquet with bootstrap CI and statistical tests
  • Dataset converters — built-in support for MIMIC-III, MIMIC-IV, and eICU
  • Medical code ontologies — ICD-9/10 mapping, CCS grouping, ATC drug hierarchy
  • Survival analysis — DeepSurv, DeepHit, concordance index, Kaplan-Meier visualization
  • Fairness & interpretability — demographic parity, equalized odds, SHAP, LIME, integrated gradients, attention visualization
  • Publication-quality figures — ROC, PR, calibration, DCA, forest plots, KM curves with Nature/Lancet style presets
  • Reproducibility by design — single TOML config = complete experiment specification
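Several features above rely on bootstrap confidence intervals. The sketch below shows a generic percentile bootstrap around AUROC in plain Python. It is not OneEHR's implementation: the metric, the number of resamples, and the choice to skip single-class resamples are illustrative assumptions.

```python
import random


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney statistic: the probability that a random
    positive case scores higher than a random negative case (ties count 0.5)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, metric=auroc, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap CI: resample cases with
    replacement, recompute the metric, and read off the alpha/2 and
    1 - alpha/2 quantiles of the resampled values."""
    point = metric(y_true, y_score)  # also fails fast if only one class exists
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # a resample containing one class has no defined AUROC
        stats.append(metric(yt, [y_score[i] for i in idx]))
    stats.sort()
    k = round(alpha / 2 * n_boot)
    return point, (stats[k], stats[n_boot - 1 - k])


y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]
point, (lo, hi) = bootstrap_ci(y_true, y_score)
print(f"AUROC {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Resampling whole cases (rather than scores and labels independently) keeps each prediction paired with its label, which is what makes the interval meaningful.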

Workflow At A Glance

oneehr preprocess --config experiment.toml   # Bin features, split patients
oneehr train      --config experiment.toml   # Train ML/DL models
oneehr test       --config experiment.toml   # Evaluate on test set
oneehr analyze    --config experiment.toml   # Cross-system comparison
oneehr plot       --config experiment.toml   # Publication figures

All commands operate on the same run directory under {output.root}/{output.run_name}/.

Install

OneEHR requires Python 3.12+.

pip install oneehr

# Or from source (inside a clone of the repository):
uv venv .venv --python 3.12
uv pip install -e .
source .venv/bin/activate   # Windows: .venv\Scripts\activate
oneehr --help

Quickstart

Use the bundled TJH COVID-19 ICU example:

# Convert source data (only needed once)
python examples/tjh/convert.py

# Run the full pipeline
oneehr preprocess --config examples/tjh/mortality_patient.toml
oneehr train      --config examples/tjh/mortality_patient.toml
oneehr test       --config examples/tjh/mortality_patient.toml
oneehr analyze    --config examples/tjh/mortality_patient.toml

Or use the Python API:

import oneehr

config = oneehr.load_config("examples/tjh/mortality_patient.toml")
oneehr.preprocess(config)
oneehr.train(config)
oneehr.test(config)
oneehr.analyze(config)

Dataset Converters

Convert standard clinical datasets into OneEHR's three-table format (dynamic, static, and label tables):

# MIMIC-III
oneehr convert --dataset mimic3 --raw-dir /path/to/mimic3 --output-dir data/mimic3/ --task mortality

# MIMIC-IV
oneehr convert --dataset mimic4 --raw-dir /path/to/mimic4 --output-dir data/mimic4/ --task mortality

# eICU
oneehr convert --dataset eicu --raw-dir /path/to/eicu --output-dir data/eicu/ --task mortality

Each converter can produce labels for mortality, readmission, and length-of-stay tasks; the task is selected with --task (mortality in the commands above).
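After conversion, oneehr preprocess bins the dynamic events into time windows before modeling. The plain-Python sketch below shows the general idea only. The (patient_id, hours, feature, value) event layout, the 4-hour bin width, and the "last value in the bin wins" rule are assumptions, not OneEHR's actual schema or defaults, which are controlled through the [preprocess] section.

```python
from collections import defaultdict

# Hypothetical long-format dynamic events:
# (patient_id, hours_since_admission, feature, value)
events = [
    ("p1", 0.5, "heart_rate", 88.0),
    ("p1", 3.0, "heart_rate", 95.0),
    ("p1", 5.5, "heart_rate", 102.0),
    ("p1", 5.9, "lactate", 2.1),
    ("p2", 1.0, "heart_rate", 70.0),
]


def bin_events(events, bin_hours):
    """Group events into fixed-width time bins and keep the last value per
    (patient, bin, feature). Sorting by (patient, time) first makes "last"
    mean "latest in time" rather than "latest in the input list"."""
    binned = defaultdict(dict)  # (patient_id, bin_index) -> {feature: value}
    for pid, t, feat, val in sorted(events, key=lambda e: (e[0], e[1])):
        binned[(pid, int(t // bin_hours))][feat] = val
    return dict(binned)


print(bin_events(events, bin_hours=4))
```

Swapping the overwrite for a running mean or a count would give other common aggregations; which ones OneEHR offers is not shown here.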

Models

OneEHR ships 38 model architectures:

| Category | Models |
|---|---|
| Tabular ML | XGBoost, CatBoost, Random Forest, Decision Tree, GBDT, Logistic Regression |
| Recurrent | GRU, LSTM, RNN, GRU-D, Dipole, HiTANet, M3Care, PAI |
| Non-recurrent | CNN, TCN, Transformer, SAnD, MLP, Deepr, EHR-Mamba, Jamba, LSAN |
| Irregular-time | mTAND, Raindrop, ContiFormer, TECO |
| EHR-specialised | AdaCare, StageNet, RETAIN, ConCare, GRASP, MCGRU, DrAgent, PRISM, SAFARI |
| KG-enhanced | GraphCare, KerPrint, ProtoEHR |
| Survival | DeepSurv, DeepHit |

Models with static branches (ConCare, GRASP, MCGRU, DrAgent, PRISM, SAFARI, TECO) automatically use patient-level static features when static.csv is provided.

Task Types

| Task | Config | Description |
|---|---|---|
| Binary classification | kind = "binary" | Mortality, readmission, etc. |
| Multiclass | kind = "multiclass" | Phenotyping, diagnosis groups |
| Regression | kind = "regression" | Length of stay, lab value prediction |
| Survival | kind = "survival" | Time-to-event with censoring |
| Multi-label | kind = "multilabel" | ICD coding, multi-diagnosis |
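For kind = "survival", models are usually compared with Harrell's concordance index (C-index), which the feature list names among the survival tools. The dependency-free sketch below is a textbook version, not OneEHR's code: it assumes a higher risk score means earlier expected failure, counts tied scores as half-concordant, and skips pairs with tied times for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    A pair is comparable when the subject with the shorter time had an observed
    event (events[k] == 1). It is concordant when that subject also has the
    higher risk score; tied scores count as 0.5. Pairs with tied times are
    skipped here as a simplification.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a = subject with the shorter observed time, b = the other one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # a was censored first, so the true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2, 4, 6, 8]   # follow-up time, e.g. days
events = [1, 1, 0, 1]  # 1 = event observed, 0 = censored
risk = [0.9, 0.5, 0.7, 0.1]
print(concordance_index(times, events, risk))  # 0.8
```

The pairwise loop is O(n^2), which is fine for illustration; production implementations typically sort by time to scale to large cohorts.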

Medical Code Ontologies

from oneehr.medcode import ICD9, ICD10, CodeMapper, CCSGrouper, ATCHierarchy

# ICD code utilities
ICD9.chapter("401.9")    # → "Circulatory system"
ICD10.category("I10.0")  # → "I10"

# Aggregate codes by ontology for dimensionality reduction
mapper = CodeMapper()
mapper.add_icd_chapter_mapping(version=9)
# events_df: the dynamic events table to remap (loaded beforehand)
mapped_events = mapper.apply(events_df)
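The lookups above can be illustrated without OneEHR: ICD-9-CM chapters are contiguous numeric code ranges, and an ICD-10 category is the first three characters of a code. This standalone sketch maps codes to chapters and counts them, which is the dimensionality-reduction idea behind chapter-level aggregation. Its function names and chapter labels are hypothetical and need not match what oneehr.medcode returns.

```python
from bisect import bisect_right
from collections import Counter

# ICD-9-CM chapters as (first three-digit code in the chapter, title).
ICD9_CHAPTERS = [
    (1, "Infectious and parasitic diseases"),
    (140, "Neoplasms"),
    (240, "Endocrine, nutritional and metabolic diseases, and immunity disorders"),
    (280, "Diseases of the blood and blood-forming organs"),
    (290, "Mental disorders"),
    (320, "Diseases of the nervous system and sense organs"),
    (390, "Diseases of the circulatory system"),
    (460, "Diseases of the respiratory system"),
    (520, "Diseases of the digestive system"),
    (580, "Diseases of the genitourinary system"),
    (630, "Complications of pregnancy, childbirth, and the puerperium"),
    (680, "Diseases of the skin and subcutaneous tissue"),
    (710, "Diseases of the musculoskeletal system and connective tissue"),
    (740, "Congenital anomalies"),
    (760, "Certain conditions originating in the perinatal period"),
    (780, "Symptoms, signs, and ill-defined conditions"),
    (800, "Injury and poisoning"),
]
_CHAPTER_STARTS = [start for start, _ in ICD9_CHAPTERS]


def icd9_chapter(code: str) -> str:
    """Map a dotted ICD-9-CM diagnosis code such as "401.9" to its chapter title."""
    if code[:1].upper() in ("V", "E"):
        return "Supplementary classification (V/E codes)"
    number = int(code.split(".")[0])
    if not 1 <= number <= 999:
        raise ValueError(f"not an ICD-9-CM diagnosis code: {code!r}")
    # bisect_right finds the last chapter whose start is <= number
    return ICD9_CHAPTERS[bisect_right(_CHAPTER_STARTS, number) - 1][1]


def icd10_category(code: str) -> str:
    """Return the three-character ICD-10 category, e.g. "I21.4" -> "I21"."""
    return code.replace(".", "")[:3].upper()


# Chapter-level aggregation: many distinct codes collapse into a few groups.
codes = ["401.9", "428.0", "486", "V45.81"]
print(Counter(icd9_chapter(c) for c in codes))
print(icd10_category("I21.4"))  # I21
```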

Configuration

OneEHR uses TOML as the experiment contract:

  • [dataset] — input table paths (dynamic, static, label)
  • [preprocess] — binning, feature engineering, preprocessing pipeline
  • [task] — task kind and prediction mode (patient or time)
  • [split] — patient-level train/val/test splitting
  • [[models]] — model selection with per-model params
  • [trainer] — DL training config (mixed precision, LR schedulers, early stopping)
  • [[systems]] — LLM/agent system definitions
  • [output] — run root and run name
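The [split] section splits at the patient level, so no patient's records appear in more than one of train, validation, and test. A minimal sketch of that idea in plain Python follows; the fractions, seed, and function name are hypothetical, and OneEHR's actual [split] options may differ.

```python
import random


def patient_level_split(patient_ids, val_frac=0.1, test_frac=0.2, seed=42):
    """Split unique patient IDs into disjoint train/val/test sets so that all
    records of a patient land in exactly one split (no patient-level leakage)."""
    unique_ids = sorted(set(patient_ids))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n = len(unique_ids)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = set(unique_ids[:n_test])
    val = set(unique_ids[n_test:n_test + n_val])
    train = set(unique_ids[n_test + n_val:])
    return train, val, test


# Record-level patient IDs: 500 rows belonging to 50 patients.
rows = [f"p{i % 50}" for i in range(500)]
train, val, test = patient_level_split(rows)
print(len(train), len(val), len(test))  # 35 5 10
```

Splitting rows instead of patients would let the same patient's earlier and later visits land in both train and test, inflating test metrics.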

Tutorials

| Tutorial | Description |
|---|---|
| 01 Quickstart | End-to-end TJH mortality prediction |
| 02 Custom Dataset | Bring your own data + medical code mapping |
| 03 Model Comparison | ML vs DL with bootstrap CI and statistical tests |
| 04 Fairness & Explainability | Bias detection + feature importance |
| 05 Survival Analysis | DeepSurv, C-index, Kaplan-Meier curves |
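Tutorial 04 covers bias detection; two of the criteria listed among the features, demographic parity and equalized odds, reduce to comparing per-group rates. The sketch below uses the common "largest gap across groups" formulation for binary predictions. It is a generic illustration with hypothetical helper names, not OneEHR's API, and it assumes every group contains both positive and negative labels.

```python
def _rate(numerator, denominator):
    if denominator == 0:
        raise ValueError("empty group or class; rate is undefined")
    return numerator / denominator


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate P(pred=1), TPR P(pred=1|y=1), FPR P(pred=1|y=0)."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        pos = [i for i in idx if y_true[i] == 1]
        neg = [i for i in idx if y_true[i] == 0]
        rates[g] = {
            "selection": _rate(sum(y_pred[i] for i in idx), len(idx)),
            "tpr": _rate(sum(y_pred[i] for i in pos), len(pos)),
            "fpr": _rate(sum(y_pred[i] for i in neg), len(neg)),
        }
    return rates


def _gap(rates, key):
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


def demographic_parity_difference(y_true, y_pred, groups):
    """Largest difference in selection rate between any two groups (0 = parity)."""
    return _gap(group_rates(y_true, y_pred, groups), "selection")


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the TPR gap and the FPR gap across groups (0 = equalized odds)."""
    rates = group_rates(y_true, y_pred, groups)
    return max(_gap(rates, "tpr"), _gap(rates, "fpr"))


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
print(demographic_parity_difference(y_true, y_pred, groups))  # 0.5
print(equalized_odds_difference(y_true, y_pred, groups))      # 0.5
```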

Documentation

Full documentation: medx-pku.github.io/OneEHR/

Build docs locally:

uvx --from "mkdocs @ git+https://github.com/encode/mkdocs.git" mkdocs serve

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Validation

pytest tests/ -v                                                    # 150 tests
oneehr preprocess --config examples/tjh/mortality_patient.toml      # End-to-end


Download files


Source Distribution

oneehr-0.1.1.tar.gz (159.1 kB view details)

Uploaded Source

Built Distribution


oneehr-0.1.1-py3-none-any.whl (198.9 kB view details)

Uploaded Python 3

File details

Details for the file oneehr-0.1.1.tar.gz.

File metadata

  • Download URL: oneehr-0.1.1.tar.gz
  • Upload date:
  • Size: 159.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oneehr-0.1.1.tar.gz
Algorithm Hash digest
SHA256 89bffcdaff2936b7fb8e472eb2a559f8523cd3e339c3c4b85c61c5aa8cfa655a
MD5 a88326087bd03ac635f5f1faacc224eb
BLAKE2b-256 e6f9f4f045224678cb5ddaabc8f259b24ccd81744843d738efa6d43106307657


Provenance

The following attestation bundles were made for oneehr-0.1.1.tar.gz:

Publisher: publish.yml on MedX-PKU/OneEHR

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file oneehr-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: oneehr-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 198.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for oneehr-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4ec115a6c0fa2299c03fb77759c83a62cd6f5ff215076436000afac905f630dd
MD5 3e981c6573781e77da8291468f0eee14
BLAKE2b-256 8fcb1c453dcaa607512bf29481c82a2cf41c0855abfdaafdea140304f35d5ef6


Provenance

The following attestation bundles were made for oneehr-0.1.1-py3-none-any.whl:

Publisher: publish.yml on MedX-PKU/OneEHR

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
