A tiny EHR dataset for learning, prototyping, and building — 100 patients in MIMIC and OMOP formats.

TinyEHR: A Tiny Electronic Health Records Dataset for Learning, Prototyping, and Building

TinyEHR is a small, open, reproducible clinical dataset of 100 patients, available in two formats: MIMIC and OMOP. It is derived from the MIMIC-IV Clinical Database Demo v2.2, the publicly available subset of MIMIC-IV published by the MIT Laboratory for Computational Physiology.

Open and ready to use, with no credentialing and no data use agreements. Install and start exploring clinical data in seconds.

Website: tinyehr.org
GitHub: github.com/vidulpanickan/TinyEHR
HuggingFace: datasets/vidulpanickan/TinyEHR
PyPI: pip install tinyehr

Install

pip install tinyehr

Python API

import tinyehr

# Quick reference of all functions
tinyehr.help()

# Overview of all tables with row counts
tinyehr.info()
tinyehr.info(format="tinyehr_omop_format")

# List table names
tinyehr.list_tables()
tinyehr.list_tables(format="tinyehr_omop_format")

# Column names, types, and sample rows for a table
tinyehr.describe_table("patients")
tinyehr.describe_table("person", format="tinyehr_omop_format")

# Find tables by keyword in table and column names
tinyehr.search_tables("lab")
tinyehr.search_tables("drug")

# Load a table as a pandas DataFrame
patients = tinyehr.load_table("patients")
person = tinyehr.load_table("person", format="tinyehr_omop_format")

# All data for one patient across all tables
data = tinyehr.get_patient(10000032)
data["admissions"]    # DataFrame of this patient's admissions
data["labevents"]     # DataFrame of this patient's labs
data["noteevents"]    # DataFrame of this patient's notes

# Build a local SQLite database for each format
mimic_db_path = tinyehr.build_sqlite(format="tinyehr_mimic_format")
omop_db_path = tinyehr.build_sqlite(format="tinyehr_omop_format")

# Query the MIMIC-format SQLite database (admissions is a MIMIC table)
import sqlite3
conn = sqlite3.connect(mimic_db_path)
conn.execute("SELECT * FROM admissions LIMIT 5").fetchall()
conn.close()
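
Query results can also be pulled straight into a DataFrame with pandas.read_sql_query. The sketch below is illustrative only: so that it runs without the package installed, it uses an in-memory SQLite database with a small made-up admissions table as a stand-in for the file produced by tinyehr.build_sqlite(). The three columns shown are a simplified subset chosen for the example, not the full schema.

```python
import sqlite3

import pandas as pd

# Stand-in for the database built by tinyehr.build_sqlite(): an in-memory
# SQLite database holding a tiny, made-up admissions table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admissions ("
    "subject_id INTEGER, hadm_id INTEGER, admission_type TEXT)"
)
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?, ?)",
    [(1, 101, "URGENT"), (1, 102, "ELECTIVE"), (2, 201, "URGENT")],
)
conn.commit()

# Count admissions per patient and return the result as a DataFrame.
# The "?" placeholder keeps the filter value out of the SQL string.
admissions_per_patient = pd.read_sql_query(
    "SELECT subject_id, COUNT(*) AS n_admissions "
    "FROM admissions WHERE subject_id >= ? "
    "GROUP BY subject_id ORDER BY subject_id",
    conn,
    params=(1,),
)
conn.close()

print(admissions_per_patient)
```

With the real dataset, the same call should work against a connection opened on the path returned by tinyehr.build_sqlite().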

Direct from HuggingFace

import pandas as pd

patients = pd.read_parquet(
    "hf://datasets/vidulpanickan/tinyehr/tinyehr_mimic_format/patients.parquet"
)

No dependencies beyond pandas and pyarrow.

Trouble downloading?

You can download the raw CSV files directly from GitHub:

  1. Go to github.com/vidulpanickan/TinyEHR
  2. Click the green Code button
  3. Select Download ZIP

Or clone via terminal:

git clone https://github.com/vidulpanickan/TinyEHR.git

Formats

TinyEHR ships in two formats from the same underlying patient cohort:

MIMIC format follows the original MIMIC-IV schema with dates shifted to realistic years, ICD codes reformatted with decimal points, and 4,580 synthetic clinical notes added.

OMOP format follows the OHDSI CDM v5.3.1 schema with hashed person IDs, dates shifted to realistic years, ICD codes formatted with decimal points, and clinical codes mapped to SNOMED, LOINC, and RxNorm via a custom MIMIC-specific concept vocabulary.
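
To make the ICD decimal-point formatting concrete, here is a minimal sketch of the usual convention for diagnosis codes: the point goes after the third character, except for ICD-9 external-cause (E) codes, where it goes after the fourth. The function name add_icd_decimal is hypothetical and this is not TinyEHR's own code. It does not handle ICD-9 procedure codes, which place the point after the second digit, or ICD-10-PCS codes, which have no decimal point.

```python
def add_icd_decimal(code: str, icd_version: int) -> str:
    """Insert the conventional decimal point into an undotted ICD diagnosis code.

    Hypothetical helper for illustration; covers ICD-9-CM and ICD-10-CM
    diagnosis codes only.
    """
    code = code.strip()
    if "." in code:
        return code  # already formatted

    # ICD-9 external-cause codes (E800-E999) keep four characters before the
    # point; every other diagnosis code keeps three.
    is_icd9_e_code = icd_version == 9 and code.upper().startswith("E")
    prefix_len = 4 if is_icd9_e_code else 3

    # Codes with no characters after the prefix take no decimal point.
    if len(code) <= prefix_len:
        return code
    return f"{code[:prefix_len]}.{code[prefix_len:]}"


print(add_icd_decimal("E119", 10))  # ICD-10 diagnosis code
print(add_icd_decimal("V707", 9))   # ICD-9 V code
```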

For full dataset structure, schema documentation, and table details, visit tinyehr.org.

Differences from MIMIC-IV Demo

TinyEHR applies four targeted transformations to the original MIMIC-IV Demo data. Beyond these, all clinical values, patient demographics, table structures, referential integrity, and row counts are unchanged.

Date shifting
  What changed: All dates shifted from the synthetic 2100+ range to realistic 2010s-2020s using per-patient offsets derived from anchor_year_group. Affects 21 MIMIC tables and 15 OMOP tables. Offsets saved in metadata/date_offsets.csv.
  Why: Realistic dates for teaching and prototyping.

ICD code formatting
  What changed: Decimal points inserted into ICD codes (E119 to E11.9, V707 to V70.7). ICD-10-PCS codes left unchanged. Affects diagnoses_icd, d_icd_diagnoses, procedures_icd, d_icd_procedures (MIMIC) and condition_source_value, procedure_source_value (OMOP).
  Why: Matches real-world clinical code formatting.

Synthetic clinical notes
  What changed: 4,580 notes across 14 types added (not present in the original Demo). Generated using a large language model, grounded in each patient's demographics, diagnoses, and admission data. Added as noteevents (MIMIC) and note (OMOP) with proper concept mappings.
  Why: The original Demo has no clinical notes.

OMOP note concepts
  What changed: 19 note-related concepts added to 2b_concept.csv (10 Note Type, 7 LOINC Document Ontology, 2 utility). Row count: 3,885 to 3,904.
  Why: Required for OMOP note table concept references.
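
Per-patient date shifting can be sketched in a few lines of pandas. Everything in the example is an assumption made for illustration: the data is made up, and the offset_days column name and the use of day-based offsets are hypothetical (the actual layout of metadata/date_offsets.csv may differ). The point it shows is that applying one offset per patient moves dates into a realistic range while leaving the intervals between a patient's events unchanged.

```python
import pandas as pd

# Made-up admissions with far-future dates, as in de-identified source data.
admissions = pd.DataFrame(
    {
        "subject_id": [1, 1, 2],
        "admittime": pd.to_datetime(
            ["2150-03-01 08:00", "2150-06-15 12:30", "2180-01-10 23:45"]
        ),
    }
)

# Hypothetical per-patient offsets in days (column name and unit are assumed).
offsets = pd.DataFrame({"subject_id": [1, 2], "offset_days": [-49000, -60000]})

# Attach each patient's offset to their rows, then shift every timestamp.
shifted = admissions.merge(
    offsets, on="subject_id", how="left", validate="many_to_one"
)
shifted["admittime"] = shifted["admittime"] + pd.to_timedelta(
    shifted["offset_days"], unit="D"
)
shifted = shifted.drop(columns="offset_days")

# The gap between patient 1's two admissions is the same before and after.
gap_before = admissions.loc[1, "admittime"] - admissions.loc[0, "admittime"]
gap_after = shifted.loc[1, "admittime"] - shifted.loc[0, "admittime"]
print(shifted)
print(gap_before == gap_after)
```

Using a single offset per patient, rather than one per row or per table, is what keeps the order and spacing of events consistent across all of that patient's tables.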

License
