Type-safe Pydantic models for Synthea health data CSV exports

synthea-pydantic

License: MIT | Python 3.12+

Type-safe Pydantic models for parsing and validating Synthea's synthetic healthcare data CSV exports.

Overview

synthea-pydantic provides lightweight, type-annotated Pydantic models that make it easy to work with Synthea's CSV output format in Python. Synthea is a synthetic patient generator that creates realistic (but not real) patient health records for research, education, and software development.

Key Features

  • 🏥 Complete Coverage: Models for all 19 Synthea CSV export types
  • 🔍 Type Safety: Full type annotations with proper validation
  • 🚀 Easy to Use: Simple API that works with standard CSV libraries
  • 📋 Well Documented: Comprehensive field descriptions from Synthea specifications
  • 🔧 Flexible: Handles optional fields and empty values gracefully
  • Lightweight: Minimal dependencies (just Pydantic)

Installation

pip install synthea-pydantic

Or with uv:

uv pip install synthea-pydantic

Quick Start

import csv
from synthea_pydantic import Patient, Medication, Condition

# Load patients from CSV
with open('patients.csv') as f:
    reader = csv.DictReader(f)
    patients = [Patient(**row) for row in reader]

# Access patient data with full type safety
for patient in patients:
    print(f"{patient.first} {patient.last} - Born: {patient.birthdate}")
    if patient.deathdate:
        print(f"  Died: {patient.deathdate}")

# Load related data
with open('medications.csv') as f:
    reader = csv.DictReader(f)
    medications = [Medication(**row) for row in reader]

# Filter medications for a specific patient (here, the first one loaded)
patient = patients[0]
patient_meds = [m for m in medications if m.patient == patient.id]

Supported Models

synthea-pydantic includes models for all Synthea CSV export types:

| Model | Description | Key Fields |
|---|---|---|
| Patient | Patient demographics | id, birthdate, name, address, ssn |
| Encounter | Healthcare encounters | id, patient, start/stop, type, provider |
| Condition | Medical conditions | patient, code, description, onset |
| Medication | Prescriptions | patient, code, description, start/stop |
| Observation | Clinical observations | patient, code, value, units |
| Procedure | Medical procedures | patient, code, description, date |
| Immunization | Vaccination records | patient, code, date |
| CarePlan | Treatment plans | patient, code, activities |
| Allergy | Allergy records | patient, code, description |
| Device | Medical devices | patient, code, start/stop |
| Supply | Medical supplies | patient, code, quantity |
| Organization | Healthcare facilities | id, name, address, phone |
| Provider | Healthcare providers | id, name, speciality, organization |
| Payer | Insurance companies | id, name, ownership |
| PayerTransition | Insurance changes | patient, payer, start/stop |
| Claim | Insurance claims | id, patient, provider, total |
| ClaimTransaction | Claim line items | claim, type, amount |
| ImagingStudy | Medical imaging | patient, modality, body_site |

Usage Examples

Loading CSV Data

The models work with Python's built-in csv module:

import csv
from synthea_pydantic import Patient

# Load from CSV file
with open('data/patients.csv') as f:
    reader = csv.DictReader(f)
    patients = [Patient(**row) for row in reader]
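
For larger exports, building a full list may use more memory than necessary. The following is a minimal sketch of a streaming alternative. The helper name `iter_records` is hypothetical and not part of synthea-pydantic, and `dict` stands in for a model class such as `Patient` so the example runs without the package installed.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def iter_records(lines: Iterable[str], factory: Callable[..., T]) -> Iterator[T]:
    """Yield one record per CSV row instead of building a list up front.

    `factory` is any callable that accepts the row's columns as keyword
    arguments, for example a Pydantic model class.
    """
    for row in csv.DictReader(lines):
        yield factory(**row)


# Illustrative in-memory CSV; a real file object opened with open() works the same way.
sample = io.StringIO("Id,FIRST\n1,John\n2,Jane\n")
records = list(iter_records(sample, dict))
```

With a real export, the same helper could be called as `iter_records(open('data/patients.csv', newline=''), Patient)`, assuming the model accepts the CSV column names as keyword arguments as in the example above.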

Working with Optional Fields

Synthea CSVs often have empty values. The models handle these gracefully:

# Empty strings in CSV are converted to None
patient = Patient(**{
    'Id': '123e4567-e89b-12d3-a456-426614174000',
    'BIRTHDATE': '1980-01-01',
    'DEATHDATE': '',  # Empty string becomes None
    'PREFIX': '',     # Empty string becomes None
    'FIRST': 'John',
    'LAST': 'Doe',
    # ... other required fields
})

assert patient.deathdate is None
assert patient.prefix is None

Type Validation

All fields are validated according to their types:

from decimal import Decimal
from datetime import date, datetime
from uuid import UUID

# UUIDs are automatically parsed
assert isinstance(patient.id, UUID)

# Dates are parsed from YYYY-MM-DD format
assert isinstance(patient.birthdate, date)

# Decimals maintain precision for monetary values
assert isinstance(patient.healthcare_expenses, Decimal)

Linking Related Data

Use the UUID foreign keys to link related records:

# Find all medications for a patient
patient_meds = [
    med for med in medications 
    if med.patient == patient.id
]

# Find all conditions treated in an encounter
encounter_conditions = [
    cond for cond in conditions 
    if cond.encounter == encounter.id
]
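
The list comprehensions above rescan every record for each lookup. When many patients need to be joined to their records, grouping once by foreign key may be cheaper. This is a sketch only: `Med` is a hypothetical stand-in for the `Medication` model, and `group_by_patient` is not a function provided by the library.

```python
from collections import defaultdict
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class Med:
    """Hypothetical stand-in for a record that carries a patient foreign key."""
    patient: UUID
    description: str


def group_by_patient(records):
    """Build a patient-id -> list-of-records index in a single pass."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.patient].append(record)
    return grouped


# Illustrative data with randomly generated patient IDs.
alice, bob = uuid4(), uuid4()
medications = [
    Med(alice, "Aspirin"),
    Med(bob, "Ibuprofen"),
    Med(alice, "Metformin"),
]
meds_by_patient = group_by_patient(medications)
alice_meds = [m.description for m in meds_by_patient[alice]]
```

The same pattern applies to any model with a `patient` or `encounter` field, since UUID values are hashable and can be used as dictionary keys.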

Error Handling

The models provide clear error messages for invalid data:

from pydantic import ValidationError

try:
    patient = Patient(**invalid_data)  # invalid_data: any dict that fails validation
except ValidationError as e:
    print(f"Validation failed: {e}")
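
When loading a whole file, it can be useful to keep the valid rows and report the failing ones rather than stopping at the first error. The sketch below assumes only Pydantic's public `ValidationError.errors()` method. `MiniPatient` and `validate_rows` are hypothetical names used for illustration and are not part of synthea-pydantic.

```python
from datetime import date
from uuid import UUID

from pydantic import BaseModel, ValidationError


class MiniPatient(BaseModel):
    """Hypothetical stand-in for a synthea-pydantic model."""
    id: UUID
    birthdate: date


def validate_rows(rows, model):
    """Return (valid_records, failures); each failure is (row_index, error_list)."""
    records, failures = [], []
    for index, row in enumerate(rows):
        try:
            records.append(model(**row))
        except ValidationError as exc:
            # exc.errors() holds one dict per failing field, with 'loc' and 'msg' keys.
            failures.append((index, exc.errors()))
    return records, failures


rows = [
    {"id": "123e4567-e89b-12d3-a456-426614174000", "birthdate": "1980-01-01"},
    {"id": "not-a-uuid", "birthdate": "1980-01-01"},  # malformed UUID
]
records, failures = validate_rows(rows, MiniPatient)

for index, errors in failures:
    for err in errors:
        print(f"row {index}: {err['loc']} -> {err['msg']}")
```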

Model Details

Common Field Types

  • IDs: UUID fields for primary and foreign keys
  • Dates: date fields for dates (YYYY-MM-DD)
  • Timestamps: datetime fields for date/time values
  • Money: Decimal fields for monetary amounts
  • Codes: String fields for medical codes (SNOMED-CT, RxNorm, etc.)
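
To make the mapping above concrete, the following sketch performs the equivalent conversions by hand with the standard library. It illustrates the target types only and is not how the models are implemented. The sample values, including the timestamp, are made up for the example.

```python
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID

# Illustrative raw CSV values; every value read by csv.DictReader is a string.
raw = {
    "Id": "123e4567-e89b-12d3-a456-426614174000",
    "BIRTHDATE": "1980-01-01",
    "START": "2020-03-15T09:30:00Z",
    "HEALTHCARE_EXPENSES": "1234.10",
}

record_id = UUID(raw["Id"])
birthdate = date.fromisoformat(raw["BIRTHDATE"])
# Replace the trailing 'Z' so fromisoformat also works on Python versions before 3.11.
start = datetime.fromisoformat(raw["START"].replace("Z", "+00:00"))
# Decimal keeps the value exactly as written, including the trailing zero.
expenses = Decimal(raw["HEALTHCARE_EXPENSES"])
```

Decimal arithmetic stays exact for amounts like these, whereas binary floats can introduce rounding artifacts (for example, `0.1 + 0.2 != 0.3` with `float`).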

Base Model Features

All models inherit from SyntheaBaseModel which provides:

  • Automatic whitespace stripping
  • Empty string to None conversion
  • Case-insensitive literal field matching
  • Field alias support for CSV column mapping
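
As a rough illustration of how some of these behaviors can be expressed, here is a minimal sketch assuming Pydantic v2. It is not the library's actual `SyntheaBaseModel`; `SketchBaseModel` and `SketchPatient` are hypothetical names, and case-insensitive literal matching is left out.

```python
from datetime import date
from typing import Any, Optional
from uuid import UUID

from pydantic import BaseModel, ConfigDict, Field, model_validator


class SketchBaseModel(BaseModel):
    """Hypothetical base class; not the library's SyntheaBaseModel."""

    # Strip leading and trailing whitespace from string fields.
    model_config = ConfigDict(str_strip_whitespace=True)

    @model_validator(mode="before")
    @classmethod
    def empty_strings_to_none(cls, data: Any) -> Any:
        # Runs before field validation, so '' never reaches a date or UUID parser.
        if isinstance(data, dict):
            return {
                key: None if isinstance(value, str) and value.strip() == "" else value
                for key, value in data.items()
            }
        return data


class SketchPatient(SketchBaseModel):
    # Aliases map the CSV column names onto Python attribute names.
    id: UUID = Field(alias="Id")
    birthdate: date = Field(alias="BIRTHDATE")
    deathdate: Optional[date] = Field(default=None, alias="DEATHDATE")
    first: str = Field(alias="FIRST")


patient = SketchPatient(**{
    "Id": "123e4567-e89b-12d3-a456-426614174000",
    "BIRTHDATE": "1980-01-01",
    "DEATHDATE": "",
    "FIRST": "  John ",
})
```

Running the conversion in a `mode="before"` validator is one possible design choice: it handles every optional field in one place instead of requiring a per-field validator.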

Development

Setup

To develop or contribute to synthea-pydantic:

# Clone the repository
git clone https://github.com/yourusername/synthea-pydantic.git
cd synthea-pydantic

# Install in development mode
uv pip install -e .

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=synthea_pydantic

# Run specific test file
uv run pytest tests/test_patients.py

Code Quality

# Type checking
uv run mypy synthea_pydantic/

# Linting
uv run ruff check

# Formatting
uv run ruff format

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Synthea - The synthetic patient generator
  • Pydantic - Data validation using Python type annotations

Citation

Synthea is a registered trademark of The MITRE Corporation.

Jason Walonoski, Mark Kramer, Joseph Nichols, Andre Quina, Chris Moesel, Dylan Hall, Carlton Duffett, Kudakwashe Dube, Thomas Gallagher, Scott McLachlan, Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record, Journal of the American Medical Informatics Association, Volume 25, Issue 3, March 2018, Pages 230–238, https://doi.org/10.1093/jamia/ocx079
