Type-safe Pydantic models for Synthea health data CSV exports
synthea-pydantic
Type-safe Pydantic models for parsing and validating Synthea's synthetic healthcare data CSV exports.
Overview
synthea-pydantic provides lightweight, type-annotated Pydantic models that make it easy to work with Synthea's CSV output format in Python. Synthea is a synthetic patient generator that creates realistic (but not real) patient health records for research, education, and software development.
Key Features
- 🏥 Complete Coverage: Models for all 19 Synthea CSV export types
- 🔍 Type Safety: Full type annotations with proper validation
- 🚀 Easy to Use: Simple API that works with standard CSV libraries
- 📋 Well Documented: Comprehensive field descriptions from Synthea specifications
- 🔧 Flexible: Handles optional fields and empty values gracefully
- ⚡ Lightweight: Minimal dependencies (just Pydantic)
Installation
pip install synthea-pydantic
Or with uv:
uv pip install synthea-pydantic
Quick Start
import csv
from synthea_pydantic import Patient, Medication, Condition
# Load patients from CSV
with open('patients.csv') as f:
    reader = csv.DictReader(f)
    patients = [Patient(**row) for row in reader]
# Access patient data with full type safety
for patient in patients:
    print(f"{patient.first} {patient.last} - Born: {patient.birthdate}")
    if patient.deathdate:
        print(f" Died: {patient.deathdate}")
# Load related data
with open('medications.csv') as f:
    reader = csv.DictReader(f)
    medications = [Medication(**row) for row in reader]
# Filter medications for a specific patient
patient_meds = [m for m in medications if m.patient == patient.id]
Supported Models
synthea-pydantic includes models for all Synthea CSV export types:
| Model | Description | Key Fields |
|---|---|---|
| Patient | Patient demographics | id, birthdate, name, address, ssn |
| Encounter | Healthcare encounters | id, patient, start/stop, type, provider |
| Condition | Medical conditions | patient, code, description, onset |
| Medication | Prescriptions | patient, code, description, start/stop |
| Observation | Clinical observations | patient, code, value, units |
| Procedure | Medical procedures | patient, code, description, date |
| Immunization | Vaccination records | patient, code, date |
| CarePlan | Treatment plans | patient, code, activities |
| Allergy | Allergy records | patient, code, description |
| Device | Medical devices | patient, code, start/stop |
| Supply | Medical supplies | patient, code, quantity |
| Organization | Healthcare facilities | id, name, address, phone |
| Provider | Healthcare providers | id, name, speciality, organization |
| Payer | Insurance companies | id, name, ownership |
| PayerTransition | Insurance changes | patient, payer, start/stop |
| Claim | Insurance claims | id, patient, provider, total |
| ClaimTransaction | Claim line items | claim, type, amount |
| ImagingStudy | Medical imaging | patient, modality, body_site |
Usage Examples
Loading CSV Data
The models work with Python's built-in csv module:
import csv
from synthea_pydantic import Patient
# Load from CSV file
with open('data/patients.csv') as f:
    reader = csv.DictReader(f)
    patients = [Patient(**row) for row in reader]
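For large exports it may be preferable to stream records rather than build a full list up front. The following is a minimal sketch of a reusable loader, assuming only that a model can be constructed with `model(**row)` as shown above. `iter_models` is a hypothetical helper name and is not part of synthea-pydantic.

```python
import csv
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def iter_models(path: str, model: Callable[..., T]) -> Iterator[T]:
    """Yield one model instance per CSV row (hypothetical helper)."""
    # newline="" is the setting the csv module documentation recommends for file objects
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield model(**row)
```

With this helper, `list(iter_models('data/patients.csv', Patient))` would produce the same list as the comprehension above, while a plain `for` loop over `iter_models(...)` keeps only one record in memory at a time.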
Working with Optional Fields
Synthea CSVs often have empty values. The models handle these gracefully:
# Empty strings in CSV are converted to None
patient = Patient(**{
    'Id': '123e4567-e89b-12d3-a456-426614174000',
    'BIRTHDATE': '1980-01-01',
    'DEATHDATE': '',  # Empty string becomes None
    'PREFIX': '',  # Empty string becomes None
    'FIRST': 'John',
    'LAST': 'Doe',
    # ... other required fields
})
assert patient.deathdate is None
assert patient.prefix is None
Type Validation
All fields are validated according to their types:
from decimal import Decimal
from datetime import date, datetime
from uuid import UUID
# UUIDs are automatically parsed
assert isinstance(patient.id, UUID)
# Dates are parsed from YYYY-MM-DD format
assert isinstance(patient.birthdate, date)
# Decimals maintain precision for monetary values
assert isinstance(patient.healthcare_expenses, Decimal)
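To make the conversions concrete, here is a small standard-library sketch of what the raw CSV strings turn into, and why Decimal is a sensible choice for money. The values are made up for illustration, and this shows the target types only, not the library's internal parsing.

```python
from datetime import date
from decimal import Decimal
from uuid import UUID

# Raw strings as csv.DictReader would return them (values are made up)
raw = {
    "Id": "123e4567-e89b-12d3-a456-426614174000",
    "BIRTHDATE": "1980-01-01",
    "HEALTHCARE_EXPENSES": "0.10",
}

patient_id = UUID(raw["Id"])
birthdate = date.fromisoformat(raw["BIRTHDATE"])
expenses = Decimal(raw["HEALTHCARE_EXPENSES"])

# Decimal arithmetic is exact for decimal amounts, unlike binary floats
assert expenses * 3 == Decimal("0.30")
assert 0.10 * 3 != 0.30
```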
Linking Related Data
Use the UUID foreign keys to link related records:
# Find all medications for a patient
patient_meds = [
    med for med in medications
    if med.patient == patient.id
]
# Find all conditions treated in an encounter
encounter_conditions = [
    cond for cond in conditions
    if cond.encounter == encounter.id
]
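The list comprehensions above rescan the whole list for every lookup. When many patients need to be joined to their records, grouping once by foreign key is usually cheaper. The sketch below uses `SimpleNamespace` stand-ins with made-up values instead of real `Medication` objects, so it only assumes that each record exposes a `patient` attribute.

```python
from collections import defaultdict
from types import SimpleNamespace

# Stand-ins for parsed Medication records; only the .patient attribute matters here
medications = [
    SimpleNamespace(patient="p1", description="med-a"),
    SimpleNamespace(patient="p2", description="med-b"),
    SimpleNamespace(patient="p1", description="med-c"),
]

# Group once, then look up each patient's records without rescanning the list
meds_by_patient = defaultdict(list)
for med in medications:
    meds_by_patient[med.patient].append(med)

# .get avoids creating an empty entry for patients with no medications
patient_meds = meds_by_patient.get("p1", [])
```

With real models the dictionary keys would be the UUID values of `med.patient`, and the lookup would use `patient.id`.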
Error Handling
The models provide clear error messages for invalid data:
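from pydantic import ValidationError
# invalid_data is any row dict that fails validation (for example, a malformed date)
try:
    patient = Patient(**invalid_data)
except ValidationError as e:
    print(f"Validation failed: {e}")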
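When loading a whole file, it can be more useful to collect the bad rows than to stop at the first one. The following is a rough sketch under two assumptions: the model is built with `model(**row)`, and validation failures surface as `ValueError` (Pydantic's `ValidationError` is a `ValueError` subclass). `parse_rows` is a hypothetical helper, not part of synthea-pydantic.

```python
from typing import Any, Callable, Iterable


def parse_rows(rows: Iterable[dict], model: Callable[..., Any]):
    """Split rows into parsed records and (row_number, error) pairs (hypothetical helper)."""
    records, errors = [], []
    # Start at 2 because row 1 of a CSV file is the header
    for row_number, row in enumerate(rows, start=2):
        try:
            records.append(model(**row))
        except ValueError as exc:  # also catches pydantic.ValidationError
            errors.append((row_number, exc))
    return records, errors
```

A call such as `parse_rows(csv.DictReader(f), Patient)` would then return the valid patients together with the row numbers that need attention.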
Model Details
Common Field Types
- IDs: UUID fields for primary and foreign keys
- Dates: date fields for dates (YYYY-MM-DD)
- Timestamps: datetime fields for date/time values
- Money: Decimal fields for monetary amounts
- Codes: String fields for medical codes (SNOMED-CT, RxNorm, etc.)
Base Model Features
All models inherit from SyntheaBaseModel which provides:
- Automatic whitespace stripping
- Empty string to None conversion
- Case-insensitive literal field matching
- Field alias support for CSV column mapping
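To illustrate how features like these can be expressed, here is a sketch of a base class written against Pydantic v2. It is not the library's actual SyntheaBaseModel: the class names `SketchBaseModel` and `SketchPatient` are hypothetical, the use of Pydantic v2 is an assumption, and case-insensitive literal matching is left out.

```python
from datetime import date
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict, Field, model_validator


class SketchBaseModel(BaseModel):
    """Hypothetical base class; not the library's SyntheaBaseModel."""

    # Strip whitespace from string fields and allow population by field name or alias
    model_config = ConfigDict(str_strip_whitespace=True, populate_by_name=True)

    @model_validator(mode="before")
    @classmethod
    def _empty_to_none(cls, data: Any) -> Any:
        # Replace empty or whitespace-only strings with None before field validation
        if isinstance(data, dict):
            return {
                key: (None if isinstance(value, str) and value.strip() == "" else value)
                for key, value in data.items()
            }
        return data


class SketchPatient(SketchBaseModel):
    # Aliases map the CSV column names onto Python attribute names
    first: str = Field(alias="FIRST")
    birthdate: date = Field(alias="BIRTHDATE")
    deathdate: Optional[date] = Field(default=None, alias="DEATHDATE")


patient = SketchPatient(FIRST="  John ", BIRTHDATE="1980-01-01", DEATHDATE="")
```

Running the conversion in a "before" validator means optional typed fields such as `deathdate` never see the empty string, so they do not fail date parsing.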
Development
Setup
To develop or contribute to synthea-pydantic:
# Clone the repository
git clone https://github.com/yourusername/synthea-pydantic.git
cd synthea-pydantic
# Install in development mode
uv pip install -e .
Running Tests
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=synthea_pydantic
# Run specific test file
uv run pytest tests/test_patients.py
Code Quality
# Type checking
uv run mypy synthea_pydantic/
# Linting
uv run ruff check
# Formatting
uv run ruff format
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
Synthea is a registered trademark of The MITRE Corporation.
Jason Walonoski, Mark Kramer, Joseph Nichols, Andre Quina, Chris Moesel, Dylan Hall, Carlton Duffett, Kudakwashe Dube, Thomas Gallagher, Scott McLachlan, Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record, Journal of the American Medical Informatics Association, Volume 25, Issue 3, March 2018, Pages 230–238, https://doi.org/10.1093/jamia/ocx079