Synthetic patient record generator (Synthea-inspired) trained on pristine-healthy episode data

🩺 syntha

A Synthea-inspired hybrid synthetic patient record generator. It learns the joint distribution of real anonymized Turkish-cohort EHR episodes with a Gaussian copula, then layers Synthea-style clinical pathways on top to emit fully coded, Turkish-localized FHIR R4 bundles.

License: Apache 2.0 | Python 3.10+ | FHIR R4 | Locale: tr-TR


🖥️ Desktop app: generate synthetic patients without code

Download macOS Apple Silicon (.dmg) | Download Windows installer (.exe) | Download Linux AppImage

A Tauri 2 desktop app that bundles the trained Gaussian copula and samples synthetic patients fully client-side (no Python required). Pick cohort + n + seed + constraints, hit Generate, and download a CSV.

📦 Installers are produced by the release workflow on every v* tag push and live at stable filenames (syntha_aarch64.dmg, syntha_x64-setup.exe, syntha_amd64.AppImage). The buttons above all use releases/latest/download/… so they track the latest release automatically, with no manual link maintenance per version. A daily Install-buttons verification workflow HEAD-checks each URL and opens an issue if any of them returns a 404. Source for the app lives in app/.

🛡️ macOS says "syntha.app" is damaged? That is Gatekeeper's misleading error for unsigned apps. Until the signing pipeline ships (app/README.md → signing setup), strip the quarantine flag manually:

xattr -dr com.apple.quarantine /Applications/syntha.app


🔍 Why syntha?

Synthea is the gold standard for synthetic FHIR patients, but it is rules-only and tuned to US population priors. CTGAN-style purely generative models capture the data faithfully but can emit physiologically impossible tuples and have no clinical-pathway awareness. syntha combines both approaches:

| Capability | Synthea (rules-only) | CTGAN / copula-only | syntha (hybrid) |
|---|---|---|---|
| Matches this cohort's lab distributions | ❌ generic US priors | ✅ | ✅ |
| Coherent prescriptions per condition | ✅ | ❌ | ✅ |
| Physiologically valid (BP, eGFR…) | ✅ | ⚠️ sometimes | ✅ |
| LOINC + SNOMED + ICD-10 + RxNorm-coded FHIR | ✅ | ❌ | ✅ |
| Longitudinal trajectories | ✅ state machines | ❌ | ✅ drift + sticky flags |
| Turkish locale (names, addresses, displays) | ❌ | ❌ | ✅ |

🎯 What it produces

For each synthetic patient, syntha emits a FHIR R4 transaction Bundle containing:

  • 👤 Patient: Turkish HumanName + Address + tr language code, derived birthDate
  • 🧪 Observation × ~12: LOINC-coded labs and vitals (glucose, lipid panel, CBC, LFTs, eGFR, BP, …)
  • 🩺 Condition × N: every active comorbidity flag, dual-coded SNOMED CT + ICD-10, with English/Turkish display text
  • 🏥 Encounter × M: one per active condition, driven by the relevant clinical module
  • 💊 MedicationRequest × P: RxNorm-coded, dosage included
  • 🔬 Procedure × Q: e.g. HbA1c, lipid panel, ECG, spirometry
  • 📋 CarePlan × R: disease-specific lifestyle / monitoring plans

Plus a flat CSV that matches the input schema for drop-in use as training data.

⚠️ The catch (what it is not)

  • 🚫 Not a substitute for real PHI when validity hinges on rare events: the copula reproduces the bulk of the joint distribution, not the long tails.
  • 🚫 Not privacy-proof. Gaussian copulas are not differentially private; if the source has fewer than ~50 patients with a rare combination, syntha may reproduce that combination too closely. Do not use when the source is a small sensitive cohort without adding a DP mechanism.
  • 🚫 No disease progression simulator yet: the copula gives a cross-sectional snapshot; longitudinal mode adds plausible drift but is not a Synthea-PADM state machine. (See v0.8 in the roadmap.)
  • 🚫 The source CSVs are anonymized retrospective Turkish-cohort episodes of healthy patients, so synthetic disease prevalence is lower than Turkish national averages (TÜİK). If you need a population-representative Turkish cohort, calibrate per the v0.6 roadmap items.
  • ⚠️ Continuous↔binary correlations are attenuated by ~50% in magnitude (signs are correct since v0.3.2). Plain Spearman rank correlation on heavily tied binary columns is biased toward zero; the proper fix is a polyserial/tetrachoric correlation, queued as v0.4 in the roadmap. For most downstream uses (training risk models, healthy-control comparisons) this is acceptable; if you need exact lab↔disease correlations, wait for v0.4 or contribute the fix.
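
The privacy caveat in the list above can be checked before fitting. The following is a minimal sketch, assuming the source rows carry 0/1 comorbidity columns such as Hipertansiyon and DM_Tum; the function name rare_combinations and the demo rows are illustrative and not part of syntha:

```python
from collections import Counter
import csv
import io

def rare_combinations(rows, flag_columns, min_count=50):
    """Return flag combinations carried by fewer than `min_count` rows."""
    counts = Counter(
        tuple(int(row[col]) for col in flag_columns) for row in rows
    )
    return {combo: n for combo, n in counts.items() if n < min_count}

# Tiny in-memory stand-in for a source CSV (made-up rows, illustration only).
demo_csv = io.StringIO(
    "Hipertansiyon,DM_Tum\n"
    "0,0\n0,0\n0,0\n1,0\n1,1\n"
)
rows = list(csv.DictReader(demo_csv))

# A low threshold is used here only because the demo has five rows.
rare = rare_combinations(rows, ["Hipertansiyon", "DM_Tum"], min_count=2)
print(rare)
```

Any combination this returns at the ~50 threshold is a candidate for suppression or for a DP mechanism before the copula is fitted.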

🇹🇷 Turkish cohort + Turkish output

The training data comes from pristine_strict_episodes.csv and pristine_tolerant_episodes.csv: anonymized retrospective EHR episodes from a Turkish patient cohort selected to represent clinically pristine (i.e. healthy / minimally medicated) adults. Source CSVs are never committed to this repo (gitignored).

Synthetic output is Turkish-localized:

  • Patient names sampled from common Turkish given-name and family-name distributions (src/syntha/locale/turkish.py).
  • Addresses use real Turkish cities weighted by approximate population, with ISO 3166-2:TR province codes.
  • Every Condition emits both an English SNOMED display and a clinical-Turkish translation in Condition.code.text.
  • Patient.communication is set to tr.

All clinical terminology used (LOINC, SNOMED CT, ICD-10, RxNorm) comes from open international standards; no licensed terminology content is reproduced or embedded.

🧪 Use cases

| Where to use it | Why |
|---|---|
| 🤖 Training ML risk models without exposing real PHI | The copula preserves joint distributions, so a model trained on synthetic data should transfer reasonably to real test sets (to be measured by the TSTR benchmark in v0.9). |
| 🧬 Bioinformatics healthy-control cohorts | The source is pristine healthy episodes, so the synthetic patients can serve as a normal-baseline group to compare against your disease cohort. |
| 🛠️ EHR pipeline / ETL integration testing | Realistic-but-fake FHIR R4 bundles with valid LOINC/SNOMED/ICD-10/RxNorm codes are ideal for testing FHIR consumers, mapping pipelines, and OMOP/i2b2 ETLs without DPA paperwork. |
| 📚 Teaching / coursework | Drop-in dataset for biostatistics, epidemiology, and clinical-informatics teaching without IRB. |
| 🔬 Data augmentation | Boost rare-event coverage by oversampling synthetic patients with specific comorbidity combinations (conditional sampling lands in v0.7). |

🚀 Quick start

# 1. Install
git clone https://github.com/ArioMoniri/syntha.git
cd syntha
pip install -e .

# 2. (Optional) Ingest your source CSVs (files in data/raw/ are gitignored)
bash scripts/ingest_csvs.sh

# 3. Generate 1000 synthetic episodes + FHIR bundles + model card + validation report
syntha generate \
  --input data/raw/pristine_tolerant_episodes.csv \
  --output output/tolerant \
  --n 1000 --cohort tolerant

# 4. Longitudinal: 500 baseline patients × ~4 encounters over 3 years
syntha generate \
  --input data/raw/pristine_tolerant_episodes.csv \
  --output output/tolerant_long \
  --n 2000 --cohort tolerant \
  --longitudinal --encounters-per-patient 4 --years-of-history 3

# 5. Validate any synthetic CSV against its source
syntha validate \
  --source data/raw/pristine_tolerant_episodes.csv \
  --synthetic output/tolerant/synthetic_tolerant_episodes.csv \
  --output output/tolerant/validation.json

📊 Distribution fidelity

A 100-episode sample from the tolerant cohort compared with the full 135 569-row source:

Marginal distributions

[Figure: marginal distributions, source vs synthetic]

Spearman correlation structure

[Figure: Spearman correlations, source vs synthetic vs diff]

Disease prevalence

[Figure: comorbidity prevalence, source vs synthetic]

Numbers (from examples/sample_output/sample_validation_report.json)

| Metric | Value |
|---|---|
| n (source / synthetic) | 135 569 / 100 |
| Max Kolmogorov–Smirnov across continuous columns | 0.14 |
| Mean KS | 0.07 |
| Max binary-prevalence error | 0.025 (has_rx_data) |
| Disease-prevalence error (HTN / DM / hyperlipidemia) | 0.015 / 0.004 / 0.010 |
| Spearman correlation-matrix Frobenius diff | 2.94 |

📝 The KS statistic is below the typical 0.20 "noticeable difference" threshold for every continuous column (max 0.14). Binary marginals match to within 2.5 percentage points, and the three disease prevalences listed above to within 1.5 percentage points.
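
For readers unfamiliar with the metric, the two-sample KS statistic is the largest vertical gap between the two empirical CDFs. The sketch below is a plain-stdlib illustration of that definition; it is not taken from syntha's validation code, which may well use a library routine instead, and the sample values are made up:

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both CDFs are step functions, so the largest gap occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

source = [1.0, 2.0, 3.0, 4.0]      # made-up values, illustration only
synthetic = [1.5, 2.5, 3.5, 4.5]
print(ks_statistic(source, synthetic))
```

A value of 0 means the two samples have identical empirical distributions; 1 means they do not overlap at all.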

📦 Example output (embedded)

A pretty-printed sample FHIR Bundle, a 100-episode synthetic CSV, the model card, and the validation report all live under examples/sample_output/ and are tracked in git.

| File | Click to view (GitHub built-in viewer) | What's inside |
|---|---|---|
| 🧾 Full FHIR Bundle (pretty) | sample_bundle_pretty.json | One transaction Bundle: Patient + Observations + Conditions + Encounter + MedicationRequests + Procedure + CarePlan |
| 📡 100 bundles, NDJSON | sample_bundles.ndjson | Bulk-FHIR-style export, one transaction Bundle per line |
| 📊 Flat CSV | sample_episodes.csv | 100 synthetic episodes matching input schema |
| 🗒️ Model card | sample_model_card.json | source sha256, n_train, marginals, top correlations |
| ✅ Validation report | sample_validation_report.json | KS / Wasserstein / correlation-Frobenius per column |

💡 Embedded viewer. GitHub renders the linked JSON files with syntax highlighting and a collapsible outline (click the {} icon top-right of the file view). For full FHIR-aware validation and tree-view rendering, drag the file onto simplifier.net or paste it into the official HL7 Clinical FHIR Renderer.

👁️ Inline preview: first synthetic patient
{
  "resourceType": "Bundle",
  "type": "transaction",
  "timestamp": "2017-05-27T21:49:42Z",
  "entry": [
    {
      "resource": {
        "resourceType": "Patient",
        "id": "20f13c43-d17b-443b-b7a7-69ccc40631c6",
        "gender": "male",
        "name": [{"use": "official", "family": "Avcı", "given": ["Furkan"]}],
        "address": [{
          "use": "home", "type": "physical",
          "city": "İstanbul", "state": "TR-34", "country": "TR"
        }],
        "communication": [{
          "language": {"coding": [{"system": "urn:ietf:bcp:47", "code": "tr", "display": "Turkish"}]},
          "preferred": true
        }],
        "birthDate": "1975-…"
      }
    },
    {
      "resource": {
        "resourceType": "Observation",
        "code": {
          "coding": [{"system": "http://loinc.org", "code": "8480-6",
                       "display": "Systolic blood pressure"}]
        },
        "valueQuantity": {"value": 118.72, "unit": "mm[Hg]"}
      }
    },
    {
      "resource": {
        "resourceType": "Condition",
        "code": {
          "coding": [
            {"system": "http://snomed.info/sct", "code": "414545008",
             "display": "Ischemic heart disease (disorder)"},
            {"system": "http://hl7.org/fhir/sid/icd-10", "code": "I25.9",
             "display": "Chronic ischaemic heart disease, unspecified"}
          ],
          "text": "Ischemic heart disease (disorder) / İskemik kalp hastalığı"
        }
      }
    },
    {
      "resource": {
        "resourceType": "MedicationRequest",
        "medicationCodeableConcept": {
          "coding": [{
            "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
            "code": "243670", "display": "Aspirin 81 MG Oral Tablet"
          }]
        },
        "dosageInstruction": [{"text": "81 mg daily"}]
      }
    }
  ]
}
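
A consumer usually starts by walking Bundle.entry and dispatching on resourceType. The sketch below shows that pattern for one NDJSON line; it is an illustration with a hand-built, heavily trimmed Bundle, and resource_counts is a hypothetical helper name rather than a syntha API:

```python
import json
from collections import Counter

def resource_counts(bundle):
    """Count resources by resourceType in a FHIR Bundle dict."""
    return Counter(
        entry["resource"]["resourceType"] for entry in bundle.get("entry", [])
    )

# One NDJSON line, trimmed to the only fields this sketch reads.
line = json.dumps({
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"resource": {"resourceType": "Patient"}},
        {"resource": {"resourceType": "Observation"}},
        {"resource": {"resourceType": "Observation"}},
        {"resource": {"resourceType": "Condition"}},
    ],
})

counts = resource_counts(json.loads(line))
print(dict(counts))
```

Applied line by line to an NDJSON file, the same function gives a quick sanity check on how many resources of each type a run emitted.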
👁️ Inline preview: first 5 rows of the CSV

| RF_EPISODE2 | HASTA_ID | episode_date | gender | age | bp_sys | bp_dia | hdl | ldl | hgb | egfr | Hipertansiyon | DM_Tum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 92893619 | SYN_7D70431D | 2017-05-27 | M | 42 | 118.7 | 63.0 | 95.0 | 58.0 | 12.9 | 105.7 | 0 | 0 |
| … | … | … | … | … | … | … | … | … | … | … | … | … |

Full file: examples/sample_output/sample_episodes.csv (100 rows × 73 cols).

👁️ Inline preview: validation report summary
{
  "n_source": 135569,
  "n_synthetic": 100,
  "ks_max": 0.14,
  "ks_mean": 0.07,
  "binary_max_abs_error": 0.025,
  "correlation_frobenius": 2.94
}

🌐 FHIR endpoints

syntha emits canonical FHIR R4 resources, so every emitted resource type maps to its standard REST endpoint:

| Resource type | GET (read) | GET (search) | Create (POST to base) |
|---|---|---|---|
| 👤 Patient | GET /Patient/{id} | GET /Patient | as part of transaction Bundle |
| 🧪 Observation | GET /Observation/{id} | GET /Observation?subject={ref} | as part of transaction Bundle |
| 🩺 Condition | GET /Condition/{id} | GET /Condition?patient={id} | as part of transaction Bundle |
| 🏥 Encounter | GET /Encounter/{id} | GET /Encounter?patient={id} | as part of transaction Bundle |
| 💊 MedicationRequest | GET /MedicationRequest/{id} | GET /MedicationRequest?patient={id} | as part of transaction Bundle |
| 🔬 Procedure | GET /Procedure/{id} | GET /Procedure?patient={id} | as part of transaction Bundle |
| 📋 CarePlan | GET /CarePlan/{id} | GET /CarePlan?patient={id} | as part of transaction Bundle |
| 📦 Bundle | GET /Bundle/{id} | n/a | POST / (transaction) |

Spin up a demo FHIR server locally

syntha serve --bundles examples/sample_output/sample_bundles.ndjson --port 8080

Then:

curl http://127.0.0.1:8080/metadata           # CapabilityStatement
curl http://127.0.0.1:8080/Patient            # searchset Bundle (all Patients)
curl http://127.0.0.1:8080/Patient/{id}       # single Patient
curl http://127.0.0.1:8080/Observation        # all Observations
curl http://127.0.0.1:8080/\$export           # FHIR Bulk Data export (NDJSON)

This is a read-only demo server (stdlib http.server, no dependencies). For a production-grade FHIR server, POST the bundles to a HAPI / Microsoft FHIR / Google Healthcare API instance, as described below.

POST the bundles to any FHIR R4 server

scripts/post_to_fhir.sh POSTs every transaction Bundle in an NDJSON file to a configurable FHIR endpoint (default: the public HAPI test server):

# To the public HAPI playground:
bash scripts/post_to_fhir.sh examples/sample_output/sample_bundles.ndjson

# To your own server:
FHIR_BASE=http://localhost:8080/fhir bash scripts/post_to_fhir.sh

Once uploaded, you can browse the resources in any FHIR UI, e.g. HAPI's built-in browser or the Open Patient Browser.
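
If a shell is not available, the same upload can be approximated from Python. This is a rough sketch of what posting one transaction Bundle involves under the FHIR REST convention (POST to the server base with an application/fhir+json body); it is not a transcription of scripts/post_to_fhir.sh, and build_transaction_request is a hypothetical helper name:

```python
import json
import urllib.request

def build_transaction_request(fhir_base, bundle):
    """Build (but do not send) a POST of one transaction Bundle to the server base URL."""
    body = json.dumps(bundle).encode("utf-8")
    return urllib.request.Request(
        url=fhir_base.rstrip("/"),
        data=body,
        method="POST",
        headers={"Content-Type": "application/fhir+json"},
    )

req = build_transaction_request(
    "http://localhost:8080/fhir/",
    {"resourceType": "Bundle", "type": "transaction", "entry": []},
)
print(req.get_method(), req.full_url)

# Sending is a single call, left out here so the sketch runs offline:
#     urllib.request.urlopen(req)
```

For an NDJSON file, the request would be built once per line after json.loads on that line.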

🧱 Architecture

┌──────────────┐   ┌───────────────────┐   ┌──────────────────────┐
│  Source CSV  │──▶│  Gaussian copula  │──▶│ Physiologic filter   │
│ (Turkish     │   │ (Spearman → ρ;    │   │ (BP, Friedewald,     │
│  pristine)   │   │ nearest-PSD)      │   │  eGFR/creatinine)    │
└──────────────┘   └───────────────────┘   └──────────┬───────────┘
                                                      │
                    ┌─────────────────────────────────┴──────┐
                    │                                        │
                    ▼                                        ▼
          ┌──────────────────┐                  ┌──────────────────────────┐
          │ Longitudinal     │   (optional)     │  Direct single-episode   │
          │ expansion        │ ────────────────▶│  CSV + FHIR R4 export    │
          │ (drift, Poisson) │                  │  with Synthea-style      │
          └─────────┬────────┘                  │  module activation       │
                    │                           └──────────────────────────┘
                    ▼
            (same FHIR export)
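
The physiologic filter stage in the diagram can be pictured as a set of row-level predicates applied to each sampled row. The following is a minimal sketch under stated assumptions: it checks only that systolic exceeds diastolic pressure and that LDL agrees with the Friedewald estimate (LDL = total cholesterol − HDL − triglycerides / 5, in mg/dL). The column names total_chol and tg, the tolerance, and the demo values are hypothetical; syntha's actual rules are described in docs/ARCHITECTURE.md.

```python
def friedewald_ldl(total_chol, hdl, tg):
    """Friedewald estimate of LDL cholesterol, all values in mg/dL."""
    return total_chol - hdl - tg / 5.0

def is_plausible(row, ldl_tolerance=15.0):
    """Return True if a sampled row passes the two illustrative checks."""
    # Systolic pressure must exceed diastolic pressure.
    if row["bp_sys"] <= row["bp_dia"]:
        return False
    # Reported LDL should sit near the Friedewald estimate (tolerance is an assumption).
    expected_ldl = friedewald_ldl(row["total_chol"], row["hdl"], row["tg"])
    return abs(row["ldl"] - expected_ldl) <= ldl_tolerance

# Made-up rows, illustration only.
good = {"bp_sys": 118.7, "bp_dia": 63.0,
        "total_chol": 180.0, "hdl": 60.0, "tg": 100.0, "ldl": 100.0}
bad = dict(good, bp_sys=60.0)  # systolic below diastolic

print(is_plausible(good), is_plausible(bad))
```

Rows that fail such a filter can either be rejected and resampled or repaired; which of the two syntha does is not stated here.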

Read docs/ARCHITECTURE.md for the math (Spearman→Gaussian transform, nearest-PSD projection, constraint rules).
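
As a pointer to what the Spearman→Gaussian step involves: for a bivariate normal, a Spearman rank correlation ρ_s corresponds to a Pearson correlation ρ = 2·sin(π·ρ_s / 6). The sketch below applies that standard relation; whether syntha uses exactly this form is an assumption, and the function name is illustrative.

```python
import math

def spearman_to_gaussian_rho(rho_s):
    """Map a Spearman rank correlation to the Pearson correlation of the latent bivariate normal."""
    return 2.0 * math.sin(math.pi * rho_s / 6.0)

for rho_s in (0.0, 0.5, 1.0):
    print(rho_s, round(spearman_to_gaussian_rho(rho_s), 4))
```

The mapping is applied entry by entry, so the resulting matrix is not guaranteed to be positive semidefinite, which is why a nearest-PSD projection follows it.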

🧬 Synthea-style clinical modules

Nine modules ship out of the box (src/syntha/modules/); each fires on its corresponding source-CSV comorbidity flag:

| Module | Flag(s) | Emits |
|---|---|---|
| 🫀 Hypertension | Hipertansiyon | Encounter, 1–2 antihypertensives (stage 2 → dual), CarePlan |
| 🍬 Diabetes | DM_Tum, DM_Komplikasyonlu | Encounter, HbA1c, metformin (+ insulin if severe), CarePlan |
| 🧀 Hyperlipidemia | Hiperlipidemi | Encounter, lipid panel, statin (high-intensity if LDL ≥ 190) |
| 🦋 Thyroid | Tiroid | Encounter, TSH, levothyroxine |
| 😔 Depression | Depresyon | Psych encounter, sertraline, CBT CarePlan |
| 😰 Anxiety | Anksiyete | Psych encounter, escitalopram (or buspirone if already on SSRI) |
| ❤️ IHD | Iskemik_Kalp | Cardiology encounter, ECG, aspirin + β-blocker + statin |
| 🌬️ Asthma | Astim | Resp encounter, spirometry, SABA + ICS |
| 🚭 COPD | COPD | Resp encounter, spirometry, LABA + SABA |

See docs/MODULES.md for the authoring guide. Clinician contributions for TR-specific drug choices are highly welcome; see CONTRIBUTING.md.
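
To make the "fires on its flag" idea concrete, here is a hypothetical sketch of the hyperlipidemia rule from the table (statin, high-intensity if LDL ≥ 190). The function name, the dict-based row, and the "moderate" label for the default case are assumptions for illustration; the real module interface is documented in docs/MODULES.md.

```python
def statin_intensity(row):
    """Pick a statin intensity for one row, or None if the module does not fire."""
    if int(row.get("Hiperlipidemi", 0)) != 1:
        return None  # flag not set: the module stays silent
    # Threshold taken from the module table; the "moderate" label is an assumption.
    return "high" if float(row["ldl"]) >= 190 else "moderate"

print(statin_intensity({"Hiperlipidemi": 1, "ldl": 200.0}))
print(statin_intensity({"Hiperlipidemi": 1, "ldl": 120.0}))
print(statin_intensity({"Hiperlipidemi": 0, "ldl": 200.0}))
```

Keeping each rule a pure function of the row makes it easy for a clinician to review and for a test to pin down.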

🛠️ CLI reference

| Command | Description |
|---|---|
| syntha generate | End-to-end: train copula + sample + modules + CSV/FHIR + model card + validation report |
| syntha fit | Fit and persist a copula in a registry without sampling |
| syntha sample | Raw sampling from a registered model |
| syntha fhir | Convert an existing synthetic CSV to FHIR bundles |
| syntha validate | KS / Wasserstein / correlation diff between source and synthetic |
| syntha serve | Boot a read-only FHIR R4 demo server from a bundles NDJSON file |
| syntha export-model | Export a registered copula to a compact JSON the desktop app consumes |
| syntha list-models | List models in a registry |
| syntha show-card | Print a model card |

Run syntha <cmd> --help for full option lists.

🗺️ Roadmap

The full phased roadmap (v0.1 → v1.0) lives in ROADMAP.md. Highlights:

  • v0.6: clinician curation 🟣 (needs Dr. Moniri or a collaborator)
  • v0.7: optional CTGAN/TVAE backend ⬜
  • v0.8: true Synthea PADM-style state machines ⬜
  • v0.9: TSTR benchmark ⬜
  • v1.0: PyPI + paper ⬜

🤝 Contributing + clinician curation welcome

There are three ways to feed clinical guidance into syntha; pick whichever has the least friction for you:

1. 🚀 Just tell me (lowest friction)

Reply in any open conversation with Claude (the agent that maintains this repo) saying e.g.

"In Türkiye, perindopril 5 mg is the typical first-line ACEi for uncomplicated hypertension per TKD 2023; switch the default in the hypertension module."

…and I'll edit the relevant file, push, and re-run CI. No GitHub UI needed.

2. 📝 GitHub issue (recommended for asynchronous tracking)

Open an issue using the 🧑‍⚕️ Clinical curation template (one click):

👉 Open a Clinical curation issue 👈

The template pre-lists the files most likely to need changes:

| If you want to change… | Edit this file |
|---|---|
| Which drug a module prescribes | src/syntha/modules/<condition>.py |
| The RxNorm code or dose text | src/syntha/fhir/rxnorm.py |
| The SNOMED / ICD-10 code for a Condition | src/syntha/fhir/codes.py |
| Turkish display strings | src/syntha/locale/turkish.py |
| Prevalence calibration / disease-progression rules | src/syntha/longitudinal.py |

3. 🔧 Pull request

git clone https://github.com/ArioMoniri/syntha
cd syntha
pip install -e ".[dev]"
# … edit files …
pytest -q
git checkout -b clinical/<short-topic>
git commit -am "clinical: <what you changed and why>"
git push -u origin clinical/<short-topic>
gh pr create   # or open via the GitHub UI

What's currently flagged 🟣 (waiting for clinician input)

Per ROADMAP.md → v0.6:

  • 🟣 TR-specific first-line drug calibration: current defaults are international (lisinopril/amlodipine for HTN, metformin for DM, atorvastatin for hyperlipidemia). Turkish primary-care reality may differ (e.g. perindopril, nebivolol).
  • 🟣 New modules: CKD staging (eGFR-driven), MAFLD (ALT/AST + obesity), anemia (Hb-driven), B12 deficiency (vit B12 column directly available).
  • 🟣 Prevalence calibration to TÜİK: synthetic disease rates currently mirror the pristine-healthy source cohort. To use syntha as a Turkish-population baseline rather than a healthy baseline, the marginals should be calibrated to TÜİK figures.
  • 🟣 Turkish display string review: confirm clinical-Turkish preferred terms match Türk Tabipleri Birliği / TR-specific usage rather than literal translations.
  • 🟣 ICD-10 specificity: the current mapping uses unspecified (".9") forms; specifying further (E11.65, I50.32, etc.) when the source flag carries the information would improve downstream realism.

Full developer guide: CONTRIBUTING.md. All PRs must pass the CI matrix (Py 3.10 → 3.13) before merge.

📄 License + citation

Apache 2.0 © 2026 Ariorad Moniri. See LICENSE.

If you use syntha in academic work, please cite:

Moniri, A. (2026). syntha: hybrid synthetic patient record generator
trained on Turkish pristine-healthy EHR cohorts.
https://github.com/ArioMoniri/syntha

Acknowledgements

  • 🩺 Synthea: the inspiration for the clinical-module layer and FHIR output format.
  • 🌐 Open clinical terminologies: LOINC, SNOMED CT, ICD-10, RxNorm.
  • 📊 The anonymized Turkish-cohort EHR data used to train the copula (de-identified by the upstream data steward; never redistributed by this repo).

Contributors

Ariorad Moniri

💻 🎨 📖 🚧 🤔 👀 🚇 ⚠️

This project follows the all-contributors specification; contributions of any kind are welcome. Comment @all-contributors please add @username for code,doc on an issue or PR to nominate someone.

💬 Community

  • 🗨️ GitHub Discussions: open questions, "is this the right tool for X?", show-and-tell
  • 🐛 Issues: bug reports + feature requests + clinical-curation
  • 📖 Contributing: dev setup + commit conventions + clinical-curation workflow
