
Generate fake patient reports as PDFs.


mednotegen

This project uses Synthea™ to generate realistic synthetic patient data for medical notes.


Usage

from mednotegen.generator import NoteGenerator

gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")

# Or specify Synthea CSV directory directly:
gen = NoteGenerator(synthea_csv_dir="/path/to/synthea/output/csv")
gen.generate_notes(10, "output_dir")

Using a Custom Synthea Directory with config.yaml

You can specify the Synthea CSV directory in your config file by adding a synthea_csv_dir entry to config.yaml.

Example config.yaml:

count: 10
output_dir: output_dir
synthea_csv_dir: /path/to/synthea/output/csv

Then generate notes using:

from mednotegen.generator import NoteGenerator

gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")

⚠️ Synthea Dependency Required

This project requires Synthea™, an open-source synthetic patient generator, as an external dependency. You must clone and build Synthea yourself before using mednotegen.

To set up Synthea:

  1. Clone Synthea
    git clone https://github.com/synthetichealth/synthea.git
    
  2. Build the Synthea JAR
    cd synthea
    ./gradlew build check test
    cp build/libs/synthea-with-dependencies.jar .
    cd ..
    
    Ensure synthea-with-dependencies.jar is in the synthea/ directory at the root of your project.

Configuration (config.yaml)

You can customize patient generation and report output using a config.yaml file. Example options:

count: 10                    # Number of reports to generate
output_dir: output_dir       # Output directory for PDFs
use_llm: false               # Use LLM for report generation
synthea_csv_dir: /path/to/synthea/output/csv   # Path to Synthea-generated CSV files
seed: 1234                   # Random seed for reproducibility
reference_date: "20250628"   # Reference date for data generation (YYYYMMDD)
clinician_seed: 5678         # Optional: separate seed for clinician assignment
gender: female               # male, female, or any
min_age: 30                  # Minimum patient age
max_age: 60                  # Maximum patient age
state: New York              # Synthea state parameter
modules:
  - cardiovascular-disease
  - diabetes      
  - hypertension
  - asthma          
local_config: custom_synthea.properties  # Custom Synthea config file
local_modules: ./synthea_modules         # Directory for custom modules

Option reference:

  • count: Number of reports to generate
  • output_dir: Directory to save generated PDFs
  • use_llm: If true, uses an OpenAI LLM to write the report text
  • synthea_csv_dir: Path to the Synthea-generated CSV files
  • seed: Random seed for reproducibility
  • reference_date: Reference date for age calculations (YYYYMMDD)
  • clinician_seed: Optional, separate seed for clinician assignment
  • gender: Gender filter for patients (male, female, or any)
  • min_age, max_age: Age range for patients
  • state: US state for Synthea simulation
  • modules: Synthea disease modules to enable
  • local_config: Path to a custom Synthea config file
  • local_modules: Directory for custom Synthea modules
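
Because several of these options constrain each other, it can help to sanity-check the values before generating anything. The following is a minimal sketch, not part of mednotegen: the function name validate_config and the exact rules are assumptions based only on the option descriptions above, and it works on an already-loaded dict rather than reading the YAML file itself.

```python
from datetime import datetime

# Gender values taken from the option list above.
ALLOWED_GENDERS = {"male", "female", "any"}


def validate_config(cfg: dict) -> list[str]:
    """Return a list of problems found in cfg; an empty list means none were found.

    Hypothetical helper: mednotegen may validate differently or not at all.
    """
    problems = []

    count = cfg.get("count")
    if not isinstance(count, int) or count < 1:
        problems.append("count must be a positive integer")

    if cfg.get("gender", "any") not in ALLOWED_GENDERS:
        problems.append("gender must be male, female, or any")

    min_age, max_age = cfg.get("min_age"), cfg.get("max_age")
    if min_age is not None and max_age is not None and min_age > max_age:
        problems.append("min_age must not exceed max_age")

    ref = cfg.get("reference_date")
    if ref is not None:
        text = str(ref)
        try:
            # Require exactly eight digits that form a real calendar date.
            if len(text) != 8 or not text.isdigit():
                raise ValueError(text)
            datetime.strptime(text, "%Y%m%d")
        except ValueError:
            problems.append("reference_date must use YYYYMMDD")

    return problems
```

Quoting reference_date in config.yaml, as in the example above, keeps it a string rather than an integer once the file is loaded.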

More Synthea Modules

For an up-to-date and complete list of available modules, see the official Synthea modules directory.


Troubleshooting:

Synthea Data Location

If you see errors about missing patients.csv, medications.csv, or conditions.csv, make sure you have generated Synthea data and that the path you provide (via synthea_csv_dir, CLI, or config) points to the correct directory containing those files.

If you installed mednotegen via pip, the default location is inside the package directory. For custom or system-wide Synthea runs, always specify the output CSV directory explicitly.
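
To confirm that a directory holds the files named above before handing it to mednotegen, a small check like the following may be useful. It is a sketch using only the standard library; the helper name missing_synthea_csvs is hypothetical, and the file list covers just the three files mentioned in this section (mednotegen may read others).

```python
from pathlib import Path

# The three files this section mentions; mednotegen may need more.
REQUIRED_CSVS = ("patients.csv", "medications.csv", "conditions.csv")


def missing_synthea_csvs(csv_dir: str) -> list[str]:
    """Return the required CSV file names that are absent from csv_dir."""
    base = Path(csv_dir)
    return [name for name in REQUIRED_CSVS if not (base / name).is_file()]
```

An empty result means all three files are present; anything else lists what still needs to be generated or where the path is wrong.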

  • No CSV files generated:
    • Make sure you edited the correct synthea.properties and used the -c flag when running Synthea.
    • Ensure exporter.csv.export = true is set and not overridden elsewhere in the file.
  • FileNotFoundError for CSVs:
    • Confirm the CSV files exist in the path specified by synthea_csv_dir or in the expected package location.
  • ValueError: No patients found matching the specified filters:
    • Check your age/gender filters in config.yaml. Try relaxing them if you have too few patients.
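
When the filter error appears, counting how many patients in patients.csv fall inside the requested range shows whether the filters or the data are the problem. The sketch below assumes the standard Synthea CSV export columns BIRTHDATE (YYYY-MM-DD) and GENDER (M or F). The function names are hypothetical, deceased patients are not excluded, and mednotegen's own filtering logic may differ.

```python
import csv
from datetime import date, datetime


def age_on(birth: date, ref: date) -> int:
    """Whole years between birth and ref."""
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)


def count_matching_patients(patients_csv, reference_date,
                            gender="any", min_age=0, max_age=200):
    """Count rows in patients.csv that satisfy the gender and age filters.

    reference_date uses YYYYMMDD, matching the config option of the same name.
    """
    ref = datetime.strptime(reference_date, "%Y%m%d").date()
    wanted = {"male": "M", "female": "F"}.get(gender)  # None means any gender
    matches = 0
    with open(patients_csv, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            if wanted and row["GENDER"] != wanted:
                continue
            birth = date.fromisoformat(row["BIRTHDATE"])
            if min_age <= age_on(birth, ref) <= max_age:
                matches += 1
    return matches
```

If the count is lower than the count option, either widen the age range or generate a larger population with Synthea.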

Configure Synthea to Export CSVs

Edit src/main/resources/synthea.properties in your Synthea directory:

exporter.csv.export = true

(Ensure any exporter.csv.export = false lines are removed or commented out.)
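
To spot a conflicting setting quickly, the file can be scanned for every active exporter.csv.export line. This is a simplified sketch: the helper name csv_export_settings is hypothetical, and it handles only the key = value form shown above, not the other separators or line continuations that Java properties files allow.

```python
def csv_export_settings(properties_path):
    """Return (line_number, value) for each uncommented exporter.csv.export line."""
    found = []
    with open(properties_path, encoding="utf-8") as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.strip()
            # Skip blank lines and comments ('#' or '!' in properties files).
            if not line or line.startswith(("#", "!")):
                continue
            key, sep, value = line.partition("=")
            if sep and key.strip() == "exporter.csv.export":
                found.append((lineno, value.strip().lower()))
    return found
```

If the result has more than one entry, the last one is the value Java's properties loader keeps, so it should read true.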

Generate Patient Data with Synthea

From your Synthea directory, clean any old output and generate new data:

rm -rf output/
java -jar synthea-with-dependencies.jar -c src/main/resources/synthea.properties -p 1000
  • The -p 1000 flag generates 1000 patients.
  • After running, check for CSV files in output/csv/.
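
The same command can be assembled from Python, for example when scripting repeated runs. The sketch below is an illustration, not how mednotegen invokes Synthea: build_synthea_command is a hypothetical helper, and the mapping of options to flags (-c, -s, -g, -a, -p, and a trailing state name) follows Synthea's documented command line, so check it against your Synthea version.

```python
def build_synthea_command(population, config_path=None, seed=None,
                          gender=None, min_age=None, max_age=None,
                          state=None, jar="synthea-with-dependencies.jar"):
    """Build the argument list for a Synthea run without executing it."""
    cmd = ["java", "-jar", jar]
    if config_path:
        cmd += ["-c", config_path]          # custom synthea.properties
    if seed is not None:
        cmd += ["-s", str(seed)]            # random seed
    if gender in ("male", "female"):
        cmd += ["-g", "M" if gender == "male" else "F"]
    if min_age is not None and max_age is not None:
        cmd += ["-a", f"{min_age}-{max_age}"]  # age range
    cmd += ["-p", str(population)]          # number of patients
    if state:
        cmd.append(state)                   # positional state name, e.g. "New York"
    return cmd
```

Passing the returned list to subprocess.run(cmd, cwd="synthea", check=True) would run it from the Synthea directory; a list avoids shell quoting issues with state names that contain spaces.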

Attribution

See README_SYNTHEA_NOTICE.md and LICENSE-APACHE-2.0 for license and attribution requirements.

