Generate fake patient reports as PDFs.
mednotegen
This project uses Synthea™ to generate realistic synthetic patient data for medical notes.
Usage
from mednotegen.generator import NoteGenerator
gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")
# Or specify Synthea CSV directory directly:
gen = NoteGenerator(synthea_csv_dir="/path/to/synthea/output/csv")
gen.generate_notes(10, "output_dir")
Using a Custom Synthea Directory with config.yaml
You can set the Synthea CSV directory in your config file by adding a synthea_csv_dir entry to config.yaml.
Example config.yaml:
count: 10
output_dir: output_dir
synthea_csv_dir: /path/to/synthea/output/csv
Then generate notes using:
from mednotegen.generator import NoteGenerator
gen = NoteGenerator.from_config("config.yaml")
gen.generate_notes(10, "output_dir")
⚠️ Synthea Dependency Required
This project requires Synthea™, an open-source synthetic patient generator, as an external dependency. You must clone and build Synthea yourself before using mednotegen.
To set up Synthea:
- Clone Synthea
git clone https://github.com/synthetichealth/synthea.git
- Build the Synthea JAR
cd synthea
./gradlew build check test
cp build/libs/synthea-with-dependencies.jar .
cd ..
Ensure synthea-with-dependencies.jar is in the synthea/ directory at the root of your project.
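Before generating notes, it can help to confirm the JAR is where the setup steps put it. The following is a minimal sketch, not part of mednotegen: the helper name find_synthea_jar and its project_root parameter are hypothetical, and only the file name and the synthea/ location come from the steps above.

```python
from pathlib import Path

JAR_NAME = "synthea-with-dependencies.jar"


def find_synthea_jar(project_root="."):
    """Return the path to the Synthea JAR, or raise if it is missing.

    Hypothetical helper: assumes the JAR was copied into <project_root>/synthea/.
    """
    jar_path = Path(project_root) / "synthea" / JAR_NAME
    if not jar_path.is_file():
        raise FileNotFoundError(
            f"Expected {JAR_NAME} at {jar_path}; build Synthea and copy the JAR there."
        )
    return jar_path
```

Calling find_synthea_jar() from the project root raises FileNotFoundError with the expected location if the build or copy step was skipped.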
Configuration (config.yaml)
You can customize patient generation and report output using a config.yaml file. Example options:
count: 10 # Number of reports to generate
output_dir: output_dir # Output directory for PDFs
use_llm: false # Use LLM for report generation
synthea_csv_dir: /path/to/synthea/output/csv # Path to Synthea-generated CSV files
seed: 1234 # Random seed for reproducibility
reference_date: "20250628" # Reference date for data generation (YYYYMMDD)
clinician_seed: 5678 # Optional: separate seed for clinician assignment
gender: female # male, female, or any
min_age: 30 # Minimum patient age
max_age: 60 # Maximum patient age
state: New York # Synthea state parameter
modules:
- cardiovascular-disease
- diabetes
- hypertension
- asthma
local_config: custom_synthea.properties # Custom Synthea config file
local_modules: ./synthea_modules # Directory for custom modules
- count: Number of reports to generate
- output_dir: Directory to save generated PDFs
- use_llm: If true, uses an OpenAI LLM to write the report text
- seed: Random seed for reproducibility
- reference_date: Reference date for age calculations (YYYYMMDD)
- clinician_seed: Optional, separate seed for clinician assignment
- gender: Gender filter for patients (male, female, or any)
- min_age, max_age: Age range for patients
- state: US state for Synthea simulation
- modules: Synthea disease modules to enable
- local_config: Path to a custom Synthea config file
- local_modules: Directory for custom Synthea modules
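A few of these options have constraints that are easy to get wrong (the gender values, the age range, the YYYYMMDD date format). The sketch below checks a config that has already been loaded into a dict. It is an illustration only: validate_config is a hypothetical helper, and mednotegen may validate these options differently or not at all.

```python
from datetime import datetime

ALLOWED_GENDERS = {"male", "female", "any"}


def validate_config(cfg):
    """Return a list of problems found in a config dict (empty if none).

    Hypothetical check based only on the option descriptions above.
    """
    problems = []
    if cfg.get("gender", "any") not in ALLOWED_GENDERS:
        problems.append("gender must be male, female, or any")

    min_age, max_age = cfg.get("min_age"), cfg.get("max_age")
    if min_age is not None and max_age is not None and min_age > max_age:
        problems.append("min_age must not exceed max_age")

    ref = cfg.get("reference_date")
    if ref is not None:
        ref = str(ref)
        # Require exactly eight digits first, because strptime alone
        # would also accept shorter strings such as "2025628".
        if len(ref) != 8 or not ref.isdigit():
            problems.append("reference_date must be YYYYMMDD")
        else:
            try:
                datetime.strptime(ref, "%Y%m%d")
            except ValueError:
                problems.append("reference_date is not a valid calendar date")
    return problems
```

For the example config above, validate_config returns an empty list; swapping min_age and max_age would produce one problem.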
More Synthea Modules
For an up-to-date and complete list of available modules, see the official Synthea modules directory.
Troubleshooting:
Synthea Data Location
If you see errors about missing patients.csv, medications.csv, or conditions.csv, make sure you have generated Synthea data and that the path you provide (via synthea_csv_dir, CLI, or config) points to the correct directory containing those files.
If you installed mednotegen via pip, the default location is inside the package directory. For custom or system-wide Synthea runs, always specify the output CSV directory explicitly.
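To see which of the three files named above is missing from a directory, a small check like the following can be run before calling mednotegen. This is a sketch: missing_synthea_csvs is a hypothetical helper, and the list covers only the files mentioned in this section, so the package may need others as well.

```python
from pathlib import Path

# Only the files named in this section; mednotegen may read more.
REQUIRED_CSVS = ("patients.csv", "medications.csv", "conditions.csv")


def missing_synthea_csvs(csv_dir):
    """Return the required CSV file names that are absent from csv_dir."""
    csv_dir = Path(csv_dir)
    return [name for name in REQUIRED_CSVS if not (csv_dir / name).is_file()]
```

An empty list means all three files are present in the directory you pass as synthea_csv_dir.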
- No CSV files generated:
  - Make sure you edited the correct synthea.properties and used the -c flag when running Synthea.
  - Ensure exporter.csv.export = true is set and not overridden elsewhere in the file.
- FileNotFoundError for CSVs:
  - Confirm the CSV files exist in the path specified by synthea_csv_dir or in the expected package location.
- ValueError: No patients found matching the specified filters:
  - Check your age/gender filters in config.yaml. Try relaxing them if you have too few patients.
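For the "No patients found" error, counting how many patients pass a given filter shows whether the filters or the data are the problem. The sketch below makes several assumptions that are not confirmed by this README: that patients.csv has BIRTHDATE (YYYY-MM-DD) and GENDER (M or F) columns, and that age is measured in whole years at reference_date. The function names are hypothetical, and mednotegen's own filtering may differ.

```python
import csv
from datetime import date, datetime

# Assumed Synthea encoding of the GENDER column.
GENDER_CODES = {"male": "M", "female": "F"}


def age_on(birthdate, reference):
    """Whole years between birthdate and reference."""
    years = reference.year - birthdate.year
    if (reference.month, reference.day) < (birthdate.month, birthdate.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def count_matching(rows, gender, min_age, max_age, reference_date):
    """Count rows passing the gender and age filters (hypothetical helper)."""
    reference = datetime.strptime(reference_date, "%Y%m%d").date()
    wanted = GENDER_CODES.get(gender)  # None means "any"
    matches = 0
    for row in rows:
        if wanted and row["GENDER"] != wanted:
            continue
        born = date.fromisoformat(row["BIRTHDATE"])
        if min_age <= age_on(born, reference) <= max_age:
            matches += 1
    return matches


def count_in_file(patients_csv, **filters):
    """Apply count_matching to a patients.csv file on disk."""
    with open(patients_csv, newline="") as handle:
        return count_matching(csv.DictReader(handle), **filters)
```

If the count is zero or very small, widen min_age and max_age or set gender to any, or generate a larger population.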
Configure Synthea to Export CSVs
Edit src/main/resources/synthea.properties in your Synthea directory:
exporter.csv.export = true
(Ensure any exporter.csv.export = false lines are removed or commented out.)
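Because a later assignment of the same key in a Java properties file replaces an earlier one, a stray exporter.csv.export = false further down the file can silently disable CSV export. The sketch below reports the effective setting. It is a simplified reader (it ignores line continuations and the ":" separator), and csv_export_enabled is a hypothetical helper, not part of Synthea or mednotegen.

```python
def csv_export_enabled(properties_text):
    """Return True if the last uncommented exporter.csv.export line is true.

    Simplified parsing: handles "key = value" lines and # or ! comments only.
    """
    value = None
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, val = line.partition("=")
        if sep and key.strip() == "exporter.csv.export":
            value = val.strip().lower()  # later assignments win
    return value == "true"
```

To check a real file, pass its contents, for example the text read from src/main/resources/synthea.properties.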
Generate Patient Data with Synthea
From your Synthea directory, clean any old output and generate new data:
rm -rf output/
java -jar synthea-with-dependencies.jar -c src/main/resources/synthea.properties -p 1000
- The -p 1000 flag generates 1000 patients.
- After running, check for CSV files in output/csv/.
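The same run can be scripted from Python. The sketch below builds the exact command shown above and then lists the CSV files it produced. The function names are hypothetical, the script assumes java is on your PATH, and it does not remove old output, so clear output/ first if you need a clean run.

```python
import subprocess
from pathlib import Path


def synthea_command(population, properties="src/main/resources/synthea.properties"):
    """Build the Synthea command line shown above as an argument list."""
    return [
        "java", "-jar", "synthea-with-dependencies.jar",
        "-c", properties,
        "-p", str(population),
    ]


def run_synthea(synthea_dir, population=1000):
    """Run Synthea inside synthea_dir and return the CSV file names it wrote."""
    subprocess.run(synthea_command(population), cwd=synthea_dir, check=True)
    csv_dir = Path(synthea_dir) / "output" / "csv"
    return sorted(path.name for path in csv_dir.glob("*.csv"))
```

An empty list from run_synthea("synthea") suggests CSV export is still disabled in the properties file that was passed with -c.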
Attribution
See README_SYNTHEA_NOTICE.md and LICENSE-APACHE-2.0 for license and attribution requirements.