Synthetic outpatient scheduling dataset generator (slots, patients, appointments).
Medscheduler
Synthetic Outpatient Appointment Data Generator
medscheduler creates realistic, privacy-safe outpatient datasets — including appointment calendars, patient demographics, and visit outcomes — suitable for education, analytics, and research in healthcare operations.
Features
- End-to-end simulation
  - Generates `slots`, `patients`, and `appointments` tables.
  - Reproduces booking, cancellation, and rebooking dynamics.
  - Simulates punctuality and in-clinic timing (arrival, start, end, waiting time, duration).
- Realistic defaults
  - Parameters reflect NHS England outpatient activity (2023–24) and peer-reviewed literature.
- Configurable
  - Calendar structure (days, hours, slot density), fill rate, booking horizon, lead time, attendance outcomes, rebooking intensity, demographics, and randomness.
- Reproducible
  - Controlled via `seed` and `noise`.
- Lightweight
  - Minimal scientific-Python dependencies; plotting utilities are optional.
Installation
pip install medscheduler
Requires Python 3.9 or newer.
Quickstart
from medscheduler import AppointmentScheduler

# Initialize the scheduler
sched = AppointmentScheduler(
    seed=42,
    date_ranges=[("2024-01-01", "2024-12-31")],
    working_days=[0, 1, 2, 3, 4],  # Monday–Friday
    appointments_per_hour=4,       # 15-minute slots
    fill_rate=0.9,
)

# Run the full pipeline
slots_df, appointments_df, patients_df = sched.generate()

# Export CSV files
sched.to_csv(
    slots_path="slots.csv",
    appointments_path="appointments.csv",
    patients_path="patients.csv",
)
Outputs:
| Table | Description |
|---|---|
| `slots.csv` | Calendar capacity (one row per slot). |
| `patients.csv` | Synthetic patient registry (demographics). |
| `appointments.csv` | Central table combining patient, slot, timing, and outcome data. |
Core Concepts
Calendar and Capacity
- `date_ranges` and `ref_date` delimit the simulation window and separate past from future.
- `working_days`, `working_hours`, and `appointments_per_hour` define slot structure and density.
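As a rough illustration of how these calendar parameters could combine into a slot grid, the sketch below enumerates slot start times using only the standard library. The helper `build_slots` and the `(start_hour, end_hour)` form of `working_hours` are assumptions made for this example; they are not medscheduler's actual implementation or signature.

```python
from datetime import date, datetime, timedelta

def build_slots(start, end, working_days, working_hours, appointments_per_hour):
    """Hypothetical helper: list slot start times between two dates (inclusive)."""
    slot_minutes = 60 // appointments_per_hour   # e.g. 4 per hour -> 15-minute slots
    open_hour, close_hour = working_hours        # assumed (start_hour, end_hour) pair
    slots = []
    day = start
    while day <= end:
        if day.weekday() in working_days:        # 0 = Monday, as in the Quickstart
            t = datetime(day.year, day.month, day.day, open_hour)
            closing = datetime(day.year, day.month, day.day, close_hour)
            while t < closing:
                slots.append(t)
                t += timedelta(minutes=slot_minutes)
        day += timedelta(days=1)
    return slots

# One calendar week (Mon 1 Jan to Sun 7 Jan 2024), 08:00-16:00, Monday-Friday
week = build_slots(date(2024, 1, 1), date(2024, 1, 7), [0, 1, 2, 3, 4], (8, 16), 4)
print(len(week))  # 5 days * 8 hours * 4 slots per hour = 160
```

Doubling `appointments_per_hour` halves the slot length and doubles capacity, which is the sense in which it sets slot density.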
Booking Dynamics
- `fill_rate` controls overall utilization.
- `booking_horizon` and `median_lead_time` shape how far ahead and how early patients book.
- `rebook_category` (`min`, `med`, `max`) defines the probability of rebooking cancellations.
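One plausible reading of these booking parameters is sketched below: a share of slots equal to `fill_rate` is marked as booked, and each booking gets a lead time capped at `booking_horizon`. The exponential lead-time shape, the example values (median 10 days, horizon 30 days), and the helper names are all assumptions for illustration; the library's real sampling scheme may differ.

```python
import math
import random

def book_slots(n_slots, fill_rate, rng):
    """Hypothetical helper: choose which slot indices get booked."""
    n_booked = round(n_slots * fill_rate)
    return sorted(rng.sample(range(n_slots), n_booked))

def sample_lead_time(median_lead_time, booking_horizon, rng):
    """Hypothetical helper: whole days between booking and appointment.

    Assumes an exponential shape whose median equals median_lead_time,
    capped at booking_horizon.
    """
    rate = math.log(2) / median_lead_time        # exponential median = ln(2) / rate
    return min(int(rng.expovariate(rate)), booking_horizon)

rng = random.Random(42)
booked = book_slots(n_slots=160, fill_rate=0.9, rng=rng)
lead_times = [sample_lead_time(10, 30, rng) for _ in booked]
print(len(booked))              # 144 of 160 slots booked
print(max(lead_times) <= 30)    # True: never beyond the horizon
```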
Attendance and Flow
- `status_rates` determines attended / cancelled / did not attend / unknown proportions.
- `visits_per_year` and `first_attendance` regulate repeat visits and the share of new patients.
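To make the role of `status_rates` concrete, the sketch below draws one outcome per appointment from a set of proportions using weighted sampling. The rates shown are illustrative placeholders, not the library's defaults, and `assign_statuses` is a hypothetical helper.

```python
import random
from collections import Counter

# Illustrative proportions only; not the library's defaults
status_rates = {
    "attended": 0.75,
    "cancelled": 0.15,
    "did not attend": 0.08,
    "unknown": 0.02,
}

def assign_statuses(n, rates, seed):
    """Hypothetical helper: draw one outcome per past appointment."""
    rng = random.Random(seed)
    labels = list(rates)
    weights = list(rates.values())
    return rng.choices(labels, weights=weights, k=n)

statuses = assign_statuses(10_000, status_rates, seed=42)
shares = {s: c / len(statuses) for s, c in Counter(statuses).items()}
print(shares)  # observed shares land close to the configured rates
```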
Demographics
- `age_gender_probs`, `bin_size`, `lower_cutoff`, `upper_cutoff`, `truncated` control the cohort, derived from NHS distributions by default.
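A minimal sketch of cohort sampling from a joint age and sex distribution follows. The probability table is made up, its `(band start, sex)` key layout is an assumption about how such a table might be shaped, and the cutoff and truncation parameters are deliberately not modelled here.

```python
import random

bin_size = 10
# Made-up joint probabilities keyed by (lower age of band, sex); they sum to 1
age_gender_probs = {
    (20, "female"): 0.15, (20, "male"): 0.10,
    (30, "female"): 0.20, (30, "male"): 0.15,
    (40, "female"): 0.22, (40, "male"): 0.18,
}

def sample_patient(probs, bin_size, rng):
    """Hypothetical helper: draw a band and sex jointly, then an age inside the band."""
    keys = list(probs)
    band_start, sex = rng.choices(keys, weights=[probs[k] for k in keys], k=1)[0]
    age = rng.randrange(band_start, band_start + bin_size)
    return age, sex

rng = random.Random(42)
cohort = [sample_patient(age_gender_probs, bin_size, rng) for _ in range(1000)]
print(cohort[:3])
```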
Timing
- `check_in_time_mean` controls early/late arrivals.
- Durations follow a Beta(1.48, 3.6) model (mean ≈ 17 minutes).
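The stated mean can be checked with a short calculation. The mean of a Beta(a, b) variable is a / (a + b), so Beta(1.48, 3.6) averages about 0.291 on the unit interval. The sketch below assumes the draw is scaled to a 0 to 60 minute range, which is not stated in the source; under that assumption the mean comes out near 17.5 minutes, in line with the quoted figure.

```python
import random

ALPHA, BETA = 1.48, 3.6
MAX_MINUTES = 60  # assumed scaling range, not stated in the source

# Analytic mean of Beta(a, b) is a / (a + b)
mean_fraction = ALPHA / (ALPHA + BETA)
mean_minutes = mean_fraction * MAX_MINUTES
print(round(mean_fraction, 3), round(mean_minutes, 1))  # 0.291 17.5

# Monte Carlo check with the standard library
rng = random.Random(42)
draws = [rng.betavariate(ALPHA, BETA) * MAX_MINUTES for _ in range(50_000)]
sim_mean = sum(draws) / len(draws)
print(round(sim_mean, 1))
```

Because the first shape parameter is smaller than the second, the distribution is right-skewed: most simulated consultations are shorter than the mean, with a tail of long ones.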
Randomness
- `seed` ensures reproducibility; `noise` introduces controlled variability.
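The interplay of a seed and a noise level can be sketched as follows: each rate is perturbed by a bounded random factor, the result is renormalized, and the same seed always reproduces the same perturbation. The helper `jitter_rates`, the multiplicative form of the noise, and the example rates are assumptions for illustration, not a description of how the library applies `noise` internally.

```python
import random

def jitter_rates(rates, noise, seed):
    """Hypothetical helper: perturb each rate by up to +/- noise, then renormalize."""
    rng = random.Random(seed)
    jittered = {k: max(v * (1 + rng.uniform(-noise, noise)), 0.0) for k, v in rates.items()}
    total = sum(jittered.values())
    return {k: v / total for k, v in jittered.items()}

# Illustrative proportions only
base = {"attended": 0.75, "cancelled": 0.15, "did not attend": 0.08, "unknown": 0.02}

a = jitter_rates(base, noise=0.1, seed=42)
b = jitter_rates(base, noise=0.1, seed=42)
c = jitter_rates(base, noise=0.1, seed=7)

print(a == b)   # True: same seed, same result
print(a == c)   # a different seed produces a different perturbation
```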
API Surface (selected)
- `AppointmentScheduler.generate()`: full pipeline (slots → appointments → patients)
- `AppointmentScheduler.generate_slots()`
- `AppointmentScheduler.generate_appointments()`
- `AppointmentScheduler.assign_actual_times()`
- `AppointmentScheduler.generate_patients()`
- `AppointmentScheduler.assign_patients()`
- `AppointmentScheduler.add_custom_column()`
- `AppointmentScheduler.to_csv()`
See the full API in the documentation.
Plotting Utilities
Module: medscheduler.utils.plotting
- `summarize_slots(df, scheduler, ...)`: summary metrics for calendar and availability.
- `plot_population_pyramid(df, ...)`: age–sex pyramid.
- `plot_past_slot_availability(slots_df, ...)`: availability before `ref_date` (Y/Q/M/W auto-aggregation).
- `plot_future_slot_availability(slots_df, ...)`: availability on/after `ref_date` (D/W/M).
- `plot_monthly_appointment_distribution(df)`: appointments by month (%).
- `plot_weekday_appointment_distribution(df)`: appointments by weekday (%).
- `plot_status_distribution_last_days(df, scheduler, days_back=30, ...)`: daily status counts for the last N days.
- `plot_status_distribution_next_days(df, scheduler, days_ahead=30, ...)`: daily status counts for the next N days.
- `plot_appointments_by_status(df, scheduler, ...)`: past appointments by status (%).
- `plot_appointments_by_status_future(df, scheduler, ...)`: future appointments by status (%).
- `plot_scheduling_interval_distribution(df, interval_col="scheduling_interval", ...)`: lead-time distribution.
- `plot_appointment_duration_distribution(df, ...)`: consultation duration distribution (attended only).
- `plot_waiting_time_distribution(df, ...)`: waiting time distribution (attended only).
- `plot_arrival_time_distribution(df, ...)`: arrival offset distribution relative to scheduled time.
- `plot_first_attendance_distribution(df, scheduler, ...)`: ratio of first to returning attendances.
- `plot_custom_column_distribution(df, column_name, ...)`: categorical distribution for user-added columns.
- `plot_patients_visits(df, scheduler, ...)`: distribution of patient visit frequency.
All functions return a `matplotlib.axes.Axes` and follow consistent, publication-grade styling.
Visualization Gallery
See the complete gallery at https://medscheduler.readthedocs.io/en/latest/visualization.
Repository Structure
medscheduler/
├─ src/medscheduler/
│  ├─ __init__.py
│  ├─ constants.py
│  ├─ scheduler.py
│  └─ utils/
│     ├─ plotting.py
│     └─ reference_data_utils.py
├─ tests/
│  └─ test_scheduler.py
├─ docs/
│  ├─ _static/logo.png
│  └─ _static/visuals/...
├─ README.md
├─ LICENSE
└─ pyproject.toml
Testing
pytest -q
The test suite covers constructor validation and generation logic. Target coverage ≥ 80%.
Documentation
Comprehensive documentation (User Guide, API Reference, Examples, and Visualization gallery) is available at https://medscheduler.readthedocs.io/.
References
- Buttz, L. (2004). How to use scheduling data to improve efficiency. Family Practice Management, 11(7), 27–29. PMID: 15315285.
- Cerruti, B., Garavaldi, D., & Lerario, A. (2023). Patient's punctuality in an outpatient clinic: the role of age, medical branch and geographical factors. BMC Health Services Research, 23(1), 1385. https://doi.org/10.1186/s12913-023-10379-w
- Ellis, D. A., & Jenkins, R. (2012). Weekday affects attendance rate for medical appointments: Large-scale data analysis and implications. PLoS ONE, 7(12), e51365. https://doi.org/10.1371/journal.pone.0051365
- Grande, D., Zuo, J. X., Venkat, R., Chen, X., Ward, K. R., Seymour, J. W., & Mitra, N. (2018). Differences in Primary Care Appointment Availability and Wait Times by Neighborhood Characteristics: a Mystery Shopper Study. Journal of General Internal Medicine, 33(9), 1441–1443. https://doi.org/10.1007/s11606-018-4407-9
- NHS Digital. Provisional Monthly Hospital Episode Statistics for Admitted Patient Care, Outpatient and Accident and Emergency Data. https://digital.nhs.uk/data-and-information/publications/statistical/provisional-monthly-hospital-episode-statistics-for-admitted-patient-care-outpatient-and-accident-and-emergency-data/april-2025---may-2025
- NHS England (2024). Hospital Outpatient Activity 2023–24: Summary Reports 1–3. https://files.digital.nhs.uk/34/18846B/hosp-epis-stat-outp-rep-tabs-2023-24-tab.xlsx
- Rao, A., Shi, Z., Ray, K. N., Mehrotra, A., & Ganguli, I. (2019). National Trends in Primary Care Visit Use and Practice Capabilities, 2008–2015. Annals of Family Medicine, 17(6), 538–544. https://doi.org/10.1370/afm.2474
- Tai-Seale, M., McGuire, T. G., & Zhang, W. (2007). Time allocation in primary care office visits. Health Services Research, 42(5), 1871–1894. https://doi.org/10.1111/j.1475-6773.2006.00689.x
- Faker library documentation. https://faker.readthedocs.io/
License
This project is released under the MIT License. See LICENSE for details.
Citation
If this library is helpful in your work, please cite:
Carolina González Galtier. medscheduler: A synthetic outpatient appointment simulator, 2025.
GitHub: https://github.com/carogaltier/medscheduler