
Synthetic outpatient scheduling dataset generator (slots, patients, appointments).

Medscheduler

Synthetic Outpatient Appointment Data Generator

medscheduler creates realistic, privacy-safe outpatient datasets — including appointment calendars, patient demographics, and visit outcomes — suitable for education, analytics, and research in healthcare operations.


Features

  • End-to-end simulation
    • Generates slots, patients, and appointments tables.
    • Reproduces booking, cancellation, and rebooking dynamics.
    • Simulates punctuality and in-clinic timing (arrival, start, end, waiting time, duration).
  • Realistic defaults
    • Parameters reflect NHS England outpatient activity (2023–24) and peer-reviewed literature.
  • Configurable
    • Calendar structure (days, hours, slot density), fill rate, booking horizon, lead time, attendance outcomes, rebooking intensity, demographics, and randomness.
  • Reproducible
    • Controlled via the seed and noise parameters.
  • Lightweight
    • Minimal scientific-Python dependencies; plotting utilities are optional.

Installation

pip install medscheduler

Requires Python 3.9 or newer.


Quickstart

from medscheduler import AppointmentScheduler

# Initialize the scheduler
sched = AppointmentScheduler(
    seed=42,
    date_ranges=[("2024-01-01", "2024-12-31")],
    working_days=[0, 1, 2, 3, 4],  # Monday–Friday
    appointments_per_hour=4,       # 15-minute slots
    fill_rate=0.9
)

# Run the full pipeline
slots_df, appointments_df, patients_df = sched.generate()

# Export CSV files
sched.to_csv(
    slots_path="slots.csv",
    appointments_path="appointments.csv",
    patients_path="patients.csv",
)

Outputs:

Table             Description
slots.csv         Calendar capacity (one row per slot).
patients.csv      Synthetic patient registry (demographics).
appointments.csv  Central table combining patient, slot, timing, and outcome data.

Core Concepts

Calendar and Capacity

  • date_ranges delimits the simulation window; ref_date splits that window into past and future.
  • working_days, working_hours, and appointments_per_hour define slot structure and density.
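The following standard-library sketch shows how working_days, working_hours, and appointments_per_hour combine into a slot grid. It is an illustration, not medscheduler's code: build_slots is a hypothetical helper, and it assumes working_hours is given as (start_hour, end_hour) blocks and that weekday numbers follow Python's Monday = 0 convention, as in the Quickstart.

```python
from datetime import date, datetime, timedelta


def build_slots(start, end, working_days, working_hours, appointments_per_hour):
    """Return slot start times (hypothetical helper, not the library API).

    Assumes working_hours is a list of (start_hour, end_hour) tuples.
    """
    step = timedelta(hours=1) / appointments_per_hour  # e.g. 4 per hour -> 15 min
    slots = []
    day = start
    while day <= end:
        if day.weekday() in working_days:  # Monday == 0
            for start_hour, end_hour in working_hours:
                t = datetime(day.year, day.month, day.day, start_hour)
                block_end = datetime(day.year, day.month, day.day, end_hour)
                while t < block_end:
                    slots.append(t)
                    t += step
        day += timedelta(days=1)
    return slots


# One week starting Monday 2024-01-01, 08:00-12:00 and 13:00-17:00, 4 slots/hour.
slots = build_slots(date(2024, 1, 1), date(2024, 1, 7), [0, 1, 2, 3, 4],
                    [(8, 12), (13, 17)], 4)
print(len(slots))  # 160 = 5 days * 8 hours * 4 slots
```

Capacity scales linearly with density: with appointments_per_hour=2 the same week would yield 80 slots.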

Booking Dynamics

  • fill_rate controls overall utilization.
  • booking_horizon and median_lead_time shape how far ahead and how early patients book.
  • rebook_category (min, med, max) sets how likely cancelled appointments are to be rebooked.
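To illustrate how these knobs interact, here is a toy booking model in plain Python. It is a sketch under stated assumptions, not medscheduler's algorithm: lead times are assumed exponential (in days) with the given median and capped at booking_horizon, and cancellation and rebooking are modelled as raw probabilities (cancel_prob, rebook_prob are hypothetical parameters; the library itself exposes rebook_category as min/med/max).

```python
import math
import random
import statistics


def simulate_bookings(n_slots, fill_rate, median_lead_time, booking_horizon,
                      cancel_prob, rebook_prob, seed=42):
    """Toy booking model: a hypothetical sketch, not medscheduler's algorithm."""
    rng = random.Random(seed)
    # fill_rate: fraction of slots that receive an appointment.
    booked = rng.sample(range(n_slots), round(n_slots * fill_rate))
    # Assumed exponential lead times: median = ln(2) / rate, so rate = ln(2) / median.
    rate = math.log(2) / median_lead_time
    lead_times = [min(rng.expovariate(rate), booking_horizon) for _ in booked]
    # Each appointment may be cancelled; each cancellation is rebooked with rebook_prob.
    cancelled = sum(rng.random() < cancel_prob for _ in booked)
    rebooked = sum(rng.random() < rebook_prob for _ in range(cancelled))
    return booked, lead_times, cancelled, rebooked


booked, lead_times, cancelled, rebooked = simulate_bookings(
    n_slots=20_000, fill_rate=0.9, median_lead_time=10, booking_horizon=60,
    cancel_prob=0.1, rebook_prob=0.5)
print(len(booked))  # 18000
print(round(statistics.median(lead_times), 1))
print(cancelled, rebooked)
```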

Attendance and Flow

  • status_rates sets the proportions of attended, cancelled, did-not-attend, and unknown outcomes.
  • visits_per_year and first_attendance regulate repeat visits and the share of new patients.
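Conceptually, assigning outcomes from status_rates amounts to weighted categorical sampling. The sketch below uses only the standard library; the rate values and status labels are hypothetical placeholders, not the library's NHS-derived defaults, and sample_statuses is not part of medscheduler.

```python
import random
from collections import Counter

# Hypothetical rates for illustration only; not medscheduler's defaults.
status_rates = {"attended": 0.77, "cancelled": 0.15,
                "did not attend": 0.07, "unknown": 0.01}


def sample_statuses(n, rates, seed=0):
    """Draw n outcome labels with probabilities proportional to rates."""
    rng = random.Random(seed)
    labels = list(rates)
    weights = [rates[k] for k in labels]
    return rng.choices(labels, weights=weights, k=n)


statuses = sample_statuses(10_000, status_rates)
shares = {k: v / len(statuses) for k, v in Counter(statuses).items()}
print(shares)  # empirical shares close to status_rates
```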

Demographics

  • age_gender_probs, bin_size, lower_cutoff, upper_cutoff, and truncated control the age–sex composition of the patient cohort; by default, the distribution is derived from NHS data.
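A minimal sketch of binned age–sex sampling may help clarify how these parameters relate. Everything here is an assumption for illustration: the weights are made up (not NHS figures), probabilities are keyed by (bin_start_age, sex), ages are drawn uniformly within a bin, and "truncation" is modelled as rejecting ages outside [lower_cutoff, upper_cutoff]. sample_patients is a hypothetical helper.

```python
import random

# Hypothetical age-sex weights keyed by (bin_start_age, sex); not NHS figures.
age_gender_probs = {
    (0, "F"): 0.05, (0, "M"): 0.05,
    (20, "F"): 0.15, (20, "M"): 0.12,
    (40, "F"): 0.18, (40, "M"): 0.15,
    (60, "F"): 0.16, (60, "M"): 0.14,
}


def sample_patients(n, probs, bin_size=20, lower_cutoff=18, upper_cutoff=79, seed=1):
    """Draw (age, sex) pairs; out-of-range ages are rejected and redrawn."""
    rng = random.Random(seed)
    keys = list(probs)
    weights = [probs[k] for k in keys]
    patients = []
    while len(patients) < n:
        bin_start, sex = rng.choices(keys, weights=weights, k=1)[0]
        age = rng.randint(bin_start, bin_start + bin_size - 1)  # uniform within bin
        if lower_cutoff <= age <= upper_cutoff:  # rejection implements truncation
            patients.append((age, sex))
    return patients


patients = sample_patients(1_000, age_gender_probs)
print(patients[:5])
```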

Timing

  • check_in_time_mean sets the mean arrival offset relative to the scheduled time, i.e. how early or late patients tend to arrive.
  • Durations follow a Beta(1.48, 3.6) model (mean ≈ 17 minutes).
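The stated mean follows from the Beta mean α/(α+β) = 1.48/5.08 ≈ 0.291 of the range. The ≈ 17-minute figure is consistent with scaling to a 60-minute range (0.291 × 60 ≈ 17.5), which this standard-library sketch assumes; MAX_MINUTES is an assumption, not a documented library constant.

```python
import random
import statistics

ALPHA, BETA = 1.48, 3.6
MAX_MINUTES = 60  # assumed scale; 60 * 1.48 / 5.08 is about 17.5, matching the stated mean

# Analytic mean of a Beta(a, b) variable scaled to [0, MAX_MINUTES].
analytic_mean = MAX_MINUTES * ALPHA / (ALPHA + BETA)

rng = random.Random(42)
durations = [MAX_MINUTES * rng.betavariate(ALPHA, BETA) for _ in range(50_000)]

print(round(analytic_mean, 2))  # 17.48
print(round(statistics.mean(durations), 1))  # Monte Carlo estimate, close to the analytic mean
```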

Randomness

  • seed ensures reproducibility; noise introduces controlled variability.
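One common way to combine a seed with a noise knob is to perturb base rates with seeded random jitter and renormalize. The sketch below illustrates that pattern only; jitter_rates is hypothetical, and it is not a claim about how medscheduler applies noise internally.

```python
import random


def jitter_rates(rates, noise, seed):
    """Scale each rate by a factor in [1 - noise, 1 + noise], then renormalize.

    Hypothetical illustration of a 'noise' parameter; not the library's implementation.
    """
    rng = random.Random(seed)
    perturbed = {k: v * (1 + rng.uniform(-noise, noise)) for k, v in rates.items()}
    total = sum(perturbed.values())
    return {k: v / total for k, v in perturbed.items()}


base = {"attended": 0.8, "cancelled": 0.15, "did not attend": 0.05}
a = jitter_rates(base, noise=0.1, seed=42)
b = jitter_rates(base, noise=0.1, seed=42)
print(a == b)  # True: same seed, identical output
```

With noise=0 the base rates come back unchanged, so the seed only matters once variability is switched on.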

API Surface (selected)

  • AppointmentScheduler.generate() — full pipeline: slots → appointments → patients
  • AppointmentScheduler.generate_slots()
  • AppointmentScheduler.generate_appointments()
  • AppointmentScheduler.assign_actual_times()
  • AppointmentScheduler.generate_patients()
  • AppointmentScheduler.assign_patients()
  • AppointmentScheduler.add_custom_column()
  • AppointmentScheduler.to_csv()

See the full API in the documentation.


Plotting Utilities

Module: medscheduler.utils.plotting

  • summarize_slots(df, scheduler, ...) — summary metrics for calendar and availability.
  • plot_population_pyramid(df, ...) — age–sex pyramid.
  • plot_past_slot_availability(slots_df, ...) — availability before ref_date (Y/Q/M/W auto-aggregation).
  • plot_future_slot_availability(slots_df, ...) — availability on/after ref_date (D/W/M).
  • plot_monthly_appointment_distribution(df) — appointments by month (%).
  • plot_weekday_appointment_distribution(df) — appointments by weekday (%).
  • plot_status_distribution_last_days(df, scheduler, days_back=30, ...) — daily status counts over the last N days.
  • plot_status_distribution_next_days(df, scheduler, days_ahead=30, ...) — daily status counts over the next N days.
  • plot_appointments_by_status(df, scheduler, ...) — past appointments by status (%).
  • plot_appointments_by_status_future(df, scheduler, ...) — future appointments by status (%).
  • plot_scheduling_interval_distribution(df, interval_col="scheduling_interval", ...) — lead-time distribution.
  • plot_appointment_duration_distribution(df, ...) — consultation duration distribution (attended only).
  • plot_waiting_time_distribution(df, ...) — waiting time distribution (attended only).
  • plot_arrival_time_distribution(df, ...) — arrival offset distribution vs scheduled time.
  • plot_first_attendance_distribution(df, scheduler, ...) — first vs. returning attendance ratio.
  • plot_custom_column_distribution(df, column_name, ...) — categorical distribution for user-added columns.
  • plot_patients_visits(df, scheduler, ...) — distribution of patient visit frequency.

The plot_* functions return a matplotlib.axes.Axes and share a consistent, publication-grade style.
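As an example of the quantity behind these charts, the percentages shown by a weekday distribution plot can be computed with the standard library alone. This is a hypothetical reimplementation for illustration: weekday_shares takes a plain list of dates rather than the appointments DataFrame, whose column names are not listed here.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_shares(dates):
    """Percentage of appointments falling on each weekday (Monday first)."""
    counts = Counter(d.weekday() for d in dates)
    total = len(dates)
    return {WEEKDAYS[i]: 100 * counts[i] / total for i in range(7)}


# Two weeks from Monday 2024-01-01, keeping weekdays only: two dates per weekday.
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(14)]
dates = [d for d in days if d.weekday() < 5]
shares = weekday_shares(dates)
print(shares)  # Mon-Fri 20.0 each, Sat and Sun 0.0
```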


Visualization Gallery

Examples of the default visualization set produced by medscheduler.utils.plotting are available in the complete gallery at https://medscheduler.readthedocs.io/en/latest/visualization.


Repository Structure

medscheduler/
├─ src/medscheduler/
│  ├─ __init__.py
│  ├─ constants.py
│  ├─ scheduler.py
│  └─ utils/
│     ├─ plotting.py
│     └─ reference_data_utils.py
├─ tests/
│  └─ test_scheduler.py
├─ docs/
│  ├─ _static/logo.png
│  └─ _static/visuals/...
├─ README.md
├─ LICENSE
└─ pyproject.toml

Testing

pytest -q

The test suite covers constructor validation and generation logic. Target coverage ≥ 80%.


Documentation

Comprehensive documentation (User Guide, API Reference, Examples, and Visualization gallery) is hosted on Read the Docs at medscheduler.readthedocs.io.



License

This project is released under the MIT License. See LICENSE for details.


Citation

If this library is helpful in your work, please cite:

Carolina González Galtier. medscheduler: A synthetic outpatient appointment simulator, 2025.
GitHub: https://github.com/carogaltier/medscheduler

Download files

Source Distribution

medscheduler-0.2.0.tar.gz (52.0 kB)

Built Distribution

medscheduler-0.2.0-py3-none-any.whl (50.1 kB)

File details

Details for the file medscheduler-0.2.0.tar.gz.

File metadata

  • Download URL: medscheduler-0.2.0.tar.gz
  • Size: 52.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for medscheduler-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        659dc12167c97b6748d669eb58e71961b2e04381faa0874fc56b661bae3ff247
MD5           de306925ef4ca85a39de5d4aa73b3bdf
BLAKE2b-256   e04bc0ce24bfe9235a8c2f5b5f21cba580c391abe3e3735a1098aae41f8144ac

File details

Details for the file medscheduler-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: medscheduler-0.2.0-py3-none-any.whl
  • Size: 50.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for medscheduler-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        d2113f04023e264113045b78f7b73893f77456c2372a8b54a7c9a41076238911
MD5           1966db4bfaeba7e48b12977ed0da293b
BLAKE2b-256   341b151356dbd97535acb86131cedb89a0d6c26d2e6bfb9a7a702e271a341dbb
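The published SHA256 digests can be checked locally before installing a downloaded file. A minimal standard-library sketch follows; the helper names sha256_of and verify are hypothetical, and the self-check uses the well-known SHA-256 test vector for b"abc".

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA-256 matches the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Self-check against the standard SHA-256 test vector for b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
abc_ok = verify(tmp.name, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")
os.remove(tmp.name)
print(abc_ok)  # True

# For the real download:
# verify("medscheduler-0.2.0.tar.gz",
#        "659dc12167c97b6748d669eb58e71961b2e04381faa0874fc56b661bae3ff247")
```

pip can also enforce this at install time: list `medscheduler==0.2.0 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.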
