Synthetic outpatient scheduling dataset generator (slots, patients, appointments).

Medscheduler

Synthetic Outpatient Appointment Data Generator

medscheduler creates realistic, privacy-safe outpatient datasets — including appointment calendars, patient demographics, and visit outcomes — suitable for education, analytics, and research in healthcare operations.


Features

  • End-to-end simulation
    • Generates slots, patients, and appointments tables.
    • Reproduces booking, cancellation, and rebooking dynamics.
    • Simulates punctuality and in-clinic timing (arrival, start, end, waiting time, duration).
  • Realistic defaults
    • Parameters reflect NHS England outpatient activity (2023–24) and peer-reviewed literature.
  • Configurable
    • Calendar structure (days, hours, slot density), fill rate, booking horizon, lead time, attendance outcomes, rebooking intensity, demographics, and randomness.
  • Reproducible
    • Controlled via seed and noise.
  • Lightweight
    • Minimal scientific-Python dependencies; plotting utilities are optional.

Installation

pip install medscheduler

Requires Python 3.9 or newer.


Quickstart

from medscheduler import AppointmentScheduler

# Initialize the scheduler
sched = AppointmentScheduler(
    seed=42,
    date_ranges=[("2024-01-01", "2024-12-31")],
    working_days=[0, 1, 2, 3, 4],  # Monday–Friday
    appointments_per_hour=4,       # 15-minute slots
    fill_rate=0.9
)

# Run the full pipeline
slots_df, appointments_df, patients_df = sched.generate()

# Export CSV files
sched.to_csv(
    slots_path="slots.csv",
    appointments_path="appointments.csv",
    patients_path="patients.csv",
)

Outputs:

Table             Description
slots.csv         Calendar capacity (one row per slot).
patients.csv      Synthetic patient registry (demographics).
appointments.csv  Central table combining patient, slot, timing, and outcome data.

Core Concepts

Calendar and Capacity

  • date_ranges delimits the simulation window; ref_date separates past slots from future ones.
  • working_days, working_hours, and appointments_per_hour define slot structure and density.
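
As a rough illustration of how these parameters combine into capacity, the sketch below builds a slot calendar with the standard library only. It is not medscheduler's implementation: build_slots is a hypothetical helper, and the (open, close) hour-pair format used for working_hours is an assumption made for this example.

```python
from datetime import date, datetime, time, timedelta


def build_slots(start, end, working_days, working_hours, appointments_per_hour):
    """Return the start datetime of every slot between two dates (inclusive)."""
    slot_length = timedelta(minutes=60 // appointments_per_hour)
    slots = []
    day = start
    while day <= end:
        if day.weekday() in working_days:  # Monday is 0
            for open_hour, close_hour in working_hours:
                current = datetime.combine(day, time(open_hour))
                closing = datetime.combine(day, time(close_hour))
                # Keep only slots that finish by closing time.
                while current + slot_length <= closing:
                    slots.append(current)
                    current += slot_length
        day += timedelta(days=1)
    return slots


week = build_slots(
    start=date(2024, 1, 1),        # a Monday
    end=date(2024, 1, 7),
    working_days=[0, 1, 2, 3, 4],  # Monday to Friday
    working_hours=[(8, 16)],       # assumed format: (open, close) hours
    appointments_per_hour=4,       # 15-minute slots
)
print(len(week))  # 5 days x 8 hours x 4 slots = 160
```

Doubling appointments_per_hour or adding a second working_hours block scales the slot count proportionally, which is why these three parameters together set total capacity.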

Booking Dynamics

  • fill_rate controls overall utilization.
  • booking_horizon limits how far ahead appointments can be booked; median_lead_time sets the typical gap between booking and appointment.
  • rebook_category (min, med, max) sets how likely a cancelled appointment is to be rebooked.
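
To make the interaction between a median lead time and a booking horizon concrete, here is a minimal sketch. It assumes lead times follow an exponential distribution truncated at the horizon; the distribution medscheduler actually uses is not stated here, and the numeric values are illustrative rather than the library's defaults.

```python
import math
import random
import statistics

rng = random.Random(42)

median_lead_time = 10  # days; illustrative value
booking_horizon = 90   # days; illustrative value

# An exponential distribution with median m has rate ln(2) / m.
rate = math.log(2) / median_lead_time


def draw_lead_time():
    """Draw one lead time, resampling until it falls within the horizon."""
    while True:
        days = rng.expovariate(rate)
        if days <= booking_horizon:
            return days


lead_times = [draw_lead_time() for _ in range(20_000)]
print(round(statistics.median(lead_times), 1))  # close to median_lead_time
print(max(lead_times) <= booking_horizon)       # the horizon is never exceeded
```

With a horizon this far beyond the median, truncation removes only a small tail, so the sample median stays close to the requested value.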

Attendance and Flow

  • status_rates sets the proportions of attended, cancelled, did not attend, and unknown outcomes.
  • visits_per_year and first_attendance regulate repeat visits and the share of new patients.
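
The idea behind outcome proportions can be sketched as a weighted categorical draw. The rates below are made-up illustrative values, not the library's defaults, and the dictionary layout is an assumption about how such rates might be expressed.

```python
import random
from collections import Counter

rng = random.Random(42)

# Illustrative proportions only; they must sum to 1.
status_rates = {
    "attended": 0.77,
    "cancelled": 0.16,
    "did not attend": 0.06,
    "unknown": 0.01,
}
assert abs(sum(status_rates.values()) - 1.0) < 1e-9

# Draw one outcome per appointment, weighted by the configured rates.
statuses = rng.choices(
    list(status_rates),
    weights=list(status_rates.values()),
    k=10_000,
)

shares = {status: n / len(statuses) for status, n in Counter(statuses).items()}
for status, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{status:15s} {share:.3f}")
```

Observed shares fluctuate around the configured rates, and the fluctuation shrinks as the number of simulated appointments grows.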

Demographics

  • age_gender_probs, bin_size, lower_cutoff, upper_cutoff, and truncated control the cohort; the defaults are derived from NHS distributions.

Timing

  • check_in_time_mean sets the average arrival offset relative to the scheduled time (early or late).
  • Durations follow a Beta(1.48, 3.6) model (mean ≈ 17 minutes).
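
The stated mean can be checked from the Beta parameters. A Beta(a, b) variable lies in [0, 1] with mean a / (a + b), so it has to be scaled to minutes; the 60-minute scale used below is an assumption inferred from the quoted mean, not a documented setting.

```python
import random
import statistics

ALPHA, BETA = 1.48, 3.6
SCALE_MINUTES = 60  # assumed scale; inferred from the quoted mean

# Analytic mean of the scaled distribution: scale * a / (a + b).
analytic_mean = SCALE_MINUTES * ALPHA / (ALPHA + BETA)
print(round(analytic_mean, 2))  # 17.48

# Empirical check with a seeded generator.
rng = random.Random(42)
durations = [SCALE_MINUTES * rng.betavariate(ALPHA, BETA) for _ in range(20_000)]
print(round(statistics.mean(durations), 1))
```

Because the first shape parameter is smaller than the second, the distribution is right-skewed: most consultations are short, with a tail of longer ones.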

Randomness

  • seed ensures reproducibility; noise introduces controlled variability.

API Surface (selected)

  • AppointmentScheduler.generate() — full pipeline: slots → appointments → patients
  • AppointmentScheduler.generate_slots()
  • AppointmentScheduler.generate_appointments()
  • AppointmentScheduler.assign_actual_times()
  • AppointmentScheduler.generate_patients()
  • AppointmentScheduler.assign_patients()
  • AppointmentScheduler.add_custom_column()
  • AppointmentScheduler.to_csv()

See the full API in the documentation.


Plotting Utilities

Module: medscheduler.utils.plotting

  • summarize_slots(df, scheduler, ...) — summary metrics for calendar and availability.
  • plot_population_pyramid(df, ...) — age–sex pyramid.
  • plot_past_slot_availability(slots_df, ...) — availability before ref_date (Y/Q/M/W auto-aggregation).
  • plot_future_slot_availability(slots_df, ...) — availability on/after ref_date (D/W/M).
  • plot_monthly_appointment_distribution(df) — appointments by month (%).
  • plot_weekday_appointment_distribution(df) — appointments by weekday (%).
  • plot_status_distribution_last_days(df, scheduler, days_back=30, ...) — daily status counts last N days.
  • plot_status_distribution_next_days(df, scheduler, days_ahead=30, ...) — daily status counts next N days.
  • plot_appointments_by_status(df, scheduler, ...) — past appointments by status (%).
  • plot_appointments_by_status_future(df, scheduler, ...) — future appointments by status (%).
  • plot_scheduling_interval_distribution(df, interval_col="scheduling_interval", ...) — lead-time distribution.
  • plot_appointment_duration_distribution(df, ...) — consultation duration distribution (attended only).
  • plot_waiting_time_distribution(df, ...) — waiting time distribution (attended only).
  • plot_arrival_time_distribution(df, ...) — arrival offset distribution vs scheduled time.
  • plot_first_attendance_distribution(df, scheduler, ...) — first vs. returning attendance ratio.
  • plot_custom_column_distribution(df, column_name, ...) — categorical distribution for user-added columns.
  • plot_patients_visits(df, scheduler, ...) — distribution of patient visit frequency.

All functions return a matplotlib.axes.Axes and follow consistent, publication-grade styling.


Visualization Gallery

See the complete gallery at https://medscheduler.readthedocs.io/en/latest/visualization.


Repository Structure

medscheduler/
├─ src/medscheduler/
│  ├─ __init__.py
│  ├─ constants.py
│  ├─ scheduler.py
│  └─ utils/
│     ├─ plotting.py
│     └─ reference_data_utils.py
├─ tests/
│  └─ test_scheduler.py
├─ docs/
│  ├─ _static/logo.png
│  └─ _static/visuals/...
├─ README.md
├─ LICENSE
└─ pyproject.toml

Testing

pytest -q

The test suite covers constructor validation and generation logic. Target coverage ≥ 80%.


Documentation

Comprehensive documentation (User Guide, API Reference, Examples, and Visualization gallery) is available on the project's Read the Docs site, which also hosts the gallery linked above.


License

This project is released under the MIT License. See LICENSE for details.


Citation

If this library is helpful in your work, please cite:

Carolina González Galtier. medscheduler: A synthetic outpatient appointment simulator, 2025.
GitHub: https://github.com/carogaltier/medscheduler
