Synthetic outpatient scheduling dataset generator (slots, patients, appointments).

Medscheduler

Synthetic Outpatient Appointment Data Generator

medscheduler creates realistic, privacy-safe outpatient datasets — including appointment calendars, patient demographics, and visit outcomes — suitable for education, analytics, and research in healthcare operations.


Features

  • End-to-end simulation
    • Generates slots, patients, and appointments tables.
    • Reproduces booking, cancellation, and rebooking dynamics.
    • Simulates punctuality and in-clinic timing (arrival, start, end, waiting time, duration).
  • Realistic defaults
    • Parameters reflect NHS England outpatient activity (2023–24) and peer-reviewed literature.
  • Configurable
    • Calendar structure (days, hours, slot density), fill rate, booking horizon, lead time, attendance outcomes, rebooking intensity, demographics, and randomness.
  • Reproducible
    • Controlled via seed and noise.
  • Lightweight
    • Minimal scientific-Python dependencies; plotting utilities are optional.

Installation

pip install medscheduler

Requires Python 3.9 or newer.


Quickstart

from medscheduler import AppointmentScheduler

# Initialize the scheduler
sched = AppointmentScheduler(
    seed=42,
    date_ranges=[("2024-01-01", "2024-12-31")],
    working_days=[0, 1, 2, 3, 4],  # Monday–Friday
    appointments_per_hour=4,       # 15-minute slots
    fill_rate=0.9
)

# Run the full pipeline
slots_df, appointments_df, patients_df = sched.generate()

# Export CSV files
sched.to_csv(
    slots_path="slots.csv",
    appointments_path="appointments.csv",
    patients_path="patients.csv",
)

Outputs:

Table             Description
slots.csv         Calendar capacity (one row per slot).
patients.csv      Synthetic patient registry (demographics).
appointments.csv  Central table combining patient, slot, timing, and outcome data.

Core Concepts

Calendar and Capacity

  • date_ranges and ref_date delimit the simulation window and separate past from future.
  • working_days, working_hours, and appointments_per_hour define slot structure and density.
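
To make the calendar parameters concrete, the sketch below builds a slot grid with the standard library only. It is not the library's implementation: the helper build_slots, the (opening hour, closing hour) format used for working_hours, and the chosen ref_date are assumptions made for illustration.

```python
from datetime import date, datetime, time, timedelta


def build_slots(start, end, working_days, working_hours, appointments_per_hour):
    """Return the start datetime of every slot in the window (illustrative only)."""
    slot_length = timedelta(minutes=60 / appointments_per_hour)
    slots = []
    day = start
    while day <= end:
        # date.weekday() uses 0 = Monday, matching the Quickstart's working_days
        if day.weekday() in working_days:
            for open_hour, close_hour in working_hours:
                current = datetime.combine(day, time(open_hour))
                closing = datetime.combine(day, time(close_hour))
                # Keep only slots that finish by closing time
                while current + slot_length <= closing:
                    slots.append(current)
                    current += slot_length
        day += timedelta(days=1)
    return slots


slots = build_slots(
    start=date(2024, 1, 1),       # a Monday
    end=date(2024, 1, 7),
    working_days=[0, 1, 2, 3, 4],
    working_hours=[(9, 17)],      # assumed format: (opening hour, closing hour)
    appointments_per_hour=4,      # 15-minute slots
)

# A hypothetical reference date splits the calendar into past and future slots
ref_date = datetime(2024, 1, 4)
past_slots = [s for s in slots if s < ref_date]
future_slots = [s for s in slots if s >= ref_date]

print(len(slots), len(past_slots), len(future_slots))  # 160 96 64
```

Five working days of eight hours at four appointments per hour give 160 slots; the three days before the reference date hold 96 of them.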

Booking Dynamics

  • fill_rate controls overall utilization.
  • booking_horizon and median_lead_time shape how far ahead and how early patients book.
  • rebook_category (min, med, max) sets how likely cancelled appointments are to be rebooked.
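
The interplay of fill_rate, median_lead_time, and booking_horizon can be sketched as follows. This is a toy model, not the library's algorithm: the function sketch_bookings is hypothetical, and the exponential lead-time distribution is an assumption chosen only because its median is easy to set.

```python
import math
import random


def sketch_bookings(n_slots, fill_rate, median_lead_time, booking_horizon, seed=0):
    """Pick booked slots and draw a lead time (in days) for each (illustrative only)."""
    rng = random.Random(seed)

    # fill_rate is the share of slots that end up booked
    n_booked = round(n_slots * fill_rate)
    booked_slots = sorted(rng.sample(range(n_slots), n_booked))

    # Assumed model: exponential lead times whose median equals median_lead_time
    rate = math.log(2) / median_lead_time
    lead_times = []
    for _ in booked_slots:
        days = rng.expovariate(rate)
        # Redraw until the booking falls inside the booking horizon
        while days > booking_horizon:
            days = rng.expovariate(rate)
        lead_times.append(days)
    return booked_slots, lead_times


booked_slots, lead_times = sketch_bookings(
    n_slots=160, fill_rate=0.9, median_lead_time=10, booking_horizon=30
)
print(len(booked_slots))  # 144
```

Redrawing values beyond the horizon pulls the realized median slightly below the nominal one, which is one reason the two parameters are best tuned together.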

Attendance and Flow

  • status_rates determines attended / cancelled / did not attend / unknown proportions.
  • visits_per_year and first_attendance regulate repeat visits and the share of new patients.
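
As a rough illustration of how a set of outcome proportions turns into per-appointment statuses, the sketch below samples labels with the standard library. The rates shown are hypothetical placeholders, not the library's defaults, and the dictionary format is an assumption.

```python
import random
from collections import Counter

# Hypothetical proportions for illustration; they must sum to 1
status_rates = {
    "attended": 0.75,
    "cancelled": 0.15,
    "did not attend": 0.07,
    "unknown": 0.03,
}


def sample_statuses(rates, n, seed=42):
    """Draw n appointment outcomes according to the given proportions."""
    rng = random.Random(seed)
    labels = list(rates)
    weights = [rates[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n)


statuses = sample_statuses(status_rates, n=10_000)
counts = Counter(statuses)
shares = {label: counts.get(label, 0) / len(statuses) for label in status_rates}
print(shares)
```

With 10,000 draws the observed shares land close to the requested proportions, and they drift further apart as the sample shrinks.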

Demographics

  • age_gender_probs, bin_size, lower_cutoff, upper_cutoff, and truncated control the cohort; the defaults are derived from NHS distributions.

Timing

  • check_in_time_mean controls early/late arrivals.
  • Durations follow a Beta(1.48, 3.6) model (mean ≈ 17 minutes).
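
The duration model can be checked with a few lines of standard-library Python. A Beta(α, β) variable has mean α / (α + β), which is about 0.291 for the stated parameters; the quoted mean of roughly 17 minutes is consistent with stretching that value over a range of about one hour. The 60-minute scale used below is an assumption, not a documented setting.

```python
import random

ALPHA, BETA = 1.48, 3.6
MAX_MINUTES = 60  # assumed scale: maps the Beta draw in [0, 1] onto minutes

# Analytical mean of a Beta distribution: alpha / (alpha + beta)
beta_mean = ALPHA / (ALPHA + BETA)
expected_minutes = MAX_MINUTES * beta_mean


def sample_durations(n, seed=42):
    """Draw n consultation durations in minutes (illustrative only)."""
    rng = random.Random(seed)
    return [MAX_MINUTES * rng.betavariate(ALPHA, BETA) for _ in range(n)]


durations = sample_durations(10_000)
sample_mean = sum(durations) / len(durations)

print(round(beta_mean, 3))         # 0.291
print(round(expected_minutes, 1))  # 17.5
```

Because β is larger than α, the distribution is right-skewed: most consultations are short, with a tail of longer visits.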

Randomness

  • seed ensures reproducibility; noise introduces controlled variability.
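
Seeded reproducibility in general can be illustrated with Python's own generator. The function and the arrival-offset parameters below are hypothetical and say nothing about how the library uses seed or noise internally.

```python
import random


def simulate_arrival_offsets(seed, n=5):
    """Draw n hypothetical arrival offsets (minutes) from a seeded generator."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    return [round(rng.gauss(-10, 5), 2) for _ in range(n)]


run_a = simulate_arrival_offsets(seed=42)
run_b = simulate_arrival_offsets(seed=42)
run_c = simulate_arrival_offsets(seed=7)

print(run_a == run_b)  # True: same seed, same draws
print(run_a == run_c)  # False: a different seed gives a different sequence
```

Using a dedicated random.Random instance rather than the module-level functions keeps a simulation reproducible even when other code draws random numbers in the same process.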

API Surface (selected)

  • AppointmentScheduler.generate() — full pipeline: slots → appointments → patients
  • AppointmentScheduler.generate_slots()
  • AppointmentScheduler.generate_appointments()
  • AppointmentScheduler.assign_actual_times()
  • AppointmentScheduler.generate_patients()
  • AppointmentScheduler.assign_patients()
  • AppointmentScheduler.add_custom_column()
  • AppointmentScheduler.to_csv()

See the full API in the documentation.


Plotting Utilities

Module: medscheduler.utils.plotting

  • summarize_slots(df, scheduler, ...) — summary metrics for calendar and availability.
  • plot_population_pyramid(df, ...) — age–sex pyramid.
  • plot_past_slot_availability(slots_df, ...) — availability before ref_date (Y/Q/M/W auto-aggregation).
  • plot_future_slot_availability(slots_df, ...) — availability on/after ref_date (D/W/M).
  • plot_monthly_appointment_distribution(df) — appointments by month (%).
  • plot_weekday_appointment_distribution(df) — appointments by weekday (%).
  • plot_status_distribution_last_days(df, scheduler, days_back=30, ...) — daily status counts last N days.
  • plot_status_distribution_next_days(df, scheduler, days_ahead=30, ...) — daily status counts next N days.
  • plot_appointments_by_status(df, scheduler, ...) — past appointments by status (%).
  • plot_appointments_by_status_future(df, scheduler, ...) — future appointments by status (%).
  • plot_scheduling_interval_distribution(df, interval_col="scheduling_interval", ...) — lead-time distribution.
  • plot_appointment_duration_distribution(df, ...) — consultation duration distribution (attended only).
  • plot_waiting_time_distribution(df, ...) — waiting time distribution (attended only).
  • plot_arrival_time_distribution(df, ...) — arrival offset distribution vs scheduled time.
  • plot_first_attendance_distribution(df, scheduler, ...) — first vs. returning attendance ratio.
  • plot_custom_column_distribution(df, column_name, ...) — categorical distribution for user-added columns.
  • plot_patients_visits(df, scheduler, ...) — distribution of patient visit frequency.

All functions return a matplotlib.axes.Axes and follow consistent, publication-grade styling.


Visualization Gallery

The default visualization set is produced by medscheduler.utils.plotting.

See the complete gallery at https://medscheduler.readthedocs.io/en/latest/visualization.


Repository Structure

medscheduler/
├─ src/medscheduler/
│  ├─ __init__.py
│  ├─ constants.py
│  ├─ scheduler.py
│  └─ utils/
│     ├─ plotting.py
│     └─ reference_data_utils.py
├─ tests/
│  └─ test_scheduler.py
├─ docs/
│  ├─ _static/logo.png
│  └─ _static/visuals/...
├─ README.md
├─ LICENSE
└─ pyproject.toml

Testing

pytest -q

The test suite covers constructor validation and generation logic. Target coverage ≥ 80%.


Documentation

Comprehensive documentation (User Guide, API Reference, Examples, and Visualization gallery) is available online on the project's Read the Docs site.


License

This project is released under the MIT License. See LICENSE for details.


Citation

If this library is helpful in your work, please cite:

Carolina González Galtier. medscheduler: A synthetic outpatient appointment simulator, 2025.
GitHub: https://github.com/carogaltier/medscheduler
