Synthetic outpatient scheduling dataset generator (slots, patients, appointments).

Medscheduler

Synthetic Outpatient Appointment Data Generator

medscheduler creates realistic, privacy-safe outpatient datasets — including appointment calendars, patient demographics, and visit outcomes — suitable for education, analytics, and research in healthcare operations.


Features

  • End-to-end simulation
    • Generates slots, patients, and appointments tables.
    • Reproduces booking, cancellation, and rebooking dynamics.
    • Simulates punctuality and in-clinic timing (arrival, start, end, waiting time, duration).
  • Realistic defaults
    • Parameters reflect NHS England outpatient activity (2023–24) and peer-reviewed literature.
  • Configurable
    • Calendar structure (days, hours, slot density), fill rate, booking horizon, lead time, attendance outcomes, rebooking intensity, demographics, and randomness.
  • Reproducible
    • Controlled via seed and noise.
  • Lightweight
    • Minimal scientific-Python dependencies; plotting utilities are optional.

Installation

pip install medscheduler

Requires Python 3.9 or newer.


Quickstart

from medscheduler import AppointmentScheduler

# Initialize the scheduler
sched = AppointmentScheduler(
    seed=42,
    date_ranges=[("2024-01-01", "2024-12-31")],
    working_days=[0, 1, 2, 3, 4],  # Monday–Friday
    appointments_per_hour=4,       # 15-minute slots
    fill_rate=0.9
)

# Run the full pipeline
slots_df, appointments_df, patients_df = sched.generate()

# Export CSV files
sched.to_csv(
    slots_path="slots.csv",
    appointments_path="appointments.csv",
    patients_path="patients.csv",
)

Outputs:

Table             Description
slots.csv         Calendar capacity (one row per slot).
patients.csv      Synthetic patient registry (demographics).
appointments.csv  Central table combining patient, slot, timing, and outcome data.

Core Concepts

Calendar and Capacity

  • date_ranges and ref_date delimit the simulation window and separate past from future.
  • working_days, working_hours, and appointments_per_hour define slot structure and density.
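
To make the slot arithmetic concrete, the sketch below enumerates slot start times from these parameters using only the standard library. It is an illustration, not the library's implementation: the helper build_slots is hypothetical, and the working_hours format (a list of (start_hour, end_hour) pairs) is an assumption.

```python
from datetime import date, datetime, timedelta


def build_slots(start, end, working_days, working_hours, appointments_per_hour):
    """Enumerate slot start times between two dates (inclusive).

    Hypothetical helper for illustration; not part of medscheduler.
    working_hours is assumed to be a list of (start_hour, end_hour) pairs.
    """
    step = timedelta(minutes=60 // appointments_per_hour)
    slots = []
    day = start
    while day <= end:
        if day.weekday() in working_days:  # Monday is 0, Sunday is 6
            for start_hour, end_hour in working_hours:
                t = datetime(day.year, day.month, day.day, start_hour)
                block_end = datetime(day.year, day.month, day.day, end_hour)
                while t < block_end:
                    slots.append(t)
                    t += step
        day += timedelta(days=1)
    return slots


# One calendar week, Monday to Friday, 08:00-16:00, 15-minute slots
week = build_slots(
    start=date(2024, 1, 1),
    end=date(2024, 1, 7),
    working_days=[0, 1, 2, 3, 4],
    working_hours=[(8, 16)],
    appointments_per_hour=4,
)
print(len(week))  # 5 days x 8 hours x 4 slots per hour = 160
```

Capacity grows linearly with each parameter: doubling appointments_per_hour or adding a second working_hours block doubles the number of rows in the slots table.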

Booking Dynamics

  • fill_rate controls overall utilization.
  • booking_horizon and median_lead_time shape how far ahead and how early patients book.
  • rebook_category (min, med, max) defines the probability of rebooking cancellations.
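
One plausible way to read fill_rate, median_lead_time, and booking_horizon together is sketched below. It assumes exponentially distributed lead times truncated at the horizon, which the description above does not confirm; the helper sample_lead_times and the parameter values are hypothetical.

```python
import math
import random


def sample_lead_times(n_slots, fill_rate, median_lead_time, booking_horizon, seed=0):
    """Hypothetical helper: draw lead times (days) for the booked share of slots.

    Assumes exponential lead times truncated at booking_horizon;
    medscheduler's actual model may differ.
    """
    rng = random.Random(seed)
    n_booked = round(n_slots * fill_rate)   # fill_rate sets utilization
    rate = math.log(2) / median_lead_time   # exponential median = ln(2) / rate
    lead_times = []
    while len(lead_times) < n_booked:
        days = rng.expovariate(rate)
        if days <= booking_horizon:         # reject draws beyond the horizon
            lead_times.append(days)
    return lead_times


lead_times = sample_lead_times(
    n_slots=160, fill_rate=0.9, median_lead_time=10, booking_horizon=30
)
print(len(lead_times))  # 144 booked slots out of 160
```

Under this assumed model, truncating at the horizon pulls the realized median somewhat below median_lead_time, so the two parameters interact rather than acting independently.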

Attendance and Flow

  • status_rates determines attended / cancelled / did not attend / unknown proportions.
  • visits_per_year and first_attendance regulate repeat visits and the share of new patients.
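
As a rough illustration of how outcome proportions translate into per-appointment statuses, the sketch below draws labels in proportion to a rate table. The rates shown are made up for the example and are not medscheduler's defaults; the dictionary layout and the helper sample_statuses are assumptions.

```python
import random
from collections import Counter

# Illustrative rates only; these are not medscheduler's defaults.
status_rates = {
    "attended": 0.77,
    "cancelled": 0.15,
    "did not attend": 0.07,
    "unknown": 0.01,
}


def sample_statuses(rates, n, seed=0):
    """Hypothetical helper: draw n outcomes in proportion to the given rates."""
    if abs(sum(rates.values()) - 1.0) > 1e-9:
        raise ValueError("status rates must sum to 1")
    rng = random.Random(seed)
    return rng.choices(list(rates), weights=list(rates.values()), k=n)


counts = Counter(sample_statuses(status_rates, n=10_000))
for status in status_rates:
    print(f"{status}: {counts[status] / 10_000:.3f}")
```

With a large enough sample the observed shares settle close to the configured rates, which is a quick sanity check to run on a generated appointments table.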

Demographics

  • age_gender_probs, bin_size, lower_cutoff, upper_cutoff, and truncated control the patient cohort; the defaults are derived from NHS distributions.

Timing

  • check_in_time_mean controls early/late arrivals.
  • Durations follow a Beta(1.48, 3.6) model (mean ≈ 17 minutes).
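
The quoted mean can be checked from the Beta parameters. The mean of Beta(a, b) is a / (a + b), so Beta(1.48, 3.6) has a mean of about 0.291 on the unit interval. Scaling by 60 gives roughly 17.5 minutes, which is consistent with the figure above. The 60-minute scaling factor is an assumption inferred from those numbers, not something stated here.

```python
import random

ALPHA, BETA = 1.48, 3.6
SCALE_MINUTES = 60  # assumed scaling; inferred from the quoted mean, not documented

# Closed-form mean of a Beta(a, b) variable is a / (a + b)
analytic_mean = ALPHA / (ALPHA + BETA) * SCALE_MINUTES

# Monte Carlo check using the standard library
rng = random.Random(42)
draws = [rng.betavariate(ALPHA, BETA) * SCALE_MINUTES for _ in range(100_000)]
empirical_mean = sum(draws) / len(draws)

print(f"analytic mean:  {analytic_mean:.2f} minutes")  # 17.48
print(f"empirical mean: {empirical_mean:.2f} minutes")
```

Because the first shape parameter is smaller than the second, the distribution is right-skewed: most consultations are shorter than the mean, with a tail of longer visits.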

Randomness

  • seed ensures reproducibility; noise introduces controlled variability.
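
The effect of a seed can be illustrated with Python's standard random module. This is a generic sketch: simulate_arrival_offsets is a hypothetical stand-in for a seeded simulation step, the mean and spread are arbitrary, and how medscheduler wires seed and noise internally is not shown.

```python
import random


def simulate_arrival_offsets(n, seed):
    """Hypothetical stand-in for a seeded step (minutes relative to the slot)."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [round(rng.gauss(-10, 5), 1) for _ in range(n)]


run_a = simulate_arrival_offsets(5, seed=42)
run_b = simulate_arrival_offsets(5, seed=42)
run_c = simulate_arrival_offsets(5, seed=7)

print(run_a == run_b)  # True: same seed, same draws
print(run_a == run_c)  # a different seed gives a different sequence
```

Using a private generator rather than the module-level functions keeps results stable even when other code in the same process also draws random numbers.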

API Surface (selected)

  • AppointmentScheduler.generate() — full pipeline: slots → appointments → patients
  • AppointmentScheduler.generate_slots()
  • AppointmentScheduler.generate_appointments()
  • AppointmentScheduler.assign_actual_times()
  • AppointmentScheduler.generate_patients()
  • AppointmentScheduler.assign_patients()
  • AppointmentScheduler.add_custom_column()
  • AppointmentScheduler.to_csv()

See the full API in the documentation.


Plotting Utilities

Module: medscheduler.utils.plotting

  • summarize_slots(df, scheduler, ...) — summary metrics for calendar and availability.
  • plot_population_pyramid(df, ...) — age–sex pyramid.
  • plot_past_slot_availability(slots_df, ...) — availability before ref_date (Y/Q/M/W auto-aggregation).
  • plot_future_slot_availability(slots_df, ...) — availability on/after ref_date (D/W/M).
  • plot_monthly_appointment_distribution(df) — appointments by month (%).
  • plot_weekday_appointment_distribution(df) — appointments by weekday (%).
  • plot_status_distribution_last_days(df, scheduler, days_back=30, ...) — daily status counts last N days.
  • plot_status_distribution_next_days(df, scheduler, days_ahead=30, ...) — daily status counts next N days.
  • plot_appointments_by_status(df, scheduler, ...) — past appointments by status (%).
  • plot_appointments_by_status_future(df, scheduler, ...) — future appointments by status (%).
  • plot_scheduling_interval_distribution(df, interval_col="scheduling_interval", ...) — lead-time distribution.
  • plot_appointment_duration_distribution(df, ...) — consultation duration distribution (attended only).
  • plot_waiting_time_distribution(df, ...) — waiting time distribution (attended only).
  • plot_arrival_time_distribution(df, ...) — arrival offset distribution vs scheduled time.
  • plot_first_attendance_distribution(df, scheduler, ...) — first vs. returning attendance ratio.
  • plot_custom_column_distribution(df, column_name, ...) — categorical distribution for user-added columns.
  • plot_patients_visits(df, scheduler, ...) — distribution of patient visit frequency.

All plotting functions return a matplotlib.axes.Axes and share a consistent, publication-grade style.


Visualization Gallery

See the complete gallery at https://medscheduler.readthedocs.io/en/latest/visualization.


Repository Structure

medscheduler/
├─ src/medscheduler/
│  ├─ __init__.py
│  ├─ constants.py
│  ├─ scheduler.py
│  └─ utils/
│     ├─ plotting.py
│     └─ reference_data_utils.py
├─ tests/
│  └─ test_scheduler.py
├─ docs/
│  ├─ _static/logo.png
│  └─ _static/visuals/...
├─ README.md
├─ LICENSE
└─ pyproject.toml

Testing

pytest -q

The test suite covers constructor validation and generation logic, with a coverage target of at least 80%.


Documentation

Comprehensive documentation (User Guide, API Reference, Examples, and Visualization gallery) is available at https://medscheduler.readthedocs.io/en/latest/.


License

This project is released under the MIT License. See LICENSE for details.


Citation

If this library is helpful in your work, please cite:

Carolina González Galtier. medscheduler: A synthetic outpatient appointment simulator, 2025.
GitHub: https://github.com/carogaltier/medscheduler
