
Realistic flight schedule generation from empirical operational records.


ROSTER - Realistic Operational Schedules Through Empirical Records


ROSTER is a Python package for generating realistic flight schedules from historical data. It is designed for air traffic management (ATM) researchers who need synthetic schedules, with or without modification, for simulation purposes.

The goal of ROSTER is to provide a transparent and reproducible pipeline for building these synthetic schedules, which preserve important operational structure: airline and wake-category mixes, airport and route usage, scheduled flight times and turnarounds, and fleet initial conditions.

Installation

ROSTER requires Python 3.12 or newer. Install the latest released package with:

python -m pip install roster-generator

The installed import package is named roster_generator:

import roster_generator

For local development from a source checkout:

cd roster
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"

Runtime dependencies are declared in pyproject.toml and include pandas, numpy, airportsdata, aircraft-list, and pytz. Development and release tools, including pytest, build, and twine, are available through the optional dev extra.

Input Data

The main ROSTER pipeline expects a cleaned flight schedule with this normalized schema:

DEP_ICAO, ARR_ICAO, STD_REFTZ, STA_REFTZ, ATD_REFTZ, ATA_REFTZ,
AC_OPER, AC_REG, AC_WAKE

Any dataset with those columns can be used directly as schedule_file in PipelineConfig. The built-in cleaning step is optional: it is a convenience utility for converting EUROCONTROL-style flight records into the normalized ROSTER input format.
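
Before handing a file to the pipeline, it can help to confirm that its header carries the normalized columns. The following is a minimal sketch using only the standard library; the function name missing_columns and the require_actual flag are illustrative assumptions and are not part of the roster_generator API.

```python
import csv
import io

# Normalized ROSTER schema, as listed above.
SCHEDULED_COLUMNS = ["DEP_ICAO", "ARR_ICAO", "STD_REFTZ", "STA_REFTZ",
                     "AC_OPER", "AC_REG", "AC_WAKE"]
ACTUAL_COLUMNS = ["ATD_REFTZ", "ATA_REFTZ"]


def missing_columns(csv_text: str, require_actual: bool = True) -> list[str]:
    """Return the required columns that are absent from the CSV header.

    Hypothetical helper: require_actual=False skips the actual-time columns.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    required = SCHEDULED_COLUMNS + (ACTUAL_COLUMNS if require_actual else [])
    return [col for col in required if col not in present]


sample = "DEP_ICAO,ARR_ICAO,STD_REFTZ,STA_REFTZ,AC_OPER,AC_REG,AC_WAKE\n"
print(missing_columns(sample))                        # ['ATD_REFTZ', 'ATA_REFTZ']
print(missing_columns(sample, require_actual=False))  # []
```

Only the header is inspected here; value-level checks such as timestamp parsing are left out of the sketch.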

For that optional cleaner, the raw EUROCONTROL-style file must include, at minimum, the following source columns:

ADEP, ADES, FILED OFF BLOCK TIME, FILED ARRIVAL TIME,
ACTUAL OFF BLOCK TIME, ACTUAL ARRIVAL TIME,
AC Type, AC Operator, AC Registration

The cleaner validates airport ICAO codes, parses timestamps, adds ICAO wake turbulence categories from aircraft type data, and writes the normalized schema above.

Using BTS Data

ROSTER can also clean BTS on-time performance downloads into the same normalized schema. Download the BTS source data yourself, extract the CSV files into BTS/, and run the BTS cleaning tutorial:

mkdir -p BTS
# Put the extracted BTS on-time schedule CSV and aircraft inventory CSV in BTS/.
# The tutorial auto-detects BTS/on_time.csv, BTS/aircraft.csv, or the default
# extracted BTS filenames shown below.
python tutorials/tutorial_bts_cleaning.py
python main.py --schedule-file input/bts_clean.csv --seed 42 --suffix bts

You can also call the cleaner directly by passing both extracted CSV paths:

python -m roster_generator.data_cleaning.clean_bts \
  BTS/On_Time_Reporting_Carrier_On_Time_Performance_\(1987_present\)_2024_12.csv \
  BTS/T_F41SCHEDULE_B43.csv \
  --output input/bts_clean.csv

For example, from a user workspace such as /home/josu/Escritorio/main_pc/roster-workspace, the command above expects the extracted CSV files under /home/josu/Escritorio/main_pc/roster-workspace/BTS.

The two required BTS CSV inputs are:

| Need | BTS table | Where to get it | Filename |
| --- | --- | --- | --- |
| Required | Reporting Carrier On-Time Performance (1987-present) schedule | https://transtats.bts.gov/PREZIP/ | Extracted CSV, for example On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2024_12.csv |
| Required | Schedule B-43 aircraft inventory | https://www.transtats.bts.gov/DL_SelectFields.aspx?QO_fu146_anzr=Nv4+Pn44vr4+Sv0n0pvny&gnoyr_VQ=GEH | Extracted CSV, for example T_F41SCHEDULE_B43.csv |

BTS on-time field definitions are documented at https://transtats.bts.gov/Fields.asp?gnoyr_VQ=FGJ. BTS Schedule B-43 fields are documented at https://www.transtats.bts.gov/Fields.asp?gnoyr_VQ=GEH.

ROSTER does not download, cache, or read zipped BTS support data. Both the on-time schedule CSV and the Schedule B-43 aircraft inventory CSV are mandatory.

BTS stores its scheduled and actual clock times as local hhmm values. The BTS cleaner localizes those times using each airport timezone, converts them into the normalized ROSTER timestamp convention, and writes the standard cleaned columns shown above.
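
The localization step can be sketched as follows. This is an illustration under stated assumptions, not the cleaner's actual code: the function name bts_local_to_utc is hypothetical, the target timezone is taken to be UTC, "2400" is treated as midnight at the end of the flight date, and the standard-library zoneinfo module is used rather than pytz.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo


def bts_local_to_utc(flight_date: date, hhmm: str, tz: tzinfo) -> datetime:
    """Convert a local hhmm clock time on flight_date to an aware UTC datetime."""
    value = hhmm.strip().zfill(4)            # "715" -> "0715"
    hours, minutes = int(value[:2]), int(value[2:])
    # Assumption: "2400" means midnight at the end of flight_date.
    day_offset, hours = divmod(hours, 24)
    local = datetime(flight_date.year, flight_date.month, flight_date.day,
                     hours, minutes, tzinfo=tz) + timedelta(days=day_offset)
    return local.astimezone(timezone.utc)


if __name__ == "__main__":
    new_york = ZoneInfo("America/New_York")
    print(bts_local_to_utc(date(2024, 12, 15), "0715", new_york))
    # 2024-12-15 12:15:00+00:00
```

Passing tzinfo directly to the datetime constructor is correct for zoneinfo timezones. With pytz timezones, the naive datetime should be attached through the timezone's localize method instead.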

Quick Start

Run the full tutorial pipeline from a source checkout:

python tutorials/tutorial_pipeline.py --seed 42 --suffix demo

The seed controls stochastic sampling. The optional suffix is appended to generated files, so --suffix demo produces outputs such as schedule_demo.csv.
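
The effect of both options can be illustrated with a small sketch. The helper names output_name and sample_turnarounds are hypothetical, the sampled values are placeholders, and the behaviour without a suffix is an assumption; only the schedule_demo.csv naming comes from the description above.

```python
import random


def output_name(stem: str, suffix: str = "", ext: str = ".csv") -> str:
    """Append an optional suffix to an output file stem."""
    # Assumption: with no suffix, the plain stem is used.
    return f"{stem}_{suffix}{ext}" if suffix else f"{stem}{ext}"


def sample_turnarounds(seed: int, n: int = 3) -> list[int]:
    """Draw n placeholder turnaround times (minutes) from a seeded generator."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randint(30, 90) for _ in range(n)]


print(output_name("schedule", "demo"))                   # schedule_demo.csv
print(sample_turnarounds(42) == sample_turnarounds(42))  # True
```

Using a dedicated random.Random instance rather than the module-level functions keeps the draws reproducible even if other code touches the global generator.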

Pipeline

The standard tutorial workflow runs the following stages:

  1. Optionally clean historical flight records into the normalized ROSTER schema.
  2. Build empirical Markov transition tables and sample fleet initial conditions.
  3. Analyze scheduled turnaround and flight-time distributions.
  4. Generate auxiliary simulator input files for airlines, airports, fleet, and routes.
  5. Generate a synthetic schedule through greedy forward construction with airport capacity checks.
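
To make stage 2 more concrete, the sketch below builds an empirical transition table from observed flight legs and samples from it. It is a simplification: the state here is only the current airport, whereas ROSTER's actual tables may condition on more than that, and the function names are illustrative.

```python
import random
from collections import Counter, defaultdict


def build_transitions(legs):
    """Estimate P(destination | origin) from observed (origin, destination) legs."""
    counts = defaultdict(Counter)
    for origin, dest in legs:
        counts[origin][dest] += 1
    table = {}
    for origin, dests in counts.items():
        total = sum(dests.values())
        table[origin] = {dest: n / total for dest, n in dests.items()}
    return table


def sample_next(table, origin, rng):
    """Sample the next destination for an aircraft currently at origin."""
    dests = list(table[origin])
    weights = [table[origin][dest] for dest in dests]
    return rng.choices(dests, weights=weights, k=1)[0]


legs = [("LEMD", "EGLL"), ("LEMD", "EGLL"), ("LEMD", "LFPG"), ("EGLL", "LEMD")]
table = build_transitions(legs)
rng = random.Random(42)
print(sample_next(table, "LEMD", rng))  # EGLL or LFPG, weighted 2:1
```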

Intermediate analysis files are written under computed/, including initial_conditions, markov, scheduled flight-time distributions, and turnaround profiles. Simulator-facing outputs are written under output/, including airlines, airports, fleet, routes, phys_ta, and schedule.
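
The capacity check in stage 5 can be pictured with a toy example. This sketch only shows the greedy accept-or-reject idea with hourly movement limits; the data layout, the per-hour slot granularity, and the function name are assumptions, and the real generator builds flights forward per aircraft rather than filtering a fixed candidate list.

```python
from collections import Counter


def greedy_schedule(candidates, capacity):
    """Accept flights in departure order while airport-hour slots have room.

    candidates: (dep_airport, dep_hour, arr_airport, arr_hour) tuples.
    capacity:   maximum movements per hour for each airport (hypothetical).
    """
    used = Counter()  # movements already booked per (airport, hour) slot
    accepted = []
    for flight in sorted(candidates, key=lambda f: f[1]):
        dep, dep_hour, arr, arr_hour = flight
        dep_slot, arr_slot = (dep, dep_hour), (arr, arr_hour)
        if used[dep_slot] < capacity[dep] and used[arr_slot] < capacity[arr]:
            used[dep_slot] += 1
            used[arr_slot] += 1
            accepted.append(flight)
    return accepted


capacity = {"LEMD": 1, "EGLL": 2}
candidates = [("LEMD", 8, "EGLL", 10),
              ("LEMD", 8, "EGLL", 10),   # rejected: LEMD is full at 08:00
              ("EGLL", 9, "LEMD", 11)]
print(greedy_schedule(candidates, capacity))
# [('LEMD', 8, 'EGLL', 10), ('EGLL', 9, 'LEMD', 11)]
```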

Configuration

Runtime window behavior can be configured in tutorials/params.yaml:

REFTZ: UTC
WINDOW_START: "00:00"
WINDOW_LENGTH_HOURS: 24
ACTUAL_TIMES: false

REFTZ defines the reference timezone used for time-of-day and day-boundary logic. WINDOW_START and WINDOW_LENGTH_HOURS define the simulated operating window. ACTUAL_TIMES controls whether actual timestamp columns are required and used by stages that support them.
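
As an illustration of how these parameters might translate into an operating window, the sketch below computes the window bounds for one day. It is not ROSTER's code: the helper names are hypothetical, only REFTZ: UTC is handled, and the half-open interval convention is an assumption.

```python
from datetime import date, datetime, time, timedelta, timezone

# Values mirror the params.yaml example above, written as a plain dict.
params = {"REFTZ": "UTC", "WINDOW_START": "00:00",
          "WINDOW_LENGTH_HOURS": 24, "ACTUAL_TIMES": False}


def window_bounds(day: date, params: dict) -> tuple[datetime, datetime]:
    """Return the (start, end) datetimes of the operating window for one day."""
    if params["REFTZ"] != "UTC":
        raise ValueError("this sketch only handles REFTZ: UTC")
    start_time = time.fromisoformat(params["WINDOW_START"])
    start = datetime.combine(day, start_time, tzinfo=timezone.utc)
    return start, start + timedelta(hours=params["WINDOW_LENGTH_HOURS"])


def in_window(ts: datetime, start: datetime, end: datetime) -> bool:
    """Assumed convention: the window is half-open, [start, end)."""
    return start <= ts < end


start, end = window_bounds(date(2024, 12, 15), params)
print(start, end)
# 2024-12-15 00:00:00+00:00 2024-12-16 00:00:00+00:00
```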

Programmatic workflows use roster_generator.PipelineConfig to define input paths, output paths, random seed, suffix, time-window settings, and optional manipulation callbacks.

Schedule manipulation

ROSTER supports controlled scenario manipulation without editing the generated analysis tables by hand:

  • manipulation_fn modifies scalar distribution parameters at runtime, such as turnaround distributions, fleet-size parameters, route durations, physical turnaround minima, and prior-day probabilities.
  • markov_manipulation_fn receives a MarkovContext and reweights existing destination probabilities before each Markov row is normalized.

See tutorials/tutorial_manipulation.py for a worked example of both hooks.
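
The reweight-then-normalize idea behind the Markov hook can be sketched independently of the library. SketchContext below is a hypothetical stand-in for MarkovContext, whose real fields and callback signature are not documented here and may differ; consult the tutorial for the actual interface.

```python
from dataclasses import dataclass


@dataclass
class SketchContext:
    """Hypothetical stand-in for MarkovContext; real fields may differ."""
    origin: str
    weights: dict[str, float]  # unnormalized destination weights


def boost_destination(ctx: SketchContext) -> dict[str, float]:
    """Double the weight of flights to EGLL; leave other destinations as-is."""
    return {dest: w * (2.0 if dest == "EGLL" else 1.0)
            for dest, w in ctx.weights.items()}


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Scale weights so that the row sums to one."""
    total = sum(weights.values())
    return {dest: w / total for dest, w in weights.items()}


ctx = SketchContext("LEMD", {"EGLL": 1.0, "LFPG": 1.0, "EDDF": 2.0})
row = normalize(boost_destination(ctx))
print(row)  # {'EGLL': 0.4, 'LFPG': 0.2, 'EDDF': 0.4}
```

Because the reweighting happens before normalization, boosting one destination lowers the probabilities of the others proportionally, and no new destinations are introduced.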

Some Features Of ROSTER

  • Written in Python 3 and installable with pip.
  • Reproducible stochastic generation through explicit random seeds.
  • Empirical Markov transition models for aircraft continuation behavior.
  • Synthetic fleet initial-condition sampling from historical records.
  • Reference-timezone operating windows with configurable start time and length.
  • Wake-category handling based on aircraft type data.
  • Scheduled turnaround and flight-time distribution analysis.
  • Airport capacity-aware schedule construction.
  • CSV outputs suitable for downstream simulation workflows.
  • Unit tests covering cleaning, configuration, distributions, Markov models, auxiliary files, and schedule generation.

Testing

Run the test suite from the project root:

python -m pytest

Contributions

Contributions are welcome from researchers and developers. Please keep changes focused, and add or update tests for behavioural changes.

License

ROSTER is distributed under the GNU General Public License v3. See LICENSE for the full license text.
