GTFS Segments: A fast and efficient library to generate bus stop spacings

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

gtfs-segments is a Python (3.9+) package that represents GTFS data for buses in a concise tabular form using segments. The distribution of bus stop spacings can be viewed by generating histograms, and the spacings can be visualized at the network, route, or segment level. Segment data can be exported to well-known formats such as .csv or .geojson for further analysis. The package also provides commands to download the latest data from @mobility data sources.

The package condenses the raw GTFS data by considering only the services offered on the busiest day (in the data). Our paper discusses how to interpret the different weightings for stop spacings and how the package condenses information. Usage is detailed in the documentation. A stop spacings dataset covering over 540 transit providers in the US, generated using this package, is available on Harvard Dataverse.
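
To make the "busiest day" idea concrete, here is a minimal sketch in plain Python. It assumes the busiest day is the calendar date with the most scheduled trips; the package's actual selection logic may differ. The `calendar` and `trips_per_service` structures are hypothetical, simplified stand-ins for GTFS calendar.txt and trips.txt.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical, simplified stand-ins for calendar.txt and trips.txt.
# service_id -> (start_date, end_date, weekdays served; Monday=0 ... Sunday=6)
calendar = {
    "WEEKDAY": (date(2024, 1, 1), date(2024, 1, 14), {0, 1, 2, 3, 4}),
    "SATURDAY": (date(2024, 1, 1), date(2024, 1, 14), {5}),
}
# service_id -> number of trips that run under that service
trips_per_service = {"WEEKDAY": 120, "SATURDAY": 45}


def busiest_day(calendar, trips_per_service):
    """Return (date, trip_count) for the date with the most scheduled trips."""
    trips_by_date = Counter()
    for service_id, (start, end, weekdays) in calendar.items():
        day = start
        while day <= end:
            if day.weekday() in weekdays:
                trips_by_date[day] += trips_per_service.get(service_id, 0)
            day += timedelta(days=1)
    # Ties are broken by the earliest date so the result is deterministic.
    return max(
        trips_by_date.items(),
        key=lambda item: (item[1], -item[0].toordinal()),
    )


best_date, n_trips = busiest_day(calendar, trips_per_service)
print(best_date, n_trips)
```

A real feed may also list exceptions in calendar_dates.txt, which this sketch ignores.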

Getting Started

Prerequisites

The major dependencies of this library are the following packages.

  • numpy
  • shapely
  • pandas
  • scipy
  • geopandas
  • matplotlib
  • contextily

The detailed list of package dependencies can be found in requirements.txt

Installation

Option A

Use pip to install the package.

pip install gtfs-segments

ℹ️ Windows users may have to download and install Microsoft Visual C++ distributions. Follow these instructions.

📓 Google Colab: You can install and use gtfs-segments in Google Colab. Here is a tutorial to help you get started. Make a copy and begin your work!

Option B

  1. Clone the repo

    git clone https://github.com/UTEL-UIUC/gtfs_segments.git
    
  2. Install geopandas using the following code. Read more here

    conda create -n geo_env -c conda-forge python=3.11 geopandas
    conda activate geo_env
    
  3. Install the gtfs_segments package

    cd gtfs_segments
    python setup.py install
    

Usage

ℹ️ For documentation, please refer to the Documentation

Import the package using

import gtfs_segments

Get GTFS Files

Fetch all sources

from gtfs_segments import fetch_gtfs_source
sources_df = fetch_gtfs_source()
sources_df.head()

Fetch source by name/provider/state

from gtfs_segments import fetch_gtfs_source
sources_df = fetch_gtfs_source(place ='Chicago')
sources_df

Automated Download

from gtfs_segments import download_latest_data
download_latest_data(sources_df,"output_folder")

Manual Download

Download the GTFS .zip files from @transitfeeds or @mobility data.

Get GTFS Segments

from gtfs_segments import get_gtfs_segments
segments_df = get_gtfs_segments("path_to_gtfs_zip_file")
# [Optional] Run in parallel using multiple CPU cores
segments_df = get_gtfs_segments("path_to_gtfs_zip_file", parallel = True)

Alternatively, filter for a specific agency by passing agency_id as a string, or for multiple agencies by passing a list such as ["SFMTA"].

segments_df = get_gtfs_segments("path_to_gtfs_zip_file",agency_id = "SFMTA")
segments_df
Table generated by gtfs-segments using data from San Francisco’s Muni system. Each row contains the following columns:
  1. segment_id: the segment's identifier, produced by gtfs-segments
  2. stop_id1: the identifier of the segment's beginning stop. The identifier is the same one the agency has chosen in the stops.txt file of its GTFS package.
  3. stop_id2: The identifier of the segment's ending stop.
  4. route_id: The same route ID listed in the agency's routes.txt file.
  5. direction_id: The route's direction identifier.
  6. traversals: The number of times the indicated route traverses the segment during the "measurement interval." The "measurement interval" chosen is the busiest day in the GTFS schedule: the day which has the most bus services running.
  7. distance: The length of the bus segment in meters.
  8. geometry: The segment's LINESTRING (a format for encoding geographic paths). All geometries are re-projected to WGS84 geographic coordinates (EPSG:4326) to maintain consistency.
  9. traversal_time: The time (in seconds) that it takes for the bus to traverse the segment.
  10. speed: The speed of the bus (in kmph) while traversing the segment. Defaults to np.inf if traversal_time is zero.
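
The relationship between distance, traversal_time, and speed can be sketched as below. This assumes speed is simply distance divided by traversal_time, converted from m/s to kmph; the helper name `segment_speed_kmph` is hypothetical and not part of the package.

```python
import math


def segment_speed_kmph(distance_m, traversal_time_s):
    """Speed in km/h from a distance in meters and a time in seconds."""
    if traversal_time_s == 0:
        # Mirrors the documented default: infinite speed for zero travel time.
        return math.inf
    # 1 m/s equals 3.6 km/h.
    return (distance_m / traversal_time_s) * 3.6


print(segment_speed_kmph(500, 60))
print(segment_speed_kmph(500, 0))
```

`math.inf` compares equal to `np.inf`, so rows with infinite speed can be filtered out with either one before averaging.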

A row does not represent a single segment. Rather, each row maps to a combination of a segment, a route that includes that segment, and a direction. For instance, a segment included in eight routes will appear as eight rows, which carry the same information except for route_id and traversals (since some routes might traverse the segment more often than others). This choice enables filtering by route and preserves how many times each route traverses each segment during the measurement interval. The direction identifier handles very rare cases (mostly loops) in which a route visits the same two stops, in the same order, but in different directions.
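
The row layout described above can be illustrated with a few hypothetical rows. This sketch uses plain dictionaries rather than the package's DataFrame, and the segment and route identifiers are made up.

```python
from collections import defaultdict

# Hypothetical rows: one per (segment, route, direction) combination.
rows = [
    {"segment_id": "A-B-1", "route_id": "8", "direction_id": 0,
     "traversals": 40, "distance": 250.0},
    {"segment_id": "A-B-1", "route_id": "9", "direction_id": 0,
     "traversals": 10, "distance": 250.0},
    {"segment_id": "B-C-1", "route_id": "8", "direction_id": 0,
     "traversals": 40, "distance": 400.0},
]

# Total traversals per physical segment, summed over all routes using it.
traversals_by_segment = defaultdict(int)
for row in rows:
    traversals_by_segment[row["segment_id"]] += row["traversals"]

# Filtering by route keeps only that route's rows.
route_8 = [row for row in rows if row["route_id"] == "8"]

print(dict(traversals_by_segment))
print(len(route_8))
```

With a pandas DataFrame, the same aggregation would typically be written as `segments_df.groupby("segment_id")["traversals"].sum()`.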

Visualize Spacings

Visualize stop spacings at the network, route, and segment levels, along with basemaps and stop locations.

ℹ️ For more information on visualization refer to the Visualization Tutorial

ℹ️ Alternatively, use view_spacings_interactive to view the stop spacings interactively.

from gtfs_segments import view_spacings
view_spacings(segments_df,route = '8',segment = '6364-3725-1',basemap=True)

Heatmap

View the heatmap of stop spacings (with "distance" as the metric). Use diverging colormaps to highlight narrow and wide spacings. Set light_mode = False for dark mode.

from gtfs_segments import view_heatmap
f = view_heatmap(segments_df, cmap='RdBu', light_mode=True)
view_heatmap(segments_df, cmap="YlOrRd", interactive=True, light_mode=False)

Plot Distributions

from gtfs_segments import plot_hist
plot_hist(segments_df, max_spacing = 1200)
Optionally save figures using
plot_hist(segments_df,file_path = "spacings_hist.png",save_fig = True)

Summary Statistics

Get Network Summary Stats

from gtfs_segments import summary_stats
summary_stats(segments_df,max_spacing = 3000,export = True,file_path = "summary.csv")
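
To show what a weighting choice means for a summary statistic, the sketch below compares an unweighted mean spacing with a traversal-weighted one. It is an illustration on hypothetical rows, not a reproduction of what summary_stats returns. It also assumes that max_spacing drops segments longer than the threshold, which may not match the package's exact rule.

```python
def mean_spacings(rows, max_spacing=3000):
    """Return (unweighted_mean, traversal_weighted_mean) of distance in meters."""
    # Assumption: segments longer than max_spacing are excluded.
    kept = [row for row in rows if row["distance"] <= max_spacing]
    if not kept:
        raise ValueError("no segments left after applying max_spacing")
    unweighted = sum(row["distance"] for row in kept) / len(kept)
    total_traversals = sum(row["traversals"] for row in kept)
    weighted = (
        sum(row["distance"] * row["traversals"] for row in kept)
        / total_traversals
    )
    return unweighted, weighted


# Hypothetical rows; the 5000 m segment is dropped by the threshold.
sample_rows = [
    {"distance": 200.0, "traversals": 90},
    {"distance": 400.0, "traversals": 10},
    {"distance": 5000.0, "traversals": 50},
]
unweighted_mean, weighted_mean = mean_spacings(sample_rows, max_spacing=3000)
print(unweighted_mean, weighted_mean)
```

The weighted mean is pulled toward the short segment because buses traverse it far more often, which is the kind of difference the weightings are meant to expose.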

Get Route Summary Stats

from gtfs_segments import get_route_stats,get_bus_feed
feed = get_bus_feed('path_to_gtfs.zip')
get_route_stats(feed)

Here each row contains the following columns:

  1. route: The route_id for the route of interest
  2. direction: The direction_id of the route
  3. route_length: The total length of the route. Units: Kilometers (Km)
  4. total time: The total scheduled time to travel the whole route. Units: Hours (Hr)
  5. headway: The average headway between consecutive buses for the route. A NaN indicates only 1 trip. Units: Hours (Hr)
  6. peak_buses: The 15-minute interval where the route has the maximum number of buses concurrently running.
  7. average_speed: The average speed of the bus along the route. Units: Kmph
  8. n_bus_avg: The average number of buses concurrently running
  9. bus_spacing: The average spacing (in distance) between consecutive buses. Units: Kilometers (Km)
  10. stop_spacing: The average distance between two consecutive stops. Units: Kilometers (Km)

Download Segments Data

Download the data as either .csv or .geojson

from gtfs_segments import export_segments
export_segments(segments_df,'filename', output_format ='geojson')
# Get csv without geometry
export_segments(segments_df,'filename', output_format ='csv',geometry = False)

Roadmap

  • Add interactive visualization with folium
  • Visualize catchment areas for stops
  • Log trips that do not have shapes

See the open issues for a full list of proposed features (and known issues).

License

Distributed under the MIT License. See LICENSE.txt for more information.

Citing gtfs-segments

If you use gtfs-segments in your research, please use the following BibTeX entry:

@software{devunuri_gtfs_segments,
author = {Devunuri, Saipraneeth and Lehe, Lewis},
doi = {10.5281/zenodo.10019419},
month = Oct,
title = {{GTFS Segments: A fast and efficient library to generate bus stop spacings}},
url = {https://github.com/UTEL-UIUC/gtfs_segments},
version = {2.0.3},
year = {2023}
}

Citing stop spacings paper

If you use the stop spacings paper in your research, please use the following BibTeX entry:

@article{Devunuri2024,
  title = {Bus Stop Spacing Statistics: {{Theory}} and Evidence},
  shorttitle = {Bus Stop Spacing Statistics},
  author = {Devunuri, Saipraneeth and Lehe, Lewis J. and Qiam, Shirin and Pandey, Ayush and Monzer, Dana},
  year = {2024},
  month = jan,
  journal = {Journal of Public Transportation},
  volume = {26},
  pages = {100083},
  issn = {1077-291X},
  doi = {10.1016/j.jpubtr.2024.100083},
  url = {https://www.sciencedirect.com/science/article/pii/S1077291X24000031},
  urldate = {2024-03-07},
  keywords = {Bus stop,GTFS,Public Transit,Stop Spacings,Transit Planning}
}

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

For more information refer to CONTRIBUTING.md

Contact

Saipraneeth Devunuri - @praneethDevunu1 - sd37@illinois.edu

Project Link: https://github.com/UTEL-UIUC/gtfs_segments

Acknowledgments

  • Parts of the code use the Partridge library
  • Do check out gtfs_functions which was an inspiration for this project
  • Shoutout to Mobility Data for compiling GTFS from around the globe and constantly maintaining them
