
Package to easily wrangle GTFS files geospatially.

Project description

GTFS functions

This package allows you to create various layers directly from the GTFS and visualize the results in the most straightforward way possible.

Update November 2023:

  • Possibility to check the service_id for a given date:
parsed_calendar = Feed(gtfs_path).parse_calendar()

or if you want it already grouped by date:

date_service = Feed(gtfs_path).get_dates_service_id()
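
For context, GTFS describes regular service in calendar.txt as a date range plus one flag per weekday. The sketch below shows how such rows can be expanded into a date-to-service_id lookup with the standard library only. It is not the package's implementation: the names `services_by_date` and `make_row` and the sample rows are made up for illustration, and exceptions from calendar_dates.txt are ignored.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def make_row(service_id, flags, start_date, end_date):
    """Build a calendar.txt-like row; flags is a 7-character string, Monday first."""
    row = dict(zip(WEEKDAYS, flags))
    row.update(service_id=service_id, start_date=start_date, end_date=end_date)
    return row


def services_by_date(calendar_rows):
    """Map each date to the service_ids that run on it (calendar.txt only)."""
    active = defaultdict(list)
    for row in calendar_rows:
        day = datetime.strptime(row["start_date"], "%Y%m%d").date()
        end = datetime.strptime(row["end_date"], "%Y%m%d").date()
        while day <= end:
            # date.weekday() returns 0 for Monday, matching WEEKDAYS
            if row[WEEKDAYS[day.weekday()]] == "1":
                active[day].append(row["service_id"])
            day += timedelta(days=1)
    return dict(active)


# Hypothetical services: one for weekdays, one for weekends
rows = [
    make_row("WKDY", "1111100", "20230327", "20230402"),
    make_row("WKND", "0000011", "20230327", "20230402"),
]
lookup = services_by_date(rows)
print(lookup[date(2023, 3, 31)])  # a Friday
print(lookup[date(2023, 4, 1)])   # a Saturday
```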

Update August 2023:

  • Possibility to parse the GTFS for a specific date range.
feed = Feed(gtfs_path, start_date='2023-03-31', end_date='2023-04-04')

Update March 2023:

  • Removed the dependency on partridge. As much as we love that package and think it is absolutely great, dropping it gives us more control and keeps this package from failing whenever something changes in partridge.
  • We treat the GTFS as a class, where each file is a property. See the examples below to find out how to work with it. We hope this simplifies your code.
  • Fixed and enhanced segment cutting. Shout out to Mattijs De Paepe.
  • Support for identifying route patterns. Check it out using feed.routes_patterns. Shout out to Tobias Bartsch.
  • The rest should stay the same.
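
As background on route patterns: in transit planning, a pattern usually means a distinct ordered sequence of stops served by trips of a route. The sketch below groups trips by their stop sequence to show the idea. It is an assumption about the concept, not a description of how feed.routes_patterns is computed, and `find_patterns` and the sample trips are hypothetical.

```python
from collections import defaultdict


def find_patterns(trip_stops):
    """Group trip_ids by their ordered stop sequence."""
    patterns = defaultdict(list)
    for trip_id, stops in trip_stops.items():
        # A tuple is hashable, so the stop sequence can be used as a dict key
        patterns[tuple(stops)].append(trip_id)
    return dict(patterns)


# Hypothetical trips of one route: t3 skips stop B, so it forms its own pattern
trip_stops = {
    "t1": ["A", "B", "C"],
    "t2": ["A", "B", "C"],
    "t3": ["A", "C"],
}
patterns = find_patterns(trip_stops)
for stops, trip_ids in patterns.items():
    print(" > ".join(stops), trip_ids)
```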

Warning!

Make sure stop_times.txt has no null values in the arrival_time and departure_time columns. Otherwise, some functions in this package might fail.
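
The GTFS specification allows these two fields to be empty at stops that are not timepoints, so it is worth checking a feed before loading it. A minimal sketch of such a check using only the standard library follows; the function name `rows_missing_times` and the sample data are made up for illustration. With a real feed you would pass an open handle to stop_times.txt instead of the in-memory sample.

```python
import csv
import io


def rows_missing_times(stop_times_file):
    """Return (line_number, trip_id, stop_sequence) for rows with an empty time."""
    missing = []
    reader = csv.DictReader(stop_times_file)
    # Line 1 is the header, so data rows start at line 2
    for line_number, row in enumerate(reader, start=2):
        arrival = (row.get("arrival_time") or "").strip()
        departure = (row.get("departure_time") or "").strip()
        if not arrival or not departure:
            missing.append((line_number, row["trip_id"], row["stop_sequence"]))
    return missing


# Hypothetical stop_times.txt content with one untimed stop
sample = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T1,08:00:00,08:00:00,A,1\n"
    "T1,,,B,2\n"
    "T1,08:10:00,08:10:00,C,3\n"
)
problems = rows_missing_times(sample)
print(problems)
```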


Python version

The package requires python>=3.8. You can create a new environment with this version using conda:

conda create -n new-env python=3.8

Installation

You can install the package by running the following in your console:

pip install gtfs_functions

Import the package in your script/notebook

from gtfs_functions import Feed

GTFS Import

Now you can interact with your GTFS with the class Feed. Take a look at the class with ?Feed to check what arguments you can specify.

gtfs_path = 'data/sfmta.zip'

# It also works with URLs
gtfs_path = 'https://transitfeeds.com/p/sfmta/60/latest/download'

feed = Feed(gtfs_path, time_windows=[0, 6, 10, 12, 16, 19, 24])
routes = feed.routes
routes.head(2)
   route_id agency_id route_short_name route_long_name route_desc  route_type            route_url route_color route_text_color
0     15761     SFMTA                1      CALIFORNIA                      3  https://SFMTA.com/1
1     15766     SFMTA                5          FULTON                      3  https://SFMTA.com/5
stops = feed.stops
stops.head(2)
   stop_id  stop_code                  stop_name stop_desc zone_id stop_url                     geometry
0      390      10390  19th Avenue & Holloway St                             POINT (-122.47510 37.72119)
1     3016      13016            3rd St & 4th St                             POINT (-122.38979 37.77262)
stop_times = feed.stop_times
stop_times.head(2)
trip_id arrival_time departure_time stop_id stop_sequence stop_headsign pickup_type drop_off_type shape_dist_traveled route_id service_id direction_id shape_id stop_code stop_name stop_desc zone_id stop_url geometry
0 9413147 81840.0 81840.0 4015 1 NaN NaN 15761 1 0 179928 14015 Clay St & Drumm St POINT (-122.39682 37.79544)
1 9413147 81902.0 81902.0 6294 2 NaN NaN 15761 1 0 179928 16294 Sacramento St & Davis St POINT (-122.39761 37.79450)
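
The arrival_time and departure_time values in the output above (81840.0, 81902.0) appear to be seconds since midnight rather than the HH:MM:SS strings found in stop_times.txt. Assuming that reading is right, a small helper like the hypothetical `seconds_to_gtfs_time` below converts them back for display. GTFS times may exceed 24:00:00 for trips that run past midnight, so the hours are deliberately not wrapped.

```python
def seconds_to_gtfs_time(seconds):
    """Format seconds since midnight as H:MM:SS text, without wrapping at 24 hours."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(seconds_to_gtfs_time(81840.0))  # 22:44:00
print(seconds_to_gtfs_time(81902.0))  # 22:45:02
print(seconds_to_gtfs_time(90000))    # 25:00:00, i.e. 1 am on the next day
```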
trips = feed.trips
trips.head(2)
   trip_id  route_id  service_id  direction_id  shape_id
0  9547346     15804           1             0    180140
1  9547345     15804           1             0    180140
shapes = feed.shapes
shapes.head(2)
   shape_id                                           geometry
0    179928  LINESTRING (-122.39697 37.79544, -122.39678 37...
1    179929  LINESTRING (-122.39697 37.79544, -122.39678 37...

Stop frequencies

Returns a geodataframe with the frequency for each combination of stop, time of day and direction. Each row has a Point geometry. The time-of-day windows used for aggregation are defined by a list of cutoffs, in hours. If the defaults do not suit you, pass your own list as time_windows when creating the Feed.

time_windows = [0, 6, 9, 15.5, 19, 22, 24]

feed = Feed(gtfs_path, time_windows=time_windows)
stop_freq = feed.stops_freq
stop_freq.head()
       stop_id    dir_id     window  ntrips  min_per_trip                       stop_name                     geometry
8157      5763   Inbound  0:00-6:00       1           360           Noriega St & 48th Ave  POINT (-122.50785 37.75293)
13102     7982  Outbound  0:00-6:00       1           360          Moscow St & RussiaAvet  POINT (-122.42996 37.71804)
9539      6113   Inbound  0:00-6:00       1           360  Portola Dr & Laguna Honda Blvd  POINT (-122.45526 37.74310)
12654     7719   Inbound  0:00-6:00       1           360           Middle Point & Acacia  POINT (-122.37952 37.73707)
9553      6116   Inbound  0:00-6:00       1           360      Portola Dr & San Pablo Ave  POINT (-122.46107 37.74040)
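
To make the relation between cutoffs, windows and the min_per_trip column concrete, here is a minimal sketch of the arithmetic. It assumes min_per_trip is the window length in minutes divided by ntrips, which matches the rows above (a 6-hour window with 1 trip gives 360). The helper names and the label format for fractional hours such as 15.5 are assumptions, not taken from the package.

```python
def window_labels(cutoffs):
    """Turn hour cutoffs such as [0, 6, 9] into labels like '0:00-6:00'."""
    def fmt(hour):
        whole = int(hour)
        minutes = int(round((hour - whole) * 60))
        return f"{whole}:{minutes:02d}"

    # Each pair of consecutive cutoffs delimits one window
    return [f"{fmt(a)}-{fmt(b)}" for a, b in zip(cutoffs, cutoffs[1:])]


def minutes_per_trip(start_hour, end_hour, ntrips):
    """Average headway in minutes for ntrips trips within one window."""
    return (end_hour - start_hour) * 60 / ntrips


labels = window_labels([0, 6, 9, 15.5, 19, 22, 24])
print(labels)
print(minutes_per_trip(0, 6, 1))   # 360.0, as in the rows above
print(minutes_per_trip(6, 9, 12))  # 15.0
```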

Line frequencies

Returns a geodataframe with the frequency for each combination of line, time of day and direction. Each row has a LineString geometry. As with stop frequencies, the time-of-day windows used for aggregation come from the cutoffs passed as time_windows when creating the Feed.

line_freq = feed.lines_freq
line_freq.head()
     route_id          route_name   dir_id     window  min_per_trip  ntrips                                           geometry
376     15808    44 O'SHAUGHNESSY  Inbound  0:00-6:00           360       1  LINESTRING (-122.46459 37.78500, -122.46352 37...
378     15808    44 O'SHAUGHNESSY  Inbound  0:00-6:00           360       1  LINESTRING (-122.43416 37.73355, -122.43299 37...
242     15787  25 TREASURE ISLAND  Inbound  0:00-6:00           360       1  LINESTRING (-122.39611 37.79013, -122.39603 37...
451     15814           54 FELTON  Inbound  0:00-6:00           360       1  LINESTRING (-122.38845 37.73994, -122.38844 37...
241     15787  25 TREASURE ISLAND  Inbound  0:00-6:00           360       1  LINESTRING (-122.39542 37.78978, -122.39563 37...

Bus segments

Returns a geodataframe where each row is a segment (the stretch of a route between two consecutive stops) with a LineString geometry.

segments_gdf = feed.segments
segments_gdf.head(2)
route_id direction_id stop_sequence start_stop_name end_stop_name start_stop_id end_stop_id segment_id shape_id geometry distance_m
0 15761 0 1 Clay St & Drumm St Sacramento St & Davis St 4015 6294 4015-6294 179928 LINESTRING (-122.39697 37.79544, -122.39678 37... 205.281653
1 15761 0 2 Sacramento St & Davis St Sacramento St & Battery St 6294 6290 6294-6290 179928 LINESTRING (-122.39761 37.79446, -122.39781 37... 238.047505
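
In the output above, segment_id looks like the start and end stop ids joined by a hyphen, and stop_sequence numbers the segments along the trip. The sketch below reproduces that pairing from an ordered list of stop ids. It only illustrates the naming scheme as read from the sample output; `build_segments` is a hypothetical helper, and the geometry cutting done by the package is not covered.

```python
def build_segments(stop_ids):
    """Pair consecutive stops into segments with ids like '4015-6294'."""
    segments = []
    pairs = zip(stop_ids, stop_ids[1:])
    for sequence, (start, end) in enumerate(pairs, start=1):
        segments.append({
            "stop_sequence": sequence,
            "start_stop_id": start,
            "end_stop_id": end,
            "segment_id": f"{start}-{end}",
        })
    return segments


# Stop ids taken from the two sample rows above
segments = build_segments(["4015", "6294", "6290"])
for segment in segments:
    print(segment["stop_sequence"], segment["segment_id"])
```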

Scheduled Speeds

Returns a geodataframe with the speed_kmh for each combination of route, segment, time of day and direction. Each row has a LineString geometry. The user can optionally specify cutoffs as explained in the previous sections.

# Use hourly cutoffs in time_windows (set when creating the Feed) to get hourly values
speeds = feed.avg_speeds
speeds.head(1)
route_id route_name direction_id segment_id window speed_kmh start_stop_id start_stop_name end_stop_id end_stop_name distance_m stop_sequence runtime_sec segment_max_speed_kmh geometry
0 15761 1 CALIFORNIA Inbound 4015-6294 10:00-11:00 12.0 4015 Clay St & Drumm St 6294 Sacramento St & Davis St 205.281653 1 61.9 12.0 LINESTRING (-122.39697 37.79544, -122.39678 37...
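
As a sanity check on the sample row above, scheduled speed can be recomputed from distance_m and runtime_sec. This is a sketch of the unit conversion only (metres per second times 3.6 gives km/h). The helper name `speed_kmh` is made up, and the package may average or round differently.

```python
def speed_kmh(distance_m, runtime_sec):
    """Convert a distance in metres and a runtime in seconds to km/h."""
    if runtime_sec <= 0:
        raise ValueError("runtime_sec must be positive")
    return distance_m / runtime_sec * 3.6


# Values from the sample row: 205.281653 m covered in 61.9 s
value = speed_kmh(205.281653, 61.9)
print(round(value, 1))  # 11.9, close to the 12.0 shown in the table
```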

Segment frequencies

segments_freq = feed.segments_freq
segments_freq.head(2)
route_id route_name direction_id segment_name window min_per_trip ntrips start_stop_id start_stop_name end_stop_name geometry
23191 ALL_LINES All lines NA 3628-3622 0:00-6:00 360 1 3628 Alemany Blvd & St Charles Ave Alemany Blvd & Arch St LINESTRING (-122.46949 37.71045, -122.46941 37...
6160 15787 25 TREASURE ISLAND Inbound 7948-8017 0:00-6:00 360 1 7948 Transit Center Bay 29 Shoreline Access Road LINESTRING (-122.39611 37.79013, -122.39603 37...

Map your work

Stop frequencies

# Stops
from gtfs_functions.gtfs_plots import map_gdf

condition_dir = stop_freq.dir_id == 'Inbound'
condition_window = stop_freq.window == '6:00-9:00'

gdf = stop_freq.loc[(condition_dir & condition_window),:].reset_index()

map_gdf(
  gdf = gdf, 
  variable = 'ntrips', 
  colors = ["#d13870", "#e895b3" ,'#55d992', '#3ab071', '#0e8955','#066a40'], 
  tooltip_var = ['min_per_trip'] , 
  tooltip_labels = ['Frequency: '], 
  breaks = [10, 20, 30, 40, 120, 200]
)


Line frequencies

# Line frequencies
from gtfs_functions.gtfs_plots import map_gdf

condition_dir = line_freq.dir_id == 'Inbound'  # column name as in the line_freq output above
condition_window = line_freq.window == '6:00-9:00'

gdf = line_freq.loc[(condition_dir & condition_window),:].reset_index()

map_gdf(
  gdf = gdf, 
  variable = 'ntrips', 
  colors = ["#d13870", "#e895b3" ,'#55d992', '#3ab071', '#0e8955','#066a40'], 
  tooltip_var = ['route_name'] , 
  tooltip_labels = ['Route: '], 
  breaks = [5, 10, 20, 50]
)


Speeds

If you are looking to visualize data at the segment level for all lines, I recommend you go with something more powerful like kepler.gl (AKA my favorite data viz library). For example, to check the scheduled speeds per segment:

# Speeds
import keplergl as kp
m = kp.KeplerGl(height=400, data={'Speed Lines': speeds})  # data maps dataset names to dataframes
m


Segment frequencies

# Segment frequencies
import keplergl as kp
m = kp.KeplerGl(height=400, data={'Segment frequency': segments_freq})  # segments_freq was defined above
m


Other plots

Histogram

# Histogram
import plotly.express as px
px.histogram(
    stop_freq.loc[stop_freq.min_per_trip<50], 
    x='min_per_trip',  # stop_freq has no 'frequency' column; min_per_trip holds the headway
    title='Stop frequencies',
    template='simple_white', 
    nbins =20)


Heatmap

# Heatmap
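import plotly.express as px  # needed for px.colors below
import plotly.graph_objects as go

# The speeds output shown earlier names the direction column direction_id
is_inbound = speeds.direction_id == 'Inbound'
is_route = speeds.route_name == '1 CALIFORNIA'
# .copy() avoids modifying a slice of speeds when adding the hour column
dir_0 = speeds.loc[is_inbound & is_route].sort_values(by='stop_sequence').copy()
dir_0['hour'] = dir_0.window.apply(lambda x: int(x.split(':')[0]))
dir_0.sort_values(by='hour', ascending=True, inplace=True)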

fig = go.Figure(data=go.Heatmap(
                   z=dir_0.speed_kmh,
                   y=dir_0.start_stop_name,
                   x=dir_0.window,
                   hoverongaps = False,
                   colorscale=px.colors.colorbrewer.RdYlBu, 
                   reversescale=False
))

fig.update_yaxes(title_text='Stop', autorange='reversed')
fig.update_xaxes(title_text='Hour of day', side='top')
fig.update_layout(showlegend=False, height=600, width=1000,
                 title='Speed heatmap per direction and hour of the day')

fig.show()


Line chart

import plotly.express as px

# Mean and standard deviation of the scheduled speed per time window
by_hour = speeds.pivot_table('speed_kmh', index=['window'], aggfunc=['mean', 'std']).reset_index()
# Flatten the two-level column names, e.g. ('mean', 'speed_kmh') becomes 'mean_speed_kmh'
by_hour.columns = ['_'.join(col).strip() for col in by_hour.columns.values]
by_hour['hour'] = by_hour.window_.apply(lambda x: int(x.split(':')[0]))
by_hour.sort_values(by='hour', ascending=True, inplace=True)

# Line chart
fig = px.line(
    by_hour,
    x='window_',
    y='mean_speed_kmh',
    template='simple_white',
    # error_y='std_speed_kmh'
)

fig.update_yaxes(rangemode='tozero')

fig.show()

