A Python 3.5+ tool kit that analyzes GTFS data
GTFSTK is a Python 3.5+ tool kit for analyzing General Transit Feed Specification (GTFS) data in memory without a database. It uses Pandas and Shapely to do the heavy lifting.
To install with Pipenv, run pipenv install gtfstk.
You can play with ipynb/examples.ipynb in a Jupyter notebook.
Documentation is in docs/ and also on RawGit here.
- Development status is Alpha
- This project uses semantic versioning
- Thanks to MRCagney for donating to this project
- Constructive feedback and code contributions welcome
- Bugfixed geometrize_stops, which was putting some NaNs in the geometry column
- Added trip direction arrows to maps produced by map_trips
- Fixed bug HTML-escaping apostrophes in make_html
- Added map_trips which works like map_routes
- Changed route_to_geojson to return LineStrings instead of a MultiLineString and added a date keyword argument
- Changed shapes_to_geojson to accept an optional list of shape IDs to restrict to
- Added map_routes function to draw routes and their stops on a Folium map, if Folium is installed
- Inserted stars in function signatures to separate boolean keyword arguments. Is this a breaking change? I say no, but it’s debatable.
- Changed compute_trip_stats to accept an optional list of route IDs to restrict to
- Clarified the docstrings of compute_route_stats and compute_route_time_series to note that those functions can accept slices of trip stats
- Changed compute_stop_stats and compute_stop_time_series to accept an optional list of stop IDs
- Stopped drop_zombies from dropping stops with location type 1 or 2
- Changed CRS_WGS84 to WGS84 and removed the no_defs key to agree with GeoPandas’s WGS84 CRS
- Replaced some None outputs with empty dictionary outputs where appropriate, e.g. in build_shape_by_geometry
- Bugfixed the get_dates() function. It was throwing an error when the calendar or calendar_dates table was empty.
- Bugfixed the stats and time series functions. They were throwing errors in the edge case where all the given dates had no active trips.
- Bugfixed combine_time_series(). Its direction ID column names were '0' and '1' but should be 0 and 1.
- Added informative printing for Feeds
- Removed the time_it decorator in favor of IPython’s %time magic.
- Inspired by the Transitland Dispatcher, added the summarize function and the list_gtfs function
- Extended several functions to accept date lists, a breaking change for the outputs of those functions. For example, now you can compute feed stats for the entire feed period more easily and quickly (by memoizing active trip IDs) than computing the stats separately for each date.
- By popular demand, redefined the num_trips indicator in route and feed time series to be the number of unique trips active in a time bin instead of the time weighted average thereof
- Removed columns from empty DataFrames returned by compute_route_stats etc.
- Elaborated docstrings
- Updated the installation requirements in setup.py
- Fixed the bug where setup.py could not find the license file
- Finally knuckled down and wrote a GTFS validator: validators.py. It’s basic, easy to read, and, thanks to Pandas, fast. It checks this 31 MB Southeast Queensland feed in 22 seconds on my computer (2.8 GHz processor, 16 GB memory). With the same computer and feed and in fast mode (--memory_db), Google’s GTFS validator takes 420 seconds, about 19 times slower. Part of the latter validator’s slowness comes from its many checks beyond the GTFS specification, such as checks for too-fast travel between every pair of stop times.
- Moved all but the most basic Feed methods into other modules grouped by theme (routes.py, stops.py, etc.). This eases reading and also exposes the methods as functions on feeds, as in the GTFSTK versions before 7.0.0.
- Speeded up miscellany.py::assess_quality
- Refactored constants.py
- Renamed some functions
- Rewrote most feed functions as Feed methods
- Rewrote tests for pytest
- Removed some miscellaneous functions, such as plotting functions
- Changed feed.read_gtfs to unzip to temporary directory
- Enabled feed.write_gtfs to write to a directory
- Improved function names, e.g. compute_trips_stats -> compute_trip_stats
- Added functions to cleaner.py and changed cleaning function outputs to feed instances
- Made feed.copy a method
- Simplified Feed objects and added auto-updates to secondary attributes
- Changed the signatures of a few functions, e.g. calculator.append_dist_to_shapes now returns a feed instead of a shapes data frame
- Fixed formatting of properties field in calculator.trip_to_geojson and calculator.route_to_geojson
- Bugfix: Added 'from_stop_id' and 'to_stop_id' to the list of string data types in constants.py. Previously they were sometimes interpreted as floats, which stripped leading zeros from the IDs, so they no longer matched the IDs in the stops data frame.
- Added trip ID parameter to calculator.get_stops
- Created calculator.trip_to_geojson
- Added whitespace stripping to cleaner.clean_route_short_names
- Renamed the function calculator.get_feed_intersecting_polygon to calculator.restrict_by_polygon
- Added the function calculator.restrict_by_routes
- Added the function calculator.get_start_and_end_times
- Added the functions calculator.compute_center, calculator.compute_bounds, calculator.route_to_geojson
- Extended the function calculator.get_stops to accept an optional route ID
- Extended the function calculator.build_geometry_by_shape to accept an optional set of shape IDs
- Extended the function calculator.build_geometry_by_stop to accept an optional set of stop IDs
- Improved distance sanity checks in calculator.compute_trip_stats and calculator.append_dist_to_stop_times
- Bugfixed feed.copy so that the dist_units_in of the copy equals dist_units_out of the original
- Added some more distance sanity checks to calculator.compute_trip_stats and calculator.append_dist_to_stop_times
- Improved cleaner.clean_route_short_names
- Removed utilities.clean_series
- Improved cleaner.aggregate_routes
- Removed some unnecessary print statements
- Deleted an extraneous print statement in calculator.create_shapes
- Added utilities.is_not_null
- Changed calculator.shapes_to_geojson to return a dictionary instead of a string
- Upgraded to Pandas 0.18.1 and fixed calculator.downsample accordingly
- Added cleaner.aggregate_routes
- Bugfix: formatted parent_station as a string in constants.DTYPE
- Changed signature and behavior of create_shapes
- Added duplicate route short name count to assess
- Changed the behavior of clean_route_short_names
- Changed INT_COLS to INT_COLUMNS
- Moved some functions
- Added some functions, such as a function to copy feeds
- Added more functions to calculator.py, some of which are optional and depend on GeoPandas
- Documented more
- Made read_gtfs raise a more helpful error when an input path does not exist
- Made Matplotlib import optional
- Updated plotter function chart colors
- Moved the Feed class into a separate file
- Fixed a fatal bug in plot_routes_time_series and renamed it plot_feed_time_series
- Added route_type to trips stats and routes stats
- Added more functions to the cleaner module
- Modularized more
- Refactored the Feed class, exporting most methods to functions
- Changed function names, favoring a compute_ prefix over a get_ prefix for complex functions
- Bug fix: in INT_COLUMNS changed 'dropoff_type' to 'drop_off_type'.
- Changed to return empty data frames instead of None where appropriate
- Added Feed.clean_route_short_names
- Changed the inputs and outputs of get_stops_stats and get_stops_time_series
- Replaced assert statements with exceptions
- Changed name to gtfstk
- Added route_short_name and min_headway to trips stats and routes stats
- Changed the default handling of distance units in Feed
- Assembled feed.py and utils.py into a unified top-level package by tweaking __init__.py
- Renamed get_linestring_by_shape and get_point_by_stop to get_geometry_by_shape and get_geometry_by_stop, respectively
- Added min_transfer_time to INT_COLUMNS
- Fixed get_route_timetable sort order
- Added data frame empty checks to Feed.__init__, because I was getting errors on feeds with empty calendar.txt files
- Removed parent_station from INT_COLUMNS, where it should never have been in the first place
- Now you can specify the output distance units
- Changed most functions to return an empty data frame instead of None
- Fixed export so that integer columns, such as ‘bike_allowed’, that have at least one NaN value no longer get formatted as floats in the output CSVs
- Reduced columns in get_trips_activity
- Added clean_series
- Fixed a bug/typo in the computation of the service_distance and service_duration columns of feed stats
- Fixed a bug in the computation of the peak_start_time and peak_end_time columns of routes stats and feed stats
- Added more columns to get_routes_stats
- Added get_feed_stats and get_feed_time_series and removed the similar agg_routes_stats and agg_routes_time_series
- Removed dump_all_stats, because it wasn’t very useful
- Replaced get_busiest_date_of_first_week with get_busiest_date
- Cleaned code slightly
- Added ‘speed’ column in trips stats
- Added ‘is_loop’ column in trips stats and routes stats
- Added more tests
- Added route and stop timetable methods
- Improved tests slightly
- Tidied code slightly
- Changed occurrences of ‘vehicle’ to ‘trips’, because that’s clearer
- Updated some packages
- Changed name to gtfs-tk
- Added get_shapes_geojson
- Renamed get_active_trips and get_active_stops to get_trips and get_stops
- Upgraded to Pandas 0.15.2
- Scooped out main logic from Feed.get_stops_stats and Feed.get_stops_time_series and put it into top level functions for the sake of greater flexibility, similar to what I did for Feed.get_routes_stats and Feed.get_routes_time_series.
- Fixed a bug in computing the last stop of each trip in get_trips_stats
- Improved the accuracy of trip distances in get_trips_stats
- Upgraded to Pandas 0.15.1
- Added fill_nan_route_short_names
- Switched back to version numbering in the style of major.minor.micro, because that seems more useful
- Fixed a bug in Feed.get_routes_stats that modified the input data frame and therefore affected the same data frame outside of the function (dumb Pandas gotcha). Changed it to operate on a copy of the data frame instead.
- Speeded up time series computations by at least a factor of 10
- Switched from representing dates as datetime.date objects to ‘%Y%m%d’ strings (the GTFS way of representing dates), because that’s simpler and faster. Added an export method to feed objects
- Minor tweaks to append_dist_to_stop_times.
- Scooped out main logic from Feed.get_routes_stats and Feed.get_routes_time_series and put it into top level functions for the sake of greater flexibility. I at least need that flexibility to plug into another project.
- Simplified methods to accept a single date instead of a list of dates.
- Whoops, lost track of the changes for this version.
- Changed seconds_to_time to timestr_to_seconds. Added get_busiest_date_of_first_week.
- Converted headways to minutes
- Added option to change headway start and end time cutoffs in get_stops_stats and get_stations_stats
- Fixed a bug in get_trips_stats that caused a failure when a trip was missing a shape ID
- Switched from major.minor.micro version numbering to major.minor numbering
- Added get_vehicle_locations.
- Added append_dist_to_stop_times and append_dist_to_shapes
- Changed get_xy_by_stop name and output type
- Changed from period indices to timestamp indices for time series, because the latter are better supported in Pandas.
- Upgraded to Pandas 0.14.1.
- Restructured modules
- Created stats and time series aggregating functions
- Added get_dist_from_shapes keyword to get_trips_stats
- Fixed some typos and cleaned up the directory
- Changed get_routes_stats headway calculation
- Fixed inconsistent outputs in time series functions.
- Minor tweak to downsample
- Improved get_trips_stats and cleaned up code
- Changed time series format
- Added documentation
- Upgraded to Python 3.4
- Created utils.py and updated Pandas to 0.14.0
- Minor refactoring and tweaks to packaging
- Minor tweaks to packaging
- Initial version
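The changelog entry about 'from_stop_id' and 'to_stop_id' describes a common pitfall: when an ID column is parsed as a number, leading zeros disappear and the IDs stop matching. The following standard-library sketch illustrates the effect and the fix of keeping IDs as strings. The sample rows are made up for illustration and are not from any real feed.

```python
import csv
import io

# Made-up transfers.txt and stops.txt contents; the stop IDs have leading zeros.
TRANSFERS_TXT = "from_stop_id,to_stop_id,transfer_type\n007,010,0\n"
STOPS_TXT = "stop_id,stop_name\n007,Alpha\n010,Beta\n"


def read_rows(text):
    """Parse CSV text into a list of dicts; csv keeps every value as a string."""
    return list(csv.DictReader(io.StringIO(text)))


stops = read_rows(STOPS_TXT)
transfers = read_rows(TRANSFERS_TXT)
stop_ids = {row["stop_id"] for row in stops}

# Wrong: coercing IDs to floats turns "007" into 7.0, which matches no stop.
as_floats = [float(t["from_stop_id"]) for t in transfers]
float_matches = [sid for sid in as_floats if str(sid) in stop_ids]

# Right: keep IDs as strings so they join cleanly against stops.
string_matches = [t["from_stop_id"] for t in transfers if t["from_stop_id"] in stop_ids]

print(as_floats)       # [7.0]
print(float_matches)   # []
print(string_matches)  # ['007']
```

With Pandas, the analogous fix is to pass string dtypes for such columns to pandas.read_csv, e.g. dtype={'from_stop_id': str, 'to_stop_id': str}, which is presumably what a list of string columns in constants.py is used for.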
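The changelog entry about integer columns being written as floats refers to a Pandas behavior: a numeric column that contains NaN is stored as floats, so a value of 1 would be written as 1.0. Here is a minimal standard-library sketch of one way to write such values back as plain integers with empty cells for missing data. The helper name format_int_cell is hypothetical and not part of GTFSTK.

```python
import csv
import io
import math


def format_int_cell(value):
    """Render a possibly missing, integer-valued float as a GTFS CSV cell.

    NaN or None becomes an empty string; 1.0 becomes "1".
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(int(value))


# A direction_id column as a float-based reader would hold it: [1.0, nan, 0.0].
direction_ids = [1.0, float("nan"), 0.0]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["trip_id", "direction_id"])
for i, value in enumerate(direction_ids):
    writer.writerow(["t" + str(i), format_int_cell(value)])

print(buffer.getvalue())
# trip_id,direction_id
# t0,1
# t1,
# t2,0
```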
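The changelog entry redefining num_trips contrasts two ways of counting trips in a time bin: the number of unique trips active in the bin versus the time-weighted average number of active trips. The sketch below illustrates the two definitions. It assumes trips are given as (trip_id, start, end) intervals in seconds after midnight and uses made-up trips. It is not GTFSTK's implementation.

```python
def trips_in_bin(trips, bin_start, bin_end):
    """Return (unique_count, time_weighted_average) for one time bin.

    trips: iterable of (trip_id, start_sec, end_sec); a trip is active
    on [start_sec, end_sec). The bin covers [bin_start, bin_end).
    """
    active_ids = set()
    active_seconds = 0
    for trip_id, start, end in trips:
        # Length of the overlap between the trip interval and the bin.
        overlap = min(end, bin_end) - max(start, bin_start)
        if overlap > 0:
            active_ids.add(trip_id)
            active_seconds += overlap
    bin_length = bin_end - bin_start
    return len(active_ids), active_seconds / bin_length


# Three made-up trips; the bin is 08:00 to 09:00 (28800 to 32400 s).
trips = [
    ("a", 28800, 32400),  # active for the whole hour
    ("b", 27000, 30600),  # 07:30 to 08:30, so 30 minutes in the bin
    ("c", 31500, 36000),  # 08:45 to 10:00, so 15 minutes in the bin
]
print(trips_in_bin(trips, 28800, 32400))  # (3, 1.75)
```

Under the newer definition this bin reports 3 trips. Under the older, time-weighted one it would report 1.75.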
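The changelog entry about switching to ‘%Y%m%d’ strings relies on the fact that zero-padded YYYYMMDD strings sort in the same order as the dates they represent. As a result, they can be compared, sorted and used as keys without conversion. A short standard-library sketch with arbitrary sample dates:

```python
from datetime import date, datetime, timedelta


def to_gtfs_date(d):
    """Format a datetime.date as a GTFS date string such as '20180524'."""
    return d.strftime("%Y%m%d")


def from_gtfs_date(s):
    """Parse a GTFS date string such as '20180524' into a datetime.date."""
    return datetime.strptime(s, "%Y%m%d").date()


# Four consecutive days spanning a year boundary.
start = date(2017, 12, 30)
dates = [to_gtfs_date(start + timedelta(days=k)) for k in range(4)]
print(dates)  # ['20171230', '20171231', '20180101', '20180102']

# Zero-padded YYYYMMDD strings compare like the dates they encode,
# so min, max and sorting work directly on the strings.
print(min(dates), max(dates))  # 20171230 20180102
print(from_gtfs_date(max(dates)) - from_gtfs_date(min(dates)))  # 3 days, 0:00:00
```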
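The changelog entry about timestr_to_seconds concerns converting GTFS time strings to seconds. GTFS times may exceed 24:00:00 for service that runs past midnight, so they cannot always be parsed as ordinary clock times with datetime. The following is a minimal sketch of such a conversion and its inverse. The functions are written here for illustration and are not necessarily how GTFSTK implements them.

```python
def timestr_to_seconds(timestr):
    """Convert a GTFS 'H:MM:SS' or 'HH:MM:SS' string to seconds after midnight.

    Hours may be 24 or more, as GTFS allows for service past midnight.
    """
    hours, minutes, seconds = (int(part) for part in timestr.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_timestr(total):
    """Convert seconds after midnight back to a zero-padded 'HH:MM:SS' string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, seconds)


print(timestr_to_seconds("08:15:30"))  # 29730
print(timestr_to_seconds("25:10:00"))  # 90600
print(seconds_to_timestr(90600))       # 25:10:00
```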
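Several changelog entries concern headways: converting them to minutes, adding start and end time cutoffs, and changing the routes-stats headway calculation. The sketch below assumes a headway is the gap between consecutive departures at a stop within a time window, which may differ from GTFSTK's exact definition. The function name mean_headway_minutes and the departure times are hypothetical.

```python
def mean_headway_minutes(departure_secs, window_start, window_end):
    """Return the mean gap, in minutes, between consecutive departures.

    Only departures in [window_start, window_end) seconds after midnight
    are used. Returns None if fewer than two departures fall in the window.
    """
    times = sorted(t for t in departure_secs if window_start <= t < window_end)
    if len(times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps) / 60


# Made-up departures at one stop: 07:00, 07:10, 07:20, 07:45 and 10:00.
departures = [25200, 25800, 26400, 27900, 36000]

# Window 07:00 to 09:00; the 10:00 departure is excluded.
print(mean_headway_minutes(departures, 25200, 32400))  # 15.0
```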
|Filename|Size|File type|Python version|Upload date|
|---|---|---|---|---|
|gtfstk-9.2.3-py3-none-any.whl|67.4 kB|Wheel|py3|May 24, 2018|
|gtfstk-9.2.3.tar.gz|59.2 kB|Source|None|May 24, 2018|