Library for processing GTFS and Shape files for Itsim.

Project description

Itsim project creation library

How to use

[itsim-project-data] $ pip install itsim_project_creation_library

Then, in Python:

from itsim_project_creation_library import (
    feed_processing,
    gis_processing,
    log_progress,
)

You can also import aliases:

from itsim_project_creation_library import (
    feedp,
    gisp,
    log_progress,
)

log_progress prints progress messages to stdout. It uses the same syntax as print.
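
As a rough illustration of the "same syntax as print" remark, a stand-in could look like the sketch below. The name log_progress_sketch is hypothetical and this is not the library's implementation; it only shows a function that accepts whatever print accepts.

```python
import sys


def log_progress_sketch(*args, **kwargs):
    """Hypothetical stand-in: forward everything to print and flush the stream."""
    kwargs.setdefault("file", sys.stdout)
    kwargs.setdefault("flush", True)
    print(*args, **kwargs)


nb_of_steps = 3
log_progress_sketch(f"Step 1/{nb_of_steps}: Read GTFS")
log_progress_sketch("Step", 2, "done", sep=" ")
```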

How to develop on it

pip install -e .

How to create a new ItSim project

This guide covers the steps necessary to create a new project in the ItSim application. This document is intended for developers and people with skills in Python and data processing. Most of the processing operations are done thanks to the Pandas and GeoPandas libraries.

Prerequisites for a new ItSim project

First of all, to create a new project, we will need:

  • Feed input data: Zero, one or several GTFS file(s). In these files we find the description of the transport plan and network.
  • Geographic Information System (GIS) input data: Zero, one or several shapefiles (.shp). These files will contain geographical data.
  • A filled-out project creation form that contains parameters for the project.

Processing feed input data

In this section, we will see how to check feed input data and how to correct the file(s) if required.

Check validity of a GTFS

First, it is necessary to check the validity of the given GTFS with Google's Feed validator (for Windows or Linux). Once downloaded, you can execute it and drag & drop your GTFS file into it. The software will check the compatibility of the file with the GTFS Reference.

After a while, you will get the validity report displayed in your web browser. Then, you will have to check the report and its error and warning messages. Note that errors will most likely cause problems when loading the GTFS into ItSim. That is the reason why the GTFS must be corrected with a processing script and using some of the processing functions of the ItSim project creation library.

Warnings are less likely to cause problems but it is recommended to check each message to ensure that it will not be harmful to the import.

Check the GTFS content

After checking the validity of the GTFS file, we would like to have a better idea of its content. We might ask ourselves many questions:

  • How many routes does the network have?
  • Which transport mode(s) does the network have?
  • Do the routes have geometries (shapes), and are they accurate?
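
The first two questions can also be answered quickly without any GIS software. The sketch below is not part of the library: it uses only the Python standard library, and the function name summarize_routes is hypothetical. It assumes routes.txt sits at the root of the GTFS archive, as the GTFS Reference requires.

```python
import csv
import io
import zipfile
from collections import Counter


def summarize_routes(gtfs_zip_path):
    """Return (number of routes, Counter of route_type values, shapes.txt present?)."""
    with zipfile.ZipFile(gtfs_zip_path) as archive:
        names = archive.namelist()
        with archive.open("routes.txt") as raw:
            # utf-8-sig drops a leading byte-order mark if the file has one
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            route_types = Counter(row["route_type"] for row in csv.DictReader(text))
    return sum(route_types.values()), route_types, "shapes.txt" in names
```

Calling summarize_routes on the path of your GTFS archive returns the number of routes, how many routes use each route_type code, and whether the feed ships a shapes.txt file. It says nothing about whether the shapes are accurate, which still has to be checked visually.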

To visualize the GTFS in software like QGIS, it should be converted to a GeoJSON file. You can convert your GTFS file with the gtfs_to_geojson script, using either pipenv shell or pipenv run.

Example:

[itsim-project-data] $ gtfs_to_geojson -g path/to/mycity/MyCity_GTFS_2020.zip -j path/to/mycity/test.json

Once it is done, you can import the generated GeoJSON file into QGIS and check its content.

Write a feed processing file

To correct the GTFS file, create a gtfs_generator.py file in a directory named after your project in the itsim-project-data repository.

Here is an example of how the file should be structured. In this case, we remove the route shapes and group the routes:

from itsim_project_creation_library import feedp, log_progress
from pathlib import Path
from os import environ
from gtfstk import read_gtfs, write_gtfs


if __name__ == "__main__":
    # Initialize paths
    # path of the local project directory
    PROJECT_DATA_PATH = Path(__file__).resolve().parent
    # path of the volume in which the original data is saved
    ITSIM_DATA_PATH = Path(environ.get('ITSIM_DATA_PATH', '/mnt/systra/G'))
    # path to the original data folder
    mycity_path = (
        ITSIM_DATA_PATH
        / "PLT/ZZ_DCME/3_GESTION/7_LE LAB DIGITAL/800_PROJETS/PROJET_20160622_ITSIM/1_TRAVAIL/6_RUN/1_Données d'entrée/ItSim/MyCity"
    )
    # path to the GTFS to use
    mycity_gtfs_path = mycity_path / 'GTFS' / 'MyCity_GTFS_2020.zip'
    # output path
    gtfs_output_path = PROJECT_DATA_PATH / 'mycity-gtfs.zip'
    # Define bbox
    bbox = {
        'north_lat': 48.902192,
        'south_lat': 48.814099,
        'west_lon': 2.250824,
        'east_lon': 2.425232,
    }
    # Processing
    nb_of_steps = 3
    log_progress(f"Step 1/{nb_of_steps}: Read GTFS")
    feed = read_gtfs(mycity_gtfs_path, dist_units='km')
    log_progress(f"Step 2/{nb_of_steps}: Remove shapes")
    new_feed = feedp.remove_shapes(feed=feed)
    log_progress(f"Step 3/{nb_of_steps}: Group routes")
    new_feed = feedp.group_routes(feed=new_feed)
    # You can add as many operations as required, including your own…
    write_gtfs(new_feed, gtfs_output_path)
    log_progress(f"GTFS file {gtfs_output_path} created!")

Operations on the feed

In this section, you will find all the recurring operations you might need if you have to correct your GTFS.

Merge several feeds

ItSim can only load one GTFS when creating a new project, but we might be given several. For instance, train lines and bus lines might be split into two distinct GTFS files, simply because bus and train lines are operated by two different companies.

To import data, we will have to merge these files into one. To do so, we will use the merge_gtfs() function. In the following example, GTFS files contained in the directory provided as an argument will be merged in one single GTFS file:

gtfs_path = "path/to/feed/folder"
feed = merge_gtfs(origin_gtfs_path=gtfs_path)

If two elements have the same ID, a suffix made of an underscore and a number is added. For example, an element with the ID 12 will have the ID 12_1 afterwards if there was already an element with the same ID.
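
The renaming rule can be pictured with the following sketch. It is an illustration under assumptions, not the code of merge_gtfs(): the function name merge_ids is hypothetical, and the exact numbering used by the library when more than two elements collide is not documented here.

```python
def merge_ids(*id_lists):
    """Concatenate ID lists, renaming any ID that has already been seen."""
    seen = set()
    merged = []
    for ids in id_lists:
        for original in ids:
            candidate = original
            suffix = 0
            # Append _1, _2, ... until the ID no longer collides
            while candidate in seen:
                suffix += 1
                candidate = f"{original}_{suffix}"
            seen.add(candidate)
            merged.append(candidate)
    return merged


print(merge_ids(["12", "13"], ["12", "14"]))
# ['12', '13', '12_1', '14']
```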

Get a feed sample based on a bounding box

Sometimes, we might want to use a sample of a large feed. For instance, we might only have at our disposal a GTFS that covers a full region or country. However, we might only want to use data that cover a single city of this region.

To get a sample from a large feed, the first thing we will have to do is to define a bounding box. To help you define your bounding box, you can use this website.

Then, use the get_feed_sample_from_bbox() function to keep only a sample of the given feed, as in the following example:

gtfs_path = "path/to/feed/folder/feed.zip"
feed = read_gtfs(gtfs_path, dist_units='km')
bbox = {
    'north_lat': 48.902192,
    'south_lat': 48.814099,
    'west_lon': 2.250824,
    'east_lon': 2.425232,
}
feed = feedp.get_feed_sample_from_bbox(feed=feed, bbox=bbox)

At first, this function will detect the stops that are located in the zone defined by the bounding box. Then, it will keep only the routes that have all their stops in the zone.
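
The two steps described above can be sketched in plain Python. This is only an illustration of the logic, not the library's implementation: the helper names and the simplified data structures (a dict of stop coordinates and a dict of stop lists per route) are assumptions.

```python
def stops_in_bbox(stops, bbox):
    """Return the IDs of the stops located inside the bounding box."""
    return {
        stop_id
        for stop_id, (lat, lon) in stops.items()
        if bbox["south_lat"] <= lat <= bbox["north_lat"]
        and bbox["west_lon"] <= lon <= bbox["east_lon"]
    }


def routes_fully_in_bbox(route_stops, stops, bbox):
    """Keep only the routes whose stops are all inside the bounding box."""
    inside = stops_in_bbox(stops, bbox)
    return [
        route_id
        for route_id, stop_ids in route_stops.items()
        if set(stop_ids) <= inside
    ]


bbox = {
    "north_lat": 48.902192,
    "south_lat": 48.814099,
    "west_lon": 2.250824,
    "east_lon": 2.425232,
}
stops = {
    "s1": (48.85, 2.30),
    "s2": (48.88, 2.40),
    "s3": (49.10, 2.35),  # north of the bounding box
}
route_stops = {"bus_1": ["s1", "s2"], "rail_1": ["s2", "s3"]}
print(routes_fully_in_bbox(route_stops, stops, bbox))
# ['bus_1']
```

A consequence of this rule is that a route crossing the border of the bounding box is dropped entirely, so the box should be drawn slightly larger than the area of interest.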

Filter routes based on transport modes

Sometimes, we would like to work with only one or several specific transport mode(s). For example, we would like to work only on the Bus network, but the original feed also contains Light rail and Rail routes. The function route_types_filter() will help you do that.

In this example, we would like to remove Light rail (0) and Rail (2) routes:

gtfs_path = "path/to/feed/folder/feed.zip"
feed = read_gtfs(gtfs_path, dist_units='km')
route_types = [0, 2]
new_feed = feedp.route_types_filter(feed=feed, route_types=route_types, filter_type='remove')

Note that the filter_type could be 'keep' or 'remove'.

The transport modes are indicated using the route_type values described in the GTFS Reference.
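
To make the 'keep' and 'remove' semantics concrete, here is a minimal sketch on a plain list of routes. It is not the code of route_types_filter(): the function name filter_routes_by_type and the list-of-dicts structure are assumptions. The labels are a few of the route_type codes from the GTFS Reference.

```python
# A few route_type codes from the GTFS Reference
ROUTE_TYPE_LABELS = {
    0: "Tram, Streetcar, Light rail",
    1: "Subway, Metro",
    2: "Rail",
    3: "Bus",
}


def filter_routes_by_type(routes, route_types, filter_type="keep"):
    """Keep or remove the routes whose route_type is in route_types."""
    if filter_type not in ("keep", "remove"):
        raise ValueError("filter_type must be 'keep' or 'remove'")
    wanted = set(route_types)
    keep_matching = filter_type == "keep"
    return [r for r in routes if (r["route_type"] in wanted) == keep_matching]


routes = [
    {"route_id": "T1", "route_type": 0},
    {"route_id": "B12", "route_type": 3},
    {"route_id": "R5", "route_type": 2},
]
remaining = filter_routes_by_type(routes, [0, 2], filter_type="remove")
print([(r["route_id"], ROUTE_TYPE_LABELS[r["route_type"]]) for r in remaining])
# [('B12', 'Bus')]
```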

Filter routes based on short names

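To remove specific routes from the feed by their short names, use the remove_routes_from_feed_by_short_names() function:

gtfs_path = "path/to/feed/folder/feed.zip"
feed = read_gtfs(gtfs_path, dist_units='km')
routes_to_remove_short_names = ['bad route A', 'bad route B']
new_feed = feedp.remove_routes_from_feed_by_short_names(feed, routes_to_remove_short_names)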

Regroup routes

In our GTFS, patterns that describe the same route may be split across several routes. This case is not always obvious to detect, but the fact that many routes have the same name in your network could be a clue. You can also check by comparing your network with the map published by the operator. Google is your friend! (or not).

To group the routes by name, we can use the group_routes() function as follows:

gtfs_path = "path/to/feed/folder/feed.zip"
feed = read_gtfs(gtfs_path, dist_units='km')
new_feed = feedp.group_routes(feed=feed)
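
Before grouping, the "same name" clue mentioned above can be checked with a few lines of Python. This sketch is independent of the library; the function name and the list-of-dicts structure are assumptions, while route_id and route_short_name are the standard GTFS field names.

```python
from collections import defaultdict


def find_duplicated_route_names(routes):
    """Map each short name used by several routes to the IDs of those routes."""
    by_name = defaultdict(list)
    for route in routes:
        by_name[route["route_short_name"]].append(route["route_id"])
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


routes = [
    {"route_id": "r1", "route_short_name": "12"},
    {"route_id": "r2", "route_short_name": "12"},
    {"route_id": "r3", "route_short_name": "A"},
]
print(find_duplicated_route_names(routes))
# {'12': ['r1', 'r2']}
```

A non-empty result suggests that grouping may be needed, but it should still be confirmed against the operator's map.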

Remove geometries (shapes)

Sometimes, shapes that are given in our GTFS might be only made of straight lines or be completely inaccurate. In this case, we most likely want to regenerate these shapes when the project is imported in ItSim. To enable ItSim to generate shapes for lines that will follow the road network (buses…), we will have to remove all the existing shapes.

To do so, we will use the remove_shapes() function. In this example, the geometries will be removed from the given feed.

gtfs_path = "path/to/feed/folder/feed.zip"
feed = read_gtfs(gtfs_path, dist_units='km')
new_feed = feedp.remove_shapes(feed=feed)

Processing GIS data

There are several GIS data types:

  • surfaces_density: for zonal data expressed in densities
  • dots_value: represents dots on the map with an absolute value
  • lines_and_dots: to use for layers that are not densities or values

Check content of GIS data

It is possible to easily check the content of a shapefile by importing it directly into QGIS. In QGIS, you could check the projection in which the layer is expressed by having a look at its properties.

Also take a look at the attribute table to have a better idea of how the data are expressed.

Here is a list of some items to check before processing shapefiles:

  • Which type of shapefile (zonal, dots…)?
  • Which columns are useful? Which are not?
  • How are values expressed (absolute, density)?
  • Do values seem coherent? (maximum, minimum, spread)
  • Are there missing values?
  • Which properties are already included in the file? Which have to be set?

Operations on GIS data

In this section, you will find all the recurring operations you might need in case you have to correct your GIS data.

Reproject data

ItSim only manages layers expressed in the WGS84 projection (EPSG:4326), which is the standard projection for GPS systems. However, many of the layers we have at our disposal might be expressed in the projection used by the agency that created the data. As an example, the Lambert 93 projection (EPSG:2154) is the projection used for most of the layers in metropolitan France. This is due to the fact that this projection gives accurate areas for this latitude.

To reproject data, you can use the reproject_layer_in_WGS84() function which will reproject the data in WGS84 like in our example:

gis_data_path = "path/to/gis_files/pop.shp"
data = read_file(gis_data_path)
new_data = gisp.reproject_layer_in_WGS84(data)

Note that you should not use the WGS84 projection to calculate areas. It is advised to project in WGS84 at the end of your process.

Calculate area

In case you need to process a zonal layer but the areas of the zones are missing, it is possible to generate them by using the calculate_area() function.

gis_data_path = "path/to/gis_files/pop.shp"
data = read_file(gis_data_path)
new_data = gisp.calculate_area(data)

The new_data DataFrame will contain an area column with the area expressed in km². Note that if the area_unit_in_km2 option is False, the area will be expressed in m².

ItSim expects densities expressed in /km², so be careful if you use areas expressed in m².

The resulting DataFrame will be projected in the Gall-Peters projection, which corresponds to SR-ORG:22.

Calculate densities

Data in zonal shapefiles could be expressed in absolute values, but ItSim only manages data expressed in densities. In this case, it will be necessary to convert the data into density values by using the calculate_density() function.

gis_data_path = "path/to/gis_files/pop.shp"
data = read_file(gis_data_path)
new_data = gisp.calculate_density(
    data=data,
    field_from='pop',
    field_to='pop_dens',
    area_field='area',
)

The new_data DataFrame will contain a pop_dens column expressed in [unit]/km². The area column is used for the input area and should be expressed in km² unless area_field_unit_in_km2 option is False, then it is expected to be in m².

If no area_field is indicated, the area is calculated in this function and the area_field_unit_in_km2 parameter is ignored.

The resulting DataFrame will be projected in the Gall-Peters projection, which corresponds to SR-ORG:22.
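
The unit handling is the part most likely to go wrong, so here is the arithmetic as a standalone sketch. It is not the code of calculate_density(): the function name density_per_km2 and its area_in_km2 flag are hypothetical and only mirror the behaviour described above.

```python
def density_per_km2(value, area, area_in_km2=True):
    """Convert an absolute value into a density per km²."""
    # 1 km² = 1,000,000 m²
    area_km2 = area if area_in_km2 else area / 1_000_000
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return value / area_km2


# 5000 inhabitants on 2.5 km², with the area given in km² and then in m²
print(density_per_km2(5000, 2.5))                            # 2000.0
print(density_per_km2(5000, 2_500_000, area_in_km2=False))   # 2000.0
```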

Bounding box and coordinates

You can easily generate a bounding box thanks to the create_bbox() function by only giving the coordinates of the zone to define.

Example:

min_lat = 48.814099
max_lat = 48.902192
min_lon = 2.250824
max_lon = 2.425232
bbox = gisp.create_bbox(min_lat=min_lat, max_lat=max_lat, min_lon=min_lon, max_lon=max_lon)

Conversely, you can retrieve the coordinates from a bounding box with the get_coordinates_from_bbox() function.

Example:

bbox = {
    'north_lat': 48.902192,
    'south_lat': 48.814099,
    'west_lon': 2.250824,
    'east_lon': 2.425232,
}
(min_lat, max_lat, min_lon, max_lon) = gisp.get_coordinates_from_bbox(bbox=bbox)
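
The relationship between the two representations can be summarized with the sketch below. The function names are hypothetical re-implementations for illustration, not the library's code; the key names and the order of the returned tuple follow the examples above. The mapping of west to minimum longitude assumes a box that does not cross the antimeridian.

```python
def create_bbox_sketch(min_lat, max_lat, min_lon, max_lon):
    """Build a bounding box dict from its extreme coordinates."""
    return {
        "north_lat": max_lat,
        "south_lat": min_lat,
        "west_lon": min_lon,
        "east_lon": max_lon,
    }


def coordinates_from_bbox_sketch(bbox):
    """Return (min_lat, max_lat, min_lon, max_lon) from a bounding box dict."""
    return (
        bbox["south_lat"],
        bbox["north_lat"],
        bbox["west_lon"],
        bbox["east_lon"],
    )


bbox = create_bbox_sketch(48.814099, 48.902192, 2.250824, 2.425232)
print(coordinates_from_bbox_sketch(bbox))
# (48.814099, 48.902192, 2.250824, 2.425232)
```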

Merge layers

In case we have several data sources at our disposal, we can merge several shapefiles into one by using the merge_and_clean_layers() function. However, be careful, this function can only be used with layers of the same structure (same type, same columns).

gis_data_path = "path/to/gis_files"
raw_data_paths = [
    gis_data_path + "/pop_76.shp",
    gis_data_path + "/pop_27.shp"
]
columns_to_keep = ['pop_dens', 'superficie']
population_data = gisp.merge_and_clean_layers(raw_data_paths, *columns_to_keep)

Note that even if we only keep columns that are indicated in the columns_to_keep list, the function will also keep the geometry column.

Generate a layer from carroyage

If you are looking for population data on metropolitan France and you do not have any other sources, you can use the INSEE's "carroyage": a georeferenced grid with population density data. To generate a population shapefile from the carroyage, you can use the create_carroyage() function.

# Define paths for carroyage
carroyage_root = (
    ITSIM_DATA_PATH
    / "PLT/ZZ_DCME/3_GESTION/7_LE LAB DIGITAL/800_PROJETS/PROJET_20160622_ITSIM"
    / "1_TRAVAIL/2_DEVELOPPEMENT/1.1_Données SYSTRA/France"
)
carroyage_path = carroyage_root / "carroyage_insee_200m_2010" / "200m-carreaux-metropole" / "car_m.mif"
population_path = carroyage_root / "carroyage_insee_200m_2010" / "200m-carreaux-metropole" / "car_m.dbf"
# Define output path
population_output_path = Path("/path/to/mycity/population.shp")
bbox = {
    'north_lat': 48.902192,
    'south_lat': 48.814099,
    'west_lon': 2.250824,
    'east_lon': 2.425232,
}
pop_data = gisp.create_carroyage(
    carroyage_path=carroyage_path,
    data_path=population_path,
    output_path=population_output_path,
    bbox=bbox,
)

It is also possible to give coordinates as parameters instead of bbox:

# Define paths for carroyage
carroyage_root = (
    ITSIM_DATA_PATH
    / "PLT/ZZ_DCME/3_GESTION/7_LE LAB DIGITAL/800_PROJETS/PROJET_20160622_ITSIM"
    / "1_TRAVAIL/2_DEVELOPPEMENT/1.1_Données SYSTRA/France"
)
carroyage_path = carroyage_root / "carroyage_insee_200m_2010" / "200m-carreaux-metropole" / "car_m.mif"
population_path = carroyage_root / "carroyage_insee_200m_2010" / "200m-carreaux-metropole" / "car_m.dbf"
# Define output path
population_output_path = Path("/path/to/mycity/population.shp")
min_lat = 48.814099
max_lat = 48.902192
min_lon = 2.250824
max_lon = 2.425232
pop_data = gisp.create_carroyage(
    carroyage_path=carroyage_path,
    data_path=population_path,
    output_path=population_output_path,
    min_lat=min_lat,
    max_lat=max_lat,
    min_lon=min_lon,
    max_lon=max_lon
)

Customize GIS layers aspect

It is possible to customize parameters to change the look of lines_and_dots GIS layers in ItSim. There are functions for each type of layer (dots, lines, shapes). When setting properties, you can choose to use existing values from one of the layer's columns or a single value that will be used for every entry.
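
The examples in the following subsections pass a property as a column name, as a callable, or as a single value. One plausible way to resolve these three forms for a given entry is sketched below. This is an assumption about the behaviour, not the library's code, and the function name resolve_property is hypothetical.

```python
def resolve_property(row, spec):
    """Resolve a property for one entry from a callable, a column name or a constant."""
    if callable(spec):
        # Callable: computed from the entry itself
        return spec(row)
    if isinstance(spec, str) and spec in row:
        # Name of an existing column: take the value stored in that column
        return row[spec]
    # Anything else: the same value for every entry
    return spec


rows = [{"NAME": "Central Station"}, {"NAME": "Museum"}]
for row in rows:
    print(
        resolve_property(row, "NAME"),
        resolve_property(row, lambda r: "#A15C33"),
        resolve_property(row, 5),
    )
# Central Station #A15C33 5
# Museum #A15C33 5
```

Under this assumption, a plain string is first looked up as a column name, which is one reason a fixed color is supplied through a lambda rather than as a bare string.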

Dots

Dots layers are mostly used to represent points of interest, stations, buildings… You can use the set_dots_properties() function to customize the following parameters:

  • Title (text that will be displayed into a popup)
  • Color
  • Radius
  • Opacity

In the following example, I already have the names, so I will give the name of the column to use as the data source. However, I do not have any information about the colors, so I would like to set the same color for every entry. In this case, all I need to do is create a callable (a lambda function below) that will be applied to every entry. I would also like to set a specific value for the radius and for the opacity, so I will just provide the value to apply to every entry:

data = read_file('path/to/my/file')
dot_title_field = 'NAME'
dot_color_function = lambda l: '#A15C33'

data_with_props = gisp.set_dots_properties(
    data=data,
    title=dot_title_field,
    color=dot_color_function,
    radius=5,
    opacity=0.8,
)

Lines

Line layers are mostly used to represent an uneditable transport network. For example, it could be used to represent a railway network that interacts with the bus network we are editing. It could also be used

You can use the set_lines_properties() function to customize the following parameters:

  • Title
  • Color
  • Route type (the route type will influence the lines' width as they are displayed on the map; use it only if it represents a transport network)
  • Width
  • Opacity

In the following example, I would like to compute the title from two existing columns in my dataframe. To do so, I will give a lambda that returns a string built from these two columns. I would also like to set a color for the lines to display, but I do not have any information about the colors in my dataframe. This is why I will pick a random color from a list for the color parameter.

from random import choice

data = read_file('path/to/my/file')
line_title_field = lambda l: l['name'] + ' - ' + l['neighbourhood']
colors = ['#169f5c', '#345b64', '#543c10', '#8eea21', '#116f95', '#cf26a9', '#a1d188', '#c0c577', '#daef5d', '#50c04e', '#16ce77']
line_color_function = lambda l: choice(colors)

data_with_props = gisp.set_lines_properties(
    data=data,
    title=line_title_field,
    color=line_color_function,
    opacity=0.8,
    route_type=3,
)

Note that if the width parameter is provided, it will override the route_type's computed width.

Shapes

Shapes layers are used to represent zones on the map. They could be used to represent buildings, project areas, facilities…

You can use the set_shapes_properties() function to customize the following parameters:

  • Title
  • Fill color
  • Stroke color
  • Opacity

In the following example, we will retrieve the name from the already existing data, set colors for fill and stroke of the shape with callables (lambda functions) and a single value for opacity.

data = read_file('path/to/my/file')
shape_title_field = 'NAME'
shape_fill_color_function = lambda x: '#2A7AE0'
shape_stroke_color_function = lambda x: '#1959A0'

data_with_props = gisp.set_shapes_properties(
    data=data,
    title=shape_title_field,
    fill_color=shape_fill_color_function,
    stroke_color=shape_stroke_color_function,
    opacity=0.5,
)

Property fields

In case you need to get the names of the properties' fields in order to clean a layer's columns, you can use the following functions to get them:

  • get_dot_properties_names()
  • get_line_properties_names()
  • get_shape_properties_names()

Write a project creation script

To create a project, ItSim's back-end will need:

  • A project description JSON file (mandatory)
  • A valid GTFS file (optional)
  • One or several shapefile(s) (optional)

In order to generate all the required files in a single command, we will write a shell script at "itsim-project-data" repository's root.

In this example, we will create a project "MyCity" based on a GTFS and on a single zonal population shapefile. The first thing is to write a create_mycity.sh script calling data processing scripts in case files are missing:

#!/bin/sh
set -e
PROJECTS_DATA_PATH="$(dirname "$(readlink -f "$0")")"

# Creating GTFS archive if needed
if [ ! -f "$PROJECTS_DATA_PATH/mycity/mycity-gtfs.zip" ]; then
    echo "Generate missing GTFS file..."
    pipenv run python "$PROJECTS_DATA_PATH/mycity/gtfs_generator.py"
fi

# Creating shapefiles if needed
NEW_SHP_GENERATED=false
if [ ! -f "$PROJECTS_DATA_PATH/mycity/mycity_population_2020.shp" ]; then
    echo "Generate missing shapefiles..."
    python "$PROJECTS_DATA_PATH/mycity/data_processing.py"
    NEW_SHP_GENERATED=true
fi

Then, zonal layers should be simplified if needed. This step reduces shapefile size and geometry complexity:

# Simplifying shapefiles if needed (updated shapefiles or missing simplified versions)
if [ ! -f "$PROJECTS_DATA_PATH/mycity/mycity_population_2020_simplified.shp" ] || $NEW_SHP_GENERATED; then
    echo "Simplifying shapes..."
    simplify_shapefile "$PROJECTS_DATA_PATH/mycity/mycity_population_2020.shp" pop_dens 30 0.9 "$PROJECTS_DATA_PATH/mycity/mycity_population_2020_simplified.shp"
fi

Afterwards, we will determine the legend of the density layers. This operation is carried out by hand on the simplified layers. It is important to check the accuracy of the results given by the ./scripts/shp2geojson script and to round them if needed:

# Generating legends from simplified layers

### Color legends scales
# Population colors: FEE5D9-A50F15 (red)

# shp2geojson -i $PROJECTS_DATA_PATH/mycity/mycity_population_2020_simplified.shp -f pop_dens -c FEE5D9 -C A50F15
# => [Enter the results given by the script]
# rounded => [Enter the legend to use]

Finally, we will define our project's parameters by generating a project JSON file. This JSON file will contain a full description of the project to create with all parameters, layers to use, scenarios to create, scenario parameters…

Instead of creating this file by hand, we will use the create_json_project script with options as follows:

echo Creating json project
create_json_project \
    -o "systra" \
    -n "MyCity" \
    -g "$PROJECTS_DATA_PATH/mycity/mycity-gtfs.zip" \
    -w DATA \
    --ref-name "Base System" \
    -s "Scénario 1" all False \
    -l surfaces_density \
        systrasaas.mycity_pop_2018 \
        mycity_pop_2018 \
        pop pop pop pop pop \
        "$PROJECTS_DATA_PATH/mycity/mycity_population_2020_simplified.shp" \
        permis_den \
        permisDens \
        5,EFF3FF,20,B5CAE6,100,7BA2CD,500,4179B4,1700,08519C \
    --param-buffer \
        2 1000 \
        3 500 \
        300 \
    --param-typical-days \
        "JOB" 20200121 \
        "SAT" 20200125 \
        "SUN" 20200119 \
    --param-time-types \
        "Morning rush hour" 070000 095959 \
        "Midday" 100000 155959 \
        "Evening rush hour" 160000 192959 \
    --param-center "6.342383" "46.075223" \
    --param-distance "m" \
    --param-currency-symbol "€"

You can find additional information on how to use this script by using the -h option.
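
The legend argument passed to the -l option above (5,EFF3FF,20,B5CAE6,…) appears to alternate a numeric value and a hexadecimal color. Assuming that reading is correct, a legend string can be sanity-checked before running the script with the sketch below. The function name parse_legend is hypothetical, and whether each value is a lower or an upper bound of its class is not stated in this document.

```python
def parse_legend(legend):
    """Split 'value,color,value,color,...' into a list of (value, '#COLOR') pairs."""
    parts = legend.split(",")
    if len(parts) % 2 != 0:
        raise ValueError("expected an even number of comma-separated items")
    pairs = []
    for value, color in zip(parts[0::2], parts[1::2]):
        if len(color) != 6:
            raise ValueError(f"not a 6-digit hexadecimal color: {color!r}")
        int(color, 16)  # raises ValueError if the color is not hexadecimal
        pairs.append((float(value), f"#{color.upper()}"))
    return pairs


legend = "5,EFF3FF,20,B5CAE6,100,7BA2CD,500,4179B4,1700,08519C"
for value, color in parse_legend(legend):
    print(value, color)
```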

All the files needed by ItSim's backend will be written to the DATA directory.

Import a project into ItSim

Congratulations! Now that you have a valid GTFS, shapefiles and a project description JSON, you need to copy them into the sws_itsim backend directory:

[itsim-project-data] $ cp -r DATA/* ../sws_itsim/

You are finally ready to import a project into ItSim!

Go to your sws_itsim directory and activate your Python virtual environment by executing the command pipenv shell. Then, check that all the required files exist.

We can now use the script ./bootstrap_db/launcher to import the project. We will cover the essential options you will need below. In case you need more information on all the options and how to use the import script, it is strongly advised to use the -h option. Example: ./bootstrap_db/launcher -h

First of all, we need to know on which platform the project will be deployed and to have the rights to do so. Indicate the backend's URL with the -u option and the access token with the -t option. To use your token easily, you can store its value in an environment variable: export ITSIM_TOKEN=[your token]. Note that your token must be a super-admin's token because only a super-admin has the right to create a project.

In our case, we have a layer to display on the map, so we would like to use Mapbox. To be able to connect to the service, we also need a token. When you have the Mapbox access token, you can store it in an environment variable as we did for ItSim's token: export MAPBOX_TOKEN=[your mapbox token]
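
Since a missing token only shows up once the import is launched, it can be convenient to check both variables beforehand. The following sketch is a hypothetical pre-flight check, not part of ItSim or of the launcher script; only the variable names ITSIM_TOKEN and MAPBOX_TOKEN come from this guide.

```python
import os


def read_required_tokens(names=("ITSIM_TOKEN", "MAPBOX_TOKEN"), environ=None):
    """Return the requested environment variables, failing early if one is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Calling read_required_tokens() without arguments reads the current environment; passing a dict as environ makes the check easy to test.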

Finally, we will have to indicate the project and the GTFS file's path with the -P option as well as the organization with the -o option.

For our example, we will import the project on our local backend (on port 5001):

[sws_itsim] $ ./bootstrap_db/launcher -t "$ITSIM_TOKEN" -T "$MAPBOX_TOKEN" -u http://localhost:5001 -o systra -P mycity gtfs/mycity-gtfs.zip

Note that you can check that your project description JSON is valid by adding the -n option at the end of the command. With this option, the script only validates the file and does nothing else; running it this way is advised before importing a project for the first time.

Once the validation succeeds, you can remove the -n option and execute the command. Then, confirm the operation by entering y or yes when asked.


Source Distribution

itsim_project_creation_library-0.3.0.tar.gz (33.7 kB view hashes)
