
Download, read/parse and import/export OpenStreetMap data extracts

Project description

pydriosm

Author: Qian Fu


This package provides helpful utilities for researchers to easily download and read/parse OpenStreetMap data extracts (in .pbf and .shp.zip formats), which are available from the free download servers Geofabrik and BBBike. In addition, it provides a convenient way to import/dump the parsed data into, and retrieve it from, a PostgreSQL server.

Installation

Windows users may install the package via pip in Command Prompt:

pip3 install pydriosm
NOTE: Installation of pydriosm (and ensuring its full functionality) requires a few dependencies.
  • For Windows users:

    The pip3 method may fail to install some dependencies, such as Fiona, GDAL, Shapely and python-Levenshtein. If errors occur, try to pip3 install their .whl files instead, which can be downloaded from the Unofficial Windows Binaries for Python Extension Packages (see the example after this list). After you have installed them successfully, run the above pip3 command again.

  • For Linux users:

    If you want to try out an earlier version (<=1.0.17) on Linux, check this link for installation instructions. (However, using the latest version is always recommended.)
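
For example, installing a downloaded GDAL wheel may look like the following (the file name is hypothetical and depends on the version/build you download):

pip3 install "C:\Users\you\Downloads\GDAL-3.1.4-cp38-cp38-win_amd64.whl"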

Quick start

Firstly, import the package:

import pydriosm as dri

The current version of the package works only with the subregion data files available on the free download servers. To get a full list of the subregion names that are available, run the following:

subregion_list = dri.fetch_subregion_info_catalogue("GeoFabrik-subregion-name-list")
print(subregion_list)
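
To check whether a particular name is in the catalogue (assuming the catalogue lists Geofabrik's subregion names verbatim):

print('Greater London' in subregion_list)
# >>> True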

For a quick start, some examples are provided below, which demonstrate a few core functions of this package.

1. Download data

To download the OSM data for a region (or rather, a subregion) whose data extract is available, you need to specify the name of the (sub)region, e.g. "Greater London":

subregion_name = 'London'
# or, subregion_name = 'london'; case-insensitive and fuzzy (but not toooo... fuzzy)

Download .pbf data of "Greater London":

dri.download_subregion_osm_file(subregion_name, osm_file_format=".osm.pbf",
                                download_dir=None, update=False,
                                download_confirmation_required=True, deep_retry=False,
                                verbose=True)

Note that download_dir is None by default, in which case a default file path will be created and the downloaded file will be saved there.

Check the default file path and name:

default_fn, default_fp = dri.get_default_path_to_osm_file(subregion_name, 
                                                          osm_file_format=".osm.pbf", 
                                                          mkdir=False, update=False)
print("Default filename: {}".format(default_fn))
print("Default file path: {}".format(default_fp))

However, you may also set download_dir to any other valid directory, which is especially handy when downloading data for multiple subregions. For example,

# Specify our own data directory
customised_data_dir = "test_data"
# The "test_data" folder will be created in the current working directory

# Alternatively, we could specify a full path 
# import os
# customised_data_dir = os.path.join(os.getcwd(), "test_data")

# Download .pbf data of both 'London' and 'Kent' to the `customised_data_dir`
dri.download_subregion_osm_file('London', 'Kent', osm_file_format=".osm.pbf",
                                download_dir=customised_data_dir, update=False,
                                download_confirmation_required=True, deep_retry=False, 
                                verbose=True)

The .pbf data file will then be saved to the download_dir as specified.
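
To double-check, list the contents of that directory (the file names below assume Geofabrik's usual naming convention):

import os
print(os.listdir(customised_data_dir))
# e.g. ['greater-london-latest.osm.pbf', 'kent-latest.osm.pbf']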

2. Read/parse data

The package can read/parse the OSM data extracts in both .pbf and .shp.zip (and .shp).

2.1 .osm.pbf data

Parsing the .pbf data relies mainly on GDAL/OGR; use the read_osm_pbf() function:

greater_london = dri.read_osm_pbf(subregion_name, data_dir=None, parsed=True,
                                  file_size_limit=50, fmt_other_tags=True,
                                  fmt_single_geom=True, fmt_multi_geom=True,
                                  update=False, download_confirmation_required=True,
                                  pickle_it=True, rm_osm_pbf=False, verbose=True)

Note that dri.read_osm_pbf() may take a few minutes or even longer if the data file is large. If the file size is greater than the given file_size_limit (default: 50 MB), the data will be parsed in a chunk-wise manner.

The returned greater_london is a dict whose keys are "points", "lines", "multilinestrings", "multipolygons" and "other_relations"; these are also the names of the five different layers.

# Examples:
greater_london['points']  # points
greater_london['lines']  # lines

If only the name of a subregion is given, i.e. greater_london = dri.read_osm_pbf(subregion_name), the function will look for the data file at the default file path (i.e. default_fp). Otherwise, a data directory must be specified. For example, to read/parse the data in customised_data_dir, i.e. the "test_data" folder, set data_dir=customised_data_dir as follows:

greater_london_test = dri.read_osm_pbf(subregion_name, data_dir=customised_data_dir, 
                                       verbose=True)

In the above, greater_london and greater_london_test should be the same.
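
A minimal sanity check, assuming both variables are dicts of pandas DataFrames keyed by layer name:

print(all(greater_london[k].equals(greater_london_test[k]) for k in greater_london))
# >>> True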

To make life easier, you can simply skip the download step and use read_osm_pbf() directly: if the target data is not available locally, read_osm_pbf() will download it first. By default, with download_confirmation_required=True, you will be asked to confirm the download.

Setting pickle_it=True saves a local copy of the parsed data as a pickle file.

If update=False, the next time you run read_osm_pbf(subregion_name), the function will load the pickle file directly; if update=True, the function will download the latest version of the data file and parse it again.
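
For instance, a sketch of the two behaviours described above:

# Load the parsed data directly from the local pickle file (update=False by default):
greater_london_ = dri.read_osm_pbf(subregion_name)

# Re-download the latest .pbf file and parse it afresh:
# greater_london_latest = dri.read_osm_pbf(subregion_name, update=True)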

2.2 .shp.zip / .shp data

You can read the .shp.zip and .shp files of the above subregion_name (i.e. 'London') by using read_shp_zip(), which relies mainly on GeoPandas:

# We must specify a layer, e.g. 'railways'
layer_name = 'railways'

# Read the .shp.zip file
greater_london_shp = dri.read_shp_zip(subregion_name, layer=layer_name,
                                      feature=None, data_dir=None, update=False,
                                      download_confirmation_required=True,
                                      pickle_it=True, rm_extracts=False,
                                      rm_shp_zip=False, verbose=True)

The parameter feature corresponds to the 'fclass' column of greater_london_shp. You may specify a feature to get a subset of greater_london_shp. For example:

greater_london_shp_rail = dri.read_shp_zip(subregion_name, layer=layer_name, 
                                           feature='rail')
# greater_london_shp_rail.equals(greater_london_shp[greater_london_shp.fclass == 'rail'])
# >>> True

Similarly, there is no need to download the .shp.zip file beforehand; read_shp_zip() will download it if it is not available. Setting rm_extracts=True and rm_shp_zip=True removes both the downloaded .shp.zip file and all files extracted from it.

Note that greater_london_shp and greater_london are different.
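
A quick way to see the difference (assuming read_shp_zip() returns a single GeoDataFrame for the chosen layer, whereas read_osm_pbf() returns a dict of all five layers):

print(type(greater_london))      # <class 'dict'>
print(type(greater_london_shp))  # e.g. <class 'geopandas.geodataframe.GeoDataFrame'>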

To get data for more than one subregion, you can also merge the .shp files of a specific layer from those subregions. For example, to merge the "railways" layers of the two subregions "Greater London" and "Kent":

subregion_names = ['London', 'Kent']
# layer_name = 'railways'
dri.merge_multi_shp(subregion_names, layer=layer_name, update_shp_zip=False,
                    download_confirmation_required=True, data_dir=None, 
                    prefix="gis_osm", rm_zip_extracts=False, rm_shp_parts=False, 
                    merged_shp_dir=None, verbose=True)

You could also set data_dir=customised_data_dir to save the downloaded .shp.zip files, and the merged .shp file, in customised_data_dir. Otherwise, when data_dir=None, all files will be found via the default path. Check also:

default_fn_, default_fp_ = dri.get_default_path_to_osm_file(subregion_names[0], 
                                                            osm_file_format=".shp.zip")
print(default_fp_)

3. Import and retrieve data with a PostgreSQL server

The package provides a class named "OSM", which communicates with a PostgreSQL server.

To establish a connection with the server, you need to specify your username (default: 'postgres'), password (default: None), host name (or address; default: 'localhost'), and the name of the database (default: 'postgres') you intend to connect to. For example:

osmdb = dri.OSM(username='postgres', password=None, host='localhost', port=5432, 
                database_name='postgres')
# Or simply, osmdb = dri.OSM()

If password=None, you will then be asked to type in your password.

Now you can connect to your database, e.g. "osm_pbf_data_extracts":

osmdb.connect_db(database_name='osm_pbf_data_extracts')

If the database "osm_pbf_data_extracts" does not exist, the method connect_db() will create it when establishing the connection.

3.1 Import the data to the database

To import greater_london (i.e. the parsed .pbf data of "London") to the database, "osm_pbf_data_extracts":

osmdb.dump_osm_pbf_data(greater_london, table_name=subregion_name, parsed=True,
                        if_exists='replace', chunk_size=None,
                        subregion_name_as_table_name=True, verbose=True)

Each element (i.e. layer) of greater_london will be stored in a separate schema, named after the corresponding layer.
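
If you'd like to verify this, one option (not part of pydriosm) is to query information_schema directly, e.g. with SQLAlchemy; the connection string below is a hypothetical sketch and needs your own credentials:

from sqlalchemy import create_engine, text

# Hypothetical connection string; replace user/password/host/port with your own
engine = create_engine(
    'postgresql+psycopg2://postgres:your_password@localhost:5432/osm_pbf_data_extracts')
with engine.connect() as conn:
    res = conn.execute(text("SELECT schema_name FROM information_schema.schemata;"))
    print([row[0] for row in res])
# Expect 'points', 'lines', 'multilinestrings', 'multipolygons' and 'other_relations'
# (alongside PostgreSQL's built-in schemas)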

3.2 Retrieve data from the database

To retrieve the dumped data:

greater_london_retrieval = osmdb.read_osm_pbf_data(table_name=subregion_name, 
                                                   parsed=True, 
                                                   subregion_name_as_table_name=True,
                                                   chunk_size=None, id_sorted=True)

Note that greater_london_retrieval may not be exactly the same as greater_london. This is because the "keys" of greater_london are in the following order: 'points', 'lines', 'multilinestrings', 'multipolygons' and 'other_relations'.

However, when greater_london is dumped to the database, the five schemas are sorted alphabetically ('lines', 'multilinestrings', 'multipolygons', 'other_relations' and 'points'), so the layers are retrieved from the server in that order. Despite that, the data contained in both greater_london and greater_london_retrieval is consistent. Check:

greater_london['points'].equals(greater_london_retrieval['points'])
# >>> True

If you need to query data of a specific layer (or several layers), or in a specific order of layers (schemas):

london_points_lines = osmdb.read_osm_pbf_data(subregion_name, 'points', 'lines')

Another example:

london_lines_mul = osmdb.read_osm_pbf_data('london', 'lines', 'multilinestrings')
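
The returned dict should contain only the queried layers. A quick check (the key order below assumes the layers come back in the order requested):

print(list(london_lines_mul.keys()))
# e.g. ['lines', 'multilinestrings']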

3.3 Import data of all subregions of a given (sub)region to the database

Find all subregions (without sub-subregions) of a (sub)region. For example, to find all subregions of "Central America":

subregions = dri.retrieve_names_of_subregions_of('Central America')
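
Check what has been returned:

print(subregions)
# A list of subregion names, e.g. ['Belize', 'Costa Rica', ...]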

To import the .pbf data of subregions:

# Note that this example may take quite a long time!!
dri.psql_osm_pbf_data_extracts(*subregions, confirmation_required=True,
                               username='postgres', password=None, 
                               host='localhost', port=5432,
                               database_name='osm_pbf_data_extracts',
                               data_dir=customised_data_dir,
                               update_osm_pbf=False, if_table_exists='replace',
                               file_size_limit=50, parsed=True,
                               fmt_other_tags=True, fmt_single_geom=True,
                               fmt_multi_geom=True,
                               pickle_raw_file=False,
                               rm_raw_file=True, verbose=True)

Setting rm_raw_file=False and data_dir=None will keep all the raw .pbf data files in the default data folder.

If you would like to import all the subregion data of "Great Britain", try the two ways of finding all its subregions:

gb_subregions_shallow = dri.retrieve_names_of_subregions_of('Great Britain', deep=False)
print(gb_subregions_shallow)
gb_subregions_deep = dri.retrieve_names_of_subregions_of('Great Britain', deep=True)
print(gb_subregions_deep)

When deep=False, the resulting gb_subregions_shallow includes only "England", "Scotland" and "Wales". Note the difference when deep=True: gb_subregions_deep includes "Scotland", "Wales" and all subregions of "England".

Bonus - Pretend you never did the above:

# Drop the database 'osm_pbf_data_extracts'
osmdb.drop()

# Remove all folders created above
import os
from pyhelpers.dir import rm_dir

rm_dir(dri.cd_dat_geofabrik())
rm_dir(dri.regulate_input_data_dir(customised_data_dir))


Data/Map data © Geofabrik GmbH and OpenStreetMap Contributors

All data from OpenStreetMap is licensed under the OpenStreetMap License.

