Download, read/parse and import/export OpenStreetMap data extracts

pydriosm

Author: Qian Fu

This package provides utilities for researchers to easily download and read/parse the OpenStreetMap data extracts (in .pbf and .shp.zip), which are available from the free download servers Geofabrik and BBBike. It also provides a convenient way to import/dump the parsed data to, and retrieve it from, a PostgreSQL server.

Installation

Windows OS users may use the "pip install" command in Command Prompt:

pip3 install pydriosm

(If you are using PyCharm, go to "Settings" and find pydriosm in "Project Interpreter"; to install it, select pydriosm and then click "Install Package".)

Note:
  • For Windows users:

    Successful installation of pydriosm (and ensuring its full functionality) requires a few dependencies. On Windows OS, however, pip install may fail to install some of the supporting packages, such as python-Levenshtein, Fiona, GDAL (>=2.4, <3.0) and Shapely. In that case, you might have to resort to installing their .whl files, which can be downloaded from the Unofficial Windows Binaries for Python Extension Packages. Once those packages are all ready, you can go ahead with the pip command.

  • For Linux users:

    To ensure successful installation of pydriosm, make sure you have GDAL (>=2.4, <3.0) from the official stable UbuntuGIS packages available on your system. If not, please go through the following steps before installing pydriosm:

    1. Remove all other versions of GDAL (if you have already installed any, say ver.2.2.3)

    2. Add the official stable UbuntuGIS packages to your system by running the following commands (this provides a stable repository with gdal 2.4.0; you may also want to have a look at this page):

      sudo add-apt-repository ppa:ubuntugis/ppa
      sudo apt-get update
      
    3. Important! Then you should install "gdal-bin (== 2.4.0/2.4.2)":

      sudo apt-get install gdal-bin=2.4.0+dfsg-1~bionic0
      
    4. Then you’ll need to install the GDAL development libraries "libgdal-dev" before installing the GDAL for Python:

      sudo apt-get install libgdal-dev
      
    5. After successful completion of the above, you should now be able to install pydriosm.

      sudo python3 -m pip install pydriosm
      

    (See also this link for more info.)

Quick start

Firstly, we import the package:

import pydriosm as dri

The current version of the package deals only with subregion data files provided on the free server. To get a full list of subregion names that are available, you can use

subregion_list = dri.fetch_subregion_info_catalogue("GeoFabrik-subregion-name-list")
print(subregion_list)

The example below uses the .pbf data of the "Greater London" area to briefly demonstrate the main functions of this package.

Download data

To download the OSM data for a region (or rather, a subregion) for which a data extract is available, you need to specify the name of the region (e.g. "Greater London"):

subregion_name = 'London'
# or, subregion_name = 'london'; case-insensitive and fuzzy (but not toooo... fuzzy)
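As an illustration of what "case-insensitive and fuzzy" matching of a subregion name could look like, here is a minimal sketch based on the standard library's difflib. It is not the package's own matching code (which may rely on python-Levenshtein instead); the function name match_subregion_name, the cutoff value and the short list of names are all assumptions made for this example.

```python
import difflib


def match_subregion_name(query, known_names, cutoff=0.5):
    """Return the known name closest to `query`, or None if nothing is close enough."""
    # Compare in lower case so that the match is case-insensitive
    lowered = {name.lower(): name for name in known_names}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Hypothetical, hand-written list of names (for illustration only)
names = ["Greater London", "Kent", "Greater Manchester", "West Yorkshire"]
print(match_subregion_name("london", names))
```

Raising the cutoff makes the match stricter, which is one way of keeping it from being "toooo" fuzzy.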

Download .pbf data of "Greater London":

dri.download_subregion_osm_file(subregion_name, osm_file_format=".osm.pbf", 
                                download_dir=None, update=False,
                                download_confirmation_required=True, verbose=True)

Note that download_dir is None by default. In that case, a default file path will be created and the downloaded file will be saved there.

Check the default file path and name:

default_fn, default_fp = dri.get_default_path_to_osm_file(subregion_name, 
                                                          osm_file_format=".osm.pbf", 
                                                          mkdir=False, update=False)
print("Default filename: {}".format(default_fn))
print("Default file path: {}".format(default_fp))

However, you may also set download_dir to be any other valid directory, especially when downloading data of multiple subregions. For example,

# Specify our own data directory
customised_data_dir = "test_data"
# So "test_data" folder will be created in our current working directory

# Alternatively, we could specify a full path 
# import os
# customised_data_dir = os.path.join(os.getcwd(), "test_data")

# Download .pbf data of both 'London' and 'Kent' to the `customised_data_dir`
dri.download_subregion_osm_file('London', 'Kent', osm_file_format=".osm.pbf",
                                download_dir=customised_data_dir, update=False,
                                download_confirmation_required=True, verbose=True)

The .pbf data file will then be saved to the download_dir as specified.
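The choice between the default location (download_dir=None) and a user-specified directory can be pictured with the sketch below. The function resolve_download_path and the folder name "dat_default" are made up for this example; the package's actual default location is the one returned by get_default_path_to_osm_file().

```python
import os


def resolve_download_path(filename, download_dir=None, default_dir="dat_default"):
    """Pick the directory for a downloaded file, falling back to a default one."""
    # `default_dir` is a placeholder, not the package's actual default location
    target_dir = default_dir if download_dir is None else download_dir
    # A relative directory is resolved against the current working directory
    return os.path.join(os.path.abspath(target_dir), filename)


print(resolve_download_path("greater-london-latest.osm.pbf", "test_data"))
```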

Read/parse data

The package can read/parse the OSM data extracts in both .osm.pbf and .shp.zip (and .shp).

(1) .osm.pbf data

Parsing the .pbf data relies mainly on GDAL/OGR and is done with the read_osm_pbf() function.

greater_london = dri.read_osm_pbf(subregion_name, data_dir=None, parsed=True,
                                  file_size_limit=50, fmt_other_tags=True,
                                  fmt_single_geom=True, fmt_multi_geom=True,
                                  update=False, download_confirmation_required=True,
                                  pickle_it=True, rm_osm_pbf=False, verbose=True)

The parsing process may take a few minutes or even longer if the data file is too large. If the file size is greater than the given file_size_limit (default: 50 MB), the data will be parsed in a chunk-wise manner.
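The idea behind file_size_limit can be sketched as follows: compare the file size (in MB) against the limit and, if it is exceeded, process the records in smaller batches. Both helper functions below are hypothetical and only illustrate the principle; how the package itself splits the data is not shown here.

```python
import os


def needs_chunking(path_to_file, file_size_limit=50):
    """Return True if the file is larger than `file_size_limit` (in MB)."""
    size_in_mb = os.path.getsize(path_to_file) / (1024 ** 2)
    return size_in_mb > file_size_limit


def iter_chunks(items, chunk_size):
    """Yield successive lists of at most `chunk_size` items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Example: five records processed two at a time
for chunk in iter_chunks(list(range(5)), 2):
    print(chunk)
```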

Note that greater_london is a dict whose keys are the names of five different layers: "points", "lines", "multilinestrings", "multipolygons" and "other_relations".

If only the name of a subregion is given, i.e. read_osm_pbf(subregion_name, ...), the function will look for the data file in the default file path (i.e. default_fp). Otherwise, the function requires a specific data directory. For example, to read/parse the data in customised_data_dir, i.e. the "test_data" folder, you need to set data_dir=customised_data_dir as follows:

greater_london_test = dri.read_osm_pbf(subregion_name, data_dir=customised_data_dir, 
                                       verbose=True)

greater_london and greater_london_test should be the same.

To make life easier, you can simply skip the download step and use read_osm_pbf() directly. That is, if the targeted data is not available, read_osm_pbf() will download the data file first. By default, a confirmation of downloading the data will be prompted, given that download_confirmation_required=True.

Setting pickle_it=True saves a local copy of the parsed data as a pickle file.

If update=False, when you run read_osm_pbf(subregion_name) again, the function will load the pickle file directly; if update=True, the function will try to download the latest version of the data file and parse it again.
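The interplay of pickle_it and update described above follows a common caching pattern, sketched below with the standard library only. The function load_or_parse and its arguments are hypothetical; the sketch merely assumes that the parsed data can be pickled.

```python
import os
import pickle


def load_or_parse(path_to_pickle, parse, update=False):
    """Load cached data from a pickle file, or call `parse()` and cache its result."""
    if os.path.isfile(path_to_pickle) and not update:
        # A cached copy exists and no update is requested: reuse it
        with open(path_to_pickle, "rb") as f:
            return pickle.load(f)
    # Otherwise, parse (again) and overwrite the cached copy
    data = parse()
    with open(path_to_pickle, "wb") as f:
        pickle.dump(data, f)
    return data
```

Here, parse stands for any function that takes no arguments and returns the parsed data, e.g. a small wrapper around the actual parsing step.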

(2) .shp.zip / .shp data

You can read the .shp.zip and .shp file of the above subregion_name (i.e. 'London') by using read_shp_zip(), which relies mainly on GeoPandas:

# We must specify a layer, e.g. 'railways'
layer_name = 'railways'

# Read the .shp.zip file
greater_london_shp = dri.read_shp_zip(subregion_name, layer=layer_name, 
                                      feature=None, data_dir=None, update=False, 
                                      download_confirmation_required=True, 
                                      pickle_it=True, rm_extracts=False, 
                                      rm_shp_zip=True, verbose=True)

Similarly, there is no need to download the .shp.zip file; read_shp_zip() will do it if the file is not available. Setting rm_extracts=True and rm_shp_zip=True can remove both the downloaded .shp.zip file and all extracted files from it.

Note that greater_london_shp and greater_london are different.

To get information about more than one subregion, you can also merge .shp files of specific layers from those subregions. For example, to merge the "railways" layer of two subregions: "Greater London" and "Kent":

subregion_names = ['Greater London', 'Kent']
# layer_name = 'railways'
dri.merge_multi_shp(subregion_names, layer=layer_name, update_shp_zip=False, 
                    download_confirmation_required=True, data_dir=None, verbose=True)

You could also set data_dir=customised_data_dir, in which case the downloaded .shp.zip files are saved to customised_data_dir and the merged .shp file is made available there as well.

Import and retrieve data with a PostgreSQL server

The package provides a class, named "OSM", which communicates with a PostgreSQL server.

To establish a connection with the server, you need to specify your username (default: 'postgres'), password (default: None), host name (or address; default: 'localhost'), port (default: 5432) and the name of the database (default: 'postgres') you intend to connect to. For example:

# osmdb = dri.OSM(username='postgres', password=None, host='localhost', port=5432,
#                 database_name='postgres')
osmdb = dri.OSM()

If password=None, you will then be asked to type it in.
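For readers curious about how such parameters usually turn into a connection, the sketch below assembles a standard PostgreSQL connection URL and prompts for the password only when none is given. Whether the OSM class does exactly this internally is an assumption; make_postgres_url is a hypothetical helper written for this example.

```python
import getpass
from urllib.parse import quote_plus


def make_postgres_url(username="postgres", password=None, host="localhost",
                      port=5432, database_name="postgres"):
    """Build a PostgreSQL connection URL, asking for the password if it is None."""
    if password is None:
        # Prompt without echoing the typed characters
        password = getpass.getpass("Password ({}@{}): ".format(username, host))
    # Percent-encode the credentials so that characters like '@' do not break the URL
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(username), quote_plus(password), host, port, database_name)
```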

Now you can connect to any other database:

osmdb.connect_db(database_name='osm_pbf_data_extracts')

If the database named "osm_pbf_data_extracts" does not exist before the connection is established, it will be created automatically.

(1) Import the data to the database

To import greater_london (i.e. the parsed .pbf data of "London") to the database, "osm_pbf_data_extracts":

osmdb.dump_osm_pbf_data(greater_london, table_name=subregion_name, parsed=True, 
                        if_exists='replace', chunk_size=None,
                        subregion_name_as_table_name=True)

Each element (i.e. layer) of greater_london will be stored in a different schema, and each schema is named after its layer.

(2) Retrieve data from the database

To retrieve the dumped data:

greater_london_retrieval = osmdb.read_osm_pbf_data(table_name=subregion_name, 
                                                   parsed=True, 
                                                   subregion_name_as_table_name=True,
                                                   chunk_size=None, id_sorted=True)

Note that greater_london_retrieval may not be exactly the same as greater_london. This is because the "keys" of the elements in greater_london are in the following order: 'points', 'lines', 'multilinestrings', 'multipolygons' and 'other_relations'; whereas when dumping greater_london to the database, the five different schemas are sorted alphabetically as follows: 'lines', 'multilinestrings', 'multipolygons', 'other_relations', and 'points', and so retrieving data from the server will be in the latter order. Despite that, the data contained in both greater_london and greater_london_retrieval is consistent.
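The effect of the key order can be reproduced with plain dicts. In the sketch below, the integers are placeholders standing in for the data of each layer; they are not real data.

```python
# Keys in the order produced by parsing
parsed = {"points": 1, "lines": 2, "multilinestrings": 3,
          "multipolygons": 4, "other_relations": 5}

# Keys in alphabetical order, as when reading the schemas back
retrieved = {layer: parsed[layer] for layer in sorted(parsed)}

print(list(parsed) == list(retrieved))  # False: the key order differs
print(parsed == retrieved)              # True: same keys and same values
```

With real (tabular) data in each layer, the comparison would have to be made layer by layer rather than with a single == on the two dicts.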

If you need to query data of a specific layer (or several layers), or in a specific order of layers (schemas):

london_points_lines = osmdb.read_osm_pbf_data(subregion_name, 'points', 'lines')

Another example:

london_lines_mul = osmdb.read_osm_pbf_data('london', 'lines', 'multilinestrings')

(3) Import data of all subregions of a given (sub)region to the database

Find all subregions (without smaller subregions) of a (sub)region. For example, to find all subregions of 'England':

subregions = dri.retrieve_names_of_subregions_of('England')

Import the data of all the subregions contained in subregions:

# Note that this example may take quite a long time!!
dri.psql_osm_pbf_data_extracts(*subregions, database_name='osm_pbf_data_extracts', 
                               data_dir=customised_data_dir, update_osm_pbf=False, 
                               if_table_exists='replace', file_size_limit=50,
                               parsed=True, fmt_other_tags=True, 
                               fmt_single_geom=True, fmt_multi_geom=True, 
                               rm_raw_file=True, verbose=True)

To keep all raw .pbf data files in the default data folder, just set rm_raw_file=False and data_dir=None.

If you would like to import all subregion data of "Great Britain", find all subregions of "Great Britain":

gb_subregions = dri.retrieve_names_of_subregions_of('Great Britain')

Instead of returning ['England', 'Scotland', 'Wales'], the list gb_subregions will include "Scotland", "Wales", and all subregions of "England".
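This behaviour corresponds to collecting the "leaves" of a region hierarchy. A minimal sketch, assuming a toy hierarchy stored as a dict (the real catalogue is much larger, and leaf_subregions is a hypothetical function, not part of the package):

```python
def leaf_subregions(name, hierarchy):
    """Return all subregions of `name` that have no smaller subregions."""
    children = hierarchy.get(name, [])
    if not children:
        # No smaller subregions: the region itself is a leaf
        return [name]
    leaves = []
    for child in children:
        leaves.extend(leaf_subregions(child, hierarchy))
    return leaves


# Toy hierarchy for illustration; England's list is truncated on purpose
toy_hierarchy = {
    "Great Britain": ["England", "Scotland", "Wales"],
    "England": ["Greater London", "Kent"],
}
print(leaf_subregions("Great Britain", toy_hierarchy))
```

"England" does not appear in the result because it has smaller subregions, whereas "Scotland" and "Wales" are kept as they are.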


Data/Map data © Geofabrik GmbH and OpenStreetMap Contributors

All data from the OpenStreetMap is licensed under the OpenStreetMap License.
