Skip to main content

Download, read/parse and import/export OpenStreetMap data extracts

Project description

pydriosm

Author: Qian Fu Twitter URL

PyPI PyPI - Python Version PyPI - License GitHub code size in bytes PyPI - Downloads

This package provides helpful utilities for researchers to easily download and read/parse the OpenStreetMap data extracts (in .pbf and .shp.zip) which are available at the free download servers: Geofabrik and BBBike. In addition, it also provides a convenient way to import/dump the parsed data to, and retrieve it from, a PostgreSQL sever.

Installation

Windows OS users may use the pip install in Command Prompt:

pip3 install pydriosm
NOTE: Installation of pydriosm (and ensuring its full functionality) requires a few dependencies.
  • For Windows users:

    The pip3 method may fail to install some dependencies, such as Fiona, GDAL, Shapely and python-Levenshtein. If errors occur, you should try to pip3 install their .whl files instead, which can be downloaded from the Unofficial Windows Binaries for Python Extension Packages. After you have installed them successfully, try again the above pip3 command.

  • For Linux users:

    If you want to try out any earlier version (<=1.0.17) on Linux, check this link for installation instructions. (However, you are always recommended to use the latest version.)

Quick start

Firstly, import the package:

import pydriosm as dri

The current version of the package works only with subregion data files available on the free server. To get a full list of subregion names that are available, you can run the following line:

subregion_list = dri.fetch_subregion_info_catalogue("GeoFabrik-subregion-name-list")
print(subregion_list)

For a quick start, some examples are provided below, which demonstrate a few core functions of this package.

1. Download data

To download the OSM data for a region (or rather, a subregion) of which the data extract is available, you need to specify the name of the region (e.g. "Greater London"):

subregion_name = 'London'
# or, subregion_name = 'london'; case-insensitive and fuzzy (but not toooo... fuzzy)

Download .pbf data of "Greater London":

dri.download_subregion_osm_file(subregion_name, osm_file_format=".osm.pbf",
                                download_dir=None, update=False,
                                download_confirmation_required=True, deep_retry=False,
                                verbose=True)

Note that download_dir is None by default, in which case a default file path will be created and the downloaded file will be saved there.

Check the default file path and name:

default_fn, default_fp = dri.get_default_path_to_osm_file(subregion_name, 
                                                          osm_file_format=".osm.pbf", 
                                                          mkdir=False, update=False)
print("Default filename: {}".format(default_fn))
print("Default file path: {}".format(default_fp))

However, you may also set download_dir to be any other valid directory, especially when downloading data of multiple subregions. For example,

# Specify the our own data directory
customised_data_dir = "test_data"
# So "test_data" folder will be created in our current working directory

# Alternatively, we could specify a full path 
# import os
# customised_data_dir = os.path.join(os.getcwd(), "test_data")

# Download .pbf data of both 'London' and 'Kent' to the `customised_data_dir`
dri.download_subregion_osm_file('London', 'Kent', osm_file_format=".osm.pbf",
                                download_dir=customised_data_dir, update=False,
                                download_confirmation_required=True, deep_retry=False, 
                                verbose=True)

The .pbf data file will then be saved to the download_dir as specified.

2. Read/parse data

The package can read/parse the OSM data extracts in both .pbf and .shp.zip (and .shp).

2.1 .osm.pbf data

Parsing the .pbf data relies mainly on GDAL/OGR, using read_osm_pbf() function.

greater_london = dri.read_osm_pbf(subregion_name, data_dir=None, parsed=True,
                                  file_size_limit=50, fmt_other_tags=True,
                                  fmt_single_geom=True, fmt_multi_geom=True,
                                  update=False, download_confirmation_required=True,
                                  pickle_it=True, rm_osm_pbf=False, verbose=True)

Note that dri.read_osm_pbf() may take a few minutes or even longer if the data file is too large. If the file size is greater than the given file_size_limit (default: 50 MB), the data will be parsed in a chunk-wise manner.

The returned, greater_london, is in a dict type; its keys are: "points", "lines", "multilinestrings", "multipolygons" and "other_relations", which are also the names of the five different layers.

# Examples:
greater_london['points']  # points
greater_london['lines']  # lines

If only the name of a subregion is given, i.e. greater_london = dri.read_osm_pbf(subregion_name), the function will go to look for the data file from the default file path (i.e. default_fp). Otherwise, the function requires specification of a data directory. For example, to read/parse the data in customised_data_dir, i.e. "test_data" folder, you need to set data_dir=customised_data_dir as follows:

greater_london_test = dri.read_osm_pbf(subregion_name, data_dir=customised_data_dir, 
                                       verbose=True)

In the above, greater_london and greater_london_test should be the same.

To make life easier, you can simply skip the download step and use read_osm_pbf() directly. That is, if the targeted data is not available, read_osm_pbf() will download the data file first. By default, a confirmation of downloading the data will be prompted, given that download_confirmation_required=True.

Setting pickle_it=True is to save a local copy of the parsed data as a pickle file.

If update=False, when you run read_osm_pbf(subregion_name) again, the function will load the pickle file directly; if update=True, the function will try to download the latest version of the data file and parse it again.

2.2 .shp.zip / .shp data

You can read the .shp.zip and .shp file of the above subregion_name (i.e. 'London') by using read_shp_zip(), which relies mainly on GeoPandas:

# We must specify a layer, e.g. 'railways'
layer_name = 'railways'

# Read the .shp.zip file
greater_london_shp = dri.read_shp_zip(subregion_name, layer=layer_name,
                                      feature=None, data_dir=None, update=False,
                                      download_confirmation_required=True,
                                      pickle_it=True, rm_extracts=False,
                                      rm_shp_zip=False, verbose=True)

The parameter feature is related to 'fclass' in greater_london_shp. You may just specify a feature to get a subset of greater_london_shp. For example:

greater_london_shp_rail = dri.read_shp_zip(subregion_name, layer=layer_name, 
                                           feature='rail')
# greater_london_shp_rail.equals(greater_london_shp[greater_london_shp.fclass == 'rail'])
# >>> True

Similarly, there is no need to download the .shp.zip file; read_shp_zip() will do it if the file is not available. Setting rm_extracts=True and rm_shp_zip=True can remove both the downloaded .shp.zip file and all extracted files from it.

Note that greater_london_shp and greater_london are different.

To get data about more than one subregion, you can also merge .shp files of specific layers from those subregions. For example, to merge the "railways" layer of two subregions: "Greater London" and "Kent":

subregion_names = ['London', 'Kent']
# layer_name = 'railways'
dri.merge_multi_shp(subregion_names, layer=layer_name, update_shp_zip=False,
                    download_confirmation_required=True, data_dir=None, 
                    prefix="gis_osm", rm_zip_extracts=False, rm_shp_parts=False, 
                    merged_shp_dir=None, verbose=True)

You could also set data_dir=customised_data_dir to save the downloaded .shp.zip files and make the merged .shp file available into customised_data_dir. Otherwise, when data_dir=None, all files will be found via the default path. Check also:

default_fn_, default_fp_ = dri.get_default_path_to_osm_file(subregion_names[0], 
                                                            osm_file_format=".shp.zip")
print(default_fp_)

3. Import and retrieve data with a PostgreSQL server

The package provides a class, named "OSM", which communicates with PostgreSQL server.

To establish a connection with the server, you need to specify your username (default: 'postgres'), password (default: None), host name (or address; default: 'localhost') and name of the database (default: 'postgres') you intend to connect. For example:

osmdb = dri.OSM(username='postgres', password=None, host='localhost', port=5432, 
                database_name='postgres')
# Or simply, osmdb = dri.OSM()

If password=None, you will then be asked to type in your password.

Now you can connect your database, e.g. "osm_pbf_data_extracts":

osmdb.connect_db(database_name='osm_pbf_data_extracts')

If the database "osm_pbf_data_extracts" does not exist before the connection is established, the method connect_db() will just create it.

3.1 Import the data to the database

To import greater_london (i.e. the parsed .pbf data of "London") to the database, "osm_pbf_data_extracts":

osmdb.dump_osm_pbf_data(greater_london, table_name=subregion_name, parsed=True,
                        if_exists='replace', chunk_size=None,
                        subregion_name_as_table_name=True, verbose=True)

Each element (i.e. layer) of greater_london will be stored in a different schema. Each schema is named as the name of each layer.

3.2 Retrieve data from the database

To retrieve the dumped data:

greater_london_retrieval = osmdb.read_osm_pbf_data(table_name=subregion_name, 
                                                   parsed=True, 
                                                   subregion_name_as_table_name=True,
                                                   chunk_size=None, id_sorted=True)

Note that greater_london_retrieval may not be exactly the same as greater_london. This is because the "keys" of the elements in greater_london are in the following order: 'points', 'lines', 'multilinestrings', 'multipolygons' and 'other_relations'.

However, when dumping greater_london to the database, the five different schemas are sorted alphabetically as follows: 'lines', 'multilinestrings', 'multipolygons', 'other_relations', and 'points', and so retrieving data from the server will be in the latter order. Despite that, the data contained in both greater_london and greater_london_retrieval is consistent. Check:

greater_london['points'].equals(greater_london_retrieval['points'])
# >>> True

If you need to query data of a specific layer (or several layers), or in a specific order of layers (schemas):

london_points_lines = osmdb.read_osm_pbf_data(subregion_name, 'points', 'lines')

Another example:

london_lines_mul = osmdb.read_osm_pbf_data('london', 'lines', 'multilinestrings')

3.3 Import data of all subregions of a given (sub)region to the database

Find all subregions (without sub-subregions) of a (sub)region. For example, to find all subregions of "Central America":

subregions = dri.retrieve_names_of_subregions_of('Central America')

To import the .pbf data of subregions:

# Note that this example may take quite a long time!!
dri.psql_osm_pbf_data_extracts(*subregions, confirmation_required=True,
                               username='postgres', password=None, 
                               host='localhost', port=5432,
                               database_name='osm_pbf_data_extracts',
                               data_dir=customised_data_dir,
                               update_osm_pbf=False, if_table_exists='replace',
                               file_size_limit=50, parsed=True,
                               fmt_other_tags=True, fmt_single_geom=True,
                               fmt_multi_geom=True,
                               pickle_raw_file=False,
                               rm_raw_file=True, verbose=True)

Setting rm_raw_file=False and data_dir=None will keep all the raw .pbf data files in the default data folder.

If you would like to import all subregion data of "Great Britain", try two ways of finding its all subregions:

gb_subregions_shallow = dri.retrieve_names_of_subregions_of('Great Britain', deep=False)
print(gb_subregions_shallow)
gb_subregions_deep = dri.retrieve_names_of_subregions_of('Great Britain', deep=True)
print(gb_subregions_deep)

When deep=False, the result gb_subregions_shallow will only include "England", "Scotland", and "Wales". Note the difference when deep=True, that the list gb_subregions_deep will include "Scotland", "Wales", and all subregions of "England".

Bonus - Pretend you never did the above:

# Drop the database 'osm_pbf_data_extracts'
osmdb.drop()

# Remove all folders created above
import os
from pyhelpers.dir import rm_dir

rm_dir(dri.cd_dat_geofabrik())
rm_dir(dri.regulate_input_data_dir(customised_data_dir))

Website Website

Data/Map data © Geofabrik GmbH and OpenStreetMap Contributors

All data from the OpenStreetMap is licensed under the OpenStreetMap License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydriosm-1.0.18.tar.gz (35.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pydriosm-1.0.18-py3-none-any.whl (194.6 kB view details)

Uploaded Python 3

File details

Details for the file pydriosm-1.0.18.tar.gz.

File metadata

  • Download URL: pydriosm-1.0.18.tar.gz
  • Upload date:
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.8.1

File hashes

Hashes for pydriosm-1.0.18.tar.gz
Algorithm Hash digest
SHA256 f6d01652a02c8b0ddcb28e17462bfb3d7161e5dd4934d494662d31cc91c0aa00
MD5 5731be55b8cb13493c70cab7a88c2fee
BLAKE2b-256 62838308de43b811ed9294b1f4057f6dfdafc609ab85d658064c444e9794efd7

See more details on using hashes here.

File details

Details for the file pydriosm-1.0.18-py3-none-any.whl.

File metadata

  • Download URL: pydriosm-1.0.18-py3-none-any.whl
  • Upload date:
  • Size: 194.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.8.1

File hashes

Hashes for pydriosm-1.0.18-py3-none-any.whl
Algorithm Hash digest
SHA256 c7ddb66420013917c92dd73ba2ee5a5d2a68fd2c2e769260c2b353d60bb5570f
MD5 bad67f5d3c60fb6ff3bce1985ecfb6b3
BLAKE2b-256 addaf320d0891d5ac47ef96653b53c9e78137bb384053eb56050b2205a72ab22

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page