
A set of Python modules for geospatial machine learning and data mining

Spatial Representations for Artificial Intelligence

⚠️🚧 This library is under HEAVY development. Expect breaking changes between `minor` versions 🚧⚠️

💬 Feel free to open an issue if you find anything confusing or not working 🗨️

Project Spatial Representations for Artificial Intelligence (srai) aims to provide simple and efficient solutions to geospatial problems that are accessible to everybody and reusable in various contexts where geospatial data can be used. It is a Python module integrating many geo-related algorithms in a single package with a unified API. Please see the getting started guide for installation and quick start instructions.

Use cases

In its current state, srai provides the following functionalities:

  • OSM data download - downloading OpenStreetMap data for a given area using different sources
  • OSM data processing - processing OSM data to extract useful information (e.g. road network, buildings, POIs, etc.)
  • GTFS processing - extracting features from GTFS data
  • Regionization - splitting a given area into smaller regions using different algorithms (e.g. Uber's H3, Voronoi, etc.)
  • Embedding - embedding regions into a vector space based on different spatial features, and using different algorithms (e.g. hex2vec[1], etc.)
  • Utilities for spatial data visualization and processing

For future releases, we plan to add more functionalities, such as:

  • Pre-computed embeddings - pre-computed embeddings for different regions and different embedding algorithms
  • Full pipelines - full pipelines for different embedding approaches, pre-configured from srai components
  • Image data download and processing - downloading and processing image data (e.g. OSM tiles, etc.)

Installation

To install srai, simply run:

pip install srai

This will install the srai package and dependencies required by most of the use cases. There are several optional dependencies that can be installed to enable additional functionality. These are listed in the optional dependencies section.

Optional dependencies

The following optional dependencies can be installed to enable additional functionality:

  • srai[all] - all optional dependencies
  • srai[osm] - dependencies required to download OpenStreetMap data
  • srai[voronoi] - dependencies to use Voronoi-based regionization method
  • srai[gtfs] - dependencies to process GTFS data
  • srai[plotting] - dependencies to plot graphs and maps
  • srai[torch] - dependencies to use torch-based embedders

Usage

Downloading OSM data

To download OSM data for a given area using a set of tags, use one of the OSMLoader classes:

  • OSMOnlineLoader - downloads data from the OpenStreetMap API using osmnx; this is faster for smaller areas or fewer tags
  • OSMPbfLoader - loads data from a PBF file automatically downloaded from Protomaps; this is faster for larger areas or more tags

Example with OSMOnlineLoader:

from srai.loaders import OSMOnlineLoader
from srai.utils import geocode_to_region_gdf
from srai.plotting import plot_regions

query = {"leisure": "park"}
area = geocode_to_region_gdf("Wrocław, Poland")
loader = OSMOnlineLoader()

parks_gdf = loader.load(area, query)
folium_map = plot_regions(area, colormap=["rgba(0,0,0,0)"], tiles_style="CartoDB positron")
parks_gdf.explore(m=folium_map, color="forestgreen")
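The `query` above is a tag filter: a dictionary mapping OSM keys to accepted values. The sketch below shows one plausible way to interpret such a filter against a single feature's tags, assuming osmnx-style semantics where a value may be a string, a list of strings, or `True` (any value for that key), and where keys are combined as a union (OR). `matches_filter` is a hypothetical helper written for illustration; it is not part of srai.

```python
from typing import Dict, List, Union

# Hypothetical alias (osmnx-style, assumed): a single accepted value,
# a list of accepted values, or True meaning "any value for this key".
TagFilter = Dict[str, Union[str, List[str], bool]]


def matches_filter(tags: Dict[str, str], tag_filter: TagFilter) -> bool:
    """Return True if the feature's tags satisfy at least one key of the filter."""
    for key, accepted in tag_filter.items():
        if key not in tags:
            continue
        value = tags[key]
        if accepted is True:
            return True  # any value for this key is accepted
        if isinstance(accepted, str) and value == accepted:
            return True
        if isinstance(accepted, list) and value in accepted:
            return True
    return False


features = [
    {"leisure": "park", "name": "Some park"},
    {"leisure": "pitch"},
    {"amenity": "bench"},
]
query: TagFilter = {"leisure": "park"}
print([matches_filter(tags, query) for tags in features])  # [True, False, False]
```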

Downloading road network

Road network downloading is a special case of OSM data downloading. To download the road network for a given area, use the OSMWayLoader class:

from srai.loaders import OSMWayLoader
from srai.loaders.osm_way_loader import NetworkType
from srai.utils import geocode_to_region_gdf
from srai.plotting import plot_regions

area = geocode_to_region_gdf("Utrecht, Netherlands")
loader = OSMWayLoader(NetworkType.BIKE)

nodes, edges = loader.load(area)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
edges[["geometry"]].explore(m=folium_map, color="seagreen")
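The returned `edges` are line geometries in geographic coordinates (longitude/latitude, assumed here to be EPSG:4326), so calling `.length` on them directly would give degrees rather than metres. One way to get approximate metric lengths is to apply the haversine formula along each polyline; another is to reproject with `to_crs` first. The stdlib sketch below works on a plain list of `(lon, lat)` pairs, such as the `coords` of a shapely LineString; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(p1: Tuple[float, float], p2: Tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    lon1, lat1 = map(math.radians, p1)
    lon2, lat2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def polyline_length_m(coords: Sequence[Tuple[float, float]]) -> float:
    """Sum of haversine distances between consecutive vertices of a polyline."""
    return sum(haversine_m(a, b) for a, b in zip(coords, coords[1:]))


# A made-up short segment near Utrecht as (lon, lat); real coordinates would
# come from the edge geometries returned by the loader.
segment = [(5.1200, 52.0900), (5.1210, 52.0900), (5.1210, 52.0910)]
print(f"{polyline_length_m(segment):.1f} m")
```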

Downloading GTFS data

To extract features from GTFS data, use GTFSLoader. It extracts the trip count and the available directions for each stop in 1-hour time windows.

from pathlib import Path

from srai.loaders import GTFSLoader
from srai.utils import geocode_to_region_gdf, download_file
from srai.plotting import plot_regions

area = geocode_to_region_gdf("Vienna, Austria")
gtfs_file = Path("vienna_gtfs.zip")
download_file("https://transitfeeds.com/p/stadt-wien/888/latest/download", gtfs_file.as_posix())
loader = GTFSLoader()

features = loader.load(gtfs_file)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
features[["trips_at_8", "geometry"]].explore("trips_at_8", m=folium_map)
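To make the 1-hour windowing concrete, the stdlib sketch below counts distinct trips per stop per hour from `stop_times`-style rows `(trip_id, stop_id, departure_time)`, producing keys named like the `trips_at_8` column used above. It ignores directions and service calendars, and folding GTFS times past `24:00:00` back into hours 0-23 is an assumption of this sketch, not necessarily what GTFSLoader does. `hourly_trip_counts` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def hourly_trip_counts(
    stop_times: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, int]]:
    """Count distinct trips departing from each stop within each 1-hour window.

    Rows are (trip_id, stop_id, departure_time) with GTFS times like "08:15:00".
    Returns {stop_id: {"trips_at_<hour>": count}}.
    """
    trips: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for trip_id, stop_id, departure_time in stop_times:
        # Assumption: times such as "25:30:00" are folded back into 0-23.
        hour = int(departure_time.split(":")[0]) % 24
        trips[(stop_id, hour)].add(trip_id)

    result: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (stop_id, hour), trip_ids in trips.items():
        result[stop_id][f"trips_at_{hour}"] = len(trip_ids)
    return dict(result)


rows = [
    ("t1", "A", "08:05:00"),
    ("t2", "A", "08:40:00"),
    ("t3", "A", "09:10:00"),
    ("t4", "B", "25:30:00"),
]
print(hourly_trip_counts(rows))
# {'A': {'trips_at_8': 2, 'trips_at_9': 1}, 'B': {'trips_at_1': 1}}
```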

Regionization

Regionization is the process of dividing a given area into smaller regions. This can be done in a variety of ways:

  • H3Regionizer - regionization using Uber's H3 library
  • S2Regionizer - regionization using Google's S2 library
  • VoronoiRegionizer - regionization using Voronoi diagram
  • AdministrativeBoundaryRegionizer - regionization using administrative boundaries

Example:

from srai.regionizers import H3Regionizer
from srai.utils import geocode_to_region_gdf
from srai.plotting import plot_regions

area = geocode_to_region_gdf("Berlin, Germany")
regionizer = H3Regionizer(resolution=7)

regions = regionizer.transform(area)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
plot_regions(regions_gdf=regions, map=folium_map)
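Choosing `resolution` is a trade-off between spatial detail and the number of regions. As a rough guide, the number of cells needed to cover an area is about the area divided by the average cell area at that resolution. The sketch below hard-codes the average cell areas listed in the H3 documentation for the resolutions used in this README (7, 9 and 11); real cells vary in size with location and boundary cells are ignored, so treat the result as an order-of-magnitude check only. `estimate_h3_cell_count` is a hypothetical helper.

```python
import math

# Average H3 cell area in km^2 per resolution, from the H3 documentation's
# resolution table; each step is roughly 7x finer than the previous one.
AVG_H3_CELL_AREA_KM2 = {
    7: 5.161293360,
    9: 0.105332513,
    11: 0.002149643,
}


def estimate_h3_cell_count(area_km2: float, resolution: int) -> int:
    """Rough estimate of how many H3 cells are needed to tile an area."""
    return math.ceil(area_km2 / AVG_H3_CELL_AREA_KM2[resolution])


# Berlin covers roughly 890 km^2.
print(estimate_h3_cell_count(890, 7))  # 173
```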

Embedding

Embedding is the process of mapping regions into a vector space. This can be done in a variety of ways:

  • Hex2VecEmbedder - embedding using hex2vec[1] algorithm
  • GTFS2VecEmbedder - embedding using GTFS2Vec[2] algorithm
  • CountEmbedder - embedding based on features counts
  • ContextualCountEmbedder - embedding based on features counts with neighbourhood context (proposed in [3])
  • Highway2VecEmbedder - embedding using Highway2Vec[4] algorithm

All of these methods share the same API. All of them require the results of a Loader (to load features), a Regionizer (to split the area into regions) and a Joiner (to join features to regions). An example using CountEmbedder:

from srai.embedders import CountEmbedder
from srai.joiners import IntersectionJoiner
from srai.loaders import OSMOnlineLoader
from srai.plotting import plot_regions, plot_numeric_data
from srai.regionizers import H3Regionizer
from srai.utils import geocode_to_region_gdf

loader = OSMOnlineLoader()
regionizer = H3Regionizer(resolution=9)
joiner = IntersectionJoiner()

query = {"amenity": "bicycle_parking"}
area = geocode_to_region_gdf("Malmö, Sweden")
features = loader.load(area, query)
regions = regionizer.transform(area)
joint = joiner.transform(regions, features)

embedder = CountEmbedder()
embeddings = embedder.transform(regions, features, joint)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
plot_numeric_data(regions, embeddings, "amenity_bicycle_parking", map=folium_map)
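Conceptually, the joint table pairs each region with the features that intersect it, and counting embeds each region as the number of features carrying each tag, giving columns such as `amenity_bicycle_parking`. The sketch below mimics that idea with plain dictionaries. The `key_value` column naming follows the column used above, while the data structures and the `count_embed` name are hypothetical simplifications of the GeoDataFrames srai actually works with.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_embed(
    region_ids: List[str],
    feature_tags: Dict[str, Dict[str, str]],
    joint: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, int]]:
    """Count tag occurrences per region.

    region_ids: all regions, so regions without features still get a row of zeros.
    feature_tags: feature_id -> {osm_key: osm_value}.
    joint: (region_id, feature_id) pairs, e.g. produced by an intersection join.
    """
    counts: Dict[str, Counter] = {region_id: Counter() for region_id in region_ids}
    for region_id, feature_id in joint:
        for key, value in feature_tags[feature_id].items():
            counts[region_id][f"{key}_{value}"] += 1

    # Give every region the same columns, filling missing ones with 0.
    columns = sorted({column for counter in counts.values() for column in counter})
    return {
        region_id: {column: counter[column] for column in columns}
        for region_id, counter in counts.items()
    }


regions = ["r1", "r2", "r3"]
features = {
    "f1": {"amenity": "bicycle_parking"},
    "f2": {"amenity": "bicycle_parking"},
    "f3": {"amenity": "bench"},
}
joint = [("r1", "f1"), ("r1", "f2"), ("r2", "f3")]
print(count_embed(regions, features, joint))
```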

CountEmbedder is a simple method that does not require fitting. Other methods, such as Hex2VecEmbedder or GTFS2VecEmbedder, require fitting and can be used in a similar way to scikit-learn estimators:

from srai.embedders import Hex2VecEmbedder
from srai.joiners import IntersectionJoiner
from srai.loaders import OSMPbfLoader
from srai.loaders.osm_loaders.filters import HEX2VEC_FILTER
from srai.neighbourhoods.h3_neighbourhood import H3Neighbourhood
from srai.regionizers import H3Regionizer
from srai.utils import geocode_to_region_gdf
from srai.plotting import plot_regions, plot_numeric_data

loader = OSMPbfLoader()
regionizer = H3Regionizer(resolution=11)
joiner = IntersectionJoiner()

area = geocode_to_region_gdf("City of London")
features = loader.load(area, HEX2VEC_FILTER)
regions = regionizer.transform(area)
joint = joiner.transform(regions, features)

neighbourhood = H3Neighbourhood(regions_gdf=regions)

# Encoder layer sizes; the last value is the size of the resulting embedding
embedder = Hex2VecEmbedder([15, 10, 3])

# Option 1: fit and transform
# embedder.fit(regions, features, joint, neighbourhood, batch_size=128)
# embeddings = embedder.transform(regions, features, joint)

# Option 2: fit_transform
embeddings = embedder.fit_transform(regions, features, joint, neighbourhood, batch_size=128)

folium_map = plot_regions(area, colormap=["rgba(0,0,0,0.1)"], tiles_style="CartoDB positron")
plot_numeric_data(regions, embeddings, 0, map=folium_map)
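Unlike CountEmbedder, Hex2VecEmbedder also takes a `neighbourhood`. As we understand the hex2vec paper [1], embeddings are trained with a skip-gram-style objective in which neighbouring hexagons act as positive context and randomly drawn non-neighbouring hexagons as negatives. The sketch below illustrates only that sampling step on a toy hexagonal grid, using axial coordinates `(q, r)` as a stand-in for H3 indexes. `hex_neighbours` and `sample_triplets` are hypothetical helpers, and srai's own sampling may differ in details.

```python
import random
from typing import List, Set, Tuple

Hex = Tuple[int, int]  # axial coordinates (q, r) of a hexagon on a toy grid

# The six neighbour offsets of a hexagon in axial coordinates.
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_neighbours(cell: Hex) -> Set[Hex]:
    q, r = cell
    return {(q + dq, r + dr) for dq, dr in AXIAL_DIRECTIONS}


def sample_triplets(regions: Set[Hex], seed: int = 0) -> List[Tuple[Hex, Hex, Hex]]:
    """Build (anchor, positive, negative) triplets.

    positive: a neighbour of the anchor that is itself one of the regions;
    negative: a random region that is neither the anchor nor one of its neighbours.
    """
    rng = random.Random(seed)
    ordered = sorted(regions)  # deterministic order for reproducible sampling
    triplets: List[Tuple[Hex, Hex, Hex]] = []
    for anchor in ordered:
        neighbours = hex_neighbours(anchor) & regions
        candidates = [c for c in ordered if c != anchor and c not in neighbours]
        if not candidates:
            continue  # no valid negative exists for this anchor
        for positive in sorted(neighbours):
            triplets.append((anchor, positive, rng.choice(candidates)))
    return triplets


# A toy "area": all hexagons within distance 2 of the origin (19 cells).
area = {(q, r) for q in range(-2, 3) for r in range(-2, 3) if abs(q + r) <= 2}
triplets = sample_triplets(area)
print(len(area), len(triplets))
```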

Plotting, utilities and more

We also provide utilities for different spatial operations and plotting functions adapted to the data formats used in srai. For a full list of available methods, please refer to the documentation.

Contributing

If you would like to contribute to srai, feel free to do so! Visit our contributing guide for more details.

Publications

Some of the methods implemented in srai have been published in scientific journals and conferences.

  1. Szymon Woźniak and Piotr Szymański. 2021. Hex2vec: Context-Aware Embedding H3 Hexagons with OpenStreetMap Tags. In Proceedings of the 4th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery (GeoAI '21). Association for Computing Machinery, New York, NY, USA, 61–71.
  2. Piotr Gramacki, Szymon Woźniak, and Piotr Szymański. 2021. Gtfs2vec: Learning GTFS Embeddings for comparing Public Transport Offer in Microregions. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Searching and Mining Large Collections of Geospatial Data (GeoSearch '21). Association for Computing Machinery, New York, NY, USA, 5–12.
  3. Kamil Raczycki and Piotr Szymański. 2021. Transfer learning approach to bicycle-sharing systems' station location planning using OpenStreetMap data. In Proceedings of the 4th ACM SIGSPATIAL International Workshop on Advances in Resilient and Intelligent Cities (ARIC '21). Association for Computing Machinery, New York, NY, USA, 1–12.
  4. Kacper Leśniara and Piotr Szymański. 2022. Highway2vec: representing OpenStreetMap microregions with respect to their road network characteristics. In Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery (GeoAI '22). Association for Computing Machinery, New York, NY, USA, 18–29.

Citation

TBD

License

This library is licensed under the Apache License 2.0.

The free OpenStreetMap data, which is used for the development of SRAI, is licensed under the Open Data Commons Open Database License (ODbL) by the OpenStreetMap Foundation (OSMF).

Download files

Source distribution: srai-0.1.0.tar.gz (576.1 kB)

Built distribution: srai-0.1.0-py3-none-any.whl (101.3 kB), Python 3
