
Mosaic: geospatial analytics in Python, on Spark

Project description

Mosaic is an extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.

Mosaic provides:

  • easy conversion between common spatial data encodings (WKT, WKB and GeoJSON);
  • constructors to easily generate new geometries from Spark native data types;
  • many of the OGC SQL standard ST_ functions implemented as Spark Expressions for transforming, aggregating and joining spatial datasets;
  • high performance through implementation of Spark code generation within the core Mosaic functions;
  • optimisations for performing point-in-polygon joins using an approach we co-developed with Ordnance Survey (blog post); and
  • a choice of Scala, SQL and Python APIs.
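
To make the three encodings named in the list above concrete, here is a minimal sketch in plain Python that expresses one 2D point as WKT, WKB and GeoJSON. It does not use Mosaic and is not how Mosaic implements its conversions; the helper names (parse_wkt_point, point_to_wkb, point_to_geojson) are hypothetical and only illustrate what the formats look like.

```python
import json
import re
import struct

# Matches a 2D WKT point such as "POINT (1 2)"; other geometry types are out of scope.
_WKT_POINT = re.compile(
    r"\s*POINT\s*\(\s*([-+0-9.eE]+)\s+([-+0-9.eE]+)\s*\)\s*", re.IGNORECASE
)


def parse_wkt_point(wkt):
    """Return (x, y) parsed from a 2D WKT point string (hypothetical helper)."""
    match = _WKT_POINT.fullmatch(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB bytes (hypothetical helper)."""
    # Layout: byte-order flag (1 = little-endian), uint32 geometry type
    # (1 = Point), then x and y as 64-bit floats.
    return struct.pack("<BIdd", 1, 1, x, y)


def point_to_geojson(x, y):
    """Encode a 2D point as a GeoJSON geometry string (hypothetical helper)."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


x, y = parse_wkt_point("POINT (1 2)")
print(point_to_wkb(x, y).hex())   # WKB is binary, shown here as hex
print(point_to_geojson(x, y))
```

In practice the conversion functions provided by the library should be preferred, since they cover every geometry type and run as Spark expressions rather than row-by-row Python.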

Getting started

Requirements

The only requirement to start using Mosaic is a Databricks cluster running Databricks Runtime 10.0 (or later) with either of the following attached:

  • (for Python API users) the Python .whl file; or
  • (for Scala or SQL users) the Scala JAR.

Both the .whl and JAR can be found in the 'Releases' section of the Mosaic GitHub repository.

Instructions for how to attach libraries to a Databricks cluster can be found here.

Example notebooks

This repository contains several example notebooks in notebooks/examples. You can import them into your Databricks workspace using the instructions here.

Project Support

Please note that all projects in the databrickslabs GitHub space are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the repository. They will be reviewed as time permits, but there are no formal SLAs for support.

Download files

Source Distribution

  • databricks-mosaic-0.1.1.tar.gz (12.2 MB, source)

Built Distribution

  • databricks_mosaic-0.1.1-py3-none-any.whl (12.2 MB, Python 3)

File details

Details for the file databricks-mosaic-0.1.1.tar.gz.

File metadata

  • Download URL: databricks-mosaic-0.1.1.tar.gz
  • Size: 12.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for databricks-mosaic-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c9818ca36db2234e81edd72ff7534693702dace9cb08da48d04e53fe46597f6a
MD5 2ce3cba9e8a257f83ec134510cf30af8
BLAKE2b-256 88fdaa2f9870e3d08f1281e5b5d1bbb3da4d7154ecf4087a0a37ff2843aa2edb

See more details on using hashes here.
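
As a minimal sketch of how the SHA256 digest listed above could be checked against a downloaded copy of the source distribution, the snippet below uses only the Python standard library. The function names and the local file path are assumptions for illustration; the expected digest is the SHA256 value shown in the table above.

```python
import hashlib

# SHA256 digest published above for databricks-mosaic-0.1.1.tar.gz.
EXPECTED_SHA256 = "c9818ca36db2234e81edd72ff7534693702dace9cb08da48d04e53fe46597f6a"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()


# Example (assumes the archive was downloaded to the current directory):
# print(matches_expected("databricks-mosaic-0.1.1.tar.gz"))
```

A mismatch means the local file differs from the published one and should not be installed.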

File details

Details for the file databricks_mosaic-0.1.1-py3-none-any.whl.

File hashes

Hashes for databricks_mosaic-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e04c3352a71f48d6c1f882a4daa102f87c147155fc2a91bd56b579206b3678d3
MD5 adef97ce0edb3cfa50e18d453f4e5a63
BLAKE2b-256 2ab20717f2e0ec0416bbf186d54f17c14e07b86d60155c9dc6d953ac7bc99a4b
