Mosaic: geospatial analytics in Python, on Spark

Project description

Mosaic by Databricks Labs

An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.

Mosaic provides:

  • easy conversion between common spatial data encodings (WKT, WKB and GeoJSON), as sketched below;
  • constructors to easily generate new geometries from Spark native data types;
  • many of the OGC SQL standard ST_ functions implemented as Spark Expressions for transforming, aggregating and joining spatial datasets;
  • high performance through Spark code generation within the core Mosaic functions;
  • optimisations for performing point-in-polygon joins using an approach we co-developed with Ordnance Survey (blog post); and
  • a choice of Scala, SQL and Python APIs.
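
As a flavour of the API, the sketch below shows encoding conversion in Python, assuming a Databricks notebook in which Mosaic has already been enabled (see Getting started below). The bindings st_asgeojson and st_aswkb are illustrative picks, not an exhaustive reference; check the API docs for the full list of functions.

from pyspark.sql.functions import col
from mosaic import st_asgeojson, st_aswkb

# Geometries arrive encoded as WKT strings...
df = spark.createDataFrame([("POINT (30 10)",)], ["wkt"])

# ...and can be converted to the other supported encodings
df.select(
    st_asgeojson(col("wkt")).alias("geojson"),
    st_aswkb(col("wkt")).alias("wkb"),
).show(truncate=False)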

Image 1: Mosaic logical design.

Getting started

Required compute environment

The only requirement to start using Mosaic is a Databricks cluster running Databricks Runtime 10.0 (or later).

Package installation

Installation from PyPI

Python users can install the library directly from PyPI using the instructions here or from within a Databricks notebook using the %pip magic command, e.g.

%pip install databricks-mosaic

Installation from release artifacts

Alternatively, you can access the latest release artifacts here and manually attach the appropriate library to your cluster. Which artifact you choose to attach will depend on the language API you intend to use.

  • For Python API users, choose the Python .whl file.
  • For Scala users, take the Scala JAR (packaged with all necessary dependencies).
  • For R users, download the Scala JAR and the R bindings library; see the sparkR readme.

Instructions for how to attach libraries to a Databricks cluster can be found here.

Automatic SQL registration

If you would like to use Mosaic's functions in pure SQL (in a SQL notebook, from a business intelligence tool, or via a middleware layer such as Geoserver, perhaps) then you can configure "Automatic SQL Registration" using the instructions here.
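
Under the hood, this amounts to setting cluster-level Spark configuration before the cluster starts. The lines below are a sketch of the shape of that configuration only; the extension class and config keys are assumptions for illustration, so treat the linked instructions as authoritative.

# Cluster Spark config (illustrative; verify keys against the Mosaic docs)
spark.sql.extensions com.databricks.labs.mosaic.sql.extensions.MosaicSQL
spark.databricks.labs.mosaic.geometry.api ESRI
spark.databricks.labs.mosaic.index.system H3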

Enabling the Mosaic functions

The mechanism for enabling the Mosaic functions varies by language:

Python

from mosaic import enable_mosaic
enable_mosaic(spark, dbutils)
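
Once enabled, the Mosaic functions can be used in ordinary DataFrame expressions. A minimal smoke test, assuming st_area is among the ST_ bindings exposed by the mosaic package:

from pyspark.sql.functions import col
from mosaic import enable_mosaic, st_area

enable_mosaic(spark, dbutils)

# ST_Area evaluated as a Spark expression over a WKT polygon
df = spark.createDataFrame(
    [("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",)], ["wkt"]
)
df.select(st_area(col("wkt")).alias("area")).show()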

Scala

import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.H3
import com.databricks.labs.mosaic.ESRI

val mosaicContext = MosaicContext.build(H3, ESRI)
import mosaicContext.functions._

R

library(sparkrMosaic)
enableMosaic()

SQL

If you have not employed automatic SQL registration, you will need to register the Mosaic SQL functions in your SparkSession from a Scala notebook cell:

%scala
import com.databricks.labs.mosaic.functions.MosaicContext
import com.databricks.labs.mosaic.H3
import com.databricks.labs.mosaic.ESRI

val mosaicContext = MosaicContext.build(H3, ESRI)
mosaicContext.register(spark)
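
Once registered, the functions resolve in any SQL context within the same session; for example, from a Python cell (again assuming st_area is among the registered functions):

spark.sql(
    "SELECT st_area('POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))') AS area"
).show()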

Ecosystem

Mosaic is intended to augment the existing ecosystem and unlock its potential by integrating Spark, Delta Lake and third-party frameworks into the Lakehouse architecture.

Image 2: Mosaic ecosystem - Lakehouse integration.

Example notebooks

This repository contains several example notebooks in notebooks/examples. You can import them into your Databricks workspace using the instructions here.

Project Support

Please note that all projects in the databrickslabs github space are provided for your exploration only, and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.

Download files

Download the file for your platform.

Source Distribution

databricks-mosaic-0.2.1.tar.gz (12.4 MB)

Built Distribution

databricks_mosaic-0.2.1-py3-none-any.whl (12.4 MB)

File details

Details for the file databricks-mosaic-0.2.1.tar.gz.

File metadata

  • Download URL: databricks-mosaic-0.2.1.tar.gz
  • Upload date:
  • Size: 12.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for databricks-mosaic-0.2.1.tar.gz

Algorithm    Hash digest
SHA256       5897b3b249abddd2622b64f0d374038043ec1efa17e81625bdfeda5a568594e3
MD5          f0f10d9d82b761b7fd8fc0c52f0766a3
BLAKE2b-256  9980caa8f52c4d53bebc84576012f11fa7e6e349a3f324cea0dbecddbab37e7a

See more details on using hashes here.

File details

Details for the file databricks_mosaic-0.2.1-py3-none-any.whl.

File hashes

Hashes for databricks_mosaic-0.2.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       79beaeb13431de6e7dff615cbc681e3bb7e0ca49c40b99ec022b40ed9de130b8
MD5          ebb74cc233a5340bf9f8f900d812a64e
BLAKE2b-256  fedb57488d0e199c3d44e9457a57bbd560d312018e70abce39c22e1ff9839eb6

See more details on using hashes here.
