GeoPandas objects backed with Dask

Parallel GeoPandas with Dask

Status

EXPERIMENTAL: This project is in an early state. The basic element-wise spatial methods are implemented, but not much more than that yet.

If you would like to see this project in a more stable state, then you might consider pitching in with developer time (contributions are very welcome!) or with financial support from you or your company.

This is a new project that builds off the exploration done in https://github.com/mrocklin/dask-geopandas

Example

Given a GeoPandas GeoDataFrame:

import geopandas
df = geopandas.read_file('...')

We can partition it into a Dask-GeoPandas dataframe:

import dask_geopandas
ddf = dask_geopandas.from_geopandas(df, npartitions=4)

Currently, this partitions the data naively by rows. In the future, it will also provide spatial partitioning to take advantage of the spatial structure of the GeoDataFrame. Even so, the current version already provides basic multi-core parallelism.
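To make "naively by rows" concrete, here is a minimal pure-Python sketch of splitting row positions into contiguous chunks. The helper name partition_rows is hypothetical and this is not how Dask computes its divisions; it only illustrates that rows are grouped by their position in the table, not by where their geometries lie.

```python
def partition_rows(n_rows, npartitions):
    """Split range(n_rows) into contiguous (start, stop) chunks of near-equal size.

    Hypothetical helper for illustration only; Dask chooses its own divisions.
    """
    base, extra = divmod(n_rows, npartitions)
    bounds = []
    start = 0
    for i in range(npartitions):
        # The first `extra` chunks each take one additional row.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


# Ten rows split into four partitions, purely by row position.
chunks = partition_rows(10, 4)
print(chunks)
```

Because the split ignores geometry, two neighbouring features can land in different partitions, which is the limitation that spatial partitioning is meant to address.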

The familiar spatial attributes and methods of GeoPandas are also available and will be computed in parallel:

ddf.geometry.area.compute()
ddf.within(polygon)
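
As a rough illustration of why element-wise methods parallelise well, the sketch below computes polygon areas per partition with a thread pool and concatenates the results. It uses only the standard library; the names polygon_area and partition_areas are hypothetical, and this is a conceptual sketch, not the code path that Dask-GeoPandas actually runs.

```python
from concurrent.futures import ThreadPoolExecutor


def polygon_area(coords):
    """Area of a simple polygon given as [(x, y), ...], via the shoelace formula."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def partition_areas(partition):
    """Element-wise operation applied to one partition, independent of the others."""
    return [polygon_area(polygon) for polygon in partition]


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (4, 0), (0, 3)]

# Three partitions of polygons, standing in for row-wise chunks of a dataframe.
partitions = [[unit_square, triangle], [triangle], [unit_square]]

with ThreadPoolExecutor() as pool:
    # pool.map preserves partition order, so results line up with the input rows.
    per_partition = list(pool.map(partition_areas, partitions))

areas = [area for part in per_partition for area in part]
print(areas)
```

Each partition needs no information from any other partition, so the work can be scheduled independently and the results simply concatenated.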

Additionally, if you have a distributed dask.dataframe, you can convert it with from_dask_dataframe and pass a geometry column built from columns of x-y coordinates to the set_geometry method. Currently, this only supports point data.

import dask.dataframe as dd
import dask_geopandas

ddf = dd.read_csv('...')

ddf = dask_geopandas.from_dask_dataframe(ddf)
# x is the longitude column and y is the latitude column
ddf = ddf.set_geometry(
    dask_geopandas.points_from_xy(ddf, 'longitude', 'latitude')
)

Writing files (and reading back) is currently supported for the Parquet file format:

ddf.to_parquet("path/to/dir/")
ddf = dask_geopandas.read_parquet("path/to/dir/")

Installation

This package depends on GeoPandas and Dask. In addition, installing PyGEOS is recommended for faster spatial operations and to enable multithreading. See https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for details.

One way to install everything is to use the conda package manager to create a new environment:

conda create -n geo_env
conda activate geo_env
conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
conda install python=3 geopandas dask pygeos
pip install git+https://github.com/geopandas/dask-geopandas.git
