PySpark Vector Files

Read vector files into a Spark DataFrame with geometry encoded as WKB.
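WKB (Well-Known Binary) is the OGC binary encoding for geometries, so each row's geometry is a byte string rather than a Python object. The snippet below only illustrates the format. It is not this package's API: `point_to_wkb` and `wkb_to_point` are hypothetical helpers that hand-encode a 2D Point with the standard library. In practice GDAL's OGR bindings produce the same bytes via `Geometry.ExportToWkb()`.

```python
import struct

# OGC Simple Features geometry type code for a 2D Point.
WKB_POINT = 1


def point_to_wkb(x: float, y: float, little_endian: bool = True) -> bytes:
    """Hypothetical helper: encode a 2D point as Well-Known Binary."""
    # First byte is the byte-order flag: 1 = little endian (NDR), 0 = big endian (XDR).
    byte_order = 1 if little_endian else 0
    prefix = "<" if little_endian else ">"
    # Layout: 1-byte order flag, uint32 geometry type, two float64 coordinates (21 bytes).
    return struct.pack(f"{prefix}BIdd", byte_order, WKB_POINT, x, y)


def wkb_to_point(wkb: bytes) -> tuple:
    """Hypothetical helper: decode a 2D WKB Point back to an (x, y) tuple."""
    # Read the byte-order flag to pick the struct prefix for the remaining fields.
    prefix = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(f"{prefix}Idd", wkb[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a Point (type 1), got type {geom_type}")
    return (x, y)


if __name__ == "__main__":
    wkb = point_to_wkb(1.0, 2.0)
    print(wkb.hex())  # 0101000000000000000000f03f0000000000000040
    print(wkb_to_point(wkb))  # (1.0, 2.0)
```

The hex string above is the standard little-endian WKB for `POINT (1 2)`, which is the kind of value you would expect to find in the geometry column.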

Install

Within a Databricks notebook

%pip install git+https://github.com/Defra-Data-Science-Centre-of-Excellence/pyspark_vector_files

From the command line

python -m pip install git+https://github.com/Defra-Data-Science-Centre-of-Excellence/pyspark_vector_files

Local development

To ensure compatibility with Databricks Runtime 9.1 LTS, this package was developed on a Linux machine running Ubuntu 20.04 LTS with Python 3.8.8, GDAL 3.4.0, and Spark 3.1.2.
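Once the environment is set up, a small script can confirm that the interpreter actually sees these versions. This is a minimal sketch, assuming the GDAL Python bindings (`osgeo`) and `pyspark` are installed in the active environment. `EXPECTED_VERSIONS`, `installed_versions` and `version_mismatches` are hypothetical names, not part of this package.

```python
import importlib
import platform

# Versions listed in the "Local development" section.
EXPECTED_VERSIONS = {
    "python": "3.8.8",
    "osgeo.gdal": "3.4.0",
    "pyspark": "3.1.2",
}


def installed_versions() -> dict:
    """Collect versions visible to the current interpreter (None if a module is missing)."""
    versions = {"python": platform.python_version()}
    for module_name in ("osgeo.gdal", "pyspark"):
        try:
            module = importlib.import_module(module_name)
            versions[module_name] = getattr(module, "__version__", None)
        except ImportError:
            versions[module_name] = None
    return versions


def version_mismatches(found: dict, expected: dict = EXPECTED_VERSIONS) -> dict:
    """Return {name: (found, expected)} for every version that differs."""
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if found.get(name) != want
    }


if __name__ == "__main__":
    problems = version_mismatches(installed_versions())
    for name, (have, want) in problems.items():
        print(f"{name}: found {have}, expected {want}")
    print("OK" if not problems else f"{len(problems)} mismatch(es)")
```

Running it inside the poetry environment (for example with `poetry run python`) checks the versions that the package itself will use, rather than the system ones.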

Install Python 3.8.8 using pyenv

See the pyenv-installer's Installation / Update / Uninstallation instructions.

Install Python 3.8.8 with pyenv:

pyenv install 3.8.8

Then, from the root of the repository, set it as the local Python version (pyenv local writes a .python-version file):

pyenv local 3.8.8

Install GDAL 3.4.0

Add the UbuntuGIS unstable Personal Package Archive (PPA) and update your package list:

sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable \
    && sudo apt-get update

Install GDAL 3.4.0. I found I also had to install python3-gdal to avoid version conflicts, even though I'm going to use poetry to install the Python bindings in a virtual environment later:

sudo apt-get install -y gdal-bin=3.4.0+dfsg-1~focal0 \
    libgdal-dev=3.4.0+dfsg-1~focal0 \
    python3-gdal=3.4.0+dfsg-1~focal0

Verify the installation:

ogrinfo --version
# GDAL 3.4.0, released 2021/11/04

Install poetry 1.1.13

See poetry's osx / linux / bashonwindows install instructions.

Clone this repository

git clone https://github.com/Defra-Data-Science-Centre-of-Excellence/pyspark_vector_files.git
cd pyspark_vector_files

Install dependencies using poetry

poetry install

Download files

Source distribution: pyspark_vector_files-0.1.0.tar.gz (10.9 kB)

Built distribution: pyspark_vector_files-0.1.0-py3-none-any.whl (11.3 kB, Python 3)
