PySpark Vector Files
Read vector files into a Spark DataFrame with geometry encoded as WKB.
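To make "geometry encoded as WKB" concrete, here is a minimal sketch of the Well-Known Binary layout for a single 2D point, using only the Python standard library. It does not use this package's API; the helper names `encode_point_wkb` and `decode_point_wkb` are hypothetical and exist only for illustration.

```python
import struct


def encode_point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1-byte byte-order flag (1 = little-endian),
    # 4-byte unsigned geometry type (1 = Point), then two 8-byte doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_point_wkb(wkb: bytes) -> tuple:
    """Decode a 2D WKB point back into (x, y) (hypothetical helper)."""
    # The first byte says how the rest of the buffer is ordered.
    endian = "<" if wkb[0] == 1 else ">"
    (geometry_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geometry_type != 1:
        raise ValueError(f"Expected a Point (type 1), got type {geometry_type}")
    # Coordinates start after the 1-byte flag and the 4-byte type.
    x, y = struct.unpack_from(endian + "dd", wkb, 5)
    return (x, y)


point_wkb = encode_point_wkb(1.5, -2.25)
decoded_point = decode_point_wkb(point_wkb)
```

A geometry column holding values like `point_wkb` is plain binary, so it can be passed to any library that reads WKB.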
Install
Within a Databricks notebook
%pip install git+https://github.com/Defra-Data-Science-Centre-of-Excellence/pyspark_vector_files
From the command line
python -m pip install git+https://github.com/Defra-Data-Science-Centre-of-Excellence/pyspark_vector_files
Local development
To ensure compatibility with Databricks Runtime 9.1 LTS, this package was developed on a Linux machine running the Ubuntu 20.04 LTS operating system using Python 3.8.8, GDAL 3.4.0, and Spark 3.1.2.
Install Python 3.8.8 using pyenv
See the pyenv-installer's Installation / Update / Uninstallation instructions.
Install Python 3.8.8 so that it is available to pyenv:
pyenv install 3.8.8
Then set it as the local version in the repository you're using:
pyenv local 3.8.8
Install GDAL 3.4.0
Add the UbuntuGIS unstable Personal Package Archive (PPA) and update your package list:
sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable \
&& sudo apt-get update
Install GDAL 3.4.0. I found I also had to install python3-gdal (even though I later use poetry to install it in a virtual environment) to avoid version conflicts:
sudo apt-get install -y gdal-bin=3.4.0+dfsg-1~focal0 \
libgdal-dev=3.4.0+dfsg-1~focal0 \
python3-gdal=3.4.0+dfsg-1~focal0
Verify the installation:
ogrinfo --version
# GDAL 3.4.0, released 2021/11/04
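The same check can be scripted. The sketch below parses the version string printed by `ogrinfo --version` and compares it with the 3.4.0 target; the function names and the `EXPECTED_GDAL` constant are hypothetical, and `installed_gdal_version` assumes `ogrinfo` is on your `PATH`.

```python
import re
import subprocess

# Version this package was developed against (from the text above).
EXPECTED_GDAL = (3, 4, 0)


def parse_gdal_version(output: str) -> tuple:
    """Extract (major, minor, patch) from text like 'GDAL 3.4.0, released ...'."""
    match = re.search(r"GDAL (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"Unrecognised version string: {output!r}")
    return tuple(int(part) for part in match.groups())


def installed_gdal_version() -> tuple:
    """Run `ogrinfo --version` and parse its output (requires GDAL installed)."""
    result = subprocess.run(
        ["ogrinfo", "--version"], capture_output=True, text=True, check=True
    )
    return parse_gdal_version(result.stdout)


# Parse the sample output shown above rather than calling ogrinfo,
# so this snippet runs even where GDAL is not installed.
sample_version = parse_gdal_version("GDAL 3.4.0, released 2021/11/04")
version_matches = sample_version == EXPECTED_GDAL
```

On a machine with GDAL installed, comparing `installed_gdal_version()` with `EXPECTED_GDAL` gives the real check.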
Install poetry 1.1.13
See poetry's osx / linux / bashonwindows install instructions.
Clone this repository
git clone https://github.com/Defra-Data-Science-Centre-of-Excellence/pyspark_vector_files.git
Install dependencies using poetry
poetry install
Hashes for pyspark_vector_files-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 59e73053ec52df77bb0c8faca24089db3b9dae71ac5184f6a17013ec7574971f
MD5 | 04178e9a6ccfc1449c4d292468081a58
BLAKE2b-256 | 19cce64ca4d19ad954c6f39cf39154eb543720f583aec7d1fd6896eece68dd32
Hashes for pyspark_vector_files-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3f5e7cb3bc6dacb0ae6a2896df34e58c6e6a9fb934b65125d0a6e2a57bbfa6b1
MD5 | 5ec4c295ed6a3d2fa4a85adf2df40d36
BLAKE2b-256 | faa4f11b1e158cbb1ac7acdad8c17615690527abc0af4349c263430e318ff298
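If you download one of these files manually, you can compare it against the published SHA256 digest. The following is a minimal sketch using the standard library; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the temporary file only stands in for a real download.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demonstration on a throwaway file; replace the path and digest with the
# downloaded file and the SHA256 value listed for it above.
with tempfile.NamedTemporaryFile(delete=False) as demo_file:
    demo_file.write(b"abc")
    demo_path = demo_file.name

demo_digest = sha256_of(demo_path)
demo_ok = matches_published_hash(demo_path, hashlib.sha256(b"abc").hexdigest())
os.remove(demo_path)
```

For the wheel, for example, you would call `matches_published_hash` with the path to `pyspark_vector_files-0.1.0-py3-none-any.whl` and its SHA256 digest from the table.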