Tools for dealing with geospatial data
Geowrangler
Tools for wrangling geospatial data
Overview
Geowrangler is a python package for geodata wrangling. It helps you build data transformation workflows for which other geospatial libraries offer no out-of-the-box solution.
We surveyed our past geospatial projects to extract these solutions for our own work and hope they will be useful for others as well.
Our audience is researchers, analysts and engineers delivering geospatial projects.
We welcome your comments, suggestions, bug reports and code contributions to make Geowrangler better.
Modules
- Grid Tile Generation
- Geometry Validation
- Vector Zonal Stats
- Raster Zonal Stats
- Area Zonal Stats
- Distance Zonal Stats
- Demographic and Health Survey (DHS) Processing Utils
- Geofabrik (OSM) Data Download
- Ookla Data Download
Check this page for more details about our Roadmap
Installation
pip install git+https://github.com/thinkingmachines/geowrangler.git
Documentation
The documentation for the package is available here
Development
Development Setup
If you want to learn more about Geowrangler and explore its inner workings, you can set up a local development environment. You can run geowrangler's jupyter notebooks to see how the different modules are built and how they work.
Pre-requisites
- OS: Linux, MacOS, Windows Subsystem for Linux (WSL) on Windows
- Requirements:
  - python 3.7 or higher
  - virtualenv, venv or conda for python environment virtualization
  - poetry for dependency management
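The requirements above can be checked with a short script before going further. This is a minimal sketch: the helper name check_prerequisites is hypothetical and not part of geowrangler, and it only checks the version of the interpreter running it and whether a poetry executable can be found on PATH.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 7)):
    """Return a dict describing whether each prerequisite appears to be met.

    Hypothetical helper for illustration; not part of geowrangler.
    """
    return {
        # The interpreter running this script must be at least min_version.
        "python": sys.version_info[:2] >= tuple(min_version),
        # poetry counts as available if an executable is found on PATH.
        "poetry": shutil.which("poetry") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

If poetry is reported as missing at this point, that is expected: the installation steps further down install it into the virtual environment.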
Github Repo Fork
If you plan to make contributions to geowrangler, we encourage you to create your fork of the Geowrangler repo.
This will then allow you to push commits to your forked repo and then create a Pull Request (PR) from your repo to the main geowrangler repo for approval by geowrangler's maintainers.
Development Installation
We recommend creating a virtual python environment via virtualenv or conda for your geowrangler development environment. Please see the relevant documentation for more details.
The example below uses virtualenv to create a separate environment on Linux or WSL using python3.9.
This next command will install libgeos (version >=3.8 required for building pygeos/shapely). See libgeos documentation for installation details on other systems.
sudo apt install libgeos-dev # skip this if you already have GEOS
Replace the github url below with git@github.com:<your-github-id>/geowrangler.git if you created a fork.
git clone https://github.com/thinkingmachines/geowrangler.git
cd geowrangler
virtualenv -p /usr/bin/python3.9 .venv
source .venv/bin/activate
pip install pre-commit poetry==1.2.0b3
pre-commit install
poetry config --local installer.no-binary pygeos,shapely
poetry install
This completes the installation and setup of a local geowrangler environment.
Activating the geowrangler environment
To activate the geowrangler environment, cd <your-local-geowrangler-folder> and run poetry shell.
Jupyter Notebook Development
The code for the geowrangler python package resides in Jupyter notebooks located in the notebooks folder.
Using nbdev, we generate the python modules residing in the geowrangler folder from code cells in jupyter notebooks marked with an #export comment. A #default_exp <module_name> comment at the first code cell of each notebook directs nbdev to put the code in a module named <module_name> in the geowrangler folder.
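As an illustration of the markers described above, the cells of a notebook might look like the sketch below. The module name example and the function bounding_box_area are hypothetical and do not exist in geowrangler; each commented section stands for one notebook code cell.

```python
# Cell 1 (first code cell of the notebook): names the target module.
# With this marker, exported code would go to geowrangler/example.py.
#default_exp example

# Cell 2: the #export marker tells nbdev to copy this cell into the module.
#export
def bounding_box_area(minx, miny, maxx, maxy):
    """Return the area of an axis-aligned bounding box (hypothetical example)."""
    return (maxx - minx) * (maxy - miny)

# Cell 3: no #export marker, so this cell stays in the notebook only.
# Cells like this are handy for demonstrating the exported code.
bounding_box_area(0, 0, 2, 3)
```

Only the second cell would end up in the generated module; the other cells remain part of the notebook.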
See the nbdev cli documentation for more details on the commands to generate the package as well as the documentation.
Running notebooks
Run the following to view the jupyter notebooks in the notebooks folder:
poetry run jupyter lab
Generating and viewing the documentation site
To generate and view the documentation site on your local machine, the quickest way is to use Docker. The following assumes that you have set up Docker on your system.
poetry run nbdev_build_docs --mk_readme False --force_all True
docker-compose up jekyll
As an alternative, if you don't want to use Docker, you can install jekyll to view the documentation site locally.
nbdev converts notebooks within the notebooks/ folder into a jekyll site.
From this jekyll site, you can then create a static site.
To generate the docs, run the following
poetry run nbdev_build_docs --mk_readme False --force_all True
cd docs && bundle i && cd ..
To run the jekyll site, run the following
cd docs
bundle exec jekyll serve
Running tests
We are using pytest as our test framework. To run all tests and generate a coverage report, run the following.
poetry run pytest --cov --cov-config=.coveragerc -n auto
To run a single test or test file:
# for a single test function
poetry run pytest tests/test_grids.py::test_create_grids
# for a single test file
poetry run pytest tests/test_grids.py
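The commands above select tests by file path and function name. As a rough sketch of what such a test looks like, the file below defines one pytest-style test. The file name tests/test_example.py and the helper count_square_cells are hypothetical and are not part of the geowrangler test suite.

```python
# tests/test_example.py (hypothetical file, not part of geowrangler)
import math


def count_square_cells(width, height, cell_size):
    """Number of square cells of side cell_size needed to cover the extent."""
    return math.ceil(width / cell_size) * math.ceil(height / cell_size)


def test_count_square_cells():
    # pytest collects functions whose names start with test_
    assert count_square_cells(10, 10, 5) == 4
    # Partial cells still count: 11 / 5 rounds up to 3 columns, times 2 rows.
    assert count_square_cells(11, 10, 5) == 6
```

Under these assumptions, the single test would be run with poetry run pytest tests/test_example.py::test_count_square_cells.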
Contributing
Please read CONTRIBUTING.md and CODE_OF_CONDUCT.md before making any contributions.
Development Notes
For more details regarding our development standards and processes, please see our wiki.
Hashes for geowrangler-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3dfe700bd85aeb54d31f4cfbbaafdddde7d0a8c64798c8f88c660b947f8af809
MD5 | 70acebd7248930abd3ba38452cf5b282
BLAKE2b-256 | 10ca0751644f5d3f48f3d09b1516fbf5d246c55fc7fb44a31eb2ba938c590df4