Utilities for applying scikit-learn to spatial datasets


## Python module for geospatial prediction using scikit-learn and rasterio

`pyimpute` provides high-level Python functions that bridge the gap between spatial data formats and machine learning software, facilitating supervised classification and regression on geospatial data. This allows you to create landscape-scale predictions based on sparse observations.

The observations, known as the **training data**, consist of:

* response variables: what we are trying to predict
* explanatory variables: variables which explain the spatial patterns of responses

The **target data** consists of explanatory variables represented by raster datasets. There are no response variables available for the target data; the goal is to *predict* a raster surface of responses. The responses can either be discrete (classification) or continuous (regression).
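
To make the distinction between training and target data concrete, here is a minimal pure-Python sketch. It is not how `pyimpute` is implemented: the rasters are tiny made-up nested lists, a 1-nearest-neighbour rule stands in for a scikit-learn model, and the helper names `stack_rasters` and `predict_nearest` are hypothetical.

```python
# Hypothetical 2x3 explanatory rasters (rows x columns); values are made up.
temperature = [[10.0, 12.0, 20.0],
               [11.0, 19.0, 21.0]]
precipitation = [[900.0, 850.0, 300.0],
                 [880.0, 320.0, 280.0]]


def stack_rasters(rasters):
    """Flatten same-shaped rasters into one feature row per cell (row-major)."""
    n_rows, n_cols = len(rasters[0]), len(rasters[0][0])
    rows = [[r[i][j] for r in rasters]
            for i in range(n_rows) for j in range(n_cols)]
    return rows, (n_rows, n_cols)


def predict_nearest(train_xs, train_y, x):
    """Stand-in model: return the response of the closest training sample."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_xs)), key=lambda k: sq_dist(train_xs[k], x))
    return train_y[best]


# Training data: explanatory values at two observation points,
# paired with the observed response (1 = suitable, 0 = unsuitable).
train_xs = [[10.0, 900.0], [21.0, 280.0]]
train_y = [1, 0]

# Target data: every cell of the rasters, with no responses attached.
target_xs, (n_rows, n_cols) = stack_rasters([temperature, precipitation])

# Predict one response per cell, then fold the flat list back into a raster.
flat = [predict_nearest(train_xs, train_y, x) for x in target_xs]
response_raster = [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
```

The training set has one row per observation, while the target set has one row per raster cell; the predictions are reshaped at the end so the result lines up with the input rasters.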


## Pyimpute Functions

* `load_training_vector`: Load training data where responses are vector data (explanatory variables are always raster)
* `load_training_raster`: Load training data where responses are raster data
* `stratified_sample_raster`: Random sampling of raster cells based on discrete classes
* `evaluate_clf`: Performs cross-validation and prints metrics to help tune your scikit-learn classifiers.
* `load_targets`: Loads target raster data into data structures required by scikit-learn
* `impute`: Takes target data and your scikit-learn classifier and makes predictions, outputting GeoTIFFs

These functions don't provide any ground-breaking new functionality; they merely save a lot of tedious data wrangling that would otherwise bog your analysis down in low-level details. In other words, `pyimpute` provides a high-level Python workflow for spatial prediction, making it easier to:

* explore new variables
* frequently update predictions with new information (e.g. new Landsat imagery as it becomes available)
* bring the technique to other disciplines and geographies
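
As an illustration of the kind of data wrangling involved, the idea behind `stratified_sample_raster` (random sampling of raster cells based on discrete classes) can be sketched in plain Python. This is only a sketch of the technique, not the library's implementation or signature; the function `stratified_sample_cells`, its parameters, and the small land-cover raster are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample_cells(class_raster, samples_per_class, seed=0):
    """Pick up to `samples_per_class` random (row, col) cells per class value."""
    # Group cell coordinates by the discrete class found in each cell.
    cells_by_class = defaultdict(list)
    for i, row in enumerate(class_raster):
        for j, value in enumerate(row):
            cells_by_class[value].append((i, j))

    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sample = {}
    for value, cells in cells_by_class.items():
        k = min(samples_per_class, len(cells))  # rare classes may have fewer cells
        sample[value] = rng.sample(cells, k)
    return sample


# Hypothetical land-cover raster with three discrete classes.
land_cover = [[1, 1, 2, 2],
              [1, 3, 2, 2],
              [1, 1, 2, 3]]

sample = stratified_sample_cells(land_cover, samples_per_class=2)
```

Sampling the same number of cells from each class keeps rare classes from being swamped by common ones in the training set.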

### Basic example

Here's what a `pyimpute` workflow might look like. In this example, we have two explanatory variables as rasters (temperature and precipitation) and a geojson with point observations of habitat suitability for a plant species. Our goal is to predict habitat suitability across the entire region based only on the explanatory variables.

    import os

    from pyimpute import load_training_vector, load_targets, impute, evaluate_clf
    from sklearn.ensemble import RandomForestClassifier

    # Load some training data
    explanatory_rasters = ['temperature.tif', 'precipitation.tif']
    response_data = 'point_observations.geojson'

    # 'suitability' is an assumed name for the attribute holding the response
    train_xs, train_y = load_training_vector(response_data,
                                             explanatory_rasters,
                                             response_field='suitability')

    # Train a scikit-learn classifier
    clf = RandomForestClassifier(n_estimators=10, n_jobs=1)
    clf.fit(train_xs, train_y)

    # Evaluate the classifier using several validation metrics, manually inspecting the output
    evaluate_clf(clf, train_xs, train_y)

    # Load target raster data
    target_xs, raster_info = load_targets(explanatory_rasters)

    # Make predictions, outputting GeoTIFFs
    impute(target_xs, clf, raster_info, outdir='/tmp',
           linechunk=400, class_prob=True, certainty=True)

    # Check that the expected output rasters were written
    assert os.path.exists("/tmp/responses.tif")
    assert os.path.exists("/tmp/certainty.tif")
    assert os.path.exists("/tmp/probability_0.tif")
    assert os.path.exists("/tmp/probability_1.tif")
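
The `linechunk` and `certainty` arguments in the example above are not explained here, so the following pure-Python sketch rests on two assumptions inferred from the names: that `linechunk` is the number of raster rows predicted per batch, and that certainty is the probability of the winning class. The functions `predict_proba_stub` and `impute_in_chunks` are hypothetical stand-ins and not part of `pyimpute`.

```python
def predict_proba_stub(x):
    """Hypothetical stand-in for a fitted classifier's per-class probabilities."""
    p1 = min(max((x[0] - 10.0) / 10.0, 0.0), 1.0)  # warmer -> more likely class 1
    return [1.0 - p1, p1]


def impute_in_chunks(rows_of_features, linechunk):
    """Predict a raster `linechunk` rows at a time to bound memory use."""
    responses, certainty = [], []
    for start in range(0, len(rows_of_features), linechunk):
        chunk = rows_of_features[start:start + linechunk]
        for row in chunk:
            probs = [predict_proba_stub(x) for x in row]
            # Response: index of the most probable class for each cell.
            responses.append([p.index(max(p)) for p in probs])
            # Certainty (assumed definition): the winning class's probability.
            certainty.append([max(p) for p in probs])
    return responses, certainty


# Hypothetical 3x2 raster with a single feature (temperature) per cell.
features = [[[10.0], [20.0]],
            [[12.5], [17.5]],
            [[15.0], [11.0]]]

responses, certainty = impute_in_chunks(features, linechunk=2)
```

Under these assumptions the chunk size only affects how much of the raster is held in memory at once; the predicted values are the same for any chunk size.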

### Installation

Assuming you have `libgdal` and the scipy system dependencies installed, you can install with pip:

    pip install pyimpute

Alternatively, install from the source code:

    git clone
    cd pyimpute
    pip install -e .

See the `.travis.yml` file for a working example on Ubuntu systems.

### Other resources

For an overview, watch my presentation at FOSS4G 2014: "Spatial-Temporal Prediction of Climate Change Impacts using pyimpute, scikit-learn and GDAL" (Matthew Perry).

Also, check out the examples and the wiki.
