High performance rasterization tool for Python built in Rust

rusterize

High performance rasterization tool for Python built in Rust. This repository stems from the fasterize package, written in C++ for R, and ports parts of its logic to Python with a Rust backend, along with some useful improvements (see API).

rusterize is designed to work on (multi)polygons and (multi)linestrings, even when they are nested inside complex geometry collections. Functionally, it takes a GeoPandas GeoDataFrame as input and returns an xarray, a NumPy array, or a sparse array in COOrdinate (COO) format.

Installation

rusterize is distributed in two flavors: a core library that performs the rasterization and returns a bare NumPy array, and an xarray-flavored version that returns a georeferenced xarray. The latter is the recommended flavor and requires xarray and rioxarray to be installed.

Install the current version with pip:

# Core library
pip install rusterize

# With xarray capabilities
pip install 'rusterize[xarray]'
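Because the default encoding="xarray" raises an error when xarray or rioxarray is missing, a script meant to run against either flavor can pick the encoding up front. This is a minimal sketch: `pick_encoding` is a hypothetical helper, not part of rusterize, and it only checks that the two optional packages are importable.

```python
import importlib.util


def pick_encoding() -> str:
    """Return "xarray" if the optional xarray flavor looks usable, else "numpy".

    Hypothetical helper: it only checks that xarray and rioxarray can be found,
    which is what the xarray encoding is documented to require.
    """
    has_xarray = importlib.util.find_spec("xarray") is not None
    has_rioxarray = importlib.util.find_spec("rioxarray") is not None
    return "xarray" if (has_xarray and has_rioxarray) else "numpy"


if __name__ == "__main__":
    encoding = pick_encoding()
    print(f"Using encoding={encoding!r}")
    # e.g. rusterize(gdf, res=(30, 30), encoding=encoding)
```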

Contributing

Any contribution is welcome! You can install rusterize directly from this repo as an editable package using maturin. For this to work, you’ll need Rust and cargo installed.

# Clone repo
git clone https://github.com/<username>/rusterize.git
cd rusterize

# Install the Rust nightly toolchain
rustup toolchain install nightly-2025-07-31

# Install maturin
pip install maturin

# Install an editable version with optimized code
maturin develop --profile dist-release

API

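This package has a simple API. The call below lists every parameter with example values (not the defaults, which are given in the list that follows):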

from rusterize import rusterize

# gdf = <geodataframe>

# rusterize
rusterize(
    gdf,
    like=None,
    res=(30, 30),
    out_shape=(10, 10),
    extent=(0, 10, 10, 20),
    field="field",
    by="by",
    burn=None,
    fun="sum",
    background=0,
    encoding="xarray",
    dtype="uint8"
)
  • gdf: GeoPandas GeoDataFrame to rasterize
  • like: xr.DataArray to use as a template for res, out_shape, and extent. Mutually exclusive with these parameters (default: None)
  • res: (xres, yres) for the desired resolution (default: None)
  • out_shape: (nrows, ncols) for the desired output shape (default: None)
  • extent: (xmin, ymin, xmax, ymax) for the desired output extent (default: None)
  • field: column to rasterize. Mutually exclusive with burn (default: None -> a value of 1 is rasterized)
  • by: column for grouping. Each group is written to its own band in the output stack. Values are taken from field if specified, else burn is rasterized (default: None -> single-band raster)
  • burn: a single value to burn. Mutually exclusive with field (default: None). If field is None or not found in gdf, then burn=1
  • fun: pixel function to use when multiple values overlap. Available options are sum, first, last, min, max, count, or any (default: last)
  • background: background value in the final raster (default: np.nan). A value of None corresponds to the default of the specified dtype. An illegal value for a dtype is replaced with the default of that dtype; for example, background=np.nan with dtype="uint8" becomes background=0, since 0 is the default for uint8.
  • encoding: output format of the rasterization: either a dense xarray/NumPy array of the burned geometries, or a sparse array in COOrdinate format, suited to sparse observations and low memory consumption. Available options are xarray, numpy, sparse (default: xarray -> raises an error if xarray and rioxarray are not found).
  • dtype: dtype of the final raster. Available options are uint8, uint16, uint32, uint64, int8, int16, int32, int64, float32, float64 (default: float64)
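The semantics of fun and background can be made concrete with a small pure-Python model of what happens at a single pixel. This is a sketch of the documented behavior, not rusterize's Rust implementation: `reduce_pixel` and `resolve_background` are hypothetical helpers, "first"/"last" are assumed to follow the row order of the GeoDataFrame, "any" is assumed to burn 1, and NaN is assumed to be the default for float dtypes.

```python
import math

# Representable ranges for the integer dtypes listed above.
INT_RANGES = {
    "uint8": (0, 2**8 - 1),
    "uint16": (0, 2**16 - 1),
    "uint32": (0, 2**32 - 1),
    "uint64": (0, 2**64 - 1),
    "int8": (-(2**7), 2**7 - 1),
    "int16": (-(2**15), 2**15 - 1),
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
}


def reduce_pixel(values, fun="last"):
    """Combine all values burned onto one pixel (hypothetical helper).

    Assumes `values` is ordered like the rows of the GeoDataFrame.
    """
    reducers = {
        "sum": sum,
        "first": lambda v: v[0],
        "last": lambda v: v[-1],
        "min": min,
        "max": max,
        "count": len,
        "any": lambda v: 1,  # assumption: any burn marks the pixel with 1
    }
    if fun not in reducers:
        raise ValueError(f"unknown pixel function: {fun!r}")
    return reducers[fun](values)


def resolve_background(background, dtype):
    """Map a missing or illegal background to the dtype default (hypothetical helper).

    Assumes the default is 0 for integer dtypes and NaN for float dtypes.
    """
    if dtype in INT_RANGES:
        lo, hi = INT_RANGES[dtype]
        legal = (
            background is not None
            and not (isinstance(background, float) and math.isnan(background))
            and float(background).is_integer()
            and lo <= background <= hi
        )
        return int(background) if legal else 0
    # float32/float64: NaN by default; float32 overflow is ignored in this sketch
    return math.nan if background is None else float(background)


print(reduce_pixel([3, 1, 2], "sum"))        # 6
print(resolve_background(math.nan, "uint8"))  # 0
```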

Note that the extent is not enforced as strictly as the resolution and shape. When resolution, output shape, and extent are all specified, resolution and shape take priority: they are guaranteed, while the extent may be adjusted. If no extent is given, it is taken from the bounds of the geometries and left unmodified unless a resolution is specified. If only an output shape is specified, the extent is preserved. This mimics the logic of gdal_rasterize.
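The priority rules in the note above can be written as a small function. This is a sketch under stated assumptions, not rusterize's code: `derive_grid` is hypothetical, an adjusted extent is assumed to stay anchored at its upper-left corner (xmin, ymax), and the shape is rounded up when the extent is not a whole number of pixels. rusterize's own rounding and snapping rules may differ. When no extent is given, the starting extent would be the bounds of the geometries, e.g. gdf.total_bounds in GeoPandas.

```python
import math


def derive_grid(extent, res=None, out_shape=None):
    """Return (extent, res, out_shape) following the priority rules above.

    extent is (xmin, ymin, xmax, ymax), res is (xres, yres), out_shape is (nrows, ncols).
    Assumption: when the extent must change, the upper-left corner (xmin, ymax) is kept.
    """
    xmin, ymin, xmax, ymax = extent
    if res is not None and out_shape is not None:
        # Resolution and shape win; the extent is recomputed from them.
        (xres, yres), (nrows, ncols) = res, out_shape
    elif res is not None:
        # Resolution wins; round the shape up so every geometry is covered.
        xres, yres = res
        nrows = math.ceil((ymax - ymin) / yres)
        ncols = math.ceil((xmax - xmin) / xres)
    elif out_shape is not None:
        # Only a shape: keep the extent and derive the resolution from it.
        nrows, ncols = out_shape
        return extent, ((xmax - xmin) / ncols, (ymax - ymin) / nrows), out_shape
    else:
        raise ValueError("need res, out_shape, or both")
    new_extent = (xmin, ymax - nrows * yres, xmin + ncols * xres, ymax)
    return new_extent, (xres, yres), (nrows, ncols)
```

For example, a 95 x 50 extent at a 10-unit resolution needs 10 columns, so its right edge grows from 95 to 100, whereas passing only out_shape leaves the extent untouched and changes the resolution instead.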

Encoding

Version 0.5.0 introduced the encoding parameter to control the output format of the rasterization. You can return an xarray/NumPy array with the burned geometries, or a new SparseArray structure. A SparseArray stores the band/row/column triplets where the geometries should be burned onto the final raster, together with their values before any pixel function is applied. It can be used as an intermediate output that defers allocating the dense raster, or as a final product. SparseArray has three convenience methods: to_xarray(), to_numpy(), and to_frame(). The first two return the final xarray/NumPy array; the last returns a Polars DataFrame with only the coordinates and values of the rasterized geometries. Memory for the dense array is allocated only when it is actually needed (e.g., when calling to_xarray()). See below for an example.
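The idea behind the sparse encoding can be illustrated with a toy COO container in pure Python. `ToySparseArray` is hypothetical and much simpler than rusterize's Rust-backed SparseArray: it stores raw (band, row, col, value) burns, keeps overlapping burns as separate entries, and only allocates a dense grid and applies the pixel function when materialized.

```python
import math
from collections import defaultdict


class ToySparseArray:
    """Hypothetical stand-in for SparseArray: a list of COO entries plus a shape."""

    def __init__(self, shape, background=math.nan):
        self.shape = shape  # (nbands, nrows, ncols)
        self.background = background
        self.entries = []  # raw burns, before any pixel function

    def burn(self, band, row, col, value):
        self.entries.append((band, row, col, value))

    def to_frame(self):
        # Only coordinates and values, analogous in spirit to SparseArray.to_frame()
        return list(self.entries)

    def to_dense(self, fun=sum):
        nbands, nrows, ncols = self.shape
        # Group burned values per pixel, then apply the pixel function once per pixel
        groups = defaultdict(list)
        for band, row, col, value in self.entries:
            groups[(band, row, col)].append(value)
        # The dense grid is allocated only here, at materialization time
        dense = [[[self.background] * ncols for _ in range(nrows)] for _ in range(nbands)]
        for (band, row, col), values in groups.items():
            dense[band][row][col] = fun(values)
        return dense


sa = ToySparseArray(shape=(1, 2, 3), background=0)
sa.burn(0, 0, 1, 1.0)
sa.burn(0, 0, 1, 2.0)  # overlaps the previous burn on the same pixel
sa.burn(0, 1, 2, 5.0)
print(sa.to_dense(fun=sum))  # [[[0, 3.0, 0], [0, 0, 5.0]]]
```

Three COO entries stand in for a six-cell grid here; the saving grows as the raster gets larger and the burned pixels get sparser.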

Usage

rusterize exposes a single function, rusterize().

from rusterize import rusterize
import geopandas as gpd
from shapely import wkt
import matplotlib.pyplot as plt

# Construct geometries
geoms = [
    "POLYGON ((-180 -20, -140 55, 10 0, -140 -60, -180 -20), (-150 -20, -100 -10, -110 20, -150 -20))",
    "POLYGON ((-10 0, 140 60, 160 0, 140 -55, -10 0))",
    "POLYGON ((-125 0, 0 60, 40 5, 15 -45, -125 0))",
    "MULTILINESTRING ((-180 -70, -140 -50), (-140 -50, -100 -70), (-100 -70, -60 -50), (-60 -50, -20 -70), (-20 -70, 20 -50), (20 -50, 60 -70), (60 -70, 100 -50), (100 -50, 140 -70), (140 -70, 180 -50))",
    "GEOMETRYCOLLECTION (POINT (50 -40), POLYGON ((75 -40, 75 -30, 100 -30, 100 -40, 75 -40)), LINESTRING (60 -40, 80 0), GEOMETRYCOLLECTION (POLYGON ((100 20, 100 30, 110 30, 110 20, 100 20))))"
]

# Convert WKT strings to Shapely geometries
geometries = [wkt.loads(geom) for geom in geoms]

# Create a GeoDataFrame
gdf = gpd.GeoDataFrame({'value': range(1, len(geoms) + 1)}, geometry=geometries, crs='EPSG:32619')

# rusterize to "xarray" -> returns an xarray with the burned geometries and spatial reference (default)
# will raise a ModuleNotFoundError if xarray and rioxarray are not found
output = rusterize(
    gdf,
    res=(1, 1),
    field="value",
    fun="sum",
).squeeze()

# plot it
fig, ax = plt.subplots(figsize=(12, 6))
output.plot.imshow(ax=ax)
plt.show()

# rusterize to "sparse" -> custom structure storing the coordinates and values of the rasterized geometries
output = rusterize(
    gdf,
    res=(1, 1),
    field="value",
    fun="sum",
    encoding="sparse"
)
output
# SparseArray:
# - Shape: (131, 361)
# - Extent: (-180.5, -70.5, 180.5, 60.5)
# - Resolution: (1.0, 1.0)
# - EPSG: 32619
# - Estimated size: 369.46 KB

# materialize into xarray or numpy
array = output.to_xarray()
array = output.to_numpy()

# get only coordinates and values
output.to_frame()
# shape: (29_340, 3)
# ┌─────┬─────┬──────┐
# │ row ┆ col ┆ data │
# │ --- ┆ --- ┆ ---  │
# │ u32 ┆ u32 ┆ f64  │
# ╞═════╪═════╪══════╡
# │ 6   ┆ 40  ┆ 1.0  │
# │ 6   ┆ 41  ┆ 1.0  │
# │ 6   ┆ 42  ┆ 1.0  │
# │ 7   ┆ 39  ┆ 1.0  │
# │ 7   ┆ 40  ┆ 1.0  │
# │ …   ┆ …   ┆ …    │
# │ 64  ┆ 258 ┆ 1.0  │
# │ 63  ┆ 259 ┆ 1.0  │
# │ 62  ┆ 259 ┆ 1.0  │
# │ 61  ┆ 260 ┆ 1.0  │
# │ 60  ┆ 260 ┆ 1.0  │
# └─────┴─────┴──────┘

Benchmarks

rusterize is fast! Let’s try it on small and large datasets.

from rusterize import rusterize
import geopandas as gpd
import requests
import zipfile
from io import BytesIO

# large dataset (~380 MB)
url = "https://s3.amazonaws.com/hp3-shapefiles/Mammals_Terrestrial.zip"
response = requests.get(url)

# unzip
with zipfile.ZipFile(BytesIO(response.content), 'r') as zip_ref:
    zip_ref.extractall()

# read
gdf_large = gpd.read_file("Mammals_Terrestrial/Mammals_Terrestrial.shp")

# small dataset (first 1000 rows)
gdf_small = gdf_large.iloc[:1000, :]

# rusterize at 1/6 degree resolution
def test_large(benchmark):
    benchmark(rusterize, gdf_large, res=(1/6, 1/6), fun="sum")

def test_small(benchmark):
    benchmark(rusterize, gdf_small, res=(1/6, 1/6), fun="sum")

Then you can run it with pytest and pytest-benchmark:

pytest <python file> --benchmark-min-rounds=20 --benchmark-time-unit='s'

--------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in s)         Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------
rusterize_small       0.0791    0.0899   0.0812  0.0027   0.0803  0.0020       2;2  12.3214     20          1
rusterize_large     1.379545    1.4474   1.4006  0.0178   1.3966  0.0214       5;1   0.7140     20          1
-------------------------------------------------------------------------------------------------------------

And the same benchmark with fasterize in R:

library(sf)
library(raster)
library(fasterize)
library(microbenchmark)

large <- st_read("Mammals_Terrestrial/Mammals_Terrestrial.shp", quiet = TRUE)
small <- large[1:1000, ]
fn <- function(v) {
  r <- raster(v, res = 1/6)
  return(fasterize(v, r, fun = "sum"))
}
microbenchmark(
  fasterize_large = f <- fn(large),
  fasterize_small = f <- fn(small),
  times=20L,
  unit='s'
)
Unit: seconds
            expr       min         lq       mean     median        uq        max neval
 fasterize_small 0.4741043  0.4926114  0.5191707  0.5193289  0.536741  0.5859029    20
 fasterize_large 9.2199426 10.3595465 10.6653139 10.5369429 11.025771 11.7944567    20
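If pytest-benchmark is not available, a rough equivalent can be assembled from the standard library. In this sketch `summarize_timings` is a hypothetical helper that reports some of the same columns as the tables above; it does none of pytest-benchmark's calibration, warmup, or outlier detection, so its numbers are not directly comparable.

```python
import math
import statistics
import timeit


def summarize_timings(fn, rounds=20):
    """Call fn() once per round and report summary statistics in seconds."""
    times = timeit.repeat(fn, repeat=rounds, number=1)
    mean = statistics.mean(times)
    return {
        "min": min(times),
        "max": max(times),
        "mean": mean,
        "stddev": statistics.stdev(times) if rounds > 1 else 0.0,
        "median": statistics.median(times),
        "ops": 1 / mean if mean > 0 else math.inf,  # operations per second
    }


if __name__ == "__main__":
    # Swap in e.g. lambda: rusterize(gdf_small, res=(1/6, 1/6), fun="sum")
    stats = summarize_timings(lambda: sum(range(100_000)), rounds=5)
    for name, value in stats.items():
        print(f"{name:>6}: {value:.4f}")
```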

What about even larger datasets? Here we use a layer from the province of Quebec, Canada, representing ~2M forest-stand polygons, rasterized at a 30 m resolution (20 rounds) with no field value, the pixel function any, and dense encoding. The comparison with gdal_rasterize was run with hyperfine --runs 20 "gdal_rasterize -tr 30 30 -burn 1 <data_in> <data_out>".

# rusterize
--------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in s)         Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------
rusterize             5.9331   7.2308   6.1302  0.3183  5.9903   0.1736       2;4  0.1631      20           1
-------------------------------------------------------------------------------------------------------------

# fasterize
Unit: seconds
      expr      min       lq     mean   median       uq      max neval
 fasterize 157.4734 177.2055 194.3222 194.6455 213.9195 230.6504    20

# gdal_rasterize (CLI) - read from fast drive, write to fast drive
Time (mean ± σ):      5.495 s ±  0.038 s    [User: 4.268 s, System: 1.225 s]
Range (min … max):    5.452 s …  5.623 s    20 runs

In terms of (multi)line rasterization speed, here's a benchmark against gdal_rasterize using a layer from the province of Quebec, Canada, representing a subset of the road network for a total of ~535K multilinestrings.

# rusterize
--------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in s)         Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------
test                  4.5272   5.9488   4.7171  0.3236   4.6360  0.1680       2;2  0.2120      20           1
-------------------------------------------------------------------------------------------------------------

# gdal_rasterize (CLI) - read from fast drive, write to fast drive
Time (mean ± σ):      8.719 s ±  0.063 s    [User: 3.782 s, System: 4.917 s]
Range (min … max):    8.658 s …  8.874 s    20 runs

Comparison with other tools

While rusterize is fast, there are other fast alternatives, including GDAL, rasterio, and geocube. However, rusterize offers seamless, Rust-native processing with a similar or lower memory footprint, without requiring you to leave Python, and it returns the georeferencing information you need for downstream processing, with ample control over resolution, shape, extent, and data type.

The following compares single-run times on the same forest-stands dataset used earlier.

rusterize:    5.9 sec
rasterio:     68  sec (but no spatial information)
fasterize:    157 sec (including raster creation)
geocube:      260 sec (larger memory footprint)
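To put these single-run times in relative terms, divide each by the rusterize time. This is only arithmetic on the numbers listed above; since each figure comes from one run (and the fasterize time includes raster creation), treat the ratios as rough.

```python
# Single-run wall-clock times (seconds) on the forest-stands dataset, from the list above.
timings = {
    "rusterize": 5.9,
    "rasterio": 68,
    "fasterize": 157,
    "geocube": 260,
}

baseline = timings["rusterize"]
# How many times longer each tool took than rusterize, rounded to one decimal
slowdown = {tool: round(t / baseline, 1) for tool, t in timings.items()}
print(slowdown)  # {'rusterize': 1.0, 'rasterio': 11.5, 'fasterize': 26.6, 'geocube': 44.1}
```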

Download files

Source Distribution

rusterize-0.6.0.tar.gz (76.5 kB)

Uploaded Source

Built Distributions

rusterize-0.6.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl (16.2 MB)

Uploaded PyPy, manylinux: glibc 2.28+ x86-64

rusterize-0.6.0-cp311-abi3-win_amd64.whl (16.2 MB)

Uploaded CPython 3.11+, Windows x86-64

rusterize-0.6.0-cp311-abi3-manylinux_2_28_x86_64.whl (16.2 MB)

Uploaded CPython 3.11+, manylinux: glibc 2.28+ x86-64

rusterize-0.6.0-cp311-abi3-manylinux_2_28_ppc64le.whl (17.3 MB)

Uploaded CPython 3.11+, manylinux: glibc 2.28+ ppc64le

rusterize-0.6.0-cp311-abi3-manylinux_2_28_armv7l.whl (16.1 MB)

Uploaded CPython 3.11+, manylinux: glibc 2.28+ ARMv7l

rusterize-0.6.0-cp311-abi3-manylinux_2_28_aarch64.whl (15.2 MB)

Uploaded CPython 3.11+, manylinux: glibc 2.28+ ARM64

rusterize-0.6.0-cp311-abi3-macosx_11_0_arm64.whl (14.7 MB)

Uploaded CPython 3.11+, macOS 11.0+ ARM64

rusterize-0.6.0-cp311-abi3-macosx_10_12_x86_64.whl (15.8 MB)

Uploaded CPython 3.11+, macOS 10.12+ x86-64

File details

Details for the file rusterize-0.6.0.tar.gz.

File metadata

  • Download URL: rusterize-0.6.0.tar.gz
  • Upload date:
  • Size: 76.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rusterize-0.6.0.tar.gz
Algorithm Hash digest
SHA256 b735231ce9ffed324b25dcfa9a77dbd1b24a6d3098949c26f8ff0b9d1479d22c
MD5 759d8c445a46145e65b13ff4ae1ddf54
BLAKE2b-256 fa0a6a2dfc27a0586ed652b4cd6685c7dceb8727588b36eb62902b84a4e4a9ad

Provenance

The following attestation bundles were made for rusterize-0.6.0.tar.gz:

Publisher: CI.yml on ttrotto/rusterize

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rusterize-0.6.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl.

File hashes

Hashes for rusterize-0.6.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e395f16d782a827afd56f56d39b0274b8b8d3e319987b7df6aafa8bbe41566ac
MD5 03145d7e934624c9ad495a1e787c1987
BLAKE2b-256 3d4f23eb6d8047543c5b7a014b74f853b3cd42e1abb2a82043749a9e18c2571c

Provenance

The following attestation bundles were made for rusterize-0.6.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.6.0-cp311-abi3-win_amd64.whl.

File metadata

  • Download URL: rusterize-0.6.0-cp311-abi3-win_amd64.whl
  • Upload date:
  • Size: 16.2 MB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rusterize-0.6.0-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 4558a8e8bfe7ca2be29354d22cf2e2d3f5f3bb9596d3e2fe62da619dbe07643e
MD5 37517631c518fd2f1054aa6444dfddd4
BLAKE2b-256 a6e13a5131a096bff68181394a5edb6f859294d70147d018e66c2a1e796f39d2

Provenance

The following attestation bundles were made for rusterize-0.6.0-cp311-abi3-win_amd64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.6.0-cp311-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for rusterize-0.6.0-cp311-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 54d1775a839da648381a33e2e74098de92bf29fb9126aa2bfe2a4d57d49be618
MD5 b8b476d653b2753efe81a56b18e4e329
BLAKE2b-256 4efc06128accdbee48c9aeb4d3c10f07128c9f1d2deea4d2f26631c9976a472b

Provenance

The following attestation bundles were made for rusterize-0.6.0-cp311-abi3-manylinux_2_28_x86_64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.6.0-cp311-abi3-manylinux_2_28_ppc64le.whl.

File hashes

Hashes for rusterize-0.6.0-cp311-abi3-manylinux_2_28_ppc64le.whl
Algorithm Hash digest
SHA256 470f5ec66f73cb4c95c663d2114fa1493f02e2e915ba89a3b7bac5157d633c1d
MD5 be61a39fc006b54329fed4389b184692
BLAKE2b-256 1af2e7aa91eff2759c218e147a7b208a19a9a2b43f943fa50ef6ed48e13485e9

Provenance

The following attestation bundles were made for rusterize-0.6.0-cp311-abi3-manylinux_2_28_ppc64le.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.6.0-cp311-abi3-manylinux_2_28_armv7l.whl.

File hashes

Hashes for rusterize-0.6.0-cp311-abi3-manylinux_2_28_armv7l.whl
Algorithm Hash digest
SHA256 c4e79c59ede9d660b110da3857608f0975787d12782d6c258f109c74e4af2604
MD5 0462ceec2d26034d8403a98e836acaab
BLAKE2b-256 b4756236e85d9a7c96a4ef105d534c20006d3d341fa8c65ece5a9cbc9656bc1f

Provenance

The following attestation bundles were made for rusterize-0.6.0-cp311-abi3-manylinux_2_28_armv7l.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.6.0-cp311-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for rusterize-0.6.0-cp311-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 0b2ac6603786ea0cc6b29f968f068c097724eb9ce886a231b7e939b7b44206f0
MD5 7b6c4b2df7a2a64b4193c17fcf527871
BLAKE2b-256 101e7b86a33222acb7423f829b400d908dabd726f2022f08848f8a949d9bf135

Provenance

The following attestation bundles were made for rusterize-0.6.0-cp311-abi3-manylinux_2_28_aarch64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.6.0-cp311-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for rusterize-0.6.0-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c1575ebde9429a4a711b3cb8e9a9480a9b506fbda3ad098032e8afdfd11bb40e
MD5 37ba7781a74b3db99764b08a30d99ed3
BLAKE2b-256 4c7a64009dede6f29e1501465fe4b5ae19f50740c461dc27d753a3f0f03f9cd7

Provenance

The following attestation bundles were made for rusterize-0.6.0-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.6.0-cp311-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for rusterize-0.6.0-cp311-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 d993922c1da5f44f203b1334f49e6ccf9995260a61f38698abf00e28442e7b33
MD5 31c1f8582b9655ba3c29e08173e23ed2
BLAKE2b-256 3f340e5f29ff457b5d8170e861319ae73911088c9b07b2deab8920918d9b4c3b

Provenance

The following attestation bundles were made for rusterize-0.6.0-cp311-abi3-macosx_10_12_x86_64.whl:

Publisher: CI.yml on ttrotto/rusterize
