High performance rasterization tool for Python built in Rust

rusterize

High-performance rasterization tool for Python built in Rust. This repository stems from the fasterize package, written in C++ for R, and ports parts of its logic to Python with a Rust backend, adding some useful improvements (see API).

rusterize is designed to work on (multi)polygons and (multi)linestrings, even when they are nested inside complex geometry collections. Functionally, it takes a geopandas GeoDataFrame as input and returns either an xarray DataArray or a sparse array in COOrdinate (COO) format.

Installation

Install the current version with pip:

pip install rusterize

Contributing

Any contribution is welcome! You can install rusterize directly from this repo using maturin as an editable package. For this to work, you’ll need to have Rust and cargo installed.

# Clone repo
git clone https://github.com/<username>/rusterize.git
cd rusterize

# Install the Rust nightly toolchain
rustup toolchain install nightly-2025-07-31

# Install maturin
pip install maturin

# Install an editable version with optimized code
maturin develop --profile dist-release

API

This package has a simple API:

from rusterize import rusterize

# gdf = <import/modify dataframe as needed>

# rusterize
rusterize(
    gdf,
    like=None,
    res=(30, 30),
    out_shape=(10, 10),
    extent=(0, 10, 10, 20),
    field="field",
    by="by",
    burn=None,
    fun="sum",
    background=0,
    encoding="dense",
    dtype="uint8"
)
  • gdf: geopandas dataframe to rasterize
  • like: xr.DataArray to use as template for res, out_shape, and extent. Mutually exclusive with these parameters (default: None)
  • res: (xres, yres) for desired resolution (default: None)
  • out_shape: (nrows, ncols) for desired output shape (default: None)
  • extent: (xmin, ymin, xmax, ymax) for desired output extent (default: None)
  • field: column to rasterize. Mutually exclusive with burn. (default: None -> a value of 1 is rasterized)
  • by: column used for grouping. Each group is assigned to a band in the stack. Values are taken from field if specified; otherwise burn is rasterized. (default: None -> single-band raster)
  • burn: a single value to burn. Mutually exclusive with field. (default: None). If field is None or is not found in gdf, then burn=1
  • fun: pixel function to use when multiple values overlap. Available options are sum, first, last, min, max, count, or any. (default: last)
  • background: background value in final raster. (default: np.nan). A None value corresponds to the default of the specified dtype. An illegal value for a dtype will be replaced with the default of that dtype. For example, a background=np.nan for dtype="uint8" will become background=0, where 0 is the default for uint8.
  • encoding: output format of the rasterization, either "dense" (an xarray with the burned geometries) or "sparse" (a sparse array in COOrdinate format, suited to sparse observations and low memory consumption). (default: "dense")
  • dtype: dtype of the final raster. Possible values are uint8, uint16, uint32, uint64, int8, int16, int32, int64, float32, float64 (default: float64)
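The parameter rules listed above can be condensed into a small validation helper. This is a hypothetical sketch of the documented contract, not rusterize's internal code: the name `resolve_params` is made up, treating 0 as the default background for every integer dtype generalizes the uint8 example above (an assumption), and float backgrounds are simply passed through.

```python
import math

import numpy as np


def resolve_params(columns, like=None, res=None, out_shape=None, extent=None,
                   field=None, burn=None, background=math.nan, dtype="float64"):
    """Hypothetical helper mirroring the documented rusterize parameter rules."""
    # `like` is a template for res/out_shape/extent, so they cannot be combined
    if like is not None and any(p is not None for p in (res, out_shape, extent)):
        raise ValueError("like is mutually exclusive with res, out_shape and extent")
    # field and burn are mutually exclusive
    if field is not None and burn is not None:
        raise ValueError("field and burn are mutually exclusive")
    # a field that is not a column of the dataframe is dropped
    if field is not None and field not in columns:
        field = None
    # without a usable field, a constant value of 1 is burned
    if field is None and burn is None:
        burn = 1
    # integer dtypes cannot hold None, NaN or out-of-range values:
    # fall back to 0 (assumed default for all integer dtypes)
    dt = np.dtype(dtype)
    if np.issubdtype(dt, np.integer):
        info = np.iinfo(dt)
        legal = (
            background is not None
            and not (isinstance(background, float) and math.isnan(background))
            and info.min <= background <= info.max
        )
        if not legal:
            background = 0
    # float dtypes: background is passed through unchanged in this sketch
    return field, burn, background
```

For example, `resolve_params(["value"], field="missing", dtype="uint8")` returns `(None, 1, 0)`: the unknown field is dropped, a value of 1 is burned, and the NaN background becomes 0.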

Note that control over the extent is not as strict as control over resolution and shape. When resolution, output shape, and extent are all specified, priority is given to resolution and shape: these are guaranteed, while the extent may be adjusted. If the extent is not given, it is taken from the geometries and left unchanged, unless you specify a resolution. If you only specify an output shape, the extent is preserved. This mimics the logic of gdal_rasterize.
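A minimal sketch of these rules is shown here, assuming a grid anchored at its top-left corner and, when only a resolution is given, geometry bounds padded by half a pixel on every side. The padding assumption is inferred from the Usage example below, where geometries spanning (-180, -70, 180, 60) at res=(1, 1) yield an extent of (-180.5, -70.5, 180.5, 60.5) and a shape of (131, 361); whether rusterize does exactly this in every case is not confirmed, and `derive_grid` is a hypothetical name.

```python
import math


def derive_grid(bounds, res=None, shape=None):
    """Hypothetical sketch: return (shape, extent, res) for a raster grid.

    bounds and extent are (xmin, ymin, xmax, ymax), shape is (nrows, ncols),
    res is (xres, yres).
    """
    xmin, ymin, xmax, ymax = bounds
    if res is None and shape is None:
        raise ValueError("specify res, shape, or both")
    if res is None:
        # shape only: keep the extent and derive the resolution from it
        nrows, ncols = shape
        return shape, bounds, ((xmax - xmin) / ncols, (ymax - ymin) / nrows)
    xres, yres = res
    if shape is None:
        # res only: pad the bounds by half a pixel (assumption), then count
        # whole pixels; round() guards against float noise before ceil()
        xmin, ymax = xmin - xres / 2, ymax + yres / 2
        ncols = math.ceil(round((xmax + xres / 2 - xmin) / xres, 9))
        nrows = math.ceil(round((ymax - (ymin - yres / 2)) / yres, 9))
    else:
        # res and shape: both are honoured, so the extent gives way
        nrows, ncols = shape
    # anchor the grid at the top-left corner (assumption) and rebuild the extent
    extent = (xmin, ymax - nrows * yres, xmin + ncols * xres, ymax)
    return (nrows, ncols), extent, (xres, yres)
```

With the values from the API example (extent=(0, 10, 10, 20), res=(30, 30), out_shape=(10, 10)), this sketch keeps the resolution and shape and returns the extent (0, -280, 300, 20), which illustrates why the extent is not guaranteed.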

Encoding

Version 0.5.0 introduces the encoding parameter to control the output format of the rasterization. You can either return an xarray with the burned geometries or a new SparseArray structure. SparseArray stores the band/row/column triplets where the geometries should be burned onto the final raster, together with their corresponding values before any pixel function is applied. It can be used as an intermediate output, deferring memory allocation until the final raster is materialized, or as a final product. SparseArray has two convenience methods: to_xarray() and to_frame(). The first returns the final xarray; the second produces a polars dataframe with only the coordinates and values of the rasterized geometries. Memory for the dense array is only allocated when it is actually needed, i.e. when calling to_xarray(). See below for an example.
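To illustrate the idea of storing coordinates and raw values and applying the pixel function only at materialization time, here is a NumPy sketch. It is not rusterize's implementation: `materialize` and `band_index` are hypothetical helpers, the output is always float64, and only the order-independent pixel functions (sum, count, min, max, any) are modelled. `band_index` shows one way group labels from a `by` column could be mapped to band numbers.

```python
import numpy as np


def band_index(groups):
    """Map `by` group labels to band numbers (sorted label order, an assumption)."""
    labels, bands = np.unique(np.asarray(groups), return_inverse=True)
    return labels, bands.ravel()


def materialize(bands, rows, cols, values, shape, fun="sum", background=np.nan):
    """Burn COO entries (raw values, before any pixel function) into a dense array."""
    out = np.full(shape, background, dtype="float64")
    idx = (np.asarray(bands), np.asarray(rows), np.asarray(cols))
    vals = np.asarray(values, dtype="float64")
    if fun in ("sum", "count"):
        out[idx] = 0.0  # touched pixels start from zero
        # np.add.at accumulates repeated indices instead of overwriting them
        np.add.at(out, idx, vals if fun == "sum" else 1.0)
    elif fun in ("min", "max"):
        out[idx] = vals  # seed touched pixels with one of their candidate values
        (np.minimum if fun == "min" else np.maximum).at(out, idx, vals)
    elif fun == "any":
        out[idx] = 1.0
    else:
        # first/last depend on burn order, which this sketch does not model
        raise ValueError(f"unsupported pixel function: {fun}")
    return out
```

Two overlapping entries on pixel (0, 0, 0) with values 1.0 and 2.0 materialize to 3.0 with fun="sum", 2.0 with fun="count" or fun="max", and 1.0 with fun="min", while untouched pixels keep the background.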

Usage

rusterize consists of a single function rusterize().

from rusterize import rusterize
import geopandas as gpd
from shapely import wkt
import matplotlib.pyplot as plt

# Construct geometries
geoms = [
    "POLYGON ((-180 -20, -140 55, 10 0, -140 -60, -180 -20), (-150 -20, -100 -10, -110 20, -150 -20))",
    "POLYGON ((-10 0, 140 60, 160 0, 140 -55, -10 0))",
    "POLYGON ((-125 0, 0 60, 40 5, 15 -45, -125 0))",
    "MULTILINESTRING ((-180 -70, -140 -50), (-140 -50, -100 -70), (-100 -70, -60 -50), (-60 -50, -20 -70), (-20 -70, 20 -50), (20 -50, 60 -70), (60 -70, 100 -50), (100 -50, 140 -70), (140 -70, 180 -50))",
    "GEOMETRYCOLLECTION (POINT (50 -40), POLYGON ((75 -40, 75 -30, 100 -30, 100 -40, 75 -40)), LINESTRING (60 -40, 80 0), GEOMETRYCOLLECTION (POLYGON ((100 20, 100 30, 110 30, 110 20, 100 20))))"
]

# Convert WKT strings to Shapely geometries
geometries = [wkt.loads(geom) for geom in geoms]

# Create a GeoDataFrame
gdf = gpd.GeoDataFrame({'value': range(1, len(geoms) + 1)}, geometry=geometries, crs='EPSG:32619')

# rusterize to "dense" -> returns an xarray with the burned geometries (default)
output = rusterize(
    gdf,
    res=(1, 1),
    field="value",
    fun="sum",
).squeeze()

# plot it
fig, ax = plt.subplots(figsize=(12, 6))
output.plot.imshow(ax=ax)
plt.show()

# rusterize to "sparse" -> custom structure storing the coordinates and values of the rasterized geometries
output = rusterize(
    gdf,
    res=(1, 1),
    field="value",
    fun="sum",
    encoding="sparse"
)
output
# SparseArray:
# - Shape: (131, 361)
# - Extent: (-180.5, -70.5, 180.5, 60.5)
# - Resolution: (1.0, 1.0)
# - EPSG: 32619
# - Estimated size: 369.46 KB

# materialize into xarray
array = output.to_xarray()

# get only coordinates and values
coo = output.to_frame()
# shape: (29_340, 3)
# ┌─────┬─────┬──────┐
# │ row ┆ col ┆ data │
# │ --- ┆ --- ┆ ---  │
# │ u32 ┆ u32 ┆ f64  │
# ╞═════╪═════╪══════╡
# │ 6   ┆ 40  ┆ 1.0  │
# │ 6   ┆ 41  ┆ 1.0  │
# │ 6   ┆ 42  ┆ 1.0  │
# │ 7   ┆ 39  ┆ 1.0  │
# │ 7   ┆ 40  ┆ 1.0  │
# │ …   ┆ …   ┆ …    │
# │ 64  ┆ 258 ┆ 1.0  │
# │ 63  ┆ 259 ┆ 1.0  │
# │ 62  ┆ 259 ┆ 1.0  │
# │ 61  ┆ 260 ┆ 1.0  │
# │ 60  ┆ 260 ┆ 1.0  │
# └─────┴─────┴──────┘

Benchmarks

rusterize is fast! Let’s try it on small and large datasets.

from rusterize import rusterize
import geopandas as gpd
import requests
import zipfile
from io import BytesIO

# large dataset (~380 MB)
url = "https://s3.amazonaws.com/hp3-shapefiles/Mammals_Terrestrial.zip"
response = requests.get(url)

# unzip
with zipfile.ZipFile(BytesIO(response.content), 'r') as zip_ref:
    zip_ref.extractall()

# read
gdf_large = gpd.read_file("Mammals_Terrestrial/Mammals_Terrestrial.shp")

# small dataset (first 1000 rows)
gdf_small = gdf_large.iloc[:1000, :]

# rusterize at 1/6 degree resolution
def test_large(benchmark):
    benchmark(rusterize, gdf_large, res=(1/6, 1/6), fun="sum")

def test_small(benchmark):
    benchmark(rusterize, gdf_small, res=(1/6, 1/6), fun="sum")

Then you can run it with pytest and pytest-benchmark:

pytest <python file> --benchmark-min-rounds=20 --benchmark-time-unit='s'

--------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in s)         Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------
rusterize_small       0.0791    0.0899   0.0812  0.0027   0.0803  0.0020       2;2  12.3214     20          1
rusterize_large     1.379545    1.4474   1.4006  0.0178   1.3966  0.0214       5;1   0.7140     20          1
-------------------------------------------------------------------------------------------------------------

And with fasterize in R:

library(sf)
library(raster)
library(fasterize)
library(microbenchmark)

large <- st_read("Mammals_Terrestrial/Mammals_Terrestrial.shp", quiet = TRUE)
small <- large[1:1000, ]
fn <- function(v) {
  r <- raster(v, res = 1/6)
  return(fasterize(v, r, fun = "sum"))
}
microbenchmark(
  fasterize_large = f <- fn(large),
  fasterize_small = f <- fn(small),
  times=20L,
  unit='s'
)
Unit: seconds
            expr       min         lq       mean     median        uq        max neval
 fasterize_small 0.4741043  0.4926114  0.5191707  0.5193289  0.536741  0.5859029    20
 fasterize_large 9.2199426 10.3595465 10.6653139 10.5369429 11.025771 11.7944567    20

What about an even larger dataset? Here we use a layer from the province of Quebec, Canada, representing ~2M polygons of forest stands, rasterized at 30 meters (20 rounds) with no field value, pixel function any, and dense encoding. The comparison with gdal_rasterize was run with hyperfine --runs 20 "gdal_rasterize -tr 30 30 -burn 1 <data_in> <data_out>".

# rusterize
--------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in s)         Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------
rusterize             5.9331   7.2308   6.1302  0.3183  5.9903   0.1736       2;4  0.1631      20           1
-------------------------------------------------------------------------------------------------------------

# fasterize
Unit: seconds
      expr      min       lq     mean   median       uq      max neval
 fasterize 157.4734 177.2055 194.3222 194.6455 213.9195 230.6504    20

# gdal_rasterize (CLI) - read from fast drive, write to fast drive
Time (mean ± σ):      5.495 s ±  0.038 s    [User: 4.268 s, System: 1.225 s]
Range (min … max):    5.452 s …  5.623 s    20 runs

For (multi)line rasterization speed, here is a benchmark against gdal_rasterize using another layer from the province of Quebec, Canada, representing a subset of the road network (~535K multilinestrings).

# rusterize
--------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in s)         Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------
test                  4.5272   5.9488   4.7171  0.3236   4.6360  0.1680       2;2  0.2120      20           1
-------------------------------------------------------------------------------------------------------------

# gdal_rasterize (CLI) - read from fast drive, write to fast drive
Time (mean ± σ):      8.719 s ±  0.063 s    [User: 3.782 s, System: 4.917 s]
Range (min … max):    8.658 s …  8.874 s    20 runs
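The mean timings reported above can be turned into speedup ratios with a few lines of Python. The dataset labels are ad hoc, and the means come from different harnesses (pytest-benchmark, microbenchmark, hyperfine), so the ratios are indicative only.

```python
# Mean run times in seconds, copied from the benchmark outputs above
# (pytest-benchmark / microbenchmark means; hyperfine mean for gdal_rasterize).
timings = {
    "mammals_small": {"rusterize": 0.0812, "fasterize": 0.5191707},
    "mammals_large": {"rusterize": 1.4006, "fasterize": 10.6653139},
    "forest_stands": {"rusterize": 6.1302, "fasterize": 194.3222, "gdal_rasterize": 5.495},
    "roads": {"rusterize": 4.7171, "gdal_rasterize": 8.719},
}


def speedups(table):
    """Return {(dataset, tool): tool_time / rusterize_time}; > 1 means rusterize was faster."""
    return {
        (dataset, tool): seconds / times["rusterize"]
        for dataset, times in table.items()
        for tool, seconds in times.items()
        if tool != "rusterize"
    }


for (dataset, tool), ratio in speedups(timings).items():
    print(f"{dataset:14s} vs {tool:15s} {ratio:5.1f}x")
```

By these means, rusterize is roughly 6x to 8x faster than fasterize on the mammals data and about 32x faster on the forest stands, about 1.8x faster than gdal_rasterize on the road network, and about 12% slower than gdal_rasterize on the forest stands.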

Comparison with other tools

While rusterize is fast, there are other fast alternatives, including GDAL, rasterio, and geocube. However, rusterize offers seamless, Rust-native processing with a similar or lower memory footprint, does not require you to leave Python, and returns the geoinformation you need for downstream processing, with ample control over resolution, shape, extent, and data type.

The following is a time comparison from a single run on the same forest stands dataset used earlier.

rusterize:    5.9 sec
rasterio:     68  sec (but no spatial information)
fasterize:    157 sec (including raster creation)
geocube:      260 sec (larger memory footprint)

Download files

Download the file for your platform.

Source Distribution

rusterize-0.5.0.tar.gz (75.4 kB)

Uploaded: Source

Built Distributions

rusterize-0.5.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl (16.2 MB)

Uploaded: PyPy, manylinux: glibc 2.28+ x86-64

rusterize-0.5.0-cp311-abi3-win_amd64.whl (16.2 MB)

Uploaded: CPython 3.11+, Windows x86-64

rusterize-0.5.0-cp311-abi3-manylinux_2_28_x86_64.whl (16.2 MB)

Uploaded: CPython 3.11+, manylinux: glibc 2.28+ x86-64

rusterize-0.5.0-cp311-abi3-manylinux_2_28_ppc64le.whl (17.3 MB)

Uploaded: CPython 3.11+, manylinux: glibc 2.28+ ppc64le

rusterize-0.5.0-cp311-abi3-manylinux_2_28_armv7l.whl (16.1 MB)

Uploaded: CPython 3.11+, manylinux: glibc 2.28+ ARMv7l

rusterize-0.5.0-cp311-abi3-manylinux_2_28_aarch64.whl (15.2 MB)

Uploaded: CPython 3.11+, manylinux: glibc 2.28+ ARM64

rusterize-0.5.0-cp311-abi3-macosx_11_0_arm64.whl (14.7 MB)

Uploaded: CPython 3.11+, macOS 11.0+ ARM64

rusterize-0.5.0-cp311-abi3-macosx_10_12_x86_64.whl (15.8 MB)

Uploaded: CPython 3.11+, macOS 10.12+ x86-64

File details

Details for the file rusterize-0.5.0.tar.gz.

File metadata

  • Download URL: rusterize-0.5.0.tar.gz
  • Size: 75.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rusterize-0.5.0.tar.gz
Algorithm Hash digest
SHA256 c53dbb58465ee0e9aae1e50764cccc672e3e2b2f58cc9a91914aff269b8e9fca
MD5 72b53e9189620e79f1dc4ceae6796028
BLAKE2b-256 1f22c2034751218ae2547ed3b1e686741a397c216183b6a36dfcce1557c54e9b

Provenance

The following attestation bundles were made for rusterize-0.5.0.tar.gz:

Publisher: CI.yml on ttrotto/rusterize

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rusterize-0.5.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl.

File hashes

Hashes for rusterize-0.5.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ccc4622d829e1702301c3800c62dbcded83ebd696c90f071d0cf608cdbc105e4
MD5 7bcf9b122900982a083b24089a619e2d
BLAKE2b-256 23776c78f0756cd133cabc7cf14665cc92db15967dabc526ce8101fe6c5b328f

Provenance

The following attestation bundles were made for rusterize-0.5.0-pp311-pypy311_pp73-manylinux_2_28_x86_64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.5.0-cp311-abi3-win_amd64.whl.

File metadata

  • Download URL: rusterize-0.5.0-cp311-abi3-win_amd64.whl
  • Size: 16.2 MB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rusterize-0.5.0-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 b27d76b5dd680aa88cd3a156268d4269718e03bb6713d2e9fb1aece813eae97e
MD5 9044f9f24cf64ef553d6aca3109676e2
BLAKE2b-256 ed4e5e62a6358b91482914a07b688e463d2daa6e19353bf1059dc9f3a7bddcce

Provenance

The following attestation bundles were made for rusterize-0.5.0-cp311-abi3-win_amd64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.5.0-cp311-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for rusterize-0.5.0-cp311-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 bd869054d2ba0e43c472a78fabdb4e02b8d08032bac10c7e0335cd4dff2d8a31
MD5 478bc8401a59b7cb54f2e1f6fa40a8d6
BLAKE2b-256 b0c5a4deadd0a6151b4086ec0e58a6067a97d3d8220cca1d56a5a4a569ac9455

Provenance

The following attestation bundles were made for rusterize-0.5.0-cp311-abi3-manylinux_2_28_x86_64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.5.0-cp311-abi3-manylinux_2_28_ppc64le.whl.

File hashes

Hashes for rusterize-0.5.0-cp311-abi3-manylinux_2_28_ppc64le.whl
Algorithm Hash digest
SHA256 a33dd97cb594ea3a8396920e71ec54d23572b1f1ee89f13daf5b0f92253c7599
MD5 0951a3b4776b579ff86def407ac4a089
BLAKE2b-256 8740ef26fb024ba5c50500babfef6abe817592fe70c7356dcf20cf149bee993e

Provenance

The following attestation bundles were made for rusterize-0.5.0-cp311-abi3-manylinux_2_28_ppc64le.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.5.0-cp311-abi3-manylinux_2_28_armv7l.whl.

File hashes

Hashes for rusterize-0.5.0-cp311-abi3-manylinux_2_28_armv7l.whl
Algorithm Hash digest
SHA256 6ea3234a5b41f14b25f05adb77b06ff8a673a4acd7bc00a5015abf4ccbfa0688
MD5 f39e204ed2ad80bdb9ec87ba49b8c0d6
BLAKE2b-256 fdb4b97e28fd2efccd7c29c4b3196e044fbc3fd0d06f21c2dddb4a05f180b4e3

Provenance

The following attestation bundles were made for rusterize-0.5.0-cp311-abi3-manylinux_2_28_armv7l.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.5.0-cp311-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for rusterize-0.5.0-cp311-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 1199d5af57ebe86a95b781d49c98eb6e65c0adfa83e81b0375ad4f4c5964247c
MD5 fd387d61c1fe7a82c966115ad7f7bd15
BLAKE2b-256 cdb2a4bdca8fddb4212c5a1c5931ccac13a4c4384a824dc0e40f40866a2c9a74

Provenance

The following attestation bundles were made for rusterize-0.5.0-cp311-abi3-manylinux_2_28_aarch64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.5.0-cp311-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for rusterize-0.5.0-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c66c18fdc8cee4badd4200842623f9a379046fc73378b0dbda37b1d3ccb423a1
MD5 0940ef59198eb997a420e7d5388e38f7
BLAKE2b-256 3c5c613ef0b702925b2459d7a8f3f185a3b5fcb986a355d9d935586e2be0a326

Provenance

The following attestation bundles were made for rusterize-0.5.0-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: CI.yml on ttrotto/rusterize

File details

Details for the file rusterize-0.5.0-cp311-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for rusterize-0.5.0-cp311-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 309a3d21d134c15050f4fd9765c3e1553a9743e00cef95bb10fd134737ba89d4
MD5 644cff59ce3a8c625ad5e478760c9f79
BLAKE2b-256 fccb25d8ef31f0bd0f0d2a5400b78cfd3c8afc2b66ae8559323c22e5d1707212

Provenance

The following attestation bundles were made for rusterize-0.5.0-cp311-abi3-macosx_10_12_x86_64.whl:

Publisher: CI.yml on ttrotto/rusterize
