High-performance Rust UDFs for Earth Observation processing

eo-processor

High-performance Rust UDFs for Earth Observation (EO) processing with Python bindings.

Overview

eo-processor is a framework that provides Rust-based User Defined Functions (UDFs) for common Earth Observation and geospatial computations. The functions are exposed to Python through PyO3 and can be used in both local and distributed (Dask/Kubernetes) workflows.

The Rust code releases Python's Global Interpreter Lock (GIL) while it computes, so several calls can run in parallel threads. This makes it well suited to:

  • Long-running computations on large satellite imagery
  • Parallel processing with Dask
  • XArray apply_ufunc and map_blocks workflows
  • CPU-intensive geospatial operations
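
As a rough illustration of what GIL-free execution allows, the sketch below splits two bands into row blocks and processes them in a plain thread pool. `ndvi_reference` and `ndvi_threaded` are hypothetical helpers written for this example, not part of eo-processor; the NumPy stand-in keeps the sketch runnable without the package installed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ndvi_reference(nir, red):
    """NumPy stand-in for eo_processor.ndvi (hypothetical helper)."""
    return (nir - red) / (nir + red)


def ndvi_threaded(nir, red, func=ndvi_reference, n_blocks=4, max_workers=4):
    """Apply `func` to matching row blocks of both bands in a thread pool."""
    nir_blocks = np.array_split(nir, n_blocks, axis=0)
    red_blocks = np.array_split(red, n_blocks, axis=0)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves block order, so the results can be concatenated
        results = list(pool.map(func, nir_blocks, red_blocks))
    return np.concatenate(results, axis=0)


rng = np.random.default_rng(0)
nir = rng.random((400, 300)) + 0.1  # offset keeps nir + red away from zero
red = rng.random((400, 300)) + 0.1
threaded = ndvi_threaded(nir, red)
print(threaded.shape)  # (400, 300)
```

To try this with the package itself, pass `func=ndvi` after `from eo_processor import ndvi`. Threads only help when the function really does release the GIL for the duration of the computation.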

Features

  • High Performance: Rust-accelerated computations that release Python's GIL while they run
  • Easy Integration: Works seamlessly with NumPy, XArray, and Dask
  • Common EO Indices: Pre-implemented functions for NDVI, NDWI, and generic normalized differences
  • Type Safe: Full type hints for Python IDE support
  • Flexible: Supports both 1D and 2D arrays with automatic dimension detection
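
Since only 1D and 2D inputs are listed, a (time, y, x) stack needs a small wrapper. The sketch below assumes the index is purely element-wise, so the stack can be reshaped to 2D, processed, and reshaped back. `ndvi_2d_reference` and `apply_to_stack` are hypothetical names used only for illustration; the stand-in mimics a 2D-only function so the example runs without the package.

```python
import numpy as np


def ndvi_2d_reference(nir, red):
    """NumPy stand-in for a 2D-only function such as ndvi_2d (hypothetical)."""
    if nir.ndim != 2 or red.ndim != 2:
        raise ValueError("expected 2D arrays")
    return (nir - red) / (nir + red)


def apply_to_stack(func, nir_stack, red_stack):
    """Apply a 2D element-wise function to (time, y, x) stacks."""
    t, y, x = nir_stack.shape
    # Flatten the spatial axes so each time step becomes one row
    flat = func(nir_stack.reshape(t, y * x), red_stack.reshape(t, y * x))
    return np.asarray(flat).reshape(t, y, x)


rng = np.random.default_rng(42)
nir_stack = rng.random((3, 4, 5)) + 0.1  # offset keeps the denominator non-zero
red_stack = rng.random((3, 4, 5)) + 0.1
ndvi_stack = apply_to_stack(ndvi_2d_reference, nir_stack, red_stack)
print(ndvi_stack.shape)  # (3, 4, 5)
```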

Installation

From Source

Requirements:

  • Python 3.8+
  • Rust toolchain (install from rustup.rs)

# Install maturin for building
pip install maturin

# Build and install the package
maturin develop --release

# Or build wheel for distribution
maturin build --release
pip install target/wheels/*.whl

Usage

Basic Usage

import numpy as np
from eo_processor import ndvi, ndwi, normalized_difference

# Compute NDVI from NIR and Red bands
nir = np.array([0.8, 0.7, 0.6])
red = np.array([0.2, 0.1, 0.3])
ndvi_result = ndvi(nir, red)
print(ndvi_result)  # [0.6, 0.75, 0.33333333]

# Note: the functions in `eo_processor` now return NumPy arrays directly.
# They no longer return a (array, dims) tuple or any dims metadata.

# Works with 2D arrays (images)
nir_image = np.random.rand(1000, 1000)
red_image = np.random.rand(1000, 1000)
ndvi_image = ndvi(nir_image, red_image)

# Compute NDWI (water index)
green = np.array([0.3, 0.4, 0.5])
ndwi_result = ndwi(green, nir)

# Generic normalized difference: (a - b) / (a + b)
band_a = np.array([0.5, 0.6, 0.7])
band_b = np.array([0.1, 0.2, 0.3])
custom_index = normalized_difference(band_a, band_b)

XArray Integration

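import numpy as np
import xarray as xr
from eo_processor import ndvi

# Example band data (replace with your own arrays)
nir_data = np.random.rand(100, 100)
red_data = np.random.rand(100, 100)

# Create XArray DataArrays
nir = xr.DataArray(nir_data, dims=["y", "x"])
red = xr.DataArray(red_data, dims=["y", "x"])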

# Apply using xr.apply_ufunc
ndvi_result = xr.apply_ufunc(
    ndvi,
    nir,
    red,
    dask="parallelized",
    output_dtypes=[float],
)

Dask Integration (Parallel Processing)

import dask.array as da
import xarray as xr
from eo_processor import ndvi

# Create large Dask arrays (chunked for parallel processing)
nir_dask = da.random.random((10000, 10000), chunks=(1000, 1000))
red_dask = da.random.random((10000, 10000), chunks=(1000, 1000))

# Wrap in XArray
nir_xr = xr.DataArray(nir_dask, dims=["y", "x"])
red_xr = xr.DataArray(red_dask, dims=["y", "x"])

# Compute NDVI lazily (the Rust code releases the GIL, so chunks can run in parallel threads)
ndvi_result = xr.apply_ufunc(
    ndvi,
    nir_xr,
    red_xr,
    dask="parallelized",
    output_dtypes=[float],
)

# Compute result
ndvi_computed = ndvi_result.compute()

Using map_blocks with Dask

import dask.array as da
import numpy as np
from eo_processor import ndvi

nir_dask = da.random.random((5000, 5000), chunks=(500, 500))
red_dask = da.random.random((5000, 5000), chunks=(500, 500))

# Apply to blocks (each block processed independently)
ndvi_result = da.map_blocks(
    ndvi,
    nir_dask,
    red_dask,
    dtype=np.float64,
)

result = ndvi_result.compute()

Available Functions

Normalized Difference Functions

  • normalized_difference(a, b): Generic normalized difference (a - b) / (a + b)
  • normalized_difference_1d(a, b): 1D version
  • normalized_difference_2d(a, b): 2D version
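
For checking results, a NumPy reference of the same formula can be handy. The sketch below is not the package's implementation; `normalized_difference_reference` is a hypothetical helper, and returning NaN where `a + b` is zero is an assumption of this example, since the package's behaviour for a zero denominator is not described here.

```python
import numpy as np


def normalized_difference_reference(a, b):
    """NumPy reference for (a - b) / (a + b); NaN where a + b == 0 (assumed)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells keep NaN
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


result = normalized_difference_reference([0.8, 0.7, 0.0], [0.2, 0.1, 0.0])
print(result)
```

The first two values are approximately 0.6 and 0.75, and the last is NaN because both inputs are zero. Comparing against the Rust output with `np.allclose(..., equal_nan=True)` is one way to validate a build.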

Vegetation Indices

  • ndvi(nir, red): Normalized Difference Vegetation Index
  • ndvi_1d(nir, red): 1D version
  • ndvi_2d(nir, red): 2D version

Water Indices

  • ndwi(green, nir): Normalized Difference Water Index
  • ndwi_1d(green, nir): 1D version
  • ndwi_2d(green, nir): 2D version
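
Both indices are instances of the generic normalized difference with a fixed band order. The scalar sketch below uses the standard definitions, NDVI = (NIR - Red) / (NIR + Red) and NDWI = (Green - NIR) / (Green + NIR); it assumes the package follows these definitions, which the argument order suggests. The `*_scalar` names are hypothetical and exist only for this hand check.

```python
def normalized_difference_scalar(a, b):
    """Scalar reference: (a - b) / (a + b)."""
    return (a - b) / (a + b)


def ndvi_scalar(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return normalized_difference_scalar(nir, red)


def ndwi_scalar(green, nir):
    """NDWI = (Green - NIR) / (Green + NIR)."""
    return normalized_difference_scalar(green, nir)


nir, red, green = 0.8, 0.2, 0.3
print(round(ndvi_scalar(nir, red), 4))    # 0.6
print(round(ndwi_scalar(green, nir), 4))  # -0.4545
```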

Performance

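The Rust implementation is intended to outperform the equivalent pure Python/NumPy expression, especially on large arrays. Actual speedups depend on array size and hardware; the script below compares the two on your own machine: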

import numpy as np
import time
from eo_processor import ndvi

# Large array
nir = np.random.rand(5000, 5000)
red = np.random.rand(5000, 5000)

# Rust implementation (perf_counter is a high-resolution clock meant for timing)
start = time.perf_counter()
result_rust = ndvi(nir, red)
time_rust = time.perf_counter() - start

# NumPy implementation
start = time.perf_counter()
result_numpy = (nir - red) / (nir + red)
time_numpy = time.perf_counter() - start

print(f"Rust: {time_rust:.4f}s")
print(f"NumPy: {time_numpy:.4f}s")
print(f"Speedup: {time_numpy/time_rust:.2f}x")
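
A single timed call can be skewed by warm-up effects and background load. One common remedy is to repeat each measurement and keep the fastest run. The sketch below uses only the standard library; `best_of` and `sum_of_squares` are hypothetical helpers for this example, with the latter serving as a placeholder workload.

```python
import time


def best_of(func, *args, repeats=5):
    """Return the fastest wall-clock time of `repeats` calls to func(*args)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def sum_of_squares(n):
    """Placeholder workload; swap in the function you want to measure."""
    return sum(i * i for i in range(n))


elapsed = best_of(sum_of_squares, 100_000, repeats=3)
print(f"best of 3: {elapsed:.6f}s")
```

With the package installed, `best_of(ndvi, nir, red)` and `best_of(lambda: (nir - red) / (nir + red))` give a fairer comparison than one call each.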

Development

Building

# Development build
maturin develop

# Release build
maturin develop --release

Testing

# Run Rust tests
cargo test

# Run Python tests (if pytest is installed)
pytest

Running Examples

# Basic usage examples
python examples/basic_usage.py

# XArray/Dask examples (requires: pip install eo-processor[dask])
python examples/xarray_dask_usage.py

Why Rust + PyO3?

  1. Performance: Rust provides C-level performance with memory safety
  2. GIL-Free: Rust code releases the Python GIL, enabling true parallelism
  3. Type Safety: Compile-time guarantees reduce runtime errors
  4. Easy Integration: PyO3 makes it seamless to call Rust from Python
  5. Modern Tooling: Cargo and maturin provide excellent development experience

Use Cases

  • Processing large satellite imagery datasets (Sentinel, Landsat, etc.)
  • Real-time vegetation monitoring using NDVI
  • Water body detection using NDWI
  • Custom spectral indices computation
  • Distributed processing on Dask/Kubernetes clusters
  • Time-series analysis of Earth Observation data

License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Citation

If you use this library in your research, please cite:

@software{eo_processor,
  title = {eo-processor: High-performance Rust UDFs for Earth Observation},
  author = {Ben},
  year = {2025},
  url = {https://github.com/BnJam/eo-processor}
}
