Skip to main content

High-performance Rust UDFs for Earth Observation processing

Project description

eo-processor

Coverage

High-performance Rust UDFs for Earth Observation (EO) processing with Python bindings.

Overview

eo-processor is a framework that provides Rust-based User Defined Functions (UDFs) for common Earth Observation and geospatial computations. These functions can be used within local and remote (Dask/Kubernetes) workflows, leveraging PyO3 to create highly efficient and optimized operations.

The Rust implementation bypasses Python's Global Interpreter Lock (GIL), making it ideal for:

  • Long-running computations on large satellite imagery
  • Parallel processing with Dask
  • XArray apply_ufunc and map_blocks workflows
  • CPU-intensive geospatial operations

Features

  • High Performance: Rust-accelerated computations that bypass Python's GIL
  • Easy Integration: Works seamlessly with NumPy, XArray, and Dask
  • Common EO Indices: Pre-implemented functions for NDVI, NDWI, and generic normalized differences
  • Type Safe: Full type hints for Python IDE support
  • Flexible: Supports both 1D and 2D arrays with automatic dimension detection

Installation

From Source

Requirements:

  • Python 3.8+
  • Rust toolchain (install from rustup.rs)
# Install maturin for building
pip install maturin

# Build and install the package
maturin develop --release

# Or build wheel for distribution
maturin build --release
pip install target/wheels/*.whl

Usage

Basic Usage

import numpy as np
from eo_processor import ndvi, ndwi, normalized_difference

# Compute NDVI from NIR and Red bands
nir = np.array([0.8, 0.7, 0.6])
red = np.array([0.2, 0.1, 0.3])
ndvi_result = ndvi(nir, red)
print(ndvi_result)  # [0.6, 0.75, 0.33333333]

# Note: the functions in `eo_processor` now return NumPy arrays directly.
# They no longer return a (array, dims) tuple or any dims metadata.

# Works with 2D arrays (images)
nir_image = np.random.rand(1000, 1000)
red_image = np.random.rand(1000, 1000)
ndvi_image = ndvi(nir_image, red_image)

# Compute NDWI (water index)
green = np.array([0.3, 0.4, 0.5])
ndwi_result = ndwi(green, nir)

# Generic normalized difference: (a - b) / (a + b)
custom_index = normalized_difference(band_a, band_b)

XArray Integration

import xarray as xr
from eo_processor import ndvi

# Create XArray DataArrays
nir = xr.DataArray(nir_data, dims=["y", "x"])
red = xr.DataArray(red_data, dims=["y", "x"])

# Apply using xr.apply_ufunc
ndvi_result = xr.apply_ufunc(
    ndvi,
    nir,
    red,
    dask="parallelized",
    output_dtypes=[float],
)

Dask Integration (Parallel Processing)

import dask.array as da
import xarray as xr
from eo_processor import ndvi

# Create large Dask arrays (chunked for parallel processing)
nir_dask = da.random.random((10000, 10000), chunks=(1000, 1000))
red_dask = da.random.random((10000, 10000), chunks=(1000, 1000))

# Wrap in XArray
nir_xr = xr.DataArray(nir_dask, dims=["y", "x"])
red_xr = xr.DataArray(red_dask, dims=["y", "x"])

# Compute NDVI (bypasses GIL, enables true parallelism)
ndvi_result = xr.apply_ufunc(
    ndvi,
    nir_xr,
    red_xr,
    dask="parallelized",
    output_dtypes=[float],
)

# Compute result
ndvi_computed = ndvi_result.compute()

Using map_blocks with Dask

import dask.array as da
from eo_processor import ndvi

nir_dask = da.random.random((5000, 5000), chunks=(500, 500))
red_dask = da.random.random((5000, 5000), chunks=(500, 500))

# Apply to blocks (each block processed independently)
ndvi_result = da.map_blocks(
    ndvi,
    nir_dask,
    red_dask,
    dtype=np.float64,
)

result = ndvi_result.compute()

Available Functions

Normalized Difference Functions

  • normalized_difference(a, b): Generic normalized difference (a - b) / (a + b)
  • normalized_difference_1d(a, b): 1D version
  • normalized_difference_2d(a, b): 2D version

Vegetation Indices

  • ndvi(nir, red): Normalized Difference Vegetation Index
  • ndvi_1d(nir, red): 1D version
  • ndvi_2d(nir, red): 2D version

Water Indices

  • ndwi(green, nir): Normalized Difference Water Index
  • ndwi_1d(green, nir): 1D version
  • ndwi_2d(green, nir): 2D version

Performance

The Rust implementation provides significant performance improvements over pure Python/NumPy, especially for large arrays:

import numpy as np
import time
from eo_processor import ndvi

# Large array
nir = np.random.rand(5000, 5000)
red = np.random.rand(5000, 5000)

# Rust implementation
start = time.time()
result_rust = ndvi(nir, red)
time_rust = time.time() - start

# NumPy implementation
start = time.time()
result_numpy = (nir - red) / (nir + red)
time_numpy = time.time() - start

print(f"Rust: {time_rust:.4f}s")
print(f"NumPy: {time_numpy:.4f}s")
print(f"Speedup: {time_numpy/time_rust:.2f}x")

Development

Building

# Development build
maturin develop

# Release build
maturin develop --release

Testing

# Run Rust tests
cargo test

# Run Python tests (if pytest is installed)
pytest

Running Examples

# Basic usage examples
python examples/basic_usage.py

# XArray/Dask examples (requires: pip install eo-processor[dask])
python examples/xarray_dask_usage.py

Why Rust + PyO3?

  1. Performance: Rust provides C-level performance with memory safety
  2. GIL-Free: Rust code releases the Python GIL, enabling true parallelism
  3. Type Safety: Compile-time guarantees reduce runtime errors
  4. Easy Integration: PyO3 makes it seamless to call Rust from Python
  5. Modern Tooling: Cargo and maturin provide excellent development experience

Use Cases

  • Processing large satellite imagery datasets (Sentinel, Landsat, etc.)
  • Real-time vegetation monitoring using NDVI
  • Water body detection using NDWI
  • Custom spectral indices computation
  • Distributed processing on Dask/Kubernetes clusters
  • Time-series analysis of Earth Observation data

License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Citation

If you use this library in your research, please cite:

@software{eo_processor,
  title = {eo-processor: High-performance Rust UDFs for Earth Observation},
  author = {Ben},
  year = {2025},
  url = {https://github.com/BnJam/eo-processor}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

eo_processor-0.1.0.tar.gz (96.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

eo_processor-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl (224.4 kB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ x86-64

File details

Details for the file eo_processor-0.1.0.tar.gz.

File metadata

  • Download URL: eo_processor-0.1.0.tar.gz
  • Upload date:
  • Size: 96.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.6

File hashes

Hashes for eo_processor-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e19702c82399b1ba8d776487b4fe09b9f9b2fb3cefd2e11e7319a06edd418fd7
MD5 124e44f9f3f48b827ecb5bccc4c9c66d
BLAKE2b-256 281f8dd6d9816073d0b895af3eabf2cfb8c14b38691b1bedfb147762aee3cf1a

See more details on using hashes here.

File details

Details for the file eo_processor-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for eo_processor-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 8903bd8df65f6bc283ccadd3cc256a829f4b080c6c5199c46538c1866d32d7f8
MD5 f4164e0056aeb09fb7762c478d3c49a5
BLAKE2b-256 30e454e39b2d5f35e22a888b48452bf7a333d1a82e49238d64f798bb68f09861

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page