High-performance Rust UDFs for Earth Observation processing

eo-processor

High-performance Rust UDFs for Earth Observation (EO) processing with Python bindings.

Overview

eo-processor is a framework that provides Rust-based User Defined Functions (UDFs) for common Earth Observation and geospatial computations. These functions can be used in both local and remote (Dask/Kubernetes) workflows. PyO3 exposes the compiled Rust code to Python as ordinary functions.

Because the Rust code releases Python's Global Interpreter Lock (GIL) while it runs, it is well suited to:

  • Long-running computations on large satellite imagery
  • Parallel processing with Dask
  • XArray apply_ufunc and map_blocks workflows
  • CPU-intensive geospatial operations

Features

  • High Performance: Rust-accelerated computations that release Python's GIL
  • Easy Integration: Works seamlessly with NumPy, XArray, and Dask
  • Common EO Indices: Pre-implemented functions for NDVI, NDWI, and generic normalized differences
  • Type Safe: Full type hints for Python IDE support
  • Flexible: Supports both 1D and 2D arrays with automatic dimension detection
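The automatic dimension detection can be pictured as a small dispatcher that inspects the inputs' dimensionality and forwards them to a 1D or 2D kernel, mirroring the `_1d`/`_2d` variants listed under Available Functions. This is a sketch under that assumption, not the package's internals: `dispatch_nd`, `nd_1d`, `nd_2d` and `ndim` are hypothetical names. The real package presumably inspects a NumPy array's `ndim`; nested lists keep the sketch dependency-free.

```python
from typing import List, Sequence


def _nd(x: float, y: float) -> float:
    """Normalized difference for one pixel: (x - y) / (x + y)."""
    return (x - y) / (x + y)


def ndim(x) -> int:
    """Nesting depth of a list/tuple (0 for a scalar, 1 for a vector, ...)."""
    depth = 0
    while isinstance(x, (list, tuple)):
        depth += 1
        x = x[0] if x else None
    return depth


def nd_1d(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Hypothetical 1D kernel."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    return [_nd(x, y) for x, y in zip(a, b)]


def nd_2d(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical 2D kernel, applied row by row."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of rows")
    return [nd_1d(ra, rb) for ra, rb in zip(a, b)]


def dispatch_nd(a, b):
    """Route to the 1D or 2D kernel based on the inputs' dimensionality."""
    dims = ndim(a)
    if dims != ndim(b):
        raise ValueError("a and b must have the same number of dimensions")
    if dims == 1:
        return nd_1d(a, b)
    if dims == 2:
        return nd_2d(a, b)
    raise ValueError(f"expected 1D or 2D input, got {dims}D")
```

Keeping separate dimension-specific kernels lets each one use a tight loop for its shape, while the dispatcher gives callers a single entry point.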

Installation

From Source

Requirements:

  • Python 3.8+
  • Rust toolchain (install from rustup.rs)

# Install maturin for building
pip install maturin

# Build and install the package
maturin develop --release

# Or build wheel for distribution
maturin build --release
pip install target/wheels/*.whl

Usage

Basic Usage

import numpy as np
from eo_processor import ndvi, ndwi, normalized_difference

# Compute NDVI from NIR and Red bands
nir = np.array([0.8, 0.7, 0.6])
red = np.array([0.2, 0.1, 0.3])
ndvi_result = ndvi(nir, red)
print(ndvi_result)  # [0.6, 0.75, 0.33333333]

# Note: the functions in `eo_processor` return NumPy arrays directly,
# not an (array, dims) tuple or any other dims metadata.

# Works with 2D arrays (images)
nir_image = np.random.rand(1000, 1000)
red_image = np.random.rand(1000, 1000)
ndvi_image = ndvi(nir_image, red_image)

# Compute NDWI (water index)
green = np.array([0.3, 0.4, 0.5])
ndwi_result = ndwi(green, nir)

# Generic normalized difference: (a - b) / (a + b)
band_a = np.array([0.5, 0.6, 0.7])
band_b = np.array([0.1, 0.2, 0.3])
custom_index = normalized_difference(band_a, band_b)
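A quick way to sanity-check results is to compare them against a straightforward reference implementation of (a - b) / (a + b). The pure-Python sketch below reproduces the values printed for the 1D NDVI example. `normalized_difference_ref` is a hypothetical helper, and returning NaN for a zero denominator is an assumption of this sketch, not documented behaviour of `eo_processor`.

```python
import math
from typing import List, Sequence


def normalized_difference_ref(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Pure-Python reference for (a - b) / (a + b) on 1D sequences.

    Assumption for this sketch: a zero denominator yields NaN instead of raising.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    out = []
    for x, y in zip(a, b):
        denom = x + y
        out.append((x - y) / denom if denom != 0 else math.nan)
    return out


nir = [0.8, 0.7, 0.6]
red = [0.2, 0.1, 0.3]
expected = [0.6, 0.75, 0.33333333]  # values printed in the NDVI example above

result = normalized_difference_ref(nir, red)
# Compare with a tolerance; exact float equality is too strict here.
assert all(math.isclose(r, e, rel_tol=1e-7) for r, e in zip(result, expected))
```

With NumPy inputs the same check is usually written as `np.allclose(ndvi(nir, red), expected)`.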

XArray Integration

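import numpy as np
import xarray as xr
from eo_processor import ndvi

# Create XArray DataArrays (random data stands in for real bands)
nir_data = np.random.rand(1000, 1000)
red_data = np.random.rand(1000, 1000)
nir = xr.DataArray(nir_data, dims=["y", "x"])
red = xr.DataArray(red_data, dims=["y", "x"])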

# Apply using xr.apply_ufunc
ndvi_result = xr.apply_ufunc(
    ndvi,
    nir,
    red,
    dask="parallelized",
    output_dtypes=[float],
)

Dask Integration (Parallel Processing)

import dask.array as da
import xarray as xr
from eo_processor import ndvi

# Create large Dask arrays (chunked for parallel processing)
nir_dask = da.random.random((10000, 10000), chunks=(1000, 1000))
red_dask = da.random.random((10000, 10000), chunks=(1000, 1000))

# Wrap in XArray
nir_xr = xr.DataArray(nir_dask, dims=["y", "x"])
red_xr = xr.DataArray(red_dask, dims=["y", "x"])

# Compute NDVI (the Rust code releases the GIL, so Dask's threads can process chunks in parallel)
ndvi_result = xr.apply_ufunc(
    ndvi,
    nir_xr,
    red_xr,
    dask="parallelized",
    output_dtypes=[float],
)

# Compute result
ndvi_computed = ndvi_result.compute()
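It can help to estimate how Dask will split the work before calling `.compute()`. With the shapes above, a 10000 x 10000 array in 1000 x 1000 chunks forms a 10 x 10 grid of 100 blocks per band. Assuming float64 data (8 bytes per element, the default for `da.random.random`), each full block holds 8 MB. The arithmetic is sketched below with a hypothetical `chunk_grid` helper that assumes regular chunks with a possibly smaller trailing block.

```python
from typing import Tuple


def chunk_grid(shape: Tuple[int, int], chunks: Tuple[int, int], itemsize: int = 8):
    """Return (blocks per axis, total blocks, bytes in a full-size block).

    Hypothetical helper: assumes regular chunks; when a dimension is not an
    exact multiple of the chunk size, the last block along it is smaller.
    """
    # Ceiling division: a partial trailing chunk still counts as a block.
    per_axis = tuple((s + c - 1) // c for s, c in zip(shape, chunks))
    total = per_axis[0] * per_axis[1]
    block_bytes = chunks[0] * chunks[1] * itemsize
    return per_axis, total, block_bytes


per_axis, total, block_bytes = chunk_grid((10000, 10000), (1000, 1000))
print(per_axis, total, f"{block_bytes / 1e6:.1f} MB per block")
```

Since NDVI reads two bands, each task holds two input blocks plus one output block in memory, which is worth keeping in mind when choosing chunk sizes for a cluster.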

Using map_blocks with Dask

import dask.array as da
import numpy as np
from eo_processor import ndvi

nir_dask = da.random.random((5000, 5000), chunks=(500, 500))
red_dask = da.random.random((5000, 5000), chunks=(500, 500))

# Apply to blocks (each block processed independently)
ndvi_result = da.map_blocks(
    ndvi,
    nir_dask,
    red_dask,
    dtype=np.float64,
)

result = ndvi_result.compute()
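`map_blocks` works here because NDVI is purely element-wise: each output pixel depends only on the matching input pixels, so blocks need no overlap and can be computed in any order. The stdlib sketch below mimics that by splitting a 2D grid into tiles, processing them on a thread pool, and checking that the stitched result matches a whole-array computation. `split_tiles`, `stitch` and `nd_tile` are hypothetical helpers. Note that the pure-Python `nd_tile` holds the GIL, so unlike a GIL-releasing kernel such as the Rust `ndvi`, its tiles will not actually run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Grid = List[List[float]]


def nd_tile(a: Grid, b: Grid) -> Grid:
    """Element-wise (a - b) / (a + b) on one tile (pure-Python stand-in)."""
    return [[(x - y) / (x + y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_tiles(grid: Grid, size: int) -> Dict[Tuple[int, int], Grid]:
    """Cut a 2D grid into size x size tiles keyed by their top-left corner."""
    tiles = {}
    for r in range(0, len(grid), size):
        for c in range(0, len(grid[0]), size):
            tiles[(r, c)] = [row[c:c + size] for row in grid[r:r + size]]
    return tiles


def stitch(tiles: Dict[Tuple[int, int], Grid], n_rows: int, n_cols: int) -> Grid:
    """Reassemble tiles into a full grid using their corner keys."""
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for (r, c), tile in tiles.items():
        for i, row in enumerate(tile):
            out[r + i][c:c + len(row)] = row
    return out


n = 6  # 6 x 6 grid with 4 x 4 tiles, so edge tiles are smaller
nir = [[0.5 + 0.01 * (i + j) for j in range(n)] for i in range(n)]
red = [[0.1 + 0.005 * (i * j) for j in range(n)] for i in range(n)]

nir_tiles = split_tiles(nir, 4)
red_tiles = split_tiles(red, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {k: pool.submit(nd_tile, nir_tiles[k], red_tiles[k]) for k in nir_tiles}
    result_tiles = {k: f.result() for k, f in futures.items()}

blockwise = stitch(result_tiles, n, n)
assert blockwise == nd_tile(nir, red)  # identical to the whole-array result
```

Indices that need neighbouring pixels (for example, smoothing filters) would instead require overlapping blocks, which Dask provides through `map_overlap`.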

Available Functions

Normalized Difference Functions

  • normalized_difference(a, b): Generic normalized difference (a - b) / (a + b)
  • normalized_difference_1d(a, b): 1D version
  • normalized_difference_2d(a, b): 2D version

Vegetation Indices

  • ndvi(nir, red): Normalized Difference Vegetation Index
  • ndvi_1d(nir, red): 1D version
  • ndvi_2d(nir, red): 2D version

Water Indices

  • ndwi(green, nir): Normalized Difference Water Index
  • ndwi_1d(green, nir): 1D version
  • ndwi_2d(green, nir): 2D version
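Both indices are normalized differences with a fixed band order: NDVI = (NIR - Red) / (NIR + Red), and, assuming `ndwi(green, nir)` follows the argument order its signature suggests, NDWI = (Green - NIR) / (Green + NIR) (the McFeeters form). Swapping the arguments negates the index, so band order matters. The pure-Python sketch below expresses that relationship; the `*_ref` names are hypothetical and not part of the package API.

```python
from typing import List, Sequence


def normalized_difference_ref(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """(a - b) / (a + b), element-wise."""
    return [(x - y) / (x + y) for x, y in zip(a, b)]


def ndvi_ref(nir: Sequence[float], red: Sequence[float]) -> List[float]:
    """NDVI: vegetation index, NIR band first."""
    return normalized_difference_ref(nir, red)


def ndwi_ref(green: Sequence[float], nir: Sequence[float]) -> List[float]:
    """NDWI (McFeeters): water index, Green band first."""
    return normalized_difference_ref(green, nir)


nir = [0.8, 0.7, 0.6]
red = [0.2, 0.1, 0.3]
green = [0.3, 0.4, 0.5]

# Swapping the arguments negates the index.
swapped = normalized_difference_ref(red, nir)
assert all(abs(s + v) < 1e-12 for s, v in zip(swapped, ndvi_ref(nir, red)))
```

With these sample values NIR exceeds Green everywhere, so NDWI comes out negative, as expected for vegetated rather than water pixels.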

Performance

The Rust implementation aims to outperform pure Python/NumPy, especially for large arrays. Speedups vary with hardware, array size, and dtype, so benchmark on your own data:

import numpy as np
import time
from eo_processor import ndvi

# Large array
nir = np.random.rand(5000, 5000)
red = np.random.rand(5000, 5000)

# Rust implementation (perf_counter is a monotonic, high-resolution timer)
start = time.perf_counter()
result_rust = ndvi(nir, red)
time_rust = time.perf_counter() - start

# NumPy implementation
start = time.perf_counter()
result_numpy = (nir - red) / (nir + red)
time_numpy = time.perf_counter() - start

print(f"Rust: {time_rust:.4f}s")
print(f"NumPy: {time_numpy:.4f}s")
print(f"Speedup: {time_numpy/time_rust:.2f}x")
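Single wall-clock measurements like the one above are noisy (caching, CPU frequency scaling, first-call overhead). A common remedy is to repeat each measurement and keep the fastest run, which the standard library's `timeit` module supports directly. The `best_of` helper below is a hypothetical sketch: pass it any zero-argument callable, such as `lambda: ndvi(nir, red)` or `lambda: (nir - red) / (nir + red)`. A pure-Python workload stands in for both here.

```python
import timeit
from typing import Callable


def best_of(fn: Callable[[], object], repeat: int = 5, number: int = 1) -> float:
    """Return the fastest per-call time in seconds over `repeat` trials.

    Each trial calls `fn` `number` times; taking the minimum reduces the
    influence of background noise on the estimate.
    """
    if repeat < 1 or number < 1:
        raise ValueError("repeat and number must be >= 1")
    trials = timeit.repeat(fn, repeat=repeat, number=number)
    return min(trials) / number


# Pure-Python stand-in workload; swap in the Rust and NumPy calls in practice.
data = [float(i) for i in range(1, 10_001)]
t = best_of(lambda: [(x - 1.0) / (x + 1.0) for x in data], repeat=3)
print(f"best: {t:.6f}s")
```

A speedup estimate then becomes `best_of(lambda: (nir - red) / (nir + red)) / best_of(lambda: ndvi(nir, red))`.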

Development

Building

# Development build
maturin develop

# Release build
maturin develop --release

Testing

# Run Rust tests
cargo test

# Run Python tests (if pytest is installed)
pytest

Running Examples

# Basic usage examples
python examples/basic_usage.py

# XArray/Dask examples (requires: pip install 'eo-processor[dask]')
python examples/xarray_dask_usage.py

Why Rust + PyO3?

  1. Performance: Rust provides C-level performance with memory safety
  2. GIL-Free: Rust code releases the Python GIL, enabling true parallelism
  3. Type Safety: Compile-time guarantees reduce runtime errors
  4. Easy Integration: PyO3 makes it seamless to call Rust from Python
  5. Modern Tooling: Cargo and maturin provide excellent development experience

Use Cases

  • Processing large satellite imagery datasets (Sentinel, Landsat, etc.)
  • Real-time vegetation monitoring using NDVI
  • Water body detection using NDWI
  • Custom spectral indices computation
  • Distributed processing on Dask/Kubernetes clusters
  • Time-series analysis of Earth Observation data

License

MIT License - see LICENSE file for details

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

Citation

If you use this library in your research, please cite:

@software{eo_processor,
  title = {eo-processor: High-performance Rust UDFs for Earth Observation},
  author = {Ben},
  year = {2025},
  url = {https://github.com/BnJam/eo-processor}
}

Download files

Source Distribution

eo_processor-0.0.0.tar.gz (96.2 kB)

Uploaded Source

Built Distribution

eo_processor-0.0.0-cp310-cp310-macosx_11_0_arm64.whl (201.7 kB)

Uploaded CPython 3.10, macOS 11.0+ ARM64

File details

Details for the file eo_processor-0.0.0.tar.gz.

File metadata

  • Download URL: eo_processor-0.0.0.tar.gz
  • Upload date:
  • Size: 96.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.6

File hashes

Hashes for eo_processor-0.0.0.tar.gz
Algorithm Hash digest
SHA256 d16c4b8ddb8904904cecae19036f25f0620016c6043d6bbff6a1cd9c128328ea
MD5 7e458121f03e91d26ef3d7d7ea5c2cc2
BLAKE2b-256 7b8266296b16058a920afd1cc285718283edcf78c6bdfbf9d635f543597eaa7a
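To confirm that a downloaded file matches its published digest, hash it locally and compare the result with the SHA256 value listed above. A minimal stdlib sketch: `sha256_of` and `matches` are hypothetical helper names, and the path is wherever you saved the download.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Published SHA256 for eo_processor-0.0.0.tar.gz (from the table above).
EXPECTED = "d16c4b8ddb8904904cecae19036f25f0620016c6043d6bbff6a1cd9c128328ea"
# Usage: matches(Path("eo_processor-0.0.0.tar.gz"), EXPECTED)
```

pip can also enforce this itself: list `eo-processor==0.0.0 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`. In that mode every requirement, including dependencies, must be pinned with a hash.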

File details

Details for the file eo_processor-0.0.0-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for eo_processor-0.0.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b6487a89f00043ccf0f117cad6d74dcf8d485c386fd2f8e4f66ecac03c1489ad
MD5 c0274037740df0e615ddf90724a27526
BLAKE2b-256 29899dfd334a877953ac2bfa19d29775b71f932e683a6045fc4dcee7326bbfb0
