Standalone Python core for defining and executing openEO processes locally with xarray, dask, and ML backends.

openeo-core

A standalone Python library providing a fluent, Pythonic API for working with raster data cubes and vector cubes, implementing selected openEO processes locally using xarray and dask, with STAC MLM-compatible ML model objects.

Features

  • Fluent DataCube API: chain raster and vector operations in a readable pipeline
  • openEO process-aligned: function signatures follow the openEO process specs
  • STAC MLM-compatible models: every model carries full STAC Machine Learning Model (MLM) metadata
  • Multiple ML backends: scikit-learn, XGBoost, and PyTorch (TempCNN, LightTAE)
  • Flexible feature dimensions: choose which cube dimensions become model features via the dimension parameter
  • Spatial indexing: vector operations accelerated by an R-tree spatial index
  • Process Registry: discover and search the bundled openEO process specifications

Installation

Install from GitHub

# With uv
uv pip install git+https://github.com/PondiB/openeo-core.git

# With pip
pip install git+https://github.com/PondiB/openeo-core.git

Optional extras (ML backends, dev):

# ML backends
uv pip install "openeo-core[ml-sklearn] @ git+https://github.com/PondiB/openeo-core.git"
uv pip install "openeo-core[ml-xgboost] @ git+https://github.com/PondiB/openeo-core.git"
uv pip install "openeo-core[ml-torch] @ git+https://github.com/PondiB/openeo-core.git"

# Everything
uv pip install "openeo-core[all] @ git+https://github.com/PondiB/openeo-core.git"

# Dev tools
pip install "openeo-core[dev] @ git+https://github.com/PondiB/openeo-core.git"

Install from source (development)

Clone the repository and sync dependencies:

git clone https://github.com/PondiB/openeo-core.git
cd openeo-core

# Core install (xarray, dask, geopandas, pystac-client, stackstac)
uv sync

# With ML backends
uv sync --extra ml-sklearn
uv sync --extra ml-xgboost
uv sync --extra ml-torch

# Dev tools
uv sync --extra dev

Quick Start

Fluent DataCube API

from openeo_core import DataCube

# Load from Microsoft Planetary Computer (Sentinel-2)
cube = DataCube.load_collection(
    "sentinel-2-l2a",
    spatial_extent={"west": 10.0, "south": 50.0, "east": 11.0, "north": 51.0},
    temporal_extent=("2023-06-01", "2023-06-30"),
    bands=["red", "nir"],
)

# Fluent chaining
result = (
    cube
    .filter_bbox(west=10.2, south=50.2, east=10.8, north=50.8)
    .filter_temporal(extent=("2023-06-10", "2023-06-20"))
    .ndvi(nir="nir", red="red")
    .compute()
)
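
For reference, NDVI is the normalized difference of the near-infrared and red bands, (nir - red) / (nir + red). The sketch below applies that formula to plain Python lists so the arithmetic is visible. The helper name ndvi_values is hypothetical and not part of openeo-core, and returning NaN for a zero denominator is an assumption, since the library's behaviour in that case is not described here.

```python
import math


def ndvi_values(nir, red):
    """Return (nir - red) / (nir + red) for each pair of reflectance values.

    Hypothetical helper for illustration only, not part of openeo-core.
    Assumption: a zero denominator yields NaN.
    """
    if len(nir) != len(red):
        raise ValueError("nir and red must have the same length")
    out = []
    for n, r in zip(nir, red):
        total = n + r
        out.append((n - r) / total if total != 0 else math.nan)
    return out


# Three illustrative pixels: high NIR, equal NIR and red, and all zeros.
values = ndvi_values(nir=[0.5, 0.3, 0.0], red=[0.1, 0.3, 0.0])
```

In the DataCube pipeline the same arithmetic would be applied across the whole cube by xarray and dask rather than in a Python loop.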

ML Models (openEO process-aligned, STAC MLM-compatible)

Model objects are STAC MLM-compatible, and the function names and parameters follow the corresponding openEO process specs:

from openeo_core.model import (
    mlm_class_random_forest,
    mlm_regr_random_forest,
    mlm_class_xgboost,
    mlm_class_tempcnn,
    mlm_class_lighttae,
    ml_fit,
    ml_predict,
    save_ml_model,
    load_stac_ml,
)

# 1. Initialize (openEO: mlm_class_random_forest)
model = mlm_class_random_forest(
    max_variables="sqrt",
    num_trees=200,
    seed=42,
)

# 2. Train (openEO: ml_fit)
trained = ml_fit(model, training_gdf, target="label")

# 3. Predict (openEO: ml_predict)
predictions = ml_predict(raster_cube, trained)

# 4. Save with STAC Item (openEO: save_ml_model)
save_ml_model(trained, name="my_rf_model")

# 5. Load from STAC Item (openEO: load_stac_ml)
restored = load_stac_ml("my_rf_model/my_rf_model.stac.json")
predictions = ml_predict(new_raster, restored)

Feature dimensions

The dimension parameter controls which data cube dimensions are flattened into the feature vector for model training and prediction. It is set once at model initialisation and used automatically by ml_predict:

# Default: only the "bands" dimension becomes features
model = mlm_class_random_forest(dimension=["bands"])

# Use both spectral and temporal dimensions as features
model = mlm_class_random_forest(
    max_variables="sqrt",
    num_trees=200,
    dimension=["bands", "t"],
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)  # dimension handled automatically

Default values per model type:

Model          Default dimension
Random Forest  ["bands"]
XGBoost        ["bands"]
TempCNN        ["bands", "t"]
LightTAE       ["bands", "t"]
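
To make the effect of dimension concrete, the sketch below computes the shape of the feature matrix for a cube of a given size. It is a minimal illustration under one assumption: the dimensions listed in dimension are flattened into feature columns, and all remaining dimensions are flattened into samples. The helper feature_matrix_shape and the cube sizes are hypothetical; openeo-core may arrange the remaining dimensions differently.

```python
from math import prod


def feature_matrix_shape(dim_sizes, feature_dims):
    """Return (n_samples, n_features) after flattening a cube.

    Hypothetical helper, not part of openeo-core. Assumption: dimensions
    listed in feature_dims become columns, and every other dimension is
    flattened into rows (one sample per remaining coordinate).
    """
    missing = [d for d in feature_dims if d not in dim_sizes]
    if missing:
        raise ValueError(f"unknown dimensions: {missing}")
    n_features = prod(dim_sizes[d] for d in feature_dims)
    n_samples = prod(
        size for name, size in dim_sizes.items() if name not in feature_dims
    )
    return n_samples, n_features


# Illustrative cube: 6 time steps, 4 bands, 100 x 100 pixels.
sizes = {"t": 6, "bands": 4, "y": 100, "x": 100}

bands_only = feature_matrix_shape(sizes, ["bands"])           # (60000, 4)
bands_and_time = feature_matrix_shape(sizes, ["bands", "t"])  # (10000, 24)
```

Under this reading, adding "t" to dimension trades samples for features: each pixel contributes one row that holds its full time series instead of one row per time step.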

XGBoost classification

model = mlm_class_xgboost(
    learning_rate=0.15,
    max_depth=5,
    min_child_weight=1,
    subsample=0.8,
    min_split_loss=1,
    seed=42,
)
trained = ml_fit(model, training_gdf, target="label")

TempCNN classification (PyTorch)

model = mlm_class_tempcnn(
    epochs=100,
    batch_size=64,
    learning_rate=0.001,
    seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)

LightTAE classification (PyTorch)

model = mlm_class_lighttae(
    epochs=150,
    batch_size=128,
    learning_rate=0.0005,
    seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)

STAC MLM metadata on model objects

Every model carries full STAC MLM metadata:

model = mlm_class_random_forest(max_variables="sqrt", num_trees=100)
props = model.to_stac_properties()
# {
#   "mlm:name": "Random Forest Classifier",
#   "mlm:architecture": "Random Forest",
#   "mlm:tasks": ["classification"],
#   "mlm:framework": "scikit-learn",
#   "mlm:hyperparameters": {"max_variables": "sqrt", "num_trees": 100, "seed": null},
#   "mlm:input": [...],
#   "mlm:output": [...],
#   ...
# }

stac_item = model.to_stac_item()
# Full STAC Feature with MLM extension
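
A properties dict like the one above can be inspected with plain Python before it is written to disk. The sketch below checks that the keys shown in the sample output are present and serializes the dict to JSON. The dict is a hand-written stand-in for the return value of to_stac_properties(), the empty input and output lists are placeholders, and the key list is taken from the sample output rather than from the MLM specification.

```python
import json

# Keys taken from the sample output above; not an authoritative list of
# the fields required by the STAC MLM extension.
EXPECTED_KEYS = (
    "mlm:name",
    "mlm:architecture",
    "mlm:tasks",
    "mlm:framework",
    "mlm:hyperparameters",
    "mlm:input",
    "mlm:output",
)


def missing_mlm_keys(properties):
    """Return the expected MLM keys that are absent from a properties dict."""
    return [key for key in EXPECTED_KEYS if key not in properties]


# Hand-written stand-in for the dict returned by model.to_stac_properties().
props = {
    "mlm:name": "Random Forest Classifier",
    "mlm:architecture": "Random Forest",
    "mlm:tasks": ["classification"],
    "mlm:framework": "scikit-learn",
    "mlm:hyperparameters": {"max_variables": "sqrt", "num_trees": 100, "seed": None},
    "mlm:input": [],   # placeholder; the real content is elided above
    "mlm:output": [],  # placeholder; the real content is elided above
}

missing = missing_mlm_keys(props)
serialized = json.dumps(props, sort_keys=True)
```

Note that a Python None, such as an unset seed, is written as null once the dict is serialized to JSON.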

Convenience factory (backward-compatible)

from openeo_core.model import Model, ml_fit, ml_predict

model = Model.random_forest(task="classification", max_variables="sqrt", num_trees=200)
trained = ml_fit(model, gdf, target="label")
preds = ml_predict(raster, trained)

# PyTorch models
model = Model.tempcnn(epochs=50, batch_size=32)
model = Model.lighttae(epochs=100, learning_rate=0.001)

Process Registry

from openeo_core.processes import ProcessRegistry

registry = ProcessRegistry()
print(registry.list_processes())
ndvi_spec = registry.get_process("ndvi")
results = registry.search("vegetation")
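
The idea behind a spec registry can be sketched in a few lines: process definitions are JSON objects keyed by their id, and a search matches a keyword against their text fields. TinyRegistry below is a hypothetical stand-in, not the ProcessRegistry implementation; the two specs are abbreviated, illustrative entries, and returning process ids from search is an assumption about the result shape.

```python
class TinyRegistry:
    """Hypothetical in-memory stand-in for a JSON process-spec registry."""

    def __init__(self, specs):
        # Index the specs by their "id" field.
        self._specs = {spec["id"]: spec for spec in specs}

    def list_processes(self):
        """Return all process ids in alphabetical order."""
        return sorted(self._specs)

    def get_process(self, process_id):
        """Return the spec for one process id (KeyError if unknown)."""
        return self._specs[process_id]

    def search(self, keyword):
        """Return ids whose summary or description contains the keyword."""
        needle = keyword.lower()
        return [
            pid
            for pid, spec in sorted(self._specs.items())
            if needle in spec.get("summary", "").lower()
            or needle in spec.get("description", "").lower()
        ]


# Abbreviated, illustrative specs (real openEO specs carry more fields).
registry = TinyRegistry([
    {"id": "ndvi", "summary": "Normalized Difference Vegetation Index"},
    {"id": "filter_bbox", "summary": "Spatial filter using a bounding box"},
])

hits = registry.search("vegetation")
```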

Load from STAC / GeoJSON

cube = DataCube.load_stac(
    "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a",
    assets=["red", "nir"],
)

vector = DataCube.load_geojson({"type": "FeatureCollection", "features": [...]})

Vector cubes (GeoDataFrame and xvec)

Vector cubes can be GeoDataFrames or xarray DataArrays/Datasets with xvec geometry coordinates:

uv pip install "openeo-core[geo]"

import xarray as xr
import xvec  # registers the .xvec accessor on xarray objects
from shapely.geometry import Point

from openeo_core import DataCube

# Create xvec-backed vector cube
da = xr.DataArray(
    [1.0, 2.0, 3.0],
    dims=["geom"],
    coords={"geom": [Point(10, 50), Point(10.5, 50.5), Point(11, 51)]},
).xvec.set_geom_indexes("geom", crs=4326)

cube = DataCube(da)
result = cube.filter_bbox(west=9, south=49, east=11, north=51)
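
Conceptually, a bounding-box filter on point geometries keeps the points whose coordinates fall inside the box. The sketch below shows that test on plain coordinate tuples, using the three points from the example above. The helper in_bbox is hypothetical, and treating points on the box edge as inside is an assumption; the library's boundary handling is not stated here.

```python
def in_bbox(x, y, west, south, east, north):
    """True when (x, y) lies inside the box.

    Hypothetical helper for illustration. Assumption: points exactly on
    the boundary count as inside.
    """
    return west <= x <= east and south <= y <= north


# The three points from the vector cube above, as (x, y) tuples.
points = [(10.0, 50.0), (10.5, 50.5), (11.0, 51.0)]

# The box used in the vector example, and the narrower box used in Quick Start.
wide = [p for p in points if in_bbox(*p, west=9, south=49, east=11, north=51)]
narrow = [p for p in points if in_bbox(*p, west=10.2, south=50.2, east=10.8, north=50.8)]
```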

Documentation

Architecture

openeo_core/
  __init__.py          # DataCube, type aliases
  datacube.py          # Fluent wrapper + dispatch
  types.py             # RasterCube/VectorCube/Cube aliases
  ops/
    raster.py          # xarray/dask raster operations
    vector.py          # geopandas, dask-geopandas, xvec vector operations
  io/
    collection.py      # load_collection (pystac-client + stackstac)
    stac.py            # load_stac (pystac + stackstac)
    geojson.py         # load_geojson (geopandas)
  model/
    __init__.py        # Public API exports
    mlm.py             # MLModel (STAC MLM-compatible object)
    base.py            # openEO process functions + Model factory
    sklearn.py         # scikit-learn estimator builder (internal)
    xgboost_backend.py # XGBoost estimator builder (internal)
    torch.py           # PyTorch wrapper (TempCNN, LightTAE)
    torch_models/      # PyTorch nn.Module implementations
      tempcnn.py       # TempCNN architecture
      lighttae.py      # LightTAE architecture
  processes/
    registry.py        # JSON spec registry
    resources/         # Packaged process JSON specs
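
The datacube.py entry in the tree above is described as a fluent wrapper plus dispatch. One common way to build that pattern is sketched below: a wrapper whose methods return a new wrapper, with the actual work routed by the type of the wrapped data. All names here (ToyRaster, ToyVector, ToyCube) are hypothetical toy types, and the use of functools.singledispatch is an illustrative choice, not a description of how openeo-core is implemented.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class ToyRaster:
    """Stand-in for a raster cube: just its extent (west, south, east, north)."""
    bounds: tuple


@dataclass(frozen=True)
class ToyVector:
    """Stand-in for a vector cube: a tuple of (x, y) points."""
    points: tuple


@singledispatch
def filter_bbox(data, west, south, east, north):
    raise TypeError(f"unsupported cube type: {type(data).__name__}")


@filter_bbox.register
def _(data: ToyRaster, west, south, east, north):
    # Raster case: clip the extent to the requested box.
    w, s, e, n = data.bounds
    return ToyRaster((max(w, west), max(s, south), min(e, east), min(n, north)))


@filter_bbox.register
def _(data: ToyVector, west, south, east, north):
    # Vector case: keep only the points inside the box.
    kept = tuple(
        p for p in data.points if west <= p[0] <= east and south <= p[1] <= north
    )
    return ToyVector(kept)


class ToyCube:
    """Fluent wrapper: each method returns a new ToyCube so calls can be chained."""

    def __init__(self, data):
        self.data = data

    def filter_bbox(self, west, south, east, north):
        # Route to the raster or vector implementation based on the data type.
        return ToyCube(filter_bbox(self.data, west, south, east, north))


raster = ToyCube(ToyRaster((10.0, 50.0, 11.0, 51.0))).filter_bbox(10.2, 50.2, 10.8, 50.8)
vector = ToyCube(ToyVector(((10.0, 50.0), (10.5, 50.5)))).filter_bbox(10.2, 50.2, 10.8, 50.8)
```

Keeping the per-type logic in separate functions means the wrapper class stays small, and a new cube type only needs its own registered implementation.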

openEO ML Process Mapping

openEO Process           Python Function                      Description
mlm_class_random_forest  mlm_class_random_forest()            Init RF classifier
mlm_regr_random_forest   mlm_regr_random_forest()             Init RF regressor
mlm_class_xgboost        mlm_class_xgboost()                  Init XGBoost classifier
mlm_class_tempcnn        mlm_class_tempcnn()                  Init TempCNN classifier
mlm_class_lighttae       mlm_class_lighttae()                 Init LightTAE classifier
ml_fit                   ml_fit(model, training_set, target)  Train a model
ml_predict               ml_predict(data, model)              Predict with trained model
save_ml_model            save_ml_model(data, name, options)   Save model + STAC Item
load_stac_ml             load_stac_ml(uri, ...)               Load model from STAC Item

Examples

Notebook                   Description
01_ndvi.ipynb              NDVI computation with the DataCube API
02_ml_random_forest.ipynb  Random Forest classification pipeline
03_process_registry.ipynb  Exploring the Process Registry
04_ml_tempcnn.ipynb        TempCNN temporal classification with PyTorch

Running Tests

uv run pytest tests/ -v

License

Apache-2.0
