
Standalone Python core for defining and executing openEO processes locally with xarray, dask, and ML backends.


openeo-core


A standalone Python library providing a fluent, Pythonic API for working with raster data cubes and vector cubes, implementing selected openEO processes locally using xarray and dask, with STAC MLM-compatible ML model objects.

Features

  • Fluent DataCube API — chain raster and vector operations in a readable pipeline
  • openEO process-aligned — function signatures match the openEO process specs
  • STAC MLM-compatible models — every model carries full STAC Machine Learning Model metadata
  • Multiple ML backends — scikit-learn, XGBoost, and PyTorch (TempCNN, LightTAE)
  • Flexible feature dimensions — control which cube dimensions become model features via dimension
  • Spatial indexing — accelerated vector operations with R-tree spatial index
  • Process Registry — discover and search bundled openEO process specifications

Installation

Install from GitHub

# With uv
uv pip install git+https://github.com/PondiB/openeo-core.git

# With pip
pip install git+https://github.com/PondiB/openeo-core.git

Optional extras (ML backends, dev):

# ML backends
uv pip install "openeo-core[ml-sklearn] @ git+https://github.com/PondiB/openeo-core.git"
uv pip install "openeo-core[ml-xgboost] @ git+https://github.com/PondiB/openeo-core.git"
uv pip install "openeo-core[ml-torch] @ git+https://github.com/PondiB/openeo-core.git"

# Everything
uv pip install "openeo-core[all] @ git+https://github.com/PondiB/openeo-core.git"

# Dev tools
pip install "openeo-core[dev] @ git+https://github.com/PondiB/openeo-core.git"

Install from source (development)

Clone the repository and sync dependencies:

git clone https://github.com/PondiB/openeo-core.git
cd openeo-core

# Core install (xarray, dask, geopandas, pystac-client, stackstac)
uv sync

# With ML backends
uv sync --extra ml-sklearn
uv sync --extra ml-xgboost
uv sync --extra ml-torch

# Everything including dev tools
uv sync --extra dev

Quick Start

Fluent DataCube API

from openeo_core import DataCube

# Load from Microsoft Planetary Computer (Sentinel-2)
cube = DataCube.load_collection(
    "sentinel-2-l2a",
    spatial_extent={"west": 10.0, "south": 50.0, "east": 11.0, "north": 51.0},
    temporal_extent=("2023-06-01", "2023-06-30"),
    bands=["red", "nir"],
)

# Fluent chaining
result = (
    cube
    .filter_bbox(west=10.2, south=50.2, east=10.8, north=50.8)
    .filter_temporal(extent=("2023-06-10", "2023-06-20"))
    .ndvi(nir="nir", red="red")
    .compute()
)
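The `ndvi` step in the chain above computes the normalized difference vegetation index per pixel. As a reminder of the underlying formula (independent of the library), a minimal pure-Python sketch:

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    denom = nir + red
    if denom == 0:
        return float("nan")  # guard against division by zero on empty pixels
    return (nir - red) / denom

# Dense vegetation reflects strongly in NIR, so NDVI approaches 1.
print(ndvi(0.6, 0.1))  # high value for a vegetated pixel
```

In the library this is applied element-wise across the whole cube via xarray, so it stays lazy under dask until `.compute()` is called.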

ML Models (openEO process-aligned, STAC MLM-compatible)

Model objects are STAC MLM-compatible and the API follows the openEO process specs exactly:

from openeo_core.model import (
    mlm_class_random_forest,
    mlm_regr_random_forest,
    mlm_class_xgboost,
    mlm_class_tempcnn,
    mlm_class_lighttae,
    ml_fit,
    ml_predict,
    save_ml_model,
    load_stac_ml,
)

# 1. Initialize (openEO: mlm_class_random_forest)
model = mlm_class_random_forest(
    max_variables="sqrt",
    num_trees=200,
    seed=42,
)

# 2. Train (openEO: ml_fit)
trained = ml_fit(model, training_gdf, target="label")

# 3. Predict (openEO: ml_predict)
predictions = ml_predict(raster_cube, trained)

# 4. Save with STAC Item (openEO: save_ml_model)
save_ml_model(trained, name="my_rf_model")

# 5. Load from STAC Item (openEO: load_stac_ml)
restored = load_stac_ml("my_rf_model/my_rf_model.stac.json")
predictions = ml_predict(new_raster, restored)

Feature dimensions

The dimension parameter controls which data cube dimensions are flattened into the feature vector for model training and prediction. It is set once at model initialisation and used automatically by ml_predict:

# Default: only the "bands" dimension becomes features
model = mlm_class_random_forest(dimension=["bands"])

# Use both spectral and temporal dimensions as features
model = mlm_class_random_forest(
    max_variables="sqrt",
    num_trees=200,
    dimension=["bands", "t"],
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)  # dimension handled automatically

Default values per model type:

| Model         | Default `dimension`  |
|---------------|----------------------|
| Random Forest | `["bands"]`          |
| XGBoost       | `["bands"]`          |
| TempCNN       | `["bands", "t"]`     |
| LightTAE      | `["bands", "t"]`     |
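To make the flattening concrete, here is a conceptual pure-Python sketch (not library code; values and names are illustrative) of what `dimension=["bands", "t"]` implies: a pixel observed in 2 bands at 3 time steps yields a 6-element feature vector.

```python
# One pixel's values: a time series per band (illustrative numbers).
pixel = {
    "red": [0.10, 0.12, 0.11],  # one value per time step
    "nir": [0.55, 0.60, 0.58],
}

def flatten_features(pixel, bands, n_times):
    # Row-major flattening: band by band, then time within each band.
    return [pixel[b][t] for b in bands for t in range(n_times)]

features = flatten_features(pixel, ["red", "nir"], 3)
print(len(features))  # 2 bands * 3 time steps = 6 features per pixel
```

With the default `dimension=["bands"]`, only the band axis contributes, so each pixel-time sample would instead yield one feature per band.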

XGBoost classification

model = mlm_class_xgboost(
    learning_rate=0.15,
    max_depth=5,
    min_child_weight=1,
    subsample=0.8,
    min_split_loss=1,
    seed=42,
)
trained = ml_fit(model, training_gdf, target="label")

TempCNN classification (PyTorch)

model = mlm_class_tempcnn(
    epochs=100,
    batch_size=64,
    learning_rate=0.001,
    seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)

LightTAE classification (PyTorch)

model = mlm_class_lighttae(
    epochs=150,
    batch_size=128,
    learning_rate=0.0005,
    seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)

STAC MLM metadata on model objects

Every model carries full STAC MLM metadata:

model = mlm_class_random_forest(max_variables="sqrt", num_trees=100)
props = model.to_stac_properties()
# {
#   "mlm:name": "Random Forest Classifier",
#   "mlm:architecture": "Random Forest",
#   "mlm:tasks": ["classification"],
#   "mlm:framework": "scikit-learn",
#   "mlm:hyperparameters": {"max_variables": "sqrt", "num_trees": 100, "seed": null},
#   "mlm:input": [...],
#   "mlm:output": [...],
#   ...
# }

stac_item = model.to_stac_item()
# Full STAC Feature with MLM extension
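For orientation, this is roughly where those `mlm:*` properties live inside a STAC Item. The skeleton below is illustrative only (field values are placeholders, not the library's actual output), built with the standard library:

```python
import json

# Minimal STAC Item skeleton showing where to_stac_properties() output
# would be embedded. Values here are placeholders for illustration.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "my_rf_model",
    "geometry": None,
    "properties": {
        "mlm:name": "Random Forest Classifier",
        "mlm:framework": "scikit-learn",
        "mlm:tasks": ["classification"],
    },
    "links": [],
    "assets": {},
}

print(json.dumps(item, indent=2)[:60])  # serializes like any STAC Item
```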

Convenience factory (backward-compatible)

from openeo_core.model import Model, ml_fit, ml_predict

model = Model.random_forest(task="classification", max_variables="sqrt", num_trees=200)
trained = ml_fit(model, gdf, target="label")
preds = ml_predict(raster, trained)

# PyTorch models
model = Model.tempcnn(epochs=50, batch_size=32)
model = Model.lighttae(epochs=100, learning_rate=0.001)

Process Registry

from openeo_core.processes import ProcessRegistry

registry = ProcessRegistry()
print(registry.list_processes())
ndvi_spec = registry.get_process("ndvi")
results = registry.search("vegetation")
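The real `ProcessRegistry` loads packaged openEO process JSON specs; the in-memory sketch below just illustrates the lookup/search pattern (spec contents are abbreviated placeholders, not the bundled specs):

```python
# Illustrative registry over two abbreviated process "specs".
SPECS = {
    "ndvi": {
        "id": "ndvi",
        "summary": "Normalized Difference Vegetation Index",
        "description": "Computes a vegetation index from red and NIR bands.",
    },
    "filter_bbox": {
        "id": "filter_bbox",
        "summary": "Spatial filter",
        "description": "Limits the data cube to a bounding box.",
    },
}

class TinyRegistry:
    def list_processes(self):
        return sorted(SPECS)

    def get_process(self, pid):
        return SPECS[pid]

    def search(self, term):
        term = term.lower()
        return [pid for pid, spec in SPECS.items()
                if term in spec["summary"].lower()
                or term in spec["description"].lower()]

reg = TinyRegistry()
print(reg.search("vegetation"))  # matches the ndvi spec text
```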

Load from STAC / GeoJSON

cube = DataCube.load_stac(
    "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a",
    assets=["red", "nir"],
)

vector = DataCube.load_geojson({"type": "FeatureCollection", "features": [...]})

Vector cubes (GeoDataFrame and xvec)

Vector cubes can be GeoDataFrames or xarray DataArrays/Datasets with xvec geometry coordinates:

First install the geo extra:

uv pip install "openeo-core[geo]"

import xarray as xr
from shapely.geometry import Point

# Create xvec-backed vector cube
da = xr.DataArray(
    [1.0, 2.0, 3.0],
    dims=["geom"],
    coords={"geom": [Point(10, 50), Point(10.5, 50.5), Point(11, 51)]},
).xvec.set_geom_indexes("geom", crs=4326)

cube = DataCube(da)
result = cube.filter_bbox(west=9, south=49, east=11, north=51)
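Conceptually, `filter_bbox` keeps only geometries that fall inside the bounding box. A minimal pure-Python sketch for point geometries (coordinates are illustrative):

```python
# Points as (lon, lat) tuples; the last one lies outside the box below.
points = [(10.0, 50.0), (10.5, 50.5), (11.5, 51.5)]

def filter_bbox(pts, west, south, east, north):
    # Keep points whose coordinates fall inside the bounding box.
    return [(x, y) for x, y in pts
            if west <= x <= east and south <= y <= north]

kept = filter_bbox(points, west=9, south=49, east=11, north=51)
print(kept)  # the point at (11.5, 51.5) is dropped
```

In the library, the same predicate is evaluated against real geometries, accelerated by the R-tree spatial index mentioned under Features.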

Documentation

Architecture

openeo_core/
  __init__.py          # DataCube, type aliases
  datacube.py          # Fluent wrapper + dispatch
  types.py             # RasterCube/VectorCube/Cube aliases
  ops/
    raster.py          # xarray/dask raster operations
    vector.py          # geopandas, dask-geopandas, xvec vector operations
  io/
    collection.py      # load_collection (pystac-client + stackstac)
    stac.py            # load_stac (pystac + stackstac)
    geojson.py         # load_geojson (geopandas)
  model/
    __init__.py        # Public API exports
    mlm.py             # MLModel (STAC MLM-compatible object)
    base.py            # openEO process functions + Model factory
    sklearn.py         # scikit-learn estimator builder (internal)
    xgboost_backend.py # XGBoost estimator builder (internal)
    torch.py           # PyTorch wrapper (TempCNN, LightTAE)
    torch_models/      # PyTorch nn.Module implementations
      tempcnn.py       # TempCNN architecture
      lighttae.py      # LightTAE architecture
  processes/
    registry.py        # JSON spec registry
    resources/         # Packaged process JSON specs

openEO ML Process Mapping

| openEO Process            | Python Function                        | Description               |
|---------------------------|----------------------------------------|---------------------------|
| `mlm_class_random_forest` | `mlm_class_random_forest()`            | Init RF classifier        |
| `mlm_regr_random_forest`  | `mlm_regr_random_forest()`             | Init RF regressor         |
| `mlm_class_xgboost`       | `mlm_class_xgboost()`                  | Init XGBoost classifier   |
| `mlm_class_tempcnn`       | `mlm_class_tempcnn()`                  | Init TempCNN classifier   |
| `mlm_class_lighttae`      | `mlm_class_lighttae()`                 | Init LightTAE classifier  |
| `ml_fit`                  | `ml_fit(model, training_set, target)`  | Train a model             |
| `ml_predict`              | `ml_predict(data, model)`              | Predict with trained model |
| `save_ml_model`           | `save_ml_model(data, name, options)`   | Save model + STAC Item    |
| `load_stac_ml`            | `load_stac_ml(uri, ...)`               | Load model from STAC Item |

Examples

| Notebook                    | Description                                  |
|-----------------------------|----------------------------------------------|
| `01_ndvi.ipynb`             | NDVI computation with the DataCube API       |
| `02_ml_random_forest.ipynb` | Random Forest classification pipeline        |
| `03_process_registry.ipynb` | Exploring the Process Registry               |
| `04_ml_tempcnn.ipynb`       | TempCNN temporal classification with PyTorch |

Running Tests

uv run pytest tests/ -v

License

Apache-2.0
