Standalone Python core for defining and executing openEO processes locally with xarray, dask, and ML backends.
openeo-core
A standalone Python library providing a fluent, Pythonic API for working with raster data cubes and vector cubes, implementing selected openEO processes locally using xarray and dask, with STAC MLM-compatible ML model objects.
Features
- Fluent DataCube API — chain raster and vector operations in a readable pipeline
- openEO process-aligned — function signatures match the openEO process specs
- STAC MLM-compatible models — every model carries full STAC Machine Learning Model metadata
- Multiple ML backends — scikit-learn, XGBoost, and PyTorch (TempCNN, LightTAE)
- Flexible feature dimensions — control which cube dimensions become model features via dimension
- Spatial indexing — accelerated vector operations with R-tree spatial index
- Process Registry — discover and search bundled openEO process specifications
Installation
Install from GitHub
# With uv
uv pip install git+https://github.com/PondiB/openeo-core.git
# With pip
pip install git+https://github.com/PondiB/openeo-core.git
Optional extras (ML backends, dev):
# ML backends
uv pip install "openeo-core[ml-sklearn] @ git+https://github.com/PondiB/openeo-core.git"
uv pip install "openeo-core[ml-xgboost] @ git+https://github.com/PondiB/openeo-core.git"
uv pip install "openeo-core[ml-torch] @ git+https://github.com/PondiB/openeo-core.git"
# Everything
uv pip install "openeo-core[all] @ git+https://github.com/PondiB/openeo-core.git"
# Dev tools
pip install "openeo-core[dev] @ git+https://github.com/PondiB/openeo-core.git"
Install from source (development)
Clone the repository and sync dependencies:
git clone https://github.com/PondiB/openeo-core.git
cd openeo-core
# Core install (xarray, dask, geopandas, pystac-client, stackstac)
uv sync
# With ML backends
uv sync --extra ml-sklearn
uv sync --extra ml-xgboost
uv sync --extra ml-torch
# Everything including dev tools
uv sync --extra dev
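After installing, it can be useful to confirm which optional ML backends are actually importable. The snippet below is a small sketch using only the standard library; the mapping from extra names to top-level module names (sklearn, xgboost, torch) is an assumption based on the extras listed above, not something openeo-core exposes.

```python
from importlib.util import find_spec

# Assumed mapping: extra name -> top-level module that extra is expected to provide.
EXTRA_MODULES = {
    "ml-sklearn": "sklearn",
    "ml-xgboost": "xgboost",
    "ml-torch": "torch",
}


def available_backends(extra_modules=EXTRA_MODULES):
    """Return {extra_name: bool}, True when the module can be located."""
    return {
        extra: find_spec(module) is not None
        for extra, module in extra_modules.items()
    }


# Report which backends the current environment can import.
for extra, installed in available_backends().items():
    print(f"{extra}: {'installed' if installed else 'missing'}")
```

find_spec only locates the module without importing it, so the check stays fast even when a heavy backend such as PyTorch is present.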
Quick Start
Fluent DataCube API
from openeo_core import DataCube
# Load from Microsoft Planetary Computer (Sentinel-2)
cube = DataCube.load_collection(
"sentinel-2-l2a",
spatial_extent={"west": 10.0, "south": 50.0, "east": 11.0, "north": 51.0},
temporal_extent=("2023-06-01", "2023-06-30"),
bands=["red", "nir"],
)
# Fluent chaining
result = (
cube
.filter_bbox(west=10.2, south=50.2, east=10.8, north=50.8)
.filter_temporal(extent=("2023-06-10", "2023-06-20"))
.ndvi(nir="nir", red="red")
.compute()
)
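The .ndvi(nir=..., red=...) step corresponds to the standard normalized difference index, (nir - red) / (nir + red). The sketch below shows that per-pixel arithmetic in plain Python, independent of openeo-core. The reflectance values are made up, and returning NaN for a zero denominator is an assumption; the library may handle that case differently.

```python
def ndvi(nir, red):
    """Normalized difference of two reflectance values: (nir - red) / (nir + red)."""
    total = nir + red
    if total == 0:
        # Assumption: treat an undefined ratio as NaN.
        return float("nan")
    return (nir - red) / total


# Hypothetical (nir, red) reflectance pairs for three pixels.
pixels = [(0.5, 0.1), (0.3, 0.3), (0.0, 0.0)]
values = [ndvi(nir, red) for nir, red in pixels]
print(values)
```

Dense vegetation reflects strongly in the near infrared and absorbs red light, so it produces values towards 1, while equal reflectance in both bands gives 0.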
ML Models (openEO process-aligned, STAC MLM-compatible)
Model objects carry STAC MLM metadata, and the function names and parameters mirror the corresponding openEO process specs:
from openeo_core.model import (
mlm_class_random_forest,
mlm_regr_random_forest,
mlm_class_xgboost,
mlm_class_tempcnn,
mlm_class_lighttae,
ml_fit,
ml_predict,
save_ml_model,
load_stac_ml,
)
# 1. Initialize (openEO: mlm_class_random_forest)
model = mlm_class_random_forest(
max_variables="sqrt",
num_trees=200,
seed=42,
)
# 2. Train (openEO: ml_fit)
trained = ml_fit(model, training_gdf, target="label")
# 3. Predict (openEO: ml_predict)
predictions = ml_predict(raster_cube, trained)
# 4. Save with STAC Item (openEO: save_ml_model)
save_ml_model(trained, name="my_rf_model")
# 5. Load from STAC Item (openEO: load_stac_ml)
restored = load_stac_ml("my_rf_model/my_rf_model.stac.json")
predictions = ml_predict(new_raster, restored)
Feature dimensions
The dimension parameter controls which data cube dimensions are flattened
into the feature vector for model training and prediction. It is set once at
model initialisation and used automatically by ml_predict:
# Default: only the "bands" dimension becomes features
model = mlm_class_random_forest(dimension=["bands"])
# Use both spectral and temporal dimensions as features
model = mlm_class_random_forest(
max_variables="sqrt",
num_trees=200,
dimension=["bands", "t"],
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained) # dimension handled automatically
Default values per model type:
| Model | Default dimension |
|---|---|
| Random Forest | ["bands"] |
| XGBoost | ["bands"] |
| TempCNN | ["bands", "t"] |
| LightTAE | ["bands", "t"] |
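To make the effect of dimension concrete, the following sketch flattens a toy cube stored as nested lists, cube[t][band][y][x], into per-sample feature vectors. It illustrates the general idea only: the function name, the feature ordering, and the choice to treat every remaining dimension as a sample axis are assumptions, not the library's implementation.

```python
def flatten_features(cube, dimension):
    """Turn a nested-list cube[t][band][y][x] into per-sample feature vectors.

    dimension=["bands"]      -> one sample per (t, y, x), one feature per band
    dimension=["bands", "t"] -> one sample per (y, x), one feature per (t, band)
    """
    n_t, n_b = len(cube), len(cube[0])
    n_y, n_x = len(cube[0][0]), len(cube[0][0][0])
    samples = []
    if set(dimension) == {"bands"}:
        for t in range(n_t):
            for y in range(n_y):
                for x in range(n_x):
                    samples.append([cube[t][b][y][x] for b in range(n_b)])
    elif set(dimension) == {"bands", "t"}:
        for y in range(n_y):
            for x in range(n_x):
                samples.append(
                    [cube[t][b][y][x] for t in range(n_t) for b in range(n_b)]
                )
    else:
        raise ValueError(f"unsupported dimension list: {dimension}")
    return samples


# Toy cube: 2 time steps, 2 bands, 1 row, 2 columns.
# Each value encodes its position as t*100 + band*10 + x.
cube = [
    [[[t * 100 + b * 10 + x for x in range(2)]] for b in range(2)]
    for t in range(2)
]
per_band = flatten_features(cube, ["bands"])            # 4 samples, 2 features
per_band_time = flatten_features(cube, ["bands", "t"])  # 2 samples, 4 features
print(per_band)
print(per_band_time)
```

With only "bands" as a feature dimension, each time step of a pixel is a separate sample. Adding "t" gives each pixel a single, longer feature vector, which is the shape a temporal model such as TempCNN or LightTAE needs.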
XGBoost classification
model = mlm_class_xgboost(
learning_rate=0.15,
max_depth=5,
min_child_weight=1,
subsample=0.8,
min_split_loss=1,
seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
TempCNN classification (PyTorch)
model = mlm_class_tempcnn(
epochs=100,
batch_size=64,
learning_rate=0.001,
seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)
LightTAE classification (PyTorch)
model = mlm_class_lighttae(
epochs=150,
batch_size=128,
learning_rate=0.0005,
seed=42,
)
trained = ml_fit(model, training_gdf, target="label")
predictions = ml_predict(raster_cube, trained)
STAC MLM metadata on model objects
Every model carries full STAC MLM metadata:
model = mlm_class_random_forest(max_variables="sqrt", num_trees=100)
props = model.to_stac_properties()
# {
# "mlm:name": "Random Forest Classifier",
# "mlm:architecture": "Random Forest",
# "mlm:tasks": ["classification"],
# "mlm:framework": "scikit-learn",
# "mlm:hyperparameters": {"max_variables": "sqrt", "num_trees": 100, "seed": null},
# "mlm:input": [...],
# "mlm:output": [...],
# ...
# }
stac_item = model.to_stac_item()
# Full STAC Feature with MLM extension
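For readers unfamiliar with STAC, the sketch below shows roughly how a dictionary of mlm: properties sits inside a STAC Item, which is a GeoJSON Feature. It is a bare skeleton built with the standard library, not the output of to_stac_item(). The helper name, the schema version in the URL, and the empty links and assets are assumptions, and the result is not validated against the MLM schema.

```python
import json
from datetime import datetime, timezone

# Assumed extension version; check the MLM extension releases for the one you target.
MLM_SCHEMA = "https://stac-extensions.github.io/mlm/v1.4.0/schema.json"


def wrap_as_stac_item(item_id, mlm_properties):
    """Place MLM properties inside a bare STAC Item (GeoJSON Feature) skeleton."""
    properties = {"datetime": datetime.now(timezone.utc).isoformat()}
    properties.update(mlm_properties)
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "stac_extensions": [MLM_SCHEMA],
        "id": item_id,
        "geometry": None,  # a model has no footprint in this sketch
        "properties": properties,
        "links": [],
        "assets": {},  # a real item would reference the model file here
    }


# Property values copied from the example output above.
props = {
    "mlm:name": "Random Forest Classifier",
    "mlm:architecture": "Random Forest",
    "mlm:tasks": ["classification"],
    "mlm:framework": "scikit-learn",
    "mlm:hyperparameters": {"max_variables": "sqrt", "num_trees": 100, "seed": None},
}
item = wrap_as_stac_item("my_rf_model", props)
print(json.dumps(item, indent=2))
```

Serialising with json.dumps turns Python's None into JSON null, which is why an unset seed appears as null in the metadata shown above.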
Convenience factory (backward-compatible)
from openeo_core.model import Model, ml_fit, ml_predict
model = Model.random_forest(task="classification", max_variables="sqrt", num_trees=200)
trained = ml_fit(model, gdf, target="label")
preds = ml_predict(raster, trained)
# PyTorch models
model = Model.tempcnn(epochs=50, batch_size=32)
model = Model.lighttae(epochs=100, learning_rate=0.001)
Process Registry
from openeo_core.processes import ProcessRegistry
registry = ProcessRegistry()
print(registry.list_processes())
ndvi_spec = registry.get_process("ndvi")
results = registry.search("vegetation")
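The list, get, and search operations can be pictured with a tiny in-memory registry over openEO-style process specs, which are JSON objects with an id and a summary among other fields. TinyRegistry below is a hypothetical stand-in written for illustration. The real ProcessRegistry may return different types and search different fields, and the two specs are heavily abbreviated.

```python
class TinyRegistry:
    """Hypothetical in-memory stand-in for a process-spec registry."""

    def __init__(self, specs):
        # Index the specs by their process id for direct lookup.
        self._specs = {spec["id"]: spec for spec in specs}

    def list_processes(self):
        """Return all known process ids in alphabetical order."""
        return sorted(self._specs)

    def get_process(self, process_id):
        """Return the full spec for one process id."""
        return self._specs[process_id]

    def search(self, text):
        """Return ids whose id, summary, or description contains the text."""
        needle = text.lower()
        matches = []
        for process_id, spec in sorted(self._specs.items()):
            haystack = " ".join(
                [process_id, spec.get("summary", ""), spec.get("description", "")]
            ).lower()
            if needle in haystack:
                matches.append(process_id)
        return matches


# Abbreviated specs; real openEO specs also carry parameters, returns, and more.
specs = [
    {"id": "ndvi", "summary": "Normalized Difference Vegetation Index"},
    {"id": "filter_bbox", "summary": "Spatial filter using a bounding box"},
]
tiny = TinyRegistry(specs)
print(tiny.list_processes())
print(tiny.search("vegetation"))
```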
Load from STAC / GeoJSON
cube = DataCube.load_stac(
"https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a",
assets=["red", "nir"],
)
vector = DataCube.load_geojson({"type": "FeatureCollection", "features": [...]})
Vector cubes (GeoDataFrame and xvec)
Vector cubes can be GeoDataFrames or xarray DataArrays/Datasets with xvec geometry coordinates. Install the geo extra to run the xvec example:
uv pip install "openeo-core[geo]"
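import xarray as xr
import xvec  # importing xvec registers the .xvec accessor on xarray objects
from shapely.geometry import Point
from openeo_core import DataCube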
# Create xvec-backed vector cube
da = xr.DataArray(
[1.0, 2.0, 3.0],
dims=["geom"],
coords={"geom": [Point(10, 50), Point(10.5, 50.5), Point(11, 51)]},
).xvec.set_geom_indexes("geom", crs=4326)
cube = DataCube(da)
result = cube.filter_bbox(west=9, south=49, east=11, north=51)
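For point geometries, a bounding-box filter reduces to a pair of range checks on the coordinates. The sketch below applies that test to plain (x, y) tuples without any geospatial library. Treating points on the box edge as inside is an assumption, and the fourth point is added here for illustration. The library's filter_bbox may use different edge semantics and also handles lines and polygons.

```python
def in_bbox(x, y, west, south, east, north):
    """True when the point lies inside the box, edges included (assumption)."""
    return west <= x <= east and south <= y <= north


# The three points from the example above, plus one that lies outside the box.
points = [(10.0, 50.0), (10.5, 50.5), (11.0, 51.0), (12.0, 52.0)]
kept = [p for p in points if in_bbox(*p, west=9, south=49, east=11, north=51)]
print(kept)
```

A spatial index such as an R-tree speeds this up on large collections by discarding most geometries before the exact test runs.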
Documentation
- docs/index.md — Documentation index
- docs/architecture.md — Software structure, design, and component overview
Architecture
openeo_core/
__init__.py # DataCube, type aliases
datacube.py # Fluent wrapper + dispatch
types.py # RasterCube/VectorCube/Cube aliases
ops/
raster.py # xarray/dask raster operations
vector.py # geopandas, dask-geopandas, xvec vector operations
io/
collection.py # load_collection (pystac-client + stackstac)
stac.py # load_stac (pystac + stackstac)
geojson.py # load_geojson (geopandas)
model/
__init__.py # Public API exports
mlm.py # MLModel (STAC MLM-compatible object)
base.py # openEO process functions + Model factory
sklearn.py # scikit-learn estimator builder (internal)
xgboost_backend.py # XGBoost estimator builder (internal)
torch.py # PyTorch wrapper (TempCNN, LightTAE)
torch_models/ # PyTorch nn.Module implementations
tempcnn.py # TempCNN architecture
lighttae.py # LightTAE architecture
processes/
registry.py # JSON spec registry
resources/ # Packaged process JSON specs
openEO ML Process Mapping
| openEO Process | Python Function | Description |
|---|---|---|
| mlm_class_random_forest | mlm_class_random_forest() | Init RF classifier |
| mlm_regr_random_forest | mlm_regr_random_forest() | Init RF regressor |
| mlm_class_xgboost | mlm_class_xgboost() | Init XGBoost classifier |
| mlm_class_tempcnn | mlm_class_tempcnn() | Init TempCNN classifier |
| mlm_class_lighttae | mlm_class_lighttae() | Init LightTAE classifier |
| ml_fit | ml_fit(model, training_set, target) | Train a model |
| ml_predict | ml_predict(data, model) | Predict with trained model |
| save_ml_model | save_ml_model(data, name, options) | Save model + STAC Item |
| load_stac_ml | load_stac_ml(uri, ...) | Load model from STAC Item |
Examples
| Notebook | Description |
|---|---|
| 01_ndvi.ipynb | NDVI computation with the DataCube API |
| 02_ml_random_forest.ipynb | Random Forest classification pipeline |
| 03_process_registry.ipynb | Exploring the Process Registry |
| 04_ml_tempcnn.ipynb | TempCNN temporal classification with PyTorch |
Running Tests
uv run pytest tests/ -v
License
Apache-2.0