OceanDataStore
OceanDataStore is an open-source Python library for creating, publishing, discovering, and accessing cloud-native ocean datasets.
OceanDataStore enables ocean modelling and observational communities to work with Analysis-Ready, Cloud-Optimised (ARCO) datasets stored in object storage. It includes:
- A Command Line Interface (CLI) to convert traditional ocean datasets into scalable cloud formats such as Zarr stores and Icechunk repositories.
- An intuitive OceanDataCatalog API to discover and access datasets using SpatioTemporal Asset Catalog (STAC) metadata.
Why OceanDataStore?
Traditional ocean datasets are often distributed as collections of thousands of NetCDF files stored on HPC systems or remote archives. Accessing these datasets can require substantial data transfers, complex file management, and bespoke workflows.
OceanDataStore adopts a cloud-native approach where datasets are stored in ARCO formats and described through a searchable STAC catalogue.
This enables users to:
- Access only the variables, time periods, and spatial domains required for analysis.
- Open datasets directly as xarray.Dataset objects without downloading complete archives.
- Work seamlessly with the scientific Python ecosystem, including xarray, dask, and zarr.
- Build scalable, reproducible workflows for ocean science.
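The kind of selective access described above can be sketched with plain xarray. This is a minimal illustration on a small synthetic, in-memory dataset, not the OceanDataStore API: the grid, the coordinate names `lat` and `lon`, and the second variable `sos_abs` are assumptions made for the example (`tos_con` is borrowed from the Quick Start).

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a cloud-hosted dataset: 10 years of monthly fields
# on a coarse latitude/longitude grid. All names and values are illustrative.
time = pd.date_range("2000-01-01", periods=120, freq="MS")
lat = np.arange(-60.0, 61.0, 30.0)  # -60, -30, 0, 30, 60
lon = np.arange(0.0, 360.0, 90.0)   # 0, 90, 180, 270
shape = (time.size, lat.size, lon.size)

ds = xr.Dataset(
    data_vars={
        "tos_con": (("time", "lat", "lon"), np.zeros(shape, dtype="float32")),
        "sos_abs": (("time", "lat", "lon"), np.ones(shape, dtype="float32")),  # hypothetical variable
    },
    coords={"time": time, "lat": lat, "lon": lon},
)

# Keep one variable, a five-year window, and a regional box.
subset = ds[["tos_con"]].sel(
    time=slice("2004-01", "2008-12"),
    lat=slice(-30, 30),
    lon=slice(0, 180),
)
print(dict(subset.sizes))
```

With a lazily opened, chunked dataset the same selection would only describe which chunks are needed; data would be read when a result is computed or plotted.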
Key Features
🌊 Discover and access ocean datasets with OceanDataCatalog
- Search STAC catalogs for available ocean model and observational datasets.
- Explore dataset metadata and available variables.
- Open cloud-hosted datasets directly as lazy xarray.Dataset objects.
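To give a feel for what searching STAC metadata involves, the sketch below filters a list of heavily trimmed STAC Items by collection and time range using only the standard library. It is not how OceanDataCatalog is implemented: the item ids, collection names, and dates are hypothetical, and the helpers `parse_utc` and `search_items` exist only for this illustration. The `start_datetime` and `end_datetime` property names are standard STAC fields.

```python
from datetime import datetime

# Hypothetical STAC Items, trimmed to the fields used in this sketch.
items = [
    {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": "example-model-monthly",
        "collection": "example-model",
        "properties": {
            "datetime": None,
            "start_datetime": "1976-01-01T00:00:00Z",
            "end_datetime": "2023-12-31T00:00:00Z",
        },
    },
    {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": "example-obs-monthly",
        "collection": "example-obs",
        "properties": {
            "datetime": None,
            "start_datetime": "1993-01-01T00:00:00Z",
            "end_datetime": "2020-12-31T00:00:00Z",
        },
    },
]


def parse_utc(timestamp: str) -> datetime:
    """Parse an RFC 3339 timestamp that ends in 'Z' (UTC)."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def search_items(items, collection, start, end):
    """Return ids of items in `collection` whose time range overlaps [start, end]."""
    query_start, query_end = parse_utc(start), parse_utc(end)
    matches = []
    for item in items:
        props = item["properties"]
        item_start = parse_utc(props["start_datetime"])
        item_end = parse_utc(props["end_datetime"])
        in_collection = item["collection"] == collection
        overlaps = item_start <= query_end and item_end >= query_start
        if in_collection and overlaps:
            matches.append(item["id"])
    return matches


found = search_items(items, "example-model", "2004-01-01T00:00:00Z", "2008-12-31T00:00:00Z")
print(found)
```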
☁️ Create and publish ARCO ocean datasets
- Convert collections of NetCDF files into cloud-native Zarr datasets.
- Write directly to S3-compatible object storage.
- Use Dask for parallel processing of large simulations and observational products.
🔄 Support reproducible ocean data workflows
- Integrate ocean model output and observations through a common access interface.
- Develop scalable model validation workflows.
- Facilitate FAIR data practices for ocean science.
Installation
We recommend installing OceanDataStore within a dedicated Python environment using venv, conda, or mamba. Install the latest development version directly from GitHub:
```shell
pip install git+https://github.com/NOC-MSM/OceanDataStore.git
```
Quick Start
1. Create and Publish an ARCO Dataset
OceanDataStore provides command-line tools for converting collections of NetCDF files into Zarr datasets stored in S3-compatible object storage.
For example, a large ocean model simulation can be converted into a cloud-native dataset using:
```shell
ods send_to_zarr \
    -f /path/to/files*.nc \
    -c credentials.json \
    -b my_bucket \
    -p my_ocean_model \
    -cs '{"x": 2160, "y": 1803}' \
    -dc dask_config.json \
    -zv 3
```
More complete publishing workflows and examples are available in the examples directory and documentation.
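When choosing the chunk sizes passed to `-cs`, it can help to estimate how large each chunk will be and to generate the JSON string programmatically rather than typing it by hand. The sketch below does both with the standard library. It assumes 4-byte (float32) values and a chunk length of 1 along any dimension not listed, neither of which is stated in this README, so treat the size as a rough guide only.

```python
import json
import shlex

# Chunk shape taken from the command above.
chunks = {"x": 2160, "y": 1803}

# Assumption: float32 data (4 bytes per value) and a chunk length of 1
# along every dimension that is not listed in `chunks`.
bytes_per_value = 4
chunk_bytes = bytes_per_value
for length in chunks.values():
    chunk_bytes *= length
chunk_mib = chunk_bytes / 2**20

# Serialise the chunk spec and quote it for safe use on a POSIX shell command line.
cs_arg = json.dumps(chunks)
print(f"{chunk_mib:.1f} MiB per chunk")
print("-cs", shlex.quote(cs_arg))
```

Under these assumptions each chunk is roughly 15 MiB uncompressed.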
2. Discover and Access Ocean Datasets
OceanDataCatalog provides a Python interface for searching, exploring, and opening datasets described by STAC metadata.
```python
from oceandatastore import OceanDataCatalog

# Connect to the NOC STAC catalog:
catalog = OceanDataCatalog(catalog_name="noc-stac")

# Search the catalog:
catalog.search(
    collection="noc-npd-era5"
)

# Open a dataset directly as an xarray.Dataset:
ds = catalog.open_dataset(
    id="noc-npd-era5/npd-eorca1-era5v1/r1i1c1f1/gn/T1m",
    variable_names=["tos_con"],
    start_datetime="2004-01",
    end_datetime="2008-12",
)
```
Since datasets are opened lazily using xarray and dask, analyses can scale from a laptop to HPC and cloud environments.
Scientific Use Cases
OceanDataStore supports a broad range of ocean science workflows, including:
Ocean Model Validation
- Compare ocean simulations against observational products using a common data access pattern.
- Build reproducible evaluation workflows across multiple models and experiments.
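As an illustration of the comparison step in such a workflow, the sketch below computes two common validation metrics, mean bias and root-mean-square error, for a model series against co-located observations. The functions and sample values are hypothetical and are not part of OceanDataStore; in practice the inputs would typically come from datasets opened through the catalog and aligned on a common grid and time axis.

```python
import math


def mean_bias(model, obs):
    """Mean of (model - obs); positive values mean the model is too high."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and of equal length")
    return sum(m - o for m, o in zip(model, obs)) / len(model)


def rmse(model, obs):
    """Root-mean-square error between model and obs."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and of equal length")
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(model))


# Illustrative sea surface temperatures (degrees C) at four matching times.
model_sst = [10.0, 12.0, 14.0, 16.0]
obs_sst = [9.0, 12.0, 15.0, 14.0]

print(f"bias = {mean_bias(model_sst, obs_sst):.2f}")
print(f"rmse = {rmse(model_sst, obs_sst):.2f}")
```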
Cloud-Native Model Archives
- Publish large-scale ocean simulations as FAIR, discoverable datasets without sharing raw file archives.
Ocean Observations
- Access observational products alongside model output through a single catalog interface.
Documentation
Documentation, examples, and API references are available here
Contributing
OceanDataStore is under active development and we welcome feedback and contributions from the ocean modelling, observational, and wider marine data communities.
Funding
The ongoing development of OceanDataStore is funded by the following projects:
Contact
Ollie Tooth (oliver.tooth@noc.ac.uk)