OceanDataStore

OceanDataStore is an open-source Python library for creating, publishing, discovering, and accessing cloud-native ocean datasets.

OceanDataStore enables ocean modelling and observational communities to work with Analysis-Ready, Cloud-Optimised (ARCO) datasets stored in object storage. It provides:

  • A Command Line Interface (CLI) to convert traditional ocean datasets into scalable cloud formats such as Zarr stores and Icechunk repositories.

  • An intuitive OceanDataCatalog API to discover and access datasets using SpatioTemporal Asset Catalog (STAC) metadata.

Why OceanDataStore?

Traditional ocean datasets are often distributed as collections of thousands of NetCDF files stored on HPC systems or remote archives. Accessing these datasets can require substantial data transfers, complex file management, and bespoke workflows.

OceanDataStore adopts a cloud-native approach in which datasets are stored in ARCO formats and described through a searchable STAC catalog.

This enables users to:

  • Access only the variables, time periods, and spatial domains required for analysis.
  • Open datasets directly as xarray.Dataset objects without downloading complete archives.
  • Work seamlessly with the scientific Python ecosystem, including xarray, dask, and zarr.
  • Build scalable, reproducible workflows for ocean science.
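
As a minimal sketch of the first two points, the snippet below builds a small synthetic dataset in memory and subsets it by variable, time period, and region using standard xarray indexing. The coordinate names (`time`, `lat`, `lon`), the 5-degree grid, and the second variable `sos_abs` are assumptions made for illustration; datasets in a real catalog may use different names and grids.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a cloud-hosted dataset: monthly fields on a 5-degree grid.
# Coordinate names and grid spacing are illustrative assumptions.
time = pd.date_range("2000-01-01", "2009-12-01", freq="MS")
lat = np.arange(-87.5, 90.0, 5.0)
lon = np.arange(2.5, 360.0, 5.0)
shape = (time.size, lat.size, lon.size)

ds = xr.Dataset(
    data_vars={
        "tos_con": (("time", "lat", "lon"), np.zeros(shape, dtype="float32")),
        # Hypothetical second variable, present only to show variable selection.
        "sos_abs": (("time", "lat", "lon"), np.zeros(shape, dtype="float32")),
    },
    coords={"time": time, "lat": lat, "lon": lon},
)

# Keep one variable, a five-year window, and a North Atlantic box.
subset = ds[["tos_con"]].sel(
    time=slice("2004-01", "2008-12"),
    lat=slice(40, 65),
    lon=slice(300, 360),
)

print(dict(subset.sizes))
```

When a dataset is backed by dask arrays rather than in-memory NumPy arrays, the same selection only describes the subset; data are read when a result is computed or plotted.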

Key Features

🌊 Discover and access ocean datasets with OceanDataCatalog

  • Search STAC catalogs for available ocean model and observational datasets.
  • Explore dataset metadata and available variables.
  • Open cloud-hosted datasets directly as lazy xarray.Dataset objects.

☁️ Create and publish ARCO ocean datasets

  • Convert collections of NetCDF files into cloud-native Zarr datasets.
  • Write directly to S3-compatible object storage.
  • Use Dask for parallel processing of large simulations and observational products.

🔄 Support reproducible ocean data workflows

  • Integrate ocean model output and observations through a common access interface.
  • Develop scalable model validation workflows.
  • Facilitate FAIR data practices for ocean science.

Installation

We recommend installing OceanDataStore within a dedicated Python environment using venv, conda, or mamba. Install the latest development version directly from GitHub:

pip install git+https://github.com/NOC-MSM/OceanDataStore.git

Quick Start

1. Create and Publish an ARCO Dataset

OceanDataStore provides command-line tools for converting collections of NetCDF files into Zarr datasets stored in S3-compatible object storage.

For example, a large ocean model simulation can be converted into a cloud-native dataset using:

ods send_to_zarr \
    -f /path/to/files*.nc \
    -c credentials.json \
    -b my_bucket \
    -p my_ocean_model \
    -cs '{"x": 2160, "y": 1803}' \
    -dc dask_config.json \
    -zv 3

More complete publishing workflows and examples are available in the examples directory and documentation.
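
When choosing values for the `-cs` option, it can help to estimate how large each chunk will be. The sketch below assumes that `-cs` gives the chunk length along each named dimension, that dimensions not listed are chunked with length 1, and that values are stored as 4-byte floats; none of these assumptions is confirmed by the text above, and `chunk_nbytes` is a hypothetical helper rather than part of OceanDataStore.

```python
import json
import math


def chunk_nbytes(chunks, itemsize=4):
    """Uncompressed size in bytes of one chunk.

    `chunks` maps dimension names to chunk lengths; `itemsize` is the number
    of bytes per value (4 for float32, an assumption here).
    """
    return math.prod(chunks.values()) * itemsize


# Chunk lengths taken from the example command above.
chunks = {"x": 2160, "y": 1803}

# JSON string in the form passed to -cs.
cs_arg = json.dumps(chunks)
nbytes = chunk_nbytes(chunks)

print(cs_arg)
print(f"{nbytes} bytes = {nbytes / 2**20:.1f} MiB per chunk (uncompressed)")
```

Under these assumptions, the chunk shape in the example holds 2160 × 1803 = 3,894,480 values, or 15,577,920 bytes (about 14.9 MiB) before compression.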

2. Discover and Access Ocean Datasets

OceanDataCatalog provides a Python interface for searching, exploring, and opening datasets described by STAC metadata.

from oceandatastore import OceanDataCatalog

# Connect to the NOC STAC catalog:
catalog = OceanDataCatalog(catalog_name="noc-stac")

# Search the catalog:
catalog.search(
    collection="noc-npd-era5"
)

# Open a dataset directly as an xarray.Dataset:
ds = catalog.open_dataset(
    id="noc-npd-era5/npd-eorca1-era5v1/r1i1c1f1/gn/T1m",
    variable_names=["tos_con"],
    start_datetime="2004-01",
    end_datetime="2008-12",
)

Since datasets are opened lazily using xarray and dask, analyses can scale from a laptop to HPC and cloud environments.

Scientific Use Cases

OceanDataStore supports a broad range of ocean science workflows, including:

Ocean Model Validation

  • Compare ocean simulations against observational products using a common data access pattern.

  • Build reproducible evaluation workflows across multiple models and experiments.
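
As a minimal illustration of such a comparison, the sketch below computes the mean bias and root-mean-square error between co-located model and observed values, skipping points where either value is missing. The numbers and the `bias_and_rmse` helper are hypothetical; a real workflow would typically apply the same statistics to xarray objects opened from the catalog.

```python
import math


def bias_and_rmse(model, obs):
    """Return (mean bias, RMSE) of model minus obs, ignoring NaN pairs."""
    diffs = [
        m - o
        for m, o in zip(model, obs)
        if not (math.isnan(m) or math.isnan(o))
    ]
    if not diffs:
        raise ValueError("no valid model/observation pairs")
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Hypothetical sea surface temperatures (degrees C) at five co-located points.
# NaN marks a point with no valid model value, for example a land cell.
model_sst = [10.2, 11.0, float("nan"), 12.5, 13.1]
obs_sst = [10.0, 11.4, 9.9, 12.0, 13.5]

bias, rmse = bias_and_rmse(model_sst, obs_sst)
print(f"bias = {bias:+.3f} degC, RMSE = {rmse:.3f} degC")
```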

Cloud-Native Model Archives

  • Publish large-scale ocean simulations as FAIR, discoverable datasets without sharing raw file archives.

Ocean Observations

  • Access observational products alongside model output through a single catalog interface.

Documentation

Documentation, examples, and API references are available here

Contributing

OceanDataStore is under active development and we welcome feedback and contributions from the ocean modelling, observational, and wider marine data communities.

Funding

The ongoing development of OceanDataStore is funded by the following projects:

Contact

Ollie Tooth (oliver.tooth@noc.ac.uk)
