Reflective's Unified SAI Data Catalog

Reflective Data Catalog

Reflective's unified Python interface for accessing SAI (Stratospheric Aerosol Injection) climate model data across cloud providers (S3, GCS, Azure, Cloudflare R2) for use on the Reflective Cloud Hub.

Note: At this time, most sources other than ESGF and ESM are not accessible outside the Reflective Cloud Hub due to technical limitations. We are working on storing the data in new locations and will update this page when they are ready.

This is an Intake-like interface for browsing, searching, and loading SRM-related (Solar Radiation Modification) datasets in a unified manner. All available datasets are listed here, with more information in the Reflective Cloud Hub documentation. We've also included an example Jupyter Notebook showing how to use the tool.

Installation

pip install reflective-data-catalog

For development:

git clone https://github.com/ReflectiveCloud/reflective-data-catalog.git
cd reflective-data-catalog
pip install -e ".[dev]"
pre-commit install

This installs a pre-commit hook that automatically runs Ruff linting (with auto-fix) and formatting on every commit.

Quick Start

from reflective_data_catalog import ReflectiveCatalog

rdc = ReflectiveCatalog()

# Load a dataset lazily with dask
ds = rdc.cesm2_waccm_g6_1p5k_hilla(variable='T').to_dask()

# Load a dataset into memory
ds = rdc.miroc_es2h_g6_1p5k_sai(variable='SurfT').read()

Available Sources

Source                        Description
cesm2_waccm_g6_1p5k_hilla     CESM2-WACCM G6-1.5K-HiLLA
cesm2_waccm_historical        CESM2-WACCM Historical
cesm2_waccm_ssp245            CESM2-WACCM SSP2-4.5
cesm2_waccm6_g6_1p5k_hilla    CESM2-WACCM6 G6-1.5K-HiLLA
e3smv3_g6_1p5k_hilla          E3SMv3 G6-1.5K-HiLLA
miroc_es2h_g6_1p5k_hilla      MIROC-ES2H G6-1.5K-HiLLA
miroc_es2h_g6_1p5k_sai        MIROC-ES2H G6-1.5K-SAI
ukesm1_g6_1p5k_hilla          UKESM1.1 G6-1.5K-HiLLA
ukesm1_ssp245                 UKESM1.1 SSP2-4.5

Usage

Selecting Parameters

Each source accepts keyword arguments to select the table, variable, ensemble member, and other parameters:

# Specify variable, table, and ensemble
ds = rdc.cesm2_waccm_g6_1p5k_hilla(
    variable='T',
    table='AMON',
    ensemble='r2'
).to_dask()

# MIROC sources support a variant parameter
ds = rdc.miroc_es2h_g6_1p5k_hilla(
    variable='SurfT',
    variant='G6-1.5K-SAI',
    ensemble='r01'
).to_dask()
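
When several ensemble members are needed, the same keyword arguments can be applied in a loop. The following is a minimal sketch rather than part of the package: `load_members` is a hypothetical helper, and it assumes only what the examples above show, namely that a source accepts keywords such as `variable`, `table`, and `ensemble` and returns an object with a `to_dask()` method.

```python
from typing import Any, Callable, Iterable


def load_members(
    source: Callable[..., Any],
    ensembles: Iterable[str],
    **params: Any,
) -> dict[str, Any]:
    """Open one lazy dataset per ensemble member.

    `source` is a catalog entry such as `rdc.cesm2_waccm_g6_1p5k_hilla`.
    This helper is hypothetical and is not provided by the package.
    """
    datasets = {}
    for member in ensembles:
        # Same call pattern as the examples above, varying only `ensemble`.
        datasets[member] = source(ensemble=member, **params).to_dask()
    return datasets
```

A call such as `load_members(rdc.cesm2_waccm_g6_1p5k_hilla, ['r1', 'r2'], variable='T', table='AMON')` would then return a dictionary keyed by member label. The label `r1` is illustrative; use `list_ensembles()` to see which members a source actually provides.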

Discovering Available Data

Each source provides discovery methods to explore what data is available:

source = rdc.ukesm1_g6_1p5k_hilla()

# List available variables, ensembles, or tables
source.list_variables()
source.list_ensembles()
source.list_tables()

# Print a full summary
source.discover()
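
The discovery methods can also guard a load call against a misspelled variable name. The sketch below assumes that `list_variables()` returns an iterable of variable-name strings; the return type is not stated here, so treat that as an assumption. `require_variable` is a hypothetical helper, not part of the package.

```python
from typing import Any


def require_variable(source: Any, name: str) -> str:
    """Return `name` if the source lists it, otherwise raise ValueError.

    Assumes `source.list_variables()` returns an iterable of strings
    (an assumption; adapt this if it returns something else).
    """
    available = sorted(source.list_variables())
    if name not in available:
        raise ValueError(
            f"Variable {name!r} not found; available: {', '.join(available)}"
        )
    return name
```

Failing early with the list of valid names is usually cheaper than waiting for a remote read to fail.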

Google Cloud CMIP6 / GeoMIP (intake-esm)

Access cloud-optimized Zarr data from the Google Cloud CMIP6 catalog:

# Search and load in one step
datasets = rdc.esm.load(
    experiment_id=['G6sulfur', 'ssp245', 'ssp585'],
    variable_id='tas',
    table_id='Amon',
    require_all_on=['source_id', 'institution_id'],
)

# Or use the GeoMIP convenience helper
datasets = rdc.geomip_cloud.load_ensemble(
    experiments=['G6sulfur', 'ssp245', 'ssp585'],
    variable='tas',
)

# Quick single-experiment load
ds_dict = rdc.geomip_cloud.g6sulfur(variable='tas')

# Explore what's available
rdc.geomip_cloud.list_models()
rdc.geomip_cloud.list_variables(experiment_id='G6sulfur')
rdc.geomip_cloud.summary()

# Advanced: direct search then load
subset = rdc.esm.search(
    experiment_id='G6sulfur',
    variable_id=['tas', 'pr'],
    table_id='Amon',
)
datasets = subset.to_dataset_dict()
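
In intake-esm, `to_dataset_dict()` returns a dictionary that maps a string key to an xarray dataset. A small summary function can make that dictionary easier to scan. The sketch below is hypothetical (`describe` is not part of the package) and relies only on each value exposing a `data_vars` mapping, as xarray datasets do; whether `rdc.esm.load` returns the same kind of dictionary is an assumption.

```python
def describe(datasets):
    """Return one summary line per entry in a {key: dataset} mapping."""
    lines = []
    for key in sorted(datasets):
        ds = datasets[key]
        # `data_vars` maps variable names to arrays on an xarray Dataset.
        variables = ", ".join(sorted(ds.data_vars))
        lines.append(f"{key}: {variables}")
    return lines
```

Printing `"\n".join(describe(datasets))` then gives one line per dataset with its variable names.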

ESGF Data

The catalog also provides access to ESGF (Earth System Grid Federation) data:

ds = rdc.esgf.geomip.g6sulfur(model='UKESM1-0-LL', variable='tas')

Running Tests

Run the full test suite:

pytest

Run with coverage report:

pytest --cov=reflective_data_catalog --cov-report=term-missing

Run a specific test file:

pytest tests/test_flexible_sources.py

Tests mock all external services (S3, ESGF, intake-esm), so no network access or cloud credentials are required.
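
As an illustration of that approach, the sketch below shows how a test can replace a network-backed source with a `unittest.mock.MagicMock`. It is not taken from the project's test suite: `first_variable` is a hypothetical function under test, and the stubbed return value is made up.

```python
from unittest.mock import MagicMock


def first_variable(source):
    """Hypothetical function under test: pick the first listed variable."""
    return sorted(source.list_variables())[0]


def test_first_variable_without_network():
    # Stand-in for a catalog source; no S3, ESGF, or intake-esm call is made.
    fake_source = MagicMock()
    fake_source.list_variables.return_value = ["tas", "pr"]

    assert first_variable(fake_source) == "pr"
    fake_source.list_variables.assert_called_once_with()
```

Because the mock records its calls, the test can check both the result and that the source was queried exactly once.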

Requirements

  • Python >= 3.11
  • intake >= 2.0.0
  • intake-esm >= 2025.2.3
  • intake-esgf >= 2025.5.9
  • xarray >= 2025.01.0
  • obstore >= 0.8.0
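
To see which of these packages are present in the current environment, the standard library's `importlib.metadata` can report installed versions. This is a minimal sketch: it lists versions only and does not compare them against the minimums above, and `installed_versions` is a hypothetical helper name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the requirements list above.
REQUIRED = ["intake", "intake-esm", "intake-esgf", "xarray", "obstore"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python >= 3.11:", sys.version_info >= (3, 11))
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'not installed'}")
```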

License

Apache 2.0
