Reflective's Unified SAI Data Catalog

Reflective Data Catalog

Reflective's unified Python interface for accessing SAI (Stratospheric Aerosol Injection) climate model data across cloud providers (S3, GCS, Azure, Cloudflare R2) for use on the Reflective Cloud Hub.

Note: At this time, most sources other than ESGF and ESM are not accessible outside the Reflective Cloud Hub due to technical limitations. We are working on storing the data in new locations and will post an update when it is ready.

This is an Intake-like interface for browsing, searching, and loading SRM-related datasets in a unified manner. All available datasets can be seen here, with more information in the Reflective Cloud Hub documentation. We've also included an example Jupyter Notebook showing how to use the tool.

Installation

pip install reflective-data-catalog

For development:

git clone https://github.com/ReflectiveCloud/reflective-data-catalog.git
cd reflective-data-catalog
pip install -e ".[dev]"
pre-commit install

This installs a pre-commit hook that automatically runs Ruff linting (with auto-fix) and formatting on every commit.

Quick Start

from reflective_data_catalog import ReflectiveCatalog

rdc = ReflectiveCatalog()

# Load a dataset lazily with dask
ds = rdc.cesm2_waccm_g6_1p5k_hilla(variable='T').to_dask()

# Load a dataset into memory
ds = rdc.miroc_es2h_g6_1p5k_sai(variable='SurfT').read()

Available Sources

Source                      Description
cesm2_waccm_g6_1p5k_hilla   CESM2-WACCM G6-1.5K-HiLLA
cesm2_waccm_historical      CESM2-WACCM Historical
cesm2_waccm_ssp245          CESM2-WACCM SSP2-4.5
cesm2_waccm6_g6_1p5k_hilla  CESM2-WACCM6 G6-1.5K-HiLLA
e3smv3_g6_1p5k_hilla        E3SMv3 G6-1.5K-HiLLA
miroc_es2h_g6_1p5k_hilla    MIROC-ES2H G6-1.5K-HiLLA
miroc_es2h_g6_1p5k_sai      MIROC-ES2H G6-1.5K-SAI
ukesm1_g6_1p5k_hilla        UKESM1.1 G6-1.5K-HiLLA
ukesm1_ssp245               UKESM1.1 SSP2-4.5

Usage

Selecting Parameters

Each source accepts keyword arguments to select the table, variable, ensemble member, and other parameters:

# Specify variable, table, and ensemble
ds = rdc.cesm2_waccm_g6_1p5k_hilla(
    variable='T',
    table='AMON',
    ensemble='r2'
).to_dask()

# MIROC: HiLLA vs SAI are different experiment prefixes in storage — use the matching source
ds = rdc.miroc_es2h_g6_1p5k_hilla(
    table='Amon',
    variable='tas',
    variant='baseline',
    ensemble='r01',
).to_dask()
ds = rdc.miroc_es2h_g6_1p5k_sai(
    table='Mon',
    variable='SurfT',
    ensemble='r01',
).to_dask()  # default variant is G6-1.5K-SAI
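
Because the ensemble member is just another keyword argument, comparing members comes down to a loop. The sketch below is an illustration only and not part of the package: `load_members` is a hypothetical helper, and `_FakeSource` / `fake_factory` stand in for a real source such as `rdc.cesm2_waccm_g6_1p5k_hilla` so the example runs without network access. It assumes only what the examples above show, namely that a source is called with keyword arguments and exposes `.to_dask()`.

```python
from typing import Any, Callable, Iterable


def load_members(source_factory: Callable[..., Any],
                 ensembles: Iterable[str],
                 **params: Any) -> dict[str, Any]:
    """Call a catalog source once per ensemble member and load each lazily.

    Hypothetical helper: `source_factory` is any callable that accepts
    keyword arguments (including `ensemble`) and returns an object with
    a `.to_dask()` method, as in the examples above.
    """
    return {
        member: source_factory(ensemble=member, **params).to_dask()
        for member in ensembles
    }


class _FakeSource:
    """Stand-in for a real catalog source so the sketch runs offline."""

    def __init__(self, **params: Any) -> None:
        self.params = params

    def to_dask(self) -> dict[str, Any]:
        # A real source would return a lazily loaded dataset here;
        # the stand-in simply echoes the parameters it received.
        return dict(self.params)


def fake_factory(**params: Any) -> _FakeSource:
    return _FakeSource(**params)


members = load_members(fake_factory, ["r1", "r2"], variable="T", table="AMON")
print(sorted(members))
```

With the real catalog, the call would presumably look like `load_members(rdc.cesm2_waccm_g6_1p5k_hilla, ['r1', 'r2'], variable='T')`, provided those ensemble labels exist for that source; `source.list_ensembles()` is the place to check.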

Discovering Available Data

Each source provides discovery methods to explore what data is available:

source = rdc.ukesm1_g6_1p5k_hilla()

# List available variables, ensembles, or tables
source.list_variables()
source.list_ensembles()
source.list_tables()

# Print a full summary
source.discover()
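
If the listings need to be stored or compared across sources, they can be collected into a plain dictionary. This is a minimal sketch under one stated assumption: that each `list_*` method returns an iterable of strings (the actual return types in the package may differ). `collect_inventory` and `_FakeSource` are hypothetical names, and the listings in `_FakeSource` are made up so the example runs offline.

```python
from typing import Any


def collect_inventory(source: Any) -> dict[str, list[str]]:
    """Gather the three discovery listings into one dictionary.

    Assumes each list_* method returns an iterable of strings; the
    actual return types in the package may differ.
    """
    return {
        "variables": sorted(source.list_variables()),
        "ensembles": sorted(source.list_ensembles()),
        "tables": sorted(source.list_tables()),
    }


class _FakeSource:
    """Offline stand-in with made-up listings (not real catalog contents)."""

    def list_variables(self) -> list[str]:
        return ["tas", "pr"]

    def list_ensembles(self) -> list[str]:
        return ["r1"]

    def list_tables(self) -> list[str]:
        return ["Amon"]


inventory = collect_inventory(_FakeSource())
print(inventory["variables"])
```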

Google Cloud CMIP6 / GeoMIP (intake-esm)

Access cloud-optimized Zarr data from the Google Cloud CMIP6 catalog:

# Search and load in one step
datasets = rdc.esm.load(
    experiment_id=['G6sulfur', 'ssp245', 'ssp585'],
    variable_id='tas',
    table_id='Amon',
    require_all_on=['source_id', 'institution_id'],
)

# Or use the GeoMIP convenience helper
datasets = rdc.geomip_cloud.load_ensemble(
    experiments=['G6sulfur', 'ssp245', 'ssp585'],
    variable='tas',
)

# Quick single-experiment load
ds_dict = rdc.geomip_cloud.g6sulfur(variable='tas')

# Explore what's available
rdc.geomip_cloud.list_models()
rdc.geomip_cloud.list_variables(experiment_id='G6sulfur')
rdc.geomip_cloud.summary()

# Advanced: direct search then load
subset = rdc.esm.search(
    experiment_id='G6sulfur',
    variable_id=['tas', 'pr'],
    table_id='Amon',
)
datasets = subset.to_dataset_dict()
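
The result of `to_dataset_dict()` is a dictionary of datasets keyed by strings. For the Google Cloud CMIP6 catalog, intake-esm typically builds those keys as `activity_id.institution_id.source_id.experiment_id.table_id.grid_label`. Assuming that template holds here (it is worth printing the keys to confirm), the hypothetical helper below groups keys by experiment. The example keys are illustrative and the `None` values stand in for datasets, so the sketch runs without network access.

```python
from collections import defaultdict
from typing import Any


def group_by_experiment(datasets: dict[str, Any]) -> dict[str, list[str]]:
    """Group dataset keys by their experiment_id component.

    Assumes dot-separated keys of the form
    activity_id.institution_id.source_id.experiment_id.table_id.grid_label
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for key in datasets:
        parts = key.split(".")
        if len(parts) < 4:
            # Key does not follow the assumed template; file it separately.
            groups["unknown"].append(key)
            continue
        groups[parts[3]].append(key)
    return dict(groups)


# Illustrative keys; None stands in for the datasets themselves.
example = {
    "GeoMIP.MOHC.UKESM1-0-LL.G6sulfur.Amon.gn": None,
    "ScenarioMIP.MOHC.UKESM1-0-LL.ssp245.Amon.gn": None,
    "ScenarioMIP.MOHC.UKESM1-0-LL.ssp585.Amon.gn": None,
}
print(group_by_experiment(example))
```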

ESGF Data

The catalog also provides access to ESGF (Earth System Grid Federation) data:

ds = rdc.esgf.geomip.g6sulfur(model='UKESM1-0-LL', variable='tas')

Running Tests

Run the full test suite:

pytest

Run with coverage report:

pytest --cov=reflective_data_catalog --cov-report=term-missing

Run a specific test file:

pytest tests/test_flexible_sources.py

Tests mock all external services (S3, ESGF, intake-esm) so no network access or cloud credentials are required.
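
The same approach works for code that builds on the catalog. The sketch below shows the general `unittest.mock` pattern and is not taken from the project's test suite: `load_surface_temperature` is a hypothetical function under test, and the choice of `ukesm1_ssp245` with `variable="tas"` is only for illustration.

```python
from unittest.mock import MagicMock


def load_surface_temperature(catalog):
    """Hypothetical function under test: loads one variable lazily."""
    return catalog.ukesm1_ssp245(variable="tas").to_dask()


def test_load_surface_temperature_uses_catalog():
    # MagicMock replaces the catalog, so nothing touches S3 or ESGF.
    catalog = MagicMock()
    sentinel = object()
    catalog.ukesm1_ssp245.return_value.to_dask.return_value = sentinel

    result = load_surface_temperature(catalog)

    # The function returns whatever to_dask() produced...
    assert result is sentinel
    # ...and requested the source exactly once with the expected keyword.
    catalog.ukesm1_ssp245.assert_called_once_with(variable="tas")
```

Saved in a file under `tests/`, this would be picked up by `pytest` like any other test.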

Requirements

  • Python >= 3.11
  • intake >= 2.0.0
  • intake-esm >= 2025.2.3
  • intake-esgf >= 2025.5.9
  • xarray >= 2025.01.0
  • obstore >= 0.8.0

License

Apache 2.0
