
Reflective's Unified SAI Data Catalog


Reflective Data Catalog


Reflective's unified Python interface for accessing SAI (Stratospheric Aerosol Injection) climate model data across cloud providers (S3, GCS, Azure, Cloudflare R2) for use on the Reflective Cloud Hub. Note: at this time, most non-ESGF and non-ESM sources are not accessible outside the Reflective Cloud Hub because of technical limitations. We are working on moving the data to new storage locations and will update this page when that is ready.

This is an Intake-like interface for browsing, searching, and loading SRM-related datasets in a unified way. All available datasets are listed here, with more information in the Reflective Cloud Hub documentation. We've also included an example Jupyter Notebook showing how to use the tool.

Installation

pip install reflective-data-catalog

For development:

git clone https://github.com/ReflectiveCloud/reflective-data-catalog.git
cd reflective-data-catalog
pip install -e ".[dev]"
pre-commit install

This installs a pre-commit hook that automatically runs Ruff linting (with auto-fix) and formatting on every commit.

Quick Start

from reflective_data_catalog import ReflectiveCatalog

rdc = ReflectiveCatalog()

# Load a dataset lazily with dask
ds = rdc.cesm2_waccm_g6_1p5k_hilla(variable='T').to_dask()

# Load a dataset into memory
ds = rdc.miroc_es2h_g6_1p5k_sai(variable='SurfT').read()

Available Sources

Source                        Description
cesm2_waccm_g6_1p5k_hilla     CESM2-WACCM G6-1.5K-HiLLA
cesm2_waccm_historical        CESM2-WACCM Historical
cesm2_waccm_ssp245            CESM2-WACCM SSP2-4.5
cesm2_waccm6_g6_1p5k_hilla    CESM2-WACCM6 G6-1.5K-HiLLA
e3smv3_g6_1p5k_hilla          E3SMv3 G6-1.5K-HiLLA
miroc_es2h_g6_1p5k_hilla      MIROC-ES2H G6-1.5K-HiLLA
miroc_es2h_g6_1p5k_sai        MIROC-ES2H G6-1.5K-SAI
ukesm1_g6_1p5k_hilla          UKESM1.1 G6-1.5K-HiLLA
ukesm1_ssp245                 UKESM1.1 SSP2-4.5
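Because each source is an attribute of the catalog object, a source can also be chosen by name at run time with Python's built-in getattr. The sketch below groups the names from the table by their trailing experiment tag. The model/experiment split is inferred from the naming pattern and is an assumption, not a documented convention; both helper functions are hypothetical.

```python
# Source names copied from the "Available Sources" table.
SOURCES = [
    "cesm2_waccm_g6_1p5k_hilla",
    "cesm2_waccm_historical",
    "cesm2_waccm_ssp245",
    "cesm2_waccm6_g6_1p5k_hilla",
    "e3smv3_g6_1p5k_hilla",
    "miroc_es2h_g6_1p5k_hilla",
    "miroc_es2h_g6_1p5k_sai",
    "ukesm1_g6_1p5k_hilla",
    "ukesm1_ssp245",
]

# Experiment tags inferred from the names above (assumption, not an official list).
EXPERIMENT_SUFFIXES = ("g6_1p5k_hilla", "g6_1p5k_sai", "historical", "ssp245")


def sources_for_experiment(suffix, names=SOURCES):
    """Return the source names whose trailing experiment tag equals `suffix`."""
    return [name for name in names if name.endswith("_" + suffix)]


def split_source_name(name):
    """Split a source name into (model part, experiment tag)."""
    for suffix in EXPERIMENT_SUFFIXES:
        if name.endswith("_" + suffix):
            return name[: -len(suffix) - 1], suffix
    raise ValueError(f"unrecognised experiment tag in {name!r}")
```

With a catalog instance, `getattr(rdc, name)(variable=...).to_dask()` would then load each matching source in turn. Variable names differ between sources (the examples use 'T' for CESM2-WACCM and 'SurfT' for MIROC), so check each source's variables before looping.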

Usage

Selecting Parameters

Each source accepts keyword arguments to select the table, variable, ensemble member, and other parameters:

# Specify variable, table, and ensemble
ds = rdc.cesm2_waccm_g6_1p5k_hilla(
    variable='T',
    table='AMON',
    ensemble='r2'
).to_dask()

# MIROC sources support a variant parameter
ds = rdc.miroc_es2h_g6_1p5k_hilla(
    variable='SurfT',
    variant='G6-1.5K-SAI',
    ensemble='r01'
).to_dask()
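To request several variables or ensemble members with the same settings, the keyword arguments can be generated instead of written out by hand. A minimal sketch: `selection_grid` is a hypothetical helper, not part of the package, and the ensemble labels in the usage comment are placeholders.

```python
from itertools import product


def selection_grid(variables, ensembles, **fixed):
    """Yield one keyword-argument dict per (variable, ensemble) pair.

    Hypothetical helper: `fixed` holds parameters shared by every request,
    such as table='AMON'.
    """
    for variable, ensemble in product(variables, ensembles):
        yield {"variable": variable, "ensemble": ensemble, **fixed}


# Usage with the catalog (not run here; 'r1' and 'r2' are placeholder labels):
# for kwargs in selection_grid(["T"], ["r1", "r2"], table="AMON"):
#     ds = rdc.cesm2_waccm_g6_1p5k_hilla(**kwargs).to_dask()
```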

Discovering Available Data

Each source provides discovery methods to explore what data is available:

source = rdc.ukesm1_g6_1p5k_hilla()

# List available variables, ensembles, or tables
source.list_variables()
source.list_ensembles()
source.list_tables()

# Print a full summary
source.discover()
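The discovery methods can also guard a load, so that a typo in a variable or ensemble name fails early with a helpful message. A sketch, assuming list_variables(), list_ensembles() and list_tables() each return an iterable of strings; `validate_selection` is a hypothetical helper.

```python
def validate_selection(source, variable=None, ensemble=None, table=None):
    """Raise ValueError if a requested value is not offered by `source`.

    Assumes the source's list_* methods return iterables of strings.
    """
    checks = (
        ("variable", variable, source.list_variables),
        ("ensemble", ensemble, source.list_ensembles),
        ("table", table, source.list_tables),
    )
    for name, value, lister in checks:
        if value is None:
            continue  # parameter not requested, nothing to check
        available = [str(v) for v in lister()]
        if value not in available:
            raise ValueError(
                f"{name}={value!r} is not available; choose from {sorted(available)}"
            )
```

For example, call `validate_selection(rdc.ukesm1_g6_1p5k_hilla(), variable=...)` before requesting the same variable with `.to_dask()`.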

Google Cloud CMIP6 / GeoMIP (intake-esm)

Access cloud-optimized Zarr data from the Google Cloud CMIP6 catalog:

# Search and load in one step
datasets = rdc.esm.load(
    experiment_id=['G6sulfur', 'ssp245', 'ssp585'],
    variable_id='tas',
    table_id='Amon',
    require_all_on=['source_id', 'institution_id'],
)

# Or use the GeoMIP convenience helper
datasets = rdc.geomip_cloud.load_ensemble(
    experiments=['G6sulfur', 'ssp245', 'ssp585'],
    variable='tas',
)

# Quick single-experiment load
ds_dict = rdc.geomip_cloud.g6sulfur(variable='tas')

# Explore what's available
rdc.geomip_cloud.list_models()
rdc.geomip_cloud.list_variables(experiment_id='G6sulfur')
rdc.geomip_cloud.summary()

# Advanced: direct search then load
subset = rdc.esm.search(
    experiment_id='G6sulfur',
    variable_id=['tas', 'pr'],
    table_id='Amon',
)
datasets = subset.to_dataset_dict()
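In intake-esm, `require_all_on` keeps only datasets from groups (here, each source_id/institution_id pair) that return results for every requested value of the query, so models lacking one of the three experiments are dropped. The plain-Python sketch below reproduces that filtering for a single list-valued query column on in-memory records; it illustrates the semantics only and is not intake-esm's implementation.

```python
from collections import defaultdict


def require_all_on(records, query_key, wanted, group_keys):
    """Keep only records whose group covers every value in `wanted`.

    records    : list of dicts, one per catalog entry
    query_key  : column being queried, e.g. "experiment_id"
    wanted     : values requested for that column
    group_keys : columns that define a group, e.g. ("source_id", "institution_id")
    """
    wanted = set(wanted)
    found = defaultdict(set)  # group -> requested values it actually has
    for rec in records:
        if rec[query_key] in wanted:
            found[tuple(rec[k] for k in group_keys)].add(rec[query_key])
    complete = {group for group, values in found.items() if values == wanted}
    return [
        rec
        for rec in records
        if rec[query_key] in wanted
        and tuple(rec[k] for k in group_keys) in complete
    ]
```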

ESGF Data

The catalog also provides access to ESGF (Earth System Grid Federation) data:

ds = rdc.esgf.geomip.g6sulfur(model='UKESM1-0-LL', variable='tas')

Running Tests

Run the full test suite:

pytest

Run with coverage report:

pytest --cov=reflective_data_catalog --cov-report=term-missing

Run a specific test file:

pytest tests/test_flexible_sources.py

Tests mock all external services (S3, ESGF, intake-esm) so no network access or cloud credentials are required.
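The project's own fixtures are not shown here. The sketch below illustrates one common way to keep a test offline with the standard library's unittest.mock: the whole catalog object is replaced, so no source is ever opened. `surface_temperature_mean` is a hypothetical function under test.

```python
from unittest.mock import MagicMock


def surface_temperature_mean(catalog):
    """Hypothetical code under test: read MIROC SurfT and return its mean."""
    ds = catalog.miroc_es2h_g6_1p5k_sai(variable="SurfT").read()
    return ds["SurfT"].mean()


def test_surface_temperature_mean_without_network():
    # Fake field whose .mean() returns a known value.
    fake_field = MagicMock()
    fake_field.mean.return_value = 288.0

    # MagicMock creates the attribute chain on demand; configure the final .read().
    catalog = MagicMock()
    catalog.miroc_es2h_g6_1p5k_sai.return_value.read.return_value = {"SurfT": fake_field}

    assert surface_temperature_mean(catalog) == 288.0
    # The source must have been asked for the right variable.
    catalog.miroc_es2h_g6_1p5k_sai.assert_called_once_with(variable="SurfT")
```

pytest collects `test_`-prefixed functions like this one automatically.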

Requirements

  • Python >= 3.11
  • intake >= 2.0.0
  • intake-esm >= 2025.2.3
  • intake-esgf >= 2025.5.9
  • xarray >= 2025.01.0
  • obstore >= 0.8.0
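A quick way to check an environment against the list above uses only the standard library. This sketch handles plain dotted numeric versions only; for pre-release or local version strings, `packaging.version.Version` is the more robust comparison. `check_requirements` is a hypothetical helper, not part of the package.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the Requirements list.
MINIMUMS = {
    "intake": "2.0.0",
    "intake-esm": "2025.2.3",
    "intake-esgf": "2025.5.9",
    "xarray": "2025.01.0",
    "obstore": "0.8.0",
}


def as_tuple(v, width=3):
    """Parse '2025.01.0' into (2025, 1, 0), padding short versions with zeros."""
    parts = [int(p) for p in v.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return a list of human-readable problems (empty if everything is satisfied)."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append(f"Python {sys.version.split()[0]} < 3.11")
    for dist, minimum in minimums.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        try:
            too_old = as_tuple(installed) < as_tuple(minimum)
        except ValueError:
            # e.g. '0.8.0.dev1': not handled by this simple parser
            problems.append(f"{dist} {installed}: could not compare, check manually")
            continue
        if too_old:
            problems.append(f"{dist} {installed} < {minimum}")
    return problems
```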

License

Apache 2.0



Download files


Source Distribution

reflective_data_catalog-0.0.3.tar.gz (62.7 kB)

Uploaded Source

Built Distribution


reflective_data_catalog-0.0.3-py3-none-any.whl (42.6 kB)

Uploaded Python 3

File details

Details for the file reflective_data_catalog-0.0.3.tar.gz.

File metadata

  • Download URL: reflective_data_catalog-0.0.3.tar.gz
  • Upload date:
  • Size: 62.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for reflective_data_catalog-0.0.3.tar.gz
Algorithm Hash digest
SHA256 181345feb3bea6a8f37eddf799331e1cbdfbf3c2e64898666c77ed34506e2bf4
MD5 63acf351eae9b7881a5014f25af338f7
BLAKE2b-256 a0728154b4b7f87caafe56c0d78d0df08feb3b1c8ab5fcc4dc57851c20402333
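A downloaded archive can be checked against the published SHA256 digest with Python's hashlib. A minimal sketch; the expected value is the source-distribution digest from the table, and the helper names are hypothetical. pip's hash-checking mode (`--require-hashes`) performs the same kind of check at install time.

```python
import hashlib
from pathlib import Path

# SHA256 listed for reflective_data_catalog-0.0.3.tar.gz.
EXPECTED_SHA256 = "181345feb3bea6a8f37eddf799331e1cbdfbf3c2e64898666c77ed34506e2bf4"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```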


Provenance

The following attestation bundles were made for reflective_data_catalog-0.0.3.tar.gz:

Publisher: build-and-push.yml on ReflectiveCloud/reflective-data-catalog

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file reflective_data_catalog-0.0.3-py3-none-any.whl.


File hashes

Hashes for reflective_data_catalog-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 26ff3dedb160891e2579f57bd27d5665841276981dcf5aa3c01cfdfde02bd58f
MD5 e6cb19f38abd0d2791d05b8a1e13c279
BLAKE2b-256 fbbb59727f4376e5daa5ce79f015b72f3c1c519588305788dfe7ac101f80714c


Provenance

The following attestation bundles were made for reflective_data_catalog-0.0.3-py3-none-any.whl:

Publisher: build-and-push.yml on ReflectiveCloud/reflective-data-catalog

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
