Reflective's Unified SAI Data Catalog
Reflective Data Catalog
Reflective's unified Python interface for accessing SAI (Stratospheric Aerosol Injection) climate model data across cloud providers (S3, GCS, Azure, Cloudflare R2), for use on the Reflective Cloud Hub. Note: at this time, most sources other than ESGF and ESM cannot be accessed outside the Reflective Cloud Hub due to technical limitations. We are working on storing the data in new locations and will update this page when that work is complete.
This is an Intake-like interface for browsing, searching, and loading SRM-related datasets in a unified manner. All available datasets, with more information on each, are listed in the Reflective Cloud Hub documentation. An example Jupyter notebook showing how to use the tool is also included.
Installation
```bash
pip install reflective-data-catalog
```
For development:
```bash
git clone https://github.com/ReflectiveCloud/reflective-data-catalog.git
cd reflective-data-catalog
pip install -e ".[dev]"
pre-commit install
```
This installs a pre-commit hook that automatically runs Ruff linting (with auto-fix) and formatting on every commit.
Quick Start
```python
from reflective_data_catalog import ReflectiveCatalog

rdc = ReflectiveCatalog()

# Load a dataset lazily with dask
ds = rdc.cesm2_waccm_g6_1p5k_hilla(variable='T').to_dask()

# Load a dataset into memory
ds = rdc.miroc_es2h_g6_1p5k_sai(variable='SurfT').read()
```
Available Sources
| Source | Description |
|---|---|
| cesm2_waccm_g6_1p5k_hilla | CESM2-WACCM G6-1.5K-HiLLA |
| cesm2_waccm_historical | CESM2-WACCM Historical |
| cesm2_waccm_ssp245 | CESM2-WACCM SSP2-4.5 |
| cesm2_waccm6_g6_1p5k_hilla | CESM2-WACCM6 G6-1.5K-HiLLA |
| e3smv3_g6_1p5k_hilla | E3SMv3 G6-1.5K-HiLLA |
| miroc_es2h_g6_1p5k_hilla | MIROC-ES2H G6-1.5K-HiLLA |
| miroc_es2h_g6_1p5k_sai | MIROC-ES2H G6-1.5K-SAI |
| ukesm1_g6_1p5k_hilla | UKESM1.1 G6-1.5K-HiLLA |
| ukesm1_ssp245 | UKESM1.1 SSP2-4.5 |
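Source names appear to mirror their descriptions (model, then experiment). To pick sources programmatically, a small lookup over the Available Sources table could look like the sketch below. `SOURCES` and `sources_matching` are hypothetical helpers written for illustration, not part of reflective-data-catalog; the mapping is simply copied from the table.

```python
# Hypothetical helper, not part of reflective-data-catalog.
# The mapping is copied from the Available Sources table.
SOURCES: dict[str, str] = {
    "cesm2_waccm_g6_1p5k_hilla": "CESM2-WACCM G6-1.5K-HiLLA",
    "cesm2_waccm_historical": "CESM2-WACCM Historical",
    "cesm2_waccm_ssp245": "CESM2-WACCM SSP2-4.5",
    "cesm2_waccm6_g6_1p5k_hilla": "CESM2-WACCM6 G6-1.5K-HiLLA",
    "e3smv3_g6_1p5k_hilla": "E3SMv3 G6-1.5K-HiLLA",
    "miroc_es2h_g6_1p5k_hilla": "MIROC-ES2H G6-1.5K-HiLLA",
    "miroc_es2h_g6_1p5k_sai": "MIROC-ES2H G6-1.5K-SAI",
    "ukesm1_g6_1p5k_hilla": "UKESM1.1 G6-1.5K-HiLLA",
    "ukesm1_ssp245": "UKESM1.1 SSP2-4.5",
}


def sources_matching(fragment: str) -> list[str]:
    """Return source names whose description contains `fragment` (case-insensitive)."""
    needle = fragment.lower()
    return sorted(name for name, desc in SOURCES.items() if needle in desc.lower())


# Example: every source labelled SSP2-4.5 in the table.
print(sources_matching("SSP2-4.5"))  # ['cesm2_waccm_ssp245', 'ukesm1_ssp245']
```

Matching on the description rather than the source name avoids guessing at abbreviations such as `ssp245` versus `SSP2-4.5`.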
Usage
Selecting Parameters
Each source accepts keyword arguments to select the table, variable, ensemble member, and other parameters:
```python
# Specify variable, table, and ensemble
ds = rdc.cesm2_waccm_g6_1p5k_hilla(
    variable='T',
    table='AMON',
    ensemble='r2'
).to_dask()

# MIROC sources support a variant parameter
ds = rdc.miroc_es2h_g6_1p5k_hilla(
    variable='SurfT',
    variant='G6-1.5K-SAI',
    ensemble='r01'
).to_dask()
```
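Because each source only offers certain variables, tables, and ensembles, it can help to check a selection before any data is requested. The sketch below shows one way to fail fast on a bad keyword. `SourceOptions` and `validate_selection` are hypothetical and do not describe how reflective-data-catalog validates parameters internally; the option values are illustrative only.

```python
from dataclasses import dataclass


# Hypothetical record of what a source offers; not the package's real data model.
@dataclass(frozen=True)
class SourceOptions:
    variables: frozenset[str]
    tables: frozenset[str]
    ensembles: frozenset[str]


def validate_selection(options: SourceOptions, **selection: str) -> dict[str, str]:
    """Check keyword selections against the available options before reading data."""
    allowed = {
        "variable": options.variables,
        "table": options.tables,
        "ensemble": options.ensembles,
    }
    for key, value in selection.items():
        if key not in allowed:
            raise TypeError(f"unknown parameter {key!r}; expected one of {sorted(allowed)}")
        if value not in allowed[key]:
            raise ValueError(f"{key}={value!r} is not available; choose from {sorted(allowed[key])}")
    return selection


# Illustrative options only (assumed values, not read from a real source).
opts = SourceOptions(
    variables=frozenset({"T", "PS"}),
    tables=frozenset({"AMON"}),
    ensembles=frozenset({"r1", "r2"}),
)
print(validate_selection(opts, variable="T", table="AMON", ensemble="r2"))
```

Raising before the load starts gives a readable error listing the valid choices instead of a failed remote read.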
Discovering Available Data
Each source provides discovery methods to explore what data is available:
```python
source = rdc.ukesm1_g6_1p5k_hilla()

# List available variables, ensembles, or tables
source.list_variables()
source.list_ensembles()
source.list_tables()

# Print a full summary
source.discover()
```
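Discovery of this kind can be answered by listing what is in storage. As an illustration only, the sketch below derives tables, variables, and ensembles from object-store keys, assuming a hypothetical `<table>/<variable>/<ensemble>.zarr/...` layout. The real sources may organize their data differently, and `summarize_keys` is not a package function.

```python
def summarize_keys(keys: list[str]) -> dict[str, list[str]]:
    """Collect tables, variables, and ensembles from keys shaped like
    '<table>/<variable>/<ensemble>.zarr/...' (an assumed layout)."""
    tables: set[str] = set()
    variables: set[str] = set()
    ensembles: set[str] = set()
    for key in keys:
        parts = key.split("/")
        # Skip anything that does not match the assumed layout.
        if len(parts) < 3 or not parts[2].endswith(".zarr"):
            continue
        tables.add(parts[0])
        variables.add(parts[1])
        ensembles.add(parts[2].removesuffix(".zarr"))
    return {
        "tables": sorted(tables),
        "variables": sorted(variables),
        "ensembles": sorted(ensembles),
    }


keys = [
    "AMON/T/r1.zarr/.zmetadata",
    "AMON/T/r2.zarr/.zmetadata",
    "AMON/PS/r1.zarr/.zmetadata",
    "README.md",
]
print(summarize_keys(keys))
# {'tables': ['AMON'], 'variables': ['PS', 'T'], 'ensembles': ['r1', 'r2']}
```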
Google Cloud CMIP6 / GeoMIP (intake-esm)
Access cloud-optimized Zarr data from the Google Cloud CMIP6 catalog:
```python
# Search and load in one step
datasets = rdc.esm.load(
    experiment_id=['G6sulfur', 'ssp245', 'ssp585'],
    variable_id='tas',
    table_id='Amon',
    require_all_on=['source_id', 'institution_id'],
)

# Or use the GeoMIP convenience helper
datasets = rdc.geomip_cloud.load_ensemble(
    experiments=['G6sulfur', 'ssp245', 'ssp585'],
    variable='tas',
)

# Quick single-experiment load
ds_dict = rdc.geomip_cloud.g6sulfur(variable='tas')

# Explore what's available
rdc.geomip_cloud.list_models()
rdc.geomip_cloud.list_variables(experiment_id='G6sulfur')
rdc.geomip_cloud.summary()

# Advanced: direct search then load
subset = rdc.esm.search(
    experiment_id='G6sulfur',
    variable_id=['tas', 'pr'],
    table_id='Amon',
)
datasets = subset.to_dataset_dict()
```
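`require_all_on` is an intake-esm search option: matching catalog rows are grouped by the listed columns, and a group is kept only if it contains every combination of the other requested values. In the `rdc.esm.load` example, that means only models providing `tas` for all three experiments remain. The plain-Python sketch below reproduces that rule on a toy list of rows to show what gets filtered. It is a simplified illustration of intake-esm's behavior, not its implementation, and the model and institution names are placeholders.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import product


def search_require_all_on(
    rows: list[dict[str, str]],
    query: dict[str, str | list[str]],
    require_all_on: list[str],
) -> list[dict[str, str]]:
    """Simplified model of an intake-esm search with require_all_on."""
    # Treat a single string like a one-element list (intake-esm accepts both).
    wanted = {k: [v] if isinstance(v, str) else list(v) for k, v in query.items()}

    # 1. Ordinary search: a row matches if each queried column holds a requested value.
    matches = [r for r in rows if all(r[k] in vals for k, vals in wanted.items())]

    # 2. Group the matches by the require_all_on columns (here: model + institution).
    groups: dict[tuple[str, ...], list[dict[str, str]]] = defaultdict(list)
    for r in matches:
        groups[tuple(r[c] for c in require_all_on)].append(r)

    # 3. Keep a group only if it has every combination of the other requested values.
    facets = [k for k in wanted if k not in require_all_on]
    required = set(product(*(wanted[k] for k in facets)))
    kept: list[dict[str, str]] = []
    for members in groups.values():
        present = {tuple(r[k] for k in facets) for r in members}
        if required <= present:
            kept.extend(members)
    return kept


def row(model: str, institution: str, experiment: str) -> dict[str, str]:
    return {"source_id": model, "institution_id": institution,
            "experiment_id": experiment, "variable_id": "tas", "table_id": "Amon"}


# Toy catalog: MODEL-A has all three experiments, MODEL-B lacks ssp585.
catalog = [
    row("MODEL-A", "INST-A", "G6sulfur"),
    row("MODEL-A", "INST-A", "ssp245"),
    row("MODEL-A", "INST-A", "ssp585"),
    row("MODEL-B", "INST-B", "G6sulfur"),
    row("MODEL-B", "INST-B", "ssp245"),
]
result = search_require_all_on(
    catalog,
    {"experiment_id": ["G6sulfur", "ssp245", "ssp585"], "variable_id": "tas", "table_id": "Amon"},
    ["source_id", "institution_id"],
)
print(sorted({r["source_id"] for r in result}))  # ['MODEL-A']
```

Dropping incomplete models up front is what makes cross-experiment comparisons (for example G6sulfur against ssp585) line up model by model.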
ESGF Data
The catalog also provides access to ESGF (Earth System Grid Federation) data:
```python
ds = rdc.esgf.geomip.g6sulfur(model='UKESM1-0-LL', variable='tas')
```
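ESGF identifies each CMIP6 dataset by a dot-separated ID whose fields follow the CMIP6 Data Reference Syntax: mip_era, activity_id, institution_id, source_id, experiment_id, member_id, table_id, variable_id, grid_label. The `g6sulfur` call supplies only some of these facets (the model, presumably mapped to source_id, and the variable). The sketch below builds and parses such IDs to show how the facets fit together. The helper names are hypothetical, the example facet values are illustrative assumptions rather than a looked-up dataset, and the version suffix ESGF appends to instance IDs is omitted.

```python
# Field order of a CMIP6 dataset ID (without the trailing version).
CMIP6_DRS_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label",
)


def build_dataset_id(**facets: str) -> str:
    """Join facet values in DRS order; every facet is required."""
    missing = [name for name in CMIP6_DRS_FACETS if name not in facets]
    if missing:
        raise ValueError(f"missing facets: {missing}")
    return ".".join(facets[name] for name in CMIP6_DRS_FACETS)


def parse_dataset_id(dataset_id: str) -> dict[str, str]:
    """Split a version-less CMIP6 dataset ID back into named facets."""
    parts = dataset_id.split(".")
    if len(parts) != len(CMIP6_DRS_FACETS):
        raise ValueError(f"expected {len(CMIP6_DRS_FACETS)} fields, got {len(parts)}")
    return dict(zip(CMIP6_DRS_FACETS, parts))


# Illustrative facet values (assumed, not looked up on ESGF).
dataset_id = build_dataset_id(
    mip_era="CMIP6", activity_id="GeoMIP", institution_id="MOHC",
    source_id="UKESM1-0-LL", experiment_id="G6sulfur", member_id="r1i1p1f2",
    table_id="Amon", variable_id="tas", grid_label="gn",
)
print(dataset_id)  # CMIP6.GeoMIP.MOHC.UKESM1-0-LL.G6sulfur.r1i1p1f2.Amon.tas.gn
```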
Running Tests
Run the full test suite:
```bash
pytest
```
Run with coverage report:
```bash
pytest --cov=reflective_data_catalog --cov-report=term-missing
```
Run a specific test file:
```bash
pytest tests/test_flexible_sources.py
```
Tests mock all external services (S3, ESGF, intake-esm) so no network access or cloud credentials are required.
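Keeping tests offline usually means replacing the network client with a stand-in. The sketch below uses the standard library's `unittest.mock` to show the pattern. `load_variable` and its `client.get_metadata` call are hypothetical and do not mirror the package's real internals or its actual test suite.

```python
from unittest import mock


def load_variable(client, source: str, variable: str) -> dict[str, str]:
    """Hypothetical loader that talks to storage only through `client`."""
    meta = client.get_metadata(source, variable)
    if not meta:
        raise KeyError(f"{variable!r} not found in {source!r}")
    return {"source": source, "variable": variable, "units": meta["units"]}


def test_load_variable_without_network() -> None:
    # The Mock stands in for an S3/ESGF/intake-esm client; no request is made.
    fake_client = mock.Mock()
    fake_client.get_metadata.return_value = {"units": "K"}

    result = load_variable(fake_client, "ukesm1_ssp245", "tas")

    fake_client.get_metadata.assert_called_once_with("ukesm1_ssp245", "tas")
    assert result == {"source": "ukesm1_ssp245", "variable": "tas", "units": "K"}


def test_missing_variable_raises() -> None:
    fake_client = mock.Mock()
    fake_client.get_metadata.return_value = None
    try:
        load_variable(fake_client, "ukesm1_ssp245", "not_a_var")
    except KeyError:
        return
    raise AssertionError("expected KeyError")


if __name__ == "__main__":
    test_load_variable_without_network()
    test_missing_variable_raises()
    print("ok")
```

Passing the client in as an argument keeps the fake simple; where a module creates its client internally, `mock.patch` on that attribute achieves the same isolation.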
Requirements
- Python >= 3.11
- intake >= 2.0.0
- intake-esm >= 2025.2.3
- intake-esgf >= 2025.5.9
- xarray >= 2025.01.0
- obstore >= 0.8.0
License
Apache 2.0
Download files
File details
Details for the file reflective_data_catalog-0.0.1.tar.gz.
File metadata
- Download URL: reflective_data_catalog-0.0.1.tar.gz
- Upload date:
- Size: 62.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d7a4104f8dbe5a70af48aa7195dcbbc7ff057850571f1b04c964fa4f295c05f4 |
| MD5 | 6d3a875f7e1dfdeda160e699690297a0 |
| BLAKE2b-256 | 0b730231dd30e00f25b3db3a1da49c994493e2f4addf999f654fb4642790dbf7 |
Provenance
The following attestation bundles were made for reflective_data_catalog-0.0.1.tar.gz:
Publisher: build-and-push.yml on ReflectiveCloud/reflective-data-catalog
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: reflective_data_catalog-0.0.1.tar.gz
- Subject digest: d7a4104f8dbe5a70af48aa7195dcbbc7ff057850571f1b04c964fa4f295c05f4
- Sigstore transparency entry: 1068781375
- Sigstore integration time:
- Permalink: ReflectiveCloud/reflective-data-catalog@f537f6feda673b978e7d490498e53ea2e5a0b0c1
- Branch / Tag: refs/tags/0.0.1
- Owner: https://github.com/ReflectiveCloud
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-and-push.yml@f537f6feda673b978e7d490498e53ea2e5a0b0c1
- Trigger Event: push
File details
Details for the file reflective_data_catalog-0.0.1-py3-none-any.whl.
File metadata
- Download URL: reflective_data_catalog-0.0.1-py3-none-any.whl
- Upload date:
- Size: 42.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7af2b74ab091f0ade69533c120ebc0db72b053c1c9112e9ce7029df89e0f579a |
| MD5 | a55911103698edee22a7108321f52880 |
| BLAKE2b-256 | 6c909b3ad0a93a1505246e55419f4c50feb89895c20e5fbbebebce624c09d0b2 |
Provenance
The following attestation bundles were made for reflective_data_catalog-0.0.1-py3-none-any.whl:
Publisher: build-and-push.yml on ReflectiveCloud/reflective-data-catalog
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: reflective_data_catalog-0.0.1-py3-none-any.whl
- Subject digest: 7af2b74ab091f0ade69533c120ebc0db72b053c1c9112e9ce7029df89e0f579a
- Sigstore transparency entry: 1068781416
- Sigstore integration time:
- Permalink: ReflectiveCloud/reflective-data-catalog@f537f6feda673b978e7d490498e53ea2e5a0b0c1
- Branch / Tag: refs/tags/0.0.1
- Owner: https://github.com/ReflectiveCloud
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-and-push.yml@f537f6feda673b978e7d490498e53ea2e5a0b0c1
- Trigger Event: push