xhelio-cdaweb
NASA CDAWeb data access for heliophysics — browse missions, inspect parameters, fetch CDF data.
Works as a standalone Python library or as an MCP server for any MCP-compatible LLM client (Claude Desktop, Cursor, custom agents).
What's included
- 54 mission catalogs with 2500+ datasets — ACE, Parker Solar Probe, Solar Orbiter, Wind, MMS, THEMIS, GOES, Voyager, and more
- 2541 pre-built parameter metadata files from Master CDF skeletons: browse_parameters works instantly, no network required
- Automatic data validation: fetched CDF files are compared against Master CDF metadata to detect phantom (documented but missing from the data) and undocumented (present in the data but missing from the metadata) parameters
- Structured system prompts per mission — give an LLM full context about available instruments, datasets, and time coverage
Installation
# Library only
pip install xhelio-cdaweb
# With MCP server
pip install "xhelio-cdaweb[mcp]"
MCP Server
Configuration (Claude Desktop, Cursor, etc.)
{
  "mcpServers": {
    "cdaweb": {
      "command": "xhelio-cdaweb-mcp"
    }
  }
}
With custom cache directory:
{
  "mcpServers": {
    "cdaweb": {
      "command": "xhelio-cdaweb-mcp",
      "args": ["--cache-dir", "/path/to/cache"]
    }
  }
}
Or run directly:
xhelio-cdaweb-mcp
xhelio-cdaweb-mcp --cache-dir /path/to/cache
python -m cdawebmcp
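For clients that read a JSON config file, the server entry can also be generated programmatically. The following is a minimal sketch using only the standard library. The helper names `add_cdaweb_server` and `write_config` are hypothetical and not part of xhelio-cdaweb; the `mcpServers` layout, the `xhelio-cdaweb-mcp` command, and the `--cache-dir` flag are taken from the configuration examples in this section. The location of the config file depends on the client and is left to the caller.

```python
import json
from pathlib import Path


def add_cdaweb_server(config, cache_dir=None):
    """Return a copy of an MCP client config with a "cdaweb" server entry.

    Hypothetical helper for illustration; not part of xhelio-cdaweb.
    """
    entry = {"command": "xhelio-cdaweb-mcp"}
    if cache_dir is not None:
        entry["args"] = ["--cache-dir", str(cache_dir)]
    # Copy the existing servers so other entries are preserved untouched.
    servers = dict(config.get("mcpServers", {}))
    servers["cdaweb"] = entry
    return {**config, "mcpServers": servers}


def write_config(path, cache_dir=None):
    """Merge the entry into the JSON file at `path`, creating it if needed."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    updated = add_cdaweb_server(config, cache_dir)
    path.write_text(json.dumps(updated, indent=2) + "\n")
    return updated
```

Merging into the existing dictionary, rather than overwriting the file, keeps any other servers the client already has configured.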
Cache directory
All runtime data is stored under a single root directory. Defaults to ~/.cdawebmcp/.
Configure via --cache-dir (MCP server) or cdawebmcp.configure() (library):
import cdawebmcp
cdawebmcp.configure(cache_dir="/path/to/cache")
~/.cdawebmcp/                # or custom path via configure()
├── metadata/                # Master CDF parameter metadata (user-fetched, supplements bundled data)
├── cdf_cache/               # Downloaded CDF data files (permanent, reused across fetches)
│   └── ace/mfi/             # organized by mission/instrument path
│       └── ac_h2_mfi_2024.cdf
└── overrides/               # Validation sync results (append-only)
    └── ace/
        └── AC_H2_MFI.json
- metadata/: User-fetched parameter metadata. Checked before bundled metadata and Master CDF download.
- cdf_cache/: Permanent cache of downloaded CDF files. Once a CDF file is downloaded, it is never re-downloaded. Use manage_cache(action="clean", category="cdf_cache") to free disk space.
- overrides/: Validation results from comparing fetched data against metadata. Append-only, one JSON per dataset.
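The lookup order described for metadata/ (user-fetched first, then bundled, then a Master CDF download) can be pictured as a simple fallback chain. This is a sketch of the idea only, not the package's internal code: `resolve_metadata`, the `<dataset_id>.json` file naming, the separate bundled directory argument, and the `download` callback are all assumptions made for illustration.

```python
import json
from pathlib import Path


def resolve_metadata(dataset_id, user_dir, bundled_dir, download=None):
    """Return (source, metadata) from the first tier that has the dataset.

    Assumed layout: one "<dataset_id>.json" file per dataset in each directory.
    """
    for source, directory in (("user", user_dir), ("bundled", bundled_dir)):
        candidate = Path(directory) / f"{dataset_id}.json"
        if candidate.is_file():
            return source, json.loads(candidate.read_text())
    if download is None:
        raise LookupError(f"no local metadata for {dataset_id}")
    # Last resort: fetch over the network and keep a copy in the user tier,
    # so the next lookup is served locally.
    metadata = download(dataset_id)
    Path(user_dir).mkdir(parents=True, exist_ok=True)
    (Path(user_dir) / f"{dataset_id}.json").write_text(json.dumps(metadata))
    return "download", metadata
```

Passing the downloader in as a callback keeps the local tiers usable offline and makes the fallback easy to exercise without a network connection.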
Tools
| Tool | Description |
|---|---|
| browse_missions() | List all 54 CDAWeb missions with descriptions, dataset counts, and instruments |
| load_mission(mission_id) | Get the complete system prompt for a mission (role instructions + full dataset catalog) |
| browse_parameters(dataset_id) | Browse all variables in a dataset: name, type, units, description, plus validation status if available |
| fetch_data(dataset_id, parameters, start, stop, output_dir) | Download CDF data, write to file, return metadata + per-column stats (min, max, mean, std, nan_ratio) |
| manage_cache(action, ...) | Cache management: status, clean, refresh metadata, refresh time ranges, rebuild catalog |
Typical workflow
browse_missions → load_mission("ace") → browse_parameters("AC_H2_MFI") → fetch_data(...)
- Discover available missions
- Load a mission's full catalog and instructions
- Inspect dataset parameters to choose what to fetch
- Fetch data for a time range — returns file path + statistics
Python Library
from cdawebmcp.catalog import browse_missions
from cdawebmcp.prompts import build_mission_prompt
from cdawebmcp.metadata import browse_parameters
from cdawebmcp.fetch import fetch_data
# List all 54 missions
missions = browse_missions()
# Get mission-specific system prompt
prompt = build_mission_prompt("ace")
# Browse dataset parameters (instant — uses bundled metadata)
params = browse_parameters(dataset_id="AC_H2_MFI")
# Fetch data — returns DataFrames directly
result = fetch_data("AC_H2_MFI", ["Magnitude"], "2024-01-01", "2024-01-02")
mag = result["Magnitude"]
print(mag["data"]) # pandas DataFrame
print(mag["units"]) # "nT"
print(mag["stats"]) # per-column {min, max, mean, std, nan_ratio}
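The stats entry reports min, max, mean, std, and nan_ratio per column. As a rough illustration of what such a summary contains, here is a standard-library sketch. `column_stats` is a hypothetical helper, and details such as ignoring NaN values and using the population standard deviation are assumptions, not confirmed behavior of fetch_data.

```python
import math
import statistics


def column_stats(values):
    """Summarize one numeric column, ignoring NaN entries.

    Hypothetical helper: NaN handling and population std are assumptions.
    """
    values = list(values)
    valid = [v for v in values if not math.isnan(v)]
    nan_ratio = (len(values) - len(valid)) / len(values) if values else 0.0
    if not valid:
        # Nothing to summarize: report only how much of the column was NaN.
        return {"min": None, "max": None, "mean": None, "std": None,
                "nan_ratio": nan_ratio}
    return {
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.fmean(valid),
        "std": statistics.pstdev(valid),
        "nan_ratio": nan_ratio,
    }


# Made-up sample values, one of them missing.
stats = column_stats([4.0, float("nan"), 6.0, 8.0])
print(stats)
```

A nan_ratio near 1.0 is a quick signal that the requested interval is mostly fill values or data gaps, which is worth checking before analyzing the other statistics.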
Data validation
When fetch_data downloads CDF files, it automatically compares actual data variables against the bundled Master CDF metadata. Discrepancies are recorded in ~/.cdawebmcp/overrides/ and surfaced through browse_parameters:
- Phantom parameters — listed in metadata but absent from actual data files
- Undocumented parameters — present in data files but not in official metadata
This validation runs once per unique CDF source URL and builds an append-only archive with full provenance (source file, URL, timestamp).
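The phantom/undocumented classification amounts to two set differences between documented and observed variable names, and the append-only archive can be modeled as a list of records keyed by source URL. The sketch below illustrates this under stated assumptions: `compare_parameters`, `record_validation`, the record field names, the example URL, and the variable names other than Magnitude are hypothetical and do not reflect the package's actual override file schema.

```python
from datetime import datetime, timezone


def compare_parameters(documented, observed):
    """Classify variable names by comparing metadata against a data file."""
    documented, observed = set(documented), set(observed)
    return {
        "phantom": sorted(documented - observed),       # in metadata, not in file
        "undocumented": sorted(observed - documented),  # in file, not in metadata
    }


def record_validation(archive, source_url, source_file, documented, observed):
    """Append one provenance record per unique source URL; never rewrite old ones."""
    if any(entry["source_url"] == source_url for entry in archive):
        return False  # this URL was already validated
    archive.append({
        "source_url": source_url,
        "source_file": source_file,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **compare_parameters(documented, observed),
    })
    return True


# Illustrative inputs only: the URL and the non-Magnitude names are made up.
archive = []
url = "https://example.invalid/ac_h2_mfi_2024.cdf"
first = record_validation(archive, url, "ac_h2_mfi_2024.cdf",
                          ["Magnitude", "documented_only"],
                          ["Magnitude", "file_only"])
second = record_validation(archive, url, "ac_h2_mfi_2024.cdf",
                           ["Magnitude"], ["Magnitude"])
```

Skipping URLs that are already in the archive mirrors the "once per unique CDF source URL" behavior, and appending rather than updating keeps earlier records intact.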
Bundled data
| Data | Count | Description |
|---|---|---|
| Mission catalogs | 54 | Instruments, datasets, time coverage, PI info |
| Parameter metadata | 2541 | Variable names, types, units, fill values, sizes |
| Prompt templates | 2 | Generic role + CDAWeb-specific workflow instructions |
All bundled data ships with the package. No network access needed for browsing — only fetch_data requires a connection to CDAWeb.
Catalog updates
Rebuild from CDAWeb REST API:
# Rebuild mission catalogs
python -m cdawebmcp.scripts.build_catalog
python -m cdawebmcp.scripts.build_catalog --mission psp
python -m cdawebmcp.scripts.build_catalog --discover
# Rebuild parameter metadata from Master CDFs
python -m cdawebmcp.scripts.build_metadata
python -m cdawebmcp.scripts.build_metadata --mission psp
Development
pip install -e ".[dev]"
pytest tests/ -v
License
MIT