Concurrent CDS API downloader with TUI and script mode

cdsswarm

Concurrent CDS API downloader with an interactive Textual TUI and script mode.

Submit multiple CDS API requests and download them in parallel with a configurable number of workers. Monitor progress through an interactive terminal UI with an htop-style worker table, or run headless in script mode for CI/cron jobs.

Feedback welcome! This project is under active development. If you have suggestions, feature requests, or run into any issues, please open an issue on GitHub or send an email to b.giebl@protonmail.com.

Installation

pip install cdsswarm

For YAML request file support:

pip install "cdsswarm[yaml]"

For development (tests, pre-commit):

git clone https://github.com/bgiebl/cdsswarm.git
cd cdsswarm
pip install -e ".[dev]"

Prerequisites

cdsswarm requires a valid CDS API configuration file at ~/.cdsapirc:

url: https://cds.climate.copernicus.eu/api
key: <your-uid>:<your-api-key>

See the CDS API documentation for setup instructions.

Quick Start

Command Line

Create a request file requests.json:

[
  {
    "dataset": "reanalysis-era5-single-levels",
    "request": {
      "product_type": ["reanalysis"],
      "variable": ["2m_temperature"],
      "year": ["2024"],
      "month": ["01"],
      "day": ["01", "02", "03"],
      "time": ["12:00"],
      "data_format": "grib"
    },
    "target": "temperature_jan.grib"
  },
  {
    "dataset": "reanalysis-era5-single-levels",
    "request": {
      "product_type": ["reanalysis"],
      "variable": ["total_precipitation"],
      "year": ["2024"],
      "month": ["01"],
      "day": ["01", "02", "03"],
      "time": ["12:00"],
      "data_format": "grib"
    },
    "target": "precipitation_jan.grib"
  }
]

Run with 4 workers:

cdsswarm requests.json --workers 4

Python API

import cdsswarm

tasks = [
    cdsswarm.Task(
        dataset="reanalysis-era5-single-levels",
        request={
            "product_type": ["reanalysis"],
            "variable": ["2m_temperature"],
            "year": ["2024"],
            "month": ["01"],
            "day": ["01", "02", "03"],
            "time": ["12:00"],
            "data_format": "grib",
        },
        target="temperature_jan.grib",
    ),
    cdsswarm.Task(
        dataset="reanalysis-era5-single-levels",
        request={
            "product_type": ["reanalysis"],
            "variable": ["total_precipitation"],
            "year": ["2024"],
            "month": ["01"],
            "day": ["01", "02", "03"],
            "time": ["12:00"],
            "data_format": "grib",
        },
        target="precipitation_jan.grib",
    ),
]

results = cdsswarm.download(tasks, num_workers=4)

for r in results:
    if r.success:
        print(f"Downloaded {r.task.target}")
    else:
        print(f"Failed {r.task.target}: {r.error}")

CLI Reference

usage: cdsswarm [-h] [--version] [-w WORKERS] [-m {interactive,script,auto}]
                [--no-skip] [--resume | --no-resume] [--reuse | --no-reuse]
                [--max-retries MAX_RETRIES] [--output-dir OUTPUT_DIR]
                [--dry-run] [--ignore-warnings] [--log FILE] [--summary FILE]
                [--post-hook CMD]
                requests_file
Argument                  Description
requests_file             Path to a JSON or YAML file with download requests
-w, --workers             Number of parallel download workers (default: 4)
-m, --mode                Display mode: interactive (TUI), script (plain text), or auto (default)
--no-skip                 Re-download files that already exist on disk
--resume / --no-resume    Resume an interrupted session if state file exists (default: enabled)
--reuse / --no-reuse      Reuse existing CDS jobs with matching parameters (default: enabled)
--max-retries             Max retry attempts per task (default: 3, 1 to disable)
--output-dir              Prepend directory to relative target paths
--dry-run                 Show what would be downloaded without actually downloading
--ignore-warnings         Auto-continue on warnings (e.g. checksum mismatch) without prompting
--log FILE                Write timestamped log to a file
--summary FILE            Export summary as JSON (.json) or CSV (.csv)
--post-hook CMD           Shell command to run after each successful download (see below)

In auto mode, the TUI is used when stdout is a TTY; otherwise it falls back to script mode.
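
The auto fallback can be pictured with a short sketch. It only illustrates the rule stated above and is not taken from the cdsswarm source; the helper name resolve_mode is hypothetical.

```python
import sys


def resolve_mode(mode: str, stdout_is_tty: bool) -> str:
    """Hypothetical helper: map 'auto' to a concrete display mode."""
    if mode != "auto":
        # An explicit 'interactive' or 'script' choice is used as given.
        return mode
    # auto: use the TUI only when stdout is attached to a terminal.
    return "interactive" if stdout_is_tty else "script"


# When output is piped or redirected (e.g. under cron), isatty() is False.
print(resolve_mode("auto", sys.stdout.isatty()))
```

Passing the TTY check in as an argument keeps the helper easy to test without a real terminal.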

Post-download hooks

The --post-hook option runs a shell command after each file is successfully downloaded. Use {file} and {dataset} as placeholders:

# Compress each file after download
cdsswarm requests.json --post-hook "gzip {file}"

# Convert GRIB to NetCDF with CDO
cdsswarm requests.json --post-hook "cdo -f nc copy {file} {file}.nc"

# Upload to S3
cdsswarm requests.json --post-hook "aws s3 cp {file} s3://my-bucket/cds/"

Hook failures produce a warning but do not mark the download as failed — the file is already on disk.
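
To make the placeholder substitution concrete, here is a minimal sketch of how a hook command could be rendered before it is handed to the shell. The helper render_hook is hypothetical, and the shell quoting is an assumption added for safety; the README does not say whether cdsswarm quotes the substituted values.

```python
import re
import shlex


def render_hook(command: str, file: str, dataset: str) -> str:
    """Hypothetical helper: fill {file} and {dataset} in a hook command."""
    values = {"file": file, "dataset": dataset}
    # Quoting is an assumption of this sketch, so that a path containing
    # spaces still reaches the command as a single argument.
    return re.sub(
        r"\{(file|dataset)\}",
        lambda match: shlex.quote(values[match.group(1)]),
        command,
    )


print(render_hook("gzip {file}", "out/t2m jan.grib", "reanalysis-era5-single-levels"))
```

A single regular-expression pass is used so that a value which itself contains the text {dataset} is not substituted a second time.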

Request generation

The generate subcommand expands a template file into a full request file using Cartesian product expansion:

cdsswarm generate template.json -o requests.json
cdsswarm generate template.json --dry-run          # preview without writing

The template file must contain a single JSON object (not a list). If you pass a single-element list [{...}], it will be auto-unwrapped with a warning.

A template looks like a single request with a split_by field that lists which dimensions to expand:

{
  "dataset": "reanalysis-era5-single-levels",
  "request": {
    "product_type": ["reanalysis"],
    "variable": ["2m_temperature", "total_precipitation"],
    "year": ["2023", "2024"],
    "month": ["01", "02", "03"],
    "day": ["01", "02", "03"],
    "time": ["12:00"],
    "data_format": "grib"
  },
  "target": "output/{variable}_{year}_{month}.grib",
  "split_by": ["variable", "year", "month"]
}

This generates 2 × 2 × 3 = 12 separate tasks, one for each combination of variable, year, and month. Non-split fields (day, time, etc.) are shared across all tasks. The {placeholder} syntax in target fills in each combination's values.
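
The expansion can be sketched in a few lines of Python with itertools.product. This is an illustration of the idea under stated assumptions, not the code behind cdsswarm generate: the function name expand is hypothetical, and it assumes each split field is emitted as a single-element list in the generated request.

```python
import itertools


def expand(template: dict) -> list[dict]:
    """Expand a template into one task per combination of the split_by fields."""
    fields = template["split_by"]
    request = template["request"]
    tasks = []
    for combo in itertools.product(*(request[f] for f in fields)):
        chosen = dict(zip(fields, combo))
        # Assumption: split fields become single-element lists;
        # all other fields are copied unchanged into every task.
        task_request = {**request, **{f: [v] for f, v in chosen.items()}}
        tasks.append(
            {
                "dataset": template["dataset"],
                "request": task_request,
                "target": template["target"].format(**chosen),
            }
        )
    return tasks


template = {
    "dataset": "reanalysis-era5-single-levels",
    "request": {
        "product_type": ["reanalysis"],
        "variable": ["2m_temperature", "total_precipitation"],
        "year": ["2023", "2024"],
        "month": ["01", "02", "03"],
        "day": ["01", "02", "03"],
        "time": ["12:00"],
        "data_format": "grib",
    },
    "target": "output/{variable}_{year}_{month}.grib",
    "split_by": ["variable", "year", "month"],
}

tasks = expand(template)
print(len(tasks))          # 12
print(tasks[0]["target"])  # output/2m_temperature_2023_01.grib
```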

Option               Description
--split-by FIELDS    Override the template's split_by (comma-separated)
-o, --output FILE    Output file path (default: stdout)
--dry-run            Show task count and target filenames without writing output

Cancelling requests

The cancel subcommand cancels active CDS API requests on the server — useful for cleaning up after a crashed session or accidental submissions:

cdsswarm cancel                        # cancel all queued/running requests (new API only)
cdsswarm cancel abc-123 def-456        # cancel specific request IDs (both APIs)
cdsswarm cancel --yes                  # skip confirmation prompt

When no IDs are given, cdsswarm queries the CDS server for all active (accepted/running) requests and presents them for confirmation before cancelling. This requires the new CADS API (ecmwf-datastores). With the old cdsapi, you must provide specific request IDs.

Option         Description
request_ids    Specific request IDs to cancel (omit to cancel all active)
-y, --yes      Skip confirmation prompt

Session resume

cdsswarm automatically saves session state after each task completes. If a download session is interrupted (e.g. by Ctrl+C or a network failure), rerunning the same command picks up where it left off — completed tasks are skipped and failed/pending tasks are retried.

State files are stored in ~/.cache/cdsswarm/sessions/ (or under $XDG_CACHE_HOME when that variable is set), keyed by request file path and output directory.

cdsswarm requests.json -w 4             # interrupted — 50 of 100 tasks done
cdsswarm requests.json -w 4             # resumes from task 51
cdsswarm requests.json -w 4 --no-resume # force a fresh start
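
One way to picture a state file that is keyed by request file path and output directory is the sketch below. The function name session_state_path, the SHA-256 hashing scheme, and the .json file name are all assumptions made for illustration; the actual naming used by cdsswarm is not documented here.

```python
import hashlib
import os
from pathlib import Path


def session_state_path(requests_file: str, output_dir: str, env=None) -> Path:
    """Hypothetical: derive a state file path from request file and output dir."""
    env = os.environ if env is None else env
    # Fall back to ~/.cache when XDG_CACHE_HOME is unset or empty.
    cache_home = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    # Assumed scheme: hash the two values the session is keyed by.
    key = f"{Path(requests_file).resolve()}|{Path(output_dir).resolve()}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(cache_home) / "cdsswarm" / "sessions" / f"{digest}.json"


print(session_state_path("requests.json", "."))
```

Because the key is derived only from the two paths, rerunning the same command maps to the same state file, while a different output directory starts a separate session.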

Configuration file

Settings can be stored in a .cdsswarm.toml file instead of passing CLI flags every time. CLI flags always take precedence.

Location                              Scope
~/.cdsswarm.toml                      User-global defaults
.cdsswarm.toml (working directory)    Project-level overrides

Example .cdsswarm.toml:

workers = 8
max-retries = 5
mode = "script"
output-dir = "/data/downloads"
post-hook = "gzip {file}"

All CLI flags are supported as config keys (use hyphens, e.g. max-retries, post-hook, skip-existing).
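
The precedence order (user-global defaults, then project-level overrides, then CLI flags) amounts to a layered dictionary merge. The sketch below illustrates that order only; merge_settings is a hypothetical name, and treating None as "flag not passed" is an assumption of this example.

```python
def merge_settings(user_global: dict, project: dict, cli: dict) -> dict:
    """Later sources win: user-global < project-level < CLI flags."""
    merged = {**user_global, **project}
    # Assumption: a CLI value of None means the flag was not passed,
    # so it must not override a value from a config file.
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged


user_global = {"workers": 8, "mode": "script"}
project = {"workers": 2, "output-dir": "/data/downloads"}
cli = {"workers": None, "max-retries": 5, "mode": "interactive"}

settings = merge_settings(user_global, project, cli)
print(settings)
```

Here workers comes from the project file, mode from the command line, and output-dir from the project file because no other layer sets it.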

Request File Format

List format

Each entry specifies its own dataset:

[
  {
    "dataset": "reanalysis-era5-single-levels",
    "request": { ... },
    "target": "output1.grib"
  },
  {
    "dataset": "reanalysis-era5-pressure-levels",
    "request": { ... },
    "target": "output2.grib"
  }
]

Compact format

Share a dataset across all requests:

{
  "dataset": "reanalysis-era5-single-levels",
  "requests": [
    { "request": { ... }, "target": "output1.grib" },
    { "request": { ... }, "target": "output2.grib" }
  ]
}
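
The two layouts carry the same information. As a rough illustration (not cdsswarm's own loader; the function name normalize is hypothetical), the compact format can be turned into the list format by copying the shared dataset into every entry:

```python
def normalize(data):
    """Return list-format entries for either list or compact input."""
    if isinstance(data, list):
        return data  # already in list format
    shared_dataset = data["dataset"]
    return [
        {
            "dataset": shared_dataset,
            "request": entry["request"],
            "target": entry["target"],
        }
        for entry in data["requests"]
    ]


compact = {
    "dataset": "reanalysis-era5-single-levels",
    "requests": [
        {"request": {"year": ["2024"]}, "target": "output1.grib"},
        {"request": {"year": ["2023"]}, "target": "output2.grib"},
    ],
}

for entry in normalize(compact):
    print(entry["dataset"], entry["target"])
```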

YAML

Both formats also work in YAML (requires pip install "cdsswarm[yaml]"):

dataset: reanalysis-era5-single-levels
requests:
  - request:
      product_type: [reanalysis]
      variable: [2m_temperature]
      year: ["2024"]
      month: ["01"]
      day: ["01"]
      time: ["12:00"]
      data_format: grib
    target: temperature.grib

The request dict accepts the same parameters as cdsapi.Client.retrieve().

Python API Reference

cdsswarm.Task(dataset, request, target)

A single CDS API download request.

Field      Type    Description
dataset    str     CDS dataset name (e.g. "reanalysis-era5-single-levels")
request    dict    Request parameters, same format as cdsapi.Client.retrieve()
target     str     Local file path to save the downloaded data

cdsswarm.download(tasks, num_workers=4, skip_existing=True, reuse_jobs=True, max_retries=3, on_message=None, post_hook="")

Download multiple CDS API requests concurrently.

Parameter        Type          Default     Description
tasks            list[Task]    required    List of download tasks
num_workers      int           4           Number of parallel workers
skip_existing    bool          True        Skip files that already exist
reuse_jobs       bool          True        Reuse existing CDS jobs with matching parameters
max_retries      int           3           Max retry attempts per task (1 to disable)
on_message       callable      None        Callback fn(message: str) for status updates
post_hook        str           ""          Shell command to run after each successful download ({file}, {dataset})

Returns a list[Result]. Returns an empty list if interrupted by KeyboardInterrupt.
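
As an example of the on_message parameter, the sketch below builds a callback with the documented fn(message: str) shape that timestamps each status message and keeps it in a list. The helper name make_collector and the sample message text are made up for illustration; the wording of real status messages is not specified here.

```python
from datetime import datetime


def make_collector(log: list):
    """Build a callback matching the documented fn(message: str) shape."""

    def on_message(message: str) -> None:
        stamp = datetime.now().strftime("%H:%M:%S")
        log.append(f"[{stamp}] {message}")

    return on_message


messages: list = []
callback = make_collector(messages)

# Stand-in call with an invented message, in place of one made by cdsswarm.
callback("worker 0: started")
print(messages[0])

# Intended use (needs cdsswarm installed and CDS credentials configured):
# results = cdsswarm.download(tasks, num_workers=4, on_message=callback)
```

If the callback may be invoked from several workers at once, a queue or a lock around shared state would be the cautious choice.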

cdsswarm.Result

Field      Type    Description
task       Task    The original task
success    bool    Whether the download succeeded
error      str     Error message (empty on success)
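
A small helper can turn the returned list into a summary of what succeeded and what failed. The sketch relies only on the three documented fields (task, success, error) and uses stand-in objects so it runs without cdsswarm; the function name summarize is hypothetical.

```python
from types import SimpleNamespace


def summarize(results) -> dict:
    """Split results into succeeded targets and a target -> error mapping."""
    ok = [r.task.target for r in results if r.success]
    failed = {r.task.target: r.error for r in results if not r.success}
    return {"ok": ok, "failed": failed}


# Stand-ins carrying the documented fields; real code would pass the list
# returned by cdsswarm.download() instead.
fake_results = [
    SimpleNamespace(task=SimpleNamespace(target="a.grib"), success=True, error=""),
    SimpleNamespace(task=SimpleNamespace(target="b.grib"), success=False, error="timeout"),
]

summary = summarize(fake_results)
print(summary)
```

Since download() returns an empty list when interrupted, an empty summary does not necessarily mean there was nothing to download.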

TUI

The interactive TUI (terminal user interface) is built with Textual and is available via the CLI only. It shows an htop-style DataTable with one row per worker:

W  │Status      │Prog │Filename          │Started  │Elapsed  │Size    │DL %   │Request ID
0  │ running    │72%  │era5_2024_01.grib │22:31:24 │2h30m05s │15.2 GB │48%    │af1e2306-28c3...
1  │ successful │100% │era5_2024_02.nc   │22:31:25 │1h15m00s │8.1 GB  │100% ✓ │b2f4a891-...

The layout has two tabs (Workers and Files), an info panel above the table, and a progress footer with an overall progress bar and ETA.

Key bindings:

Key        Action
q          Quit
t / Tab    Switch tab
Enter      Open scrollable log for the selected worker
a          Show full request parameters
Esc        Dismiss screen / go back
Ctrl+C     Cancel (in-flight CDS API requests are cancelled on the server)

Running Tests

pip install -e ".[dev]"
pytest -v

License

MIT
