Concurrent CDS API downloader with TUI and script mode

cdsswarm

Concurrent CDS API downloader with an interactive Textual TUI and script mode.

Submit multiple CDS API requests and download them in parallel with a configurable number of workers. Monitor progress through an interactive terminal UI with an htop-style worker table, or run headless in script mode for CI/cron jobs.

[Screenshots: Workers tab, Files tab]

Installation

pip install cdsswarm

For YAML request file support:

pip install "cdsswarm[yaml]"

For development (tests, pre-commit):

git clone https://github.com/bgiebl/cdsswarm.git
cd cdsswarm
pip install -e ".[dev]"

Prerequisites

A valid CDS API configuration file at ~/.cdsapirc:

url: https://cds.climate.copernicus.eu/api
key: <your-uid>:<your-api-key>

See the CDS API documentation for setup instructions.
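
Before starting a large batch, it can help to confirm that the configuration file exists and contains both entries. The following is a minimal sketch, not part of cdsswarm: the helper names `read_cdsapirc` and `missing_settings` are hypothetical, and the parser assumes simple `name: value` lines like the ones shown above.

```python
from pathlib import Path


def read_cdsapirc(path: Path) -> dict[str, str]:
    """Parse simple 'name: value' lines from a CDS API config file."""
    settings: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, since URLs and keys contain colons.
        name, sep, value = line.partition(":")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def missing_settings(path: Path) -> list[str]:
    """Return the required entries ('url', 'key') that are absent or empty."""
    if not path.is_file():
        return ["url", "key"]
    settings = read_cdsapirc(path)
    return [name for name in ("url", "key") if not settings.get(name)]


if __name__ == "__main__":
    missing = missing_settings(Path.home() / ".cdsapirc")
    print("missing entries:", missing or "none")
```

This only checks that the entries are present; it does not verify that the key is accepted by the CDS server.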

Quick Start

Command Line

Create a request file requests.json:

[
  {
    "dataset": "reanalysis-era5-single-levels",
    "request": {
      "product_type": ["reanalysis"],
      "variable": ["2m_temperature"],
      "year": ["2024"],
      "month": ["01"],
      "day": ["01", "02", "03"],
      "time": ["12:00"],
      "data_format": "grib"
    },
    "target": "temperature_jan.grib"
  },
  {
    "dataset": "reanalysis-era5-single-levels",
    "request": {
      "product_type": ["reanalysis"],
      "variable": ["total_precipitation"],
      "year": ["2024"],
      "month": ["01"],
      "day": ["01", "02", "03"],
      "time": ["12:00"],
      "data_format": "grib"
    },
    "target": "precipitation_jan.grib"
  }
]

Run with 4 workers:

cdsswarm requests.json --workers 4
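
For larger batches, writing the request file by hand gets tedious. The sketch below generates a list-format file with one entry per month using only the standard library. The function names and the target naming scheme are illustrative assumptions, not part of cdsswarm.

```python
import json
from pathlib import Path


def build_requests(variable: str, year: str, months: list[str]) -> list[dict]:
    """Build one list-format entry per month for a single ERA5 variable."""
    entries = []
    for month in months:
        entries.append(
            {
                "dataset": "reanalysis-era5-single-levels",
                "request": {
                    "product_type": ["reanalysis"],
                    "variable": [variable],
                    "year": [year],
                    "month": [month],
                    "day": ["01", "02", "03"],
                    "time": ["12:00"],
                    "data_format": "grib",
                },
                # Hypothetical naming scheme for the output files.
                "target": f"{variable}_{year}_{month}.grib",
            }
        )
    return entries


def write_requests(path: Path, entries: list[dict]) -> None:
    """Write the entries as pretty-printed JSON."""
    path.write_text(json.dumps(entries, indent=2) + "\n")
```

Calling `write_requests(Path("requests.json"), build_requests("2m_temperature", "2024", ["01", "02", "03"]))` would produce a file that can be passed to the command above.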

Python API

import cdsswarm

tasks = [
    cdsswarm.Task(
        dataset="reanalysis-era5-single-levels",
        request={
            "product_type": ["reanalysis"],
            "variable": ["2m_temperature"],
            "year": ["2024"],
            "month": ["01"],
            "day": ["01", "02", "03"],
            "time": ["12:00"],
            "data_format": "grib",
        },
        target="temperature_jan.grib",
    ),
    cdsswarm.Task(
        dataset="reanalysis-era5-single-levels",
        request={
            "product_type": ["reanalysis"],
            "variable": ["total_precipitation"],
            "year": ["2024"],
            "month": ["01"],
            "day": ["01", "02", "03"],
            "time": ["12:00"],
            "data_format": "grib",
        },
        target="precipitation_jan.grib",
    ),
]

results = cdsswarm.download(tasks, num_workers=4)

for r in results:
    if r.success:
        print(f"Downloaded {r.task.target}")
    else:
        print(f"Failed {r.task.target}: {r.error}")
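
The loop above can be turned into a small summary step, for example to decide on an exit code in a batch job. This sketch relies only on the `task`, `success`, and `error` fields used above; the helper names are hypothetical, and the `SimpleNamespace` objects are stand-ins so the snippet runs without cdsswarm or network access.

```python
from types import SimpleNamespace


def failed_tasks(results):
    """Return the tasks whose download did not succeed, in input order."""
    return [r.task for r in results if not r.success]


def summarize(results):
    """Count outcomes and map each failed target to its error message."""
    return {
        "total": len(results),
        "succeeded": sum(1 for r in results if r.success),
        "errors": {r.task.target: r.error for r in results if not r.success},
    }


# Stand-in results; in real use these would come from cdsswarm.download().
sample = [
    SimpleNamespace(task=SimpleNamespace(target="a.grib"), success=True, error=""),
    SimpleNamespace(task=SimpleNamespace(target="b.grib"), success=False, error="timeout"),
]
print(summarize(sample))
```

The list returned by `failed_tasks(results)` could be passed to `cdsswarm.download()` again for a second pass.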

CLI Reference

usage: cdsswarm [-h] [--version] [-w WORKERS] [-m {interactive,script,auto}]
                [--no-skip] [--reuse | --no-reuse] [--max-retries MAX_RETRIES]
                [--output-dir OUTPUT_DIR] [--dry-run] [--log FILE]
                [--summary FILE]
                requests_file

Argument              Description
requests_file         Path to a JSON or YAML file with download requests
-w, --workers         Number of parallel download workers (default: 4)
-m, --mode            Display mode: interactive (TUI), script (plain text), or auto (default)
--no-skip             Re-download files that already exist on disk
--reuse / --no-reuse  Reuse existing CDS jobs with matching parameters (default: enabled)
--max-retries         Max retry attempts per task (default: 3, 1 to disable)
--output-dir          Prepend directory to relative target paths
--dry-run             Show what would be downloaded without actually downloading
--log FILE            Write timestamped log to a file
--summary FILE        Export summary as JSON (.json) or CSV (.csv)
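
One plausible reading of the --output-dir description is that absolute target paths are left alone and only relative ones are prefixed. The sketch below shows that rule with pathlib. It is an assumption about the behavior, not cdsswarm's actual code, and `resolve_target` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def resolve_target(target: str, output_dir: Optional[str]) -> Path:
    """Prefix relative targets with output_dir; leave absolute targets as-is."""
    path = Path(target)
    if output_dir is None or path.is_absolute():
        return path
    return Path(output_dir) / path
```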

In auto mode, the TUI is used when stdout is a TTY; otherwise it falls back to script mode.
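
The auto-mode rule can be sketched in a few lines. This is an illustration of the described behavior rather than cdsswarm's implementation; `resolve_mode` is a hypothetical helper, and the TTY check uses the standard `sys.stdout.isatty()`.

```python
import sys


def resolve_mode(mode: str, stdout_is_tty: bool) -> str:
    """Map the --mode value to the display mode that would actually be used."""
    if mode not in ("interactive", "script", "auto"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode != "auto":
        return mode
    # auto: use the TUI on a terminal, plain text when output is redirected.
    return "interactive" if stdout_is_tty else "script"


if __name__ == "__main__":
    print(resolve_mode("auto", sys.stdout.isatty()))
```

Under this rule, piping the output or running from cron selects script mode without any extra flag.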

Request File Format

List format

Each entry specifies its own dataset:

[
  {
    "dataset": "reanalysis-era5-single-levels",
    "request": { ... },
    "target": "output1.grib"
  },
  {
    "dataset": "reanalysis-era5-pressure-levels",
    "request": { ... },
    "target": "output2.grib"
  }
]

Compact format

Share a dataset across all requests:

{
  "dataset": "reanalysis-era5-single-levels",
  "requests": [
    { "request": { ... }, "target": "output1.grib" },
    { "request": { ... }, "target": "output2.grib" }
  ]
}
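
The two formats carry the same information. As a sketch of how they relate, the hypothetical helper below expands a compact document into the list format by copying the shared dataset into every entry. It is not taken from cdsswarm's loader.

```python
def expand_compact(doc):
    """Convert a compact-format document to the equivalent list format."""
    if isinstance(doc, list):
        return doc  # already in list format
    dataset = doc["dataset"]
    return [
        {"dataset": dataset, "request": entry["request"], "target": entry["target"]}
        for entry in doc["requests"]
    ]
```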

YAML

Both formats also work in YAML (requires pip install "cdsswarm[yaml]"):

dataset: reanalysis-era5-single-levels
requests:
  - request:
      product_type: [reanalysis]
      variable: [2m_temperature]
      year: ["2024"]
      month: ["01"]
      day: ["01"]
      time: ["12:00"]
      data_format: grib
    target: temperature.grib

The request dict accepts the same parameters as cdsapi.Client.retrieve().

Python API Reference

cdsswarm.Task(dataset, request, target)

A single CDS API download request.

Field    Type  Description
dataset  str   CDS dataset name (e.g. "reanalysis-era5-single-levels")
request  dict  Request parameters, same format as cdsapi.Client.retrieve()
target   str   Local file path to save the downloaded data

cdsswarm.download(tasks, num_workers=4, skip_existing=True, reuse_jobs=True, max_retries=3, on_message=None)

Download multiple CDS API requests concurrently.

Parameter      Type        Default   Description
tasks          list[Task]  required  List of download tasks
num_workers    int         4         Number of parallel workers
skip_existing  bool        True      Skip files that already exist
reuse_jobs     bool        True      Reuse existing CDS jobs with matching parameters
max_retries    int         3         Max retry attempts per task (1 to disable)
on_message     callable    None      Callback fn(message: str) for status updates

Returns a list[Result]. Returns an empty list if interrupted by KeyboardInterrupt.
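
As an illustration of the on_message parameter, the sketch below builds a callback that timestamps each status message and appends it to a list. `make_collector` is a hypothetical helper. Whether cdsswarm invokes the callback from worker threads is not stated here, so the sketch sticks to `list.append`, which is safe to call from multiple threads in CPython.

```python
from datetime import datetime


def make_collector(sink: list):
    """Return a callback that stores timestamped status messages in sink."""

    def on_message(message: str) -> None:
        stamp = datetime.now().strftime("%H:%M:%S")
        sink.append(f"[{stamp}] {message}")

    return on_message


# Intended use (requires cdsswarm and valid CDS credentials):
#   log = []
#   results = cdsswarm.download(tasks, on_message=make_collector(log))
```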

cdsswarm.Result

Field    Type  Description
task     Task  The original task
success  bool  Whether the download succeeded
error    str   Error message (empty on success)
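
When using the Python API, a summary similar in spirit to the CLI's --summary option can be assembled from these fields. The sketch below writes one CSV row per result. The column names are an assumption and may not match the CLI's export, and the `SimpleNamespace` objects stand in for real results.

```python
import csv
import io
from types import SimpleNamespace


def results_to_csv(results) -> str:
    """Render results as CSV text with one row per task."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["target", "success", "error"])  # hypothetical column names
    for r in results:
        writer.writerow([r.task.target, r.success, r.error])
    return buffer.getvalue()


# Stand-in results; in real use these would come from cdsswarm.download().
demo = [
    SimpleNamespace(task=SimpleNamespace(target="a.grib"), success=True, error=""),
    SimpleNamespace(task=SimpleNamespace(target="b.grib"), success=False, error="timeout"),
]
print(results_to_csv(demo), end="")
```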

TUI

The interactive TUI (terminal user interface) is built with Textual and is available via the CLI only. It shows an htop-style DataTable with one row per worker:

W  │Status      │Prog │Filename          │Started  │Elapsed  │Size    │DL %   │Request ID
0  │ running    │72%  │era5_2024_01.grib │22:31:24 │2h30m05s │15.2 GB│48%    │af1e2306-28c3...
1  │ successful │100% │era5_2024_02.nc   │22:31:25 │1h15m00s │8.1 GB │100% ✓ │b2f4a891-...

The layout has two tabs (Workers and Files), an info panel above the table, and a progress footer with an overall progress bar and ETA.

Key bindings:

Key      Action
q        Quit
t / Tab  Switch tab
Enter    Open scrollable log for the selected worker
a        Show full request parameters
Esc      Dismiss screen / go back
Ctrl+C   Cancel; in-flight CDS API requests are cancelled on the server

Running Tests

pip install -e ".[dev]"
pytest -v

License

MIT
