Earthkit Workflows is a Python library for declaring earthkit task DAGs, as well as scheduling and executing them on heterogeneous computing systems.

Installation | Quick Start | Documentation

[!IMPORTANT] This software is Emerging and subject to ECMWF's guidelines on Software Maturity.

earthkit-workflows is a Python library for declaring earthkit tasks as DAGs. It contains an internal engine, cascade, for scheduling and executing task graphs almost optimally across heterogeneous platforms with complex network technologies and topologies. It performs task-based parallelism across CPUs, GPUs, distributed systems (HPC), and any combination thereof. It is designed for a no-IO approach, in which expensive storage of intermediate data is minimised while maximal use is made of all available transport technologies between different hardware.

Cascade is designed to work on well-profiled task graphs, where:

  • the task graph is a static DAG,
  • the DAG nodes are defined by tasks with well-known execution times,
  • the DAG edges are defined by data dependencies with well-known data sizes,
  • the characteristics of the hardware (processors, network connections) are known.
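To make the four assumptions above concrete, here is a minimal sketch of what a "well-profiled" graph could look like as plain data. It is not cascade's internal representation or API: the `ProfiledGraph` class, its field names, and the single-bandwidth network model are all hypothetical, chosen only to show how known runtimes and data sizes let a scheduler estimate finish times before anything runs.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ProfiledGraph:
    """Hypothetical container for a static DAG plus its profiling data."""

    # task name -> expected execution time in seconds
    runtimes: dict[str, float]
    # (producer, consumer) -> bytes passed along that data dependency
    edge_bytes: dict[tuple[str, str], int] = field(default_factory=dict)

    def predecessors(self) -> dict[str, set[str]]:
        """Map every task to the set of tasks it depends on."""
        preds: dict[str, set[str]] = {task: set() for task in self.runtimes}
        for producer, consumer in self.edge_bytes:
            preds[consumer].add(producer)
        return preds

    def earliest_finish(self, bandwidth_bytes_per_s: float) -> dict[str, float]:
        """Estimate finish times assuming unlimited workers and that every
        edge pays a transfer cost over a link of the given bandwidth."""
        preds = self.predecessors()
        finish: dict[str, float] = {}
        # static_order() yields each task only after all of its predecessors
        # and raises graphlib.CycleError if the graph is not a DAG.
        for task in TopologicalSorter(preds).static_order():
            ready = max(
                (
                    finish[p] + self.edge_bytes[(p, task)] / bandwidth_bytes_per_s
                    for p in preds[task]
                ),
                default=0.0,
            )
            finish[task] = ready + self.runtimes[task]
        return finish


graph = ProfiledGraph(
    runtimes={"load": 2.0, "mean": 1.0, "std": 3.0, "merge": 0.5},
    edge_bytes={
        ("load", "mean"): 100,
        ("load", "std"): 100,
        ("mean", "merge"): 50,
        ("std", "merge"): 50,
    },
)
finish = graph.earliest_finish(bandwidth_bytes_per_s=100.0)
print(finish["merge"])  # 7.0
```

A real scheduler would also account for a finite number of workers and for edges that stay on the same device and so cost nothing to transfer; the point here is only that all inputs to such an estimate are known up front.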

earthkit-workflows allows for declaring such task graphs using a neat fluent API, and interoperates pleasantly with the rest of the earthkit ecosystem.

Installation

Install via pip with:

$ pip install 'earthkit-workflows[all]'
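To confirm that the install landed in the environment you expect, one option is to query the installed distribution metadata from Python. This is a generic standard-library check, not something specific to earthkit-workflows; the helper name `installed_version` is made up for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed in this environment, else None.
print(installed_version("earthkit-workflows"))
```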

For development, you can use pip install -e . instead, though there is currently an issue with earthkit masking. You may also want to install pre-commit hooks via:

$ pip install pre-commit
$ pre-commit install

Quick Start

Note: this section is moderately outdated.

We support two regimes for cascade execution -- local mode (ideal for developing and debugging small graphs) and distributed mode (intended for Slurm and HPC).

To launch in local mode, run the following in your Python REPL or Jupyter notebook:

import cascade.benchmarks.job1 as j1
import cascade.benchmarks.distributed as di
import cloudpickle

spec = di.ZmqClusterSpec.local(j1.get_prob())
print(spec.controller.outputs)
# prints out:
# {DatasetId(task='mean:dc9d90 ...
# defaults to all "sinks", but can be overridden

rv = di.launch_from_specs(spec, None)

for key, value in rv.outputs.items():
    deser = cloudpickle.loads(value)
    print(f"output {key} is of type {type(deser)}")
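The loop above deserializes each output payload and inspects its type. The same pattern can be sketched as a small reusable helper that runs without a cascade installation. Everything here is illustrative: `describe_outputs` is a hypothetical name, the keys are plain strings rather than real DatasetId objects, and the standard-library pickle stands in for cloudpickle; with real outputs you would pass `cloudpickle.loads` as the `loads` argument, as in the snippet above.

```python
import pickle
from typing import Any, Callable, Mapping


def describe_outputs(
    outputs: Mapping[Any, bytes],
    loads: Callable[[bytes], Any] = pickle.loads,
) -> dict[Any, str]:
    """Deserialize each payload and report its type name, keyed like the input."""
    return {key: type(loads(payload)).__name__ for key, payload in outputs.items()}


# Stand-in for rv.outputs, built by hand so that the sketch is self-contained.
fake_outputs = {"mean": pickle.dumps(1.5), "labels": pickle.dumps(["a", "b"])}
summary = describe_outputs(fake_outputs)
print(summary)  # {'mean': 'float', 'labels': 'list'}
```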

For distributed mode, launch:

./scripts/launch_slurm.sh ./localConfigs/<your_config.sh>

Inside <your_config.sh>, you define the size of the cluster, the logging output directory, which job to run, and so on. Pay special attention to the definitions of your venv, LD_LIBRARY_PATH, etc. -- these are not set up automatically.
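Because the environment is not set up for you, a quick pre-flight check before submitting a job can catch a missing venv or library path early. The sketch below is an assumption about what such a check might look like, not part of the launch scripts: `missing_env` is a hypothetical helper, and the list of required variables is a guess that you should adapt to whatever your config file actually exports. VIRTUAL_ENV is the variable that a venv's activate script sets.

```python
import os
from typing import Mapping, Sequence


def missing_env(
    env: Mapping[str, str],
    required: Sequence[str] = ("VIRTUAL_ENV", "LD_LIBRARY_PATH"),
) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


problems = missing_env(os.environ)
if problems:
    print("unset or empty:", ", ".join(problems))
else:
    print("environment looks complete")
```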

Both of these examples hardcode a particular job, "job1", which is a benchmark. Most likely, you will want to define your own -- in local mode, just pass a cascade.Graph instance to the call; in distributed mode, you need to provide that instance in the cascade.benchmarks.__main__ module instead (ideally by extending the get_job function).

There is also python -m cascade.benchmarks local <..> -- you may use it as an alternative route to local mode, for your own end-to-end tests.

Documentation

Not yet available.

Contributions and Support

Due to the maturity and status of the project, no support is provided -- unless the project is used within some higher-status initiative that ECMWF participates in. External contributions and issues will be looked at, but are not guaranteed to be accepted or responded to. In general, follow ECMWF's guidelines for external contributions.

License

See license.
