A scalable, multi-cloud task processing system

Introduction

Cloud Tasks (contained in the rms-cloud-tasks package) is a framework for running independent tasks on cloud providers with automatic compute instance and task queue management. It is designed specifically for running the same code many times in a batch environment, each time on a different input. For example, the program could be an image processing program that takes an image filename as an argument, downloads the image from the cloud, performs some manipulations, and writes the result to a cloud-based location. The tasks must be completely independent; no communication between them is supported. Processing also happens entirely in batch mode: a set number of compute instances is created, they all process tasks in parallel, and then the instances are destroyed.
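The batch model described above can be illustrated in plain Python. This is a conceptual sketch only, not the Cloud Tasks API: a fixed pool of workers (threads here, standing in for compute instances) is created, every independent task is processed in parallel, and the pool is torn down when the work runs out. The `process_image` and `run_batch` functions and the filenames are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(filename: str) -> str:
    """Hypothetical task: derive an output location from one input filename.

    A real task would download the image, manipulate it, and upload the
    result; here we only compute the output name to stay self-contained.
    The function depends on nothing but its argument, which is what makes
    the tasks independent of one another.
    """
    stem = filename.rsplit(".", 1)[0]
    return f"results/{stem}_processed.png"


def run_batch(filenames: list[str], num_workers: int = 4) -> dict[str, str]:
    # Create the worker pool, process every task in parallel, and shut the
    # pool down when the `with` block exits (analogous to destroying the
    # compute instances at the end of a batch run).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        outputs = pool.map(process_image, filenames)
        return dict(zip(filenames, outputs))


if __name__ == "__main__":
    print(run_batch(["img001.jpg", "img002.jpg", "img003.jpg"]))
```

Because no task reads another task's result, the workers never need to coordinate; the only shared structure is the list of inputs.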

rms-cloud-tasks is a product of the PDS Ring-Moon Systems Node.

Features

Cloud Tasks is extremely easy to use, with a simple command line interface and a straightforward configuration file. It supports AWS and GCP compute instances and queues, as well as running jobs on a local workstation, all through a provider-independent API. Although each cloud provider offers similar functionality as part of its own services (e.g., GCP's Cloud Run), Cloud Tasks is unique in that it unifies all supported providers into a single, simple, universal system that does not require learning the often-complicated details of the official full-featured services.

Cloud Tasks consists of four primary components:

  • A Python module to make parallel execution simple
    • Allows conversion of an existing Python program to a parallel task with only a few lines of code
    • Supports both cloud compute instance and local machine environments
    • Executes each task in its own process for complete isolation
    • Reads task information from a cloud-based task queue or directly from a local file
    • Monitors the state of spot instances to notify tasks of upcoming preemption
  • A command line interface to manage the task queue system, which allows
    • Loading of tasks from a JSON or YAML file
    • Checking the status of a queue
    • Purging a queue of remaining tasks
    • Deleting a queue entirely
  • A command line interface to query the cloud about available resources, given certain constraints
    • Types of compute instances available, including price (both on-demand and spot instances)
    • VM boot images available
    • Regions and zones
  • A command line interface to manage a pool of compute instances optimized for price, given certain constraints
    • Automatically finds the optimal compute instance type given pricing and other constraints
    • Automatically determines the number of simultaneous instances to use
    • Creates new instances and runs a specified startup script to execute the task manager
    • Monitors instances for failure or preemption and creates new instances as needed to keep the compute pool full
    • Detects when all jobs are complete and terminates the instances
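The per-task process isolation listed above can be illustrated with the standard library alone. This is a conceptual sketch, not the Cloud Tasks worker implementation: each hypothetical task (`TASK_CODE`, run by the hypothetical `run_task_isolated`) executes in a fresh Python interpreter via `subprocess.run`, so a task that raises an exception only produces a non-zero exit code and cannot disturb any other task.

```python
import subprocess
import sys

# Hypothetical task body. Each task runs it in a brand-new interpreter;
# a negative input makes it raise, simulating a crashing task.
TASK_CODE = """
import sys
value = int(sys.argv[1])
if value < 0:
    raise ValueError("negative input")
print(value * value)
"""


def run_task_isolated(arg: int, timeout: float = 60.0) -> tuple[bool, str]:
    # One OS process per task: a crash, memory leak, or global-state change
    # in this task cannot affect any other task or the caller.
    proc = subprocess.run(
        [sys.executable, "-c", TASK_CODE, str(arg)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # A non-zero exit code (e.g. from an uncaught exception) marks failure;
    # the traceback goes to stderr, so stdout holds only the task's result.
    return proc.returncode == 0, proc.stdout.strip()


if __name__ == "__main__":
    for arg in (3, -1, 4):
        ok, output = run_task_isolated(arg)
        print(f"task {arg}: {'ok' if ok else 'failed'} {output}")
```

The failing task in the middle of the loop does not prevent the tasks before or after it from completing.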

Installation

cloud_tasks consists of a command line interface (called cloud_tasks) and a Python module (also called cloud_tasks). They are available via the rms-cloud-tasks package on PyPI and can be installed with:

pip install rms-cloud-tasks

Note that this will install cloud_tasks into your current system Python, or into your currently activated virtual environment (venv), if any.
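To check which environment an installation is likely to target, you can ask the interpreter itself. This small sketch uses only the standard library: inside a venv, `sys.prefix` points at the venv while `sys.base_prefix` still points at the base Python. It assumes you run it with the same interpreter whose `pip` you will use (for example, the one behind `python -m pip`); `in_virtual_environment` is a hypothetical helper name.

```python
import sys


def in_virtual_environment() -> bool:
    # In a venv, sys.prefix points at the venv directory while
    # sys.base_prefix points at the base interpreter; outside a venv the
    # two are equal.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    where = "a virtual environment" if in_virtual_environment() else "the system Python"
    print(f"python -m pip would install into {where}: {sys.prefix}")
```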

If you already have the rms-cloud-tasks package installed but wish to upgrade to a more recent version, you can use:

pip install --upgrade rms-cloud-tasks

You may also install cloud_tasks using pipx, which will isolate the installation from your system Python without requiring the creation of a virtual environment. To install pipx, please see the installation instructions. Once pipx is available, you may install cloud_tasks with:

pipx install rms-cloud-tasks

If you already have the rms-cloud-tasks package installed with pipx, you may upgrade to a more recent version with:

pipx upgrade rms-cloud-tasks

Using pipx is only worthwhile if you want the command line interface and do not need to access the Python module, but it spares you from worrying about the Python version, setting up a virtual environment, and so on.

Basic Examples

The cloud_tasks command line program supports many useful commands that control the task queue and the compute instance pool, and that retrieve general information about the cloud, all in a provider-independent manner. A few examples are given below.

To get a list of available commands:

cloud_tasks --help

To get help on a particular command:

cloud_tasks load_queue --help

To list all ARM64-based compute instance types in the GCP us-central1 region that have 2 to 4 vCPUs and at most 4 GB of memory per vCPU:

cloud_tasks list_instance_types \
  --provider gcp --region us-central1 \
  --min-cpu 2 --max-cpu 4 --arch ARM64 --max-memory-per-cpu 4
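Conceptually, a query like the one above filters a catalog of instance types by the given constraints. The sketch below applies the same kinds of constraints as the flags above (vCPU range, architecture, memory per vCPU) to a small hypothetical catalog. The type names and prices in `EXAMPLE_CATALOG` are made up, and ranking by price per vCPU is just one plausible way to choose an "optimal" type, not necessarily the rule Cloud Tasks uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gb: float
    arch: str
    price_per_hour: float  # hypothetical USD per hour


# Hypothetical catalog for illustration only; not real provider data.
EXAMPLE_CATALOG = [
    InstanceType("arm-small", 2, 8, "ARM64", 0.10),
    InstanceType("arm-medium", 4, 16, "ARM64", 0.18),
    InstanceType("arm-highmem", 4, 32, "ARM64", 0.25),
    InstanceType("x86-small", 2, 4, "X86_64", 0.08),
    InstanceType("arm-large", 8, 32, "ARM64", 0.30),
]


def filter_instance_types(
    catalog: list[InstanceType],
    min_cpu: int,
    max_cpu: int,
    arch: str,
    max_memory_per_cpu: float,
) -> list[InstanceType]:
    # Keep only types inside the vCPU range, on the requested architecture,
    # and with no more than max_memory_per_cpu GB of memory per vCPU.
    matches = [
        it
        for it in catalog
        if min_cpu <= it.vcpus <= max_cpu
        and it.arch == arch
        and it.memory_gb / it.vcpus <= max_memory_per_cpu
    ]
    # Cheapest per vCPU first, so matches[0] is the best candidate under
    # this (assumed) pricing criterion.
    return sorted(matches, key=lambda it: it.price_per_hour / it.vcpus)
```

With `EXAMPLE_CATALOG`, `min_cpu=2`, `max_cpu=4`, `arch="ARM64"`, and `max_memory_per_cpu=4`, only `arm-medium` and `arm-small` survive, and `arm-medium` ranks first because it is cheaper per vCPU even though it costs more per hour.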

To load a JSON file containing task descriptions into the task queue:

cloud_tasks load_queue \
  --provider gcp --region us-central1 --project-id my-project \
  --job-id my-job --task-file mytasks.json
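A task file like mytasks.json is usually generated rather than written by hand. The authoritative schema is defined in the Cloud Tasks documentation; this sketch assumes a JSON list of objects, each holding a `task_id` and a task-specific `data` dictionary, so check the documentation before relying on it. The helper names `build_tasks`, `tasks_to_json`, and `write_task_file` are hypothetical.

```python
import json
from pathlib import Path


def build_tasks(filenames: list[str]) -> list[dict]:
    # ASSUMED schema: one object per task holding a "task_id" and a
    # task-specific "data" payload (here, the image file to process).
    # Zero-padded IDs derived from the position keep every ID distinct.
    return [
        {"task_id": f"task-{i:06d}", "data": {"filename": name}}
        for i, name in enumerate(filenames, start=1)
    ]


def tasks_to_json(filenames: list[str]) -> str:
    # Serialize the task list in a human-readable form.
    return json.dumps(build_tasks(filenames), indent=2)


def write_task_file(path: Path, filenames: list[str]) -> int:
    # Write the task file and report how many tasks it contains.
    path.write_text(tasks_to_json(filenames))
    return len(filenames)
```

For example, `write_task_file(Path("mytasks.json"), ["img001.jpg", "img002.jpg"])` would produce a two-task file to pass to `--task-file`, provided the assumed schema matches what `load_queue` expects.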

To start automatic creation and management of a compute instance pool:

cloud_tasks manage_pool --provider gcp --config myconfig.yaml
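The monitoring behavior described in the feature list (replacing failed or preempted instances and terminating the pool once all work is done) boils down to a simple per-cycle decision. The sketch below is a hypothetical illustration of that decision, not the logic of `manage_pool` itself; `PoolAction` and `plan_pool_action` are invented names, and the YAML configuration format is not shown here.

```python
from typing import NamedTuple


class PoolAction(NamedTuple):
    launch: int          # instances to create during this monitoring cycle
    terminate_all: bool  # True once every task has finished


def plan_pool_action(target_size: int, running: int, tasks_remaining: int) -> PoolAction:
    """One monitoring cycle of a hypothetical pool manager.

    tasks_remaining counts tasks that are still queued or in progress.
    """
    if tasks_remaining == 0:
        # All work is done: shut the pool down instead of replacing instances.
        return PoolAction(launch=0, terminate_all=running > 0)
    # Replace instances lost to failure or preemption so the pool stays at
    # its target size; never launch a negative number of instances.
    return PoolAction(launch=max(0, target_size - running), terminate_all=False)
```

A real manager would call a function like this in a loop, sleeping between cycles and querying the provider for the current instance count and the queue for the remaining tasks.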

Local Development

Setup

To set up a development environment:

  1. Clone the repository (if you have not already).
  2. From the project root, create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate   # On Windows: venv\Scripts\activate
    
  3. Install the package in editable/development mode and dependencies:
    pip install -e .
    pip install -r requirements.txt
    
    The same requirements.txt also includes the development tools (ruff, mypy, pytest, Sphinx), so no separate requirements-dev.txt is needed.

Running Checks

From the project root (with the virtual environment activated), run all checks with:

./scripts/run-all-checks.sh

Execution modes: -p / --parallel (default) runs code checks and docs build in parallel; -s / --sequential runs them one after the other. Use -c / --code to run only code checks (ruff, mypy, pytest), or -d / --docs to run only the Sphinx documentation build.

Code checks: ruff (check and format), mypy, and pytest. Docs: Sphinx.

Prerequisites: Activate the project venv and ensure ruff, mypy, pytest, and Sphinx are installed (e.g., pip install -r requirements.txt).

Example usage:

./scripts/run-all-checks.sh       # parallel: code + docs
./scripts/run-all-checks.sh -s    # sequential
./scripts/run-all-checks.sh -c    # code only (ruff, mypy, pytest)
./scripts/run-all-checks.sh -d    # docs only (Sphinx)

Contributing

Information on contributing to this package can be found in the Contributing Guide.

Licensing

This code is licensed under the Apache License v2.0.

Download files

Source Distribution

rms_cloud_tasks-0.2.0.tar.gz (385.1 kB)

Uploaded Source

Built Distribution

rms_cloud_tasks-0.2.0-py3-none-any.whl (131.5 kB)

Uploaded Python 3

File details

Details for the file rms_cloud_tasks-0.2.0.tar.gz.

File metadata

  • Download URL: rms_cloud_tasks-0.2.0.tar.gz
  • Size: 385.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rms_cloud_tasks-0.2.0.tar.gz
Algorithm Hash digest
SHA256 4282de73d937b2e67b3d49d9ff2f0b8daba83d7608fcdeeb3a84c2fc346c6d64
MD5 22c639fd31b985e4bcb3f4b72efae866
BLAKE2b-256 341a43b69592e872b0d90de56e650691bf072f916c8c81678874b10af2286ff0

File details

Details for the file rms_cloud_tasks-0.2.0-py3-none-any.whl.

File hashes

Hashes for rms_cloud_tasks-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6cc0851f7b3f5d6b2c16d1b1d21cef0e271c742c21bc2fe9ae9c114b769455af
MD5 26266ddbb3fbede6fb240cf61b10fcda
BLAKE2b-256 681547199d190c68467daee52817d9a9376c2f22972064418382d1df87e92661
