gflow - A lightweight, single-node job scheduler

gflow is a lightweight, single-node job scheduler written in Rust, inspired by Slurm. It is designed for efficiently managing and scheduling tasks, especially on machines with GPU resources.

Core Features

  • Daemon-based Scheduling: A persistent daemon (gflowd) manages the job queue and resource allocation.
  • Rich Job Submission: Supports dependencies, priorities, job arrays, and time limits via the gbatch command.
  • Time Limits: Set a maximum runtime for each job (similar to Slurm's --time) to prevent runaway processes.
  • Service and Job Control: Provides clear commands to inspect the scheduler state (ginfo), query the job queue (gqueue), and control job states (gcancel).
  • tmux Integration: Uses tmux for robust, background task execution and session management.
  • Output Logging: Automatic capture of job output to log files via tmux pipe-pane.
  • Simple Command-Line Interface: Offers a user-friendly and powerful set of command-line tools.

Component Overview

The gflow suite consists of several command-line tools:

  • gflowd: The scheduler daemon that runs in the background, managing jobs and resources.
  • ginfo: Displays scheduler and GPU information.
  • gbatch: Submits jobs to the scheduler, similar to Slurm's sbatch.
  • gqueue: Lists and filters jobs in the queue, similar to Slurm's squeue.
  • gcancel: Cancels jobs; its job-state management functions are intended for internal use.

Installation

Install via PyPI (Recommended)

Install gflow using pipx (recommended for CLI tools):

pipx install gflow

Or using uv:

uv tool install gflow

Or using pip:

pip install gflow

This will install pre-built binaries for Linux (x86_64, ARM64, ARMv7) with both GNU and MUSL libc support.

Quick Install Script (Linux x86_64)

Install gflow with a single command:

curl -fsSL https://gflow-releases.puqing.work/install.sh | sh

Or use GitHub:

curl -fsSL https://raw.githubusercontent.com/AndPuQing/gflow/main/install.sh | sh

This will download and install the latest release binaries to ~/.cargo/bin.

You can customize the installation directory by setting the GFLOW_INSTALL_DIR environment variable:

curl -fsSL https://gflow-releases.puqing.work/install.sh | GFLOW_INSTALL_DIR=/usr/local/bin sh
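
With a custom installation directory, the binaries are only found if that directory is on your PATH. The following is a minimal sketch of such a check. The helper name on_path is hypothetical and not part of gflow, and the fallback directory is the ~/.cargo/bin default mentioned above.

```shell
#!/bin/sh
# on_path DIR: succeed if DIR is one of the colon-separated entries in $PATH.
# Hypothetical helper for illustration; not part of gflow.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Use GFLOW_INSTALL_DIR if set, otherwise the default install location.
install_dir="${GFLOW_INSTALL_DIR:-$HOME/.cargo/bin}"
if on_path "$install_dir"; then
    echo "$install_dir is on PATH"
else
    echo "$install_dir is not on PATH; add it in your shell profile"
fi
```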

Install via cargo

cargo install gflow

Install via cargo (main branch)

cargo install --git https://github.com/AndPuQing/gflow.git --locked

This will install all the necessary binaries (gflowd, ginfo, gbatch, gqueue, gcancel, gjob).

Install via Conda

You can install gflow using Conda from the conda-forge channel:

conda install -c conda-forge gflow

Build Manually

  1. Clone the repository:

    git clone https://github.com/AndPuQing/gflow.git
    cd gflow
    
  2. Build the project:

    cargo build --release
    

    The executables will be available in the target/release/ directory.
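
Whichever installation method you choose, you may want to confirm that the tools are reachable before starting the daemon. The sketch below uses only the standard command -v lookup and the binary names from the Component Overview. The helper name missing_tools is hypothetical and not part of gflow.

```shell
#!/bin/sh
# missing_tools NAME...: print each NAME that is not found on PATH, one per line.
# Hypothetical helper for illustration; not part of gflow.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Binary names as listed in the Component Overview.
missing="$(missing_tools gflowd ginfo gbatch gqueue gcancel)"
if [ -z "$missing" ]; then
    echo "all gflow binaries found"
else
    # Unquoted on purpose so the names print on one line.
    echo "not found on PATH:" $missing
fi
```

If you built manually, the check reports every tool as missing until target/release/ is added to your PATH or the executables are copied somewhere on it.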

Quick Start

  1. Start the scheduler daemon:

    gflowd up
    

    Run this in a dedicated terminal or tmux session and leave it running. You can check its health at any time with gflowd status and inspect resources with ginfo.

  2. Submit a job: Create a script my_job.sh:

    #!/bin/bash
    echo "Starting job on GPU: $CUDA_VISIBLE_DEVICES"
    sleep 30
    echo "Job finished."
    

    Submit it using gbatch:

    gbatch --gpus 1 ./my_job.sh
    
  3. Check the job queue:

    gqueue
    

    You can also watch the queue update live: watch gqueue.

  4. Stop the scheduler:

    gflowd down
    

    This shuts down the daemon and cleans up the tmux session.

Usage Guide

Submitting Jobs with gbatch

gbatch provides flexible options for job submission.

  • Submit a command directly:

    gbatch --gpus 1 python train.py --epochs 10
    
  • Set a job name and priority:

    gbatch --gpus 1 --name "training-run-1" --priority 10 ./my_job.sh
    
  • Create a job that depends on another:

    # First job
    gbatch --gpus 1 --name "job1" ./job1.sh
    # Get job ID from gqueue, e.g., 123
    
    # Second job depends on the first
    gbatch --gpus 1 --name "job2" --depends-on 123 ./job2.sh
    
  • Set a time limit for a job:

    # 30-minute limit
    gbatch --time 30 python train.py
    
    # 2-hour limit (HH:MM:SS format)
    gbatch --time 2:00:00 python long_training.py
    
    # 5 minutes 30 seconds
    gbatch --time 5:30 python quick_task.py
    

    See docs/TIME_LIMITS.md for detailed documentation on time limits.
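
To make the three time formats above concrete, here is a small sketch that converts each one to seconds. It assumes only the forms shown in the examples (a bare number as minutes, MM:SS, and HH:MM:SS). The function name time_to_seconds is hypothetical, and this is not gflow's own parser; docs/TIME_LIMITS.md remains the authoritative description.

```shell
#!/bin/sh
# time_to_seconds SPEC: convert a time-limit string to seconds.
# Assumes the three forms shown in the examples: MINUTES, MM:SS, HH:MM:SS.
# Illustrative only; this is not gflow's own parser.
time_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        NF == 1 { print $1 * 60; next }                    # bare number: minutes
        NF == 2 { print $1 * 60 + $2; next }               # MM:SS
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }   # HH:MM:SS
        { exit 1 }                                         # anything else: fail
    '
}

time_to_seconds 30        # prints 1800
time_to_seconds 2:00:00   # prints 7200
time_to_seconds 5:30      # prints 330
```

The easy mistake is reading a bare 30 as seconds: under the convention shown above it means 30 minutes, while 5:30 means 5 minutes 30 seconds.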

Querying Jobs with gqueue

gqueue allows you to filter and format the job list.

  • Filter by job state:

    gqueue --states Running,Queued
    
  • Filter by job ID or name:

    gqueue --jobs 123,124
    gqueue --names "training-run-1"
    
  • Customize output format:

    gqueue --format "ID,Name,State,GPUs"
    

Configuration

gflowd can be customized through a configuration file. By default, the file is located at ~/.config/gflow/gflowd.toml.

Contributing

If you find a bug or have a feature request, feel free to open an issue or submit a pull request.

License

gflow is licensed under the MIT License. See LICENSE for more details.
