
A CLI tool for quickly launching Kubernetes jobs on EIDF


kblaunch


A CLI tool for launching Kubernetes jobs with environment variable and secret management.

Installation

pip install kblaunch

Or using uv:

uv add kblaunch

You can also use uvx to run the CLI without installing it:

uvx kblaunch --help

Usage

Setup

Run the setup command to configure the tool (email and Slack webhook):

kblaunch setup

This will go through the following steps:

  1. Set the user (optional): The username identifies you on the cluster, which requires it. The default is $USER.
  2. Set the email (required): The email also identifies you and is required by the cluster.
  3. Set up Slack notifications (optional): This sends a test message to the webhook and saves the webhook in the config. When your job starts, you will receive a message at the webhook.
  4. Set up a PVC (optional): This creates a PVC for you to use in your jobs.
  5. Set the default PVC to use (optional): Note that only one pod can use the PVC at a time.

Basic Usage

Launch a simple job:

kblaunch launch \
    --job-name myjob \
    --command "python script.py"

With Environment Variables

  1. From local environment:

    export PATH=...
    export OPENAI_API_KEY=...
    # pass the environment variables to the job
    kblaunch launch \
        --job-name myjob \
        --command "python script.py" \
        --local-env-vars PATH \
        --local-env-vars OPENAI_API_KEY
    
  2. From Kubernetes secrets:

    kblaunch launch \
        --job-name myjob \
        --command "python script.py" \
        --secrets-env-vars mysecret1 \
        --secrets-env-vars mysecret2
    
  3. From .env file (default behavior):

    kblaunch launch \
        --job-name myjob \
        --command "python script.py" \
        --load-dotenv
    

    If a .env file exists in the current directory, it is loaded and its contents are passed as environment variables to the job.
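When many local variables need forwarding, repeating `--local-env-vars` by hand gets tedious. The following is a minimal sketch of one way to generate the repeated flags from a list of names. The helper `build_env_flags` is hypothetical and not part of kblaunch, and the sketch only prints the resulting command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of kblaunch): emit one
# "--local-env-vars NAME" pair for each variable name given.
build_env_flags() {
    flags=""
    for name in "$@"; do
        # Append a separating space only when flags is already non-empty.
        flags="${flags:+$flags }--local-env-vars $name"
    done
    printf '%s\n' "$flags"
}

# Print the command for inspection; the variable names match the
# local-environment example earlier in this section.
printf 'kblaunch launch --job-name myjob --command "python script.py" %s\n' \
    "$(build_env_flags PATH OPENAI_API_KEY)"
```

The printed line can be reviewed and then pasted into the terminal, which keeps the list of forwarded variables in one place.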

GPU Jobs

Specify GPU requirements:

kblaunch launch \
    --job-name gpu-job \
    --command "python train.py" \
    --gpu-limit 2 \
    --gpu-product "NVIDIA-A100-SXM4-80GB"
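A mistyped product name is easy to miss, so it may help to check the value before submitting. The sketch below compares a name against the product names listed under Launch Options in this README. The function `is_known_gpu_product` is a hypothetical helper, not a kblaunch feature, and the list may not match what a given cluster actually offers.

```shell
#!/bin/sh
# Hypothetical helper (not part of kblaunch): succeed only for product
# names that appear in this README's list of available options.
is_known_gpu_product() {
    case "$1" in
        NVIDIA-A100-SXM4-80GB) return 0 ;;
        NVIDIA-A100-SXM4-40GB) return 0 ;;
        NVIDIA-A100-SXM4-40GB-MIG-3g.20gb) return 0 ;;
        NVIDIA-A100-SXM4-40GB-MIG-1g.5gb) return 0 ;;
        NVIDIA-H100-80GB-HBM3) return 0 ;;
        *) return 1 ;;
    esac
}

product="NVIDIA-A100-SXM4-80GB"
if is_known_gpu_product "$product"; then
    # Print the launch command rather than running it.
    printf 'kblaunch launch --job-name gpu-job --command "python train.py" --gpu-limit 2 --gpu-product "%s"\n' "$product"
else
    printf 'unknown GPU product: %s\n' "$product" >&2
fi
```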

Interactive Mode

Launch an interactive job:

kblaunch launch \
    --job-name interactive \
    --interactive

Launch Options

Launch command options:

  • --email: User email (overrides config)
  • --job-name: Name of the Kubernetes job [required]
  • --docker-image: Docker image (default: "nvcr.io/nvidia/cuda:12.0.0-devel-ubuntu22.04")
  • --namespace: Kubernetes namespace (default: "informatics")
  • --queue-name: Kueue queue name (default: "informatics-user-queue")
  • --interactive: Run in interactive mode (default: False)
  • --command: Command to run in the container [required if not interactive]
  • --cpu-request: CPU request (default: "1")
  • --ram-request: RAM request (default: "8Gi")
  • --gpu-limit: GPU limit (default: 1)
  • --gpu-product: GPU product type (default: "NVIDIA-A100-SXM4-40GB")
    • Available options:
      • NVIDIA-A100-SXM4-80GB
      • NVIDIA-A100-SXM4-40GB
      • NVIDIA-A100-SXM4-40GB-MIG-3g.20gb
      • NVIDIA-A100-SXM4-40GB-MIG-1g.5gb
      • NVIDIA-H100-80GB-HBM3
  • --secrets-env-vars: List of secret environment variables (default: [])
  • --local-env-vars: List of local environment variables (default: [])
  • --load-dotenv: Load environment variables from .env file (default: True)
  • --nfs-server: NFS server address
  • --pvc-name: Persistent Volume Claim name
  • --dry-run: Print job YAML without creating it (default: False)
  • --priority: Priority class name (default: "default")
    • Available options: default, batch, short
  • --vscode: Install VS Code CLI in container (default: False)
  • --tunnel: Start VS Code SSH tunnel on startup (requires SLACK_WEBHOOK and --vscode)
  • --startup-script: Path to startup script to run in container
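Because `--dry-run` prints the job YAML without creating the job, a cautious workflow is to preview first and submit second. The sketch below assembles such a preview command from the options listed above. The wrapper `dry_run_command` is hypothetical, and the option values are illustrative assumptions rather than recommended settings.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of kblaunch): print a launch command
# with --dry-run appended so the job YAML can be inspected first.
# Display only: arguments containing spaces are printed without quotes.
dry_run_command() {
    printf 'kblaunch launch'
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf ' --dry-run\n'
}

# Illustrative values; --priority batch is one of the listed options.
dry_run_command --job-name myjob --command ./train.sh --priority batch
```

Once the printed YAML looks right, the same command can be rerun without `--dry-run` to create the job.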

Monitor command options:

  • --namespace: Kubernetes namespace (default: "informatics")

Monitoring Commands

The kblaunch monitor command provides several subcommands to monitor cluster resources:

Displays aggregate GPU statistics for the cluster:

kblaunch monitor gpus

Displays queued jobs (jobs which are waiting for GPUs):

kblaunch monitor queue

Displays per-user statistics:

kblaunch monitor users

Displays per-job statistics:

kblaunch monitor jobs

Note that the users and jobs commands run nvidia-smi on pods to obtain GPU usage, so they are not recommended for frequent use.
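For periodic checks, one option is a small polling loop that refuses the subcommands flagged above. This is a sketch only: `is_cheap_monitor` and `poll_monitor` are hypothetical helpers, and it assumes that gpus and queue are the lightweight subcommands, which the note implies but does not state outright.

```shell
#!/bin/sh
# Hypothetical guard (not part of kblaunch): accept only the subcommands
# that are assumed not to run nvidia-smi on pods.
is_cheap_monitor() {
    case "$1" in
        gpus|queue) return 0 ;;
        *) return 1 ;;
    esac
}

# Run "kblaunch monitor <sub>" <count> times, sleeping <interval> seconds
# between runs. Refuses subcommands that is_cheap_monitor rejects.
poll_monitor() {
    sub="$1"; count="$2"; interval="$3"
    if ! is_cheap_monitor "$sub"; then
        printf 'refusing to poll "%s": it may run nvidia-smi on pods\n' "$sub" >&2
        return 1
    fi
    i=0
    while [ "$i" -lt "$count" ]; do
        kblaunch monitor "$sub"
        i=$((i + 1))
        # Skip the sleep after the final run.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (not executed here): check the queue 5 times, once a minute.
# poll_monitor queue 5 60
```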

Features

  • Kubernetes job management
  • Environment variable handling from multiple sources
  • Kubernetes secrets integration
  • GPU job support
  • Interactive mode
  • Automatic job cleanup
  • Slack notifications (when configured)
  • Persistent Volume Claim (PVC) management
  • VS Code integration (with Code tunnelling support)
  • Monitoring commands

