A no-nonsense job launcher for distributed workloads on Kubernetes
Seekr Chain
No-nonsense job launcher
Currently supported backends:
- Argo Workflows
Why Seekr-Chain?
Seekr-chain aims to make it as easy as possible to get your jobs running, without getting in your way. Chain's philosophy is simple.
- A job consists of a DAG of steps
- A step consists of:
  - image
  - command
  - resources
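The job model above can be sketched in a few lines of Python. This is an illustration only, not seekr-chain's config schema: the `Step` class, the `depends_on` field, and the `execution_order` helper are hypothetical names chosen for this example, and the image and command values are placeholders.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    """Hypothetical step record holding the three fields listed above."""
    image: str
    command: list[str]
    resources: dict[str, str] = field(default_factory=dict)
    # Assumed way to express DAG edges: names of steps that must finish first.
    depends_on: list[str] = field(default_factory=list)


def execution_order(steps: dict[str, Step]) -> list[str]:
    """Return step names in an order that respects every dependency."""
    graph = {name: set(step.depends_on) for name, step in steps.items()}
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(graph).static_order())


job = {
    "prepare": Step(image="python:3.12", command=["python", "prepare.py"]),
    "train": Step(
        image="python:3.12",
        command=["torchrun", "train.py"],
        resources={"gpus": "8"},
        depends_on=["prepare"],
    ),
    "evaluate": Step(
        image="python:3.12",
        command=["python", "eval.py"],
        depends_on=["train"],
    ),
}

print(execution_order(job))  # ['prepare', 'train', 'evaluate']
```

Treating the job as a plain mapping of step names to records keeps the DAG check independent of any backend: a cycle is rejected before anything is submitted.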
Vs Metaflow:
- Code directory:
  - `chain` gives you full control over exactly which directory (and which files) are uploaded into your job runtime
  - `metaflow` only allows uploading code in the current directory, and is cumbersome to include/exclude specific files
- Runtime:
  - `chain` provides full control over the job runtime. You define your image, and you can run any command you want - bash, python (select your own executable!), anything.
    - Easily run setup steps, such as installs, etc, before your main script
    - Easily run `torchrun`, `deepspeed`, `accelerate`, or anything else!
  - `metaflow` does not allow you to choose your runtime. Your job runs in python.
    - In order to use `deepspeed`, `torchrun`, etc, complicated and opaque decorators are required
    - When things go wrong, it is hard to debug!
- Interactive jobs:
  - `chain` supports interactive jobs, allowing you to easily debug
- Job monitoring:
  - `chain` makes it easy to follow all pods within a job
  - `metaflow` can easily follow steps, EXCEPT for jobsets created by decorators (e.g. `@deepspeed`). Then you have to manually search for relevant pods
- DAGs:
  - both `chain` and `metaflow` support DAGs
- Much closer to pure argo-workflows
  - Less 'stuff' between you and the running code. Fewer assumptions, less magic
- Arguments between steps:
  - `chain` does not (yet) pass arguments between steps for you. In the future, `chain` will support passing args between steps as json, for maximum compatibility
  - `metaflow` passes args between steps in native python dtypes. When it works, it's magic. When it breaks, it's not.
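Until argument passing is built in, steps can hand values to each other by hand. The sketch below is one possible workaround, not a seekr-chain feature: it assumes both steps can see the same directory (for example a shared volume), and `write_outputs` / `read_outputs` are hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path


def write_outputs(shared_dir: Path, step_name: str, outputs: dict) -> Path:
    """Call at the end of a step: persist its outputs as JSON."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    path = shared_dir / f"{step_name}.json"
    path.write_text(json.dumps(outputs))
    return path


def read_outputs(shared_dir: Path, step_name: str) -> dict:
    """Call at the start of a later step: load an earlier step's outputs."""
    return json.loads((shared_dir / f"{step_name}.json").read_text())


# A temporary directory stands in for storage visible to both steps (assumption).
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "handoff"
    write_outputs(shared, "train", {"checkpoint": "step-1000", "loss": 0.42})
    print(read_outputs(shared, "train")["checkpoint"])  # step-1000
```

JSON keeps the handoff readable from any runtime, which matches the bash-or-python flexibility described above, at the cost of only supporting JSON-serializable values.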
Installation
It is recommended to install seekr-chain as a dependency in your project environment. You can install from PyPI, or as a git submodule.
Pre-reqs:
- Kubectl: Make sure you have `kubectl` installed and configured.
Install from PyPI
- uv: `uv add seekr-chain`
- pip: `pip install seekr-chain`
Install as submodule
If you think you will need to modify seekr-chain in conjunction with your project, it may be convenient to install as an editable submodule.
- In your repo, create a `submods` directory: `mkdir submods && cd submods`
- Clone repository: `git clone https://github.com/seekr-technologies/seekr-chain.git`
- Add as submodule: `git submodule add ./seekr-chain`
- Add as editable dependency:
  - uv: `uv pip install -e ./seekr-chain`
  - pip: `pip install -e ./seekr-chain`
Install as uv tool
This makes the `chain` command available everywhere:
uv tool install seekr-chain
Usage
seekr-chain allows you to define an arbitrary workflow, specified completely by config.
Workflow Config
The workflow config is defined and validated as a pydantic model, so the config definition itself is the full specification and documentation of every option.
For the config definition, see the Configuration Reference. The main config is the `WorkflowConfig`.
Python API
You can easily construct and launch jobs in python.
import seekr_chain
# Define the job config
config = {...}
# Launch the workflow
workflow = seekr_chain.launch_argo_workflow(config) # Returns an `ArgoWorkflow` object
# follow the workflow, printing logs and workflow status
workflow.follow()
# Alternatively, wait for workflow to complete
seekr_chain.wait(workflow)
CLI
Define a config in any of the supported formats, and run the job with:
chain submit <path_to_config>
You can also use the -f/--follow flag to follow the workflow.
Supported CLI config formats:
yaml
Examples
Multiple examples can be found in the examples directory. Each example can be run with
chain submit examples/.../config.yaml --follow
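When submissions are scripted, the CLI call can be wrapped in Python. The sketch below only uses the subcommand and flags named in this README (`submit`, `--follow`, `--interactive`). `build_submit_command` and `submit` are hypothetical helpers, and whether the two flags can be combined, and what exit code the CLI returns, are not stated here, so treat both as assumptions.

```python
import shlex
import subprocess


def build_submit_command(
    config_path: str, follow: bool = False, interactive: bool = False
) -> list[str]:
    """Assemble the argv for `chain submit` from the flags named in this README."""
    cmd = ["chain", "submit", config_path]
    if follow:
        cmd.append("--follow")
    if interactive:
        cmd.append("--interactive")
    return cmd


def submit(config_path: str, **flags: bool) -> int:
    """Run the command and return its exit code (requires `chain` on PATH)."""
    return subprocess.run(build_submit_command(config_path, **flags)).returncode


# Print the command instead of running it, so this works without a cluster.
print(shlex.join(build_submit_command("examples/0_hello_world/config.yaml", follow=True)))
```

Passing the arguments as a list avoids shell quoting problems with config paths that contain spaces.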
Features
- Multinode jobs: Easily run high performance multi-node jobs, just by specifying `num_nodes` for a given step. `chain` will ensure all pods in a step can communicate, making multi-node training a breeze
- DAGs: String together arbitrary steps in a DAG
- Code Upload: Easily upload code from any directory for your job, with full control over inclusion/exclusion rules
- Persistent Volume Claims: Attach to or create PVCs for your jobs
- Secrets: Securely pass secrets into jobs
- Interactive jobs: Simply specify `--interactive` to the CLI, or `interactive=True` in python. Chain will launch your job, and automatically drop you in a shell in your job when it starts.
chain submit examples/0_hello_world/config.yaml --interactive
2025-09-18 11:00:21.985 INFO seekr_chain Packaging assets: None
2025-09-18 11:00:21.986 INFO seekr_chain Uploading assets to s3://bucket/seekr-chain/54/c72ca3-9172-461f-8315-2c9c15ebd696.tar.gz
2025-09-18 11:00:22.122 WARNING seekr_chain Setting auto-timeout of 1 hour
2025-09-18 11:00:23.459 INFO seekr_chain Uploaded workflow secrets: AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
Launched argo workflow: hello-world-nt7y6d
2025-09-18 11:00:28.220 INFO seekr_chain Waiting for job to start hello-world-nt7y6d
2025-09-18 11:00:34.884 INFO seekr_chain Connecting
   ________  _____    _____   __
  / ____/ / / /   |  /  _/ | / /
 / /   / /_/ / /| |  / //  |/ /
/ /___/ __  / ___ |_/ // /|  /
\____/_/ /_/_/  |_/___/_/ |_/
Argo Workflow Name: hello-world-nt7y6d
Type `c-d` to exit this shell
To run this job, use `/seekr-chain/entrypoint.sh`
Defaulted container "trainer" out of: trainer, download-assets (init), unpack-assets (init), create-hostfile (init)
root@oke-trn-01-ngwgf6vcrhq-1:/seekr-chain/workspace#
Environment Variables
Chain provides the following evars:
| Evar | Description |
|---|---|
| `GPUS_PER_NODE` | Number of GPUs per node |
| `HOSTNAME` | Hostname of current pod, usually the same as the node. For a unique DNS name, use `SEEKR_CHAIN_POD_INSTANCE_ID` |
| `HOSTFILE` | Deepspeed-style hostfile, with the hostname and number of slots per node |
| `MASTER_ADDR` | Master addr for distributed comm |
| `MASTER_PORT` | Master port for distributed comm |
| `NNODES` | Number of nodes in set |
| `NODE_RANK` | Rank of this node in set |
| `SEEKR_CHAIN_WORKFLOW_ID` | ID of the overall workflow (shared across all steps) |
| `SEEKR_CHAIN_JOBSET_ID` | ID of the current step/jobset |
| `SEEKR_CHAIN_POD_ID` | Stable ID of the current pod in step/jobset |
| `SEEKR_CHAIN_POD_INSTANCE_ID` | Unique ID of the current pod, unique across restarts/completions |
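As an illustration of how a step might consume these variables, the sketch below reads the distributed-training ones and turns them into `torchrun` arguments. It assumes the numeric variables are set as integer strings. `DistEnv`, `read_dist_env`, and `torchrun_args` are hypothetical names for this example, and the sample values are made up.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DistEnv:
    nnodes: int
    node_rank: int
    gpus_per_node: int
    master_addr: str
    master_port: int

    @property
    def world_size(self) -> int:
        """Total process count when running one process per GPU."""
        return self.nnodes * self.gpus_per_node


def read_dist_env(env: Mapping[str, str] = os.environ) -> DistEnv:
    """Parse the evars from the table above (assumes integer-valued strings)."""
    return DistEnv(
        nnodes=int(env["NNODES"]),
        node_rank=int(env["NODE_RANK"]),
        gpus_per_node=int(env["GPUS_PER_NODE"]),
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
    )


def torchrun_args(dist: DistEnv, script: str) -> list[str]:
    """Build a torchrun command line for this node."""
    return [
        "torchrun",
        f"--nnodes={dist.nnodes}",
        f"--node_rank={dist.node_rank}",
        f"--nproc_per_node={dist.gpus_per_node}",
        f"--master_addr={dist.master_addr}",
        f"--master_port={dist.master_port}",
        script,
    ]


# Made-up values standing in for what a pod would see inside a 2-node step.
example_env = {
    "NNODES": "2",
    "NODE_RANK": "1",
    "GPUS_PER_NODE": "8",
    "MASTER_ADDR": "10.0.0.1",
    "MASTER_PORT": "29500",
}
dist = read_dist_env(example_env)
print(dist.world_size)  # 16
print(" ".join(torchrun_args(dist, "train.py")))
```

Inside a real job, calling `read_dist_env()` with no argument reads from `os.environ` instead of the sample mapping.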
Roadmap
- Live code sync for `interactive` jobs
- Expanded backend support:
  - Local
  - Slurm
- Basic result passing
Developer/Contributing
Developer install
- Install `uv`
- Clone repository
- Run `uv sync`
- Run tests with `uv run pytest tests`
Contributions welcome.
CI will be set up to run unit tests on PRs.
Commit conventions
Releases are triggered automatically on merge to main. The version bump is determined
by the highest-priority conventional commit prefix found in the PR's commits:
| Prefix | Bump |
|---|---|
| `feat!:`, `fix!:`, any `!:` | major |
| `feat:` | minor |
| `fix:`, `perf:`, `refactor:`, `revert:`, `test:` | patch |
| `ci:`, `chore:`, `docs:`, `style:`, `build:` | none (no release) |
If no commits use conventional format, no release is triggered.
Choosing the right prefix: feat: means a user-facing feature that warrants a minor
version bump. Changes that only affect CI, build infrastructure, dev tooling, or test
scaffolding should use ci: or chore: even if they touch production code — the test is
whether an end-user would notice the change.
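The bump rule can be expressed as a small function. This is a sketch of the table above, not the project's actual release tooling: `bump_for_commit` and `bump_for_pr` are hypothetical names, and accepting an optional `(scope)` after the type is an assumption borrowed from the Conventional Commits format.

```python
import re
from typing import Iterable, Optional

PATCH_TYPES = {"fix", "perf", "refactor", "revert", "test"}
PRIORITY = {"patch": 1, "minor": 2, "major": 3}

# "type", optional "(scope)" (assumption), optional "!", then ":".
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?:")


def bump_for_commit(subject: str) -> Optional[str]:
    """Map one commit subject to 'major', 'minor', 'patch', or None."""
    match = HEADER.match(subject)
    if match is None:
        return None  # not a conventional commit
    if match["bang"]:
        return "major"  # any `!:` prefix
    if match["type"] == "feat":
        return "minor"
    if match["type"] in PATCH_TYPES:
        return "patch"
    return None  # ci, chore, docs, style, build: no release


def bump_for_pr(subjects: Iterable[str]) -> Optional[str]:
    """Return the highest-priority bump across a PR's commits, or None."""
    bumps = [b for b in map(bump_for_commit, subjects) if b is not None]
    return max(bumps, key=PRIORITY.__getitem__, default=None)


print(bump_for_pr(["docs: fix typo", "fix: handle empty config", "feat: add slurm backend"]))  # minor
```

A return value of `None` corresponds to the "no release" case, both for the no-bump prefixes and for PRs with no conventional commits at all.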
Changelog
See changelog