Kubeflow Python SDK to manage ML workloads and to interact with Kubeflow APIs.
Kubeflow SDK
Latest News 🔥
- [2025/11] Please fill out this survey to shape the future of Kubeflow SDK.
- [2025/11] The Kubeflow SDK v0.2 is officially released. Check out the announcement blog post.
Overview
The Kubeflow SDK is a set of unified Pythonic APIs that let you run any AI workload at any scale – without the need to learn Kubernetes. It provides simple and consistent APIs across the Kubeflow ecosystem, enabling users to focus on building AI applications rather than managing complex infrastructure.
Kubeflow SDK Benefits
- Unified Experience: Single SDK to interact with multiple Kubeflow projects through consistent Python APIs
- Simplified AI Workloads: Abstract away Kubernetes complexity and work effortlessly across all Kubeflow projects using familiar Python APIs
- Built for Scale: Seamlessly scale any AI workload — from local laptop to large-scale production cluster with thousands of GPUs using the same APIs.
- Rapid Iteration: Reduced friction between development and production environments
- Local Development: First-class support for local development without a Kubernetes cluster, requiring only pip installation
Kubeflow SDK Introduction
The following KubeCon + CloudNativeCon 2025 talk provides an overview of Kubeflow SDK:
Additionally, check out these demos for a deep dive into Kubeflow SDK capabilities:
Get Started
Install Kubeflow SDK
```shell
pip install -U kubeflow
```
Run your first PyTorch distributed job
```python
from kubeflow.trainer import TrainerClient, CustomTrainer, TrainJobTemplate


def get_torch_dist(learning_rate: str, num_epochs: str):
    import os

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="gloo")
    print("PyTorch Distributed Environment")
    print(f"WORLD_SIZE: {dist.get_world_size()}")
    print(f"RANK: {dist.get_rank()}")
    print(f"LOCAL_RANK: {os.environ['LOCAL_RANK']}")

    lr = float(learning_rate)
    epochs = int(num_epochs)
    loss = 1.0 - (lr * 2) - (epochs * 0.01)
    if dist.get_rank() == 0:
        print(f"loss={loss}")


# Create the TrainJob template
template = TrainJobTemplate(
    runtime="torch-distributed",
    trainer=CustomTrainer(
        func=get_torch_dist,
        func_args={"learning_rate": "0.01", "num_epochs": "5"},
        num_nodes=3,
        resources_per_node={"cpu": 2},
    ),
)

# Create the TrainJob
job_id = TrainerClient().train(**template)

# Wait for the TrainJob to complete
TrainerClient().wait_for_job_status(job_id)

# Print the TrainJob logs
print("\n".join(TrainerClient().get_job_logs(name=job_id)))
```
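Since get_torch_dist only computes a toy loss, you can sanity-check the value that rank 0 will print without any cluster. A minimal sketch that mirrors the same arithmetic in plain Python:

```python
def toy_loss(learning_rate: str, num_epochs: str) -> float:
    # Mirrors the toy loss computed inside get_torch_dist above
    lr = float(learning_rate)
    epochs = int(num_epochs)
    return 1.0 - (lr * 2) - (epochs * 0.01)


# For the func_args used in the template: learning_rate=0.01, num_epochs=5
print(toy_loss("0.01", "5"))
```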
Optimize hyperparameters for your training
```python
from kubeflow.optimizer import OptimizerClient, Search, TrialConfig

# Create an OptimizationJob that reuses the same TrainJob template
optimization_id = OptimizerClient().optimize(
    trial_template=template,
    trial_config=TrialConfig(num_trials=10, parallel_trials=2),
    search_space={
        "learning_rate": Search.loguniform(0.001, 0.1),
        "num_epochs": Search.choice([5, 10, 15]),
    },
)
print(f"OptimizationJob created: {optimization_id}")
```
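A log-uniform search space like Search.loguniform(0.001, 0.1) samples learning rates uniformly in log space, so each order of magnitude is equally likely. A rough illustration of the idea in plain Python (not the SDK's actual sampler):

```python
import math
import random


def loguniform_sample(low: float, high: float, rng: random.Random) -> float:
    # Uniform in log space: sample the log, then exponentiate
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
samples = [loguniform_sample(0.001, 0.1, rng) for _ in range(10_000)]

# Every sample stays inside [0.001, 0.1]
assert all(0.001 <= s <= 0.1 for s in samples)

# Roughly half the samples fall in each decade, [0.001, 0.01) and [0.01, 0.1]
below = sum(s < 0.01 for s in samples)
print(f"fraction below 0.01: {below / len(samples):.2f}")
```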
Local Development
Kubeflow Trainer client supports local development without needing a Kubernetes cluster.
Available Backends
- KubernetesBackend (default) - Production training on Kubernetes
- ContainerBackend - Local development with Docker/Podman isolation
- LocalProcessBackend - Quick prototyping with Python subprocesses
Quick Start:
Install container support: `pip install kubeflow[docker]` or `pip install kubeflow[podman]`
```python
from kubeflow.trainer import TrainerClient, ContainerBackendConfig, CustomTrainer

# Switch to local container execution
client = TrainerClient(backend_config=ContainerBackendConfig())

# Your training runs locally in isolated containers;
# train_fn is your self-contained training function
job_id = client.train(trainer=CustomTrainer(func=train_fn))
```
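The train_fn passed above can be any self-contained Python function: because the function body is what gets shipped for execution, anything it needs should be imported or defined inside it. A minimal hypothetical example (the name train_fn and its body are illustrative, not from the SDK docs):

```python
def train_fn():
    # Everything the function needs is defined inside its body,
    # since only the body is shipped to the training backend
    loss = 1.0
    for step in range(5):
        loss *= 0.9  # stand-in for a real optimization step
    print(f"final loss: {loss:.4f}")
    return loss


# Runs as-is on your machine too
final = train_fn()
```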
Supported Kubeflow Projects
| Project | Status | Version Support | Description |
|---|---|---|---|
| Kubeflow Trainer | ✅ Available | v2.0.0+ | Train and fine-tune AI models with various frameworks |
| Kubeflow Katib | ✅ Available | v0.19.0+ | Hyperparameter optimization |
| Kubeflow Pipelines | 🚧 Planned | TBD | Build, run, and track AI workflows |
| Kubeflow Model Registry | 🚧 Planned | TBD | Manage model artifacts, versions and ML artifacts metadata |
| Kubeflow Spark Operator | 🚧 Planned | TBD | Manage Spark applications for data processing and feature engineering |
Community
Getting Involved
- Slack: Join our #kubeflow-ml-experience Slack channel
- Meetings: Attend the Kubeflow SDK and ML Experience bi-weekly meetings
- GitHub: Discussions, issues and contributions at kubeflow/sdk
Contributing
Kubeflow SDK is a community project and is still under active development. We welcome contributions! Please see our CONTRIBUTING Guide for details.
Documentation
- Blog Post Announcement: Introducing the Kubeflow SDK: A Pythonic API to Run AI Workloads at Scale
- Design Document: Kubeflow SDK design proposal
- Component Guides: Individual component documentation
- DeepWiki: AI-powered repository documentation
✨ Contributors
We couldn't have done it without these incredible people:
File details
Details for the file kubeflow-0.3.0.tar.gz.
File metadata
- Download URL: kubeflow-0.3.0.tar.gz
- Upload date:
- Size: 6.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d67485a6d4cfd00870a7939faa0909e03455227aba80be4ed083456dde9a31f7 |
| MD5 | 00f611b2e245d12af44d80d1e078d468 |
| BLAKE2b-256 | 0b99a8c9ce2e1a1c768022ce548482d1d7e29bb7053b117dcbc8bbc7d9e731a7 |
Provenance
The following attestation bundles were made for kubeflow-0.3.0.tar.gz:
Publisher: release.yml on kubeflow/sdk
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kubeflow-0.3.0.tar.gz
- Subject digest: d67485a6d4cfd00870a7939faa0909e03455227aba80be4ed083456dde9a31f7
- Sigstore transparency entry: 835579638
- Sigstore integration time:
Source repository:
- Permalink: kubeflow/sdk@230b68dc69aa8d2904dab670808463519a9bf7ee
- Branch / Tag: refs/heads/main
- Owner: https://github.com/kubeflow
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@230b68dc69aa8d2904dab670808463519a9bf7ee
- Trigger Event: push
File details
Details for the file kubeflow-0.3.0-py3-none-any.whl.
File metadata
- Download URL: kubeflow-0.3.0-py3-none-any.whl
- Upload date:
- Size: 128.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 066d1acea230a89f6ea08686c6408c07fe70c506fff6c0d13dbab8cfdf30febe |
| MD5 | 96cb5bf0dba3fb7f86e7737a5eb60f8a |
| BLAKE2b-256 | 82d5b59eade9a27179aa22a992e4e33e23665197f8af432f8fb6674db83d3127 |
Provenance
The following attestation bundles were made for kubeflow-0.3.0-py3-none-any.whl:
Publisher: release.yml on kubeflow/sdk
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kubeflow-0.3.0-py3-none-any.whl
- Subject digest: 066d1acea230a89f6ea08686c6408c07fe70c506fff6c0d13dbab8cfdf30febe
- Sigstore transparency entry: 835579640
- Sigstore integration time:
Source repository:
- Permalink: kubeflow/sdk@230b68dc69aa8d2904dab670808463519a9bf7ee
- Branch / Tag: refs/heads/main
- Owner: https://github.com/kubeflow
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@230b68dc69aa8d2904dab670808463519a9bf7ee
- Trigger Event: push