xpk helps cloud developers orchestrate training jobs on accelerators on GKE.

Project description

Overview

XPK (Accelerated Processing Kit, pronounced x-p-k) is a command-line interface that simplifies cluster creation and workload execution on Google Kubernetes Engine (GKE). XPK generates preconfigured, training-optimized clusters and allows easy workload scheduling without any Kubernetes expertise.

XPK is recommended for quick creation of GKE clusters for proofs of concept and testing.

XPK decouples provisioning capacity from running jobs. There are two structures: clusters (provisioned VMs) and workloads (training jobs). Clusters represent the physical resources you have available. Workloads represent training jobs: at any time some of these will be completed, others will be running, and some will be queued, waiting for cluster resources to become available.

The ideal workflow starts by provisioning the clusters for all of the ML hardware you have reserved. Then, without re-provisioning, submit jobs as needed. Because there is no re-provisioning between jobs, and because the jobs use Docker containers with pre-installed dependencies and cross ahead-of-time compilation, these queued jobs run with minimal start times. Further, because workloads return the hardware to the shared pool when they complete, developers can make better use of finite hardware resources. Automated tests can also run overnight, when resources tend to be underutilized.
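The split between clusters and workloads can be pictured with a small simulation. This is only a toy model of the idea described above, not how XPK or Kueue schedule jobs: the `Cluster` and `Workload` classes, the `chips` and `steps` fields, and the strict first-in, first-out admission rule are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    chips: int  # accelerators held while the job runs (hypothetical unit)
    steps: int  # simulated run time, in scheduler ticks


@dataclass
class Cluster:
    capacity: int  # accelerators provisioned once, up front
    running: list = field(default_factory=list)
    queue: deque = field(default_factory=deque)

    def free(self) -> int:
        """Accelerators not currently held by a running workload."""
        return self.capacity - sum(w.chips for w in self.running)

    def submit(self, workload: Workload) -> None:
        """Queue a workload; nothing is re-provisioned."""
        if workload.chips > self.capacity or workload.steps < 1:
            raise ValueError(f"{workload.name} can never run on this cluster")
        self.queue.append(workload)

    def tick(self) -> list:
        """Advance one step and return the names of workloads that finished."""
        # Admit queued workloads in FIFO order while the head of the queue fits.
        while self.queue and self.queue[0].chips <= self.free():
            self.running.append(self.queue.popleft())
        # Run every admitted workload for one step.
        for w in self.running:
            w.steps -= 1
        finished = [w.name for w in self.running if w.steps <= 0]
        # Finished workloads hand their accelerators back to the shared pool.
        self.running = [w for w in self.running if w.steps > 0]
        return finished


def run_until_idle(cluster: Cluster) -> list:
    """Tick until nothing is queued or running; collect completions per tick."""
    history = []
    while cluster.queue or cluster.running:
        history.append(cluster.tick())
    return history


if __name__ == "__main__":
    cluster = Cluster(capacity=8)
    cluster.submit(Workload("train-a", chips=8, steps=2))
    cluster.submit(Workload("train-b", chips=4, steps=1))
    cluster.submit(Workload("nightly-test", chips=4, steps=1))
    print(run_until_idle(cluster))
```

In this model the cluster is created once, and the two smaller jobs start as soon as `train-a` releases its accelerators, which is the behavior the workflow above relies on.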

XPK supports a variety of hardware accelerators.

| Accelerator | Type | Recipes |
|---|---|---|
| Ironwood | tpu7x | Run training workload with Ironwood and regular/gSC/DWS Calendar reservations using GCS Bucket storage; Run training workload with Ironwood with flex-start using Filestore storage; Run training workload with Ironwood and flex-start using Lustre storage |
| Trillium | v6e | Create Cluster; Create Workload |
| TPU v5p | v5p | Create Cluster; Create Workload |
| TPU v5e | v5e | Create Cluster; Create Workload |
| TPU v4 | v4 | Create Cluster; Create Workload |
| GPU A4X | gb200 | Create Cluster; Create Workload |
| GPU A4 | b200 | Create Cluster; Create Workload |
| GPU A3 Ultra | h200 | Create Cluster; Create Workload |
| GPU A3 Mega | h100-mega | Create Cluster; Create Workload |
| GPU A3 High | h100 | Create Cluster; Create Workload |
| GPU A100 | A100 | Create Cluster; Create Workload |
| CPU | n2-standard-32 | Create Cluster; Create Workload |

XPK also supports the following Google Cloud Storage solutions:

| Storage Type | Documentation |
|---|---|
| Cloud Storage FUSE | docs |
| Filestore | docs |
| Parallelstore | docs |
| Block storage (Persistent Disk, Hyperdisk) | docs |

Documentation

Dependencies

| Dependency | When used |
|---|---|
| Google Cloud SDK (gcloud) | always |
| kubectl | always (Auto-installed) |
| ClusterToolkit | Provisioning GPU clusters (Auto-installed) |
| Kueue | Scheduling workloads (Auto-installed) |
| JobSet | Workload creation (Auto-installed) |
| Crane | Building workload container (Auto-installed) |
| CoreDNS | Cluster set up (Auto-installed) |

Privacy notice

To help improve XPK, feature usage statistics are collected and sent to Google. You can opt out at any time by running the following shell command with the value set to false (true turns collection back on):

xpk config set send-telemetry <true/false>

XPK telemetry is handled in accordance with the Google Privacy Policy. When you use XPK to interact with or utilize GCP Services, your information is handled in accordance with the Google Cloud Privacy Notice.

Contributing

Please read contributing.md for details on our code of conduct and the process for submitting pull requests to us.

Get involved

We'd love to hear from you! If you have questions or want to discuss ideas, join us on GitHub Discussions. Found a bug or have a feature request? Please let us know on GitHub Issues.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xpk-1.12.0.tar.gz (323.9 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xpk-1.12.0-py3-none-any.whl (312.4 kB)

Uploaded Python 3
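For readers unsure how to read the wheel file name above, the following sketch splits it into the fields defined by the wheel naming convention (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The `WheelName` type and `parse_wheel_filename` helper are hypothetical names used only for illustration; real tooling should rely on a packaging library rather than this simplified split.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]  # optional build tag; absent for most wheels
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated fields."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


if __name__ == "__main__":
    print(parse_wheel_filename("xpk-1.12.0-py3-none-any.whl"))
```

Read this way, `py3-none-any` says the wheel targets any Python 3 interpreter, has no ABI requirement, and is not tied to a platform, which is why a single built distribution is listed.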

File details

Details for the file xpk-1.12.0.tar.gz.

File metadata

  • Download URL: xpk-1.12.0.tar.gz
  • Upload date:
  • Size: 323.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for xpk-1.12.0.tar.gz
Algorithm Hash digest
SHA256 70165c0e236a8911a1dca0b6e67999f30a29839472167e218356242aa967993d
MD5 e7f02e745be6bc235b00b19bda7e3208
BLAKE2b-256 2a3a0b28217de6d4df2af0fa5808d6ed3d67df864debe7076565e73cbd5f4e65
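The digests listed above can be checked against a downloaded copy of the file. The sketch below uses only the Python standard library; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the example assumes the archive has been saved to the current directory under its published name.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # SHA256 value copied from the table above for xpk-1.12.0.tar.gz.
    expected = "70165c0e236a8911a1dca0b6e67999f30a29839472167e218356242aa967993d"
    sdist = Path("xpk-1.12.0.tar.gz")  # assumed download location
    if sdist.exists():
        print("digest matches:", matches_published_digest(sdist, expected))
    else:
        print("download xpk-1.12.0.tar.gz first")
```

The same check applies to the wheel by substituting its file name and SHA256 value.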

See more details on using hashes here.

Provenance

The following attestation bundles were made for xpk-1.12.0.tar.gz:

Publisher: build_wheels.yaml on AI-Hypercomputer/xpk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file xpk-1.12.0-py3-none-any.whl.

File metadata

  • Download URL: xpk-1.12.0-py3-none-any.whl
  • Upload date:
  • Size: 312.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for xpk-1.12.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4e5b374cb8d5ac75f38ec3737f08186a706295efedc87e6b0cf5f0cddbf96648
MD5 ca57a4962c93069e286eeda08bf085cf
BLAKE2b-256 e3ff34841bc993a9916135e061c04df6f93cc903e02b00240e3d9d1a7cfbe877

Provenance

The following attestation bundles were made for xpk-1.12.0-py3-none-any.whl:

Publisher: build_wheels.yaml on AI-Hypercomputer/xpk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
