CLI tool for distributed PyTorch training on Kubernetes

Torchway

A CLI tool that takes any PyTorch training script and sets up the distributed training stack around it in one command, with no hand-written YAML and no boilerplate.

pip install torchway

What it does

  • Generates and applies Kubernetes PyTorchJob manifests automatically
  • Injects DDP/FSDP distributed training boilerplate
  • Aggregates multi-pod logs in real time
  • Auto-wires MLflow experiment tracking
  • Surfaces plain-language fault explanations for OOM kills and NCCL errors
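
To make the first bullet concrete, here is a minimal sketch of what generating a PyTorchJob manifest can look like. It is not Torchway's implementation, which is not published in this description. The function name build_pytorchjob, its parameters, the image name, and the job name are all hypothetical. The manifest layout (apiVersion kubeflow.org/v1, kind PyTorchJob, Master and Worker replica specs, a container named pytorch) follows the Kubeflow Training Operator's PyTorchJob resource.

```python
import json


def build_pytorchjob(name, image, script, workers=2, gpus_per_pod=1):
    """Return a Kubeflow PyTorchJob manifest as a plain dict.

    Hypothetical helper for illustration only; not part of Torchway's API.
    """

    def replica(count):
        # One replica spec: `count` pods, each running the training script.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    "containers": [
                        {
                            # The Training Operator expects this container name.
                            "name": "pytorch",
                            "image": image,
                            "command": ["python", script],
                            "resources": {
                                "limits": {"nvidia.com/gpu": gpus_per_pod}
                            },
                        }
                    ]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),
                "Worker": replica(workers),
            }
        },
    }


# Assumed example values; replace with a real image and script.
manifest = build_pytorchjob(
    "demo-job", "example.com/train:latest", "train.py", workers=3
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl apply -f on a cluster where the Kubeflow Training Operator is installed. A tool that automates this step would presumably do the equivalent internally.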

Status

🚧 Under active development — V1 coming soon.

Author

Suvan Kasina — GitHub

Download files

Source Distribution

torchway-0.0.2.tar.gz (2.2 kB view details)

Uploaded Source

Built Distribution

torchway-0.0.2-py3-none-any.whl (2.5 kB view details)

Uploaded Python 3

File details

Details for the file torchway-0.0.2.tar.gz.

File metadata

  • Download URL: torchway-0.0.2.tar.gz
  • Upload date:
  • Size: 2.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for torchway-0.0.2.tar.gz
Algorithm Hash digest
SHA256 ae379fe50c8723330a1637a0b6a87a0e4256f40b8e94412bcc8ee04c2172ea6c
MD5 a3d5b822a54b7bfaa9f899f01ae4590d
BLAKE2b-256 b28369373f0208216cd8d0f3bec4ce2f4cfcae51ab617b215f3a28ad6869b1ff

File details

Details for the file torchway-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: torchway-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 2.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for torchway-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 202f37fbb71dbd294b087c717a1110b76d23f0d7315ed49db0219f8a53542831
MD5 45c66dd55fc8fb76e8179afaf286f7a2
BLAKE2b-256 bdf674d6b53147708b0c4d356d2e694dcf862a6eb4a3534510e51620d67ac0f8
