Skip to main content

eopod is a streamlined command execution tool designed to run and manage operations on Google Cloud Pods efficiently

Project description

eopod: Enhanced TPU Command Runner

eopod is a command-line tool designed to simplify and enhance interaction with Google Cloud TPU VMs. It provides real-time output streaming, background process management, and robust error handling.

Features

  • Configuration Management: Easily configure eopod with your Google Cloud project ID, zone, and TPU name.
  • Command Execution: Run commands on TPU VMs with advanced features like retries, delays, timeouts, and worker selection.
  • Interactive Mode (Experimental): Run commands in an interactive SSH session (use with caution).
  • Command History: View a history of executed commands, their status, and truncated output.
  • Error Logging: Detailed error logs are maintained for debugging failed commands.
  • Rich Output: Utilizes the rich library for visually appealing and informative output in the console.

Installation

pip install eopod

Configuration

Before using eopod, configure it with your Google Cloud credentials:

eopod configure --project-id YOUR_PROJECT_ID --zone YOUR_ZONE --tpu-name YOUR_TPU_NAME

Usage Examples

Basic Command Execution

Commands are executed with real-time output streaming by default:

# Simple command
eopod run echo "Hello TPU"

# Run Python script
eopod run python train.py --batch-size 32

# Complex commands with pipes and redirections
eopod run "cat data.txt | grep error > errors.log"

# Commands with multiple arguments
eopod run ls -la /path/to/dir

Background Processes

Run long-running tasks in the background:

# Start training in background
eopod run python long_training.py --epochs 1000 --background

# Check background processes
eopod check-background

# Check specific process
eopod check-background 12345

# Kill a background process
eopod kill 12345

# Force kill if necessary
eopod kill 12345 --force

Worker-Specific Commands

Execute commands on specific workers:

# Run on specific worker
eopod run nvidia-smi --worker 0

# Run on all workers (default)
eopod run hostname --worker all

Advanced Options

# Disable output streaming
eopod run python script.py --no-stream

# Set custom retry count
eopod run python train.py --retry 5

# Set custom retry delay
eopod run python train.py --delay 10

# Set custom timeout
eopod run python train.py --timeout 600

Kill and free TPU process

# Kill all TPU processes
eopod kill-tpu

# Force kill all TPU processes
eopod kill-tpu --force

# Kill specific PID(s)
eopod kill-tpu --pid 1234 --pid 5678

# Kill processes on specific worker
eopod kill-tpu --worker 0

Viewing History and Logs

# View command history
eopod history

# View error logs
eopod errors

# View current configuration
eopod show-config

Command Reference

Main Commands

  • run: Execute commands on TPU VM

    eopod run [OPTIONS] COMMAND [ARGS]...
    

    Options:

    • --worker TEXT: Specific worker or "all" (default: "all")
    • --retry INTEGER: Number of retries for failed commands (default: 3)
    • --delay INTEGER: Delay between retries in seconds (default: 5)
    • --timeout INTEGER: Command timeout in seconds (default: 300)
    • --no-stream: Disable output streaming
    • --background: Run command in background
  • configure: Set up eopod configuration

    eopod configure --project-id ID --zone ZONE --tpu-name NAME
    
  • status: Check TPU status

    eopod status
    
  • check-background: Check background processes

    eopod check-background [PID]
    
  • kill: Kill background processes

    eopod kill PID [--force]
    

Utility Commands

  • history: View command execution history
  • errors: View error logs
  • show-config: Display current configuration

File Locations

  • Configuration: ~/.eopod/config.ini
  • Command history: ~/.eopod/history.yaml
  • Error logs: ~/.eopod/error_log.yaml
  • Application logs: ~/.eopod/eopod.log

Error Handling

eopod includes built-in error handling and retry mechanisms:

  • Automatic retry for failed commands
  • Timeout handling
  • Detailed error logging
  • Rich error output

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

eopod-0.0.33.tar.gz (18.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

eopod-0.0.33-py3-none-any.whl (19.6 kB view details)

Uploaded Python 3

File details

Details for the file eopod-0.0.33.tar.gz.

File metadata

  • Download URL: eopod-0.0.33.tar.gz
  • Upload date:
  • Size: 18.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.0

File hashes

Hashes for eopod-0.0.33.tar.gz
Algorithm Hash digest
SHA256 82aec596f12c39b05f43e6ec8f1e06aaa1da00f31b9d822d435f2c36b1df32a1
MD5 6e20f7dddcb24b52298c348aa3d7ee49
BLAKE2b-256 4580d3a7ff68c745fba91c3232b3794914e904321538aef3e14a1d3ef9f8097e

See more details on using hashes here.

File details

Details for the file eopod-0.0.33-py3-none-any.whl.

File metadata

  • Download URL: eopod-0.0.33-py3-none-any.whl
  • Upload date:
  • Size: 19.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.0

File hashes

Hashes for eopod-0.0.33-py3-none-any.whl
Algorithm Hash digest
SHA256 703e32f243394c862bfa04299a24d2759335e0d39a60b2d393e92a3c7e0ded77
MD5 cb6de39532e52150705155f685bc4fec
BLAKE2b-256 cb609d4ad0c4a1b951c92347f6d60c8737cd92770ef372337928c2843dd783c5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page