eopod is a streamlined command execution tool for running and managing operations on Google Cloud TPU pods.

eopod: Enhanced TPU Command Runner

eopod is a command-line tool designed to simplify and enhance interaction with Google Cloud TPU VMs. It provides real-time output streaming, background process management, and robust error handling.

Features

  • Configuration Management: Easily configure eopod with your Google Cloud project ID, zone, and TPU name.
  • Command Execution: Run commands on TPU VMs with advanced features like retries, delays, timeouts, and worker selection.
  • Interactive Mode (Experimental): Run commands in an interactive SSH session (use with caution).
  • Command History: View a history of executed commands, their status, and truncated output.
  • Error Logging: Detailed error logs are maintained for debugging failed commands.
  • Rich Output: Utilizes the rich library for visually appealing and informative output in the console.

Installation

pip install eopod

Configuration

Before using eopod, configure it with your Google Cloud project ID, zone, and TPU name:

eopod configure --project-id YOUR_PROJECT_ID --zone YOUR_ZONE --tpu-name YOUR_TPU_NAME

Usage Examples

Basic Command Execution

Commands are executed with real-time output streaming by default:

# Simple command
eopod run echo "Hello TPU"

# Run Python script
eopod run python train.py --batch-size 32

# Complex commands with pipes and redirections
eopod run "cat data.txt | grep error > errors.log"

# Commands with multiple arguments
eopod run ls -la /path/to/dir
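
The quotes in the pipeline example matter because your local shell parses the line before eopod sees it. Without them, `|` and `>` would be handled locally and only `cat data.txt` would be passed to eopod. The sketch below uses Python's `shlex` module to approximate how a POSIX shell splits these lines into arguments. It illustrates shell quoting only and makes no claim about how eopod forwards the arguments.

```python
import shlex

# Quoted: the whole pipeline arrives as a single argument after "run".
quoted = shlex.split('eopod run "cat data.txt | grep error > errors.log"')
print(quoted)

# Simple command: each word is its own argument; the shell consumes the quotes.
simple = shlex.split('eopod run echo "Hello TPU"')
print(simple)
```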

Background Processes

Run long-running tasks in the background:

# Start training in background
eopod run python long_training.py --epochs 1000 --background

# Check background processes
eopod check-background

# Check specific process
eopod check-background 12345

# Kill a background process
eopod kill 12345

# Force kill if necessary
eopod kill 12345 --force
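
The start, check, and kill cycle above resembles ordinary process management. The following is a minimal local sketch in Python, not eopod's implementation: the README does not describe how eopod launches or tracks processes on the TPU VM. The helper names `start_background`, `is_running`, and `stop` are hypothetical, and the assumption that `--force` corresponds to an immediate kill rather than a polite termination request is mine.

```python
import subprocess
import sys


def start_background(argv):
    """Start argv without waiting for it and return the Popen handle."""
    return subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


def is_running(proc):
    """Return True while the process has not exited."""
    return proc.poll() is None


def stop(proc, force=False, grace=5.0):
    """Ask the process to exit; with force=True, kill it outright."""
    if force:
        proc.kill()       # SIGKILL on POSIX: cannot be caught or ignored
    else:
        proc.terminate()  # SIGTERM on POSIX: the process may clean up first
    return proc.wait(timeout=grace)


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    proc = start_background(sleeper)
    print("pid:", proc.pid, "running:", is_running(proc))
    stop(proc)
    print("running after stop:", is_running(proc))
```

A polite stop gives a training script the chance to flush checkpoints; a forced stop is the fallback when the process does not respond.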

Worker-Specific Commands

Execute commands on specific workers:

# Run on specific worker
eopod run uptime --worker 0

# Run on all workers (default)
eopod run hostname --worker all
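
For orientation, Google Cloud's own CLI exposes a similar worker selector through `gcloud compute tpus tpu-vm ssh`, which accepts `--zone`, `--project`, `--worker`, and `--command`. The sketch below builds such an argument list in Python. Whether eopod invokes gcloud this way is an assumption; the README does not say. The function name `build_ssh_command` and the placeholder values are hypothetical.

```python
def build_ssh_command(tpu_name, zone, project, command, worker="all"):
    """Build a gcloud argv that runs `command` on one worker or on all of them."""
    return [
        "gcloud", "compute", "tpus", "tpu-vm", "ssh", tpu_name,
        f"--zone={zone}",
        f"--project={project}",
        f"--worker={worker}",    # a worker index such as "0", or "all"
        f"--command={command}",  # passed as one argument, so no extra quoting
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own TPU name, zone, and project.
    argv = build_ssh_command("my-tpu", "us-central2-b", "my-project", "hostname")
    print(" ".join(argv))
```

Building a list rather than a single string avoids a second round of shell quoting when the list is handed to `subprocess.run`.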

Advanced Options

# Disable output streaming
eopod run python script.py --no-stream

# Set custom retry count
eopod run python train.py --retry 5

# Set custom retry delay
eopod run python train.py --delay 10

# Set custom timeout
eopod run python train.py --timeout 600

Killing TPU Processes

# Kill all TPU processes
eopod kill-tpu

# Force kill all TPU processes
eopod kill-tpu --force

# Kill specific PID(s)
eopod kill-tpu --pid 1234 --pid 5678

# Kill processes on specific worker
eopod kill-tpu --worker 0

Viewing History and Logs

# View command history
eopod history

# View error logs
eopod errors

# View current configuration
eopod show-config

Command Reference

Main Commands

  • run: Execute commands on TPU VM

    eopod run [OPTIONS] COMMAND [ARGS]...
    

    Options:

    • --worker TEXT: Specific worker or "all" (default: "all")
    • --retry INTEGER: Number of retries for failed commands (default: 3)
    • --delay INTEGER: Delay between retries in seconds (default: 5)
    • --timeout INTEGER: Command timeout in seconds (default: 300)
    • --no-stream: Disable output streaming
    • --background: Run command in background
  • configure: Set up eopod configuration

    eopod configure --project-id ID --zone ZONE --tpu-name NAME
    
  • status: Check TPU status

    eopod status
    
  • check-background: Check background processes

    eopod check-background [PID]
    
  • kill: Kill background processes

    eopod kill PID [--force]

  • kill-tpu: Kill processes that are using the TPU

    eopod kill-tpu [--force] [--pid PID]... [--worker WORKER]
    

Utility Commands

  • history: View command execution history
  • errors: View error logs
  • show-config: Display current configuration

File Locations

  • Configuration: ~/.eopod/config.ini
  • Command history: ~/.eopod/history.yaml
  • Error logs: ~/.eopod/error_log.yaml
  • Application logs: ~/.eopod/eopod.log
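
These files can be inspected from a script. The sketch below resolves the four paths listed above and dumps whatever sections and keys `config.ini` contains using Python's standard `configparser`. It assumes nothing about the section or key names eopod writes, since the README does not document them. The helper names `eopod_files` and `read_config` are hypothetical.

```python
import configparser
from pathlib import Path


def eopod_files(home=None):
    """Return the documented eopod file paths under the given home directory."""
    base = (Path(home) if home else Path.home()) / ".eopod"
    return {
        "config": base / "config.ini",
        "history": base / "history.yaml",
        "errors": base / "error_log.yaml",
        "log": base / "eopod.log",
    }


def read_config(path):
    """Return {section: {key: value}} for an INI file, or {} if it is missing."""
    parser = configparser.ConfigParser()
    parser.read(path)  # missing files are skipped without raising
    data = {}
    if parser.defaults():
        data["DEFAULT"] = dict(parser.defaults())
    for name in parser.sections():
        data[name] = dict(parser[name])
    return data


if __name__ == "__main__":
    files = eopod_files()
    for label, path in files.items():
        print(f"{label}: {path} (exists: {path.exists()})")
    print(read_config(files["config"]))
```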

Error Handling

eopod includes built-in error handling and retry mechanisms:

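  • Automatic retry for failed commands (3 retries, 5 seconds apart by default; see --retry and --delay)
  • Timeout handling (300 seconds by default; see --timeout)
  • Detailed error logging to ~/.eopod/error_log.yaml
  • Rich error output in the console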
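
How the retry, delay, and timeout settings interact can be sketched with the standard library. This is an illustration, not eopod's code. It assumes that `--retry` counts additional attempts after the first one and that a timeout is treated like any other failure; the README confirms neither. The function name `run_with_retry` is hypothetical, and its defaults mirror the documented option defaults.

```python
import subprocess
import sys
import time


def run_with_retry(argv, retries=3, delay=5, timeout=300):
    """Run argv; on failure or timeout, retry up to `retries` more times.

    Returns (succeeded, attempts_made).
    """
    attempts = 0
    for attempt in range(retries + 1):
        attempts += 1
        try:
            result = subprocess.run(argv, timeout=timeout)
            if result.returncode == 0:
                return True, attempts
        except subprocess.TimeoutExpired:
            pass  # assumption: a timeout counts as a failed attempt
        if attempt < retries:
            time.sleep(delay)  # wait before the next attempt
    return False, attempts


if __name__ == "__main__":
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_with_retry(failing, retries=2, delay=1, timeout=10))
```

With these semantics, a command that always hits its timeout occupies roughly `(retries + 1) * timeout + retries * delay` seconds before giving up, which is worth keeping in mind when raising `--timeout` for long jobs.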

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Download files

Release 0.0.25 was uploaded via uv/0.7.9 without Trusted Publishing.

  • Source distribution: eopod-0.0.25.tar.gz (14.8 kB)
    SHA256: 8280c3711896a0c32cb79401b2ca90b5e6f47eb009afae941207049603568043
    MD5: 1bea249e03e2046f6b3b68b02e612c29
    BLAKE2b-256: 9d1cdbd2facb76eecd441e3832dafabe53722c63ecbb6ddaa600dab6f6de8dc6
  • Built distribution (Python 3 wheel): eopod-0.0.25-py3-none-any.whl (16.5 kB)
    SHA256: 28fc56a3589b6f1b3166ef6139ea3b6f9df6d24e6d6ec7e0b9cb880f78e62912
    MD5: a7cc1dc843b666aab7d6de9c1212ea97
    BLAKE2b-256: a9bcfbe9b0369eea7f5362837a60e3812524c11bcad0f3ec41130f510841ae78