eopod is a command execution tool for running and managing operations on Google Cloud TPU pods.

eopod: Enhanced TPU Command Runner

eopod is a command-line tool designed to simplify and enhance interaction with Google Cloud TPU VMs. It provides real-time output streaming, background process management, and robust error handling.

Features

  • Configuration Management: Easily configure eopod with your Google Cloud project ID, zone, and TPU name.
  • Command Execution: Run commands on TPU VMs with advanced features like retries, delays, timeouts, and worker selection.
  • Interactive Mode (Experimental): Run commands in an interactive SSH session (use with caution).
  • Command History: View a history of executed commands, their status, and truncated output.
  • Error Logging: Detailed error logs are maintained for debugging failed commands.
  • Rich Output: Uses the rich library for formatted, readable console output.

Installation

pip install eopod

Configuration

Before using eopod, configure it with your Google Cloud project ID, zone, and TPU name:

eopod configure --project-id YOUR_PROJECT_ID --zone YOUR_ZONE --tpu-name YOUR_TPU_NAME

Usage Examples

Basic Command Execution

Commands are executed with real-time output streaming by default:

# Simple command
eopod run echo "Hello TPU"

# Run Python script
eopod run python train.py --batch-size 32

# Complex commands with pipes and redirections
eopod run "cat data.txt | grep error > errors.log"

# Commands with multiple arguments
eopod run ls -la /path/to/dir

Background Processes

Run long-running tasks in the background:

# Start training in background
eopod run python long_training.py --epochs 1000 --background

# Check background processes
eopod check-background

# Check specific process
eopod check-background 12345

# Kill a background process
eopod kill 12345

# Force kill if necessary
eopod kill 12345 --force
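
The start, check, and kill cycle above can be illustrated locally with Python's standard library. This is only a sketch of the general pattern, not eopod's implementation: the helper names are hypothetical, the process runs on the local machine rather than on a TPU VM, and mapping --force to a hard kill (SIGKILL on POSIX) instead of a polite terminate (SIGTERM) is an assumption the README does not confirm.

```python
import subprocess
import sys


def start_background(argv):
    """Start a process without waiting for it (hypothetical helper)."""
    return subprocess.Popen(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def is_running(proc):
    """poll() returns None while the process is still alive."""
    return proc.poll() is None


def stop(proc, force=False, grace_seconds=5.0):
    """Stop the process and return its exit code.

    Assumption: force=True plays the role of `--force`.
    Raises subprocess.TimeoutExpired if the process outlives the grace period.
    """
    if force:
        proc.kill()       # SIGKILL on POSIX
    else:
        proc.terminate()  # SIGTERM on POSIX
    return proc.wait(timeout=grace_seconds)


if __name__ == "__main__":
    job = start_background([sys.executable, "-c", "import time; time.sleep(60)"])
    print("pid:", job.pid, "running:", is_running(job))
    stop(job, force=True)
    print("running after kill:", is_running(job))
```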

Worker-Specific Commands

Execute commands on specific workers:

# Run on specific worker
eopod run uptime --worker 0

# Run on all workers (default)
eopod run hostname --worker all
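
For comparison, the gcloud CLI offers the same kind of worker selection through `gcloud compute tpus tpu-vm ssh` with its `--worker` and `--command` flags. The README does not say how eopod reaches the workers, so the sketch below only assumes a similar underlying call; the function name and the sample project, zone, and TPU values are hypothetical.

```python
import shlex


def build_gcloud_ssh_argv(tpu_name, zone, project_id, command, worker="all"):
    """Build a `gcloud compute tpus tpu-vm ssh` invocation as an argument list.

    Assumption: this is what a per-worker command looks like when issued
    through gcloud directly; eopod's internals may differ.
    """
    return [
        "gcloud", "compute", "tpus", "tpu-vm", "ssh", tpu_name,
        f"--zone={zone}",
        f"--project={project_id}",
        f"--worker={worker}",    # a worker index or "all"
        f"--command={command}",  # the remote command, passed as one string
    ]


argv = build_gcloud_ssh_argv("my-tpu", "us-central2-b", "my-project", "hostname", worker=0)
print(shlex.join(argv))  # shell-quoted form, suitable for copy and paste
```

Passing the result to `subprocess.run(argv)` as a list avoids a second layer of shell quoting around the remote command.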

Advanced Options

# Disable output streaming
eopod run python script.py --no-stream

# Set custom retry count
eopod run python train.py --retry 5

# Set custom retry delay
eopod run python train.py --delay 10

# Set custom timeout
eopod run python train.py --timeout 600

Killing and Freeing TPU Processes

# Kill all TPU processes
eopod kill-tpu

# Force kill all TPU processes
eopod kill-tpu --force

# Kill specific PID(s)
eopod kill-tpu --pid 1234 --pid 5678

# Kill processes on specific worker
eopod kill-tpu --worker 0

Viewing History and Logs

# View command history
eopod history

# View error logs
eopod errors

# View current configuration
eopod show-config

Command Reference

Main Commands

  • run: Execute commands on TPU VM

    eopod run [OPTIONS] COMMAND [ARGS]...
    

    Options:

    • --worker TEXT: Specific worker or "all" (default: "all")
    • --retry INTEGER: Number of retries for failed commands (default: 3)
    • --delay INTEGER: Delay between retries in seconds (default: 5)
    • --timeout INTEGER: Command timeout in seconds (default: 300)
    • --no-stream: Disable output streaming
    • --background: Run command in background
  • configure: Set up eopod configuration

    eopod configure --project-id ID --zone ZONE --tpu-name NAME
    
  • status: Check TPU status

    eopod status
    
  • check-background: Check background processes

    eopod check-background [PID]
    
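  • kill: Kill background processes

    eopod kill PID [--force]

  • kill-tpu: Kill processes that are holding the TPU

    eopod kill-tpu [--force] [--pid PID]... [--worker WORKER]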
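
When eopod is driven from a Python script, the documented run options can be assembled into an argument list. The helper below is a hypothetical sketch: its name and keyword arguments are illustrative, the defaults mirror the values listed above, and it places options before the command as in the synopsis `eopod run [OPTIONS] COMMAND [ARGS]...`. The usage examples in this README put options after the command instead, and which placements eopod accepts has not been verified here.

```python
import shlex


def build_eopod_run_argv(command, worker="all", retry=3, delay=5,
                         timeout=300, stream=True, background=False):
    """Return an `eopod run` argument list using only the documented flags.

    `command` is a list such as ["python", "train.py", "--batch-size", "32"].
    """
    argv = [
        "eopod", "run",
        "--worker", str(worker),
        "--retry", str(retry),
        "--delay", str(delay),
        "--timeout", str(timeout),
    ]
    if not stream:
        argv.append("--no-stream")
    if background:
        argv.append("--background")
    argv.extend(command)  # COMMAND [ARGS]... come last, per the synopsis
    return argv


argv = build_eopod_run_argv(["python", "train.py", "--batch-size", "32"], worker=0, retry=5)
print(shlex.join(argv))
```

The list can then be handed to `subprocess.run(argv, check=True)` on a machine where eopod is installed and configured.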
    

Utility Commands

  • history: View command execution history
  • errors: View error logs
  • show-config: Display current configuration

File Locations

  • Configuration: ~/.eopod/config.ini
  • Command history: ~/.eopod/history.yaml
  • Error logs: ~/.eopod/error_log.yaml
  • Application logs: ~/.eopod/eopod.log
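
Because the configuration is stored as an INI file, it can be inspected with Python's standard configparser module. The README does not document the section or key names inside config.ini, so this sketch reads whatever the file contains rather than assuming a layout; the function name is hypothetical.

```python
import configparser
from pathlib import Path


def load_eopod_config(path=None):
    """Return the INI file's contents as {section: {key: value}}.

    Section and key names come from the file itself; none are assumed.
    Note that configparser lowercases key names by default.
    """
    config_path = Path(path) if path is not None else Path.home() / ".eopod" / "config.ini"
    parser = configparser.ConfigParser()
    # read() skips missing files and returns the list of files it parsed.
    if not parser.read(config_path):
        raise FileNotFoundError(f"no readable config at {config_path}")
    result = {name: dict(parser[name]) for name in parser.sections()}
    if parser.defaults():
        # Values stored under [DEFAULT] are not reported by sections().
        result[parser.default_section] = dict(parser.defaults())
    return result


if __name__ == "__main__":
    try:
        for section, values in load_eopod_config().items():
            print(section, values)
    except FileNotFoundError as exc:
        print(exc)
```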

Error Handling

eopod includes built-in error handling and retry mechanisms:

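  • Automatic retry for failed commands (3 retries by default, 5 seconds apart)
  • Timeout handling (300 seconds per command by default)
  • Detailed error logging to ~/.eopod/error_log.yaml
  • Rich error output in the console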
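
How retries, the retry delay, and the timeout can interact is sketched below for a local command. This is an illustration of the pattern under stated assumptions, not eopod's code: the function name is hypothetical, a timeout is treated like any other failed attempt, and `retry` is taken to mean additional attempts after the first one, which the README does not specify.

```python
import subprocess
import sys
import time


def run_with_retry(argv, retry=3, delay=5, timeout=300):
    """Run argv until it exits with code 0; return (succeeded, attempts_made)."""
    attempts = retry + 1  # assumption: the first try plus `retry` retries
    for attempt in range(1, attempts + 1):
        try:
            completed = subprocess.run(argv, timeout=timeout)
            if completed.returncode == 0:
                return True, attempt
        except subprocess.TimeoutExpired:
            pass  # the child is killed by subprocess.run; count it as a failure
        if attempt < attempts:
            time.sleep(delay)  # wait before the next attempt
    return False, attempts


if __name__ == "__main__":
    # A command that always fails, retried twice with no delay between attempts.
    ok, tries = run_with_retry([sys.executable, "-c", "raise SystemExit(1)"], retry=2, delay=0)
    print(ok, tries)
```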

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
