EOpod is a streamlined command execution tool for running and managing operations on Google Cloud TPU pods.

EOpod: Enhanced TPU Command Runner

EOpod is a command-line tool designed to simplify and enhance interaction with Google Cloud TPU VMs. It provides real-time output streaming, background process management, and robust error handling.

Features

  • Configuration Management: Easily configure EOpod with your Google Cloud project ID, zone, and TPU name.
  • Command Execution: Run commands on TPU VMs with advanced features like retries, delays, timeouts, and worker selection.
  • Interactive Mode (Experimental): Run commands in an interactive SSH session (use with caution).
  • Command History: View a history of executed commands, their status, and truncated output.
  • Error Logging: Detailed error logs are maintained for debugging failed commands.
  • Rich Output: Uses the rich library for readable, informative console output.

Installation

pip install eopod

Configuration

Before using EOpod, configure it with your Google Cloud project ID, zone, and TPU name:

eopod configure --project-id YOUR_PROJECT_ID --zone YOUR_ZONE --tpu-name YOUR_TPU_NAME

Usage Examples

Basic Command Execution

Commands are executed with real-time output streaming by default:

# Simple command
eopod run echo "Hello TPU"

# Run Python script
eopod run python train.py --batch-size 32

# Complex commands with pipes and redirections
eopod run "cat data.txt | grep error > errors.log"

# Commands with multiple arguments
eopod run ls -la /path/to/dir

Background Processes

Run long-running tasks in the background:

# Start training in background
eopod run python long_training.py --epochs 1000 --background

# Check background processes
eopod check-background

# Check specific process
eopod check-background 12345

# Kill a background process
eopod kill 12345

# Force kill if necessary
eopod kill 12345 --force
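
The start, check, and kill cycle above can be pictured with a local analogy. The sketch below is not EOpod's implementation and does not touch a TPU VM; it uses Python's standard subprocess module on the local machine, and the helper names (start_background, is_running, stop) are hypothetical. It also assumes that a forced kill means skipping the polite termination request, which the source does not spell out.

```python
import subprocess
import sys


def start_background(argv):
    """Start a process without waiting for it and return its handle."""
    return subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


def is_running(proc):
    """Return True while the process has not exited (poll() gives None)."""
    return proc.poll() is None


def stop(proc, force=False, grace=5.0):
    """Stop a process and return its exit code.

    Assumption for illustration: force=True kills immediately, otherwise
    the process is asked to terminate and only killed if it has not
    exited within `grace` seconds.
    """
    if force:
        proc.kill()
    else:
        proc.terminate()
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()
    return proc.wait()


if __name__ == "__main__":
    # A stand-in for a long-running job.
    job = start_background([sys.executable, "-c", "import time; time.sleep(30)"])
    print("pid:", job.pid, "running:", is_running(job))
    stop(job)
    print("running after stop:", is_running(job))
```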

Worker-Specific Commands

Execute commands on specific workers:

# Run on specific worker
eopod run nvidia-smi --worker 0

# Run on all workers (default)
eopod run hostname --worker all

Advanced Options

# Disable output streaming
eopod run python script.py --no-stream

# Set custom retry count
eopod run python train.py --retry 5

# Set custom retry delay
eopod run python train.py --delay 10

# Set custom timeout
eopod run python train.py --timeout 600
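
When calling EOpod from a script, it can help to assemble the command line in one place. The sketch below builds an argument list from the documented run options only (--worker, --retry, --delay, --timeout, --no-stream, --background). The helper name build_run_argv is hypothetical, and placing options before the command follows the "eopod run [OPTIONS] COMMAND [ARGS]..." synopsis; this is an assumption about ordering, since the examples above also show options after the command.

```python
import shlex


def build_run_argv(command, worker=None, retry=None, delay=None,
                   timeout=None, stream=True, background=False):
    """Assemble an `eopod run` invocation as an argv list.

    `command` is a list such as ["python", "train.py"]. Options left as
    None are omitted so that EOpod's own defaults apply.
    """
    argv = ["eopod", "run"]
    if worker is not None:
        argv += ["--worker", str(worker)]
    if retry is not None:
        argv += ["--retry", str(retry)]
    if delay is not None:
        argv += ["--delay", str(delay)]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    if not stream:
        argv.append("--no-stream")
    if background:
        argv.append("--background")
    argv += list(command)
    return argv


if __name__ == "__main__":
    argv = build_run_argv(["python", "train.py"], worker=0, retry=5, timeout=600)
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run on a machine where eopod is installed and configured.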

Killing and Freeing TPU Processes

# Kill all TPU processes
eopod kill-tpu

# Force kill all TPU processes
eopod kill-tpu --force

# Kill specific PID(s)
eopod kill-tpu --pid 1234 --pid 5678

# Kill processes on specific worker
eopod kill-tpu --worker 0

Viewing History and Logs

# View command history
eopod history

# View error logs
eopod errors

# View current configuration
eopod show-config

Command Reference

Main Commands

  • run: Execute commands on TPU VM

    eopod run [OPTIONS] COMMAND [ARGS]...
    

    Options:

    • --worker TEXT: Specific worker or "all" (default: "all")
    • --retry INTEGER: Number of retries for failed commands (default: 3)
    • --delay INTEGER: Delay between retries in seconds (default: 5)
    • --timeout INTEGER: Command timeout in seconds (default: 300)
    • --no-stream: Disable output streaming
    • --background: Run command in background
  • configure: Set up EOpod configuration

    eopod configure --project-id ID --zone ZONE --tpu-name NAME
    
  • status: Check TPU status

    eopod status
    
  • check-background: Check background processes

    eopod check-background [PID]
    
  • kill: Kill background processes

    eopod kill PID [--force]
  • kill-tpu: Kill processes that are using the TPU

    eopod kill-tpu [--force] [--pid PID]... [--worker WORKER]
    

Utility Commands

  • history: View command execution history
  • errors: View error logs
  • show-config: Display current configuration

File Locations

  • Configuration: ~/.eopod/config.ini
  • Command history: ~/.eopod/history.yaml
  • Error logs: ~/.eopod/error_log.yaml
  • Application logs: ~/.eopod/eopod.log
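
The configuration file can be inspected without knowing its exact layout. The sketch below reads any INI file with Python's standard configparser and returns its sections and keys as plain dictionaries. The function name read_ini is hypothetical, and no section or key names are assumed, since the source does not list them.

```python
import configparser
from pathlib import Path


def read_ini(path):
    """Return {section: {key: value}} for an INI file, or {} if unreadable.

    Interpolation is disabled so that values containing '%' are returned
    verbatim. Keys set under [DEFAULT] are reported under "DEFAULT" and,
    as configparser does by design, are also visible in every section.
    """
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        return {}
    data = {}
    if parser.defaults():
        data["DEFAULT"] = dict(parser.defaults())
    for section in parser.sections():
        data[section] = dict(parser.items(section))
    return data


if __name__ == "__main__":
    config_path = Path.home() / ".eopod" / "config.ini"
    for section, values in read_ini(config_path).items():
        print(f"[{section}]")
        for key, value in values.items():
            print(f"  {key} = {value}")
```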

Error Handling

EOpod includes built-in error handling and retry mechanisms:

  • Automatic retry for failed commands
  • Timeout handling
  • Detailed error logging
  • Rich error output
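
How retries, delays, and timeouts interact can be sketched as a small loop. This is a minimal local illustration built on Python's standard subprocess module, not EOpod's actual code. The function name run_with_retries is hypothetical, its defaults mirror the documented ones (3 retries, 5 second delay, 300 second timeout), and it assumes that the retry count means extra attempts after the first one, which the source does not state.

```python
import subprocess
import sys
import time


def run_with_retries(argv, retries=3, delay=5, timeout=300):
    """Run argv until it exits with code 0 or the attempts run out.

    Returns (ok, attempts_used, errors), where errors holds one message
    per failed attempt.
    """
    errors = []
    attempts = retries + 1  # assumption: one initial try plus `retries` retries
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            if result.returncode == 0:
                return True, attempt, errors
            errors.append(f"attempt {attempt}: exit code {result.returncode}")
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            errors.append(f"attempt {attempt}: timed out after {timeout}s")
        if attempt < attempts:
            time.sleep(delay)  # wait before the next attempt
    return False, attempts, errors


if __name__ == "__main__":
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    ok, used, errors = run_with_retries(failing, retries=2, delay=0, timeout=30)
    print(ok, used)
    for message in errors:
        print(message)
```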

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
