eopod is a streamlined command execution tool for running and managing operations on Google Cloud TPU pods.

eopod: Enhanced TPU Command Runner

eopod is a command-line tool designed to simplify and enhance interaction with Google Cloud TPU VMs. It provides real-time output streaming, background process management, and robust error handling.

Features

  • Configuration Management: Easily configure eopod with your Google Cloud project ID, zone, and TPU name.
  • Command Execution: Run commands on TPU VMs with advanced features like retries, delays, timeouts, and worker selection.
  • Interactive Mode (Experimental): Run commands in an interactive SSH session (use with caution).
  • Command History: View a history of executed commands, their status, and truncated output.
  • Error Logging: Detailed error logs are maintained for debugging failed commands.
  • Rich Output: Uses the rich library to format console output (colors, tables, and status messages).

Installation

pip install eopod

Configuration

Before using eopod, configure it with your Google Cloud project ID, zone, and TPU name:

eopod configure --project-id YOUR_PROJECT_ID --zone YOUR_ZONE --tpu-name YOUR_TPU_NAME
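
When the same settings are reused across machines or CI jobs, it can be convenient to drive the configure step from environment variables. The sketch below assumes three hypothetical variables (EOPOD_PROJECT_ID, EOPOD_ZONE, EOPOD_TPU_NAME); these names are illustrative only and are not read by eopod itself.

```shell
# Hypothetical helper: configure eopod from environment variables.
# The variable names are assumptions made for this example only.
configure_eopod_from_env() {
    for var in EOPOD_PROJECT_ID EOPOD_ZONE EOPOD_TPU_NAME; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $var" >&2
            return 1
        fi
    done
    # Write the configuration, then print it back for a quick sanity check.
    eopod configure \
        --project-id "$EOPOD_PROJECT_ID" \
        --zone "$EOPOD_ZONE" \
        --tpu-name "$EOPOD_TPU_NAME" &&
    eopod show-config
}
```

Calling configure_eopod_from_env with any of the three variables unset stops before eopod is invoked, so a half-filled configuration is never written.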

Usage Examples

Basic Command Execution

Commands are executed with real-time output streaming by default:

# Simple command
eopod run echo "Hello TPU"

# Run Python script
eopod run python train.py --batch-size 32

# Complex commands with pipes and redirections
eopod run "cat data.txt | grep error > errors.log"

# Commands with multiple arguments
eopod run ls -la /path/to/dir

Background Processes

Run long-running tasks in the background:

# Start training in background
eopod run python long_training.py --epochs 1000 --background

# Check background processes
eopod check-background

# Check specific process
eopod check-background 12345

# Kill a background process
eopod kill 12345

# Force kill if necessary
eopod kill 12345 --force
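
The kill and force-kill commands above can be combined into a small escalation helper. This is a sketch that assumes `eopod kill` exits with a non-zero status when the process could not be stopped; that behavior is not stated in this README, so verify it before relying on the fallback. The function name is hypothetical.

```shell
# Try a normal kill first; escalate to --force only if that fails.
# Assumption: `eopod kill` exits non-zero when the process was not stopped.
stop_background_job() {
    pid="$1"
    if eopod kill "$pid"; then
        echo "stopped $pid"
    else
        echo "normal kill failed, forcing $pid" >&2
        eopod kill "$pid" --force
    fi
}
```

Usage: `stop_background_job 12345`.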

Worker-Specific Commands

Execute commands on specific workers:

# Run on specific worker
eopod run uptime --worker 0

# Run on all workers (default)
eopod run hostname --worker all
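
When output from all workers is hard to tell apart, one option is to run the command on one worker at a time with a header per worker. The helper below is a hypothetical sketch: the caller supplies the worker count (eopod is not queried for it), and options are placed before the command, following the `eopod run [OPTIONS] COMMAND [ARGS]...` synopsis in the Command Reference.

```shell
# Run the same command on workers 0..N-1, one at a time.
# Assumption: the worker count is supplied by the caller.
run_on_each_worker() {
    count="$1"
    shift
    w=0
    while [ "$w" -lt "$count" ]; do
        echo "== worker $w =="
        eopod run --worker "$w" "$@"
        w=$((w + 1))
    done
}
```

Usage: `run_on_each_worker 4 hostname`. This is sequential, so it is slower than `--worker all`; it trades speed for output that is easier to attribute.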

Advanced Options

# Disable output streaming
eopod run python script.py --no-stream

# Set custom retry count
eopod run python train.py --retry 5

# Set custom retry delay
eopod run python train.py --delay 10

# Set custom timeout
eopod run python train.py --timeout 600
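
Retries, delays, and timeouts multiply, so it can help to estimate how long a failing command might block a script. The sketch below computes an upper bound under one stated assumption: that `--retry` counts extra attempts after the first one. If eopod instead counts total attempts, the real bound is lower. The defaults mirror those listed in the Command Reference (3 retries, 5 s delay, 300 s timeout).

```shell
# Upper bound on wall-clock seconds for one `eopod run` invocation.
# Assumption: --retry counts extra attempts after the first one, so the
# command may run (retry + 1) times with `delay` seconds between attempts.
worst_case_seconds() {
    retry="${1:-3}"
    delay="${2:-5}"
    timeout="${3:-300}"
    echo $(( (retry + 1) * timeout + retry * delay ))
}

worst_case_seconds          # defaults: 4 * 300 + 3 * 5 = 1215
worst_case_seconds 5 10 600 # 6 * 600 + 5 * 10 = 3650
```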

Killing TPU Processes

# Kill all TPU processes
eopod kill-tpu

# Force kill all TPU processes
eopod kill-tpu --force

# Kill specific PID(s)
eopod kill-tpu --pid 1234 --pid 5678

# Kill processes on specific worker
eopod kill-tpu --worker 0
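
Because `--pid` is repeated once per process, a list of PIDs has to be expanded into repeated flags. The wrapper below is a hypothetical convenience function, not part of eopod, that performs this expansion.

```shell
# Expand "P1 P2 ..." into "--pid P1 --pid P2 ..." and call eopod kill-tpu.
kill_tpu_pids() {
    if [ "$#" -eq 0 ]; then
        echo "usage: kill_tpu_pids PID [PID...]" >&2
        return 2
    fi
    # Rotate through the positional parameters, appending "--pid PID"
    # for each original argument and dropping the original.
    n="$#"
    while [ "$n" -gt 0 ]; do
        set -- "$@" --pid "$1"
        shift
        n=$((n - 1))
    done
    eopod kill-tpu "$@"
}
```

Usage: `kill_tpu_pids 1234 5678` runs `eopod kill-tpu --pid 1234 --pid 5678`.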

Viewing History and Logs

# View command history
eopod history

# View error logs
eopod errors

# View current configuration
eopod show-config

Command Reference

Main Commands

  • run: Execute commands on TPU VM

    eopod run [OPTIONS] COMMAND [ARGS]...
    

    Options:

    • --worker TEXT: Specific worker or "all" (default: "all")
    • --retry INTEGER: Number of retries for failed commands (default: 3)
    • --delay INTEGER: Delay between retries in seconds (default: 5)
    • --timeout INTEGER: Command timeout in seconds (default: 300)
    • --no-stream: Disable output streaming
    • --background: Run command in background
  • configure: Set up eopod configuration

    eopod configure --project-id ID --zone ZONE --tpu-name NAME
    
  • status: Check TPU status

    eopod status
    
  • check-background: Check background processes

    eopod check-background [PID]
    
  • kill: Kill background processes

    eopod kill PID [--force]
    
  • kill-tpu: Kill processes that are using the TPU

    eopod kill-tpu [--force] [--pid PID]... [--worker WORKER]
    

Utility Commands

  • history: View command execution history
  • errors: View error logs
  • show-config: Display current configuration

File Locations

  • Configuration: ~/.eopod/config.ini
  • Command history: ~/.eopod/history.yaml
  • Error logs: ~/.eopod/error_log.yaml
  • Application logs: ~/.eopod/eopod.log
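
A quick way to see which of these files exist on a given machine is to check each path in turn. The function below is a small sketch that uses only the file names listed above; the function name is hypothetical.

```shell
# Report which eopod state files exist under ~/.eopod.
list_eopod_files() {
    dir="${HOME}/.eopod"
    for name in config.ini history.yaml error_log.yaml eopod.log; do
        if [ -f "$dir/$name" ]; then
            echo "present $name"
        else
            echo "missing $name"
        fi
    done
}
```

A missing config.ini usually means `eopod configure` has not been run for the current user yet.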

Error Handling

eopod includes built-in error handling and retry mechanisms:

  • Automatic retry for failed commands
  • Timeout handling
  • Detailed error logging
  • Rich error output
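
To make the retry and timeout behavior concrete, here is a generic retry loop in plain shell. It is shown only to illustrate the idea and is not eopod's implementation. It relies on the `timeout` command from GNU coreutils, and its first argument is the total number of attempts, which may differ from how eopod interprets `--retry`.

```shell
# Generic retry loop: retry_with_timeout ATTEMPTS DELAY LIMIT COMMAND [ARGS...]
# Not eopod's implementation; requires the coreutils `timeout` command.
retry_with_timeout() {
    attempts="$1"
    delay="$2"
    limit="$3"
    shift 3
    i=1
    while :; do
        # Each attempt is cut off after LIMIT seconds.
        if timeout "$limit" "$@"; then
            return 0
        fi
        if [ "$i" -ge "$attempts" ]; then
            echo "giving up after $i attempts" >&2
            return 1
        fi
        echo "attempt $i failed, retrying in ${delay}s" >&2
        sleep "$delay"
        i=$((i + 1))
    done
}
```

Usage: `retry_with_timeout 3 5 300 python train.py` tries the command up to three times, waiting five seconds between attempts.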

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
