Shared utilities for the Cloudmesh AI ecosystem, providing system introspection, structured telemetry, standardized logging, and unified remote execution.
Project description
Cloudmesh AI Common
cloudmesh-ai-common provides a set of shared utilities for the Cloudmesh AI ecosystem, focusing on system introspection, structured telemetry, standardized logging, and unified remote execution.
Installation
Using pip
pip install cloudmesh-ai-common
Using pipx (for isolated environment)
pipx install cloudmesh-ai-common
Features
1. Unified Remote Execution (cloudmesh.ai.common.remote)
A robust wrapper around paramiko for standardized SSH and SFTP operations across different host environments.
- Command Execution: Execute shell commands with timeout support and capture exit status, stdout, and stderr.
- File Transfers: Simplified
uploadanddownloadmethods using SFTP. - Direct Remote Writing:
write_remote_fileallows writing strings directly to remote paths, ideal for deploying scripts or configuration files without local temporary files. - Session Management: Implemented as a context manager to ensure connections are always closed properly.
2. Enhanced Console & I/O (cloudmesh.ai.common.io)
A set of utilities to standardize user interaction and file handling.
- Rich Console: A custom
Consoleclass extendingrich.console.Consolewith semantic helpers:ok(): Success messages in green.error(): Error messages in red.warning(): Warning messages in yellow.msg(): General messages in blue.note(): Informational notes in cyan.ynchoice(): Standardized yes/no interactive prompts.
- Path Expansion:
path_expand()resolves~, environment variables, and relative paths into absolute POSIX paths. - YAML Helpers: Safe
load_yamlanddump_yamlutilities that handle directory creation automatically. - Visual Banners:
banner()utility to create structured, styled panels for high-level information.
3. Enhanced System Introspection (cloudmesh.ai.common.sys)
The system utility provides deep insights into the host hardware and environment.
- Hardware Detection:
- GPU: Automatic detection for NVIDIA, AMD, Apple Silicon, and Intel GPUs.
- Topology: Detection of NUMA nodes and high-speed network interfaces (InfiniBand/RoCE).
- CPU: Detailed processor descriptions across macOS, Windows, and Linux.
- Thermal Monitoring: Standardized CPU temperature retrieval across Linux and macOS to detect thermal throttling.
- Container Awareness: Automatic detection of Docker and Kubernetes environments, including extraction of K8s namespace metadata.
- Real-time Metrics:
- Live CPU utilization (overall and per-core).
- Detailed memory and swap usage.
- Disk I/O statistics and usage.
- System Info: The
systeminfo()function collects a comprehensive snapshot of the environment. Userealtime=Trueto include live performance metrics.
4. Advanced Telemetry System (cloudmesh.ai.common.telemetry)
A pluggable telemetry framework for emitting structured performance and status data from AI commands.
- Pluggable Backends:
JSONFileBackend: Emits records in JSONL format for easy ingestion by log aggregators.SQLiteBackend: Stores telemetry in a structured SQLite database for complex querying.TextBackend: Provides human-readable summaries for developers and operators.
- Automatic Tracking: The
track()context manager automatically handles start/complete/fail events and calculates task duration. - Async Support: The
AsyncTelemetryclass allows non-blocking telemetry emission usingasyncio, ensuring that I/O operations do not interfere with compute-intensive AI tasks. - Standardized Events: Built-in helpers for
start(),complete(), andfail()events.
5. Telemetry Aggregation (cloudmesh.ai.common.aggregation)
The TelemetryAggregator utility allows for the analysis of emitted telemetry data.
- Multi-source Loading: Load data from either JSONL files or SQLite databases.
- Statistical Summaries: Calculate success rates, status distributions, and command frequency.
- Metric Analysis: Aggregate specific KPIs to find average, minimum, and maximum values across multiple runs.
6. Standardized Logging (cloudmesh.ai.common.logging)
Provides a consistent logging interface across all AI components to ensure uniform log formatting and traceability.
Usage Examples
Remote Execution
from cloudmesh.ai.common.remote import RemoteExecutor
host = "dgx-node-1"
with RemoteExecutor(host) as executor:
# Execute a command
status, stdout, stderr = executor.execute("nvidia-smi")
if status == 0:
print(f"GPU Status: {stdout}")
# Write a script directly to the remote host
script_content = "#!/bin/bash\necho 'Hello from remote'"
executor.write_remote_file(script_content, "/tmp/hello.sh")
executor.execute("chmod +x /tmp/hello.sh")
executor.execute("/tmp/hello.sh")
Enhanced Console & I/O
from cloudmesh.ai.common.io import console, banner, path_expand
# Use semantic console methods
console.ok("Operation completed successfully!")
console.error("Failed to connect to server")
# Interactive yes/no choice
if console.ynchoice("Do you want to proceed?", default=True):
console.msg("Proceeding...")
# Path expansion
abs_path = path_expand("~/$PROJECT/data.csv")
print(f"Absolute path: {abs_path}")
# Visual banners
console.print(banner("System Status", "All systems operational\nLatency: 12ms"))
System Info
from cloudmesh.ai.common import sys as ai_sys
# Get static and real-time system info
info = ai_sys.systeminfo(realtime=True)
print(f"CPU: {info['cpu']}")
print(f"GPU Present: {info['gpu.present']}")
print(f"CPU Temp: {info.get('cpu.temp')}")
print(f"Container: {info.get('container.type', 'none')}")
Synchronous Telemetry with Context Manager
from cloudmesh.ai.common.telemetry import Telemetry, SQLiteBackend, TextBackend
telemetry = Telemetry(
"my-ai-command",
backends=[SQLiteBackend("metrics.db"), TextBackend()]
)
# Use the track context manager for automatic timing and error handling
with telemetry.track(message="Running inference", metrics={"model": "llama-3"}):
# ... perform work ...
# If an exception occurs here, it is automatically logged as a 'failed' event
pass
Asynchronous Telemetry
import asyncio
from cloudmesh.ai.common.telemetry import AsyncTelemetry, JSONFileBackend
async def run_command():
telemetry = AsyncTelemetry(
"async-ai-task",
backends=[JSONFileBackend("async_metrics.jsonl")]
)
async with telemetry.track(message="Processing async request"):
# ... perform async work ...
await asyncio.sleep(0.1)
asyncio.run(run_command())
Telemetry Aggregation
from cloudmesh.ai.common.aggregation import TelemetryAggregator
# Analyze data from a SQLite database
agg = TelemetryAggregator("metrics.db")
# Get high-level summary
summary = agg.get_summary()
print(f"Success Rate: {summary['success_rate']}")
# Aggregate a specific metric across all runs
latency_stats = agg.aggregate_metric("duration_sec")
print(f"Average Duration: {latency_stats['avg']}s")
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file cloudmesh_ai_common-7.1.3-py3-none-any.whl.
File metadata
- Download URL: cloudmesh_ai_common-7.1.3-py3-none-any.whl
- Upload date:
- Size: 60.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
325ec8115e6e17a5c3429946d107bd3edaa16ad164eea86b0738b60c5dd33d79
|
|
| MD5 |
cffc22ac620a4f02bc95496e0cb7991a
|
|
| BLAKE2b-256 |
1bf01cc46161489eac14f0e68ebb10517db4f0c43182bd82f4d42dff554914ba
|