ClusterOps

ClusterOps is an enterprise-grade Python library developed and maintained by the Swarms Team to help you manage and execute agents on specific CPUs and GPUs across clusters. This tool enables advanced CPU and GPU selection, dynamic task allocation, and resource monitoring, making it ideal for high-performance distributed computing environments.

Features

CPU Execution: Dynamically assign tasks to specific CPU cores.
GPU Execution: Execute tasks on specific GPUs or dynamically select the best available GPU based on free memory.
Fault Tolerance: Built-in retry logic with exponential backoff for handling transient errors.
Resource Monitoring: Real-time CPU and GPU resource monitoring (e.g., free memory on GPUs).
Logging: Advanced logging configuration with customizable log levels (DEBUG, INFO, ERROR).
Scalability: Supports multi-GPU task execution with Ray for distributed computation.

Installation
Quick Start
Usage
Configuration
Contributing
License

Installation

```
pip3 install -U clusterops
```

Quick Start

The following example demonstrates how to use ClusterOps to run tasks on specific CPUs and GPUs.

```
from clusterops import execute_with_cpu_cores, execute_on_gpu, retry_with_backoff

# Sample function to execute
def sample_task(n: int) -> int:
    return n * n

# Run the task on 4 CPU cores
result_cpu = execute_with_cpu_cores(4, sample_task, 10)
print(f"Result on CPU cores: {result_cpu}")

# Run the task on the best available GPU with retries
result_gpu = retry_with_backoff(execute_on_gpu, None, sample_task, 10)
print(f"Result on GPU: {result_gpu}")
```

Usage

Executing on Specific CPUs

You can execute a task on a specific number of CPU cores using the execute_with_cpu_cores() function. It automatically adjusts CPU affinity on systems where this feature is supported.

```
from clusterops import execute_with_cpu_cores

def sample_task(n: int) -> int:
    return n * n

# Execute the task using 4 CPU cores
result = execute_with_cpu_cores(4, sample_task, 10)
print(f"Result on 4 CPU cores: {result}")
```
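To make the idea of CPU affinity concrete, here is a minimal standard-library sketch. It is not the ClusterOps implementation: `run_pinned` is a hypothetical helper, and picking the first N allowed cores is an assumption made for illustration. It relies on `os.sched_setaffinity`, which is only available on some platforms (for example Linux), and falls back to a plain call elsewhere.

```python
import os
from typing import Any, Callable


def run_pinned(core_count: int, func: Callable[..., Any], *args: Any) -> Any:
    """Hypothetical helper: limit this process to `core_count` cores while func runs."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")

    # Affinity control is platform-dependent; without it, just run the task.
    if not hasattr(os, "sched_setaffinity"):
        return func(*args)

    original = os.sched_getaffinity(0)            # cores this process may use now
    chosen = set(sorted(original)[:core_count])   # assumption: take the first N of them
    os.sched_setaffinity(0, chosen)
    try:
        return func(*args)
    finally:
        os.sched_setaffinity(0, original)         # always restore the previous mask


def sample_task(n: int) -> int:
    return n * n


print(run_pinned(4, sample_task, 10))
```

Restoring the original mask in a `finally` block keeps a failing task from leaving the process pinned to a reduced set of cores.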

Executing on Specific GPUs

ClusterOps supports running tasks on specific GPUs or dynamically selecting the best available GPU (based on free memory).

```
from clusterops import execute_on_gpu

def sample_task(n: int) -> int:
    return n * n

# Execute the task on GPU with ID 1
result = execute_on_gpu(1, sample_task, 10)
print(f"Result on GPU 1: {result}")

# Execute the task on the best available GPU
result_best_gpu = execute_on_gpu(None, sample_task, 10)
print(f"Result on best available GPU: {result_best_gpu}")
```
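Selecting the "best" GPU by free memory can be pictured as a simple maximum over per-device readings. The sketch below is an illustration only: `pick_best_gpu` is a hypothetical helper, the readings are made-up numbers, and how ClusterOps actually queries the devices is not shown here.

```python
from typing import Dict, Optional


def pick_best_gpu(free_memory_mib: Dict[int, int]) -> Optional[int]:
    """Hypothetical helper: return the GPU ID with the most free memory, or None."""
    if not free_memory_mib:
        return None  # no GPUs reported
    # Rank GPU IDs by their free memory; on a tie, the first ID seen wins.
    return max(free_memory_mib, key=free_memory_mib.get)


# Made-up readings for illustration: GPU ID -> free memory in MiB.
readings = {0: 2048, 1: 10240, 2: 6144}
print(pick_best_gpu(readings))
```

In a real setup the readings would come from a GPU query library, and the empty case would typically trigger a CPU fallback or an error.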

Retry Logic and Fault Tolerance

For production environments, ClusterOps includes retry logic with exponential backoff, which retries a task in case of failures.

```
from clusterops import retry_with_backoff, execute_on_gpu

# Sample function to execute
def sample_task(n: int) -> int:
    return n * n

# Run task on the best GPU with retry logic
result = retry_with_backoff(execute_on_gpu, None, sample_task, 10)
print(f"Result with retry: {result}")
```
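For readers unfamiliar with the pattern, exponential backoff can be sketched in a few lines of standard-library Python. This is a generic illustration, not the ClusterOps source: `retry_call` is a hypothetical helper, it treats `retries` as the total number of attempts, and it assumes the delay doubles after each failure.

```python
import time
from typing import Any, Callable


def retry_call(func: Callable[..., Any], *args: Any,
               retries: int = 3, delay: float = 1.0) -> Any:
    """Hypothetical helper: call func, retrying on any exception with doubling delays."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return func(*args)
        except Exception:
            if attempt == retries - 1:
                raise              # out of attempts: surface the last error
            time.sleep(delay)      # wait before the next attempt
            delay *= 2             # assumption: the delay doubles each time


calls = {"count": 0}


def flaky(n: int) -> int:
    """Fails twice, then succeeds, to simulate a transient error."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return n * n


print(retry_call(flaky, 10, retries=3, delay=0.01))
```

Catching every `Exception` keeps the sketch short. Production code would usually retry only on errors known to be transient.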

Configuration

ClusterOps provides configuration through environment variables, making it adaptable for different environments (development, staging, production).

Environment Variables

LOG_LEVEL: Configures logging verbosity. Options: DEBUG, INFO, ERROR. Default is INFO.
RETRY_COUNT: Number of times to retry a task in case of failure. Default is 3.
RETRY_DELAY: Initial delay in seconds before retrying. Default is 1 second.

Set these variables in your environment:

```
export LOG_LEVEL=DEBUG
export RETRY_COUNT=5
export RETRY_DELAY=2.0
```
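As a rough picture of how such variables might be consumed, the sketch below reads them with the defaults documented above and prints the waits implied by a given count and initial delay. It is an assumption-laden illustration: `backoff_schedule` is a hypothetical helper, the count is treated as total attempts, and the delay is assumed to double after each failure. None of this is confirmed as ClusterOps' exact behavior.

```python
import os
from typing import List

# Fall back to the documented defaults when a variable is not set.
log_level = os.environ.get("LOG_LEVEL", "INFO").upper()
retry_count = int(os.environ.get("RETRY_COUNT", "3"))
retry_delay = float(os.environ.get("RETRY_DELAY", "1.0"))


def backoff_schedule(count: int, initial_delay: float) -> List[float]:
    """Hypothetical helper: waits between attempts, assuming the delay doubles."""
    # With `count` attempts there are at most `count - 1` waits between them.
    return [initial_delay * (2 ** i) for i in range(max(count - 1, 0))]


print(log_level, retry_count, retry_delay)
print(backoff_schedule(5, 2.0))  # [2.0, 4.0, 8.0, 16.0]
```

Under these assumptions, `RETRY_COUNT=5` with `RETRY_DELAY=2.0` would wait 2, 4, 8, and 16 seconds between attempts, about 30 seconds in total before giving up.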

Contributing

We welcome contributions to ClusterOps! If you'd like to contribute, please follow these steps:

Fork the repository on GitHub.

Clone your fork locally:

```
git clone https://github.com/your-username/clusterops.git
cd clusterops
```

Create a feature branch for your changes:
```
git checkout -b feature/new-feature
```
Install the development dependencies:
```
pip install -r dev-requirements.txt
```
Make your changes, and be sure to include tests.
Run tests to ensure everything works:
```
pytest
```

Commit your changes and push them to GitHub:

```
git commit -m "Add new feature"
git push origin feature/new-feature
```

Submit a pull request on GitHub, and we’ll review it as soon as possible.

Reporting Issues

If you encounter any issues, please create a GitHub issue.

License

ClusterOps is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions, feedback, or contributions, please contact the Swarms Team at contact@swarms.world.

Release history

0.0.6: Oct 31, 2024
0.0.5: Oct 21, 2024
0.0.4: Oct 21, 2024
0.0.3: Oct 9, 2024 (this version)
0.0.1: Oct 9, 2024

Download files

Source Distribution: clusterops-0.0.3.tar.gz (6.1 kB), uploaded Oct 9, 2024
Built Distribution: clusterops-0.0.3-py3-none-any.whl (6.4 kB), uploaded Oct 9, 2024, Python 3

Hashes for clusterops-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`749216f5a19e08fe689c422f17de29ef3be2da4715f7d54c668fc38dcc925e76`
MD5	`c78ad9ec45d528fe3a0222c940551685`
BLAKE2b-256	`81c006305558dcd34b1840b21d452f828a4bf5197b47d4f49c7087335ee1fa6e`

Hashes for clusterops-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f6f0717b8a38cff31203fbc648a27cab006109ac6122a43e27878cbeead1adda`
MD5	`c06ff9d9ff7339cfaac08519da3afc77`
BLAKE2b-256	`2f2be82b639b93d40836d0626ce3939e288677c7fd3742f397a536c940b53a49`