nshrunner
nshrunner is a Python library that provides a unified way to run functions in various environments, such as local dev machines, cloud VMs, and SLURM clusters. It was created to simplify the process of running ML training jobs across multiple machines and environments.
Motivation
ML training jobs often need to run on several machines and environments, and each one has its own setup and submission details to manage. nshrunner addresses this with a single entry point per environment and a shared calling convention, so the same function and argument list can be run on any supported environment without rewriting the job for each one.
Features
- Supports running functions locally, on SLURM clusters, and in GNU Screen sessions
- Provides a unified interface for running functions across different environments
- Allows for easy configuration of job options, such as resource requirements and environment variables
- Supports snapshotting the environment to ensure reproducibility, using the nshsnap library
- Provides utilities for logging, seeding, and signal handling
Installation
nshrunner can be installed using pip:
pip install nshrunner
Usage
Here's a simple example showing the different ways to run a function:
import nshrunner as R
def train_model(batch_size: int, learning_rate: float):
    # Training logic here
    return {"accuracy": 0.95}
# Define runs with different hyperparameters
runs = [
    (32, 0.001),  # (batch_size, learning_rate)
    (64, 0.0005),
]
# Run locally
results = R.run_local(train_model, runs)
# Run in a GNU Screen session
R.submit_screen(
    train_model,
    runs,
    screen={
        "name": "training",
        "logging": {
            "output_file": "logs/output.log",
            "error_file": "logs/error.log"
        },
        "attach": False  # Run detached
    }
)
# Run on SLURM
R.submit_slurm(
    train_model,
    runs,
    slurm={
        "name": "training",
        "partition": "gpu",
        "resources": {
            "nodes": 1,
            "cpus": 4,
            "gpus": 1,
            "memory_gb": 32,
            "time": "12:00:00"
        },
        "output_dir": "logs"
    }
)
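Each entry in `runs` is a tuple of positional arguments for one call to the function. As a rough mental model only, the local case can be pictured as the loop below. `run_all_locally` is a hypothetical stand-in written for this illustration, not nshrunner's implementation, and the assumption that results come back as a list in the same order as `runs` is not confirmed by this README.

```python
from typing import Any, Callable, Sequence


def run_all_locally(fn: Callable[..., Any], runs: Sequence[tuple]) -> list[Any]:
    """Hypothetical stand-in for a local runner: call fn once per argument tuple."""
    return [fn(*args) for args in runs]


def train_model(batch_size: int, learning_rate: float) -> dict:
    # Placeholder training logic; echoes its inputs so each call is visible.
    return {"batch_size": batch_size, "learning_rate": learning_rate}


runs = [(32, 0.001), (64, 0.0005)]  # (batch_size, learning_rate)
results = run_all_locally(train_model, runs)
```

Here `results[0]` corresponds to the first tuple and `results[1]` to the second.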
The library provides a consistent interface across different execution environments while handling the complexities of:
- Job submission and management
- Resource allocation
- Environment setup
- Output logging
- Error handling
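For the resource allocation item, it can help to see how the `resources` dictionary from the SLURM example lines up with standard `sbatch` options. The sketch below is an assumption about that correspondence, not nshrunner's actual translation: `sbatch_flags` is a hypothetical helper, and mapping `cpus` to `--cpus-per-task` and `gpus` to `--gres=gpu:N` is a guess at the intended meaning.

```python
def sbatch_flags(name: str, partition: str, resources: dict) -> list[str]:
    """Hypothetical helper: express a resource dict as standard sbatch flags."""
    return [
        f"--job-name={name}",
        f"--partition={partition}",
        f"--nodes={resources['nodes']}",
        f"--cpus-per-task={resources['cpus']}",  # assumed meaning of "cpus"
        f"--gres=gpu:{resources['gpus']}",       # assumed meaning of "gpus"
        f"--mem={resources['memory_gb']}G",
        f"--time={resources['time']}",
    ]


flags = sbatch_flags(
    "training",
    "gpu",
    {"nodes": 1, "cpus": 4, "gpus": 1, "memory_gb": 32, "time": "12:00:00"},
)
print(" ".join(flags))
```

Writing the request once as a dictionary keeps the job description independent of the scheduler's command-line syntax.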
For more advanced usage, you can configure additional options like:
# Configure environment snapshot for reproducibility
R.submit_slurm(
    train_model,
    runs,
    runner={
        "working_dir": "experiments",
        "snapshot": True,  # Snapshot code and dependencies
        "seed": {"seed": 42}  # Set random seeds
    },
    slurm={...}
)
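This README does not say which random number generators the `seed` option touches. The general reason to fix a seed can still be shown with the standard library alone: a run that draws random numbers gives the same result whenever it starts from the same seed. `noisy_metric` below is a hypothetical stand-in for a training run and is not part of nshrunner.

```python
import random


def noisy_metric(seed: int) -> float:
    """Hypothetical stand-in for a training run that depends on random draws."""
    rng = random.Random(seed)  # independent generator with an explicit seed
    return sum(rng.random() for _ in range(100)) / 100


first = noisy_metric(42)
second = noisy_metric(42)

# Same seed, same sequence of draws, same result.
print(first == second)
```

Combined with a snapshot of the code and dependencies, a fixed seed removes one more source of run-to-run variation.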
Contributing
Contributions are welcome! For feature requests, bug reports, or questions, please open an issue on GitHub. If you'd like to contribute code, please submit a pull request with your changes.
License
nshrunner is released under the MIT License. See LICENSE for more information.