
nshrunner

nshrunner is a Python library that provides a unified way to run functions in various environments, such as local dev machines, cloud VMs, SLURM clusters, and LSF clusters. It was created to simplify the process of running ML training jobs across multiple machines and environments.

Motivation

Running ML training jobs across different machines and environments typically means managing each environment's quirks separately: submission commands, resource flags, environment setup. nshrunner addresses this by exposing a single interface that can submit the same job to any supported environment, without environment-specific boilerplate.

Features

  • Supports running functions locally, on SLURM clusters, and on LSF clusters
  • Provides a unified interface for running functions across different environments
  • Allows for easy configuration of job options, such as resource requirements and environment variables
  • Supports snapshotting the environment to ensure reproducibility, using the nshsnap library
  • Provides utilities for logging, seeding, and signal handling

Installation

nshrunner can be installed using pip:

pip install nshrunner

Usage

Here's a simple example of how to use nshrunner to run a function locally:

import nshrunner as R

def run_fn(x: int):
    return x + 5

runs = [(1,)]

runner = R.Runner(run_fn, R.RunnerConfig(working_dir="."))
list(runner.local(runs))
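Note the shape of the `runs` argument: it is a sequence of positional-argument tuples, one per invocation. Judging from the example above, `runner.local(runs)` appears to yield one result per tuple, with each tuple unpacked into a call to `run_fn`. That inferred mapping (an assumption based on the example, not nshrunner's documented contract) is equivalent to this plain-Python sketch:

```python
def run_fn(x: int):
    return x + 5

# Each entry in `runs` is a tuple of positional arguments for one invocation.
runs = [(1,), (2,), (10,)]

# Assumed equivalence: one result per tuple, arguments unpacked positionally.
results = [run_fn(*args) for args in runs]
print(results)  # [6, 7, 15]
```

This is also why the single-run example writes `runs = [(1,)]` rather than `runs = [1]`: even a one-argument call is expressed as a tuple.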

To run the same function on a SLURM cluster:

runner.submit_slurm(
    runs,
    {
        "partition": "learnaccel",
        "nodes": 4,
        "ntasks_per_node": 8,  # Change this to limit # of GPUs
        "gpus_per_task": 1,
        "cpus_per_task": 1,
    },
    snapshot=True,
)
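When sizing a job it helps to keep the option arithmetic explicit: under standard SLURM accounting, total tasks = `nodes` × `ntasks_per_node`, and total GPUs/CPUs scale with `gpus_per_task`/`cpus_per_task`. A quick sanity check in plain Python, using the values from the example above (the formula is SLURM semantics, not an nshrunner API):

```python
# SLURM options from the example above.
slurm_options = {
    "partition": "learnaccel",
    "nodes": 4,
    "ntasks_per_node": 8,
    "gpus_per_task": 1,
    "cpus_per_task": 1,
}

# Standard SLURM accounting: tasks per job, then resources per task.
total_tasks = slurm_options["nodes"] * slurm_options["ntasks_per_node"]
total_gpus = total_tasks * slurm_options["gpus_per_task"]
total_cpus = total_tasks * slurm_options["cpus_per_task"]
print(total_tasks, total_gpus, total_cpus)  # 32 32 32
```

This is why the comment on `ntasks_per_node` says to change it to limit the GPU count: with one GPU per task, the task count directly determines how many GPUs the job requests.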

And on an LSF cluster:

runner.submit_lsf(
    runs,
    {
        "summit": True,
        "queue": "learnaccel",
        "nodes": 4,
        "rs_per_node": 8,  # Change this to limit # of GPUs
    },
    snapshot=True,
)

For more detailed usage examples, please refer to the documentation.

Acknowledgements

nshrunner is heavily inspired by submitit. It builds on submitit's design and adds support for LSF clusters, snapshotting, and other features.

Contributing

Contributions are welcome! For feature requests, bug reports, or questions, please open an issue on GitHub. If you'd like to contribute code, please submit a pull request with your changes.

License

nshrunner is released under the MIT License. See LICENSE for more information.
