
A simple Linux command-line utility that submits a job to one of multiple servers


ΣΣJob


ΣΣJob, or SumsJob (Simple Utility for Multiple-Servers Job Submission), is a simple Linux command-line utility that submits a job to one of multiple servers, each with limited resources such as GPUs. For a group of independent servers, ΣΣJob offers key functions similar to those that Slurm Workload Manager provides for supercomputers and computer clusters:

  • show the status of GPUs on all servers,
  • submit a job to the servers in noninteractive mode, i.e., the job runs in the background on the server,
  • submit a job to the servers in interactive mode, i.e., the job behaves as if it were running on your local machine,
  • display all running jobs,
  • cancel running jobs.

Motivation

Assume you have a few GPU servers: server1, server2, ... When you need to run some code from your computer, you have to:

  1. Select one server and log in

    $ ssh LAN    # you may first need to log in to the local area network
    $ ssh server1
    
  2. Check the GPU status. If no GPU is free, go back to step 1

    $ nvidia-smi    # or: gpustat

  3. Copy the code from your computer to the server

    $ scp -r codes server1:~/project/codes
    
  4. Run the code on the server

    $ cd ~/project/codes
    $ CUDA_VISIBLE_DEVICES=0 python main.py
    
  5. Transfer the results back

    $ scp server1:~/project/codes/results.dat .
    

These steps are tedious. ΣΣJob automates all of them.

Features

  • Simple to use
  • Two modes: noninteractive and interactive
  • Noninteractive mode: the job runs in the background on the server
    • You can turn off your local machine
  • Interactive mode: the job behaves as if it were running on your local machine
    • The output of the program is displayed in your local terminal in real time
    • You can kill the job with Ctrl-C

Usage

$ gpuresource

Show the status of GPUs on all servers.
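To make "GPU status on all servers" concrete, here is a minimal sketch of how such a check could be scripted by hand. It is not SumsJob's actual implementation. It assumes that gpustat is installed on each server, that its `--json` report contains a `gpus` list whose entries have `index` and `processes` fields, and that a GPU counts as free when it has no running processes. The function names and the server names are hypothetical.

```python
import json
import subprocess


def free_gpus(gpustat_report):
    """Return the indices of GPUs with no running processes.

    `gpustat_report` is the parsed output of `gpustat --json`.
    Treating "no processes" as "free" is an assumption of this sketch.
    """
    return [
        gpu["index"]
        for gpu in gpustat_report["gpus"]
        if not gpu.get("processes")
    ]


def query_server(server, timeout=10):
    """Run `gpustat --json` on `server` over SSH and return the parsed report."""
    result = subprocess.run(
        ["ssh", server, "gpustat", "--json"],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return json.loads(result.stdout)


def print_status(servers):
    """Print the free GPU indices of each server, one line per server."""
    for server in servers:
        try:
            report = query_server(server)
        except (subprocess.SubprocessError, OSError, json.JSONDecodeError) as exc:
            print(f"{server}: unavailable ({exc})")
            continue
        print(f"{server}: free GPUs {free_gpus(report)}")
```

Calling `print_status(["server1", "server2"])` would contact each server in turn; gpuresource does this kind of work for you.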

$ submit jobfile [jobname]

Submit a job to the (GPU) servers. The command automatically performs the following steps:

  1. Find a server with a free GPU. You can specify the server and GPU ID with -s SERVER and --gpuid GPUID.
  2. Copy the code to the server.
  3. Run the job on that server in noninteractive mode (default) or interactive mode (with -i).
  4. Save the output in a log file.
  5. For interactive mode, when the code finishes, transfer back the result files and the log file.

Arguments:

  • jobfile : File to be run
  • jobname : Job name, which is also used as the folder name of the job. If not provided, a random number is used.

Options:

  • -h, --help : Show this help message and exit
  • -i, --interact : Submit as an interactive job
  • -s SERVER, --server SERVER : Server host name
  • --gpuid GPUID : ID of the GPU to use; -1 to use the CPU only
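When several jobs are submitted from a script, it can help to build the submit command line programmatically. The sketch below only uses the arguments and options listed above. The helper `submit_argv` is hypothetical and not part of SumsJob, and `main.py`, `exp1`, and `server1` are placeholder values.

```python
import subprocess


def submit_argv(jobfile, jobname=None, interactive=False, server=None, gpuid=None):
    """Build the argument list for a `submit` call (hypothetical helper)."""
    argv = ["submit"]
    if interactive:
        argv.append("-i")  # interactive mode
    if server is not None:
        argv += ["-s", server]  # pin the job to one server
    if gpuid is not None:
        argv += ["--gpuid", str(gpuid)]  # -1 means CPU only
    argv.append(jobfile)
    if jobname is not None:
        argv.append(jobname)  # otherwise a random number is used
    return argv


def submit(*args, **kwargs):
    """Run `submit` with the given settings and raise if it fails."""
    subprocess.run(submit_argv(*args, **kwargs), check=True)
```

For example, `submit("main.py", "exp1", server="server1", gpuid=0)` would run `submit -s server1 --gpuid 0 main.py exp1`.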

$ sacct

Display all running jobs, ordered by start time.

$ scancel jobname

Cancel a running job.

  • jobname : Job name.

Installation

Install SumsJob with pip:

$ pip install sumsjob

You also need to do the following:

  • Make sure you can ssh to each server, ideally with SSH keys so that no password is required.
  • Install gpustat on each server.
  • Have a configuration file at ~/.sumsjob/config.py. Use config.py as a template, and adjust the values to match your setup.
  • Make sure ~/.local/bin is in your $PATH.

Then run gpuresource to check if everything works.
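If gpuresource is not found, the PATH requirement above is a likely cause. The following is a small, optional sketch for checking it from Python. The helper names are hypothetical, and the list of commands is taken from the Usage section.

```python
import os
import shutil


def dir_on_path(directory, path_env):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_env.split(os.pathsep)
        if entry
    ]
    return target in entries


def missing_commands(commands=("gpuresource", "submit", "sacct", "scancel")):
    """Return the commands that cannot be found on the current PATH."""
    return [name for name in commands if shutil.which(name) is None]
```

`dir_on_path("~/.local/bin", os.environ.get("PATH", ""))` should return True after a correct setup, and `missing_commands()` should return an empty list. Note that sacct and scancel share their names with Slurm commands, so on a machine with Slurm installed the commands found may not be the SumsJob ones.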

License

GNU GPLv3

Download files

Source Distribution

SumsJob-0.5.0.tar.gz (19.5 kB)

Uploaded Source

Built Distribution

SumsJob-0.5.0-py3-none-any.whl (21.4 kB)

Uploaded Python 3
