Skip to main content

Remote Execution Framework

Project description


A sweet tool for Remote Execution.

What is the chillest way one can train models in remote machines?

  • Do not worry about environment setup (dependencies)
  • Don't bother choosing an instance to run on
  • No more bash scripts to copy files back and forth

Principles

re provides a suite of features that empowers the user to focus on the experiments without having to worry about boring details listed above.

  • Almost zero conf
  • Abstract away boring repetitive details
  • Ease of execution

Conventions

You do need to follow a couple of conventions.

  • Data goes into data/
  • Any non-python file that is necessary for remote execution should be added to .recompute/include
  • Any python file that shouldn't be pushed to remote machine should be added to .recompute/exclude

Setup

# install sshpass
brew install http://git.io/sshpass.rb
pip install --user recompute

Configuration

The configuration file is super-short.

[general]
instance = 0
remote_home = projects/

[instance 0]
username = grenouille
host = grasse.local
password = hen0s3datru1h

You can add credentials for remote machines directly into the configuration file or add them sequentially via command-line re sshadd --instance='user@remotehost'.

Workflow

My machine learning workflow follows these steps:

  • Copy code to remote machine rsync
  • Setup dependencies pip install
  • Download dataset and place them in data/
  • Execute code in remote machine
  • Get execution log
  • Copy binaries generated bin/

wiht re, the tasks listed above can be accomplished with 4 commands, as below:

# re sshadd --instance='
re init                        # initalize [rsync, install]
re async "python3 x.py"        # start execution in remote
# (or) re sync "python3 x.py"  # blocking run (wait for completion)
re log                         # after a while
re pull "bin/ ./bin/" .        # pull generated binaries
  • init creates local configuration files, setting up the environment for remote execution
    • Makes a list of local dependencies (python files)
    • Populates requirements.txt with required pypi packages
    • Installs pypi packages in remote machine
    • Copies local dependencies to remote machine using rsync
    • A copy of local folder is created in the remote machine, under ~/projects/
  • We could start execution in remote machine and wait for it to complete by using sync mode or just start remote execution and move on, using async mode
    • The command to be executed in remote machine, should be given as a string next to sync or async mode
  • re log fetches log from remote machine
  • re pull pulls any file from remote machine
    • Files are addressed by their relative paths

Logging

re redirects the stdout and stderr of remote execution into <project-name>.log, which could be pulled to local machine by running re log. More often than not, it takes a while for execution to complete. So we start the execution in remote machine and check the log once in a while using re log. Or you could put this "once in a while" as a command-line argument and re pulls the log and shows you every "once in a while". It is recommended to use logging module to print information onto stdout, instead of print statements.

# fetch log from remote machine
re log
# . start execution in remote machine
# .. fetch log
re async "python3 nn.py"
re log
# . start execution 
# .. pull log every 20 seconds
re async "python3 nn.py"
re log --loop=20

rsync

Files (local dependencies) can be synchronized by using rsync command. rsync is run in the background which copies files listed in .recompute/rsync.db to remote machine. --force switch forces re to figure out the local dependencies and update rsync.db.

re rsync  # --force updates .recompute/rsync.db

Dependencies

requirements.txt is populated with python packages necessary for execution (uses pipreqs behind the scenes). re install reads requirements.txt and installs the packages in remote system.

# install dependencies
re install  # --force updates requirements.txt
# manual install
re install "torch tqdm"

Manages Processes

re keeps track of all the remote processes it has spawned. We could list them out using list command and selectively kill processes using kill command.

# list live processes
re list
# +-------+--------------+-------+
# | Index |     Name     |  PID  |
# +-------+--------------+-------+
# |   0   |     all      |   *   |
# |   1   | zombie/spawn | 30601 |
# |   2   |    runner    | 31036 |
# +-------+--------------+-------+
# kill process [1]
re kill --idx=1
# kill them all
re purge
# or kill interactively with just `re kill`

Upload/Download

You might wanna download or upload a file just once without having to include it in rsync database. We have push and pull commands. And there is a special command named data which downloads from space separated urls from command-line, into remote machine's data/ directory.

# . upload from local machine to remote
# .. copy [current_dir/x/localfile] to [remote_home/projects/mynn1/x/]
re push "x/localfile x/"
# . download from remote machine to local
# .. copy [remote_home/projects/mynn1/y/remotefile] to [current_dir/y/remotefile]
re pull "y/remotefile y/"
# download IRIS dataset to remote machine's [data/]
re data https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data  # more urls can be added, separated by spaces

Notebook

Sometimes you wanna run code snippets in a notebook. re notebook starts a remote jupyter notebook server and hooks it to a local port. The remote server is tracked (re list) and could be killed whenever necessary.

# . start notebook server in remote machine
# .. hook to local port
re notebook  # Cntl-c to quit

Probe

probe command probes remote machines and provides us with a table of available machines with info on available resources.

re probe
# +--------------------------------+--------+----------+-----------+
# |        Machine                 | Status | GPU (MB) | Disk (MB) |
# +--------------------------------+--------+----------+-----------+
# | grenouille@grasse.local        | active |  10432   |     4238  |
# | slartibartfast@magrathea.local | active |   8642   |    12012  |
# +--------------------------------+--------+----------+-----------+

Manual

re man gives you a detailed manual.

Mode Description Options Example
init Setup current directory for remote execution --instance-idx re init
re init --instance-idx=1
rsync Use rsync to synchronize local files with remote --force re rsync
sshadd Add a new instance to config --instance re sshadd --instance="usr@host"
install Install pypi packages in requirements.txt in remote cmd, --force re install
re install "pytorch tqdm"
sync Synchronous execution of "args.cmd" in remote cmd, --force, --rsync re sync "python3 x.py"
async Asynchronous execution of "args.cmd" in remote cmd, --force, --rsync re async "python3 x.py"
log Fetch log from remote machine --loop, --filter re log
re log --loop=2
re log --filter="pattern"
list List out processes alive in remote machine --force re list
kill Kill a process by index --idx re kill
re kill --idx=1
purge Kill all remote process that are alive None re purge
ssh Create an ssh session in remote machine None re ssh
notebook Create jupyter notebook in remote machine --run-async re notebook
push Upload file to remote machine cmd re push "x.py y/"
pull Download file from remote machine cmd re pull "y/z.py ."
data Download data from web into data/ folder of remote cmd re data "url1 url2 url3"
man Show this man page None re man

Contribution

All kinds of contribution are welcome.

  • Somethin went wrong?
  • What feature is missing?
  • What could be done better?

Raise an issue. Add a pull request.

License

Copyright (c) 2019 Suriyadeepan Ramamoorthy. All rights reserved.

This work is licensed under the terms of the MIT license.
For a copy, see https://opensource.org/licenses/MIT.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

recompute-0.9.14.tar.gz (24.6 kB view hashes)

Uploaded Source

Built Distributions

recompute-0.9.14-py3.6.egg (50.6 kB view hashes)

Uploaded Source

recompute-0.9.14-py3-none-any.whl (25.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page