Remote Computation Framework
Project description
A sweet tool for Remote Execution.
What is the chillest way one can train models in remote machines?
- Do not worry about environment setup (dependencies)
- Don't bother choosing an instance to run on
- No more bash scripts to copy files back and forth
Principles
re provides a suite of features that empowers the user to focus on the experiments without having to worry about boring details listed above.
- Almost zero conf
- Abstract away boring repetitive details
- Ease of execution
Conventions
You do need to follow a couple of conventions.
- Data goes into
data/ - Any non-python file that is necessary for remote execution should be added to
.recompute/include - Any python file that shouldn't be pushed to remote machine should be added to
.recompute/exclude
Setup
pip install --user recompute
Configuration
The configuration file is super-short.
[general]
instance = 0
remote_home = projects/
[instance 0]
username = grenouille
host = grasse.local
password = hen0s3datru1h
You can add credentials for remote machines directly into the configuration file or add them sequentially via command-line re sshadd --instance='user@remotehost'.
Workflow
My machine learning workflow follows these steps:
- Copy code to remote machine
rsync - Setup dependencies
pip install - Download dataset and place them in
data/ - Execute code in remote machine
- Get execution log
- Copy binaries generated
bin/
wiht re, the tasks listed above can be accomplished with 4 commands, as below:
# re sshadd --instance='
re init # initalize [rsync, install]
re async "python3 x.py" # start execution in remote
# (or) re sync "python3 x.py" # blocking run (wait for completion)
re log # after a while
re pull "bin/ ./bin/" . # pull generated binaries
initcreates local configuration files, setting up the environment for remote execution- Makes a list of local dependencies (python files)
- Populates
requirements.txtwith required pypi packages - Installs pypi packages in remote machine
- Copies local dependencies to remote machine using
rsync - A copy of local folder is created in the remote machine, under
~/projects/
- We could start execution in remote machine and wait for it to complete by using
syncmode or just start remote execution and move on, usingasyncmode- The command to be executed in remote machine, should be given as a string next to
syncorasyncmode
- The command to be executed in remote machine, should be given as a string next to
re logfetches log from remote machinere pullpulls any file from remote machine- Files are addressed by their relative paths
Logging
re redirects the stdout and stderr of remote execution into <project-name>.log, which could be pulled to local machine by running re log. More often than not, it takes a while for execution to complete. So we start the execution in remote machine and check the log once in a while using re log. Or you could put this "once in a while" as a command-line argument and re pulls the log and shows you every "once in a while". It is recommended to use logging module to print information onto stdout, instead of print statements.
# fetch log from remote machine
re log
# . start execution in remote machine
# .. fetch log
re async "python3 nn.py"
re log
# . start execution
# .. pull log every 20 seconds
re async "python3 nn.py"
re log --loop=20
rsync
Files (local dependencies) can be synchronized by using rsync command. rsync is run in the background which copies files listed in .recompute/rsync.db to remote machine. --force switch forces re to figure out the local dependencies and update rsync.db.
re rsync # --force updates .recompute/rsync.db
Dependencies
requirements.txt is populated with python packages necessary for execution (uses pipreqs behind the scenes). re install reads requirements.txt and installs the packages in remote system.
# install dependencies
re install # --force updates requirements.txt
# manual install
re install "torch tqdm"
Manages Processes
re keeps track of all the remote processes it has spawned. We could list them out using list command and selectively kill processes using kill command.
# list live processes
re list
# +-------+--------------+-------+
# | Index | Name | PID |
# +-------+--------------+-------+
# | 0 | all | * |
# | 1 | zombie/spawn | 30601 |
# | 2 | runner | 31036 |
# +-------+--------------+-------+
# kill process [1]
re kill --idx=1
# kill them all
re purge
# or kill interactively with just `re kill`
Upload/Download
You might wanna download or upload a file just once without having to include it in rsync database. We have push and pull commands. And there is a special command named data which downloads from space separated urls from command-line, into remote machine's data/ directory.
# . upload from local machine to remote
# .. copy [current_dir/x/localfile] to [remote_home/projects/mynn1/x/]
re push "x/localfile x/"
# . download from remote machine to local
# .. copy [remote_home/projects/mynn1/y/remotefile] to [current_dir/y/remotefile]
re pull "y/remotefile y/"
# download IRIS dataset to remote machine's [data/]
re data https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data # more urls can be added, separated by spaces
Notebook
Sometimes you wanna run code snippets in a notebook. re notebook starts a remote jupyter notebook server and hooks it to a local port. The remote server is tracked (re list) and could be killed whenever necessary.
# . start notebook server in remote machine
# .. hook to local port
re notebook # Cntl-c to quit
Probe
probe command probes remote machines and provides us with a table of available machines with info on available resources.
re probe
# +--------------------------------+--------+----------+-----------+
# | Machine | Status | GPU (MB) | Disk (MB) |
# +--------------------------------+--------+----------+-----------+
# | grenouille@grasse.local | active | 10432 | 4238 |
# | slartibartfast@magrathea.local | active | 8642 | 12012 |
# +--------------------------------+--------+----------+-----------+
Manual
re man gives you a detailed manual.
| Mode | Description | Options | Example |
|---|---|---|---|
| init | Setup current directory for remote execution | --instance-idx | re init |
| re init --instance-idx=1 | |||
| rsync | Use rsync to synchronize local files with remote | --force | re rsync |
| sshadd | Add a new instance to config | --instance | re sshadd --instance="usr@host" |
| install | Install pypi packages in requirements.txt in remote | cmd, --force | re install |
| re install "pytorch tqdm" | |||
| sync | Synchronous execution of "args.cmd" in remote | cmd, --force, --rsync | re sync "python3 x.py" |
| async | Asynchronous execution of "args.cmd" in remote | cmd, --force, --rsync | re async "python3 x.py" |
| log | Fetch log from remote machine | --loop, --filter | re log |
| re log --loop=2 | |||
| re log --filter="pattern" | |||
| list | List out processes alive in remote machine | --force | re list |
| kill | Kill a process by index | --idx | re kill |
| re kill --idx=1 | |||
| purge | Kill all remote process that are alive | None | re purge |
| ssh | Create an ssh session in remote machine | None | re ssh |
| notebook | Create jupyter notebook in remote machine | --run-async | re notebook |
| push | Upload file to remote machine | cmd | re push "x.py y/" |
| pull | Download file from remote machine | cmd | re pull "y/z.py ." |
| data | Download data from web into data/ folder of remote | cmd | re data "url1 url2 url3" |
| man | Show this man page | None | re man |
Contribution
All kinds of contribution are welcome.
- Somethin went wrong?
- What feature is missing?
- What could be done better?
Raise an issue. Add a pull request.
License
Copyright (c) 2019 Suriyadeepan Ramamoorthy. All rights reserved.
This work is licensed under the terms of the MIT license.
For a copy, see https://opensource.org/licenses/MIT.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file recompute-0.9.6.tar.gz.
File metadata
- Download URL: recompute-0.9.6.tar.gz
- Upload date:
- Size: 24.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.6.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0cf325be82948f92c03881bb45d5dfacb13bb3f017750ba042130fb8da241590
|
|
| MD5 |
24d73da4ba253aa392f22196d68ca1f4
|
|
| BLAKE2b-256 |
68152be34e679f207bb33ab67fa2e9eccf146deb261a029861c67646c79b637b
|