Run python code on remote servers
labml_remote
is a simple tool that lets you set up Python and run Python programs on remote computers.
It is mainly intended for deep learning training.
It doesn't rely on layers of technology such as Docker, Terraform, or Slurm.
It simply SSHes into the remote computers to run commands, starts jobs with nohup,
and synchronizes files using rsync.
labml_remote
comes with an easy-to-use command-line interface.
You can also use the API to launch
customized distributed training sessions.
Install with pip
pip install labml_remote
Initialization
Go to your project folder.
cd [PATH TO YOUR PROJECT FOLDER]
Initialize for remote execution
labml_remote init
Configurations
labml_remote init
asks for your SSH credentials and creates two files: .remote/configs.yaml
and .remote/exclude.txt
.
.remote/configs.yaml
keeps the remote configurations for the project.
Here's a sample .remote/configs.yaml
:
name: sample
servers:
  primary:
    hostname: 3.19.32.53
    private_key: ./.remote/private_key
    username: ubuntu
  secondary:
    hostname: ec2-3-20-234-50.us-east-2.compute.amazonaws.com
    private_key: ./.remote/private_key
.remote/exclude.txt
is like .gitignore
: it specifies the files and folders that you don't need
to sync with the remote servers. The exclude file generated by labml_remote init
covers things like .git
, .remote
, logs
and __pycache__
.
Edit this file if there is anything else you don't want synced to your remote computers.
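As a rough illustration, here is how the server entries shown above could be checked for the fields labml_remote needs. This is a sketch, not labml_remote's own parsing code: the config is written out as a dict, whereas in practice you would load the YAML file (for example with PyYAML).

```python
# Sketch: validating server entries shaped like the sample .remote/configs.yaml above.
config = {
    "name": "sample",
    "servers": {
        "primary": {
            "hostname": "3.19.32.53",
            "private_key": "./.remote/private_key",
            "username": "ubuntu",
        },
        "secondary": {
            "hostname": "ec2-3-20-234-50.us-east-2.compute.amazonaws.com",
            "private_key": "./.remote/private_key",
        },
    },
}

def missing_fields(server):
    """Return the required fields a server entry is missing."""
    return [f for f in ("hostname", "private_key") if f not in server]

# Collect any incomplete server entries.
problems = {name: missing_fields(server)
            for name, server in config["servers"].items()
            if missing_fields(server)}
print(problems)  # {} when every server entry is complete
```

Note that username is optional in the sample: the secondary server omits it, so only hostname and private_key are treated as required here.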
CLI
Get help for the command-line interface with,
labml_remote --help
Use the flag --help
with any command to get the help for that command.
Prepare the servers
labml_remote prepare
This will install Conda on the servers, rsync your project content, and install the pip packages
based on your requirements.txt
or Pipfile
.
Run a command
labml_remote run --cmd 'python my_script.py'
This will execute the command on the server and show you its output.
Start a job
labml_remote job-run --cmd 'python my_script.py' --tag my-job
List jobs
labml_remote job-list --rsync
The --rsync
flag syncs the job information from the servers to your local computer before
listing.
Tail a job output
labml_remote job-tail --tag my-job
This will keep tailing the output of the job.
Kill jobs
labml_remote job-kill --tag my-job
Launch a PyTorch distributed training session
labml_remote helper-torch-launch --cmd 'train.py' --nproc-per-node 2 --env GLOO_SOCKET_IFNAME enp1s0
Here train.py
is the training script. We are using computers with 2 GPUs and want two processes per computer,
so --nproc-per-node
is 2. --env GLOO_SOCKET_IFNAME enp1s0
sets the environment variable GLOO_SOCKET_IFNAME
to
enp1s0
. You can pass --env
multiple times to set several environment variables.
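A launcher like this typically runs PyTorch's distributed launcher on each machine with a per-node rank. The helper below is a hypothetical sketch of how such a per-node command could be assembled, in the style of python -m torch.distributed.launch; it is not labml_remote's actual implementation, and the master_port default is made up for illustration.

```python
def torch_launch_cmd(script, node_rank, nnodes, nproc_per_node,
                     master_addr, master_port=1234, env=None):
    """Build the shell command one node would run (hypothetical sketch)."""
    env = env or {}
    # Environment variables passed with --env become prefixed assignments.
    exports = " ".join(f"{k}={v}" for k, v in env.items())
    cmd = (f"python -m torch.distributed.launch"
           f" --nproc_per_node={nproc_per_node}"
           f" --nnodes={nnodes}"
           f" --node_rank={node_rank}"
           f" --master_addr={master_addr}"
           f" --master_port={master_port}"
           f" {script}")
    return f"{exports} {cmd}" if exports else cmd

# For the two-server, two-GPU example above, node 0 would run something like:
print(torch_launch_cmd("train.py", node_rank=0, nnodes=2, nproc_per_node=2,
                       master_addr="3.19.32.53",
                       env={"GLOO_SOCKET_IFNAME": "enp1s0"}))
```

Each server gets the same command with a different --node_rank, which is why the helper runs once per machine rather than once overall.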
How it works
It sets up Miniconda if it is not already installed and creates a new environment for the project.
Then it creates a folder with the project's name inside the home folder and synchronizes the contents
of your local folder with the remote computer.
It syncs using rsync, so subsequent synchronizations only need to send the changes.
Then it installs packages from requirements.txt
, or with pipenv if a Pipfile
is found.
It will use pipenv to run your commands if a Pipfile
is present.
The output of commands is streamed back to the local computer, and the output of jobs is redirected to
files on the server, which are synchronized back to the local computer using rsync.
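The job mechanics described above (start with nohup, redirect output to a file, sync the file back with rsync) can be sketched roughly as follows. The host, key path, output directory, and helper names are illustrative, not labml_remote's actual code:

```python
def job_command(project, cmd, tag):
    """Remote shell command that starts a job with nohup, redirects its
    output to a file, and prints the process id (illustrative sketch)."""
    out = f"~/{project}/.jobs/{tag}.out"
    return (f"cd ~/{project} && "
            f"nohup {cmd} > {out} 2>&1 & echo $!")

def ssh_argv(host, key, remote_cmd):
    """argv for running the command over SSH. Because of nohup, the job
    keeps running on the server after the SSH session exits."""
    return ["ssh", "-i", key, host, remote_cmd]

argv = ssh_argv("ubuntu@3.19.32.53", "./.remote/private_key",
                job_command("sample", "python my_script.py", "my-job"))
print(argv[-1])
```

The output file written on the server is what a later rsync (for example during job-list --rsync or job-tail) would copy back to the local machine.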
What it doesn't do
This won't install things like drivers or CUDA, so if you need them, pick a machine image that comes with them for your instance. For example, on AWS pick a Deep Learning AMI if you want to use a GPU instance.