A CLI/SDK to run remote scripts on EC2 via SSH/SCP
Remote Runner
Remote Runner is an easy, Pythonic way to migrate your Python training scripts from a local environment to a powerful cloud-backed instance, so you can scale training efficiently, save cost and time, and iterate quickly on experiments in a parallel, containerized way.
How does Remote Runner work?
- Creating all required cloud resources
- Migrating your script to the remote machine
- Executing your script
- Making sure the instance is terminated afterwards
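The four steps above can be sketched as a create/upload/execute/terminate lifecycle in which cleanup runs even when the remote command fails. This is only an illustration of that pattern, not Remote Runner's actual implementation: every function below (`create_resources`, `upload_script`, `execute`, `terminate`, `remote_machine`, `run`) is a hypothetical stand-in that records what it would do instead of calling AWS.

```python
from contextlib import contextmanager

# Hypothetical stand-ins for the real cloud calls; they only record events.
events = []

def create_resources():
    events.append("create key pair, security group, instance")
    return {"instance_id": "i-hypothetical"}

def upload_script(resources, path):
    events.append(f"upload {path}")

def execute(resources, command):
    events.append(f"execute {command}")
    if command == "fail":
        raise RuntimeError("remote command failed")

def terminate(resources):
    events.append("terminate instance and delete resources")

@contextmanager
def remote_machine():
    resources = create_resources()
    try:
        yield resources
    finally:
        # Runs on success and on failure, so nothing is left running.
        terminate(resources)

def run(path, command):
    with remote_machine() as machine:
        upload_script(machine, path)
        execute(machine, command)

run("train.py", "python train.py")
try:
    run("train.py", "fail")  # simulate a failing remote command
except RuntimeError:
    pass
print(events[-1])
```

Placing the teardown in a `finally` block is what keeps a crashed script from leaving a billable instance behind.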
Getting started
pip install rm-runner
Permissions
To use EC2RemoteRunner you need the following permissions:
- create/delete key pairs
- create/delete security groups
- add inbound/ingress rules to security groups
- create/start/terminate instances (with ebs)
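As a rough guide, the permissions listed above could map to an IAM policy like the one built below. The mapping from each bullet to an EC2 action name is an assumption, not taken from the project; the library may also need read-only actions such as `ec2:DescribeInstances`, and `"Resource": "*"` is broader than you may want in production.

```python
import json

# Assumed mapping of the listed permissions to EC2 IAM actions.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                # create/delete key pairs
                "ec2:CreateKeyPair",
                "ec2:DeleteKeyPair",
                # create/delete security groups
                "ec2:CreateSecurityGroup",
                "ec2:DeleteSecurityGroup",
                # add inbound/ingress rules to security groups
                "ec2:AuthorizeSecurityGroupIngress",
                # create/start/terminate instances
                "ec2:RunInstances",
                "ec2:StartInstances",
                "ec2:TerminateInstances",
            ],
            "Resource": "*",  # assumption: narrow this where possible
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the IAM console or attached to the user or role behind the AWS profile you pass to the runner.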
Habana Gaudi example
from rm_runner import EC2RemoteRunner
runner = EC2RemoteRunner(instance_type="dl1.24xlarge", profile="hf-sm", region="us-east-1")
runner.launch(command="hl-smi")
Expected output
2022-07-21 10:00:09,898 | INFO | Found credentials in shared credentials file: ~/.aws/credentials
2022-07-21 10:00:10,812 | INFO | Created key pair: rm-runner-abdk
2022-07-21 10:00:11,621 | INFO | Created security group: rm-runner-abdk
2022-07-21 10:00:13,227 | INFO | Launched instance: i-03dcc3b5f53cb946a
2022-07-21 10:00:13,230 | INFO | Waiting for instance to be ready...
2022-07-21 10:00:29,252 | INFO | Instance is ready. Public DNS: ec2-3-93-4-123.compute-1.amazonaws.com
2022-07-21 10:00:29,267 | INFO | Setting up ssh connection...
2022-07-21 10:01:49,292 | INFO | Setting up ssh connection...
2022-07-21 10:02:05,434 | INFO | Setting up ssh connection...
2022-07-21 10:02:10,542 | INFO | Setting up ssh connection...
2022-07-21 10:02:10,766 | INFO | Connected (version 2.0, client OpenSSH_8.2p1)
2022-07-21 10:02:11,840 | INFO | Authentication (publickey) successful!
2022-07-21 10:02:11,840 | INFO | Pulling container: vault.habana.ai/gaudi-docker/1.4.1/ubuntu20.04/habanalabs/pytorch-installer-1.10.2:1.4.1-11...
2022-07-21 10:02:20,460 | INFO | Executing: docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host -v /home/ubuntu:/home/ubuntu/rm-runner --workdir=/home/ubuntu/rm-runner vault.habana.ai/gaudi-docker/1.4.1/ubuntu20.04/habanalabs/pytorch-installer-1.10.2:1.4.1-11 hl-smi
+-----------------------------------------------------------------------------+
| HL-SMI Version: hl-1.4.1-rc-fw-35.0.2.0 |
| Driver Version: 1.4.0-d8f95f4 |
|-------------------------------+----------------------+----------------------+
| AIP Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | AIP-Util Compute M. |
|===============================+======================+======================|
| 0 HL-205 N/A | 0000:20:1d.0 N/A | 0 |
| N/A 35C N/A 102W / 350W | 512Mib / 32768Mib | 2% N/A |
|-------------------------------+----------------------+----------------------+
| 1 HL-205 N/A | 0000:a0:1d.0 N/A | 0 |
| N/A 36C N/A 101W / 350W | 512Mib / 32768Mib | 1% N/A |
|-------------------------------+----------------------+----------------------+
| 2 HL-205 N/A | 0000:a0:1e.0 N/A | 0 |
| N/A 33C N/A 105W / 350W | 512Mib / 32768Mib | 3% N/A |
|-------------------------------+----------------------+----------------------+
| 3 HL-205 N/A | 0000:90:1d.0 N/A | 0 |
| N/A 32C N/A 97W / 350W | 512Mib / 32768Mib | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 4 HL-205 N/A | 0000:90:1e.0 N/A | 0 |
| N/A 35C N/A 101W / 350W | 512Mib / 32768Mib | 1% N/A |
|-------------------------------+----------------------+----------------------+
| 5 HL-205 N/A | 0000:10:1d.0 N/A | 0 |
| N/A 34C N/A 93W / 350W | 512Mib / 32768Mib | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 6 HL-205 N/A | 0000:10:1e.0 N/A | 0 |
| N/A 36C N/A 108W / 350W | 512Mib / 32768Mib | 4% N/A |
|-------------------------------+----------------------+----------------------+
| 7 HL-205 N/A | 0000:20:1e.0 N/A | 0 |
| N/A 33C N/A 101W / 350W | 512Mib / 32768Mib | 1% N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes: AIP Memory |
| AIP PID Type Process name Usage |
|=============================================================================|
| 0 N/A N/A N/A N/A |
| 1 N/A N/A N/A N/A |
| 2 N/A N/A N/A N/A |
| 3 N/A N/A N/A N/A |
| 4 N/A N/A N/A N/A |
| 5 N/A N/A N/A N/A |
| 6 N/A N/A N/A N/A |
| 7 N/A N/A N/A N/A |
+=============================================================================+
2022-07-21 10:04:00,641 | INFO | Terminating instance: i-03dcc3b5f53cb946a
2022-07-21 10:05:48,297 | INFO | Deleting security group: rm-runner-abdk
2022-07-21 10:05:49,891 | INFO | Deleting key: rm-runner-abdk
2022-07-21 13:29:12,489 | INFO | Total time: 302s
2022-07-21 13:29:12,489 | INFO | Startup time: 165s
2022-07-21 13:29:12,490 | INFO | Execution time: 43s
2022-07-21 13:29:12,490 | INFO | Termination time: 94s
2022-07-21 13:29:12,490 | INFO | Estimated cost: $1.1
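The timing summary in the log is consistent with simple arithmetic: the three phases add up to the total, and billing the total at an hourly rate gives roughly the logged estimate. The sketch below assumes an on-demand price of $13.11 per hour for dl1.24xlarge in us-east-1. That figure and the formula are assumptions for illustration, and the library may compute its estimate differently.

```python
# Phase durations in seconds, taken from the log above.
startup_s, execution_s, termination_s = 165, 43, 94
total_s = startup_s + execution_s + termination_s

# Assumed on-demand hourly price in USD (not stated in the source).
ASSUMED_HOURLY_PRICE_USD = 13.11

# Bill the full wall-clock time at the hourly rate.
estimated_cost = total_s / 3600 * ASSUMED_HOURLY_PRICE_USD

print(total_s, round(estimated_cost, 1))  # prints: 302 1.1
```

Note that only 43 of the 302 seconds are spent executing the command; for short jobs, startup and termination dominate the cost.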