A CLI/SDK to run remote scripts on EC2 via SSH/SCP
Remote Runner
Remote Runner is an easy, Pythonic way to move your Python training scripts from a local environment to a powerful cloud instance. This lets you scale training efficiently, save cost and time, and iterate quickly on experiments by running them in parallel, containerized jobs.
How does Remote Runner work?
- Creating all required cloud resources
- Migrating your script to the remote machine
- Executing your script
- Making sure the instance is terminated afterwards
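The four steps above follow a create / use / always-clean-up pattern: teardown must run even if uploading or executing the script fails. Below is a minimal sketch of that control flow in plain Python. It assumes hypothetical helpers (create_resources, upload_script, execute, terminate) that only record what they would do instead of calling AWS. It is not rm_runner's actual implementation.

```python
import contextlib
from typing import Iterator, List

# Hypothetical stand-ins for the real AWS/SSH calls: each one only records
# what it would do, so the control flow can run without an AWS account.
events: List[str] = []


def create_resources() -> str:
    events.append("create key pair, security group and instance")
    return "i-hypothetical"


def upload_script(instance_id: str) -> None:
    events.append(f"copy script to {instance_id}")


def execute(instance_id: str, command: str) -> int:
    events.append(f"run '{command}' on {instance_id}")
    return 0  # pretend exit code


def terminate(instance_id: str) -> None:
    events.append(f"terminate {instance_id}, delete security group and key pair")


@contextlib.contextmanager
def remote_instance() -> Iterator[str]:
    instance_id = create_resources()
    try:
        yield instance_id
    finally:
        # Runs even if the body raises, so no instance is left running.
        terminate(instance_id)


def run_remote(command: str) -> int:
    with remote_instance() as instance_id:
        upload_script(instance_id)
        return execute(instance_id, command)


if __name__ == "__main__":
    print(run_remote("hl-smi"))
    print("\n".join(events))
```

The try/finally inside the context manager is the important design choice. Cleanup is tied to leaving the with block, not to the command succeeding.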
Getting started
pip install rm-runner
Permissions
To use EC2RemoteRunner, you need the following permissions:
- create/delete keypairs
- create/delete security groups
- add inbound/ingress rules to security groups
- create/start/terminate instances (with EBS)
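As a starting point, the permissions above can be written as an IAM policy document. This is a minimal sketch, not an official policy shipped with rm_runner. The action names are standard EC2 IAM actions. The Describe* entries are an assumption, because polling for instance readiness typically needs them. "Resource": "*" is deliberately broad and should be scoped down for real use. The list may not be exhaustive. For example, AMI lookups or tagging could require more actions.

```python
import json

# Hypothetical mapping from the permission bullets to EC2 IAM actions.
REQUIRED_ACTIONS = {
    "key pairs": ["ec2:CreateKeyPair", "ec2:DeleteKeyPair"],
    "security groups": ["ec2:CreateSecurityGroup", "ec2:DeleteSecurityGroup"],
    "ingress rules": ["ec2:AuthorizeSecurityGroupIngress"],
    "instances (EBS volumes are created by RunInstances)": [
        "ec2:RunInstances",
        "ec2:StartInstances",
        "ec2:TerminateInstances",
    ],
    # Assumption: needed to wait for the instance and inspect resources.
    "status polling (assumed)": [
        "ec2:DescribeInstances",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeKeyPairs",
    ],
}


def build_policy() -> dict:
    """Return an IAM policy document allowing every action listed above."""
    actions = sorted({a for group in REQUIRED_ACTIONS.values() for a in group})
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(), indent=2))
```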
Habana Gaudi example
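from rm_runner import EC2RemoteRunner

# dl1.24xlarge: EC2 instance type with 8 Habana Gaudi accelerators
# profile: named AWS credentials profile; region: AWS region to launch in
runner = EC2RemoteRunner(instance_type="dl1.24xlarge", profile="hf-sm", region="us-east-1")
runner.launch(command="hl-smi")  # runs hl-smi on the remote instance, then tears it down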
Expected output:
2022-07-21 10:00:09,898 | INFO | Found credentials in shared credentials file: ~/.aws/credentials
2022-07-21 10:00:10,812 | INFO | Created key pair: rm-runner-abdk
2022-07-21 10:00:11,621 | INFO | Created security group: rm-runner-abdk
2022-07-21 10:00:13,227 | INFO | Launched instance: i-03dcc3b5f53cb946a
2022-07-21 10:00:13,230 | INFO | Waiting for instance to be ready...
2022-07-21 10:00:29,252 | INFO | Instance is ready. Public DNS: ec2-3-93-4-123.compute-1.amazonaws.com
2022-07-21 10:00:29,267 | INFO | Setting up ssh connection...
2022-07-21 10:01:49,292 | INFO | Setting up ssh connection...
2022-07-21 10:02:05,434 | INFO | Setting up ssh connection...
2022-07-21 10:02:10,542 | INFO | Setting up ssh connection...
2022-07-21 10:02:10,766 | INFO | Connected (version 2.0, client OpenSSH_8.2p1)
2022-07-21 10:02:11,840 | INFO | Authentication (publickey) successful!
2022-07-21 10:02:11,840 | INFO | Pulling container: vault.habana.ai/gaudi-docker/1.4.1/ubuntu20.04/habanalabs/pytorch-installer-1.10.2:1.4.1-11...
2022-07-21 10:02:20,460 | INFO | Executing: docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host -v /home/ubuntu:/home/ubuntu/rm-runner --workdir=/home/ubuntu/rm-runner vault.habana.ai/gaudi-docker/1.4.1/ubuntu20.04/habanalabs/pytorch-installer-1.10.2:1.4.1-11 hl-smi
+-----------------------------------------------------------------------------+
| HL-SMI Version: hl-1.4.1-rc-fw-35.0.2.0 |
| Driver Version: 1.4.0-d8f95f4 |
|-------------------------------+----------------------+----------------------+
| AIP Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | AIP-Util Compute M. |
|===============================+======================+======================|
| 0 HL-205 N/A | 0000:20:1d.0 N/A | 0 |
| N/A 35C N/A 102W / 350W | 512Mib / 32768Mib | 2% N/A |
|-------------------------------+----------------------+----------------------+
| 1 HL-205 N/A | 0000:a0:1d.0 N/A | 0 |
| N/A 36C N/A 101W / 350W | 512Mib / 32768Mib | 1% N/A |
|-------------------------------+----------------------+----------------------+
| 2 HL-205 N/A | 0000:a0:1e.0 N/A | 0 |
| N/A 33C N/A 105W / 350W | 512Mib / 32768Mib | 3% N/A |
|-------------------------------+----------------------+----------------------+
| 3 HL-205 N/A | 0000:90:1d.0 N/A | 0 |
| N/A 32C N/A 97W / 350W | 512Mib / 32768Mib | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 4 HL-205 N/A | 0000:90:1e.0 N/A | 0 |
| N/A 35C N/A 101W / 350W | 512Mib / 32768Mib | 1% N/A |
|-------------------------------+----------------------+----------------------+
| 5 HL-205 N/A | 0000:10:1d.0 N/A | 0 |
| N/A 34C N/A 93W / 350W | 512Mib / 32768Mib | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 6 HL-205 N/A | 0000:10:1e.0 N/A | 0 |
| N/A 36C N/A 108W / 350W | 512Mib / 32768Mib | 4% N/A |
|-------------------------------+----------------------+----------------------+
| 7 HL-205 N/A | 0000:20:1e.0 N/A | 0 |
| N/A 33C N/A 101W / 350W | 512Mib / 32768Mib | 1% N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes: AIP Memory |
| AIP PID Type Process name Usage |
|=============================================================================|
| 0 N/A N/A N/A N/A |
| 1 N/A N/A N/A N/A |
| 2 N/A N/A N/A N/A |
| 3 N/A N/A N/A N/A |
| 4 N/A N/A N/A N/A |
| 5 N/A N/A N/A N/A |
| 6 N/A N/A N/A N/A |
| 7 N/A N/A N/A N/A |
+=============================================================================+
2022-07-21 10:04:00,641 | INFO | Terminating instance: i-03dcc3b5f53cb946a
2022-07-21 10:05:48,297 | INFO | Deleting security group: rm-runner-abdk
2022-07-21 10:05:49,891 | INFO | Deleting key: rm-runner-abdk
2022-07-21 13:29:12,489 | INFO | Total time: 302s
2022-07-21 13:29:12,489 | INFO | Startup time: 165s
2022-07-21 13:29:12,490 | INFO | Execution time: 43s
2022-07-21 13:29:12,490 | INFO | Termination time: 94s
2022-07-21 13:29:12,490 | INFO | Estimated cost: $1.1
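The "Executing:" line in the log shows that launch() does not run hl-smi directly on the host. Instead, it wraps hl-smi in a docker run call against the Habana PyTorch image. The helper below is hypothetical (habana_docker_args is not part of rm_runner). It only rebuilds that logged command as an argument list, so you can see which flags and mount are involved.

```python
import shlex
from typing import List

# Image name copied from the "Executing:" line in the log above.
HABANA_IMAGE = (
    "vault.habana.ai/gaudi-docker/1.4.1/ubuntu20.04/habanalabs/"
    "pytorch-installer-1.10.2:1.4.1-11"
)


def habana_docker_args(
    command: str,
    host_dir: str = "/home/ubuntu",
    container_dir: str = "/home/ubuntu/rm-runner",
    image: str = HABANA_IMAGE,
) -> List[str]:
    """Hypothetical helper: argv for running `command` in the Habana container."""
    return [
        "docker", "run",
        "--runtime=habana",                    # Habana container runtime
        "-e", "HABANA_VISIBLE_DEVICES=all",    # expose all accelerators
        "-e", "OMPI_MCA_btl_vader_single_copy_mechanism=none",
        "--cap-add=sys_nice",
        "--net=host", "--ipc=host",
        "-v", f"{host_dir}:{container_dir}",   # mount host dir into container
        f"--workdir={container_dir}",          # start in the mounted dir
        image,
        *shlex.split(command),                 # user command as separate args
    ]


if __name__ == "__main__":
    print(shlex.join(habana_docker_args("hl-smi")))
```

Building a list instead of one string keeps arguments with spaces intact if you pass the list to subprocess. shlex.join (Python 3.8+) only produces the printable form.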
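The summary lines split the total run time into three phases: 165 s + 43 s + 94 s = 302 s. The $1.1 estimate is consistent with multiplying the total time by an hourly on-demand price. This sketch shows that arithmetic. The hourly price for dl1.24xlarge in us-east-1 is an assumption, not a value from the log, and rm_runner may compute its estimate differently.

```python
# Phase durations in seconds, copied from the log summary above.
phases = {"startup": 165, "execution": 43, "termination": 94}

# Assumption: on-demand USD/hour price of dl1.24xlarge in us-east-1.
# Check current AWS pricing before relying on this number.
DL1_HOURLY_USD = 13.10904


def estimate_cost(seconds: float, hourly_usd: float) -> float:
    """Approximate cost of `seconds` of usage at an hourly price."""
    return hourly_usd * seconds / 3600


if __name__ == "__main__":
    total = sum(phases.values())
    print(total)  # 302, matching "Total time: 302s"
    print(round(estimate_cost(total, DL1_HOURLY_USD), 1))  # 1.1
```

In this run, only 43 of the 302 seconds were spent executing the command. For short jobs, startup and termination dominate the bill.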
Download files
Source Distribution
rm_runner-0.1.0.tar.gz (9.1 kB)
Built Distribution
Hashes for rm_runner-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | baece7636ed67899043e02a9fb96ba9ac60669ba77a0fa7353acdc387d2bd04e
MD5 | 9cd28a9dd48467428ba17af037873e60
BLAKE2b-256 | 90375943f89afa953fe00b5a7c1dd73d42df1564bb189daed2a74d50f7d007ce
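To check that a downloaded wheel matches the published SHA256 digest, you can hash the file locally with the standard library. This is a minimal sketch. The file name assumes the wheel was fetched into the current directory, for example with pip download rm-runner==0.1.0 --no-deps.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for rm_runner-0.1.0-py3-none-any.whl.
EXPECTED_SHA256 = "baece7636ed67899043e02a9fb96ba9ac60669ba77a0fa7353acdc387d2bd04e"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = Path("rm_runner-0.1.0-py3-none-any.whl")
    if wheel.exists():
        print("OK" if verify(wheel) else "MISMATCH")
```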