Skip to main content

A CLI/SDK to run remote scripts on ec2 via ssh/scp

Project description

Remote Runner

Remote Runner is an easy pythonic way to migrate your python training scripts from a local environment to a powerful cloud-backed instance to efficiently scale your training, save cost & time, and iterate quickly on experiments in a parallel containerized way.

How does Remote Runner work?

  1. Creating all required cloud resources
  2. Migrating your script to the remote machine
  3. Executing your script
  4. making sure the instance is terminated again.

overview

Getting started

pip install rm-runner

Permissons

To use EC2RemoteRunner you need to following permissions:

  • create/delete keypairs
  • create/delete secruity groups
  • add inbound/ingress rules to security groups
  • create/start/terminate instances (with ebs)

Habana Gaudi example

from rm_runner import EC2RemoteRunner

runner = EC2RemoteRunner(instance_type="dl1.24xlarge", profile="hf-sm", region="us-east-1")

runner.launch(command="hl-smi")

expected output

2022-07-21 10:00:09,898 | INFO | Found credentials in shared credentials file: ~/.aws/credentials
2022-07-21 10:00:10,812 | INFO | Created key pair: rm-runner-abdk
2022-07-21 10:00:11,621 | INFO | Created security group: rm-runner-abdk
2022-07-21 10:00:13,227 | INFO | Launched instance: i-03dcc3b5f53cb946a
2022-07-21 10:00:13,230 | INFO | Waiting for instance to be ready...
2022-07-21 10:00:29,252 | INFO | Instance is ready. Public DNS: ec2-3-93-4-123.compute-1.amazonaws.com
2022-07-21 10:00:29,267 | INFO | Setting up ssh connection...
2022-07-21 10:01:49,292 | INFO | Setting up ssh connection...
2022-07-21 10:02:05,434 | INFO | Setting up ssh connection...
2022-07-21 10:02:10,542 | INFO | Setting up ssh connection...
2022-07-21 10:02:10,766 | INFO | Connected (version 2.0, client OpenSSH_8.2p1)
2022-07-21 10:02:11,840 | INFO | Authentication (publickey) successful!
2022-07-21 10:02:11,840 | INFO | Pulling container: vault.habana.ai/gaudi-docker/1.4.1/ubuntu20.04/habanalabs/pytorch-installer-1.10.2:1.4.1-11...
2022-07-21 10:02:20,460 | INFO | Executing: docker run --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host -v /home/ubuntu:/home/ubuntu/rm-runner --workdir=/home/ubuntu/rm-runner vault.habana.ai/gaudi-docker/1.4.1/ubuntu20.04/habanalabs/pytorch-installer-1.10.2:1.4.1-11 hl-smi
+-----------------------------------------------------------------------------+
| HL-SMI Version:                            hl-1.4.1-rc-fw-35.0.2.0          |
| Driver Version:                                      1.4.0-d8f95f4          |
|-------------------------------+----------------------+----------------------+
| AIP  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | AIP-Util  Compute M. |
|===============================+======================+======================|
|   0  HL-205              N/A  | 0000:20:1d.0     N/A |                   0  |
| N/A   35C   N/A   102W / 350W |    512Mib / 32768Mib |     2%           N/A |
|-------------------------------+----------------------+----------------------+
|   1  HL-205              N/A  | 0000:a0:1d.0     N/A |                   0  |
| N/A   36C   N/A   101W / 350W |    512Mib / 32768Mib |     1%           N/A |
|-------------------------------+----------------------+----------------------+
|   2  HL-205              N/A  | 0000:a0:1e.0     N/A |                   0  |
| N/A   33C   N/A   105W / 350W |    512Mib / 32768Mib |     3%           N/A |
|-------------------------------+----------------------+----------------------+
|   3  HL-205              N/A  | 0000:90:1d.0     N/A |                   0  |
| N/A   32C   N/A    97W / 350W |    512Mib / 32768Mib |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   4  HL-205              N/A  | 0000:90:1e.0     N/A |                   0  |
| N/A   35C   N/A   101W / 350W |    512Mib / 32768Mib |     1%           N/A |
|-------------------------------+----------------------+----------------------+
|   5  HL-205              N/A  | 0000:10:1d.0     N/A |                   0  |
| N/A   34C   N/A    93W / 350W |    512Mib / 32768Mib |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   6  HL-205              N/A  | 0000:10:1e.0     N/A |                   0  |
| N/A   36C   N/A   108W / 350W |    512Mib / 32768Mib |     4%           N/A |
|-------------------------------+----------------------+----------------------+
|   7  HL-205              N/A  | 0000:20:1e.0     N/A |                   0  |
| N/A   33C   N/A   101W / 350W |    512Mib / 32768Mib |     1%           N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes:                                               AIP Memory |
|  AIP       PID   Type   Process name                             Usage      |
|=============================================================================|
|   0        N/A   N/A    N/A                                      N/A        |
|   1        N/A   N/A    N/A                                      N/A        |
|   2        N/A   N/A    N/A                                      N/A        |
|   3        N/A   N/A    N/A                                      N/A        |
|   4        N/A   N/A    N/A                                      N/A        |
|   5        N/A   N/A    N/A                                      N/A        |
|   6        N/A   N/A    N/A                                      N/A        |
|   7        N/A   N/A    N/A                                      N/A        |
+=============================================================================+
2022-07-21 10:04:00,641 | INFO | Terminating instance: i-03dcc3b5f53cb946a
2022-07-21 10:05:48,297 | INFO | Deleting security group: rm-runner-abdk
2022-07-21 10:05:49,891 | INFO | Deleting key: rm-runner-abdk
2022-07-21 13:29:12,489 | INFO | Total time:       302s
2022-07-21 13:29:12,489 | INFO | Startup time:     165s
2022-07-21 13:29:12,490 | INFO | Execution time:   43s
2022-07-21 13:29:12,490 | INFO | Termination time: 94s
2022-07-21 13:29:12,490 | INFO | Estimated cost:  $1.1

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rm_runner-0.1.0.tar.gz (9.1 kB view details)

Uploaded Source

Built Distribution

rm_runner-0.1.0-py3-none-any.whl (7.7 kB view details)

Uploaded Python 3

File details

Details for the file rm_runner-0.1.0.tar.gz.

File metadata

  • Download URL: rm_runner-0.1.0.tar.gz
  • Upload date:
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for rm_runner-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9bb34bff8d5d65c54fa83a2e7c5c03d25966fbfca6a3f9ad807511c46f7febac
MD5 ddde47a47a0b0008ba3980d8f841ee6f
BLAKE2b-256 8d06ede86a2dc849104d3b5c2ba8abb74e798176f8a1ef51d986f0bbfba8d6b5

See more details on using hashes here.

File details

Details for the file rm_runner-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: rm_runner-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for rm_runner-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 baece7636ed67899043e02a9fb96ba9ac60669ba77a0fa7353acdc387d2bd04e
MD5 9cd28a9dd48467428ba17af037873e60
BLAKE2b-256 90375943f89afa953fe00b5a7c1dd73d42df1564bb189daed2a74d50f7d007ce

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page