
Cloud resource management for deep learning applications.


# Cloud Utilities for Deep Learning ⛅️

A super lightweight cloud management tool designed with deep learning applications in mind.

This project is still a work in progress. We welcome all contributions, suggestions, and use-cases. Reach out to us over GitHub or at team@for.ai with ideas!

## Contents
- [Quickstart](#quickstart)
  - [Install](#install)
  - [Config](#config)
  - [Usage](#usage)
- [Documentation](#documentation)
  - [Amazon EC2](#amazon-ec2)
  - [Azure](#azure)
  - [Google Cloud](#google-cloud)

## Quickstart

### Install:

```
git clone git@github.com:for-ai/cloud.git
sudo pip install -e cloud
```

### Config:

See `configs/cloud.toml-*` for instructions on how to authenticate for each provider (Google Cloud, AWS EC2, and Azure).

Place your completed configuration file, named `cloud.toml`, in either the filesystem root (`/`) or your home directory (`$HOME`). Otherwise, set the `$CLOUD_CFG` environment variable to the file's full path.
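The lookup described above could be implemented roughly as follows. This is a minimal sketch, not dl-cloud's actual loader: `find_config` is a hypothetical helper, and checking `/` and `$HOME` before falling back to `$CLOUD_CFG` is an assumed precedence based on the wording above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

CONFIG_NAME = "cloud.toml"


def find_config(
    env: Mapping[str, str] = os.environ,
    search_dirs: Optional[Sequence[Path]] = None,
) -> Optional[Path]:
    """Return the path of the first cloud.toml found, or None.

    Hypothetical helper: checks the filesystem root and $HOME first,
    then falls back to the full path given in $CLOUD_CFG (assumed order).
    """
    if search_dirs is None:
        search_dirs = [Path("/"), Path.home()]
    for directory in search_dirs:
        candidate = Path(directory) / CONFIG_NAME
        if candidate.is_file():
            return candidate
    cfg = env.get("CLOUD_CFG")
    if cfg and Path(cfg).is_file():
        return Path(cfg)
    return None
```

Passing `env` and `search_dirs` explicitly keeps the lookup testable without touching real files in `/` or `$HOME`.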

### Usage:
#### GPU
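```python
import cloud

cloud.connect()

# GPU instances have a dedicated GPU, so there is no need to worry
# about preemption or about acquiring/releasing accelerators online.
done = False
while not done:
    # `train_step` is a placeholder for one step of your own training code;
    # it should return True once training is finished.
    done = train_step()

cloud.down()  # stop the instance (does not delete it)
```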

#### TPU (Only on GCP)
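```python
import cloud

cloud.connect()

tpu = cloud.instance.tpu.get(preemptible=True)  # acquire an accelerator
done = False
while not done:
    if not tpu.usable:
        # The TPU is no longer usable (e.g. it was preempted): release it in
        # the background and acquire a new one. Note that `async` is a reserved
        # keyword from Python 3.7 on, so this call only parses on Python <= 3.6.
        tpu.delete(async=True)
        tpu = cloud.instance.tpu.get(preemptible=True)
    else:
        # `train_step` is a placeholder for your own training code; it
        # should return True once training is finished.
        done = train_step(tpu)

cloud.down()  # release all resources, then stop the instance (does not delete it)
```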

---

# Documentation

### cloud.connect()
Takes an existing `cloud.Instance` object, or creates one, and assigns it to `cloud.instance`.

| **returns** | **desc.** |
| :------- | :------- |
| `cloud_env` | `cloud.Instance`, the connected instance |

### cloud.down()
Calls `cloud.instance.down()`.

### cloud.delete(confirm=True)
Calls `cloud.instance.delete(confirm)`.

### cloud.Resource
Base class for a cloud resource (such as an instance or a TPU accelerator) that can be started, stopped, and deleted.

| properties | desc. |
| :------- | :------- |
| `name` | str, name of the instance |
| `usable` | bool, whether this resource is usable |
| **methods** | **desc.** |
| `up(async=False)` | start an existing, stopped resource |
| `down(async=False)` | stop the resource; this does not necessarily delete it |
| `delete(async=False)` | delete this resource |
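The contract in the table can be pictured as an abstract base class. This is a hedged sketch, not dl-cloud's source: `ResourceBase` and `FakeResource` are hypothetical names, the `async` flag is omitted (it is a reserved keyword from Python 3.7 on), and `FakeResource` only tracks a state string in memory instead of calling a cloud API.

```python
from abc import ABC, abstractmethod


class ResourceBase(ABC):
    """Hypothetical mirror of the cloud.Resource interface."""

    def __init__(self, name: str) -> None:
        self.name = name

    @property
    @abstractmethod
    def usable(self) -> bool:
        """Whether this resource can currently be used."""

    @abstractmethod
    def up(self) -> None:
        """Start an existing, stopped resource."""

    @abstractmethod
    def down(self) -> None:
        """Stop the resource without necessarily deleting it."""

    @abstractmethod
    def delete(self) -> None:
        """Delete the resource."""


class FakeResource(ResourceBase):
    """In-memory stand-in that only tracks a state string."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.state = "RUNNING"

    @property
    def usable(self) -> bool:
        return self.state == "RUNNING"

    def up(self) -> None:
        if self.state == "DELETED":
            raise RuntimeError(f"{self.name} was deleted and cannot be restarted")
        self.state = "RUNNING"

    def down(self) -> None:
        if self.state != "DELETED":
            self.state = "STOPPED"

    def delete(self) -> None:
        self.state = "DELETED"
```

Keeping `down()` and `delete()` separate is what lets a stopped resource be brought back with `up()`, while a deleted one cannot.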

### cloud.Instance(Resource)

An object representing a cloud instance with a set of Resources that can be allocated/deallocated.

| properties | desc. |
| :------- | :------- |
| `resource_managers` | list of ResourceManagers |
| **methods** | **desc.** |
| `down(async=False, delete_resources=True)` | stop this instance and optionally delete all managed resources |
| `delete(async=False, confirm=True)` | delete this instance with optional user confirmation |
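One plausible reading of `down(delete_resources=True)` is: delete every resource held by the instance's resource managers, then stop the instance itself. The sketch below models that with plain in-memory objects; `SketchInstance`, `SketchManager`, and `SketchResource` are hypothetical, and the real dl-cloud implementation may order or parallelize these steps differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchResource:
    name: str
    deleted: bool = False

    def delete(self) -> None:
        self.deleted = True


@dataclass
class SketchManager:
    resources: List[SketchResource] = field(default_factory=list)


@dataclass
class SketchInstance:
    resource_managers: List[SketchManager] = field(default_factory=list)
    running: bool = True

    def down(self, delete_resources: bool = True) -> None:
        """Optionally delete all managed resources, then stop the instance."""
        if delete_resources:
            for manager in self.resource_managers:
                for resource in manager.resources:
                    resource.delete()
                manager.resources.clear()  # the manager no longer tracks them
        self.running = False  # stop, but do not delete, the instance
```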

### cloud.ResourceManager

Class for managing the creation and maintenance of `cloud.Resource` objects.

| properties | desc. |
| :------- | :------- |
| `instance` | `cloud.Instance`, the instance owning this resource manager |
| `resource_cls` | `cloud.Resource` subclass, the class of the resources being managed |
| `resources` | list of `cloud.Resource`s, the managed resources |
| **methods** | **desc.** |
| `__init__(instance, resource_cls)` | `instance`: the `cloud.Instance` object operating this ResourceManager |
| | `resource_cls`: the `cloud.Resource` class this object manages |
| `add(*args, **kwargs)` | add an existing resource to this manager |
| `remove(*args, **kwargs)` | remove an existing resource from this manager |
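Because `add` takes `*args, **kwargs` alongside a `resource_cls`, one natural reading is that the arguments are forwarded to the resource class's constructor. The sketch below follows that reading; it is an assumption, as is having `remove` look resources up by `name` (the documented signature is `remove(*args, **kwargs)`). `SketchResourceManager` is a hypothetical name.

```python
from typing import Any, Generic, List, Type, TypeVar

R = TypeVar("R")


class SketchResourceManager(Generic[R]):
    """Hypothetical ResourceManager: builds resources from resource_cls."""

    def __init__(self, instance: Any, resource_cls: Type[R]) -> None:
        self.instance = instance          # the instance that owns this manager
        self.resource_cls = resource_cls  # class of the managed resources
        self.resources: List[R] = []

    def add(self, *args: Any, **kwargs: Any) -> R:
        """Wrap an existing resource by forwarding the arguments to resource_cls."""
        resource = self.resource_cls(*args, **kwargs)
        self.resources.append(resource)
        return resource

    def remove(self, name: str) -> R:
        """Stop tracking the resource with this name (it is not deleted)."""
        for i, resource in enumerate(self.resources):
            if getattr(resource, "name", None) == name:
                return self.resources.pop(i)
        raise KeyError(name)
```

For example, `manager.add("tpu-a", preemptible=True)` would construct `resource_cls("tpu-a", preemptible=True)` and start tracking it.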

## Amazon EC2
### cloud.AWSInstance(Instance)

A `cloud.Instance` object for AWS EC2 instances.

## Azure
### cloud.AzureInstance(Instance)

A `cloud.Instance` object for Microsoft Azure instances.

## Google Cloud

Our `GCPInstance` requires that your instances have `gcloud` installed and properly authenticated, so that `gcloud alpha compute tpus create test-name` runs without errors.
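Before running the `gcloud` command above, a quick pre-flight check can catch the most common setup problems. This is a hedged sketch with hypothetical helper names (`gcloud_path`, `gcloud_has_active_account`); it only confirms that `gcloud` is on `PATH` and has an active account, which is necessary but not sufficient for creating TPUs (project, zone, quota, and IAM permissions are not checked).

```python
import shutil
import subprocess
from typing import Optional


def gcloud_path() -> Optional[str]:
    """Return the full path of the gcloud executable, or None if it is not on PATH."""
    return shutil.which("gcloud")


def gcloud_has_active_account(timeout: float = 30.0) -> bool:
    """Best-effort check that gcloud is installed and has an active account."""
    exe = gcloud_path()
    if exe is None:
        return False
    result = subprocess.run(
        [exe, "auth", "list", "--filter=status:ACTIVE", "--format=value(account)"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # A non-empty account list means at least one credential is active.
    return result.returncode == 0 and bool(result.stdout.strip())
```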

### cloud.GCPInstance(Instance)

A `cloud.Instance` object for Google Cloud instances.

| properties | desc. |
| :------- | :------- |
| `tpu` | `cloud.TPUManager`, a resource manager for this instance's TPUs |
| `resource_managers` | list of owned `cloud.ResourceManager`s |
| **methods** | **desc.** |
| `__init__(collect_existing_tpus=True, **kwargs)` | `collect_existing_tpus`: bool, whether to add existing TPUs to this instance's TPU manager |
| | `**kwargs`: passed to `cloud.Instance`'s initializer |


### cloud.TPU(Resource)

Resource class for TPU accelerators.

| properties | desc. |
| :------- | :------- |
| `ip` | str, IP address of the TPU |
| `preemptible` | bool, whether this TPU is preemptible or not |
| `details` | dict {str: str}, properties of this TPU |
| **methods** | **desc.** |
| `up(async=False)` | start this TPU |
| `down(async=False)` | stop this TPU |
| `delete(async=False)` | delete this TPU |

### cloud.TPUManager(ResourceManager)

ResourceManager class for TPU accelerators.

| properties | desc. |
| :------- | :------- |
| `names` | list of str, names of the managed TPUs |
| `ips` | list of str, IP addresses of the managed TPUs |
| **methods** | **desc.** |
| `__init__(instance, collect_existing=True)` | `instance`: the `cloud.GCPInstance` object operating this TPUManager |
| | `collect_existing`: bool, whether to add existing TPUs to this manager |
| `clean(async=True)` | delete all managed TPUs with unhealthy states |
| `get(preemptible=True)` | get an available TPU, or create one using `up()` if none exist |
| `up(preemptible=True, async=False)` | allocate and manage a new instance of `resource_cls` |
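The `get`, `up`, and `clean` semantics in the table can be sketched in memory as follows. Assumptions: `SketchTPU` and `SketchTPUManager` are hypothetical; health is a simple `usable` flag rather than a real TPU state; "available" is taken to mean usable with a matching `preemptible` setting; `up()` just generates a name instead of creating a TPU through `gcloud`; and the `async` flags are omitted.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class SketchTPU:
    name: str
    preemptible: bool
    usable: bool = True
    deleted: bool = False

    def delete(self) -> None:
        self.deleted = True
        self.usable = False


class SketchTPUManager:
    """Hypothetical in-memory model of TPUManager.get / up / clean."""

    def __init__(self) -> None:
        self.resources: List[SketchTPU] = []
        self._ids = itertools.count()

    @property
    def names(self) -> List[str]:
        return [tpu.name for tpu in self.resources]

    def up(self, preemptible: bool = True) -> SketchTPU:
        # A real implementation would create a TPU through the cloud API here.
        tpu = SketchTPU(name=f"tpu-{next(self._ids)}", preemptible=preemptible)
        self.resources.append(tpu)
        return tpu

    def get(self, preemptible: bool = True) -> SketchTPU:
        # Reuse a usable TPU with the requested preemptibility, else create one.
        for tpu in self.resources:
            if tpu.usable and tpu.preemptible == preemptible:
                return tpu
        return self.up(preemptible=preemptible)

    def clean(self) -> None:
        # Delete every unhealthy TPU, then stop managing it.
        for tpu in [t for t in self.resources if not t.usable]:
            tpu.delete()
        self.resources = [t for t in self.resources if not t.deleted]
```

This mirrors the TPU usage loop in the Quickstart: when the current TPU stops being usable, the next `get()` call transparently provisions a replacement.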


