Cloud resource management for deep learning applications.

# Cloud Utilities for Deep Learning ⛅️

A super lightweight cloud management tool designed with deep learning applications in mind.

**Built with the belief that managing cloud resources should be as easy as:**
```
import cloud

cloud.connect()
train_my_network()
cloud.down()
```

We welcome all contributions, suggestions, and use-cases. Reach out to us over GitHub or at team@for.ai with ideas!

## Contents
- [Quickstart](#quickstart)
  - [Install](#install)
  - [Config](#config)
  - [Usage](#usage)
- [Documentation](#documentation)
  - [Amazon EC2](#amazon-ec2)
  - [Azure](#azure)
  - [Google Cloud](#google-cloud)

## Quickstart

### Install:
Sort of stable:
```
sudo pip install dl-cloud
```
Bleeding edge:
```
git clone git@github.com:for-ai/cloud.git
sudo pip install -e cloud
```

### Config:

See `configs/cloud.toml-*` for instructions on how to authenticate for each provider (Google Cloud, AWS EC2, and Azure).

Place your completed configuration file (named `cloud.toml`) in either the filesystem root (`/`) or `$HOME`. Otherwise, set `$CLOUD_CFG` to the full path of the file.
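
To check which file would be picked up, a lookup like the following can help. This is only a sketch: the helper name `find_cloud_config` is hypothetical, and the search order (`$CLOUD_CFG` first, then `$HOME`, then `/`) is an assumption, not something the library documents.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_cloud_config(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first existing config file among the documented locations.

    Assumed search order (not confirmed by the library): $CLOUD_CFG,
    then $HOME/cloud.toml, then /cloud.toml.
    """
    env = os.environ if env is None else env
    candidates = []
    if env.get("CLOUD_CFG"):
        candidates.append(Path(env["CLOUD_CFG"]))
    if env.get("HOME"):
        candidates.append(Path(env["HOME"]) / "cloud.toml")
    candidates.append(Path("/cloud.toml"))
    # Return the first candidate that actually exists on disk.
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Calling `find_cloud_config()` with no arguments reads the real environment and returns `None` when no file is found.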

### Usage:
#### GPU
```python
import cloud
cloud.connect()

# gpu instances have a dedicated GPU so we don't need to worry
# about preemption or acquiring/releasing accelerators online.

while True:
    # train your model or w/e
    train_my_network()

cloud.down() # stop the instance (does not delete instance)
```

#### TPU (Only on GCP)
```python
import cloud
cloud.connect()

tpu = cloud.instance.tpu.get(preemptible=True) # acquire an accelerator
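while True:
    if not tpu.usable:
        # note: `async` is a reserved word from Python 3.7 on,
        # so this keyword argument only parses on Python 3.6 or earlier
        tpu.delete(async=True)  # release the accelerator in the background
        tpu = cloud.instance.tpu.get(preemptible=True)  # acquire a new accelerator
    else:
        # train your model or w/e
        train_my_network()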

cloud.down() # release all resources, then stop the instance (does not delete instance)
```
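
The replace-on-preemption loop above can also be written as a function that is easy to test without a real TPU. This is a sketch under assumptions: `run_with_replacement` is a hypothetical helper, `get_tpu` stands in for `cloud.instance.tpu.get`, and the TPU objects only need the `usable` property and `delete()` method. It calls `delete()` without the `async` keyword so that it parses on current Python versions.

```python
from typing import Callable


def run_with_replacement(get_tpu: Callable[[], object],
                         train_step: Callable[[object], None],
                         num_steps: int) -> int:
    """Run `num_steps` training steps, swapping the TPU whenever it is unusable.

    Returns the number of replacements that were needed.
    """
    replacements = 0
    done = 0
    tpu = get_tpu()
    while done < num_steps:
        if not tpu.usable:
            tpu.delete()      # release the unusable accelerator
            tpu = get_tpu()   # acquire a new one
            replacements += 1
            continue
        train_step(tpu)
        done += 1
    return replacements
```

With the real library, `get_tpu` could be something like `lambda: cloud.instance.tpu.get(preemptible=True)`.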

---

# Documentation

### cloud.connect()
Takes/Creates a `cloud.Instance` object and sets `cloud.instance` to it.

| **returns** | **desc.** |
| :------- | :------- |
| cloud_env | a `cloud.Instance` |

### cloud.down()
Calls `cloud.instance.down()`.
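
If training raises an exception, `cloud.down()` is never reached and the instance keeps running. One way to guard against that is a small context manager. This is a sketch: `cloud_session` is a hypothetical helper, not part of this package, and it takes the connect and down functions as arguments so it can be tried without cloud credentials.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def cloud_session(connect: Callable[[], object],
                  down: Callable[[], None]) -> Iterator[object]:
    """Connect on entry and always call `down` on exit, even after an exception."""
    instance = connect()
    try:
        yield instance
    finally:
        down()
```

Intended usage would be `with cloud_session(cloud.connect, cloud.down): train_my_network()`.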

### cloud.delete(confirm=True)
Calls `cloud.instance.delete(confirm)`.

### cloud.Resource
Base class for a cloud resource that can be started, stopped, and deleted.

| properties | desc. |
| :------- | :------- |
| `name` | str, name of the resource |
| `usable` | bool, whether this resource is usable |
| **methods** | **desc.** |
| `up(async=False)` | start an existing stopped resource |
| `down(async=False)` | stop the resource; this does not necessarily delete it |
| `delete(async=False)` | delete this resource |

### cloud.Instance(Resource)

An object representing a cloud instance with a set of Resources that can be allocated/deallocated.

| properties | desc. |
| :------- | :------- |
| `resource_managers` | list of ResourceManagers |
| **methods** | **desc.** |
| `down(async=False, delete_resources=True)` | stop this instance and optionally delete all managed resources |
| `delete(async=False, confirm=True)` | delete this instance with optional user confirmation |

### cloud.ResourceManager

Class for managing the creation and maintenance of `cloud.Resource`s.

| properties | desc. |
| :------- | :------- |
| `instance` | `cloud.Instance` instance owning this resource manager |
| `resource_cls` | `cloud.Resource` type, the class of the resource to be managed |
| `resources` | list of `cloud.Resource`s, managed resources |
| **methods** | **desc.** |
| `__init__(instance, resource_cls)` | `instance`: the `cloud.Instance` object operating this ResourceManager |
| | `resource_cls`: the `cloud.Resource` class this object manages |
| `add(*args, **kwargs)` | add an existing resource to this manager |
| `remove(*args, **kwargs)` | remove an existing resource from this manager |
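
To make the relationship between the two classes concrete, here is a toy model of a manager tracking resources of one class. It is illustrative only and not the package's implementation: the classes below are stand-ins, and it assumes that `add()` builds a `resource_cls` from its arguments and that `remove()` takes the resource object itself.

```python
class Resource:
    """Toy stand-in for `cloud.Resource` (illustrative only)."""

    def __init__(self, name: str):
        self.name = name
        self.usable = True


class ResourceManager:
    """Toy stand-in for `cloud.ResourceManager`, tracking resources of one class."""

    def __init__(self, instance, resource_cls):
        self.instance = instance
        self.resource_cls = resource_cls
        self.resources = []

    def add(self, *args, **kwargs):
        # Assumption: add() constructs a `resource_cls` from its arguments.
        resource = self.resource_cls(*args, **kwargs)
        self.resources.append(resource)
        return resource

    def remove(self, resource):
        # Assumption: remove() takes the resource object to stop tracking.
        self.resources.remove(resource)
```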

## Amazon EC2
### cloud.AWSInstance(Instance)

A `cloud.Instance` object for AWS EC2 instances.

## Azure
### cloud.AzureInstance(Instance)

A `cloud.Instance` object for Microsoft Azure instances.

## Google Cloud

`GCPInstance` requires that `gcloud` is installed and properly authenticated on your instances, so that `gcloud alpha compute tpus create test_name` runs without issue.
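
A quick pre-flight check for this requirement could look like the sketch below. `gcloud_ready` is a hypothetical helper, not part of this package. It only checks that `gcloud` is on `PATH` and reports an active account; it does not verify that the alpha TPU commands are installed or permitted.

```python
import shutil
import subprocess


def gcloud_ready(timeout: float = 30.0) -> bool:
    """Return True if `gcloud` is on PATH and reports an active account."""
    if shutil.which("gcloud") is None:
        return False
    try:
        result = subprocess.run(
            ["gcloud", "auth", "list",
             "--filter=status:ACTIVE", "--format=value(account)"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # An empty account list means nobody is logged in.
    return result.returncode == 0 and bool(result.stdout.strip())
```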

### cloud.GCPInstance(Instance)

A `cloud.Instance` object for Google Cloud instances.

| properties | desc. |
| :------- | :------- |
| `tpu` | `cloud.TPUManager`, a resource manager for this instance's TPUs |
| `resource_managers` | list of owned `cloud.ResourceManager`s |
| **methods** | **desc.** |
| `__init__(collect_existing_tpus=True, **kwargs)` | `collect_existing_tpus`: bool, whether to add existing TPUs to this manager |
| | `**kwargs`: passed to `cloud.Instance`'s initializer |


### cloud.TPU(Resource)

Resource class for TPU accelerators.

| properties | desc. |
| :------- | :------- |
| `ip` | str, IP address of the TPU |
| `preemptible` | bool, whether this TPU is preemptible or not |
| `details` | dict {str: str}, properties of this TPU |
| **methods** | **desc.** |
| `up(async=False)` | start this TPU |
| `down(async=False)` | stop this TPU |
| `delete(async=False)` | delete this TPU |

### cloud.TPUManager(ResourceManager)

ResourceManager class for TPU accelerators.

| properties | desc. |
| :------- | :------- |
| `names` | list of str, names of the managed TPUs |
| `ips` | list of str, ips of the managed TPUs |
| **methods** | **desc.** |
| `__init__(instance, collect_existing=True)` | `instance`: the `cloud.GCPInstance` object operating this TPUManager |
| | `collect_existing`: bool, whether to add existing TPUs to this manager |
| `clean(async=True)` | delete all managed TPUs with unhealthy states |
| `get(preemptible=True)` | get an available TPU, or create one using `up()` if none exist |
| `up(preemptible=True, async=False)` | allocate and manage a new instance of `resource_cls` |
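
The description of `get()` suggests a reuse-or-create pattern. A minimal sketch of that idea follows, assuming "available" means the resource's `usable` property is true. `get_or_create` is a hypothetical function, and `create` stands in for the manager's `up()`.

```python
from typing import Callable, List


def get_or_create(resources: List[object], create: Callable[[], object]) -> object:
    """Return the first usable resource, or create and track a new one."""
    for resource in resources:
        if resource.usable:
            return resource
    # Nothing usable is being managed, so allocate a new resource.
    new_resource = create()
    resources.append(new_resource)
    return new_resource
```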

