Cloud resource management for deep learning applications.

# Cloud Utilities for Deep Learning ⛅️

A super lightweight cloud management tool designed with deep learning applications in mind.

This project is still a work in progress. We welcome all contributions, suggestions, and use-cases. Reach out to us over GitHub or at team@for.ai with ideas!

## Contents
- [Quickstart](#quickstart)
  - [Install](#install)
  - [Config](#config)
  - [Usage](#usage)
- [Documentation](#documentation)
  - [Amazon EC2](#amazon-ec2)
  - [Azure](#azure)
  - [Google Cloud](#google-cloud)

## Quickstart

### Install:
Sort of stable 🚀:
```
sudo pip install dl-cloud
```
Bleeding edge 🛸:
```
git clone git@github.com:for-ai/cloud.git
sudo pip install -e cloud
```

### Config:

See `configs/cloud.toml-*` for instructions on how to authenticate for each provider (Google Cloud, AWS EC2, and Azure).

Place your completed configuration file (named `cloud.toml`) in either the root directory `/` or `$HOME`. Otherwise, set `$CLOUD_CFG` to the full path of the file.
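
The lookup described above could be sketched roughly like this. It is an illustration only, not the library's actual code: the helper name `find_cloud_config`, its parameters, and the search order (`$CLOUD_CFG` first, then `/`, then `$HOME`) are all assumptions.

```python
import os
from pathlib import Path


def find_cloud_config(env=None, search_dirs=None):
    """Return the first existing cloud.toml path, or None if nothing is found.

    Hypothetical helper: the precedence used here is an assumption.
    """
    env = os.environ if env is None else env

    # An explicit path in $CLOUD_CFG wins over the default locations.
    explicit = env.get("CLOUD_CFG")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None

    if search_dirs is None:
        home = env.get("HOME") or os.path.expanduser("~")
        search_dirs = [Path("/"), Path(home)]

    for directory in search_dirs:
        candidate = Path(directory) / "cloud.toml"
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_cloud_config()` with no arguments reads the real environment; the two parameters exist only so the lookup can be tried against a temporary directory.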

### Usage:
#### GPU
```python
import cloud
cloud.connect()

# GPU instances have a dedicated GPU, so we don't need to worry
# about preemption or acquiring/releasing accelerators online.

while True:
    # train your model or w/e
    break  # placeholder: leave the loop once training is finished

cloud.down()  # stop the instance (does not delete instance)
```

#### TPU (Only on GCP)
```python
import cloud
cloud.connect()

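# NOTE: `async` is a reserved keyword from Python 3.7 onward, so the
# `async=True` call below only parses on Python 3.6 and earlier.
tpu = cloud.instance.tpu.get(preemptible=True)  # acquire an accelerator
while True:
    if not tpu.usable:
        tpu.delete(async=True)  # release the accelerator in the background
        tpu = cloud.instance.tpu.get(preemptible=True)  # acquire a new accelerator
    else:
        # train your model or w/e
        break  # placeholder: leave the loop once training is finished

cloud.down()  # release all resources, then stop the instance (does not delete instance)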
```
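
To try the recovery loop without a GCP account, here is a small simulation of the same control flow. Everything in it is a stand-in: `FakeTPU`, `FakeTPUManager`, `run_step`, and the `lifetime` parameter are hypothetical and not part of `cloud`. The `async` flag is left out because `async` is a reserved word from Python 3.7 onward.

```python
import itertools


class FakeTPU:
    """Hypothetical stand-in for cloud.TPU that is preempted after `lifetime` steps."""

    _ids = itertools.count()

    def __init__(self, lifetime):
        self.name = "tpu-{}".format(next(FakeTPU._ids))
        self._remaining = lifetime
        self._deleted = False

    @property
    def usable(self):
        return self._remaining > 0 and not self._deleted

    def run_step(self):
        self._remaining -= 1  # pretend one training step ran on this TPU

    def delete(self):
        self._deleted = True


class FakeTPUManager:
    """Hypothetical stand-in for cloud.instance.tpu."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.created = []

    def get(self, preemptible=True):
        # Reuse a usable TPU if one exists, otherwise create a new one.
        for tpu in self.created:
            if tpu.usable:
                return tpu
        tpu = FakeTPU(self.lifetime)
        self.created.append(tpu)
        return tpu


def train(manager, total_steps):
    """Run `total_steps` steps, swapping in a new TPU whenever the current one is lost."""
    tpu = manager.get(preemptible=True)
    steps_done = 0
    while steps_done < total_steps:
        if not tpu.usable:
            tpu.delete()  # release the lost accelerator
            tpu = manager.get(preemptible=True)  # acquire a new one
        else:
            tpu.run_step()
            steps_done += 1
    return steps_done
```

With `FakeTPUManager(lifetime=3)` and `train(manager, total_steps=7)`, the loop goes through three TPUs in total: two that get preempted and a third that finishes the run.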

---

# Documentation

### cloud.connect()
Takes/Creates a `cloud.Instance` object and sets `cloud.instance` to it.

| **returns** | **desc.** |
| :------- | :------- |
| `cloud_env` | a `cloud.Instance` |

### cloud.down()
Calls `cloud.instance.down()`.

### cloud.delete(confirm=True)
Calls `cloud.instance.delete(confirm)`.

### cloud.Resource
Base class for anything that can be started, stopped, and deleted, such as an instance or a TPU.

| properties | desc. |
| :------- | :------- |
| `name` | str, name of the resource |
| `usable` | bool, whether this resource is usable |
| **methods** | **desc.** |
| `up(async=False)` | start an existing stopped resource |
| `down(async=False)` | stop the resource. Note: this does not necessarily delete the resource |
| `delete(async=False)` | delete this resource |
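
The lifecycle in the table above can be pictured with a tiny in-memory class. This is a sketch, not the real `cloud.Resource`: the class name `InMemoryResource`, the three state strings, and the choice to treat "usable" as "running" are assumptions. The `async` flag is omitted because `async` is a reserved word from Python 3.7 onward.

```python
class InMemoryResource:
    """Hypothetical stand-in that mirrors the cloud.Resource table."""

    def __init__(self, name):
        self.name = name
        self._state = "stopped"  # one of: "running", "stopped", "deleted"

    @property
    def usable(self):
        # Assumption: a resource is usable only while it is running.
        return self._state == "running"

    def up(self):
        if self._state == "deleted":
            raise RuntimeError("{} was deleted and cannot be started".format(self.name))
        self._state = "running"

    def down(self):
        # Stopping keeps the resource around so that up() can restart it.
        if self._state == "running":
            self._state = "stopped"

    def delete(self):
        self._state = "deleted"
```

The point of the sketch is the difference between `down()` and `delete()`: a stopped resource can come back with `up()`, a deleted one cannot.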

### cloud.Instance(Resource)

An object representing a cloud instance with a set of Resources that can be allocated/deallocated.

| properties | desc. |
| :------- | :------- |
| `resource_managers` | list of ResourceManagers |
| **methods** | **desc.** |
| `down(async=False, delete_resources=True)` | stop this instance and optionally delete all managed resources |
| `delete(async=False, confirm=True)` | delete this instance with optional user confirmation |
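
One way to read the two methods above is sketched below. It is an assumption-heavy illustration, not the library's code: `SketchInstance`, the `ask` parameter, the `state` attribute, the prompt text, and the idea that `delete()` first calls `down()` are all hypothetical. Resource managers are simplified to plain lists, and the `async` flag is omitted.

```python
class SketchInstance:
    """Hypothetical stand-in for cloud.Instance."""

    def __init__(self, resource_managers, ask=input):
        self.resource_managers = resource_managers  # simplified: a list of lists
        self.ask = ask  # injectable prompt, so the confirmation can be scripted
        self.state = "running"

    def down(self, delete_resources=True):
        if delete_resources:
            for manager in self.resource_managers:
                for resource in list(manager):  # copy, since we mutate the list
                    resource.delete()
                    manager.remove(resource)
        self.state = "stopped"

    def delete(self, confirm=True):
        # Only an explicit "y" goes ahead when confirmation is requested.
        if confirm and self.ask("Delete this instance? [y/N] ").strip().lower() != "y":
            return False
        self.down(delete_resources=True)
        self.state = "deleted"
        return True
```

Passing `ask=lambda prompt: "y"` makes the sketch usable in scripts where nobody is around to type an answer.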

### cloud.ResourceManager

Class for managing the creation and maintenance of `cloud.Resource`s.

| properties | desc. |
| :------- | :------- |
| `instance` | `cloud.Instance` instance owning this resource manager |
| `resource_cls` | `cloud.Resource` type, the class of the resource to be managed |
| `resources` | list of `cloud.Resource`s, managed resources |
| **methods** | **desc.** |
| `__init__(instance, resource_cls)` | `instance`: the `cloud.Instance` object operating this ResourceManager |
| | `resource_cls`: the `cloud.Resource` class this object manages |
| `add(*args, **kwargs)` | add an existing resource to this manager |
| `remove(*args, **kwargs)` | remove an existing resource from this manager |
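
A rough picture of the bookkeeping a manager does, under stated assumptions: `SketchResourceManager` is a hypothetical name, and `add`/`remove` are narrowed here to take a single resource object, whereas the real methods accept `*args, **kwargs`.

```python
class SketchResourceManager:
    """Hypothetical stand-in for cloud.ResourceManager."""

    def __init__(self, instance, resource_cls):
        self.instance = instance          # the owning instance
        self.resource_cls = resource_cls  # the class of resource this manager tracks
        self.resources = []               # currently managed resources

    def add(self, resource):
        # Only track resources of the managed class, and only once each.
        if not isinstance(resource, self.resource_cls):
            raise TypeError(
                "expected {}, got {}".format(
                    self.resource_cls.__name__, type(resource).__name__
                )
            )
        if resource not in self.resources:
            self.resources.append(resource)
        return resource

    def remove(self, resource):
        # Stop tracking the resource; this sketch does not delete it.
        self.resources.remove(resource)
```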

## Amazon EC2
### cloud.AWSInstance(Instance)

A `cloud.Instance` object for AWS EC2 instances.

## Azure
### cloud.AzureInstance(Instance)

A `cloud.Instance` object for Microsoft Azure instances.

## Google Cloud

Our GCPInstance requires that your instances have `gcloud` installed and properly authenticated so that `gcloud alpha compute tpus create test_name` runs without issue.
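
A quick preflight check for the first half of that requirement might look like the sketch below. The helper names `find_cli` and `require_gcloud` are hypothetical, and the check only confirms that a `gcloud` executable is on `PATH`; it says nothing about whether it is authenticated.

```python
import shutil


def find_cli(name="gcloud"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def require_gcloud():
    """Raise a readable error early instead of failing later during TPU creation."""
    path = find_cli("gcloud")
    if path is None:
        raise RuntimeError(
            "gcloud was not found on PATH; install the Google Cloud SDK first"
        )
    return path
```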

### cloud.GCPInstance(Instance)

A `cloud.Instance` object for Google Cloud instances.

| properties | desc. |
| :------- | :------- |
| `tpu` | `cloud.TPUManager`, a resource manager for this instance's TPUs |
| `resource_managers` | list of owned `cloud.ResourceManager`s |
| **methods** | **desc.** |
| `__init__(collect_existing_tpus=True, **kwargs)` | `collect_existing_tpus`: bool, whether to add existing TPUs to this manager |
| | `**kwargs`: passed to `cloud.Instance`'s initializer |


### cloud.TPU(Resource)

Resource class for TPU accelerators.

| properties | desc. |
| :------- | :------- |
| `ip` | str, IP address of the TPU |
| `preemptible` | bool, whether this TPU is preemptible or not |
| `details` | dict {str: str}, properties of this TPU |
| **methods** | **desc.** |
| `up(async=False)` | start this TPU |
| `down(async=False)` | stop this TPU |
| `delete(async=False)` | delete this TPU |

### cloud.TPUManager(ResourceManager)

ResourceManager class for TPU accelerators.

| properties | desc. |
| :------- | :------- |
| `names` | list of str, names of the managed TPUs |
| `ips` | list of str, ips of the managed TPUs |
| **methods** | **desc.** |
| `__init__(instance, collect_existing=True)` | `instance`: the `cloud.GCPInstance` object operating this TPUManager |
| | `collect_existing`: bool, whether to add existing TPUs to this manager |
| `clean(async=True)` | delete all managed TPUs with unhealthy states |
| `get(preemptible=True)` | get an available TPU, or create one using `up()` if none exist |
| `up(preemptible=True, async=False)` | allocate and manage a new instance of `resource_cls` |
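
The `names`, `ips`, and `clean` entries above could be pictured like this. It is a sketch under assumptions: `SketchTPU` and `SketchTPUManager` are hypothetical, "unhealthy" is approximated as "not usable", and `clean()` here only stops tracking the TPUs, while the real method deletes them.

```python
from collections import namedtuple

# Hypothetical record standing in for cloud.TPU.
SketchTPU = namedtuple("SketchTPU", ["name", "ip", "usable"])


class SketchTPUManager:
    """Hypothetical stand-in for cloud.TPUManager."""

    def __init__(self, tpus=()):
        self.resources = list(tpus)

    @property
    def names(self):
        return [tpu.name for tpu in self.resources]

    @property
    def ips(self):
        return [tpu.ip for tpu in self.resources]

    def clean(self):
        """Stop tracking every TPU that is not usable and return the dropped ones."""
        unhealthy = [tpu for tpu in self.resources if not tpu.usable]
        self.resources = [tpu for tpu in self.resources if tpu.usable]
        return unhealthy
```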

