Cloud resource management for deep learning applications.
# Cloud Utilities for Deep Learning ⛅️
A super lightweight cloud management tool designed with deep learning applications in mind.
This project is still a work in progress. We welcome all contributions, suggestions, and use cases. Reach out to us on GitHub or at team@for.ai with ideas!
## Contents
- [Quickstart](#quickstart)
  - [Install](#install)
  - [Config](#config)
  - [Usage](#usage)
- [Documentation](#documentation)
  - [Amazon EC2](#amazon-ec2)
  - [Azure](#azure)
  - [Google Cloud](#google-cloud)
## Quickstart
### Install:
```
git clone git@github.com:for-ai/cloud.git
sudo pip install -e cloud
```
### Config:
See `configs/cloud.toml-*` for instructions on how to authenticate for each provider (Google Cloud, AWS EC2, and Azure).
Place your completed configuration file (named `cloud.toml`) in either root `/` or `$HOME`. Otherwise, provide a full path to the file in `$CLOUD_CFG`.
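
The lookup described above can be sketched roughly as follows. This is only an illustration, not the library's actual code: the function name `find_cloud_config` is hypothetical, and the search order (first `$CLOUD_CFG`, then `/`, then `$HOME`) is an assumption, since the text does not say which location wins when several exist.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def find_cloud_config(
    env: Mapping[str, str] = os.environ,
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Path]:
    """Return the first cloud.toml candidate that exists, or None.

    Hypothetical helper; the search order is an assumption.
    """
    candidates = []
    if env.get("CLOUD_CFG"):
        # $CLOUD_CFG holds a full path to the file itself.
        candidates.append(Path(env["CLOUD_CFG"]))
    candidates.append(Path("/") / "cloud.toml")
    if env.get("HOME"):
        candidates.append(Path(env["HOME"]) / "cloud.toml")

    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_cloud_config())
```

Passing `env` and `exists` as parameters keeps the sketch easy to try out without touching the real filesystem.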
### Usage:
#### GPU
```python
import cloud

cloud.connect()

# GPU instances have a dedicated GPU, so we don't need to worry
# about preemption or acquiring/releasing accelerators online.
while True:
    ...  # train your model or w/e; break out of the loop when done

cloud.down()  # stop the instance (does not delete instance)
```
#### TPU (Only on GCP)
```python
import cloud

cloud.connect()

tpu = cloud.instance.tpu.get(preemptible=True)  # acquire an accelerator
while True:
    if not tpu.usable:
        # NOTE: `async` is a reserved word from Python 3.7 on, so this keyword
        # argument only parses on Python 3.6 or earlier.
        tpu.delete(async=True)  # release the accelerator in the background
        tpu = cloud.instance.tpu.get(preemptible=True)  # acquire a new accelerator
    else:
        ...  # train your model or w/e; break out of the loop when done

cloud.down()  # release all resources, then stop the instance (does not delete instance)
```
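
To make the control flow of the TPU loop easier to follow without a cloud account, here is a self-contained simulation. Everything in it is hypothetical: `FakeTPU` and `FakeTPUManager` are stand-ins rather than classes from this package, preemption is modeled as a fixed number of steps per TPU, and the `async` argument is left out because it is a reserved word in Python 3.7 and later.

```python
class FakeTPU:
    """Stand-in for a preemptible TPU that dies after `lifetime` steps."""

    def __init__(self, name, lifetime):
        self.name = name
        self._remaining = lifetime
        self.deleted = False

    @property
    def usable(self):
        return not self.deleted and self._remaining > 0

    def tick(self):
        # Simulate the passage of time toward preemption.
        self._remaining -= 1

    def delete(self):
        self.deleted = True


class FakeTPUManager:
    """Stand-in manager that always hands out a fresh TPU."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.created = []

    def get(self, preemptible=True):
        tpu = FakeTPU(f"tpu-{len(self.created)}", self.lifetime)
        self.created.append(tpu)
        return tpu


def train(manager, total_steps):
    """Run `total_steps` steps, replacing the TPU whenever it is unusable."""
    tpu = manager.get(preemptible=True)
    step = 0
    while step < total_steps:
        if not tpu.usable:
            tpu.delete()                         # release the dead accelerator
            tpu = manager.get(preemptible=True)  # acquire a new one
        else:
            step += 1                            # one training step
            tpu.tick()
    tpu.delete()
    return step, len(manager.created)


if __name__ == "__main__":
    steps, tpus_used = train(FakeTPUManager(lifetime=3), total_steps=7)
    print(steps, tpus_used)
```

Unlike the `while True` loop above, the simulation stops after a fixed number of steps so that the cleanup at the end is actually reached.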
---
# Documentation
### cloud.connect()
Takes/Creates a `cloud.Instance` object and sets `cloud.instance` to it.
| **returns** | **desc.** |
| :------- | :------- |
| `cloud_env` | a `cloud.Instance` |
### cloud.down()
Calls `cloud.instance.down()`.
### cloud.delete(confirm=True)
Calls `cloud.instance.delete(confirm)`.
### cloud.Resource
Base class for cloud resources that can be started, stopped, and deleted.
| properties | desc. |
| :------- | :------- |
| `name` | str, name of the resource |
| `usable` | bool, whether this resource is usable |
| **methods** | **desc.** |
| `up(async=False)` | start an existing stopped resource |
| `down(async=False)` | stop the resource. Note: this does not necessarily delete the resource |
| `delete(async=False)` | delete this resource |
### cloud.Instance(Resource)
An object representing a cloud instance with a set of Resources that can be allocated/deallocated.
| properties | desc. |
| :------- | :------- |
| `resource_managers` | list of ResourceManagers |
| **methods** | **desc.** |
| `down(async=False, delete_resources=True)` | stop this instance and optionally delete all managed resources |
| `delete(async=False, confirm=True)` | delete this instance with optional user confirmation |
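
As a rough illustration of what `down(delete_resources=True)` is described as doing, the sketch below stops an instance after deleting everything its managers own. The `Sketch*` classes are hypothetical stand-ins, not the package's classes, and the `async` and `confirm` arguments are omitted for brevity (`async` is also a reserved word in Python 3.7 and later).

```python
class SketchResource:
    """Hypothetical stand-in for cloud.Resource."""

    def __init__(self, name):
        self.name = name
        self.state = "running"

    def down(self):
        self.state = "stopped"

    def delete(self):
        self.state = "deleted"


class SketchManager:
    """Hypothetical stand-in for cloud.ResourceManager."""

    def __init__(self, resources):
        self.resources = list(resources)


class SketchInstance(SketchResource):
    """Hypothetical stand-in for cloud.Instance."""

    def __init__(self, name, resource_managers):
        super().__init__(name)
        self.resource_managers = list(resource_managers)

    def down(self, delete_resources=True):
        if delete_resources:
            for manager in self.resource_managers:
                for resource in manager.resources:
                    resource.delete()
                manager.resources.clear()
        # The instance itself is only stopped, not deleted.
        super().down()


tpus = SketchManager([SketchResource("tpu-0"), SketchResource("tpu-1")])
instance = SketchInstance("trainer", [tpus])
managed = list(tpus.resources)  # keep references to inspect afterwards
instance.down(delete_resources=True)
```

After the call, the managed resources are in the `deleted` state while the instance is merely `stopped`, which mirrors the distinction between `down()` and `delete()` in the table above.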
### cloud.ResourceManager
Class for managing the creation and maintenance of `cloud.Resource`s.
| properties | desc. |
| :------- | :------- |
| `instance` | `cloud.Instance` instance owning this resource manager |
| `resource_cls` | `cloud.Resource` type, the class of the resource to be managed |
| `resources` | list of `cloud.Resource`s, managed resources |
| **methods** | **desc.** |
| `__init__(instance, resource_cls)` | `instance`: the `cloud.Instance` object operating this ResourceManager |
| | `resource_cls`: the `cloud.Resource` class this object manages |
| `add(*args, **kwargs)` | add an existing resource to this manager |
| `remove(*args, **kwargs)` | remove an existing resource from this manager |
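
The bookkeeping role of a resource manager can be sketched as below. This is a guess at the shape, not the real implementation: the table does not say what `add` and `remove` do with their arguments, so the sketch assumes that `add` forwards them to `resource_cls` to wrap an existing resource and that `remove` takes the resource object. All `Sketch*` names are hypothetical.

```python
class SketchResource:
    """Hypothetical stand-in for cloud.Resource."""

    def __init__(self, name):
        self.name = name


class SketchResourceManager:
    """Hypothetical stand-in for cloud.ResourceManager."""

    def __init__(self, instance, resource_cls):
        self.instance = instance          # the owning instance
        self.resource_cls = resource_cls  # class of the managed resources
        self.resources = []

    def add(self, *args, **kwargs):
        # Assumption: the arguments describe an existing resource and are
        # forwarded to resource_cls to build the wrapper object.
        resource = self.resource_cls(*args, **kwargs)
        self.resources.append(resource)
        return resource

    def remove(self, resource):
        # Assumption: removal only forgets the resource; it does not delete it.
        self.resources.remove(resource)


manager = SketchResourceManager(instance="trainer", resource_cls=SketchResource)
first = manager.add("tpu-0")
second = manager.add(name="tpu-1")
manager.remove(first)
remaining = [r.name for r in manager.resources]
```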
## Amazon EC2
### cloud.AWSInstance(Instance)
A `cloud.Instance` object for AWS EC2 instances.
## Azure
### cloud.AzureInstance(Instance)
A `cloud.Instance` object for Microsoft Azure instances.
## Google Cloud
Our GCPInstance requires that your instances have `gcloud` installed and properly authenticated so that `gcloud alpha compute tpus create test_name` runs without issue.
### cloud.GCPInstance(Instance)
A `cloud.Instance` object for Google Cloud instances.
| properties | desc. |
| :------- | :------- |
| `tpu` | `cloud.TPUManager`, a resource manager for this instance's TPUs |
| `resource_managers` | list of owned `cloud.ResourceManager`s |
| **methods** | **desc.** |
| `__init__(collect_existing_tpus=True, **kwargs)` | `collect_existing_tpus`: bool, whether to add existing TPUs to this instance's TPU manager |
| | `**kwargs`: passed to `cloud.Instance`'s initializer |
### cloud.TPU(Resource)
Resource class for TPU accelerators.
| properties | desc. |
| :------- | :------- |
| `ip` | str, IP address of the TPU |
| `preemptible` | bool, whether this TPU is preemptible or not |
| `details` | dict {str: str}, properties of this TPU |
| **methods** | **desc.** |
| `up(async=False)` | start this TPU |
| `down(async=False)` | stop this TPU |
| `delete(async=False)` | delete this TPU |
### cloud.TPUManager(ResourceManager)
ResourceManager class for TPU accelerators.
| properties | desc. |
| :------- | :------- |
| `names` | list of str, names of the managed TPUs |
| `ips` | list of str, IP addresses of the managed TPUs |
| **methods** | **desc.** |
| `__init__(instance, collect_existing=True)` | `instance`: the `cloud.GCPInstance` object operating this TPUManager |
| | `collect_existing`: bool, whether to add existing TPUs to this manager |
| `clean(async=True)` | delete all managed TPUs with unhealthy states |
| `get(preemptible=True)` | get an available TPU, or create one using `up()` if none exist |
| `up(preemptible=True, async=False)` | allocate and manage a new instance of `resource_cls` |
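
The interplay of `get`, `up`, and `clean` can be sketched as follows. The classes are hypothetical stand-ins, and two details are assumptions: that an "available" TPU means one that is usable and matches the requested `preemptible` flag, and that `clean` simply drops unhealthy TPUs from the managed list. The `async` argument is omitted because it is a reserved word in Python 3.7 and later.

```python
class SketchTPU:
    """Hypothetical stand-in for cloud.TPU."""

    def __init__(self, name, preemptible=True, healthy=True):
        self.name = name
        self.preemptible = preemptible
        self.healthy = healthy

    @property
    def usable(self):
        return self.healthy


class SketchTPUManager:
    """Hypothetical stand-in for cloud.TPUManager."""

    def __init__(self):
        self.resources = []
        self._created = 0

    @property
    def names(self):
        return [tpu.name for tpu in self.resources]

    def up(self, preemptible=True):
        # Allocate a new TPU and start managing it.
        tpu = SketchTPU(f"tpu-{self._created}", preemptible=preemptible)
        self._created += 1
        self.resources.append(tpu)
        return tpu

    def get(self, preemptible=True):
        # Reuse an available TPU if there is one, otherwise create one.
        for tpu in self.resources:
            if tpu.usable and tpu.preemptible == preemptible:
                return tpu
        return self.up(preemptible=preemptible)

    def clean(self):
        # Drop every managed TPU that is in an unhealthy state.
        self.resources = [tpu for tpu in self.resources if tpu.healthy]


manager = SketchTPUManager()
first = manager.get()    # nothing exists yet, so one is created
second = manager.get()   # the existing TPU is reused
first.healthy = False    # simulate a preemption
manager.clean()
names_after_clean = manager.names
third = manager.get()    # a replacement is created
```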