Cloud resource management for deep learning applications.
Cloud Utilities for Deep Learning ⛅️
A super lightweight cloud management tool designed with deep learning applications in mind.
Built with the belief that managing cloud resources should be as easy as:
import cloud
cloud.connect()
train_my_network()
cloud.down()
We welcome all contributions, suggestions, and use-cases. Reach out to us over GitHub or at team@for.ai with ideas!
Quickstart
Install:
Sort of stable:
sudo pip install dl-cloud
Bleeding edge:
git clone git@github.com:for-ai/cloud.git
sudo pip install -e cloud
Config:
See configs/cloud.toml-* for instructions on how to authenticate for each provider (Google Cloud, AWS EC2, and Azure).
Place your completed configuration file (named cloud.toml) in either the root directory (/) or $HOME. Otherwise, provide the full path to the file in $CLOUD_CFG.
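The lookup described above can be sketched roughly as follows. This is an illustration only, not the library's code: `find_cloud_config` is a hypothetical helper, and the search order (`$CLOUD_CFG` first, then `/`, then `$HOME`) is an assumption, since the text does not say which location takes precedence.

```python
import os
from pathlib import Path


def find_cloud_config(env=None, filename="cloud.toml"):
    """Return the first existing config path, or None if nothing is found.

    Hypothetical helper; the search order below is an assumption.
    """
    env = os.environ if env is None else env
    candidates = []
    if env.get("CLOUD_CFG"):
        # $CLOUD_CFG holds the full path to the file itself.
        candidates.append(Path(env["CLOUD_CFG"]))
    candidates.append(Path("/") / filename)
    if env.get("HOME"):
        candidates.append(Path(env["HOME"]) / filename)
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Calling `find_cloud_config()` with no arguments reads the real environment; passing a plain dict as `env` makes the helper easy to try out without touching your shell.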
Usage:
GPU
import cloud
cloud.connect()

# gpu instances have a dedicated GPU so we don't need to worry
# about preemption or acquiring/releasing accelerators online.
while True:
    # train your model or w/e, and `break` once training is done
    pass

cloud.down()  # stop the instance (does not delete instance)
TPU (Only on GCP)
import cloud
cloud.connect()

tpu = cloud.instance.tpu.get(preemptible=True)  # acquire an accelerator
while True:
    if not tpu.usable:
        tpu.delete(background=True)  # release the accelerator in the background
        tpu = cloud.instance.tpu.get(preemptible=True)  # acquire a new accelerator
    else:
        # train your model or w/e, and `break` once training is done
        pass

cloud.down()  # release all resources, then stop the instance (does not delete instance)
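The TPU loop above can also be wrapped in a small function so that training stops after a fixed number of steps. This is a sketch under assumptions: `train_with_preemptible_tpu`, `train_step`, and `num_steps` are hypothetical names, and `tpu_manager` is only expected to behave like `cloud.instance.tpu` (a `get(preemptible=...)` method returning objects with `usable` and `delete(background=...)`).

```python
def train_with_preemptible_tpu(tpu_manager, train_step, num_steps):
    """Run `train_step` `num_steps` times, replacing the TPU when it is unusable.

    `tpu_manager` is expected to behave like `cloud.instance.tpu`;
    `train_step` and `num_steps` are hypothetical names for this sketch.
    """
    tpu = tpu_manager.get(preemptible=True)  # acquire an accelerator
    completed = 0
    while completed < num_steps:
        if not tpu.usable:
            tpu.delete(background=True)  # release the unusable accelerator
            tpu = tpu_manager.get(preemptible=True)  # acquire a new one
            continue
        train_step(tpu)  # one unit of training work on the current TPU
        completed += 1
    return completed
```

With the library installed and configured, this would presumably be called as `train_with_preemptible_tpu(cloud.instance.tpu, my_step, 1000)` after `cloud.connect()`, followed by `cloud.down()`.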
Documentation
cloud.connect()
Takes/Creates a cloud.Instance object and sets cloud.instance to it.
returns | desc. |
---|---|
cloud_env | a cloud.Instance. |
cloud.down()
Calls cloud.instance.down().
cloud.delete(confirm=True)
Calls cloud.instance.delete(confirm).
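Since cloud.down() is what stops the instance, an exception during training can leave the instance running. One way to guard against that is a small context manager. `cloud_session` is a hypothetical helper, not part of the library; it only relies on the `connect()` and `down()` calls documented above.

```python
from contextlib import contextmanager


@contextmanager
def cloud_session(cloud_module):
    """Connect on entry and always call down() on exit, even on errors."""
    instance = cloud_module.connect()
    try:
        yield instance
    finally:
        # Runs whether the body finished normally or raised.
        cloud_module.down()
```

Usage would look like `with cloud_session(cloud): train_my_network()`, where `cloud` is the imported module.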
cloud.Resource
Base class for anything that can be started, stopped, and deleted, such as an instance or an accelerator.
properties | desc. |
---|---|
name | str, name of the instance |
usable | bool, whether this resource is usable |

methods | desc. |
---|---|
up(background=False) | start an existing stopped resource |
down(background=False) | stop the resource. Note: this should not necessarily delete this resource |
delete(background=False) | delete this resource |
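To make the interface concrete, here is a minimal sketch of a class that satisfies the properties and methods listed above. It does not import the library: `Resource` below is a stand-in abstract base class written for this example, and `FakeAccelerator` is a hypothetical subclass, so the real cloud.Resource may differ in its details.

```python
from abc import ABC, abstractmethod


class Resource(ABC):
    """Stand-in for cloud.Resource, limited to the documented interface."""

    def __init__(self, name):
        self.name = name  # str, name of the resource

    @property
    @abstractmethod
    def usable(self):
        """bool, whether this resource is usable."""

    @abstractmethod
    def up(self, background=False):
        """Start an existing stopped resource."""

    @abstractmethod
    def down(self, background=False):
        """Stop the resource without necessarily deleting it."""

    @abstractmethod
    def delete(self, background=False):
        """Delete this resource."""


class FakeAccelerator(Resource):
    """Hypothetical in-memory resource that only tracks its own state."""

    def __init__(self, name):
        super().__init__(name)
        self.state = "stopped"

    @property
    def usable(self):
        return self.state == "running"

    def up(self, background=False):
        self.state = "running"

    def down(self, background=False):
        self.state = "stopped"

    def delete(self, background=False):
        self.state = "deleted"
```

The `background` flag is accepted but ignored here; in this sketch every call completes immediately.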
cloud.Instance(Resource)
An object representing a cloud instance with a set of Resources that can be allocated/deallocated.
properties | desc. |
---|---|
resource_managers | list of ResourceManagers |

methods | desc. |
---|---|
down(background=False, delete_resources=True) | stop this instance and optionally delete all managed resources |
delete(background=False, confirm=True) | delete this instance with optional user confirmation |
cloud.ResourceManager
Class for managing the creation and maintenance of cloud.Resources.
properties | desc. |
---|---|
instance | cloud.Instance instance owning this resource manager |
resource_cls | cloud.Resource type, the class of the resource to be managed |
resources | list of cloud.Resources, managed resources |

methods | desc. |
---|---|
__init__(instance, resource_cls) | instance: the cloud.Instance object operating this ResourceManager; resource_cls: the cloud.Resource class this object manages |
add(*args, **kwargs) | add an existing resource to this manager |
remove(*args, **kwargs) | remove an existing resource from this manager |
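The bookkeeping a resource manager performs can be pictured with a simplified stand-in. `SimpleResourceManager` below is hypothetical and is not the library's implementation; in particular, it narrows the documented `add(*args, **kwargs)` and `remove(*args, **kwargs)` signatures to a single resource argument, which is an assumption made to keep the sketch short.

```python
class SimpleResourceManager:
    """Stand-in for cloud.ResourceManager; not the library's implementation."""

    def __init__(self, instance, resource_cls):
        self.instance = instance          # owner of this manager
        self.resource_cls = resource_cls  # class of the managed resources
        self.resources = []               # managed resources

    def add(self, resource):
        """Start managing an existing resource of the expected type."""
        if not isinstance(resource, self.resource_cls):
            raise TypeError(f"expected {self.resource_cls.__name__}")
        if resource not in self.resources:
            self.resources.append(resource)

    def remove(self, resource):
        """Stop managing a resource; in this sketch the resource is not deleted."""
        self.resources.remove(resource)
```

Checking the type in `add` keeps a manager for one resource class from silently collecting another kind of resource.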
Amazon EC2
cloud.AWSInstance(Instance)
A cloud.Instance object for AWS EC2 instances.
Azure
cloud.AzureInstance(Instance)
A cloud.Instance object for Microsoft Azure instances.
Google Cloud
Our GCPInstance requires that your instances have gcloud installed and properly authenticated so that gcloud alpha compute tpus create test_name runs without issue.
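A quick pre-flight check for the first half of that requirement might look like the sketch below. `gcloud_available` is a hypothetical helper; it only confirms that a `gcloud` executable is on PATH and says nothing about whether it is authenticated.

```python
import shutil


def gcloud_available(which=shutil.which):
    """Return True if a `gcloud` executable is found on PATH.

    `which` is injectable so the check can be exercised without gcloud.
    """
    return which("gcloud") is not None
```

Running this before `cloud.connect()` gives an early, readable failure instead of an error from deep inside a TPU request.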
cloud.GCPInstance(Instance)
A cloud.Instance object for Google Cloud instances.
properties | desc. |
---|---|
tpu | cloud.TPUManager, a resource manager for this instance's TPUs |
resource_managers | list of owned cloud.ResourceManagers |

methods | desc. |
---|---|
__init__(collect_existing_tpus=True, **kwargs) | collect_existing_tpus: bool, whether to add existing TPUs to this manager; **kwargs: passed to cloud.Instance's initializer |
cloud.TPU(Resource)
Resource class for TPU accelerators.
properties | desc. |
---|---|
ip | str, IP address of the TPU |
preemptible | bool, whether this TPU is preemptible or not |
details | dict {str: str}, properties of this TPU |

methods | desc. |
---|---|
up(background=False) | start this TPU |
down(background=False) | stop this TPU |
delete(background=False) | delete this TPU |
cloud.TPUManager(ResourceManager)
ResourceManager class for TPU accelerators.
properties | desc. |
---|---|
names | list of str, names of the managed TPUs |
ips | list of str, IPs of the managed TPUs |

methods | desc. |
---|---|
__init__(instance, collect_existing=True) | instance: the cloud.GCPInstance object operating this TPUManager; collect_existing: bool, whether to add existing TPUs to this manager |
clean(background=True) | delete all managed TPUs with unhealthy states |
get(preemptible=True) | get an available TPU, or create one using up() if none exist |
up(preemptible=True, background=False) | allocate and manage a new instance of resource_cls |
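The get-or-create behaviour described for get() can be sketched as a standalone function. `get_or_create` is a hypothetical name, and treating "available" as "usable and matching the requested `preemptible` flag" is an assumption; the library may apply different criteria.

```python
def get_or_create(manager, preemptible=True):
    """Return a usable managed resource matching `preemptible`, else create one.

    `manager` is expected to expose `resources` and `up(preemptible=...)`,
    as the tables above describe for cloud.TPUManager.
    """
    for resource in manager.resources:
        if resource.usable and resource.preemptible == preemptible:
            return resource  # reuse an existing accelerator
    return manager.up(preemptible=preemptible)  # none available: allocate one
```

Reusing an existing TPU before allocating a new one avoids paying for accelerators that are already up and idle.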