Cloud resource management for deep learning applications.

Cloud Utilities for Deep Learning ⛅️

A super lightweight cloud management tool designed with deep learning applications in mind.

Built with the belief that managing cloud resources should be as easy as:

import cloud

cloud.connect()
train_my_network()
cloud.down()

We welcome all contributions, suggestions, and use-cases. Reach out to us over GitHub or at team@for.ai with ideas!

Quickstart

Install:

Sort of stable:

sudo pip install dl-cloud

Bleeding edge:

git clone git@github.com:for-ai/cloud.git
sudo pip install -e cloud

Config:

See configs/cloud.toml-* for instructions on how to authenticate for each provider (Google Cloud, AWS EC2, and Azure).

Place your completed configuration file (named cloud.toml) in either root / or $HOME. Otherwise, provide a full path to the file in $CLOUD_CFG.
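
The lookup described above can be sketched as a small helper. This is only an illustration, not the library's own code: the name find_cloud_config is hypothetical, and the search order ($CLOUD_CFG first, then $HOME, then /) is an assumption, since the text does not say which location takes precedence.

```python
import os


def find_cloud_config(environ=None, exists=os.path.isfile):
    """Return the first cloud.toml path that exists, or None.

    Hypothetical helper; the precedence order below is an assumption.
    """
    environ = os.environ if environ is None else environ

    candidates = []
    # An explicit full path in $CLOUD_CFG is checked first (assumed).
    if environ.get("CLOUD_CFG"):
        candidates.append(environ["CLOUD_CFG"])
    # Then the user's home directory, then the filesystem root.
    if environ.get("HOME"):
        candidates.append(os.path.join(environ["HOME"], "cloud.toml"))
    candidates.append(os.path.join(os.sep, "cloud.toml"))

    for path in candidates:
        if exists(path):
            return path
    return None
```

Calling find_cloud_config() with no arguments reads the real environment and filesystem; the two parameters exist only so the lookup can be exercised without touching either.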

If you use GCP as the provider in your cloud.toml, the library uses the GCP instance metadata APIs to fetch the instance's metadata. If you want to configure it for Google Cloud Build, set the following:

is_gcb = true
zone = '{{DESIRED_ZONE}}' 

Usage:

GPU

import cloud
cloud.connect()

# gpu instances have a dedicated GPU so we don't need to worry
# about preemption or acquiring/releasing accelerators online.

while True:
  # train your model or w/e, and break when done
  pass

cloud.down()  # stop the instance (does not delete instance)
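
One way to make sure the instance is stopped even when training raises an exception is to wrap the work in try/finally. A minimal sketch: run_then_stop and train_fn are hypothetical names, and the cloud module is passed in as an argument so the helper relies only on the connect() and down() calls shown above.

```python
def run_then_stop(cloud_module, train_fn):
    """Connect, run train_fn, and always stop the instance afterwards.

    Hypothetical helper: cloud_module is expected to expose connect()
    and down(), as the cloud package does in the usage above.
    """
    cloud_module.connect()
    try:
        return train_fn()
    finally:
        # Runs on success and on error; stops the instance, does not delete it.
        cloud_module.down()
```

With the real package this would be called as run_then_stop(cloud, train_my_network) after import cloud.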

TPU (Only on GCP)

import cloud
cloud.connect()

tpu = cloud.instance.tpu.get(preemptible=True)  # acquire an accelerator
while True:
  if not tpu.usable:
    tpu.delete(background=True)  # release the accelerator in the background
    tpu = cloud.instance.tpu.get(preemptible=True)  # acquire a new accelerator
  else:
    # train your model or w/e, and break when done
    pass

cloud.down()  # release all resources, then stop the instance (does not delete instance)
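
The reacquire loop above can also be written with a bounded number of steps and a guaranteed shutdown. This is a sketch under stated assumptions: train_on_preemptible_tpu, train_step, and num_steps are hypothetical names, and only calls that appear in the usage above (connect, instance.tpu.get, usable, delete, down) are used on the cloud module, which is passed in as an argument.

```python
def train_on_preemptible_tpu(cloud_module, train_step, num_steps):
    """Run num_steps training steps, replacing the TPU whenever it is unusable.

    Hypothetical helper: train_step(tpu) is a user-supplied callable.
    """
    cloud_module.connect()
    try:
        tpu = cloud_module.instance.tpu.get(preemptible=True)
        done = 0
        while done < num_steps:
            if not tpu.usable:
                # Release the lost accelerator in the background, then get another.
                tpu.delete(background=True)
                tpu = cloud_module.instance.tpu.get(preemptible=True)
                continue
            train_step(tpu)
            done += 1
    finally:
        # Release all resources, then stop (not delete) the instance.
        cloud_module.down()
```

Counting only completed steps means a preemption costs the time to acquire a new TPU but does not skip any work.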

Documentation

cloud.connect()

Takes/Creates a cloud.Instance object and sets cloud.instance to it.

returns    desc.
cloud_env  a cloud.Instance

cloud.down()

Calls cloud.instance.down().

cloud.delete(confirm=True)

Calls cloud.instance.delete(confirm).

cloud.Resource

Base class for cloud resources that can be started, stopped, and deleted.

properties  desc.
name        str, name of the resource
usable      bool, whether this resource is usable

methods                   desc.
up(background=False)      start an existing stopped resource
down(background=False)    stop the resource. Note: stopping does not necessarily delete the resource
delete(background=False)  delete this resource

cloud.Instance(Resource)

An object representing a cloud instance with a set of Resources that can be allocated/deallocated.

properties         desc.
resource_managers  list of ResourceManagers

methods                                        desc.
down(background=False, delete_resources=True)  stop this instance and optionally delete all managed resources
delete(background=False, confirm=True)         delete this instance with optional user confirmation

cloud.ResourceManager

Class for managing the creation and maintenance of cloud.Resources.

properties    desc.
instance      cloud.Instance, the instance owning this resource manager
resource_cls  cloud.Resource type, the class of the resource to be managed
resources     list of cloud.Resources, managed resources

methods                           desc.
__init__(instance, resource_cls)  instance: the cloud.Instance object operating this ResourceManager
                                  resource_cls: the cloud.Resource class this object manages
add(*args, **kwargs)              add an existing resource to this manager
remove(*args, **kwargs)           remove an existing resource from this manager
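
To make the relationship between a manager and its resources concrete, here is a toy model of the interface described above. It is a stand-in for illustration, not the library's implementation: ToyResource and ToyResourceManager are hypothetical names, and the behaviour of add (forwarding its arguments to the resource class) and remove (untracking without deleting) is an assumption.

```python
class ToyResource:
    """Stand-in for a cloud.Resource: only tracks its own state."""

    def __init__(self, name):
        self.name = name
        self.usable = True

    def delete(self, background=False):
        # A real resource would call the provider here; this only flips a flag.
        self.usable = False


class ToyResourceManager:
    """Stand-in for a cloud.ResourceManager bound to one resource class."""

    def __init__(self, instance, resource_cls):
        self.instance = instance
        self.resource_cls = resource_cls
        self.resources = []

    def add(self, *args, **kwargs):
        # Assumption: arguments are forwarded to the resource class.
        resource = self.resource_cls(*args, **kwargs)
        self.resources.append(resource)
        return resource

    def remove(self, resource):
        # Assumption: removal only stops tracking; it does not delete.
        self.resources.remove(resource)
```

Binding the manager to a single resource_cls keeps each manager responsible for one kind of resource, which is how the TPU manager described later is presented.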

Amazon EC2

cloud.AWSInstance(Instance)

A cloud.Instance object for AWS EC2 instances.

Azure

cloud.AzureInstance(Instance)

A cloud.Instance object for Microsoft Azure instances.

Google Cloud

Our GCPInstance requires that your instances have gcloud installed and properly authenticated so that gcloud alpha compute tpus create test_name runs without issue.

cloud.GCPInstance(Instance)

A cloud.Instance object for Google Cloud instances.

properties         desc.
tpu                cloud.TPUManager, a resource manager for this instance's TPUs
resource_managers  list of owned cloud.ResourceManagers

methods                                         desc.
__init__(collect_existing_tpus=True, **kwargs)  collect_existing_tpus: bool, whether to add existing TPUs to this manager
                                                **kwargs: passed to cloud.Instance's initializer

cloud.TPU(Resource)

Resource class for TPU accelerators.

properties   desc.
ip           str, IP address of the TPU
preemptible  bool, whether this TPU is preemptible or not
details      dict {str: str}, properties of this TPU

methods                   desc.
up(background=False)      start this TPU
down(background=False)    stop this TPU
delete(background=False)  delete this TPU

cloud.TPUManager(ResourceManager)

ResourceManager class for TPU accelerators.

properties  desc.
names       list of str, names of the managed TPUs
ips         list of str, IPs of the managed TPUs

methods                                  desc.
__init__(instance, collect_existing=True)  instance: the cloud.GCPInstance object operating this TPUManager
                                           collect_existing: bool, whether to add existing TPUs to this manager
clean(background=True)                     delete all managed TPUs with unhealthy states
get(preemptible=True)                      get an available TPU, or create one using up() if none exist
up(preemptible=True, background=False)     allocate and manage a new instance of resource_cls
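
The get-or-create behaviour described for get() can be sketched as a plain function. This is an illustration rather than the library's code: get_or_create and its create argument are hypothetical, and "available" is interpreted here as a resource whose usable flag is set and whose preemptible flag matches the request, which is an assumption.

```python
def get_or_create(resources, create, preemptible=True):
    """Return an available resource from `resources`, creating one if needed.

    Hypothetical helper. `create` is a callable taking preemptible=...
    and returning a new resource, standing in for the manager's up().
    """
    for resource in resources:
        # Assumed meaning of "available": usable, with a matching flag.
        if resource.usable and resource.preemptible == preemptible:
            return resource
    # Nothing suitable is managed yet, so allocate and track a new one.
    resource = create(preemptible=preemptible)
    resources.append(resource)
    return resource
```

Reusing an existing resource before allocating a new one avoids paying for a second accelerator when one is already up.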
