
Cloud resource management for deep learning applications.


Cloud Utilities for Deep Learning ⛅️

A super lightweight cloud management tool designed with deep learning applications in mind.

Built with the belief that managing cloud resources should be as easy as:

import cloud

cloud.connect()
train_my_network()
cloud.down()

We welcome all contributions, suggestions, and use-cases. Reach out to us over GitHub or at team@for.ai with ideas!


Quickstart

Install:

Sort of stable:

sudo pip install dl-cloud

Bleeding edge:

git clone git@github.com:for-ai/cloud.git
sudo pip install -e cloud

Config:

See configs/cloud.toml-* for instructions on how to authenticate for each provider (Google Cloud, AWS EC2, and Azure).

Place your completed configuration file (named cloud.toml) in either the root directory (/) or $HOME. Otherwise, set $CLOUD_CFG to the file's full path.
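The lookup described above can be sketched as follows. find_cloud_config is a hypothetical helper, not part of dl-cloud, and the precedence between the three locations is not documented; this sketch assumes an explicit $CLOUD_CFG wins, then /cloud.toml, then $HOME/cloud.toml.

```python
import os
from pathlib import Path
from typing import Optional


def find_cloud_config() -> Optional[Path]:
    """Hypothetical helper: locate cloud.toml in the documented places.

    Assumed precedence (not confirmed by dl-cloud): $CLOUD_CFG first,
    then /cloud.toml, then $HOME/cloud.toml.
    """
    candidates = []
    env_path = os.environ.get("CLOUD_CFG")
    if env_path:
        candidates.append(Path(env_path))  # explicit full path
    candidates.append(Path("/") / "cloud.toml")  # root directory
    home = os.environ.get("HOME")
    if home:
        candidates.append(Path(home) / "cloud.toml")  # user's home directory
    for path in candidates:
        if path.is_file():
            return path
    return None  # no configuration file found
```

On Python 3.11+ the returned path could then be parsed with the standard library's tomllib.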

Usage:

GPU

import cloud
cloud.connect()

# GPU instances have a dedicated GPU, so we don't need to worry
# about preemption or acquiring/releasing accelerators online.

train_my_network()  # train your model or w/e

cloud.down()  # stop the instance (does not delete the instance)

TPU (Only on GCP)

import cloud
cloud.connect()

tpu = cloud.instance.tpu.get(preemptible=True)  # acquire an accelerator
finished = False
while not finished:
  if not tpu.usable:
    tpu.delete(background=True)  # release the accelerator in the background
    tpu = cloud.instance.tpu.get(preemptible=True)  # acquire a new accelerator
  else:
    # train your model or w/e; train_my_network is your own (placeholder) code,
    # assumed here to return True once training is finished
    finished = train_my_network(tpu)

cloud.down()  # release all resources, then stop the instance (does not delete the instance)
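The TPU loop above follows a check, replace, continue pattern. To try that control flow without a GCP project, here is a self-contained simulation. FakeTPU and FakeTPUManager are hypothetical stand-ins for cloud.TPU and cloud.TPUManager, and the 20% preemption rate is arbitrary.

```python
import random


class FakeTPU:
    """Hypothetical stand-in for cloud.TPU that randomly 'gets preempted'."""

    def __init__(self, name: str, rng: random.Random):
        self.name = name
        self._rng = rng
        self.deleted = False

    @property
    def usable(self) -> bool:
        # Each check has an arbitrary ~20% chance of reporting preemption.
        return not self.deleted and self._rng.random() > 0.2

    def delete(self, background: bool = False) -> None:
        self.deleted = True


class FakeTPUManager:
    """Hypothetical stand-in for cloud.TPUManager; get() always makes a new TPU."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self.created = 0

    def get(self, preemptible: bool = True) -> FakeTPU:
        self.created += 1
        return FakeTPU(f"tpu-{self.created}", self._rng)


def train(total_steps: int, manager: FakeTPUManager) -> int:
    """Run total_steps steps, replacing the TPU whenever it becomes unusable."""
    tpu = manager.get(preemptible=True)
    steps_done = 0
    while steps_done < total_steps:
        if not tpu.usable:
            tpu.delete(background=True)  # release the preempted accelerator
            tpu = manager.get(preemptible=True)  # acquire a replacement
        else:
            steps_done += 1  # one training step on `tpu` would run here
    return steps_done
```

Because progress is only counted on a usable TPU, the loop always completes the requested number of steps, however many replacements it needs along the way.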

Documentation

cloud.connect()

Takes or creates a cloud.Instance object and sets cloud.instance to it.

Returns:
  • cloud_env: a cloud.Instance

cloud.down()

Calls cloud.instance.down().

cloud.delete(confirm=True)

Calls cloud.instance.delete(confirm).

cloud.Resource

Base class for cloud resources, such as instances and TPUs, that can be started, stopped, and deleted.

Properties:
  • name: str, name of the resource
  • usable: bool, whether this resource is usable

Methods:
  • up(background=False): start an existing stopped resource
  • down(background=False): stop the resource. Note: this does not necessarily delete the resource
  • delete(background=False): delete this resource
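The Resource contract above maps naturally onto an abstract base class. This is a hypothetical sketch of the documented interface, not dl-cloud's source; LocalResource is a toy in-memory subclass, and its RUNNING/STOPPED/DELETED states are invented for illustration.

```python
from abc import ABC, abstractmethod


class Resource(ABC):
    """Hypothetical sketch of the documented cloud.Resource interface."""

    def __init__(self, name: str):
        self.name = name

    @property
    @abstractmethod
    def usable(self) -> bool:
        """Whether this resource can currently be used."""

    @abstractmethod
    def up(self, background: bool = False) -> None:
        """Start an existing, stopped resource."""

    @abstractmethod
    def down(self, background: bool = False) -> None:
        """Stop the resource (need not delete it)."""

    @abstractmethod
    def delete(self, background: bool = False) -> None:
        """Delete the resource."""


class LocalResource(Resource):
    """Toy in-memory implementation; `background` is accepted but ignored."""

    def __init__(self, name: str):
        super().__init__(name)
        self.state = "STOPPED"

    @property
    def usable(self) -> bool:
        return self.state == "RUNNING"

    def up(self, background: bool = False) -> None:
        self.state = "RUNNING"

    def down(self, background: bool = False) -> None:
        self.state = "STOPPED"  # stopped, but still exists

    def delete(self, background: bool = False) -> None:
        self.state = "DELETED"
```

Declaring the methods abstract means a subclass that forgets, say, delete() fails at construction time rather than midway through a training run.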

cloud.Instance(Resource)

An object representing a cloud instance with a set of Resources that can be allocated/deallocated.

Properties:
  • resource_managers: list of ResourceManagers

Methods:
  • down(background=False, delete_resources=True): stop this instance and optionally delete all managed resources
  • delete(background=False, confirm=True): delete this instance, with optional user confirmation

cloud.ResourceManager

Class for managing the creation and maintenance of cloud.Resources.

Properties:
  • instance: cloud.Instance, the instance owning this resource manager
  • resource_cls: cloud.Resource type, the class of the resource to be managed
  • resources: list of cloud.Resources, the managed resources

Methods:
  • __init__(instance, resource_cls):
      instance: the cloud.Instance object operating this ResourceManager
      resource_cls: the cloud.Resource class this object manages
  • add(*args, **kwargs): add an existing resource to this manager
  • remove(*args, **kwargs): remove an existing resource from this manager
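One plausible way the Instance and ResourceManager pieces fit together: an Instance holds ResourceManagers, and down(delete_resources=True) deletes every managed resource before stopping. All classes below are simplified hypothetical stand-ins. In particular, add() is assumed to forward its arguments to resource_cls, and remove() is narrowed to a single resource instead of the documented *args, **kwargs.

```python
from typing import List


class ToyResource:
    """Minimal hypothetical stand-in for a cloud.Resource."""

    def __init__(self, name: str):
        self.name = name
        self.deleted = False

    def delete(self, background: bool = False) -> None:
        self.deleted = True


class ResourceManager:
    """Sketch: tracks resources of a single class for one instance."""

    def __init__(self, instance, resource_cls):
        self.instance = instance
        self.resource_cls = resource_cls
        self.resources: List = []

    def add(self, *args, **kwargs):
        # Assumption: arguments identify an existing resource of resource_cls.
        resource = self.resource_cls(*args, **kwargs)
        self.resources.append(resource)
        return resource

    def remove(self, resource) -> None:
        # Narrowed signature for this sketch: remove one tracked resource.
        self.resources.remove(resource)


class Instance:
    """Sketch of Instance.down(): optionally delete managed resources, then stop."""

    def __init__(self):
        self.resource_managers: List[ResourceManager] = []
        self.running = True

    def down(self, background: bool = False, delete_resources: bool = True) -> None:
        if delete_resources:
            for manager in self.resource_managers:
                for resource in list(manager.resources):  # copy: we mutate the list
                    resource.delete(background=background)
                    manager.remove(resource)
        self.running = False  # a real implementation would stop the VM here
```

Iterating over a copy of each manager's list keeps removal safe while deleting.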

Amazon EC2

cloud.AWSInstance(Instance)

A cloud.Instance object for AWS EC2 instances.

Azure

cloud.AzureInstance(Instance)

A cloud.Instance object for Microsoft Azure instances.

Google Cloud

Our GCPInstance requires that your instances have gcloud installed and properly authenticated so that gcloud alpha compute tpus create test_name runs without issue.
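Before constructing a GCPInstance, it may help to confirm that gcloud is installed and logged in. gcloud_ready below is a hypothetical pre-flight helper, not part of dl-cloud. It only checks for an active account via `gcloud auth list`, not TPU quota or API permissions, so the TPU create command above remains the definitive test.

```python
import shutil
import subprocess


def gcloud_ready(timeout: float = 60.0) -> bool:
    """Hypothetical pre-flight check: is gcloud installed with an active account?"""
    gcloud = shutil.which("gcloud")
    if gcloud is None:
        return False  # gcloud is not on PATH
    try:
        result = subprocess.run(
            [gcloud, "auth", "list", "--filter=status:ACTIVE", "--format=value(account)"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # An active account prints its email; empty output means nobody is logged in.
    return result.returncode == 0 and bool(result.stdout.strip())
```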

cloud.GCPInstance(Instance)

A cloud.Instance object for Google Cloud instances.

Properties:
  • tpu: cloud.TPUManager, a resource manager for this instance's TPUs
  • resource_managers: list of owned cloud.ResourceManagers

Methods:
  • __init__(collect_existing_tpus=True, **kwargs):
      collect_existing_tpus: bool, whether to add existing TPUs to this manager
      **kwargs: passed to cloud.Instance's initializer

cloud.TPU(Resource)

Resource class for TPU accelerators.

Properties:
  • ip: str, IP address of the TPU
  • preemptible: bool, whether this TPU is preemptible
  • details: dict {str: str}, properties of this TPU

Methods:
  • up(background=False): start this TPU
  • down(background=False): stop this TPU
  • delete(background=False): delete this TPU

cloud.TPUManager(ResourceManager)

ResourceManager class for TPU accelerators.

Properties:
  • names: list of str, names of the managed TPUs
  • ips: list of str, IP addresses of the managed TPUs

Methods:
  • __init__(instance, collect_existing=True):
      instance: the cloud.GCPInstance object operating this TPUManager
      collect_existing: bool, whether to add existing TPUs to this manager
  • clean(background=True): delete all managed TPUs with unhealthy states
  • get(preemptible=True): get an available TPU, or create one using up() if none exist
  • up(preemptible=True, background=False): allocate and manage a new instance of resource_cls
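A minimal sketch of how get(), clean(), and up() might interact. It assumes a TPU counts as usable when its details report state READY and health HEALTHY; those field names are modeled on the Cloud TPU API, but what dl-cloud actually checks is an assumption. ToyTPU and ToyTPUManager are hypothetical, and no cloud calls are made.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTPU:
    """Hypothetical stand-in for cloud.TPU; `details` mimics a TPU description."""

    name: str
    preemptible: bool = True
    details: Dict[str, str] = field(default_factory=dict)
    deleted: bool = False

    @property
    def usable(self) -> bool:
        # Assumption: usable means READY and HEALTHY (not confirmed for dl-cloud).
        return (
            not self.deleted
            and self.details.get("state") == "READY"
            and self.details.get("health") == "HEALTHY"
        )

    def delete(self, background: bool = False) -> None:
        self.deleted = True  # a real TPU would be torn down through GCP


class ToyTPUManager:
    """Sketch of TPUManager.up()/clean()/get() semantics, not the library code."""

    def __init__(self) -> None:
        self.resources: List[ToyTPU] = []
        self._created = 0  # counter so names stay unique after clean()

    @property
    def names(self) -> List[str]:
        return [tpu.name for tpu in self.resources]

    def up(self, preemptible: bool = True, background: bool = False) -> ToyTPU:
        tpu = ToyTPU(
            name=f"tpu-{self._created}",
            preemptible=preemptible,
            details={"state": "READY", "health": "HEALTHY"},
        )
        self._created += 1
        self.resources.append(tpu)
        return tpu

    def clean(self, background: bool = True) -> None:
        # Delete, and stop tracking, every TPU that is no longer usable.
        for tpu in [t for t in self.resources if not t.usable]:
            tpu.delete(background=background)
            self.resources.remove(tpu)

    def get(self, preemptible: bool = True) -> ToyTPU:
        # Reuse a usable TPU with the requested preemptibility, else create one.
        for tpu in self.resources:
            if tpu.usable and tpu.preemptible == preemptible:
                return tpu
        return self.up(preemptible=preemptible)
```

Keeping clean() separate from get() means an unhealthy TPU is skipped immediately but only torn down when clean() runs, which matches the TPU loop's pattern of deleting in the background.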


Download files


Source Distribution

dl-cloud-0.1.9.tar.gz (11.1 kB)

Uploaded Source

Built Distribution

dl_cloud-0.1.9-py3-none-any.whl (23.1 kB)

Uploaded Python 3

File details

Details for the file dl-cloud-0.1.9.tar.gz.

File metadata

  • Download URL: dl-cloud-0.1.9.tar.gz
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2

File hashes

Hashes for dl-cloud-0.1.9.tar.gz
  • SHA256: a68b53b496d30d1da48867e12975c77ce3d8753bd0f5e143ffaa64f17b940f95
  • MD5: b073d67a44b59bd611082ff9fc52b48c
  • BLAKE2b-256: c6a187f2e7968d0e16ffbc45cb15d7f6542f5facfbe0c3097c11cff459bf45e2


File details

Details for the file dl_cloud-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: dl_cloud-0.1.9-py3-none-any.whl
  • Size: 23.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.2

File hashes

Hashes for dl_cloud-0.1.9-py3-none-any.whl
  • SHA256: fd4e5fe8f55fe68fe3395ee93dd45c44745761643fa858cb021c7b6c29c773de
  • MD5: 25be0db6b368b309cd4f2b08de5bf48a
  • BLAKE2b-256: 0b5e615f852d8ad219e48d7ac93a60b67d3cbd423f5ea4ee419d9df846dd71c6
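To check a downloaded file against the published SHA256 digests, a short script is enough. This is a sketch using only the standard library; EXPECTED_SHA256 copies the two SHA256 values listed in this section, and verify_download is an illustrative name, not part of dl-cloud.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash listings for release 0.1.9.
EXPECTED_SHA256 = {
    "dl-cloud-0.1.9.tar.gz": "a68b53b496d30d1da48867e12975c77ce3d8753bd0f5e143ffaa64f17b940f95",
    "dl_cloud-0.1.9-py3-none-any.whl": "fd4e5fe8f55fe68fe3395ee93dd45c44745761643fa858cb021c7b6c29c773de",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Return True only if the file's SHA256 matches the digest for its name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

pip can enforce the same check at install time through hash-checking mode: pin dl-cloud==0.1.9 in a requirements file with a --hash=sha256:... option for each digest, then install with pip install --require-hashes -r requirements.txt. Note that this mode also requires every dependency to be pinned with hashes.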

