cascade
A library for model training in multi-cloud environments.
Cascade is a library for submitting and managing jobs across multiple cloud environments. It is designed to integrate seamlessly into existing Prefect workflows or can be used as a standalone library.
Getting Started
Installation
poetry add block-cascade
or
pip install block-cascade
Example Usage in a Prefect flow
from block_cascade import remote
from block_cascade import GcpEnvironmentConfig, GcpMachineConfig, GcpResource

# Machine shape for the job: machine type and replica count
machine_config = GcpMachineConfig("n2-standard-4", 1)

# GCP project, region, service account, container image, and network for the job
environment_config = GcpEnvironmentConfig(
    project="ds-cash-production",
    region="us-west1",
    service_account="ds-cash-production@ds-cash-production.iam.gserviceaccount.com",
    image="us.gcr.io/ds-cash-production/cascade/cascade-test",
    network="projects/603986066384/global/networks/neteng-shared-vpc-prod",
)

gcp_resource = GcpResource(
    chief=machine_config,
    environment=environment_config,
)

# Decorating the function submits it to the configured GCP resource when called
@remote(resource=gcp_resource)
def addition(a: int, b: int) -> int:
    return a + b

result = addition(1, 2)
assert result == 3
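Continuing the example above, here is a minimal sketch of invoking the same task from inside a Prefect flow. It assumes Prefect 2's flow decorator and that, as the example above suggests, a remote-decorated function is called like an ordinary Python function:

from prefect import flow

@flow
def addition_flow() -> int:
    # addition runs remotely on the GcpResource configured above
    return addition(1, 2)

assert addition_flow() == 3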
Configuration
Cascade supports defining different resource requirements per task via a configuration file named either cascade.yaml or cascade.yml. This file must be located in the working directory where your code executes so that it can be discovered at runtime. Each top-level key names the task it applies to; here, resources for a task named calculate:
calculate:
  type: GcpResource
  chief:
    type: n1-standard-1
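For illustration, a function this block could apply to. The bare @remote usage below is an assumption about how file-based configuration is picked up; check the library's documentation for the exact form:

from block_cascade import remote

@remote
def calculate(a: int, b: int) -> int:
    # Resources come from the "calculate" block in cascade.yaml
    return a + b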
You can even define a default configuration that can be overridden by specific tasks to eliminate redundant definitions.
default:
  GcpResource:
    environment:
      project: ds-cash-dev
      service_account: ds-cash-production@ds-cash-production.iam.gserviceaccount.com
      region: us-central1
    chief:
      type: n1-standard-4

calculate:
  type: GcpResource
  environment:
    project: ds-cash-production
  chief:
    count: 2
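Assuming task-level values are merged over the defaults, the effective resource for calculate would look like the sketch below (the merge semantics here are an assumption; consult the library for the exact precedence rules):

calculate:
  type: GcpResource
  environment:
    project: ds-cash-production  # overrides ds-cash-dev from default
    service_account: ds-cash-production@ds-cash-production.iam.gserviceaccount.com
    region: us-central1
  chief:
    type: n1-standard-4
    count: 2  # added at the task level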
Authorization
Cascade requires authorization both to submit jobs to GCP or Databricks and to stage pickled code in a cloud storage bucket. In the GCP example below, an authorization token is obtained via IAM by running the following command:
gcloud auth login --update-adc
No additional configuration is required in your application's code to use this token.
However, to authenticate to Databricks and AWS you will need to provide a token and a secret key, respectively. These can be passed directly to the DatabricksResource object or set as environment variables. The following example shows how these values can be provided in the configuration file.
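A sketch of what that could look like; the field names below (token, aws_secret_key) are illustrative assumptions rather than the library's confirmed schema:

train-model:
  type: DatabricksResource
  # Hypothetical field names, for illustration only
  token: <databricks-token>
  aws_secret_key: <aws-secret-key>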