Skip to main content

CloudSend is a Python package that allows you to run any script on a cloud service (for now AWS only).

Project description

Description

CloudSend is a Python package that allows you to run any script on a cloud service (for now AWS only).

Features

  • Easily run scripts on AWS by writing a simple configuration file
  • Handles Python and Julia scripts, or any command
  • Handles PyPi , Conda/Mamba, Apt-get and Julia environments
  • Concurrent instance support
  • Handles disconnections from instances, including stopped or terminated instances
  • Handles interruption of CloudSend, with state recovery
  • Runs locally or on a remote instance, with 'watcher' functionality
Important Note
CloudSend helps you easily create instances on AWS so that you can focus on your scripts. It is important to realize that it can and will likely generate extra costs. If you want to minimize those costs, activate the eco mode in the configuration or make sure you monitor the resources created by CloudSend. Those include:
  • VPCs
  • Subnets
  • Security Groups
  • Instances
  • Device Mappings / Disks
  • Policies & Roles
  • KeyPairs

Pre-requisites

In order to use the python AWS client (Boto3), you need to have an existing AWS account and to setup your computer for AWS.

with AWS CLI

  1. Go to the AWS Signup page and create an account
  2. Download the AWS CLI
  3. In the AWS web console, create a user with administrator privilege
  4. In the AWS web console, under the AMI section, click on the new user and make sure you create an access key under the tab "Security Credentials". Make sure "Console Password" is Enabled as well
  5. In ther Terminal, use the AWS CLI to setup your configuration:
aws configure

See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html

  1. To run in remote mode, you also need to add the following credentials to your user (maybe):
  • iam:PassRole
  • iam:CreateRole
  • ec2:AssociateIamInstanceProfile
  • ec2:ReplaceIamInstanceProfileAssociation

manually

  1. Go to the AWS Signup page and create an account
  2. In the AWS web console, create a user with administrator privilege
  3. In the AWS web console, under the IAM section, click on the new user and make sure you create an access key under the tab "Security Credentials". Make sure "Console Password" is Enabled as well
  4. Add your new user credentials manually, in the credentials file
'~/.aws/config' example ('C:\Users\USERNAME\.aws\config' on Windows)
[default]
region = eu-west-3
output = json
'~/.aws/credentials' example ('C:\Users\USERNAME\.aws\credentials' on Windows)
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

Setting up a separate user with least permissions (manually)

  1. In the AWS web console, in the IAM service, create a group 'cloudsend-users' with AmazonEC2FullAccess and IAMFullAccess permissions
  2. In the AWS web console, in the IAM service, create a user USERNAME attached to the 'cloudsend-users' group:

Step 1

add user 1

Step 2

add user 2

... Step 5

add user 3

COPY the Access Key info !

  1. Add your new user profile manually, in the credentials file
'~/.aws/config' example ('C:\Users\USERNAME\.aws\config' on Windows)
[default]
region = eu-west-3
output = json

[profile cloudsend]
region = eu-west-3
output = json
'~/.aws/credentials' example ('C:\Users\USERNAME\.aws\credentials' on Windows)
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

[cloudsend]
aws_access_key_id = YOU_PROFILE_ACCESS_KEY_ID
aws_secret_access_key = YOUR_PROFILE_SECRET_ACCESS_KEY
  1. add the 'profile' : 'cloudsend_USERNAME' to the configuration
config = {

    ################################################################################
    # GLOBALS
    ################################################################################

    'project'      : 'test' ,                             # this will be concatenated with the instance hashes (if not None) 
    'profile'      : 'cloudsend' ,
    ...

Installation

On MacOS / Linux

with pip

python3 -m venv .venv
source ./.venv/bin/activate
python3 -m ensurepip --default-pip
python3 -m pip install -r requirements.txt

with poetry

curl -sSL https://install.python-poetry.org | python3.8 -
poetry install

On Windows (powershell)

with pip

C:\> python3 -m venv .venv
C:\> .venv\\Scripts\\activate.bat
C:\> python -m pip install -r requirements.txt

with poetry

C:\> (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
C:\> poetry install

Usage / Test runs

# copy the example file
cp example/config.example.py config.py
#
# EDIT THE FILE
#

# to run with pip
python3 -m cloudsend.demo config
# to run with pip with reset (maestro and instances)
python3 -m cloudsend.demo config reset
# to run with poetry
poetry run demo config
# to run with poetry with reset (maestro and the instances)
poetry run demo config reset

Configuration example

config = {

    ################################################################################
    # GLOBALS
    ################################################################################

    'project'      : 'test' ,                             # this will be concatenated with the instance hashes (if not None) 
    'profile'      : None ,                               # if you want to use a specific profile (user/region), specify its name here
    'dev'          : False ,                              # When True, this will ensure the same instance and dev environement are being used (while working on building up the project) 
    'debug'        : 1 ,                                  # debug level (0...3)
    'maestro'      : 'local' ,                            # where the 'maestro' resides: local' | 'remote' (micro instance)
    'auto_stop'    : True ,                               # will automatically stop the instances and the maestro, once the jobs are done
    'provider'     : 'aws' ,                              # the provider name ('aws' | 'azure' | ...)
    'job_assign'   : None ,                               # algorithm used for job assignation / task scheduling ('random' | 'multi_knapsack')
    'recover'      : True ,                               # if True, CloudSend will always save the state and try to recover this state on the next execution
    'print_deploy' : False ,                              # if True, this will cause the deploy stage to print more (and lock)
    'mutualize_uploads' : True ,                          # adjusts the directory structure of the uploads ... (False = per job or True = global/mutualized)


    ################################################################################
    # INSTANCES / HARDWARE
    ################################################################################

    'instances' : [
        { 
            'region'       : None ,                       # can be None or has to be valid. Overrides AWS user region configuration.
            'cloud_id'     : None ,                       # can be None, or even wrong/non-existing - then the default one is used
            'img_id'       : 'ami-077fd75cd229c811b' ,    # OS image: has to be valid and available for the profile (user/region)
            'img_username' : 'ubuntu' ,                   # the SSH user for the image
            'type'         : 't2.micro' ,                 # proprietary size spec (has to be valid)
            'cpus'         : None ,                       # number of CPU cores
            'gpu'          : None ,                       # the proprietary type of the GPU 
            'disk_size'    : None ,                       # the disk size of this instance type (in GB)
            'disk_type'    : None ,                       # the proprietary disk type of this instance type: 'standard', 'io1', 'io2', 'st1', etc
            'eco'          : True ,                       # eco == True >> SPOT e.g.
            'eco_life'     : None ,                       # lifecycle of the machine in ECO mode (datetime.timedelta object) (can be None with eco = True)
            'max_bid'      : None ,                       # max bid ($/hour) (can be None with eco = True)
            'number'       : 1 ,                          # multiplicity: the number of instance(s) to create
            'explode'      : True                         # multiplicity: can this instance type be distributed accross multiple instances, to split CPUs
        }

    ] ,

    ################################################################################
    # ENVIRONMENTS / SOFTWARE
    ################################################################################

    'environments' : [
        {
            'name'         : None ,                       # name of the environment - should be unique if not 'None'. 'None' only when len(environments)==1

            # env_conda + env_pypi  : mamba is used to setup the env (pip dependencies included)
            # env_conda (only)      : mamba is used to setup the env
            # env_pypi  (only)      : venv + pip is used to setup the env 

            'command'      : 'example/install_julia.sh' ,      # None, or a string: path to a bash file to execute when deploying
            'env_aptget'   : [ "openssh-client"] ,        # None, an array of librarires/binaries for apt-get
            'env_conda'    : "example/environment.yml",   # None, an array of libraries, a path to environment.yml  file, or a path to the root of a conda environment
            'env_conda_channels' : None ,                 # None, an array of channels. If None (or absent), defaults and conda-forge will be used
            'env_pypi'     : "example/requirements.txt" , # None, an array of libraries, a path to requirements.txt file, or a path to the root of a venv environment 
            'env_julia'    : [ "Wavelets" ] ,             # None, a string or an array of Julia packages to install (requires julia)
        }
    ] ,

    ################################################################################
    # JOBS / SCRIPTS
    ################################################################################

    'jobs' : [
        {
            'env_name'     : None ,                       # the environment to use (can be 'None' if solely one environment is provided above)
            'cpus_req'     : None ,                       # the CPU(s) requirements for the process (can be None)
            'run_script'   : 'example/run_remote.py 1 10',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
            'run_command'  : None ,                       # the command to run
            'upload_files' : [ "uploaded.txt"] ,          # any file to upload (array or string) - will be put in the same directory
            'input_files'   : 'input.dat' ,                # the input file name (used by the script)
            'output_files'  : 'output.dat' ,               # the output file name (used by the script)
            'repeat'       : 2 ,                          # the number of times this job is repeated
        } ,
        {
            'env_name'     : None ,                       # the environment to use (can be 'None' if solely one environment is provided above)
            'cpus_req'     : None ,                       # the CPU(s) requirements for the process (can be None)
            'run_script'   : 'example/run_remote.py 2 12',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
            'run_command'  : None ,                       # the command to run
            'upload_files' : [ "uploaded.txt"] ,          # any file to upload (array or string) - will be put in the same directory
            'input_files'   : 'input.dat' ,                # the input file name (used by the script)
            'output_files'  : 'output.dat' ,               # the output file name (used by the script)
        }
    ]
}

Minimum configuration example

config = {
    'debug'        : 1 ,                                  # debug level (0...3)
    'maestro'      : 'local' ,                            # where the 'maestro' resides: local' | 'remote' (nano instance) | 'lambda'
    'provider'     : 'aws' ,                              # the provider name ('aws' | 'azure' | ...)

    'instances' : [
        { 
            'type'         : 't2.micro' ,                 # proprietary size spec (has to be valid)
        }

    ] ,

    'environments' : [
        {
            'name'         : None ,                       # name of the environment - should be unique if not 'None'. 'None' only when len(environments)==1
            'env_conda'    : "example/environment.yml",   # None, an array of libraries, a path to environment.yml  file, or a path to the root of a conda environment
            'env_julia'    : ["Wavelets"] ,                       # None, a string or an array of Julia packages to install (requires julia)
        }
    ] ,

    'jobs' : [
        {
            'env_name'     : None ,                       # the environment to use (can be 'None' if solely one environment is provided above)
            'cpus_req'     : None ,                       # the CPU(s) requirements for the process (can be None)
            'run_script'   : 'example/run_remote.py 1 10',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
            'upload_files' : [ "uploaded.txt"] ,          # any file to upload (array or string) - will be put in the same directory
            'input_files'   : 'input.dat' ,                # the input file name (used by the script)
            'output_files'  : 'output.dat' ,               # the output file name (used by the script)
        } ,
        {
            'env_name'     : None ,                       # the environment to use (can be 'None' if solely one environment is provided above)
            'cpus_req'     : None ,                       # the CPU(s) requirements for the process (can be None)
            'run_script'   : 'example/run_remote.py 2 12',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
            'upload_files' : [ "uploaded.txt"] ,          # any file to upload (array or string) - will be put in the same directory
            'input_files'   : 'input.dat' ,                # the input file name (used by the script)
            'output_files'  : 'output.dat' ,               # the output file name (used by the script)
        }
    ]
}

Python API

class CloudSendLightProvider(ABC):
class CloudSendFatProvider(ABC):

    def debug(self,level,*args,**kwargs):

    # start the provider: creates the instances
    # if reset = True, CloudSend forces a process cleanup as well as more re-uploads
    def start(self,reset):

    # deploy all materials (environments, files, scripts etc.)
    def deploy(self):

    # run the jobs
    # returns a CloudSendRunSession
    def run(self,wait=False):

    # wait for the processes to reach a state
    def wait(self,job_state,run_session=None):

    # get the states of the processes
    def get_jobs_states(self,run_session=None):

    # print a summary of processes
    def print_jobs_summary(self,run_session=None,instance=None):

    # print the aborted logs, if any
    def print_aborted_logs(self,run_session=None,instance=None):

    # fetch results data
    def fetch_results(self,out_directory=None,run_session=None):

    # wait for the watcher process to be completely done (useful for demo)
    def finalize(self):

    # wakeup = start + assign + deploy + run + watch
    def wakeup(self)

    @abstractmethod
    def get_region(self):

    @abstractmethod
    def get_recommended_cpus(self,inst_cfg):

    @abstractmethod
    def create_instance_objects(self,config,for_maestro):

    @abstractmethod
    def find_instance(self,config):

    @abstractmethod
    def start_instance(self,instance):

    @abstractmethod
    def stop_instance(self,instance):

    @abstractmethod
    def terminate_instance(self,instance):

    @abstractmethod
    def update_instance_info(self,instance):    

# GLOBAL methods 

def get_client(provider='aws',maestro='local')

CloudSend usage

Python programmatic use

Note: this demo works the same way, whether CloudSend runs locally or remotely

from cloudsend      import provider as cloudsend
from cloudsend.core import CloudSendProcessState
import asyncio 

# load config
config = __import__(config).config

# create provider: this loads the config
provider = cloudsend.get_client(config)

# start the provider: this attempts to create the instances
await provider.start()

# deploy the necessary stuff onto the instances
await provider.deploy()

# run the jobs and get active processes objects back
run_session = await provider.run()

# wait for the active proccesses to be done or aborted:
await provider.wait(CloudSendProcessState.DONE|CloudSendProcessState.ABORTED)

# you can get the state of all jobs this way:
await provider.get_jobs_states()
# or get the state for a specific run session:
await provider.get_jobs_states(run_session)

# you can print processes summary with:
await provider.print_jobs_summary()

# get the results file locally
await provider.fetch_results('./tmp')

CLI use

Note: the commands below work the same way, whether CloudSend runs locally or remotely

with Poetry

# init the client with global params and add instances, envs and jobs (if any)
poetry run cli init config.py
# add more jobs
poetry run cli cfg_add_jobs config_jobs.py
# add more stuff
poetry run cli cfg_add_config config_more.py
# deploy the material onto the instances
poetry run cli deploy
# run the jobs
poetry run cli run
# wait for the jobs to be done
poetry run cli wait
# get the results
poetry run cli fetch_results
 # shutdown the daemon
poetry run cli shutdown

with virtualenv

# init the client with global params and add instances, envs and jobs (if any)
python3 -m cloudsend.cli init config.py
# add more jobs
python3 -m cloudsend.cli cfg_add_jobs config_jobs.py
# add more stuff
python3 -m cloudsend.cli cfg_add_config config_more.py
# deploy the material onto the instances
python3 -m cloudsend.cli deploy
# run the jobs
python3 -m cloudsend.cli run
# wait for the jobs to be done
python3 -m cloudsend.cli wait
# get the results
python3 -m cloudsend.cli fetch_results
 # shutdown the daemon
python3 -m cloudsend.cli shutdown

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cloudsend-0.6.5.tar.gz (97.2 kB view details)

Uploaded Source

Built Distribution

cloudsend-0.6.5-py3-none-any.whl (102.3 kB view details)

Uploaded Python 3

File details

Details for the file cloudsend-0.6.5.tar.gz.

File metadata

  • Download URL: cloudsend-0.6.5.tar.gz
  • Upload date:
  • Size: 97.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/39.2.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.8

File hashes

Hashes for cloudsend-0.6.5.tar.gz
Algorithm Hash digest
SHA256 622f239812d1dbf7e3a09ef973b34d6f396223c8186ede6910dc70b68252d234
MD5 80af5329efb1a173469a474bd03de5c5
BLAKE2b-256 94854f348f88cd0b2ecc7c75162ded5cf405a1a445425f6dcd69ff73787ad630

See more details on using hashes here.

File details

Details for the file cloudsend-0.6.5-py3-none-any.whl.

File metadata

  • Download URL: cloudsend-0.6.5-py3-none-any.whl
  • Upload date:
  • Size: 102.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/39.2.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.8

File hashes

Hashes for cloudsend-0.6.5-py3-none-any.whl
Algorithm Hash digest
SHA256 57d506b037c592fcaa3c6838c799fd5616fd0561e198652638d09bafdc14874b
MD5 5d8518b026f93946549727c9fdf8e844
BLAKE2b-256 cb57c58ff7b2417d0f2ce3bad980a6b3291072af8483abfac8beda503d531c05

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page