A simple cloud-based workflow system

Crosscore

Crosscore is part of Project Crossbow. It allows you to create an autoscaling pool of instances in the cloud that can then be used with crossflow to execute computational workflows.

Currently Crosscore supports Amazon Web Services and Google Cloud Platform.

1. Installation

1.1 Prerequisites

1.1.1 Python Version

Crosscore requires Python 3.6 or higher. No version of Python 2 is supported.
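
As a minimal sketch (not part of Crosscore itself), the version requirement above can be checked from Python before installing. The function name `python_is_supported` is a hypothetical helper introduced here for illustration.

```python
import sys

# Minimum version stated in this README.
MINIMUM_PYTHON = (3, 6)


def python_is_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (major, minor, ...) version meets the minimum.

    Hypothetical helper: defaults to the running interpreter's version.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so that patch levels are ignored.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version is new enough for Crosscore")
    else:
        print("Crosscore requires Python 3.6 or higher")
```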

1.1.2 Cloud Provider Configuration

Crosscore supports both AWS and GCP. There are slightly different configuration processes depending on which you plan to use:

AWS

It is assumed that you already have programmatic access to your AWS account. This involves generating your AWS Access Key ID and Secret Access Key, and installing them with aws configure.

In addition you need to make sure your account has the following permissions:

AmazonEC2FullAccess

GCP

You need to have downloaded a .json file with your service account credentials (see the Google Cloud documentation on service account keys for details). Then you need to decide on an availability zone for your cluster. Bear in mind that this will affect the range of instance types (particularly GPU accelerators) you will be able to launch. With these in hand, create two environment variables:

export GOOGLE_APPLICATION_CREDENTIALS=<path to credentials file>
export GOOGLE_DEFAULT_AVAILABILITY_ZONE=<availability zone>
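
A quick way to confirm that both variables are in place is sketched below. This is an illustrative check rather than anything Crosscore provides: the function name `missing_gcp_settings` is hypothetical, and only the two variable names come from this README.

```python
import os

# The two environment variables named in this README.
REQUIRED_GCP_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GOOGLE_DEFAULT_AVAILABILITY_ZONE",
)


def missing_gcp_settings(env=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_GCP_VARS:
        if not env.get(name):
            problems.append(name + " is not set")
    # The credentials variable should point at an existing file.
    credentials = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if credentials and not os.path.isfile(os.path.expanduser(credentials)):
        problems.append("credentials file not found: " + credentials)
    return problems


if __name__ == "__main__":
    for problem in missing_gcp_settings():
        print("GCP configuration problem:", problem)
```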

1.1.3 Terraform

Crosscore uses Terraform to do the heavy lifting of cloud infrastructure creation and management. Before you can use Crosscore you must install Terraform according to its installation instructions. Once you can run:

terraform -version

you have done enough.
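
The same check can be scripted. The sketch below assumes only that the terraform executable is on your PATH and accepts the -version flag shown above. The function name `terraform_version` is a hypothetical helper, not part of Crosscore.

```python
import shutil
import subprocess


def terraform_version(executable="terraform"):
    """Return the first line printed by `<executable> -version`, or None.

    Hypothetical helper: None means the executable was not found on PATH
    or exited with an error.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if completed.returncode != 0 or not completed.stdout:
        return None
    return completed.stdout.splitlines()[0]


if __name__ == "__main__":
    version = terraform_version()
    if version is None:
        print("terraform was not found; install it before running xcore-init")
    else:
        print("Found", version)
```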

1.1.4 SSH

You will need an ssh public key (e.g., $HOME/.ssh/id_rsa.pub). If you don't already have this, use ssh-keygen to make it, then set an environment variable to its location:

export SSH_PUBLIC_KEY=<path to id_rsa.pub or equivalent>
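
To verify that the variable points at a usable key, something like the following sketch may help. It assumes the usual one-line OpenSSH public key layout (key type, base64 data, optional comment). The function name `check_ssh_public_key` is hypothetical and not part of Crosscore.

```python
import os

# Common prefixes of OpenSSH public key types (ssh-rsa, ssh-ed25519,
# ecdsa-sha2-..., sk-ssh-ed25519@openssh.com, ...).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def check_ssh_public_key(env=None):
    """Return None if SSH_PUBLIC_KEY looks usable, else a message string.

    Hypothetical helper: `env` defaults to the current process environment.
    """
    env = os.environ if env is None else env
    path = env.get("SSH_PUBLIC_KEY")
    if not path:
        return "SSH_PUBLIC_KEY is not set"
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return "no such file: " + path
    with open(path) as handle:
        fields = handle.readline().split()
    # Expect at least "<key type> <base64 data>" on the first line.
    if len(fields) < 2 or not fields[0].startswith(KEY_TYPE_PREFIXES):
        return path + " does not look like an OpenSSH public key"
    return None


if __name__ == "__main__":
    print(check_ssh_public_key() or "SSH public key looks fine")
```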

1.2 Install the Crosscore Python Package

Crosscore is not currently on PyPI, so to install it use:

pip install git+https://bitbucket.org/claughton/crosscore.git

If all goes smoothly, you can then check the installation is OK by running xcore -h:

usage: xcore [-h] [-V] {status,start,restart,shutdown,daemon} ...

Crosscore: Cloud clusters for distributed computing.

positional arguments:
  {status,start,restart,shutdown,daemon}
    status              status of crosscore cluster
    start               create cloud resources
    restart             recreate cloud resources
    shutdown            terminate and delete all resources
    daemon              control the xcore daemon

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit

1.3 Configuration

Produce an initial default configuration for xcore using the command:

xcore-init <provider>

where <provider> is "aws" or "gcp".

This will check that you have all the prerequisites, and create some configuration and template files. These will be placed in $HOME/.xcore. The process may take some time, as it involves Terraform creating your base cloud infrastructure.

Once complete, check the default configuration in $HOME/.xcore/config.yaml. In particular, you may want to change image_name (the name of the machine image used to create the worker instances) and the associated image_owner. Most other "interesting" parameters, such as the worker instance type and the maximum number of workers that can be launched, can be changed interactively, so they do not need editing here.

1.4 Start up

Once you are happy with the configuration, run xcore start to create the base cloud infrastructure and launch the Crosscore daemon. The base infrastructure consists of a small instance (by default t2.small on AWS or f1-micro on GCP) that runs the scheduler. The daemon listens for job requests and autoscales the cluster as required.

2. Run a test job

Create a small crossflow workflow, e.g.:

from crossflow.kernels import SubprocessKernel
from crossflow.clients import Client
from crosscore import cluster

sleeper = SubprocessKernel('sleep {n}; echo {n}')
sleeper.set_inputs(['n'])
sleeper.set_outputs(['STDOUT'])

client = Client(address=cluster.get_url())
result = client.submit(sleeper, 10)
print(result.result())

If you run this Python script interactively in one window, you can use xcore status in another to follow the worker being created, the job being run, and the worker being deleted afterwards.

3. Shut down the cluster

If you are not going to use the cluster for a while, you can shut down the scheduler instance and stop the daemon:

xcore shutdown

When you want to use it again, run xcore restart.

4. Changing the instance type and cluster size

Within a script you can adjust the maximum number of instances that may be launched, and their instance type, before you submit the job, e.g.:

...
# AWS example:
cluster.set_worker_type('c5.xlarge')
# GCP example:
# cluster.set_worker_type('n1-standard-4', accelerator_type='nvidia-tesla-t4')
cluster.set_max_workers(5)
client = Client(cluster.get_url())
...
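
The two calls above can be wrapped in a small helper so that every script applies the same settings before connecting. This is only a sketch: `configure_cluster` is a hypothetical name, and the only Crosscore calls it relies on are set_worker_type, set_max_workers and get_url, as used in this README.

```python
def configure_cluster(cluster, worker_type, max_workers, **type_options):
    """Apply worker settings to a cluster object and return its URL.

    Hypothetical helper. `cluster` is expected to provide set_worker_type,
    set_max_workers and get_url, like the crosscore `cluster` module used
    in this README. Extra keyword arguments (for example the
    accelerator_type shown in the GCP example) are passed through to
    set_worker_type unchanged.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    cluster.set_worker_type(worker_type, **type_options)
    cluster.set_max_workers(max_workers)
    return cluster.get_url()
```

With Crosscore installed and the cluster started, this might be used as url = configure_cluster(cluster, 'c5.xlarge', 5) after from crosscore import cluster, with the returned URL then passed to Client as in the example above.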

5. Changing the machine image

The workflows you can run using crossflow depend on the software installed on your worker nodes. Though you may be able to do some provisioning of these on the fly (i.e., within crossflow kernel definitions), most likely you will want to prepare machine images with your favourite software stack pre-installed. Examples of how this can be done using Packer are available in the Packer folder.

Note that if you change the machine image, you will need to restart Crosscore (xcore shutdown; xcore restart).

6. Authors:

• Christian Suess
• Charlie Laughton charles.laughton@nottingham.ac.uk

7. Acknowledgements:

EPSRC Grant EP/P011993/1
