A simple cloud-based workflow system
Crosscore
Crosscore is part of Project Crossbow. It allows you to create an autoscaling pool of instances in the cloud that can then be used with crossflow to execute computational workflows.
Currently Crosscore supports Amazon Web Services and Google Cloud Platform.
1. Installation
1.1 Prerequisites
1.1.1 Python Version
Crosscore requires Python 3.6 or higher. No version of Python 2 is supported.
1.1.2 Cloud Provider Configuration
Crosscore supports both AWS and GCP. There are slightly different configuration processes depending on which you plan to use:
AWS
It is assumed that you have done what is required to give you programmatic access to your AWS account. This will involve generating your AWS Access Key ID and Secret Access Key, and installing them with aws configure.
In addition you need to make sure your account has the following permissions:
AmazonEC2FullAccess
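To confirm that these credentials are in place before going any further, you can make a quick check from Python. This is only an illustrative sketch, not part of Crosscore; it assumes the boto3 package is installed:
# Sketch only: verify that the credentials installed with "aws configure"
# are visible from Python (requires the boto3 package).
import boto3

identity = boto3.client('sts').get_caller_identity()
print('Authenticated to AWS as:', identity['Arn'])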
GCP
You need to have downloaded a .json file with your service account credentials - see here for details. Then you need to decide on an availability zone for your cluster - bear in mind that this will affect the range of instance types (particularly GPU accelerators) you will be able to launch. With these in hand, create two environment variables:
export GOOGLE_APPLICATION_CREDENTIALS=<path to credentials file>
export GOOGLE_DEFAULT_AVAILABILITY_ZONE=<availability zone>
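As a quick sanity check that the credentials file can be loaded, you can try the following sketch. It is only illustrative and not part of Crosscore; it assumes the google-auth package, which is installed as a dependency of most GCP client libraries:
# Sketch only: check that GOOGLE_APPLICATION_CREDENTIALS points to a usable
# service account file (requires the google-auth package).
import google.auth

credentials, project_id = google.auth.default()
print('Loaded GCP credentials for project:', project_id)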
1.1.3 Terraform
Crosscore uses Terraform to do the heavy lifting of cloud infrastructure creation and management. Before you can use Crosscore you must install Terraform according to their instructions. Once you can run:
terraform -version
successfully, you have done enough.
1.1.4 SSH
You will need an ssh public key (e.g., $HOME/.ssh/id_rsa.pub). If you don't already have one, use ssh-keygen to make it, then set an environment variable to its location:
export SSH_PUBLIC_KEY=<path to id_rsa.pub or equivalent>
1.2 Install the Crosscore Python Package
Crosscore is not currently on PyPI, so to install it use:
pip install git+https://bitbucket.org/claughton/crosscore.git
If all goes smoothly, you can then check the installation is OK by running xcore -h:
usage: xcore [-h] [-V] {status,start,restart,shutdown,daemon} ...
Crosscore: Cloud clusters for distributed computing.
positional arguments:
{status,start,restart,shutdown,daemon}
status status of crosscore cluster
start create cloud resources
restart recreate cloud resources
shutdown terminate and delete all resources
daemon control the xcore daemon
optional arguments:
-h, --help show this help message and exit
-V, --version show program's version number and exit
1.3 Configuration
Produce an initial default configuration for xcore using the command:
xcore-init <provider>
where is "aws" or "gcp".
This will run some checks that you have all the prerequisites, and create some configuration and template files. These will be placed in $HOME/.xcore. The process may take some time, as it involves Terraform creating your base cloud infrastructure.
Once complete, check the default configuration in $HOME/.xcore/config.yaml. In particular you may want to change image_name - the name of the machine image used to create the worker instances - and the associated image_owner. Most other "interesting" parameters, such as the worker instance type and the maximum number of workers that can be launched, can be changed interactively so do not need editing here now.
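If you prefer to inspect the file programmatically rather than in an editor, something along these lines works. This is only a sketch: it assumes the PyYAML package is available, and the image name shown is a placeholder, not a real image:
# Sketch only: read (and optionally update) $HOME/.xcore/config.yaml
# (requires the PyYAML package).
import os
import yaml

config_path = os.path.expanduser('~/.xcore/config.yaml')
with open(config_path) as f:
    config = yaml.safe_load(f)

print('Worker image:', config.get('image_name'), 'owner:', config.get('image_owner'))

# To change the image, uncomment and edit (placeholder value shown):
# config['image_name'] = 'my-custom-image'
# with open(config_path, 'w') as f:
#     yaml.safe_dump(config, f)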
1.4 Start up
Once you are happy with the configuration, run xcore start to create the base cloud infrastructure and launch the Crosscore daemon. The base infrastructure consists of a small (default t2.small/f1-micro) instance that runs the scheduler; the daemon listens for job requests and autoscales the cluster as required.
2. Run a test job
Create a small crossflow workflow, e.g.:
from crossflow.kernels import SubprocessKernel
from crossflow.clients import Client
from crosscore import cluster
sleeper = SubprocessKernel('sleep {n}; echo {n}')
sleeper.set_inputs(['n'])
sleeper.set_outputs(['STDOUT'])
client = Client(address=cluster.get_url())
result = client.submit(sleeper, 10)
print(result.result())
If you run this Python script interactively in one window, you can use xcore status from another to follow the process of worker creation, the job being run, and the worker being deleted afterwards.
3. Shut down the cluster
If you are not going to use the cluster for a while, you can shut down the scheduler instance and stop the daemon:
xcore shutdown
When you want to use it again, run xcore restart.
4. Changing the instance type and cluster size
Within a script you can adjust the maximum number of instances that may be launched, and their instance type, before you submit the job, e.g.:
...
# AWS example:
cluster.set_worker_type('c5.xlarge')
# GCP example:
# cluster.set_worker_type('n1-standard-4', accelerator_type='nvidia-tesla-t4')
cluster.set_max_workers(5)
client = Client(cluster.get_url())
...
5. Changing the machine image
The workflows you can run using crossflow depend on the software installed on your worker nodes. Though you may be able to do some provisioning of these on the fly (i.e., within crossflow kernel definitions), most likely you will want to prepare machine images with your favourite software stack pre-installed. Examples of how this can be done using Packer are available in the Packer folder.
Note that if you change the machine image, you will need to restart Crosscore (xcore shutdown; xcore restart).
6. Authors:
• Christian Suess
• Charlie Laughton charles.laughton@nottingham.ac.uk
7. Acknowledgements:
EPSRC Grant EP/P011993/1