A Python package that enables users to build custom Singularity images on an HPC cluster
Building a Singularity container for HPC using globus-compute
Context
- One of the execution configurations of globus-compute requires a registered container, which is spun up to execute the user function on the HPC cluster.
- HPC clusters do not run Docker containers (for security reasons, as discussed here) and support only Apptainer/Singularity images.
- Installing Apptainer to build a Singularity image locally is not straightforward, especially on Windows and macOS, as discussed in the documentation.
With this Python library, the user can provide a custom image specification to build an Apptainer/Singularity image, which is in turn used to run their functions on globus-compute. The library registers the container and returns the container ID, which the globus-compute executor uses to execute the user function.
Prerequisite.
A globus-compute endpoint set up on an HPC cluster.
The following steps create an endpoint on the NCSA Delta cluster; modify the configuration to suit your use case:
Note.
For the following to work, the endpoint must be set up with globus-compute-sdk version 2.2.0. Python 3.9 is recommended both for setting up the endpoint and for the client.
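The version requirements in the note above can be checked before going further. The following is a minimal sketch, not part of the library: the helper names `installed_version` and `version_problems` are hypothetical, and it only relies on the standard `importlib.metadata` module to read the installed globus-compute-sdk version.

```python
import sys
from importlib import metadata
from typing import List, Optional, Tuple

EXPECTED_PYTHON = (3, 9)   # Python version recommended in the note above
EXPECTED_SDK = "2.2.0"     # globus-compute-sdk version required in the note above


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_problems(python_version: Tuple[int, int],
                     sdk_version: Optional[str]) -> List[str]:
    """Hypothetical helper: list every mismatch with the versions in the note."""
    problems = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, 3.9 recommended"
        )
    if sdk_version != EXPECTED_SDK:
        problems.append(
            f"globus-compute-sdk {sdk_version or 'not installed'} found, "
            f"{EXPECTED_SDK} required"
        )
    return problems


if __name__ == "__main__":
    found = installed_version("globus-compute-sdk")
    for problem in version_problems(sys.version_info[:2], found):
        print("warning:", problem)
```

Running the script in the endpoint environment and again in the client environment prints a warning for each mismatch and nothing when both versions match.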
- Create a conda virtual env. We created the custom-image-builder-py-3.9 conda env on the Delta cluster as follows:
conda create --name custom-image-builder-py-3.9 python=3.9
conda activate custom-image-builder-py-3.9
pip install globus-compute-endpoint==2.2.0
- Create a globus-compute endpoint:
globus-compute-endpoint configure custom-image-builder
Update the endpoint config at ~/.globus_compute/custom-image-builder/config.py to:
from parsl.addresses import address_by_interface
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

from globus_compute_endpoint.endpoint.utils.config import Config
from globus_compute_endpoint.executors import HighThroughputExecutor

user_opts = {
    'delta': {
        'worker_init': 'conda activate custom-image-builder-py-3.9',
        'scheduler_options': '#SBATCH --account=bbmi-delta-cpu',
    }
}

config = Config(
    executors=[
        HighThroughputExecutor(
            max_workers_per_node=10,
            address=address_by_interface('hsn0'),
            scheduler_mode='soft',
            worker_mode='singularity_reuse',
            container_type='singularity',
            container_cmd_options="",
            provider=SlurmProvider(
                partition='cpu',
                launcher=SrunLauncher(),

                # string to prepend to #SBATCH blocks in the submit
                # script to the scheduler eg: '#SBATCH --constraint=knl,quad,cache'
                scheduler_options=user_opts['delta']['scheduler_options'],

                # Command to be run before starting a worker, such as:
                # 'module load Anaconda; source activate parsl_env'.
                worker_init=user_opts['delta']['worker_init'],

                # Scale between 0-1 blocks with 1 node per block
                nodes_per_block=1,
                init_blocks=0,
                min_blocks=0,
                max_blocks=1,

                # Hold blocks for 30 minutes
                walltime='00:30:00'
            ),
        )
    ],
)
- Start the endpoint and note the endpoint ID, which is used in the following example:
globus-compute-endpoint start custom-image-builder
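One way to carry the endpoint ID from this step into client code is an environment variable. The sketch below is an illustration under stated assumptions: the variable name `IMAGE_BUILDER_ENDPOINT_ID` and the helper `load_endpoint_id` are hypothetical and not defined by globus-compute or this library. It relies only on the fact that endpoint IDs are UUIDs, so a malformed value is caught before any remote call is made.

```python
import os
import uuid

# Hypothetical variable name chosen for this sketch; not defined by globus-compute.
ENV_VAR = "IMAGE_BUILDER_ENDPOINT_ID"


def load_endpoint_id(environ=os.environ) -> str:
    """Read the endpoint ID from the environment and check that it is a UUID."""
    raw = environ.get(ENV_VAR, "").strip()
    if not raw:
        raise ValueError(f"{ENV_VAR} is not set")
    try:
        # uuid.UUID raises ValueError on malformed input; str() normalizes the form.
        return str(uuid.UUID(raw))
    except ValueError as exc:
        raise ValueError(f"{ENV_VAR} is not a valid UUID: {raw!r}") from exc
```

After `export IMAGE_BUILDER_ENDPOINT_ID=<your endpoint id>` in the shell, the client can call `load_endpoint_id()` instead of hard-coding the value.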
Example
Consider a use case where the user wants to run a pandas operation on an HPC cluster using globus-compute. They need a Singularity image for the globus-compute executor to use. The library can be leveraged as follows:
Install the following packages locally; you can create a virtual env as follows:
cd example/
python3.9 -m venv venv
source venv/bin/activate
pip install globus-compute-sdk==2.2.0
pip install custom-image-builder
from custom_image_builder import build_and_register_container
from globus_compute_sdk import Client, Executor


def transform():
    # Imported inside the function so that the import runs on the remote worker
    import pandas as pd
    data = {'Column1': [1, 2, 3],
            'Column2': [4, 5, 6]}
    df = pd.DataFrame(data)
    return "Successfully created df"


def main():
    image_builder_endpoint = "bc106b18-c8b2-45a3-aaf0-75eebc2bef80"
    gcc_client = Client()
    # Build the image on the endpoint and register it as a container
    container_id = build_and_register_container(gcc_client=gcc_client,
                                                endpoint_id=image_builder_endpoint,
                                                image_file_name="my-pandas-image",
                                                base_image_type="docker",
                                                base_image="python:3.8",
                                                pip_packages=["pandas"])
    print("The container id is", container_id)
    # Run the function inside the registered container
    with Executor(endpoint_id=image_builder_endpoint,
                  container_id=container_id) as ex:
        fut = ex.submit(transform)
        print(fut.result())


if __name__ == "__main__":
    main()
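Before submitting a function to the cluster, it can help to run it locally through the same submit/result pattern. The sketch below is an illustration only: `dry_run` and `add` are hypothetical names, and it uses the standard library's `ThreadPoolExecutor`, which exposes the same `submit` and `result` calls used in the example above. It checks the function itself, not the contents of the container.

```python
from concurrent.futures import ThreadPoolExecutor


def dry_run(fn, *args, timeout=30, **kwargs):
    """Run fn in a local worker thread and return its result.

    Raises concurrent.futures.TimeoutError if the result is not ready in time,
    and re-raises any exception thrown inside fn.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args, **kwargs)
        return future.result(timeout=timeout)


def add(a, b):
    # Stand-in for the function you plan to submit to the endpoint
    return a + b


if __name__ == "__main__":
    print(dry_run(add, 2, 3))
```

A function that passes locally can still fail remotely if the image lacks a package it imports, so this catches only logic errors early.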
Note.
The Singularity image requires globus-compute-endpoint as one of its packages in order to run the workers inside the custom Singularity container. Hence, by default, Python must be part of the image so that globus-compute-endpoint can be installed.
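To make the requirement concrete, the sketch below renders an Apptainer definition file from the same kind of inputs passed to build_and_register_container. It is an assumption about what such a specification could look like, not the library's actual implementation, which is not shown here. The function `render_definition` is hypothetical; the `Bootstrap`, `From`, and `%post` keywords are standard Apptainer definition-file syntax.

```python
from typing import Iterable

# Package the note above says must be present in the image
REQUIRED_PACKAGE = "globus-compute-endpoint"


def render_definition(base_image_type: str,
                      base_image: str,
                      pip_packages: Iterable[str]) -> str:
    """Hypothetical helper: build the text of an Apptainer definition file."""
    packages = list(pip_packages)
    # Always install the endpoint package so workers can start inside the image
    if REQUIRED_PACKAGE not in packages:
        packages.append(REQUIRED_PACKAGE)
    lines = [
        f"Bootstrap: {base_image_type}",
        f"From: {base_image}",
        "",
        "%post",
        f"    pip install {' '.join(packages)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_definition("docker", "python:3.8", ["pandas"]))
```

The base image must already provide Python and pip for the `%post` step to succeed, which is why the example uses a python base image.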
Hashes for custom_image_builder-1.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f2601abf9b2aeb9abdbecf2e0a7bb94e133126fe47bcdb46704e9b12e84700d2
MD5 | 65713ea40d7a379e738dd9efcb1c63b2
BLAKE2b-256 | 7f6da29ea805478fc34f3b8e4bbdcb4996a22759fd55e3c2479ed12a7f41ef67
Hashes for custom_image_builder-1.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ad0fe09165155a5c0c1604e0917c17aa2bcf0eb44d41b889d20593882d3c77c3
MD5 | 62de03144b16e69e1715aed22705ba7d
BLAKE2b-256 | d96d5e969d01b7a3e2758b7cce8a42b68d927947e063e0fb48b8a6dfbca3ecd5