
Python library for scheduling jobs on a Kubernetes cluster by simply calling a Python function.


Kubernetes-job: simple Kubernetes job creation

A library for starting a Kubernetes batch job as a normal Python function call.

Installation

Kubernetes-job can be installed using pip:

pip install kubernetes-job

Quick start

from kubernetes_job import JobManager


def add(a, b):
    return a + b


# k8s_client is a configured Kubernetes ApiClient (see "Initializing a Kubernetes ApiClient")
manager = JobManager(k8s_client=k8s_client, k8s_job_spec='job.yaml', namespace='default')
job = manager.create_job(add, 1, 2)

The JobManager will now create a Kubernetes job using the basic job specification in the job.yaml file. The call to add is then passed on to the new job node, where the function is subsequently executed.

The job.yaml file should be adjusted to your needs. This is the place to put Kubernetes node selectors, Docker base images, etc. Please refer to the Kubernetes documentation for details.

Please note that this is a very silly example, for two obvious reasons.

First, add will take a very short time to complete, and is therefore not a function you would want to spawn a Kubernetes job for. A job should be created for a task that is not easily performed on the calling machine. A good example would be training Machine Learning models on a heavy CUDA node, started from a web server node with modest resources.

Second, Kubernetes jobs do not return values! In a Kubernetes job, it is up to the job to save its work. In this case, nothing saves the result of (1 + 2), so it will be lost to humanity.
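
Since the return value is discarded, a job function has to persist its own result. The following is a minimal sketch, assuming the job pod can write to some shared location. The names add_and_store and output_path are illustrative and not part of Kubernetes-job, and the path would typically point at a mounted volume.

```python
import json
from pathlib import Path


def add_and_store(a, b, output_path):
    """Add two numbers and write the result to a file, since the job's return value is discarded."""
    result = a + b
    path = Path(output_path)
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"a": a, "b": b, "result": result}))
    return result
```

Such a function could then be passed to create_job like any other, with the output path as an extra argument.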

Configuration

Kubernetes-job does not need much configuration. There are three things to be done:

  1. Configuring the Kubernetes job spec template (i.e. job.yaml);
  2. Initializing a Kubernetes ApiClient;
  3. Initializing the JobManager.

Configuring the Kubernetes job spec template (i.e. job.yaml)

When Kubernetes-job spawns a new job, the Kubernetes job spec template is used as the base configuration for the new job.

This is an example:

apiVersion: batch/v1
kind: Job
metadata:
  # job name; a unique id will be added when launching a new job based on this template
  name: kubernetes-job
spec:

  # Retry at most once before marking the job as failed
  backoffLimit: 1

  # Active deadline (timeout), in a number of seconds.
  activeDeadlineSeconds: 3600

  # Clean up pods and logs after finishing the job
  ttlSecondsAfterFinished: 3600

  template:
    spec:
      containers:
      - name: kubernetes-job
        image: registry.gitlab.com/roemer/kubernetes-job:latest
      restartPolicy: Never

Please adjust this template to your needs by specifying the right container image, job deadlines, etc. The Kubernetes documentation contains more information.

When Kubernetes-job spawns a new job, three things are added to the template:

  1. A unique name, generated by adding a timestamp;
  2. The function call, serialized (using Pickle), added as an environment variable;
  3. A cmd entry calling JobManager.execute.
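
The steps above can be sketched roughly as follows. This is not the library's actual implementation: prepare_job_spec, run_payload, and the JOB_PAYLOAD variable name are hypothetical, and the base64 encoding is an assumption made so that the pickled bytes fit in an environment variable. operator.add stands in for a function from your own package.

```python
import base64
import copy
import operator
import pickle
import time

# Hypothetical name, chosen for this sketch only.
PAYLOAD_ENV_VAR = "JOB_PAYLOAD"


def prepare_job_spec(template, func, *args, **kwargs):
    """Return a copy of the template with a unique name and the pickled call attached."""
    spec = copy.deepcopy(template)

    # 1. Unique name: append a millisecond timestamp to the template name.
    spec["metadata"]["name"] = f"{spec['metadata']['name']}-{int(time.time() * 1000)}"

    # 2. Serialized call: pickle it, then base64-encode it for use as an env var.
    payload = base64.b64encode(pickle.dumps((func, args, kwargs))).decode("ascii")
    container = spec["spec"]["template"]["spec"]["containers"][0]
    container.setdefault("env", []).append({"name": PAYLOAD_ENV_VAR, "value": payload})

    # 3. The command entry that runs the call inside the pod is omitted here,
    #    because its exact form is internal to the library.
    return spec


def run_payload(payload):
    """Decode the payload and execute the call, as the job pod would."""
    func, args, kwargs = pickle.loads(base64.b64decode(payload))
    return func(*args, **kwargs)


template = {
    "metadata": {"name": "kubernetes-job"},
    "spec": {"template": {"spec": {"containers": [{"name": "kubernetes-job"}]}}},
}
spec = prepare_job_spec(template, operator.add, 1, 2)
payload = spec["spec"]["template"]["spec"]["containers"][0]["env"][0]["value"]
print(run_payload(payload))  # prints 3
```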

A working example can be found in the test/ directory.

Make sure the Docker image in the job template contains the same packaged Python software as the process creating the job! Otherwise the function cannot be executed in the new job pod.
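
The reason is that Python's pickle module stores a function by reference (its module and name), not by its code. A small illustration, using json.dumps as a stand-in for a function defined in your own package:

```python
import json
import pickle

# json.dumps stands in for "a function from your package".
payload = pickle.dumps(json.dumps)

# The payload contains the module name and function name, not the function body.
print(b"json" in payload, b"dumps" in payload)

# Unpickling imports the module and looks the function up by name,
# so the module must be importable wherever the payload is loaded.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

If the job image lacks the module, or ships a different version of it, unpickling in the pod fails or runs different code.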

Initializing a Kubernetes ApiClient

There are several ways to configure the Kubernetes client. Probably the easiest way is to use a bearer token. This can be done as follows:

from kubernetes import client

configuration = client.Configuration()
configuration.api_key['authorization'] = '<token>'
configuration.api_key_prefix['authorization'] = 'Bearer'
configuration.host = 'https://<endpoint_of_api_server>'
configuration.ssl_ca_cert = '<path_to_cluster_ca_certificate>'

k8s_client = client.ApiClient(configuration)

The section "Configuring Kubernetes for token-based authentication" below explains how to retrieve the token, the API server endpoint, and the cluster CA certificate.

Another possibility is to use an existing kubectl configuration. This might be the best solution for testing purposes:

from kubernetes import client, config

# Configs can be set in Configuration class directly or using helper utility
config.load_kube_config()

k8s_client = client.ApiClient()

Please refer to https://github.com/kubernetes-client/python for more documentation.

Initializing the JobManager

The JobManager must be supplied with a YAML template file (see above) and the Kubernetes client:

from pathlib import Path
from kubernetes_job import JobManager

# Path to worker configuration
yaml_spec = Path(__file__).parent / 'job.yaml'

# initialize the job manager
manager = JobManager(k8s_client=k8s_client, k8s_job_spec=yaml_spec, namespace='default')

Please note that the k8s_job_spec may be a path to a file, or a dict instance. The latter is handy for generating configuration on the fly!
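
As a sketch of generating the configuration on the fly, the dict below mirrors the YAML example from earlier, with the image and deadline as parameters. make_job_spec is a hypothetical helper, not part of Kubernetes-job.

```python
def make_job_spec(image, name="kubernetes-job", deadline_seconds=3600):
    """Build a job spec dict equivalent to the YAML template, with a configurable image."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 1,
            "activeDeadlineSeconds": deadline_seconds,
            "ttlSecondsAfterFinished": 3600,
            "template": {
                "spec": {
                    "containers": [{"name": name, "image": image}],
                    "restartPolicy": "Never",
                }
            },
        },
    }


spec = make_job_spec("registry.gitlab.com/roemer/kubernetes-job:latest", deadline_seconds=600)
# The dict can then be passed instead of a file path:
# manager = JobManager(k8s_client=k8s_client, k8s_job_spec=spec, namespace='default')
```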

API

Create a new job

A job can be started by invoking create_job on the JobManager instance:

# function to pass to the job
def add(a, b):
    result = a + b
    print(result)
    return result

# create a new job
job = manager.create_job(add, 123, 456)

create_job takes a function reference. The function and all its arguments (*args and **kwargs) are then pickled and merged into the job template.

Our job is now running on the Kubernetes cluster!

Listing jobs

# list all jobs
for job in manager.list_jobs():
    print(f"Found: {job.metadata.name}")

Retrieving job status

from kubernetes_job import is_active, is_succeeded, is_failed, is_completed

# get the status of a job; `name` is the job's metadata.name, e.g. as printed by list_jobs()
job = manager.read_job(name)
print(f"Running: {is_active(job)} Succeeded: {is_succeeded(job)} Failed: {is_failed(job)} Completed: {is_completed(job)}")
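
Because a job runs asynchronously, the caller often wants to wait until it has finished. The helper below is a minimal polling sketch and not part of Kubernetes-job. It takes the read and check functions as parameters, so it only assumes that read_job(name) returns a job object and that is_completed(job) returns a boolean, as in the snippet above.

```python
import time


def wait_for_job(read_job, is_completed, name, timeout=600.0, interval=5.0):
    """Poll read_job(name) until is_completed(job) is true; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        job = read_job(name)
        if is_completed(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {name!r} did not complete within {timeout} seconds")
        time.sleep(interval)
```

With a JobManager instance this could be called as wait_for_job(manager.read_job, is_completed, name), after which is_succeeded and is_failed can be checked on the returned job.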

Deleting jobs

# delete a job
manager.delete_job(name)

Configuring Kubernetes for token-based authentication

Create a service account

First, create a service account:

# Create a service account
kubectl create -f service_account.yml --namespace=default

An example of service_account.yml can be found here

Kubernetes generates a token secret with a unique name for the new service account. To find that name, list the secrets in the namespace:

# retrieve secret 
kubectl get secrets --namespace=default | grep kubernetes-job-service-account

This returns something like this:

kubernetes-job-service-account-token-XXXXX   kubernetes.io/service-account-token   3      66s

kubernetes-job-service-account-token-XXXXX is the name generated by Kubernetes.

Retrieving the access token

Now we are able to retrieve the access token for this service account:

kubectl describe secret/kubernetes-job-service-account-token-XXXXX | grep token

This returns something like:

token:      <token>

This token is the one we're looking for.

Cluster endpoint and cluster CA certificates

To connect to the cluster we also need the cluster endpoint and the CA certificates. Both can easily be retrieved through the Kubernetes dashboard, through the "cluster details" page.
