
Custom metrics exporter for Flux in Kubernetes


Flux Metrics API

This is an experiment to create a metrics API for Kubernetes that can be run directly from the Flux leader broker pod. We made this after creating prometheus-flux, because we wanted a more minimalist design. I'm not even sure it will work, but it's worth a try!

Usage

Install

You can install from PyPI or from source:

$ python -m venv env
$ source env/bin/activate
$ pip install flux-metrics-api

# or

$ git clone https://github.com/converged-computing/flux-metrics-api
$ cd flux-metrics-api
$ pip install .
# you can also do "pip install -e ."

This will install the executable to your path, which might be your local user bin:

$ which flux-metrics-api
/home/vscode/.local/bin/flux-metrics-api

Note that the provided .devcontainer includes a VSCode environment with Flux already available, so you can install this package and use it right away!

Start

You'll want to be running in a Flux instance, as we need to connect to the broker handle.

$ flux start --test-size=4

And then start the server. This will use a default port and host (0.0.0.0:8443) that you can customize if desired.

$ flux-metrics-api start

# customize the port or host
$ flux-metrics-api start --port 9000 --host 127.0.0.1

SSL

If you want SSL (HTTPS), you can provide the paths to a certificate and key file. The port is still set with --port, as in the full example that follows.

$ flux-metrics-api start --ssl-certfile /etc/certs/tls.crt --ssl-keyfile /etc/certs/tls.key

An example of a full command we might run from within a pod:

$ flux-metrics-api start --port 8443 --ssl-certfile /etc/certs/tls.crt --ssl-keyfile /etc/certs/tls.key --namespace flux-operator --service-name custom-metrics-apiserver

On-the-fly custom metrics!

If you want to provide custom metrics, you can write a function in an external file that we will read and add to the server. As a general rule:

  • The name of the function will be the name of the custom metric
  • You can expect the only argument to be the flux handle
  • You'll need to do imports within your function to get them in scope
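As a minimal sketch of those rules, a custom metrics file could contain a function like the one below. The name `broker_size_count` is hypothetical (it is not one of the built-in metrics), and the example assumes the handle is the Flux Python handle, whose `attr_get` method returns a broker attribute such as `size` as a string.

```python
def broker_size_count(handle):
    """Hypothetical custom metric: the number of brokers in this Flux instance.

    The function name becomes the metric name, and the server passes the
    Flux handle as the only argument.
    """
    # If you need any modules, import them here, inside the function body,
    # so they are in scope when the server calls the function.

    # "size" is a standard broker attribute; attr_get returns it as a string.
    return int(handle.attr_get("size"))
```

Under `flux start --test-size=4`, the `size` attribute should be 4, and the metric would be requested by its function name, `broker_size_count`.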

This can likely be improved upon, but it's a start for now! We provide an example file, which you can load like this:

$ flux-metrics-api start --custom-metric ./example/custom-metrics.py

And then test it:

$ curl -s http://localhost:8443/apis/custom.metrics.k8s.io/v1beta2/namespaces/flux-operator/metrics/my_custom_metric_name | jq
{
  "items": [
    {
      "metric": {
        "name": "my_custom_metric_name"
      },
      "value": 4,
      "timestamp": "2023-06-01T01:39:08+00:00",
      "windowSeconds": 0,
      "describedObject": {
        "kind": "Service",
        "namespace": "flux-operator",
        "name": "custom-metrics-apiserver",
        "apiVersion": "v1beta2"
      }
    }
  ],
  "apiVersion": "custom.metrics.k8s.io/v1beta2",
  "kind": "MetricValueList",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta2"
  }
}
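The response follows the `MetricValueList` shape of the Kubernetes custom metrics API. As a sketch of that data structure (not the project's own code), the hypothetical helper `metric_value_list` assembles the same fields shown above from a metric name and value; its namespace and service name defaults simply mirror the `--namespace` and `--service-name` values used in the example command.

```python
from datetime import datetime, timezone


def metric_value_list(name, value, namespace="flux-operator",
                      service_name="custom-metrics-apiserver",
                      api_version="v1beta2"):
    """Hypothetical helper that builds a MetricValueList like the one above."""
    # Second-resolution ISO 8601 timestamp with an explicit UTC offset,
    # e.g. "2023-06-01T01:39:08+00:00".
    timestamp = datetime.now(timezone.utc).replace(microsecond=0).isoformat()
    group = f"custom.metrics.k8s.io/{api_version}"
    return {
        "items": [
            {
                "metric": {"name": name},
                "value": value,
                "timestamp": timestamp,
                "windowSeconds": 0,
                # The object the metric describes: here, the API service itself.
                "describedObject": {
                    "kind": "Service",
                    "namespace": namespace,
                    "name": service_name,
                    "apiVersion": api_version,
                },
            }
        ],
        "apiVersion": group,
        "kind": "MetricValueList",
        "metadata": {"selfLink": f"/apis/{group}"},
    }
```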

See --help for other available options.
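The server reads your file and registers its functions as metrics. The project's actual loading code is not shown here; as a minimal sketch of one way this could work, the hypothetical `load_custom_metrics` imports a Python file by path with `importlib` and maps each public top-level function name to the function, so the function name becomes the metric name.

```python
import importlib.util
import inspect


def load_custom_metrics(path):
    """Hypothetical loader: return {function_name: function} for a Python file."""
    spec = importlib.util.spec_from_file_location("custom_metrics", path)
    module = importlib.util.module_from_spec(spec)
    # Execute the file so its functions are defined on the module object.
    spec.loader.exec_module(module)

    metrics = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip functions imported from elsewhere and private helpers.
        if func.__module__ == module.__name__ and not name.startswith("_"):
            metrics[name] = func
    return metrics
```

With this sketch, `load_custom_metrics("./example/custom-metrics.py")` would return a dictionary whose keys are metric names, and each value would be called with the Flux handle when its metric is requested.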

Endpoints

Metric

GET /apis/custom.metrics.k8s.io/v1beta2/namespaces//metrics/<metric_name>

Here is an example to get the "node_up_count" metric:

$ curl -s http://localhost:8443/apis/custom.metrics.k8s.io/v1beta2/namespaces/flux-operator/metrics/node_up_count | jq
{
  "items": [
    {
      "metric": {
        "name": "node_up_count"
      },
      "value": 2,
      "timestamp": "2023-05-31T04:44:57+00:00",
      "windowSeconds": 0,
      "describedObject": {
        "kind": "Service",
        "namespace": "flux-operator",
        "name": "custom-metrics-apiserver",
        "apiVersion": "v1beta2"
      }
    }
  ],
  "apiVersion": "custom.metrics.k8s.io/v1beta2",
  "kind": "MetricValueList",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta2"
  }
}
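If you would rather query a metric from Python than with curl and jq, a minimal sketch using only the standard library could look like this. The helper names `metric_url`, `metric_value`, and `fetch_metric` are hypothetical; the URL layout and the `items[0].value` field come from the examples above, and the defaults assume the server is running without SSL on localhost:8443 with the flux-operator namespace.

```python
import json
import urllib.request


def metric_url(metric_name, namespace="flux-operator", base="http://localhost:8443"):
    """Build the custom metrics API path used in the curl examples."""
    return (f"{base}/apis/custom.metrics.k8s.io/v1beta2"
            f"/namespaces/{namespace}/metrics/{metric_name}")


def metric_value(payload):
    """Pull the value out of a MetricValueList response."""
    items = payload.get("items", [])
    if not items:
        raise ValueError("response contains no metric items")
    return items[0]["value"]


def fetch_metric(metric_name, **kwargs):
    """GET a metric from a running flux-metrics-api server and return its value."""
    with urllib.request.urlopen(metric_url(metric_name, **kwargs), timeout=10) as resp:
        return metric_value(json.load(resp))
```

For example, `fetch_metric("node_up_count")` against the server above would return the same value that jq shows in the `value` field.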

The following metrics are supported:

  • node_up_count: number of nodes up in the MiniCluster
  • node_free_count: number of nodes free in the MiniCluster
  • node_cores_free_count: number of node cores free in the MiniCluster
  • node_cores_up_count: number of node cores up in the MiniCluster
  • job_queue_state_new_count: number of new jobs in the queue
  • job_queue_state_depend_count: number of jobs in the queue in state "depend"
  • job_queue_state_priority_count: number of jobs in the queue in state "priority"
  • job_queue_state_sched_count: number of jobs in the queue in state "sched"
  • job_queue_state_run_count: number of jobs in the queue in state "run"
  • job_queue_state_cleanup_count: number of jobs in the queue in state "cleanup"
  • job_queue_state_inactive_count: number of jobs in the queue in state "inactive"
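The job queue metrics share one naming pattern, job_queue_state_STATE_count, with one metric per Flux job state. As a hedged illustration of that mapping (not the project's implementation), the hypothetical `job_state_counts` takes a list of job state names, however you obtained them, and returns a count for every job queue metric listed above, including zeros for states with no jobs. Flux reports these states in upper case (for example RUN), so the sketch normalizes case.

```python
from collections import Counter

# Job states in the order the metrics are listed above.
JOB_STATES = ["new", "depend", "priority", "sched", "run", "cleanup", "inactive"]


def job_state_counts(states):
    """Hypothetical helper: map job state names to job_queue_state_*_count values."""
    # Normalize case so "RUN" and "run" are counted together.
    counts = Counter(state.lower() for state in states)
    return {f"job_queue_state_{state}_count": counts.get(state, 0) for state in JOB_STATES}
```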

Docker

We provide a Docker container that you can customize for your use case, although it is intended more as a demo. You can either build it yourself or use our prebuilt image:

$ docker build -t flux_metrics_api .
$ docker run -it -p 8443:8443 flux_metrics_api

or

$ docker run -it -p 8443:8443 ghcr.io/converged-computing/flux-metrics-api

Development

Note that this is implemented in Python, but (as I discovered afterward) it could also be written in Go. In particular, I found this repository useful for understanding the spec format.

You can then open up the browser at http://localhost:8443/metrics/ to see the metrics!

😁️ Contributors 😁️

We use the all-contributors tool to generate a contributors graphic below.

Vanessasaurus

💻

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE-842614

Download files

Source Distribution

flux-metrics-api-0.0.11.tar.gz (17.5 kB)

Uploaded Source

Built Distribution

flux_metrics_api-0.0.11-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file flux-metrics-api-0.0.11.tar.gz.

File metadata

  • Download URL: flux-metrics-api-0.0.11.tar.gz
  • Upload date:
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/6.0.0 pkginfo/1.9.6 requests/2.29.0 requests-toolbelt/0.9.1 tqdm/4.65.0 CPython/3.11.3

File hashes

Hashes for flux-metrics-api-0.0.11.tar.gz
Algorithm Hash digest
SHA256 28f59a61772fea164bd660b1da51470bbf1f9a5d8ef0ef3fd01cf0939947f827
MD5 9ec52986ad89e4dafd20e24f1e710919
BLAKE2b-256 a452ba96c606284868923c1c67430a7dc2f43741ce956e82912021c62eec9447

File details

Details for the file flux_metrics_api-0.0.11-py3-none-any.whl.

File metadata

  • Download URL: flux_metrics_api-0.0.11-py3-none-any.whl
  • Upload date:
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/6.0.0 pkginfo/1.9.6 requests/2.29.0 requests-toolbelt/0.9.1 tqdm/4.65.0 CPython/3.11.3

File hashes

Hashes for flux_metrics_api-0.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 051705583bb34a89cd311b5cec02129111d9247fef3f375553bb9ef1c6b6e1fc
MD5 74f8c5709c6a8652c97d7ff8aacef67e
BLAKE2b-256 020f1478b1fec6ff966fd2d54a01703a4e3e069c6d16914fbc74dddc2103255d
