Skip to main content

Prometheus HTTP SD framework.

Project description

prometheus-http-sd

This is a Prometheus HTTP SD framework.

Test

Features

  • Support static targets from Json file;
  • Support static targets from Yaml file;
  • Support generating target list using Python script;
  • Support check command, to testing the generated target is as expected, and counting the targets;
  • You can monitoring your target generator via /metrics, see metrics;
  • Admin page to list all target paths;
  • Auto reload when generator or targets changed;
  • Support managing targets in a hierarchy way;
  • Throttle parallel execution and cache the result for Python script;
  • Support Sentry APM.

Installation

pip install prometheus-http-sd

Usage

First, you need a directory, everything in this directory will be used to generate targets for prometheus-http-sd.

$ mkdir targets

In this directory, every file is called a target "generator":

  • Filename that ending with .json will be exposed directly
  • Filename that ending with .yaml will be exposed directly
  • Filename that ending with .py must include a generate_targets() function, the function will be run, and it must return a TargetList (Type helper in prometheus_http_sd.targets.)
  • Filename that starts with _ will be ignored, so you can have some python utils there, for e.g. _utils/__init__.py that you can import in you generate_targets()
  • Filename that starts with . (hidden file in Linux) will also be ignored

Let write our first target generator by yaml, put this into your targets/first_target.yaml:

---
- targets:
    - "10.1.1.9:9100"
    - "10.1.1.10:9100"
  labels:
    job: node
    datacenter: nyc
    group: g1
- targets:
    - "10.2.1.9:9100"
    - "10.2.1.10:9100"
  labels:
    job: node
    datacenter: sg
    group: g2

If you use json, the data structure is the same, just in Json format.

The Python Target Generator

Let's put another generator using Python:

Put this into your targets/by_python.py:

def generate_targets(**extra_parameters):
  return [{"targets": ["10.1.1.22:2379"], "labels": {"app": "etcd"}}]

Then you can run prometheus-http-sd serve -h 0.0.0.0 -p 8080 /tmp/targets, prometheus-http-sd will start to expose targets at: http://0.0.0.0:8080/targets

The -h and -p is optional, defaults to 127.0.0.1 and 8080.

$ prometheus-http-sd serve /tmp/targets # replace this to your target path
[2022-07-24 00:52:03,896] {wasyncore.py:486} INFO - Serving on http://127.0.0.1:8080

If you run curl http://127.0.0.1:8080/targets you will get:

{"targets": "10.1.1.22:2379", "labels": {"app": "etcd"}}

Finally, you can tell your Prometheus to find targets under http://127.0.0.1:8080/targets, by adding this into your Prometheus config:

scrape_configs:
  - job_name: "etcd"
    http_sd_config:
      url: http://127.0.0.1:8080/targets/

The Python target generator also support URL query params. You can check the params in your generate_targets() function.

For example:

def generate_targets(**params):
  cluster = params.get("cluster")
  return [{"targets": ["10.1.1.22:2379"], "labels": {"app": "etcd", "cluster": cluster}}]

Then curl http://127.0.0.1:8080/targets?cluster=us1 you will get:

{"targets": "10.1.1.22:2379", "labels": {"app": "etcd", "cluster": "us1"}}

Python Target Generator Cache and Throttle

Support you have 10 Prometheus instance request http-sd for targets every minutes, for Python script target generator, it doesn't make sense that the same script run 10 times in every minute, instead, it should run only once, and use this result to respond for 10 Prometheus instances.

prometheus-http-sd has cache and throttle by default, that means:

  • At any time there is only one python script running
  • The result will be cached for 1minute (This means that every script at max will be only running one time per minute, and your target update will delay at most 1 minute)

Manage prometheus-http-sd by systemd

Just put this file under /lib/systemd/system/http-sd.service (remember to change your installation path and root_dir path):

# /lib/systemd/system/http-sd.service
[Unit]
Description=Prometheus HTTP SD Service
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
ExecStart=/opt/httpsd_env/bin/prometheus-http-sd serve \
    -h 0.0.0.0                                         \
    -p 8080                                            \
    /opt/httpsd_targets

Restart=always
RestartSec=90

[Install]
WantedBy=multi-user.target

Admin Page

You can open the root path, http://127.0.0.1:8080/ in this example, and you will see all of the available paths list in the admin page.

Serve under a different root path

If you put prometheus-http-sd behind a reverse proxy like Nginx, like this:

location /http_sd/ {
      proxy_pass http://prometheus_http_sd;
}

Then you need to tell prometheus_http_sd to serve all HTTP requests under this path, by using the --url_prefix /http_sd cli option, (or -r /http_sd for short).

Change Certificate

By default, prometheus-http-sd has caching capabilities for Python targets to avoid server crashes due to too many queries.

    +------------+
    |            |
    |            |
    |            |                  +-----------+
    |  Caller 1  +----+             |           |
    |            |    |             |           |
    |            |    |             |           |
    |            |    |             |           |
    +------------+    |             |           |
                      |             |           |
                      |             |           |
    +------------+    |             |           |                  +----------+
    |            |    |             |           |                  |          |
    |            |    | call at the |           | only single call |          |
    |            |    |  same time  |  Timeout  |    to the back   |          |
    |  Caller 2  +----|------------>+   Cache   +----------------->+ Function |
    |            |    |             |           |                  |          |
    |            |    |             |           |                  |          |
    |            |    |             |           |                  |          |
    +------------+    |             |           |                  +----------+
                      |             |           |
                      |             |           |
    +------------+    |             |           |
    |            |    |             |           |
    |            |    |             |           |
    |            |    |             |           |
    |  Caller 3  +----+             |           |
    |            |                  +-----------+
    |            |
    |            |
    +------------+

To change this behavior, you can use the option --cache-type to change the cache behavior.

Also, you can use the option --cache-opt to change the variable. For example:

prometheus-http-sd serve       \
    -h 0.0.0.0                 \
    -p 8080                    \
    --cache-type="Timeout"     \
    --cache-opt="timeout=360"  \
    /opt/httpsd_targets

Timeout

This is the default value, It will cache the result or exception from the target function.

  • timeout=<seconds>: function timeout. if exceed, raise TimeoutException (in sec).
  • cache_time=<seconds>: after function return normally, how long should we cache the result (in sec).
  • cache_exception_time=<seconds>: after function return incorrectly, how long should we cache the exception (in sec).
  • name=<str>: prometheus_client metrics prefix
  • garbage_collection_count=<seconds>: garbage collection threshold
  • garbage_collection_interval=<seconds>: the second to avoid collection too often.
  • copy_response=<bool>: if true, use copy.deepcopy on the response from the target function.

None

This is a dummy function if you don't need any cache method.

Sentry APM

You can use the option --sentry-url <you-sentry-url> (or -s <your-sentry-url>) to enable Sentry APM.

The Exception from user's script will be sent to Sentry.

Define your targets

Your target generator

Please see the Usage to know how to define your generator.

The Target Path

prometheus-http-sd support sub-pathes.

For example, if we use prometheus-http-sd serve gateway, and the gateway directory's structure is as follows:

gateway
├── nginx
│   ├── edge.py
│   └── targets.json
└── targets.json

Then:

  • /targets/gateway will return the targets from:
    • gateway/nginx/edge.py
    • gateway/nginx/targets.json
    • gateway/targets.json
  • /targets/gateway/nginx will return the targets from:
    • gateway/nginx/edge.py
    • gateway/nginx/targets.json

This is very useful when you use vertical scaling. Say you have 5 Prometheus instances, and you want each one of them scrape for different targets, then you can use the sub-path feature of prometheus-http-sd.

For example, in one Prometheus's scrape config:

scrape_configs:
  - job_name: "nginx"
    http_sd_config:
      url: http://prometheus-http-sd:8080/targets/nginx

  - job_name: "etcd"
    http_sd_config:
      url: http://prometheus-http-sd:8080/targets/etcd

And in another one:

scrape_configs:
  - job_name: "nginx"
    http_sd_config:
      url: http://prometheus-http-sd:8080/targets/database

  - job_name: "etcd"
    http_sd_config:
      url: http://prometheus-http-sd:8080/targets/application

Overwriting job_name labels

You may want to put all of etcd targets in one generator, including port 2379 for etcd metrics and 9100 for node_exporter metrics of the etcd server. But the job_name setting was based on per URL.

The trick is that, you can overwrite the job label in the target labels, like this:

---
- targets:
    - "10.1.1.9:9100"
  labels:
    job: node
    datacenter: nyc
    group: g1
- targets:
    - "10.1.1.9:2379"
  labels:
    job: etcd
    datacenter: nyc
    group: g1

Check and Validate your Targets

You can use prometheus-http-sd check command to test your targets dir. It will run all of you generators, validate the targets, and print the targets count that each generator generates.

$ prometheus-http-sd check test/test_generator/root
[2022-08-06 00:50:11,095] {validate.py:16} INFO - Run generator test/test_generator/root/json/target.json, took 0.0011398792266845703s, generated 1 targets.
[2022-08-06 00:50:11,100] {validate.py:16} INFO - Run generator test/test_generator/root/yaml/target.yaml, took 0.0043718814849853516s, generated 2 targets.
[2022-08-06 00:50:11,100] {validate.py:22} INFO - Done! Generated {total_targets} in total.

It's a good idea to use prometheus-http-sd check in your CI system to validate your targets generator scripts and target files.

For Python script, prometheus-http-sd check command will run generate_targets in each script, without any params. However, you can overwrite the check logic by providing a function called test_generate_targets()(without any function args), then check will run test_generate_targets instead. (So you can call generate_targets(foo="bar") to set the test logic of your own.

Script Dependencies

If you want your scripts to use some other python library, just install them into the same virtualenv that you install prometheus-http-sd, so that prometheus-http-sd can import them.

Update Your Scripts

If you want to update your script file or target json file, just upload and overwrite with your new version, it will take effect immediately after you making changes, there is no need to restart prometheus-http-sd, prometheus-http-sd will read the file (or reload the python script) every time serving a request.

It is worth noting that restarting is safe because if Prometheus failed to get the target list via HTTP request, it won't update its current target list to empty, instead, it will keep using the current list.

Prometheus caches target lists. If an error occurs while fetching an updated targets list, Prometheus keeps using the current targets list.

For the same reason, if there are 3 scripts under /targets/mysystem and only one failed for a request, prometheus-http-sd will return a HTTP 500 Error for the whole request instead of returning the partial targets from the other two scripts.

Also for the same reason, if your script met any error, you should throw out Exception all the way to the top instead of catch it in your script and return a null TargetList, if you return a null TargetList, prometheus-http-sd will think that your script run successfully and empty the target list as well.

You can notice this error from stdout logs or /metrics from prometheus-http-sd.

Debug Your Scripts

Monolithic Mode

Debug script latency.

You can add ?debug=true at the end of your target url to see the time cost of each generator of a path.

For example:

curl http://127.0.0.1:8080/targets/echo_target\?debug\=true
{"generator_run_seconds":{"./test/app_root/echo_target/sleep2_target.py":2.005011796951294,"./test/app_root/echo_target/sleep_target.py":3.00480318069458,"./test/app_root/echo_target/target.py":0.0009987354278564453}}

Redis-Based Architecture

For the Redis-based architecture (server + workers), the debug feature provides detailed information about job status, errors, and cache state.

Quick Start:

# Add ?debug=true to see job processing status and error details
curl http://127.0.0.1:8080/targets/my-service?debug=true

What Debug Shows:

  • Job processing status (pending, processing, completed)
  • Error details with full tracebacks when generation fails
  • Cache status and timing information
  • Suggestions for troubleshooting

For detailed documentation, see Debug Guide.

Redis-Based Architecture

The Redis-based architecture splits the application into separate server and worker components for better scalability and reliability.

Key Features:

  • Horizontal scaling of servers and workers
  • Persistent job queue with Redis
  • Cache-based target serving
  • Detailed debug information
  • Comprehensive metrics

Quick Start:

# Terminal 1: Start Redis
redis-server

# Terminal 2: Start server
prometheus-http-sd server-only \
  --host 0.0.0.0 \
  --port 8080 \
  --redis-url redis://localhost:6379/0 \
  /path/to/targets

# Terminal 3: Start workers
prometheus-http-sd worker-only \
  --num-workers 4 \
  --redis-url redis://localhost:6379/0 \
  /path/to/targets

Documentation:

Debugging:

  • Add ?debug=true to requests for job status and error details
  • Monitor Prometheus metrics for cache hits/misses
  • Check Redis queue lengths for bottlenecks
  • See Debug Guide for detailed troubleshooting

Best Practice

You can use a git repository to manage your target generator.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prometheus_http_sd-3.0.6.tar.gz (33.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

prometheus_http_sd-3.0.6-py3-none-any.whl (36.5 kB view details)

Uploaded Python 3

File details

Details for the file prometheus_http_sd-3.0.6.tar.gz.

File metadata

  • Download URL: prometheus_http_sd-3.0.6.tar.gz
  • Upload date:
  • Size: 33.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.10.19 Linux/6.11.0-1018-azure

File hashes

Hashes for prometheus_http_sd-3.0.6.tar.gz
Algorithm Hash digest
SHA256 063e5cf0e2fdd3dc0c352a9001d6c3fff3e834b53ccb142ac6b59860ec7773a5
MD5 17b2cadd71b9761c0ba0f4d6ca91a242
BLAKE2b-256 7e80017a9146a7c386ff47d575ea2ac380ff0d03bb2ae1c93ac14253f4a41378

See more details on using hashes here.

File details

Details for the file prometheus_http_sd-3.0.6-py3-none-any.whl.

File metadata

  • Download URL: prometheus_http_sd-3.0.6-py3-none-any.whl
  • Upload date:
  • Size: 36.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.10.19 Linux/6.11.0-1018-azure

File hashes

Hashes for prometheus_http_sd-3.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 33f6a7f9260411b274beeab2dbda6a4dbcccebe9c2dd826c7fe63ee0c82f0afd
MD5 7566a1343b02f29fa51422696bdb0b6e
BLAKE2b-256 25d1930f7b8a451252364cc309ffbb49afe3ba369ab0043e021322b38a9839e9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page