
detectk-collectors-clickhouse

ClickHouse collector and storage for DetectK.

Installation

pip install detectk-collectors-clickhouse

Features

  • ClickHouseCollector: Collect metrics from ClickHouse queries
  • ClickHouseStorage: Store metric history in ClickHouse (dtk_datapoints and dtk_detections tables)
  • Auto-registration in DetectK registries
  • Connection pooling and error handling
  • Partitioned tables for performance

Usage

As Collector

# config.yaml
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

detector:
  type: "threshold"
  params:
    threshold: 1000

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"

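The `{{ period_start }}` and `{{ period_finish }}` placeholders are filled in by DetectK before the query is sent to ClickHouse. As a rough illustration of that substitution (the real templating engine — likely Jinja-style, given the `{{ }}` syntax — and the timestamp format are assumptions here):

```python
from datetime import datetime, timedelta

def render_query(template: str, period_start: datetime, period_finish: datetime) -> str:
    """Hypothetical stand-in for DetectK's template rendering."""
    fmt = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format
    return (template
            .replace("{{ period_start }}", period_start.strftime(fmt))
            .replace("{{ period_finish }}", period_finish.strftime(fmt)))

template = (
    "SELECT count() AS value FROM sessions "
    "WHERE timestamp >= toDateTime('{{ period_start }}') "
    "AND timestamp < toDateTime('{{ period_finish }}')"
)
finish = datetime(2024, 1, 1, 12, 0)
sql = render_query(template, finish - timedelta(minutes=10), finish)
print(sql)
```

With a 10-minute schedule, each run would substitute a fresh `[period_start, period_finish)` window, so the query always aggregates exactly one interval.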
As Storage

# config.yaml
storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: false  # Optional: save detection results

Multiple Detectors (A/B Testing)

# config.yaml - Compare multiple detection strategies
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

# Multiple detectors with auto-generated IDs
detectors:
  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 3.0
    # ID auto-generated: e.g., "a1b2c3d4"

  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 5.0
    # ID auto-generated: e.g., "b2c3d4e5" (different from above)

  - id: "zscore_7d"  # Manual ID override
    type: "zscore"
    params:
      window_size: "7 days"

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"

storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: true  # Save all detector results for comparison

How it works:

  • Each detector gets a unique ID: an auto-generated 8-character hash, or one set manually via id
  • Every detector's results are saved to dtk_detections with its detector_id
  • An alert is sent if ANY detector flags an anomaly (this will become configurable in a future release)
  • To compare detectors, query the stored results: SELECT * FROM dtk_detections WHERE metric_name = 'sessions_10min' ORDER BY detected_at, detector_id
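A stable hash of the detector's configuration is one natural way to auto-generate such IDs. The sketch below is illustrative only — DetectK's actual scheme (fields hashed, algorithm, encoding) is not documented here and may differ:

```python
import hashlib
import json

def auto_detector_id(detector_type: str, params: dict) -> str:
    """Derive a stable 8-char ID from a detector's type and params (assumed scheme)."""
    payload = json.dumps({"type": detector_type, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]

# The two MAD detectors from the config above differ only in n_sigma,
# so they hash to different IDs:
id_a = auto_detector_id("mad", {"window_size": "30 days", "n_sigma": 3.0})
id_b = auto_detector_id("mad", {"window_size": "30 days", "n_sigma": 5.0})
```

Because `sort_keys=True` normalizes parameter order, the same detector config always maps to the same ID across runs, which keeps its history in dtk_detections contiguous.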

Configuration

Collector Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • query: SQL query returning a value column and, optionally, a timestamp column
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use SSL connection (default: false)

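These parameters plausibly map onto `clickhouse_driver.Client` keyword arguments. The mapping below is a sketch under that assumption — in particular, translating `timeout` to `send_receive_timeout` is a guess at how the collector wires it up:

```python
DEFAULTS = {"host": "localhost", "port": 9000, "database": "default",
            "timeout": 30, "secure": False}

def client_kwargs(params: dict) -> dict:
    """Merge user params over the documented defaults and translate them
    into clickhouse_driver.Client keyword arguments (illustrative mapping)."""
    cfg = {**DEFAULTS, **params}
    kwargs = {
        "host": cfg["host"],
        "port": cfg["port"],
        "database": cfg["database"],
        "send_receive_timeout": cfg["timeout"],  # assumed parameter mapping
        "secure": cfg["secure"],
    }
    if "user" in cfg:
        kwargs["user"] = cfg["user"]
    if "password" in cfg:
        kwargs["password"] = cfg["password"]
    return kwargs

kw = client_kwargs({"host": "ch.internal", "database": "analytics", "user": "reader"})
# Then, roughly: from clickhouse_driver import Client; client = Client(**kw)
```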
Storage Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use SSL connection (default: false)
  • save_detections: Save detection results to dtk_detections table (default: false)

Storage Schema

dtk_datapoints

Collected metric values (required for detection):

CREATE TABLE dtk_datapoints (
    id UInt64,
    metric_name String,
    collected_at DateTime64(3),
    value Float64,
    context String  -- JSON string
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(collected_at)
ORDER BY (metric_name, collected_at);
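A row in this table might be assembled as follows. This is illustrative only — the storage backend writes these rows itself, and the `id` generation shown here (nanosecond clock) is an assumption:

```python
import json
import time
from datetime import datetime, timezone

def datapoint_row(metric_name, value, context=None):
    """Build a row shaped like the dtk_datapoints schema (sketch)."""
    return {
        "id": time.time_ns(),  # assumption: any unique UInt64 works
        "metric_name": metric_name,
        "collected_at": datetime.now(timezone.utc),
        "value": float(value),
        "context": json.dumps(context or {}),  # context is stored as a JSON string
    }

row = datapoint_row("sessions_10min", 1234, {"source": "clickhouse"})
```

Note that `context` is a plain `String` column holding JSON, so consumers must `json.loads` it (or use ClickHouse's JSON functions) rather than expecting a structured type.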

dtk_detections

Detection results (optional, for audit/cooldown):

CREATE TABLE dtk_detections (
    id UInt64,
    metric_name String,
    detector_id String,  -- Unique detector identifier (for multi-detector support)
    detected_at DateTime64(3),
    value Float64,
    is_anomaly UInt8,
    anomaly_score Nullable(Float64),
    lower_bound Nullable(Float64),
    upper_bound Nullable(Float64),
    direction Nullable(String),
    percent_deviation Nullable(Float64),
    detector_type String,
    detector_params String,  -- JSON with full params for transparency
    alert_sent UInt8,
    alert_reason Nullable(String),
    alerter_type Nullable(String),
    context String  -- JSON
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(detected_at)
ORDER BY (metric_name, detector_id, detected_at);

Multi-Detector Support: The detector_id field allows storing results from multiple detectors for the same metric. Each detector gets a unique ID (auto-generated or manual), enabling A/B testing and parameter tuning.
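With save_detections enabled, per-detector behavior can be compared offline, e.g. by anomaly rate. The rows below are made-up illustrations of what `SELECT detector_id, is_anomaly FROM dtk_detections WHERE metric_name = '...'` might return:

```python
from collections import Counter

# Hypothetical query results: (detector_id, is_anomaly) pairs.
rows = [
    ("a1b2c3d4", 1), ("a1b2c3d4", 0), ("a1b2c3d4", 1),
    ("b2c3d4e5", 0), ("b2c3d4e5", 0), ("b2c3d4e5", 1),
    ("zscore_7d", 0), ("zscore_7d", 0), ("zscore_7d", 0),
]

def anomaly_rate(rows):
    """Fraction of datapoints each detector flagged as anomalous."""
    total, flagged = Counter(), Counter()
    for detector_id, is_anomaly in rows:
        total[detector_id] += 1
        flagged[detector_id] += is_anomaly
    return {d: flagged[d] / total[d] for d in total}

rates = anomaly_rate(rows)
```

A detector with a markedly higher rate than its peers (here the 3-sigma MAD variant) is a candidate for loosening its threshold; one that never fires may be too conservative.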

Tables are created automatically on first use.

License

MIT
