detectk-collectors-clickhouse

ClickHouse collector and storage for DetectK.

Installation

pip install detectk-collectors-clickhouse

Features

  • ClickHouseCollector: Collect metrics from ClickHouse queries
  • ClickHouseStorage: Store metric history in ClickHouse (dtk_datapoints and dtk_detections tables)
  • Auto-registration in DetectK registries
  • Connection pooling and error handling
  • Partitioned tables for performance

Usage

As Collector

# config.yaml
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

detector:
  type: "threshold"
  params:
    threshold: 1000

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"
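The `{{ period_start }}` and `{{ period_finish }}` placeholders are filled in at collection time with the boundaries of the current period. A minimal sketch of how such substitution could work — plain string replacement is used here for illustration; DetectK's actual templating engine may differ:

```python
from datetime import datetime, timedelta

def render_query(template: str, period_start: datetime, period_finish: datetime) -> str:
    """Substitute the {{ period_start }} / {{ period_finish }} placeholders.

    Illustrative only: DetectK's real templating mechanism is not shown here.
    """
    fmt = "%Y-%m-%d %H:%M:%S"  # ClickHouse-friendly DateTime literal
    return (template
            .replace("{{ period_start }}", period_start.strftime(fmt))
            .replace("{{ period_finish }}", period_finish.strftime(fmt)))

finish = datetime(2024, 1, 1, 12, 0, 0)
start = finish - timedelta(minutes=10)
query = render_query(
    "SELECT count() FROM sessions "
    "WHERE timestamp >= toDateTime('{{ period_start }}') "
    "AND timestamp < toDateTime('{{ period_finish }}')",
    start, finish,
)
print(query)
```

The rendered query is then executed against the configured host and database, and the resulting `value` (and optional timestamp) column feeds the detector.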

As Storage

# config.yaml
storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: false  # Optional: also store detection results (default: false)

Multiple Detectors (A/B Testing)

# config.yaml - Compare multiple detection strategies
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

# Multiple detectors with auto-generated IDs
detectors:
  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 3.0
    # ID auto-generated: e.g., "a1b2c3d4"

  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 5.0
    # ID auto-generated: e.g., "b2c3d4e5" (different from above)

  - id: "zscore_7d"  # Manual ID override
    type: "zscore"
    params:
      window_size: "7 days"

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"

storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: true  # Save all detector results for comparison

How it works:

  • Each detector gets a unique ID (an auto-generated 8-character hash, or a manually specified one)
  • Every detector's result is saved to dtk_detections with its detector_id
  • An alert is sent if ANY detector finds an anomaly (configurable in a future release)
  • Compare results with: SELECT * FROM dtk_detections WHERE metric_name = 'sessions_10min' ORDER BY detected_at, detector_id
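The auto-generated 8-character IDs are plausibly derived by hashing the detector's type and parameters, so identical configurations always map to the same ID. A hypothetical sketch of that idea (the hashing scheme DetectK actually uses is not documented here):

```python
import hashlib
import json

def detector_id(detector_type: str, params: dict) -> str:
    """Derive a stable 8-char ID from a detector's type and params.

    Hypothetical: illustrates config-derived IDs, not DetectK's
    actual algorithm.
    """
    # sort_keys makes the ID independent of parameter ordering in YAML
    payload = json.dumps({"type": detector_type, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:8]

mad_3 = detector_id("mad", {"window_size": "30 days", "n_sigma": 3.0})
mad_5 = detector_id("mad", {"window_size": "30 days", "n_sigma": 5.0})
print(mad_3, mad_5)  # two different 8-char IDs
```

A config-derived ID means re-running the same config writes to the same detector_id, while any parameter change (like `n_sigma: 3.0` → `5.0`) yields a new one.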

Configuration

Collector Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • query: SQL query returning a value column and, optionally, a timestamp column
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use SSL connection (default: false)
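Resolving a partial configuration against the defaults above amounts to a dict merge. A sketch (illustrative, not DetectK's internal code):

```python
# Documented defaults for the collector's connection parameters.
COLLECTOR_DEFAULTS = {
    "host": "localhost",
    "port": 9000,
    "database": "default",
    "user": None,
    "password": None,
    "timeout": 30,
    "secure": False,
}

def resolve_params(user_params: dict) -> dict:
    """Merge user-supplied params over the documented defaults."""
    return {**COLLECTOR_DEFAULTS, **user_params}

params = resolve_params({"host": "ch.internal", "database": "analytics"})
print(params["port"], params["timeout"])  # 9000 30
```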

Storage Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use SSL connection (default: false)
  • save_detections: Save detection results to dtk_detections table (default: false)

Storage Schema

dtk_datapoints

Collected metric values (required for detection):

CREATE TABLE dtk_datapoints (
    id UInt64,
    metric_name String,
    collected_at DateTime64(3),
    value Float64,
    context String  -- JSON string
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(collected_at)
ORDER BY (metric_name, collected_at);
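A row destined for dtk_datapoints can be assembled as follows. This is a sketch: the `context` dict is serialized to a JSON string to match the schema, and the timestamp-derived `id` is an assumption, not necessarily how the storage layer generates it:

```python
import json
from datetime import datetime
from typing import Optional

def datapoint_row(metric_name: str, value: float, context: dict,
                  collected_at: Optional[datetime] = None) -> tuple:
    """Build an insert tuple matching the dtk_datapoints columns
    (id, metric_name, collected_at, value, context).

    Sketch only: the id here is a millisecond-resolution surrogate.
    """
    ts = collected_at or datetime.utcnow()
    row_id = int(ts.timestamp() * 1000)
    return (row_id, metric_name, ts, float(value), json.dumps(context))

row = datapoint_row("sessions_10min", 1234, {"source": "clickhouse"},
                    datetime(2024, 1, 1, 12, 0, 0))
print(row)
```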

dtk_detections

Detection results (optional, for audit/cooldown):

CREATE TABLE dtk_detections (
    id UInt64,
    metric_name String,
    detector_id String,  -- Unique detector identifier (for multi-detector support)
    detected_at DateTime64(3),
    value Float64,
    is_anomaly UInt8,
    anomaly_score Nullable(Float64),
    lower_bound Nullable(Float64),
    upper_bound Nullable(Float64),
    direction Nullable(String),
    percent_deviation Nullable(Float64),
    detector_type String,
    detector_params String,  -- JSON with full params for transparency
    alert_sent UInt8,
    alert_reason Nullable(String),
    alerter_type Nullable(String),
    context String  -- JSON
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(detected_at)
ORDER BY (metric_name, detector_id, detected_at);

Multi-Detector Support: The detector_id field allows storing results from multiple detectors for the same metric. Each detector gets a unique ID (auto-generated or manual), enabling A/B testing and parameter tuning.

Tables are created automatically on first use.
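Automatic creation presumably amounts to issuing `CREATE TABLE IF NOT EXISTS` with the DDL above on first connect. A sketch of the call pattern — `client` is assumed to expose an `execute(sql)` method, as a `clickhouse_driver.Client` does; a stub stands in for a live server:

```python
DATAPOINTS_DDL = """
CREATE TABLE IF NOT EXISTS dtk_datapoints (
    id UInt64,
    metric_name String,
    collected_at DateTime64(3),
    value Float64,
    context String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(collected_at)
ORDER BY (metric_name, collected_at)
"""

def ensure_tables(client) -> None:
    """Create the storage tables if they do not exist yet.

    Assumption: `client` has an execute(sql) method
    (e.g. clickhouse_driver.Client).
    """
    client.execute(DATAPOINTS_DDL)

# Stub client to show the call pattern without a live server:
class _StubClient:
    def __init__(self):
        self.statements = []
    def execute(self, sql):
        self.statements.append(sql)

stub = _StubClient()
ensure_tables(stub)
print(len(stub.statements))  # 1
```

`IF NOT EXISTS` makes the call idempotent, so it is safe to run on every startup.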

License

MIT
