detectk-collectors-clickhouse

ClickHouse collector and storage for DetectK.

Installation

pip install detectk-collectors-clickhouse

Features

  • ClickHouseCollector: Collect metrics from ClickHouse queries
  • ClickHouseStorage: Store metric history in ClickHouse (dtk_datapoints and dtk_detections tables)
  • Auto-registration in DetectK registries
  • Connection pooling and error handling
  • Partitioned tables for performance

Usage

As Collector

# config.yaml
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

detector:
  type: "threshold"
  params:
    threshold: 1000

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"
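The query above uses {{ period_start }} and {{ period_finish }} placeholders that are filled in before the SQL is sent to ClickHouse. The sketch below illustrates one way that substitution could work for a 10-minute window. It is an assumption for illustration only: render_query and window_for are hypothetical helpers, and the way DetectK actually computes the window and renders the template is not described here.

```python
import re
from datetime import datetime, timedelta

QUERY_TEMPLATE = """
SELECT
  toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
  count() as value
FROM sessions
WHERE timestamp >= toDateTime('{{ period_start }}')
  AND timestamp < toDateTime('{{ period_finish }}')
"""


def render_query(template: str, variables: dict[str, str]) -> str:
    """Replace each {{ name }} placeholder with its value (hypothetical helper)."""
    def substitute(match: re.Match) -> str:
        # Raises KeyError if the template names a variable we did not supply.
        return variables[match.group(1)]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def window_for(run_time: datetime, minutes: int = 10) -> dict[str, str]:
    """Return the last complete window that ends at or before run_time."""
    finish = run_time.replace(second=0, microsecond=0)
    finish -= timedelta(minutes=finish.minute % minutes)
    start = finish - timedelta(minutes=minutes)
    fmt = "%Y-%m-%d %H:%M:%S"
    return {
        "period_start": start.strftime(fmt),
        "period_finish": finish.strftime(fmt),
    }


variables = window_for(datetime(2024, 1, 15, 12, 34, 56))
rendered = render_query(QUERY_TEMPLATE, variables)
print(rendered)
```

The half-open interval (>= period_start, < period_finish) in the WHERE clause means consecutive windows never count the same row twice.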

As Storage

# config.yaml
storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: false  # Optional: save detection results

Multiple Detectors (A/B Testing)

# config.yaml - Compare multiple detection strategies
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

# Multiple detectors with auto-generated IDs
detectors:
  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 3.0
    # ID auto-generated: e.g., "a1b2c3d4"

  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 5.0
    # ID auto-generated: e.g., "b2c3d4e5" (different from above)

  - id: "zscore_7d"  # Manual ID override
    type: "zscore"
    params:
      window_size: "7 days"

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"

storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: true  # Save all detector results for comparison

How it works:

  • Each detector gets a unique ID (auto-generated 8-char hash or manual)
  • All detector results are saved to dtk_detections with their detector_id
  • An alert is sent if any detector finds an anomaly (this may become configurable in a future release)
  • To query the results: SELECT * FROM dtk_detections WHERE metric_name = 'sessions_10min' ORDER BY detected_at, detector_id
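The ID and alerting rules above can be sketched as follows. The hashing scheme is an assumption: the package only states that auto-generated IDs are 8-character hashes, so auto_detector_id and resolve_ids are hypothetical helpers that show how two detectors of the same type with different parameters could end up with different IDs.

```python
import hashlib
import json


def auto_detector_id(detector_type: str, params: dict) -> str:
    """Derive a stable 8-character ID from type and params (assumed scheme)."""
    canonical = json.dumps({"type": detector_type, "params": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def resolve_ids(detectors: list[dict]) -> list[str]:
    """Use the manual 'id' when present, otherwise generate one."""
    return [
        d.get("id") or auto_detector_id(d["type"], d.get("params", {}))
        for d in detectors
    ]


detectors = [
    {"type": "mad", "params": {"window_size": "30 days", "n_sigma": 3.0}},
    {"type": "mad", "params": {"window_size": "30 days", "n_sigma": 5.0}},
    {"id": "zscore_7d", "type": "zscore", "params": {"window_size": "7 days"}},
]
ids = resolve_ids(detectors)

# Made-up per-detector outcomes for a single data point, for illustration.
is_anomaly = dict(zip(ids, [True, False, False]))

# "Alert if ANY detector finds an anomaly" reduces to a logical OR.
should_alert = any(is_anomaly.values())
print(ids, should_alert)
```

Hashing a canonical (sorted-key) JSON form keeps the ID stable across runs as long as the detector type and parameters do not change.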

Configuration

Collector Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • query: SQL query that returns a value column and, optionally, a timestamp column
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use SSL connection (default: false)

Storage Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use SSL connection (default: false)
  • save_detections: Save detection results to dtk_detections table (default: false)
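As a rough illustration of how the documented defaults combine with the params block of a config file, the sketch below merges user-supplied storage parameters over the defaults listed above. The names CONNECTION_DEFAULTS, STORAGE_DEFAULTS, and resolve_params are hypothetical, representing optional values as None is an assumption, and so is rejecting unknown keys; the package may handle these differently.

```python
# Defaults copied from the parameter lists above. The constant and function
# names are illustrative and are not part of the package's API.
CONNECTION_DEFAULTS = {
    "host": "localhost",
    "port": 9000,
    "database": "default",
    "user": None,       # optional
    "password": None,   # optional
    "timeout": 30,      # seconds
    "secure": False,
}
STORAGE_DEFAULTS = {**CONNECTION_DEFAULTS, "save_detections": False}


def resolve_params(user_params: dict, defaults: dict) -> dict:
    """Overlay user-supplied params on the defaults, rejecting unknown keys."""
    unknown = set(user_params) - set(defaults)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    return {**defaults, **user_params}


storage_params = resolve_params(
    {"host": "localhost", "database": "default", "save_detections": True},
    STORAGE_DEFAULTS,
)
print(storage_params)
```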

Storage Schema

dtk_datapoints

Collected metric values (required for detection):

CREATE TABLE dtk_datapoints (
    id UInt64,
    metric_name String,
    collected_at DateTime64(3),
    value Float64,
    context String  -- JSON string
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(collected_at)
ORDER BY (metric_name, collected_at);

dtk_detections

Detection results (optional, for audit/cooldown):

CREATE TABLE dtk_detections (
    id UInt64,
    metric_name String,
    detector_id String,  -- Unique detector identifier (for multi-detector support)
    detected_at DateTime64(3),
    value Float64,
    is_anomaly UInt8,
    anomaly_score Nullable(Float64),
    lower_bound Nullable(Float64),
    upper_bound Nullable(Float64),
    direction Nullable(String),
    percent_deviation Nullable(Float64),
    detector_type String,
    detector_params String,  -- JSON with full params for transparency
    alert_sent UInt8,
    alert_reason Nullable(String),
    alerter_type Nullable(String),
    context String  -- JSON
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(detected_at)
ORDER BY (metric_name, detector_id, detected_at);

Multi-Detector Support: The detector_id field allows storing results from multiple detectors for the same metric. Each detector gets a unique ID (auto-generated or manual), enabling A/B testing and parameter tuning.

Tables are created automatically on first use.
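Once several detectors have written to dtk_detections, their behaviour can be compared by grouping on detector_id. The sketch below does this in plain Python over rows that have already been fetched; summarize_by_detector is a hypothetical helper, and the sample rows are made up purely to show the shape of the data, not real results.

```python
from collections import defaultdict


def summarize_by_detector(rows: list[dict]) -> dict[str, dict]:
    """Count runs and anomalies per detector_id and derive the anomaly rate."""
    summary = defaultdict(lambda: {"runs": 0, "anomalies": 0})
    for row in rows:
        stats = summary[row["detector_id"]]
        stats["runs"] += 1
        stats["anomalies"] += row["is_anomaly"]  # UInt8 column: 0 or 1
    return {
        detector_id: {**stats, "anomaly_rate": stats["anomalies"] / stats["runs"]}
        for detector_id, stats in sorted(summary.items())
    }


# Illustrative rows carrying only the columns this summary needs.
rows = [
    {"detector_id": "a1b2c3d4", "is_anomaly": 1},
    {"detector_id": "a1b2c3d4", "is_anomaly": 0},
    {"detector_id": "b2c3d4e5", "is_anomaly": 0},
    {"detector_id": "b2c3d4e5", "is_anomaly": 0},
]
summary = summarize_by_detector(rows)
print(summary)
```

The same grouping could be pushed into ClickHouse with a GROUP BY detector_id query, which avoids transferring every row to the client.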

License

MIT
