
detectk-collectors-clickhouse

ClickHouse collector and storage for DetectK.

Installation

pip install detectk-collectors-clickhouse

Features

  • ClickHouseCollector: Collect metrics from ClickHouse queries
  • ClickHouseStorage: Store metric history in ClickHouse (dtk_datapoints and dtk_detections tables)
  • Auto-registration in DetectK registries
  • Connection pooling and error handling
  • Partitioned tables for performance

Usage

As Collector

# config.yaml
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

detector:
  type: "threshold"
  params:
    threshold: 1000

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"

As Storage

# config.yaml
storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: false  # Optional: set to true to also save detection results

Multiple Detectors (A/B Testing)

# config.yaml - Compare multiple detection strategies
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

# Multiple detectors with auto-generated IDs
detectors:
  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 3.0
    # ID auto-generated: e.g., "a1b2c3d4"

  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 5.0
    # ID auto-generated: e.g., "b2c3d4e5" (different from above)

  - id: "zscore_7d"  # Manual ID override
    type: "zscore"
    params:
      window_size: "7 days"

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"

storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: true  # Save all detector results for comparison

How it works:

  • Each detector gets a unique ID: an auto-generated 8-character hash, or one you assign manually
  • Every detector's result is saved to dtk_detections with its detector_id
  • An alert is sent if ANY detector flags an anomaly (this policy may become configurable in the future)
  • Query the stored results per detector with:

SELECT * FROM dtk_detections
WHERE metric_name = 'sessions_10min'
ORDER BY detected_at, detector_id

Configuration

Collector Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • query: SQL query that returns a value column and, optionally, a timestamp column
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use SSL connection (default: false)
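
For reference, a collector block that sets every documented parameter might look like the sketch below (the host, credentials, and query are placeholder values, not package defaults):

# config.yaml - illustrative only; host, credentials, and query are placeholders
collector:
  type: "clickhouse"
  params:
    host: "clickhouse.internal"          # placeholder host
    port: 9000
    database: "analytics"
    user: "detectk"                      # optional
    password: "${CLICKHOUSE_PASSWORD}"   # optional; env-var style as in the alerter examples above
    timeout: 60
    secure: true
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')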

Storage Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use SSL connection (default: false)
  • save_detections: Save detection results to dtk_detections table (default: false)
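
Likewise, a storage block with every parameter spelled out might look like this (values are again placeholders):

# config.yaml - illustrative only
storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "clickhouse.internal"          # placeholder host
    port: 9000
    database: "default"
    user: "detectk"                      # optional
    password: "${CLICKHOUSE_PASSWORD}"   # optional
    timeout: 30
    secure: false
    save_detections: true  # also write detection results to dtk_detections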

Storage Schema

dtk_datapoints

Collected metric values (required for detection):

CREATE TABLE dtk_datapoints (
    id UInt64,
    metric_name String,
    collected_at DateTime64(3),
    value Float64,
    context String  -- JSON string
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(collected_at)
ORDER BY (metric_name, collected_at);

dtk_detections

Detection results (optional; used for auditing and alert cooldown):

CREATE TABLE dtk_detections (
    id UInt64,
    metric_name String,
    detector_id String,  -- Unique detector identifier (for multi-detector support)
    detected_at DateTime64(3),
    value Float64,
    is_anomaly UInt8,
    anomaly_score Nullable(Float64),
    lower_bound Nullable(Float64),
    upper_bound Nullable(Float64),
    direction Nullable(String),
    percent_deviation Nullable(Float64),
    detector_type String,
    detector_params String,  -- JSON with full params for transparency
    alert_sent UInt8,
    alert_reason Nullable(String),
    alerter_type Nullable(String),
    context String  -- JSON
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(detected_at)
ORDER BY (metric_name, detector_id, detected_at);

Multi-Detector Support: The detector_id field allows storing results from multiple detectors for the same metric. Each detector gets a unique ID (auto-generated or manual), enabling A/B testing and parameter tuning.

Tables are created automatically on first use.

License

MIT
