ClickHouse collector and storage for DetectK

detectk-collectors-clickhouse

Installation

pip install detectk-collectors-clickhouse

Features

  • ClickHouseCollector: Collect metrics from ClickHouse queries
  • ClickHouseStorage: Store metric history in ClickHouse (dtk_datapoints and dtk_detections tables)
  • Auto-registration in DetectK registries
  • Connection pooling and error handling
  • Partitioned tables for performance

Usage

As Collector

# config.yaml
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        count() as value,
        now() as timestamp
      FROM sessions
      -- Qualify the column: a bare "timestamp" may resolve to the alias above
      WHERE sessions.timestamp > now() - INTERVAL 10 MINUTE

detector:
  type: "threshold"
  params:
    threshold: 1000

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"
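
The ${MATTERMOST_WEBHOOK} placeholder suggests the value comes from an environment variable. How DetectK resolves such placeholders is not described here, so the following is only a sketch of the idea using Python's standard library. The helper name expand_env and the example URL are hypothetical.

```python
import os
from string import Template


def expand_env(value, env=None):
    """Replace ${NAME} placeholders in a config string.

    Hypothetical helper, not part of DetectK. Raises KeyError when a
    referenced variable is missing, so a bad config fails early.
    """
    env = os.environ if env is None else env
    return Template(value).substitute(env)


# Illustrative mapping standing in for the real process environment.
demo_env = {"MATTERMOST_WEBHOOK": "https://chat.example.com/hooks/abc"}
webhook_url = expand_env("${MATTERMOST_WEBHOOK}", demo_env)
```

Failing on a missing variable is a design choice for this sketch: a silently empty webhook URL would drop alerts without any visible error.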

As Storage

# config.yaml
storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: false  # Optional: set to true to also save detection results

Multiple Detectors (A/B Testing)

# config.yaml - Compare multiple detection strategies
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: "SELECT count() as value, now() as timestamp FROM sessions"

# Multiple detectors with auto-generated IDs
detectors:
  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 3.0
    # ID auto-generated: e.g., "a1b2c3d4"

  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 5.0
    # ID auto-generated: e.g., "b2c3d4e5" (different from above)

  - id: "zscore_7d"  # Manual ID override
    type: "zscore"
    params:
      window_size: "7 days"

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"

storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: true  # Save all detector results for comparison

How it works:

  • Each detector gets a unique ID (auto-generated 8-char hash or manual)
  • All detector results are saved to dtk_detections with their detector_id
  • An alert is sent if any detector finds an anomaly (this is planned to become configurable)
  • To compare results, query: SELECT * FROM dtk_detections WHERE metric_name = 'sessions_10min' ORDER BY detected_at, detector_id
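
The ID and alerting rules above can be sketched in a few lines of Python. The hashing scheme is an assumption: the package only states that auto-generated IDs are 8-character hashes, not how they are computed. Here the ID is taken from a SHA-256 digest of the detector type and parameters. The function names detector_id and should_alert are hypothetical.

```python
import hashlib
import json


def detector_id(detector):
    """Return the manual ID if present, else an 8-character hash.

    Assumption: the hash covers the detector type and its params.
    The real DetectK scheme may differ.
    """
    if "id" in detector:
        return detector["id"]
    canonical = json.dumps(
        {"type": detector["type"], "params": detector.get("params", {})},
        sort_keys=True,  # stable key order gives a stable hash
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def should_alert(results):
    """results maps detector ID to True when that detector found an anomaly."""
    return any(results.values())


detectors = [
    {"type": "mad", "params": {"window_size": "30 days", "n_sigma": 3.0}},
    {"type": "mad", "params": {"window_size": "30 days", "n_sigma": 5.0}},
    {"id": "zscore_7d", "type": "zscore", "params": {"window_size": "7 days"}},
]
ids = [detector_id(d) for d in detectors]
```

Hashing the parameters means two detectors of the same type get different IDs as soon as any parameter differs, which is what makes side-by-side comparison possible.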

Configuration

Collector Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • query: SQL query that returns a value column and, optionally, a timestamp column
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use an SSL/TLS connection (default: false)
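
As a quick reference, the defaults listed above can be written down as a small Python dataclass. This is an illustrative sketch, not the package's own class: the names CollectorParams and load_params are hypothetical, and treating query as the only required field is an assumption based on it having no documented default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectorParams:
    """Hypothetical mirror of the documented collector parameters."""
    query: str                      # assumed required: no default is documented
    host: str = "localhost"
    port: int = 9000
    database: str = "default"
    user: Optional[str] = None
    password: Optional[str] = None
    timeout: int = 30               # seconds
    secure: bool = False


def load_params(raw):
    """Build params from a dict; unknown or missing keys raise TypeError."""
    return CollectorParams(**raw)


params = load_params({
    "host": "localhost",
    "database": "analytics",
    "query": "SELECT count() AS value FROM sessions",
})
```

Rejecting unknown keys catches typos such as "databse" before any query is sent.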

Storage Parameters

  • host: ClickHouse server host (default: localhost)
  • port: ClickHouse server port (default: 9000)
  • database: Database name (default: default)
  • user: Username (optional)
  • password: Password (optional)
  • timeout: Query timeout in seconds (default: 30)
  • secure: Use an SSL/TLS connection (default: false)
  • save_detections: Save detection results to dtk_detections table (default: false)

Storage Schema

dtk_datapoints

Collected metric values (required for detection):

CREATE TABLE dtk_datapoints (
    id UInt64,
    metric_name String,
    collected_at DateTime64(3),
    value Float64,
    context String  -- JSON string
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(collected_at)
ORDER BY (metric_name, collected_at);
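
To make the column types concrete, here is a sketch of how one row for this table could be prepared in Python. It only shapes the data and does not talk to ClickHouse. The function name make_datapoint_row is hypothetical, and because the README does not say how id values are generated, the ID is passed in by the caller.

```python
import json
from datetime import datetime, timezone


def make_datapoint_row(row_id, metric_name, collected_at, value, context=None):
    """Shape one record to match the dtk_datapoints columns (hypothetical helper)."""
    # DateTime64(3) stores millisecond precision, so drop finer digits.
    millis = collected_at.microsecond // 1000 * 1000
    return {
        "id": row_id,
        "metric_name": metric_name,
        "collected_at": collected_at.replace(microsecond=millis),
        "value": float(value),  # Float64 column
        "context": json.dumps(context or {}, sort_keys=True),  # JSON string
    }


row = make_datapoint_row(
    1,
    "sessions_10min",
    datetime(2024, 1, 1, 12, 0, 0, 123456, tzinfo=timezone.utc),
    42,
    {"source": "demo"},
)
```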

dtk_detections

Detection results (optional, for audit/cooldown):

CREATE TABLE dtk_detections (
    id UInt64,
    metric_name String,
    detector_id String,  -- Unique detector identifier (for multi-detector support)
    detected_at DateTime64(3),
    value Float64,
    is_anomaly UInt8,
    anomaly_score Nullable(Float64),
    lower_bound Nullable(Float64),
    upper_bound Nullable(Float64),
    direction Nullable(String),
    percent_deviation Nullable(Float64),
    detector_type String,
    detector_params String,  -- JSON with full params for transparency
    alert_sent UInt8,
    alert_reason Nullable(String),
    alerter_type Nullable(String),
    context String  -- JSON
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(detected_at)
ORDER BY (metric_name, detector_id, detected_at);

Multi-Detector Support: The detector_id field allows storing results from multiple detectors for the same metric. Each detector gets a unique ID (auto-generated or manual), enabling A/B testing and parameter tuning.
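
One way to use detector_id for parameter tuning is to compare how often each detector flags the same metric. The sketch below works on rows already fetched from dtk_detections as dictionaries. The function name anomaly_rate_by_detector is hypothetical, and the sample rows are made up purely for illustration.

```python
from collections import defaultdict


def anomaly_rate_by_detector(rows):
    """Return the share of rows flagged as anomalous, per detector_id."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in rows:
        totals[row["detector_id"]] += 1
        if row["is_anomaly"]:  # UInt8 column: 1 means anomaly
            flagged[row["detector_id"]] += 1
    return {det: flagged[det] / totals[det] for det in totals}


# Made-up rows, not real detection results.
sample_rows = [
    {"detector_id": "a1b2c3d4", "is_anomaly": 1},
    {"detector_id": "a1b2c3d4", "is_anomaly": 0},
    {"detector_id": "b2c3d4e5", "is_anomaly": 0},
    {"detector_id": "b2c3d4e5", "is_anomaly": 0},
]
rates = anomaly_rate_by_detector(sample_rows)
```

A detector with a much higher rate than its siblings on the same data is a candidate for a looser threshold, for example a larger n_sigma.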

Tables are created automatically on first use.

License

MIT
