# detectk-collectors-clickhouse

ClickHouse collector and storage for DetectK.
## Installation

```bash
pip install detectk-collectors-clickhouse
```
## Features

- **ClickHouseCollector**: Collect metrics from ClickHouse queries
- **ClickHouseStorage**: Store metric history in ClickHouse (`dtk_datapoints` and `dtk_detections` tables)
- Auto-registration in DetectK registries
- Connection pooling and error handling
- Partitioned tables for performance
## Usage

### As Collector
```yaml
# config.yaml
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

detector:
  type: "threshold"
  params:
    threshold: 1000

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"
```
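The `{{ period_start }}` and `{{ period_finish }}` placeholders in the query are filled with the boundaries of the evaluation period before the query is sent to ClickHouse. A minimal sketch of that substitution step (the function name and timestamp format here are assumptions for illustration, not DetectK's actual implementation):

```python
from datetime import datetime, timedelta

def render_query(template: str, period_start: datetime, period_finish: datetime) -> str:
    """Replace period placeholders with 'YYYY-MM-DD HH:MM:SS' timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return (template
            .replace("{{ period_start }}", period_start.strftime(fmt))
            .replace("{{ period_finish }}", period_finish.strftime(fmt)))

# A 10-minute window ending at noon
finish = datetime(2024, 1, 1, 12, 0, 0)
start = finish - timedelta(minutes=10)

query = render_query(
    "SELECT count() as value FROM sessions "
    "WHERE timestamp >= toDateTime('{{ period_start }}') "
    "AND timestamp < toDateTime('{{ period_finish }}')",
    start, finish,
)
print(query)
```

The rendered query contains concrete timestamps and can be executed as-is against ClickHouse.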
### As Storage
```yaml
# config.yaml
storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: false  # Optional: save detection results
```
### Multiple Detectors (A/B Testing)
```yaml
# config.yaml - Compare multiple detection strategies
name: "sessions_10min"

collector:
  type: "clickhouse"
  params:
    host: "localhost"
    database: "analytics"
    query: |
      SELECT
        toStartOfInterval(toDateTime('{{ period_finish }}'), INTERVAL 10 MINUTE) as period_time,
        count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')

# Multiple detectors with auto-generated IDs
detectors:
  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 3.0
    # ID auto-generated: e.g., "a1b2c3d4"
  - type: "mad"
    params:
      window_size: "30 days"
      n_sigma: 5.0
    # ID auto-generated: e.g., "b2c3d4e5" (different from above)
  - id: "zscore_7d"  # Manual ID override
    type: "zscore"
    params:
      window_size: "7 days"

alerter:
  type: "mattermost"
  params:
    webhook_url: "${MATTERMOST_WEBHOOK}"

storage:
  enabled: true
  type: "clickhouse"
  params:
    host: "localhost"
    database: "default"
    save_detections: true  # Save all detector results for comparison
```
**How it works:**

- Each detector gets a unique ID (an auto-generated 8-character hash, or a manual one)
- All detector results are saved to `dtk_detections` with their `detector_id`
- An alert is sent if ANY detector finds an anomaly (configurable in a future release)
- Query the results with:

```sql
SELECT * FROM dtk_detections
WHERE metric_name = 'sessions_10min'
ORDER BY detected_at, detector_id
```
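One plausible way to derive a stable 8-character ID from a detector's type and parameters is to hash their canonical JSON form. This is a sketch under that assumption; DetectK's actual ID scheme may differ:

```python
import hashlib
import json

def detector_id(detector_type: str, params: dict) -> str:
    """Hash the detector type plus canonical (sorted-key) JSON of its params,
    keeping the first 8 hex characters as a short, deterministic ID."""
    payload = json.dumps({"type": detector_type, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]

# Same type, different params -> different IDs
id_a = detector_id("mad", {"window_size": "30 days", "n_sigma": 3.0})
id_b = detector_id("mad", {"window_size": "30 days", "n_sigma": 5.0})
print(id_a, id_b)
```

Deriving the ID from the configuration (rather than, say, list position) means reordering detectors in the YAML does not change which historical rows in `dtk_detections` belong to which detector.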
## Configuration

### Collector Parameters

- `host`: ClickHouse server host (default: `localhost`)
- `port`: ClickHouse server port (default: `9000`)
- `database`: Database name (default: `default`)
- `user`: Username (optional)
- `password`: Password (optional)
- `query`: SQL query returning a `value` column and optionally a `timestamp` column
- `timeout`: Query timeout in seconds (default: `30`)
- `secure`: Use SSL connection (default: `false`)
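A collector block using every parameter above might look like the following (host, credentials, and values are illustrative, not defaults):

```yaml
collector:
  type: "clickhouse"
  params:
    host: "ch.example.internal"         # default: localhost
    port: 9440                          # default: 9000
    database: "analytics"               # default: default
    user: "detectk"                     # optional
    password: "${CLICKHOUSE_PASSWORD}"  # optional
    timeout: 60                         # seconds, default: 30
    secure: true                        # default: false
    query: |
      SELECT count() as value
      FROM sessions
      WHERE timestamp >= toDateTime('{{ period_start }}')
        AND timestamp < toDateTime('{{ period_finish }}')
```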
### Storage Parameters

- `host`: ClickHouse server host (default: `localhost`)
- `port`: ClickHouse server port (default: `9000`)
- `database`: Database name (default: `default`)
- `user`: Username (optional)
- `password`: Password (optional)
- `timeout`: Query timeout in seconds (default: `30`)
- `secure`: Use SSL connection (default: `false`)
- `save_detections`: Save detection results to the `dtk_detections` table (default: `false`)
## Storage Schema

### dtk_datapoints

Collected metric values (required for detection):
```sql
CREATE TABLE dtk_datapoints (
    id UInt64,
    metric_name String,
    collected_at DateTime64(3),
    value Float64,
    context String  -- JSON string
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(collected_at)
ORDER BY (metric_name, collected_at);
```
### dtk_detections

Detection results (optional, for audit/cooldown):
```sql
CREATE TABLE dtk_detections (
    id UInt64,
    metric_name String,
    detector_id String,      -- Unique detector identifier (for multi-detector support)
    detected_at DateTime64(3),
    value Float64,
    is_anomaly UInt8,
    anomaly_score Nullable(Float64),
    lower_bound Nullable(Float64),
    upper_bound Nullable(Float64),
    direction Nullable(String),
    percent_deviation Nullable(Float64),
    detector_type String,
    detector_params String,  -- JSON with full params for transparency
    alert_sent UInt8,
    alert_reason Nullable(String),
    alerter_type Nullable(String),
    context String           -- JSON
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(detected_at)
ORDER BY (metric_name, detector_id, detected_at);
```
**Multi-Detector Support:** The `detector_id` field allows storing results from multiple detectors for the same metric. Each detector gets a unique ID (auto-generated or manual), enabling A/B testing and parameter tuning.

Tables are created automatically on first use.
## License

MIT