Streaming log intelligence agent — detects operational failures and security threats with online ML
Seerflow
A streaming, entity-centric log intelligence agent that detects operational failures and security threats across log sources. It combines traditional ML (fast, cheap) for bulk detection with LLMs (accurate, explanatory) for edge cases and root cause analysis.
Status
Alpha — Full ingestion + detection pipeline operational.
Quick Start
# Install from source
git clone https://github.com/seerflow/seerflow.git
cd seerflow
uv sync
# Copy and edit the example config
cp seerflow.example.yaml seerflow.yaml
# Start the pipeline
uv run python -m seerflow start
Command Line
# Start with default config (seerflow.yaml in current directory)
uv run python -m seerflow start
# Start with a specific config file
uv run python -m seerflow --config /path/to/seerflow.yaml start
# Show version
uv run python -m seerflow --version
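The project structure lists `cli.py` as an argparse module with `--config` and `--version`. As a rough illustration of how the invocations above could be parsed, here is a minimal sketch. It is not Seerflow's actual `cli.py`: the `build_parser` helper, the `command` attribute, the default config path, and the version string are all assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real cli.py)."""
    parser = argparse.ArgumentParser(prog="seerflow")
    # Global options precede the subcommand, as in `seerflow --config X start`.
    parser.add_argument(
        "--config",
        default="seerflow.yaml",  # assumed default: file in the current directory
        help="path to the YAML config file",
    )
    # The version string is a placeholder, not the real package version.
    parser.add_argument("--version", action="version", version="seerflow (placeholder)")
    subcommands = parser.add_subparsers(dest="command")
    subcommands.add_parser("start", help="start the pipeline")
    return parser


args = build_parser().parse_args(["--config", "/path/to/seerflow.yaml", "start"])
print(args.command, args.config)  # prints: start /path/to/seerflow.yaml
```

Placing `--config` on the top-level parser rather than on the `start` subcommand matches the documented argument order, where the flag comes before `start`.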
Docker
# Build and run with SQLite defaults (zero config)
docker compose up -d
# Run with PostgreSQL (set password first)
export POSTGRES_PASSWORD=your-secure-password
docker compose --profile postgres up -d
# Or run standalone from a registry image
docker run -p 8080:8080 -p 4317:4317 -p 514:514/udp seerflow/seerflow
# Mount a custom config
docker run -v ./seerflow.yaml:/app/seerflow.yaml:ro seerflow/seerflow
What It Does
- Ingests logs from multiple sources simultaneously (syslog, OTLP gRPC/HTTP, file tailing, webhooks)
- Parses each log line with Drain3 (template extraction) and regex entity extraction (IPs, users, hosts, files, domains, processes)
- Scores events using Half-Space Trees (streaming ML anomaly detection)
- Thresholds scores with biDSPOT (auto-threshold based on extreme value theory, EVT -- no manual tuning)
- Alerts on anomalies with template, entities, and score details
- Persists all events to SQLite by default (PostgreSQL is optional) for later analysis
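To make the entity-extraction step more concrete, the sketch below pulls IPv4 addresses and user names out of a log line with the standard library. The patterns, the `user=` convention, and the `extract_entities` helper are illustrative assumptions; Seerflow's own extractor covers six entity types and may work differently.

```python
import ipaddress
import re

# Illustrative patterns only (assumptions, not Seerflow's real rules).
# Four dot-separated groups of 1-3 digits that are not part of a longer dotted number.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")
# Assumes users appear as `user=<name>` or `user: <name>`.
USER_FIELD = re.compile(r"\buser[=:]\s*([A-Za-z0-9_.-]+)")


def extract_entities(line: str) -> dict[str, list[str]]:
    """Return the IPv4 addresses and user names found in one log line."""
    ips = []
    for candidate in IPV4_CANDIDATE.findall(line):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        ips.append(candidate)
    return {"ips": ips, "users": USER_FIELD.findall(line)}


print(extract_entities("Failed login user=alice from 203.0.113.1 (attempt 999.1.1.1)"))
# prints: {'ips': ['203.0.113.1'], 'users': ['alice']}
```

Validating regex candidates with `ipaddress` keeps the pattern simple while still discarding strings such as `999.1.1.1` that only look like addresses.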
Example: Detect Anomalies in Syslog
# seerflow.yaml
receivers:
  syslog_enabled: true
  syslog_udp_port: 5514  # use high port to avoid root
  otlp_grpc_enabled: false
  otlp_http_enabled: false
  webhook_enabled: false
detection:
  hst_window_size: 100  # lower for faster calibration
  dspot:
    calibration_window: 200
    risk_level: 0.01  # more sensitive for testing
# Terminal 1: Start Seerflow
uv run python -m seerflow start
# Terminal 2: Send normal traffic
for i in $(seq 1 300); do
  echo "<134>1 2026-03-24T19:00:00Z web nginx $i - - GET /api/v$((i%5)) 200 ${i}ms" \
    | nc -u -w1 127.0.0.1 5514
done
# Terminal 2: Send an anomalous event
echo '<11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100' \
  | nc -u -w1 127.0.0.1 5514
Output:
INFO Seerflow 0.1.0 starting
INFO Receivers: syslog
INFO Pipeline running — Ctrl+C to stop
WARNING ANOMALY [syslog] score=0.952 threshold=0.009 dir=upper
WARNING template: [7] <*> <*> postgres <*> - - FATAL connection limit exceeded <*>
WARNING message: <11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100
WARNING entities: 203.0.113.1
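The `<134>` and `<11>` prefixes in the test messages are syslog PRI values, computed as `facility * 8 + severity`: 134 is local0 (16) at informational (6), and 11 is user-level (1) at error (3). The sketch below builds the same kind of RFC 5424 line and sends it over UDP from Python instead of `nc`. The helper names `rfc5424_line` and `send_udp` are made up for this example, and port 5514 is taken from the example config above.

```python
import socket
from datetime import datetime, timezone


def rfc5424_line(facility: int, severity: int, host: str, app: str, procid: str, msg: str) -> str:
    """Build a minimal RFC 5424 line with MSGID and structured data left as '-'."""
    if not 0 <= facility <= 23 or not 0 <= severity <= 7:
        raise ValueError("facility must be 0-23 and severity 0-7")
    pri = facility * 8 + severity  # e.g. 16 * 8 + 6 = 134
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"<{pri}>1 {timestamp} {host} {app} {procid} - - {msg}"


def send_udp(line: str, host: str = "127.0.0.1", port: int = 5514) -> None:
    """Send one log line as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Normal traffic (local0.info), then one anomalous event (user.err).
    for i in range(1, 301):
        send_udp(rfc5424_line(16, 6, "web", "nginx", str(i), f"GET /api/v{i % 5} 200 {i}ms"))
    send_udp(rfc5424_line(1, 3, "db", "postgres", "999", "FATAL connection limit exceeded 847/100"))
```

UDP delivery is fire-and-forget, so the script gives no indication of whether Seerflow is listening; check the pipeline log to confirm the events arrived.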
Shutdown Summary
Press Ctrl+C to see session stats:
INFO --- Session Summary ---
INFO Events processed: 312
INFO Anomalies detected: 10
INFO Unique templates: 7
INFO Duration: 45.3s
INFO Throughput: 7 events/sec
INFO Seerflow stopped
Configuration
See SETTINGS.md for the complete configuration reference.
All settings are optional -- Seerflow runs with sensible defaults (zero-config).
Key config sections:
- receivers -- syslog, OTLP gRPC/HTTP, file tailing, webhooks (enable/disable + ports)
- detection -- HST window size, DSPOT calibration, scoring weights
- storage -- SQLite (default) or PostgreSQL
- alerting -- dedup window, webhook/PagerDuty targets
Receivers
| Receiver | Port | Protocol | Status |
|---|---|---|---|
| Syslog UDP/TCP | 514 (5514) | RFC 5424/3164 | Done |
| OTLP gRPC | 4317 | Protobuf | Done |
| OTLP HTTP | 4318 | Protobuf + JSON | Done |
| File tailing | -- | Glob + watchfiles | Done |
| Webhooks | 8081 | JSON/form + auth | Done |
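All receivers feed one pipeline, and the project structure describes `ReceiverManager` as using a bounded queue with backpressure. The sketch below shows that general pattern with `asyncio.Queue`, assuming an asyncio-based design (which this README does not confirm). The function names and the queue size of 8 are made up for the example.

```python
import asyncio


async def producer(queue: asyncio.Queue, n: int) -> None:
    """Stand-in for a receiver: emits n events, then a sentinel."""
    for i in range(n):
        # put() suspends while the queue is full, which slows the producer
        # down to the consumer's pace instead of growing memory without bound.
        await queue.put(f"event-{i}")
    await queue.put(None)  # sentinel: no more events


async def consumer(queue: asyncio.Queue, seen: list, depths: list) -> None:
    """Stand-in for the pipeline loop: drains events until the sentinel."""
    while True:
        depths.append(queue.qsize())
        item = await queue.get()
        if item is None:
            break
        seen.append(item)
        await asyncio.sleep(0)  # yield control, simulating per-event work


async def main(n: int = 50, maxsize: int = 8) -> tuple[list, int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    seen: list = []
    depths: list = []
    await asyncio.gather(producer(queue, n), consumer(queue, seen, depths))
    return seen, max(depths)


seen, peak_depth = asyncio.run(main())
print(len(seen), peak_depth <= 8)  # prints: 50 True
```

For UDP sources, blocking the producer is not always possible because the sender never waits, so a real receiver may need to drop or count overflowed datagrams instead.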
Detection Pipeline
Log Sources → Receivers → Drain3 Parser → Entity Extractor → HST Scorer → biDSPOT Threshold → Alert
                               ↓                 ↓                ↓               ↓
                          template_id       IPs, users      anomaly score    is_anomaly?
                          template_str      hosts, files    [0.0 - 1.0]      upper/lower
                          template_params   domains, procs
- Drain3: Streaming log template extraction (~120K msgs/sec)
- Half-Space Trees: Online ML anomaly detection via River (constant time/memory)
- biDSPOT: Bidirectional EVT auto-threshold (upper spikes + lower drops)
- DetectionEnsemble: Orchestrates detectors + thresholds per source
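The thresholding idea behind the SPOT family can be sketched without dependencies: take a high empirical quantile `t` of the calibration scores, fit a generalized Pareto distribution (GPD) to the excesses above `t`, and extrapolate to the score that should be exceeded with probability `risk`. The sketch below is a simplified batch version, not Seerflow's implementation: it fits the GPD by method of moments rather than with scipy, omits drift handling and incremental updates, and obtains the lower threshold by negating the data. All function names and the 0.98 initial quantile are assumptions.

```python
import math
import random


def gpd_moments(excesses: list[float]) -> tuple[float, float]:
    """Method-of-moments GPD fit; returns (shape, scale)."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var
    return 0.5 * (1.0 - ratio), 0.5 * mean * (1.0 + ratio)


def pot_threshold(scores: list[float], risk: float, init_quantile: float = 0.98) -> float:
    """Upper threshold z with estimated P(score > z) close to risk."""
    ordered = sorted(scores)
    t = ordered[int(init_quantile * len(ordered))]  # initial high quantile
    excesses = [s - t for s in scores if s > t]
    if len(excesses) < 2 or max(excesses) == min(excesses):
        raise ValueError("not enough distinct excesses to fit the tail")
    shape, scale = gpd_moments(excesses)
    r = risk * len(scores) / len(excesses)  # target tail probability relative to t
    if abs(shape) < 1e-9:
        return t - scale * math.log(r)  # exponential-tail limit
    return t + (scale / shape) * (r ** -shape - 1.0)


def bidirectional_thresholds(scores: list[float], risk: float) -> tuple[float, float]:
    """Return (lower, upper); the lower bound reuses the upper-tail fit on negated data."""
    upper = pot_threshold(scores, risk)
    lower = -pot_threshold([-s for s in scores], risk)
    return lower, upper


rng = random.Random(0)
calibration = [rng.gauss(0.5, 0.05) for _ in range(2000)]  # synthetic scores
lower, upper = bidirectional_thresholds(calibration, risk=1e-3)
for score in (0.5, 0.95, 0.05):
    print(score, "anomaly" if score > upper or score < lower else "normal")
```

The extrapolation only makes sense when `risk` is smaller than the fraction of points above `t` (about 2% here); otherwise the requested threshold lies below `t` and the empirical quantile should be used directly.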
Development
Requires Python 3.13+ and uv.
# Install dependencies
uv sync
# Run tests
uv run pytest
# Run quality gates
uv run ruff check . && \
  uv run ruff format --check . && \
  uv run mypy src/ && \
  uv run bandit -r src/ -c pyproject.toml && \
  uv run pytest --cov=src/seerflow --cov-fail-under=90
Project Structure
src/seerflow/
  __main__.py      # CLI entry point (config → pipeline → detection → storage)
  cli.py           # argparse (--config, --version)
  config.py        # YAML config loader with ${ENV_VAR} interpolation
  pipeline.py      # Pipeline builder + consumer loop
  models/          # SeerflowEvent, Alert, entity structs (msgspec)
  storage/
    protocols.py   # Protocol interfaces (LogStore, AlertStore, ModelStore, EntityStore)
    sqlite.py      # SQLite backend (WAL, FTS5, WriteBuffer)
  receivers/
    base.py        # RawEvent dataclass, Receiver protocol
    manager.py     # ReceiverManager (bounded queue, backpressure, shutdown)
    syslog.py      # UDP/TCP syslog (RFC 5424/3164)
    otlp_grpc.py   # OTLP gRPC receiver (protobuf LogRecord)
    otlp_http.py   # OTLP HTTP receiver (/v1/logs, protobuf + JSON)
    file_tail.py   # File tailing (glob, rotation, checkpoint)
    webhook.py     # Webhooks (JSON/form, field mapping, auth)
  parsing/
    drain.py       # Drain3 wrapper for template extraction
    entities.py    # Regex entity extraction (6 types)
    normalizer.py  # EventNormalizer: RawEvent → SeerflowEvent
  detection/
    protocols.py   # Detector Protocol (score, learn, serialize, deserialize)
    hst.py         # Half-Space Trees detector (River)
    threshold.py   # biDSPOT auto-threshold (scipy GPD)
    ensemble.py    # DetectionEnsemble orchestrator
tests/
  unit/            # 670+ unit tests
  integration/     # Integration tests (multi-source, real SQLite)
  benchmarks/      # Throughput benchmarks (pytest-benchmark)
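`detection/protocols.py` is described as a Detector Protocol with `score`, `learn`, `serialize`, and `deserialize`. The sketch below shows how a structural interface like that could look, together with a toy detector that satisfies it. Only the four method names come from this README: the signatures, the float input type, and the `RunningZScore` class are assumptions for illustration, and the real detectors likely operate on richer event features.

```python
import json
import math
from typing import Protocol, runtime_checkable


@runtime_checkable
class Detector(Protocol):
    """Assumed shape of the interface; only the method names come from the README."""

    def score(self, x: float) -> float: ...
    def learn(self, x: float) -> None: ...
    def serialize(self) -> bytes: ...
    @classmethod
    def deserialize(cls, data: bytes) -> "Detector": ...


class RunningZScore:
    """Toy online detector: squashes |z-score| into [0, 1) using Welford's running stats."""

    def __init__(self, count: int = 0, mean: float = 0.0, m2: float = 0.0) -> None:
        self.count, self.mean, self.m2 = count, mean, m2

    def score(self, x: float) -> float:
        if self.count < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.count - 1))
        if std == 0.0:
            return 0.0 if x == self.mean else 1.0
        z = abs(x - self.mean) / std
        return z / (1.0 + z)

    def learn(self, x: float) -> None:
        # Welford's update: constant time and memory per observation.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def serialize(self) -> bytes:
        return json.dumps({"count": self.count, "mean": self.mean, "m2": self.m2}).encode()

    @classmethod
    def deserialize(cls, data: bytes) -> "RunningZScore":
        return cls(**json.loads(data))


detector = RunningZScore()
for value in [10, 11, 9, 10, 12, 10, 11]:
    detector.learn(value)
restored = RunningZScore.deserialize(detector.serialize())
print(isinstance(detector, Detector), restored.score(50) == detector.score(50))
```

Because `Protocol` is structural, `RunningZScore` never inherits from `Detector`; any class with the four methods can be plugged into an orchestrator that depends only on the interface.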
Benchmarks
uv run pytest tests/benchmarks/ --benchmark-autosave
uv run pytest tests/benchmarks/ --benchmark-compare
| Component | Throughput |
|---|---|
| Syslog parse | ~561K msgs/sec |
| Drain3 templates | ~120K msgs/sec |
| Entity extraction | ~41K msgs/sec |
| Full normalizer | ~39.5K msgs/sec |
License