Streaming log intelligence agent — detects operational failures and security threats with online ML

Seerflow

A streaming, entity-centric log intelligence agent that detects operational failures and security threats across log sources. It combines traditional ML (fast, cheap) for bulk detection with Sigma rules for known threat patterns. The Sigma community maintains 3,000+ detections; 63 curated rules are bundled.

Status

Alpha — Full ingestion + detection + Sigma rules pipeline operational.

Quick Start

# Install from source
git clone https://github.com/seerflow/seerflow.git
cd seerflow
uv sync

# Copy and edit the example config
cp seerflow.example.yaml seerflow.yaml

# Start the pipeline
uv run python -m seerflow start

Command Line

# Start with default config (seerflow.yaml in current directory)
uv run python -m seerflow start

# Start with a specific config file
uv run python -m seerflow --config /path/to/seerflow.yaml start

# Show version
uv run python -m seerflow --version

Docker

# Build and run with SQLite defaults (zero config)
docker compose up -d

# Run with PostgreSQL (set password first)
export POSTGRES_PASSWORD=your-secure-password
docker compose --profile postgres up -d

# Or run standalone from a registry image
docker run -p 8080:8080 -p 4317:4317 -p 514:514/udp seerflow/seerflow

# Mount a custom config
docker run -v "$(pwd)/seerflow.yaml:/app/seerflow.yaml:ro" seerflow/seerflow

What It Does

  1. Ingests logs from multiple sources simultaneously (syslog, OTLP gRPC/HTTP, file tailing, webhooks)
  2. Parses each log line with Drain3 (template extraction) and regex entity extraction (IPs, users, hosts, files, domains, processes)
  3. Resolves entities to deterministic UUID5 IDs for cross-source correlation
  4. Scores events with an ML ensemble: Half-Space Trees (content), Holt-Winters (volume), CUSUM (change), Markov chains (sequence) -- blended with z-normalization
  5. Thresholds scores with biDSPOT (EVT-based auto-threshold -- no manual tuning)
  6. Evaluates 63 bundled Sigma rules (Linux, web, DNS, process, network) with MITRE ATT&CK tagging
  7. Graphs entity relationships with igraph -- PageRank, Louvain, fan-out, betweenness centrality
  8. Accumulates per-entity risk with exponential decay -- catches slow-burn multi-step attacks
  9. Alerts on anomalies, Sigma matches, and risk threshold exceedances
  10. Persists all events, alerts, graph edges, and ML model state to SQLite (the default storage backend)
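
Step 3 works because UUID5 is a pure function of a namespace and a name, so the same entity always maps to the same ID no matter which source reported it. The sketch below illustrates the idea with Python's standard `uuid` module. The namespace, the `entity_id` helper, and the `type:value` naming scheme are assumptions made for illustration and may differ from what Seerflow does internally.

```python
import uuid

# Hypothetical namespace, chosen only for this example.
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "seerflow.example")


def entity_id(entity_type: str, value: str) -> uuid.UUID:
    """Derive a stable ID from an entity's type and its normalized value."""
    normalized = value.strip().lower()
    return uuid.uuid5(ENTITY_NAMESPACE, f"{entity_type}:{normalized}")


# The same IP seen in a syslog line and in an OTLP record yields one ID.
from_syslog = entity_id("ip", "203.0.113.1")
from_otlp = entity_id("ip", " 203.0.113.1 ")
print(from_syslog == from_otlp)
```

Including the entity type in the hashed name keeps a host called `db` and a user called `db` from colliding.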

Example: Detect Anomalies in Syslog

# seerflow.yaml
receivers:
  syslog_enabled: true
  syslog_udp_port: 5514       # use high port to avoid root
  otlp_grpc_enabled: false
  otlp_http_enabled: false
  webhook_enabled: false

detection:
  hst_window_size: 100         # lower for faster calibration
  dspot:
    calibration_window: 200
    risk_level: 0.01           # more sensitive for testing

# Terminal 1: Start Seerflow
uv run python -m seerflow start

# Terminal 2: Send normal traffic
for i in $(seq 1 300); do
    echo "<134>1 2026-03-24T19:00:00Z web nginx $i - - GET /api/v$((i%5)) 200 ${i}ms" \
        | nc -u -w1 127.0.0.1 5514
done

# Terminal 2: Send an anomaly
echo '<11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100' \
    | nc -u -w1 127.0.0.1 5514

Output:

INFO Seerflow 0.3.0 starting
INFO Receivers: syslog
INFO Pipeline running — Ctrl+C to stop
WARNING ANOMALY [syslog] score=0.952 threshold=0.009 dir=upper
WARNING   template: [7] <*> <*> postgres <*> - - FATAL connection limit exceeded <*>
WARNING   message:  <11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100
WARNING   entities: 203.0.113.1
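
Where `nc` is not available, the same test traffic can be generated from Python. This is a minimal sketch using only the standard library. The helper names `rfc5424` and `send` are illustrative and not part of Seerflow, and the script assumes the syslog receiver is listening on UDP port 5514 as in the config above. Unlike the shell example, it stamps each message with the current time.

```python
import socket
from datetime import datetime, timezone


def rfc5424(pri: int, host: str, app: str, procid: str, msg: str) -> bytes:
    """Build one RFC 5424 line with MSGID and STRUCTURED-DATA left as '-'."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    return f"<{pri}>1 {ts} {host} {app} {procid} - - {msg}".encode()


def send(messages: list[bytes], addr: tuple[str, int] = ("127.0.0.1", 5514)) -> None:
    """Send each message as its own UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for message in messages:
            sock.sendto(message, addr)


if __name__ == "__main__":
    # 300 routine requests, then one out-of-pattern database failure.
    normal = [
        rfc5424(134, "web", "nginx", str(i), f"GET /api/v{i % 5} 200 {i}ms")
        for i in range(1, 301)
    ]
    send(normal)
    send([rfc5424(11, "db", "postgres", "999",
                  "FATAL connection limit exceeded 847/100")])
```

UDP gives no delivery guarantee, so a fast sender like this one may lose a few datagrams if the receiver falls behind.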

Shutdown Summary

Press Ctrl+C to see session stats:

INFO --- Session Summary ---
INFO   Events processed: 312
INFO   Anomalies detected: 10
INFO   Unique templates: 7
INFO   Duration: 45.3s
INFO   Throughput: 7 events/sec
INFO Seerflow stopped

Configuration

See SETTINGS.md for the complete configuration reference.

All settings are optional -- Seerflow runs with sensible defaults (zero-config).

Key config sections:

  • receivers -- syslog, OTLP gRPC/HTTP, file tailing, webhooks (enable/disable + ports)
  • detection -- HST window size, DSPOT calibration, scoring weights, custom Sigma rule directories
  • storage -- SQLite (default) or PostgreSQL
  • alerting -- dedup window, webhook/PagerDuty targets

Receivers

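| Receiver       | Port       | Protocol          | Status |
|----------------|------------|-------------------|--------|
| Syslog UDP/TCP | 514 (5514) | RFC 5424/3164     | Done   |
| OTLP gRPC      | 4317       | Protobuf          | Done   |
| OTLP HTTP      | 4318       | Protobuf + JSON   | Done   |
| File tailing   | --         | Glob + watchfiles | Done   |
| Webhooks       | 8081       | JSON/form + auth  | Done   |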

Detection Pipeline

Log Sources → Receivers → Drain3 → UUID5 Entities → ML Ensemble → Sigma Rules
                                        ↓                ↓              ↓
                                  Entity Graph      blended score   ATT&CK tags
                                  Window Buffer     [0.0 - 1.0]    tactic/technique
                                  Risk Register         ↓              ↓
                                        ↓          Risk Accumulation → Alert
                                  PageRank, Louvain
                                  Fan-out, Betweenness

  • Drain3: Streaming log template extraction (120K msgs/sec)
  • UUID5 Entity Resolution: Deterministic cross-source entity IDs (same entity = same UUID)
  • Half-Space Trees: Content anomaly detection via River (constant time/memory)
  • Holt-Winters: Volume anomaly detection (trend + seasonal decomposition)
  • CUSUM: Change-point detection (bidirectional cumulative sum)
  • Markov Chains: Sequence anomaly detection (per-entity transition matrices)
  • biDSPOT: Bidirectional EVT auto-threshold (upper spikes + lower drops)
  • DetectionEnsemble: Orchestrates all detectors + blended scoring per source
  • Sigma Engine: 63 bundled SigmaHQ rules with logsource-indexed dispatch
  • Entity Graph: igraph-backed relationship graph with typed edges + 6 algorithms
  • Risk Accumulation: Per-entity risk register with exponential decay + configurable threshold
  • Sliding Window: Per-entity event buffer with watermark-based late arrival tolerance
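
To make the risk accumulation idea concrete, here is a small sketch of a per-entity register with exponential decay. Each new score is added to the entity's previous risk after that risk has been decayed by the elapsed time, so several medium-severity events in a short span can cross a threshold that no single event reaches. The class name, the half-life parameterization, and the default values are assumptions for illustration, not Seerflow's actual implementation or settings.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RiskRegister:
    half_life_s: float = 3600.0  # hypothetical: risk halves every hour
    threshold: float = 1.0       # hypothetical alert threshold
    # entity_id -> (accumulated risk, timestamp of last update)
    _state: dict[str, tuple[float, float]] = field(default_factory=dict, repr=False)

    def add(self, entity_id: str, score: float, now: float) -> float:
        """Decay the stored risk to `now`, add `score`, and return the total."""
        risk, last = self._state.get(entity_id, (0.0, now))
        elapsed = max(now - last, 0.0)
        decay = math.exp(-math.log(2) * elapsed / self.half_life_s)
        risk = risk * decay + score
        self._state[entity_id] = (risk, now)
        return risk

    def exceeded(self, entity_id: str) -> bool:
        """True if the entity's last recorded risk is above the threshold."""
        risk, _ = self._state.get(entity_id, (0.0, 0.0))
        return risk > self.threshold


register = RiskRegister(half_life_s=10.0)
register.add("host-a", 0.6, now=0.0)   # 0.6, below the threshold
register.add("host-a", 0.6, now=10.0)  # about 0.9 after one half-life
register.add("host-a", 0.6, now=20.0)  # about 1.05, now above 1.0
print(register.exceeded("host-a"))
```

A shorter half-life forgets old events faster, which reduces false positives from unrelated incidents but can miss attacks spread over a long period.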

Development

Requires Python 3.13+ and uv.

# Install dependencies
uv sync

# Run tests
uv run pytest

# Run quality gates
uv run ruff check . \
  && uv run ruff format --check . \
  && uv run mypy src/ \
  && uv run bandit -r src/ -c pyproject.toml \
  && uv run pytest --cov=src/seerflow --cov-fail-under=95

Project Structure

src/seerflow/
    __main__.py      # CLI entry point (config → pipeline → detection → storage)
    cli.py           # argparse (--config, --version)
    config.py        # YAML config loader with ${ENV_VAR} interpolation
    models/          # SeerflowEvent, Alert, entity structs (msgspec)
    storage/
        protocols.py # Protocol interfaces (LogStore, AlertStore, ModelStore, EntityStore)
        sqlite.py    # SQLite backend (WAL, FTS5, WriteBuffer)
        migrations.py # Schema versioning + forward-only migration runner
    receivers/
        base.py      # RawEvent dataclass, Receiver protocol
        manager.py   # ReceiverManager (bounded queue, backpressure, shutdown)
        syslog.py    # UDP/TCP syslog (RFC 5424/3164)
        otlp_grpc.py # OTLP gRPC receiver (protobuf LogRecord)
        otlp_http.py # OTLP HTTP receiver (/v1/logs, protobuf + JSON)
        file_tail.py # File tailing (glob, rotation, checkpoint)
        webhook.py   # Webhooks (JSON/form, field mapping, auth)
    parsing/
        drain.py     # Drain3 wrapper for template extraction
        entities.py  # Regex entity extraction (6 types, params-aware tagging)
        normalizer.py # EventNormalizer: RawEvent → SeerflowEvent
    detection/
        protocols.py # Detector Protocol (score, learn, serialize, deserialize)
        hst.py       # Half-Space Trees detector (River)
        threshold.py # biDSPOT auto-threshold (scipy GPD)
        ensemble.py  # DetectionEnsemble orchestrator (4 detectors + blended scoring)
    sigma/
        engine.py    # SigmaEngine: rule loading, logsource dispatch, evaluation
        matcher.py   # Custom detection matcher (condition tree walker, regex cache)
        pipeline.py  # pySigma processing pipeline (22 field mappings)
        attack.py    # MITRE ATT&CK tactic/technique extraction
        bundled.py   # Bundled rule path discovery (importlib.resources)
        loader.py    # Custom rule directory discovery + validation
        rules/       # 63 curated SigmaHQ YAML rules (linux, web, dns, process, network)
    graph/
        entity_graph.py # igraph wrapper: vertices, edges, queries, algorithms
        edges.py     # Typed edge inference from entity pairs
        algorithms.py # PageRank, Louvain, fan-out, fan-in, betweenness, ego-graph
    correlation/
        window.py    # Per-entity sliding window buffer (deque, LRU eviction)
        watermark.py # Watermark-based late arrival tolerance
        risk.py      # Risk accumulation with exponential decay
    pipeline/
        handler.py   # Event handler: parse → detect → graph → correlate → store
        run.py       # Pipeline runner (config → receivers → handler → storage)
tests/
    unit/            # 1200+ unit tests
    integration/     # Integration tests (pipeline, graph, correlation, real SQLite)
    benchmarks/      # Throughput benchmarks (pytest-benchmark, CI history tracking)
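
The tree above describes `detection/protocols.py` as a Detector Protocol with four operations: score, learn, serialize, and deserialize. As a rough illustration of that shape, the sketch below defines a structural interface with `typing.Protocol` and a toy detector that satisfies it. Only the four method names come from the tree; the argument types, return types, and the `MeanDeviationDetector` class are hypothetical and are not taken from Seerflow's source.

```python
import json
from typing import Protocol, runtime_checkable


@runtime_checkable
class Detector(Protocol):
    """Hypothetical signatures; only the method names come from the README."""

    def score(self, features: dict[str, float]) -> float: ...
    def learn(self, features: dict[str, float]) -> None: ...
    def serialize(self) -> bytes: ...
    def deserialize(self, blob: bytes) -> None: ...


class MeanDeviationDetector:
    """Toy detector: scores a value by its distance from the running mean."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def score(self, features: dict[str, float]) -> float:
        if self.n == 0:
            return 0.0  # nothing learned yet, so nothing looks anomalous
        return abs(features["value"] - self.mean)

    def learn(self, features: dict[str, float]) -> None:
        # Incremental mean update: constant memory, one pass over the stream.
        self.n += 1
        self.mean += (features["value"] - self.mean) / self.n

    def serialize(self) -> bytes:
        return json.dumps({"n": self.n, "mean": self.mean}).encode()

    def deserialize(self, blob: bytes) -> None:
        state = json.loads(blob)
        self.n, self.mean = state["n"], state["mean"]


detector = MeanDeviationDetector()
for value in (1.0, 3.0):
    detector.learn({"value": value})
print(isinstance(detector, Detector), detector.score({"value": 5.0}))
```

Scoring before learning, as an online pipeline typically does, keeps an event from influencing its own anomaly score. Serializing the state is what allows a model to survive a restart.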

Benchmarks

uv run pytest tests/benchmarks/ --benchmark-autosave
uv run pytest tests/benchmarks/ --benchmark-compare
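
| Component                                    | Throughput        |
|----------------------------------------------|-------------------|
| Syslog parse                                 | ~561K msgs/sec    |
| Drain3 templates                             | ~120K msgs/sec    |
| Entity extraction                            | ~41K msgs/sec     |
| Full normalizer                              | ~39.5K msgs/sec   |
| Full pipeline (parse + ML + Sigma + storage) | ~1,800 events/sec |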

License

AGPL-3.0
