Streaming log intelligence agent — detects operational failures and security threats with online ML

Project description

Seerflow

A streaming, entity-centric log intelligence agent that detects operational failures and security threats across log sources. It combines traditional ML (fast, cheap) for bulk detection with Sigma rules for known threat patterns: 63 curated rules are bundled, selected from the 3,000+ community detections published by SigmaHQ.

Status

Alpha — Full ingestion + detection + Sigma rules pipeline operational.

Installation

From PyPI (recommended — fastest path)

pip install seerflow
# or, with uv:
uv pip install seerflow

The wheel bundles the pre-built React dashboard and the 63 curated Sigma rules, and pip installs every runtime dependency alongside it. No build step or Node toolchain is required.

From source

git clone https://github.com/seerflow/seerflow.git
cd seerflow
uv sync

Use the source install for development or to run the latest unreleased changes. See CONTRIBUTING.md for the full dev setup.

Quick Start

Zero to first alert in under 5 minutes

These steps take well under 5 minutes on a fresh machine, with no Docker, database, config file, or tuning required (Seerflow runs zero-config with sensible defaults; NFR-006):

Install (~30s): pip install seerflow
Start the pipeline (~10s): seerflow start — boots receivers, detection engines, and the dashboard with built-in defaults (SQLite, syslog + OTLP + webhooks). No config file needed for the first run.
Send a log line that trips an anomaly or Sigma rule (~1m, see the syslog example below) and watch the WARNING ANOMALY ... / WARNING SIGMA ... line print to the console and the alert appear in the dashboard at http://127.0.0.1:8080/.

Total: well inside 5 minutes — install dominates, detection is instant once events flow. Drop in a seerflow.yaml (copy seerflow.example.yaml) only when you want to change ports, backends, or detector tuning.

# Alternative: install from source instead of PyPI
git clone https://github.com/seerflow/seerflow.git
cd seerflow
uv sync

# Optional: copy and edit the example config (defaults apply without it)
cp seerflow.example.yaml seerflow.yaml

# Start the pipeline (also serves the React dashboard)
uv run python -m seerflow start
# → React dashboard:  http://127.0.0.1:8080/
# → REST API:         http://127.0.0.1:8080/api/v1/
# → WebSocket stream: ws://127.0.0.1:8080/api/v1/ws

A single seerflow start boots the receivers, detection engines, and the FastAPI dashboard on dashboard_port (default 8080). No second uvicorn process is required — the wheel ships the built React assets and the CLI mounts them via the same FastAPI app that exposes /api/v1/*.

Command Line

# Start with default config (seerflow.yaml in current directory)
uv run python -m seerflow start

# Start with a specific config file
uv run python -m seerflow --config /path/to/seerflow.yaml start

# Show version
uv run python -m seerflow --version

Inspect loaded detection rules

# List everything
uv run python -m seerflow rules list

# Only rules tagged with a MITRE technique (prefix match includes sub-techniques)
uv run python -m seerflow rules list --technique T1053

# Filter by tactic (name or ATT&CK ID)
uv run python -m seerflow rules list --tactic persistence
uv run python -m seerflow rules list --tactic TA0003

# JSON for scripting
uv run python -m seerflow rules list --format json
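
The prefix match on `--technique` can be pictured with a small Python sketch. This is only an illustration of the idea that a parent technique ID also selects its sub-techniques; the function name, the rule names, and the exact comparison are assumptions, not Seerflow's actual implementation.

```python
def technique_matches(rule_technique: str, query: str) -> bool:
    """Return True when the query names the technique itself or its parent.

    Hypothetical helper: ATT&CK sub-techniques are written PARENT.NNN,
    so a parent query matches on an exact ID or on the "PARENT." prefix.
    """
    rule_id = rule_technique.upper()
    wanted = query.upper()
    return rule_id == wanted or rule_id.startswith(wanted + ".")


# Hypothetical rule names mapped to the technique each rule is tagged with.
rules = {
    "cron_persistence": "T1053.003",
    "scheduled_task": "T1053",
    "ssh_brute_force": "T1110.001",
}

selected = sorted(
    name for name, technique in rules.items()
    if technique_matches(technique, "T1053")
)
print(selected)  # ['cron_persistence', 'scheduled_task']
```

Matching on the `"T1053."` prefix rather than a bare `startswith("T1053")` keeps the filter from picking up an unrelated ID that merely shares leading characters.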

Docker

# Build and run with SQLite defaults (zero config)
docker compose up -d

# Run with PostgreSQL (set password first)
export POSTGRES_PASSWORD=your-secure-password
docker compose --profile postgres up -d

# Or run standalone from a registry image
docker run -p 8080:8080 -p 4317:4317 -p 514:514/udp seerflow/seerflow

# Mount a custom config (an absolute host path works on every Docker release)
docker run -v "$(pwd)/seerflow.yaml:/app/seerflow.yaml:ro" seerflow/seerflow

What It Does

Ingests logs from multiple sources simultaneously (syslog, OTLP gRPC/HTTP, file tailing, webhooks)
Parses each log line with Drain3 (template extraction) and regex entity extraction (IPs, users, hosts, files, domains, processes)
Resolves entities to deterministic UUID5 IDs for cross-source correlation
Scores events with an ML ensemble: Half-Space Trees (content), Holt-Winters (volume), CUSUM (change), Markov chains (sequence) -- blended with z-normalization
Thresholds scores with biDSPOT (EVT-based auto-threshold -- no manual tuning)
Evaluates 63 bundled Sigma rules (Linux, web, DNS, process, network) with MITRE ATT&CK tagging
Graphs entity relationships with igraph -- PageRank, Louvain, fan-out, betweenness centrality
Accumulates per-entity risk with exponential decay -- catches slow-burn multi-step attacks
Alerts on anomalies, Sigma matches, and risk threshold exceedances
Persists all events, alerts, graph edges, and ML model state to SQLite
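
The parse-then-resolve steps above can be sketched in a few lines of Python. This is a minimal illustration of why deterministic UUID5 IDs allow cross-source correlation, not Seerflow's code: the namespace value, the `type:value` key format, and the simplified IPv4 regex are all assumptions.

```python
import re
import uuid

# Hypothetical namespace; the README does not state the one Seerflow uses.
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "entities.example.invalid")

# Simplified IPv4 pattern: it does not reject octets above 255.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(message: str) -> list[str]:
    """Return every IPv4-looking token found in a log message."""
    return IPV4_PATTERN.findall(message)


def entity_id(entity_type: str, value: str) -> uuid.UUID:
    """Derive a stable ID from the entity type and its normalised value."""
    key = f"{entity_type}:{value.strip().lower()}"
    return uuid.uuid5(ENTITY_NAMESPACE, key)


# The same IP seen by two different receivers.
syslog_line = "Failed password for root from 203.0.113.1 port 22"
webhook_body = '{"src_ip": "203.0.113.1", "action": "login_failed"}'

ids = {
    entity_id("ip", ip)
    for line in (syslog_line, webhook_body)
    for ip in extract_ips(line)
}
print(len(ids))  # 1: both sources resolve to the same entity ID
```

Because UUID5 is a hash of the namespace and the key, no lookup table or shared state is needed for two receivers to agree on an entity's ID.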

Example: Detect Anomalies in Syslog

# seerflow.yaml
receivers:
  syslog_enabled: true
  syslog_udp_port: 5514       # use high port to avoid root
  otlp_grpc_enabled: false
  otlp_http_enabled: false
  webhook_enabled: false

detection:
  hst_window_size: 100         # lower for faster calibration
  dspot:
    calibration_window: 200
    risk_level: 0.01           # more sensitive for testing

# Terminal 1: Start Seerflow
uv run python -m seerflow start

# Terminal 2: Send normal traffic
for i in $(seq 1 300); do
    echo "<134>1 2026-03-24T19:00:00Z web nginx $i - - GET /api/v$((i%5)) 200 ${i}ms" \
        | nc -u -w1 127.0.0.1 5514
done

# Terminal 2: Send an anomaly
echo '<11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100' \
    | nc -u -w1 127.0.0.1 5514

Output:

INFO Seerflow 0.3.0 starting
INFO Receivers: syslog
INFO Pipeline running — Ctrl+C to stop
WARNING ANOMALY [syslog] score=0.952 threshold=0.009 dir=upper
WARNING   template: [7] <*> <*> postgres <*> - - FATAL connection limit exceeded <*>
WARNING   message:  <11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100
WARNING   entities: 203.0.113.1
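
Where `nc` is not available, the same traffic can be generated from Python's standard library. The sketch below mirrors the shell loop above; the helper names are made up for illustration, and the port assumes the `syslog_udp_port: 5514` setting from the example config.

```python
import socket


def rfc5424_line(pri: int, timestamp: str, host: str, app: str,
                 procid: str, msg: str) -> str:
    """Format <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG."""
    return f"<{pri}>1 {timestamp} {host} {app} {procid} - - {msg}"


def send_udp(lines, host: str = "127.0.0.1", port: int = 5514) -> int:
    """Send each line as one UDP datagram and return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8"), (host, port))
            sent += 1
    return sent


def normal_traffic(count: int = 300):
    """Yield the same nginx-style lines as the shell loop."""
    for i in range(1, count + 1):
        yield rfc5424_line(134, "2026-03-24T19:00:00Z", "web", "nginx",
                           str(i), f"GET /api/v{i % 5} 200 {i}ms")


ANOMALY = rfc5424_line(11, "2026-03-24T19:01:00Z", "db", "postgres", "999",
                       "FATAL connection limit exceeded 847/100")
```

With Seerflow running, `send_udp(normal_traffic())` followed by `send_udp([ANOMALY])` replays the example. Unlike the `nc -w1` loop, this sends without pausing between datagrams, so a burst may be dropped if the receiver's socket buffer fills.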

Shutdown Summary

Press Ctrl+C to see session stats:

INFO --- Session Summary ---
INFO   Events processed: 312
INFO   Anomalies detected: 10
INFO   Unique templates: 7
INFO   Duration: 45.3s
INFO   Throughput: 7 events/sec
INFO Seerflow stopped

Configuration

See SETTINGS.md for the complete configuration reference.

All settings are optional -- Seerflow runs with sensible defaults (zero-config).

Key config sections:

receivers -- syslog, OTLP gRPC/HTTP, file tailing, webhooks (enable/disable + ports)
detection -- HST window size, DSPOT calibration, scoring weights, custom Sigma rule directories
storage -- SQLite (default) or PostgreSQL
alerting -- dedup window, webhook/PagerDuty targets

Receivers

Receiver	Port	Protocol	Status
Syslog UDP/TCP	514 (5514)	RFC 5424/3164	Done
OTLP gRPC	4317	Protobuf	Done
OTLP HTTP	4318	Protobuf + JSON	Done
File tailing	--	Glob + watchfiles	Done
Webhooks	8081	JSON/form + auth	Done

Validation

Seerflow is validated by running the full detection stack (Drain3 -> ML ensemble -> Sigma -> UEBA -> IoC -> correlation -- the exact seerflow start wiring via assemble_handler, S-305/FR-073) against a synthetic LANL subset (~200 events modelled on the LANL Unified Host and Network Dataset, committed at tests/fixtures/lanl/). This exercises the real product, not a correlation-only shortcut. The numbers below are honestly scoped to this small synthetic subset -- online/cold-start detectors (ML/UEBA) warm up but rarely fire on so few events, which the per-family breakdown in the generated report makes explicit.

Metric	Value
Precision	16.67%
Recall	33.33%
F1 score	22.22%
False-positive rate	83.33%
AUC	0.0000
Events processed	137

Attack-level metrics (FR-079 / S-311) -- per red-team scenario mean-time-to-detect, precision-recall + ROC curves, AUC over a risk-score threshold sweep, and the silent detector family for every missed red-team event -- are emitted by the full report (python -m seerflow.lanl.report) and by seerflow validate <dir> --json. On this small synthetic subset only the C2-beaconing scenario is detected; the brute-force and credential-stuffing scenarios are missed by the (cold-start) stack and attributed to the correlation family in the JSON output.

Numbers are derived from the full-stack harness, not hand-maintained -- a drift test (tests/integration/test_lanl_report_drift.py) fails if this table diverges from run_validation(). Scope: synthetic LANL subset, full detection stack (not "end-to-end on the full LANL dataset").
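
The percentages in the table can be cross-checked by hand. The sketch below assumes the smallest confusion counts consistent with the text: one detected scenario (true positive), two missed scenarios (false negatives), and five false alerts. These counts are inferred, not taken from the harness output.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Compute alert-level metrics from confusion counts (all assumed non-zero sums)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Share of raised alerts that were false, i.e. 1 - precision.
    false_alert_share = fp / (tp + fp)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_alert_share": false_alert_share,
    }


# Assumed counts: 1 of 3 red-team scenarios detected, 5 false alerts.
metrics = detection_metrics(tp=1, fp=5, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# precision: 16.67%
# recall: 33.33%
# f1: 22.22%
# false_alert_share: 83.33%
```

Under these assumed counts the 83.33% figure equals 1 - precision, the share of alerts that were false, rather than FP / (FP + TN) over all benign events. The README does not state which definition the harness uses.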

Reproduce locally:

uv run pytest tests/integration/test_lanl_validation.py -v
uv run python -m seerflow.lanl.report

To run against the full LANL 2015 dataset: download it through LANL's self-service token gate with tools/download_lanl.sh --email you@example.com (prompts if you omit the email), then seerflow validate data/lanl — use the streaming API for the full ~1.6B-event set. Step-by-step walkthrough, dataset schema, and additional tests: documents/testing-seerflow-against-lanl.md.

Architecture

flowchart LR
    SRC["Log Sources<br/>(syslog · OTLP · files · webhooks)"] --> RCV[Receivers]
    RCV --> DRAIN["Drain3<br/>template extraction"]
    DRAIN --> ENT["UUID5 Entity<br/>Resolution"]
    ENT --> ML["ML Ensemble<br/>HST · Holt-Winters · CUSUM · Markov"]
    ENT --> SIGMA["Sigma Engine<br/>63 rules"]
    ENT --> GRAPH["Entity Graph<br/>igraph · PageRank · Louvain"]

    ML -->|blended score 0.0–1.0| RISK["Risk Accumulation<br/>per-entity decay register"]
    SIGMA -->|MITRE tactic/technique| RISK
    GRAPH -->|centrality / fan-out| RISK
    ML --> ALERT([Alert])
    SIGMA --> ALERT
    RISK -->|threshold exceeded| ALERT
    ALERT --> STORE[("SQLite / PostgreSQL")]

Drain3: Streaming log template extraction (120K msgs/sec)
UUID5 Entity Resolution: Deterministic cross-source entity IDs (same entity = same UUID)
Half-Space Trees: Content anomaly detection via River (constant time/memory)
Holt-Winters: Volume anomaly detection (trend + seasonal decomposition)
CUSUM: Change-point detection (bidirectional cumulative sum)
Markov Chains: Sequence anomaly detection (per-entity transition matrices)
biDSPOT: Bidirectional EVT auto-threshold (upper spikes + lower drops)
DetectionEnsemble: Orchestrates all detectors + blended scoring per source
Sigma Engine: 63 bundled SigmaHQ rules with logsource-indexed dispatch
Entity Graph: igraph-backed relationship graph with typed edges + 6 algorithms
Risk Accumulation: Per-entity risk register with exponential decay + configurable threshold
Sliding Window: Per-entity event buffer with watermark-based late arrival tolerance
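
As a rough illustration of the risk accumulation component, the following sketch keeps one decaying score per entity. It shows one common way to implement exponential decay and is not Seerflow's actual register: the class name, the one-hour half-life, and the threshold of 1.0 are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class RiskRegister:
    """Hypothetical per-entity risk score that halves every half_life_s seconds."""

    half_life_s: float = 3600.0  # assumed value
    threshold: float = 1.0       # assumed value
    score: float = 0.0
    last_update: float = 0.0

    def add(self, timestamp: float, contribution: float) -> bool:
        """Decay the stored score to `timestamp`, add the new risk, test the threshold."""
        elapsed = max(0.0, timestamp - self.last_update)
        decay = math.exp(-math.log(2) * elapsed / self.half_life_s)
        self.score = self.score * decay + contribution
        self.last_update = max(self.last_update, timestamp)
        return self.score >= self.threshold


# Three low-risk events ten minutes apart add up and cross the threshold.
clustered = RiskRegister()
print([clustered.add(t, 0.4) for t in (0, 600, 1200)])   # [False, False, True]

# The same events two hours apart decay away between arrivals.
sparse = RiskRegister()
print([sparse.add(t, 0.4) for t in (0, 7200, 14400)])    # [False, False, False]
```

No single event here would alert on its own; the register only fires when related events arrive faster than the score decays, which is the behaviour a multi-step attack tends to produce.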

Development

Requires Python 3.11+ and uv.

# Install dependencies
uv sync

# Run tests
uv run pytest

# Run quality gates
uv run ruff check . && uv run ruff format --check . && uv run mypy src/ && uv run bandit -r src/ -c pyproject.toml && uv run pytest --cov=src/seerflow --cov-fail-under=95

Project Structure

src/seerflow/
    __main__.py      # CLI entry point (config → pipeline → detection → storage)
    cli.py           # argparse (--config, --version)
    config.py        # YAML config loader with ${ENV_VAR} interpolation
    models/          # SeerflowEvent, Alert, entity structs (msgspec)
    storage/
        protocols.py # Protocol interfaces (LogStore, AlertStore, ModelStore, EntityStore)
        sqlite.py    # SQLite backend (WAL, FTS5, WriteBuffer)
        migrations.py # Schema versioning + forward-only migration runner
    receivers/
        base.py      # RawEvent dataclass, Receiver protocol
        manager.py   # ReceiverManager (bounded queue, backpressure, shutdown)
        syslog.py    # UDP/TCP syslog (RFC 5424/3164)
        otlp_grpc.py # OTLP gRPC receiver (protobuf LogRecord)
        otlp_http.py # OTLP HTTP receiver (/v1/logs, protobuf + JSON)
        file_tail.py # File tailing (glob, rotation, checkpoint)
        webhook.py   # Webhooks (JSON/form, field mapping, auth)
    parsing/
        drain.py     # Drain3 wrapper for template extraction
        entities.py  # Regex entity extraction (6 types, params-aware tagging)
        normalizer.py # EventNormalizer: RawEvent → SeerflowEvent
    detection/
        protocols.py # Detector Protocol (score, learn, serialize, deserialize)
        hst.py       # Half-Space Trees detector (River)
        threshold.py # biDSPOT auto-threshold (scipy GPD)
        ensemble.py  # DetectionEnsemble orchestrator (4 detectors + blended scoring)
    sigma/
        engine.py    # SigmaEngine: rule loading, logsource dispatch, evaluation
        matcher.py   # Custom detection matcher (condition tree walker, regex cache)
        pipeline.py  # pySigma processing pipeline (22 field mappings)
        attack.py    # MITRE ATT&CK tactic/technique extraction
        bundled.py   # Bundled rule path discovery (importlib.resources)
        loader.py    # Custom rule directory discovery + validation
        rules/       # 63 curated SigmaHQ YAML rules (linux, web, dns, process, network)
    graph/
        entity_graph.py # igraph wrapper: vertices, edges, queries, algorithms
        edges.py     # Typed edge inference from entity pairs
        algorithms.py # PageRank, Louvain, fan-out, fan-in, betweenness, ego-graph
    correlation/
        window.py    # Per-entity sliding window buffer (deque, LRU eviction)
        watermark.py # Watermark-based late arrival tolerance
        risk.py      # Risk accumulation with exponential decay
    pipeline/
        handler.py   # Event handler: parse → detect → graph → correlate → store
        run.py       # Pipeline runner (config → receivers → handler → storage)
tests/
    unit/            # 1200+ unit tests
    integration/     # Integration tests (pipeline, graph, correlation, real SQLite)
    benchmarks/      # Throughput benchmarks (pytest-benchmark, CI history tracking)

Benchmarks

Benchmarks are produced by a committed, runnable harness — not hand-typed. Reproduce the full-pipeline benchmark on your own hardware:

python -m seerflow.launch.benchmark --count 20000 --markdown

Component micro-benchmarks (pytest-benchmark, CI history tracking):

uv run pytest tests/benchmarks/ --benchmark-autosave
uv run pytest tests/benchmarks/ --benchmark-compare

Representative measured figures (commodity hardware, synthetic syslog workload — your numbers will differ; reproduce with the command above):

Component	Throughput
Syslog parse	~561K msgs/sec
Drain3 templates	~120K msgs/sec
Entity extraction	~41K msgs/sec
Full normalizer	~39.5K msgs/sec
Full pipeline (parse + ML + Sigma + storage)	measured by `python -m seerflow.launch.benchmark`

Detection quality (precision / recall / F1 / FP-rate) is validated separately — see Validation.

How Seerflow Compares

Seerflow is not a SIEM replacement — it is a lightweight, streaming anomaly + detection layer you can run in minutes. Comparison is category-level (deployment posture and approach), not a feature-for-feature scorecard:

Dimension	Seerflow	Wazuh	OpenSearch	Splunk
Primary model	Streaming entity-centric anomaly + rule detection	Host-agent XDR / SIEM	Search + analytics engine (security analytics plugin)	Log analytics + SIEM platform
Deployment	Single `pip install`, zero-config, SQLite default	Manager + agents + indexer (OpenSearch-based)	Cluster (data/manager nodes) + Dashboards	Indexers + search heads (self-host or cloud)
Detection approach	Online ML ensemble (HST/Holt-Winters/CUSUM/Markov) + Sigma rules (63 bundled, custom rule directories supported)	Signature/rule + FIM + rootcheck	Query + correlation / anomaly-detection plugin (batch ML)	SPL queries + correlation searches + premium ES app
Streaming / online learning	Yes — constant-memory online detectors, no batch retrain	Limited (rule-based)	Batch / scheduled detectors	Batch search; ML via paid add-on
Sigma rule support	Native (pySigma, logsource-indexed dispatch)	Partial / via integrations	Via third-party conversion	Via third-party conversion
Footprint	Megabytes; one process	Multi-component cluster	JVM cluster	Heavy; indexer cluster
Cost posture	Open source (AGPL-3.0), no per-GB pricing	Open source	Open source (Apache-2.0)	Commercial, ingest-volume priced

The descriptions of Wazuh, OpenSearch, and Splunk reflect their general product categories at the time of writing and are deliberately not version-pinned; consult their docs for specifics.

Contributing

Contributions are welcome. See CONTRIBUTING.md for the development setup, quality gates, branching model, and pull-request process.

License

AGPL-3.0

Project details

Maintainers

fflores

Release history

Version	Released
0.7.0 (this version)	Jun 3, 2026
0.6.1	May 21, 2026
0.5.1	May 13, 2026
0.4.0	Mar 31, 2026
0.3.0	Mar 29, 2026
0.2.0	Mar 25, 2026
0.1.0	Mar 20, 2026

Download files

Source Distribution

seerflow-0.7.0.tar.gz (2.0 MB)

Uploaded Jun 3, 2026 Source

Built Distribution

seerflow-0.7.0-py3-none-any.whl (601.6 kB)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file seerflow-0.7.0.tar.gz.

File metadata

Download URL: seerflow-0.7.0.tar.gz
Upload date: Jun 3, 2026
Size: 2.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for seerflow-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`5baaa7e480fd195c50a73682fffe9864a30c3e481c86241db6d7374c7efa577f`
MD5	`b9e66c1fe2875dd8208b748cbee77b6a`
BLAKE2b-256	`db3ffd18628227b49c75d5d5edeca9801d85dc7b6be2072681e93d026dd64584`

Provenance

The following attestation bundles were made for seerflow-0.7.0.tar.gz:

Publisher: release.yml on seerflow/seerflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seerflow-0.7.0.tar.gz
- Subject digest: 5baaa7e480fd195c50a73682fffe9864a30c3e481c86241db6d7374c7efa577f
- Sigstore transparency entry: 1712498575
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: seerflow/seerflow@8452ecbb609e518fa3c3f99f591deabbb0c4ff42
- Branch / Tag: refs/heads/main
- Owner: https://github.com/seerflow
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8452ecbb609e518fa3c3f99f591deabbb0c4ff42
- Trigger Event: push

File details

Details for the file seerflow-0.7.0-py3-none-any.whl.

File metadata

Download URL: seerflow-0.7.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 601.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for seerflow-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b2f25d9eaaa48d1492d582f7a4a460cf0fa16fd205d7a2efc6f6e1afc67b9774`
MD5	`4103f5d4d67663f7841eb778fd59de69`
BLAKE2b-256	`133f64706d124e5e07af89980bb96516d9264b53b227958e81e1450768e96b10`

Provenance

The following attestation bundles were made for seerflow-0.7.0-py3-none-any.whl:

Publisher: release.yml on seerflow/seerflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seerflow-0.7.0-py3-none-any.whl
- Subject digest: b2f25d9eaaa48d1492d582f7a4a460cf0fa16fd205d7a2efc6f6e1afc67b9774
- Sigstore transparency entry: 1712498633
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: seerflow/seerflow@8452ecbb609e518fa3c3f99f591deabbb0c4ff42
- Branch / Tag: refs/heads/main
- Owner: https://github.com/seerflow
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8452ecbb609e518fa3c3f99f591deabbb0c4ff42
- Trigger Event: push
