sift

  ____ ___ _____ _____
 / ___|_ _|  ___|_   _|
 \___ \| || |_    | |
  ___) | ||  _|   | |
 |____/___|_|     |_|

AI-Powered Alert Triage Summarizer for SOC Teams

sift ingests raw security alerts, deduplicates and clusters related events, scores them by priority, and delivers a structured triage summary, optionally with AI-generated analysis. It is the third tool in the barb → vex → sift SOC workflow trilogy.


Features

  • Ingest alerts from generic JSON, Splunk exports, or CSV
  • Deduplicate noisy alert streams before analysis
  • Extract IOCs (IPs, domains, hashes, URLs) from alert fields automatically
  • Cluster related alerts by IOC overlap, category + time window, or IP-pair correlation
  • Score clusters across five priority tiers: NOISE / LOW / MEDIUM / HIGH / CRITICAL
  • AI summarization via Anthropic Claude, OpenAI, or Ollama (local), or a rule-based template that requires no LLM
  • Rich terminal output with priority-colored cluster table
  • Export to JSON, CSV, or STIX 2.1 for downstream tooling
  • Filter clusters using a boolean DSL (--filter 'priority >= HIGH AND ...')
  • Enrich IOCs via barb (phishing URL analysis) and vex (VirusTotal reputation) with --enrich
  • Cache triage results by input fingerprint with --cache (opt-in, 1h TTL)
  • Validate LLM output schema and detect prompt injection attacks
  • sift metrics <file> command for cluster and IOC distribution statistics
  • sift doctor diagnostics to verify configuration, LLM connectivity, and dependencies
  • PyPI version check on startup
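
The automatic IOC extraction listed above can be pictured as a regex scan over every string field of an alert. The sketch below is an illustration only, not sift's actual extractor: the pattern set, the `extract_iocs` name, and the alert shape are all assumptions, and domain extraction is left out for brevity.

```python
import re

# Deliberately simple, hypothetical patterns. Real-world extraction also needs
# defanged-indicator support, domain/TLD validation, and IPv6 handling.
IOC_PATTERNS = {
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(alert: dict) -> dict[str, set[str]]:
    """Scan every string field of an alert and collect matches by IOC type."""
    found = {name: set() for name in IOC_PATTERNS}
    for value in alert.values():
        if not isinstance(value, str):
            continue  # numbers, lists, etc. are skipped in this sketch
        for name, pattern in IOC_PATTERNS.items():
            found[name].update(pattern.findall(value))
    return found
```

Returning sets per type makes later overlap checks between alerts a cheap set intersection.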

Installation

pip install sift-triage

Optional extras:

# LLM summarization (Anthropic + OpenAI)
pip install "sift-triage[llm]"

# IOC enrichment via barb/vex
pip install "sift-triage[enrich]"

# Everything
pip install "sift-triage[llm,enrich]"

Kali Linux / Debian

# Recommended: use pipx for isolated CLI tool installation
sudo apt install pipx   # or: pip install pipx
pipx install sift-triage

# With LLM support
pipx install "sift-triage[llm]"

# With barb + vex enrichment
pipx install "sift-triage[enrich]"

Note: Python 3.11+ required. Kali Linux 2024+ includes Python 3.12 by default. On older systems: sudo apt install python3.12 python3.12-venv


Quick Start

Triage a JSON alert file:

sift triage alerts.json

Triage with AI summarization (Anthropic Claude):

sift triage alerts.json --summarize --provider anthropic

Pipe from Splunk or another tool:

cat splunk_export.json | sift triage -

Export triage report to JSON:

sift triage alerts.json -f json -o report.json

Export triage report as STIX 2.1 bundle:

sift triage alerts.json -f stix -o bundle.json

Filter to HIGH and CRITICAL clusters only:

sift triage alerts.json --filter 'priority >= HIGH'

Enable result caching (skip reprocessing on repeated runs):

sift triage alerts.json --cache

Show metrics for an alert file:

sift metrics alerts.json

Run diagnostics:

sift doctor

Enrich IOCs via barb (phishing URLs) + vex (VirusTotal):

sift triage alerts.json --enrich --summarize

Enrich only via barb (no VirusTotal API key needed):

sift triage alerts.json --enrich --enrich-mode barb

Correlate alerts across multiple sources:

# Two files — merged before clustering
sift triage firewall.json edr_alerts.json

# Mix of files and a directory (scanned recursively)
sift triage baseline.json new_alerts/ --filter 'priority >= HIGH'

# All supported alert files in a folder (scanned recursively)
sift triage /var/log/siem/ --summarize --provider anthropic

Configuration

sift stores settings in ~/.sift/config.yaml and credentials in ~/.sift/.env (mode 600). Both files are created automatically on first use.

Priority chain: CLI flags > SIFT_LLM_KEY env var > ~/.sift/.env > ~/.sift/config.yaml > defaults
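
A priority chain like this behaves as a stack of lookup layers in which the first layer that defines a setting wins. The following is a minimal sketch of that idea using Python's `collections.ChainMap`; the function name and the setting keys (`provider`, `format`, `api_key`) are hypothetical and do not reflect sift's internal names.

```python
from collections import ChainMap


def layered_settings(cli_flags, env_vars, dotenv_values, config_file, defaults):
    """Combine setting layers so that earlier layers shadow later ones."""

    def present(layer):
        # Options that were not set (None) must not shadow lower layers.
        return {key: value for key, value in layer.items() if value is not None}

    layers = (cli_flags, env_vars, dotenv_values, config_file, defaults)
    # ChainMap searches its maps left to right and returns the first hit.
    return ChainMap(*(present(layer) for layer in layers))
```

For example, a provider set only in config.yaml is used until a CLI flag supplies a different one for a single run.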

Show current config

sift config --show

Set LLM API key

The API key is stored in ~/.sift/.env and is never written to config.yaml.

sift config --api-key sk-ant-...          # Anthropic Claude
sift config --api-key sk-...              # OpenAI
sift config --unset-api-key               # Remove key

Alternatively, set the SIFT_LLM_KEY environment variable directly.

Set default provider and model

sift config --provider anthropic
sift config --provider openai --model gpt-4o
sift config --provider ollama --model llama3
sift config --provider template           # no LLM required (default)

Set output defaults

sift config --quiet                       # suppress banner by default
sift config --no-quiet                    # re-enable banner
sift config --default-format json         # default output format
sift config --default-format rich         # back to Rich table (default)

Set pipeline defaults

sift config --chunk-size 100             # process large batches in chunks of 100
sift config --chunk-size 0               # disable chunking (default)
sift config --cache                      # enable result caching by default
sift config --no-cache                   # disable caching (default)
sift config --enrich-consent             # pre-approve IOC enrichment (no prompt)
sift config --no-enrich-consent          # require prompt before enrichment (default)

Run sift config --help for the full option reference.


Workflow

sift is the third stage of a SOC analyst trilogy. Use barb to score and flag suspicious URLs in incoming data, pass flagged IOCs to vex for VirusTotal enrichment, then feed the enriched alert data into sift for cluster-level triage and summarization. Each tool is useful standalone; together they cover URL analysis → IOC reputation → alert prioritization in a single scriptable pipeline. The --enrich flag automates barb and vex calls directly from within sift triage.


Input Formats

| Format | Description | Notes |
|---|---|---|
| Generic JSON | Array of alert objects or NDJSON | Any field schema; sift normalizes automatically |
| Splunk export | JSON export from Splunk Search | Handles results wrapper and Splunk field names |
| CSV | Comma-separated alert rows | First row treated as header; all fields extracted |

Multiple sources: Pass any number of files and/or directories. sift merges all alerts before dedup and clustering, enabling cross-source correlation:

sift triage firewall.json edr.json ids.csv
sift triage /var/log/siem/           # all .json/.csv/.ndjson/.log files, recursively
sift triage baseline.json new_alerts/

stdin: Pass - as the filename to read from stdin:

splunk-cli export | sift triage -

Large Data & Memory Management

sift uses a per-file streaming pipeline that bounds peak RAM regardless of total input size:

| Input Size | Behavior |
|---|---|
| < 50 MB | File read entirely into memory (fastest) |
| 50 MB – 500 MB | Streaming read (5k-line batches), single clustering pass |
| > 500 MB | Sub-file chunking: batches of 100k alerts each run through the full pipeline independently, then merge via IOC-overlap Union-Find |
| Multiple files | Each file processed and freed independently; cross-source correlation restored at merge |
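
The merge step named in the table, IOC-overlap Union-Find, can be sketched as follows. This is a generic illustration of the technique, assuming each cluster is reduced to a set of IOC strings; the class and function names are hypothetical and sift's real merge logic may differ.

```python
class UnionFind:
    """Disjoint-set structure over the integers 0..n-1."""

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_by_ioc_overlap(clusters: list[set[str]]) -> list[set[str]]:
    """Merge clusters (sets of IOC strings) that share at least one IOC."""
    uf = UnionFind(len(clusters))
    first_seen = {}  # IOC -> index of the first cluster that contained it
    for idx, iocs in enumerate(clusters):
        for ioc in iocs:
            if ioc in first_seen:
                uf.union(first_seen[ioc], idx)
            else:
                first_seen[ioc] = idx
    merged = {}
    for idx, iocs in enumerate(clusters):
        merged.setdefault(uf.find(idx), set()).update(iocs)
    return list(merged.values())
```

Because overlap is transitive under union, two batches that never share an IOC directly still end up in one cluster if a third batch links them.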

Recommended flags for large datasets

# 1–10 GB: use --drop-raw to halve per-alert RAM (drops 80-column raw dict)
sift triage big_flows.csv --drop-raw

# 10+ GB: combine --drop-raw with explicit chunk size
sift triage *.csv --drop-raw --chunk-size 100000

# Tuning via config (persistent)
sift config --chunk-size 50000          # smaller chunks = less RAM per batch

Scale guidelines

| Scale | Recommendation |
|---|---|
| < 100 MB (< 200k rows) | Works as-is, no tuning needed |
| 100 MB – 1 GB | --chunk-size 100000 recommended |
| 1 GB – 10 GB | --drop-raw --chunk-size 100000; expect 10–60 min |
| > 10 GB | Pre-filter to specific time windows or attack types first |
| > 50 GB | Use a SIEM (Splunk, Elastic) to aggregate, then export alerts for sift |

Config options

# ~/.sift/config.yaml
clustering:
  chunk_size: 100000               # alerts per batch (0 = auto)
  sub_chunk_threshold_mb: 500      # files above this get sub-file chunking
  sub_chunk_size: 100000           # alerts per sub-file batch
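
To make the effect of a chunk size concrete, here is a small sketch of splitting an alert stream into fixed-size batches. It assumes that a chunk size of 0 or less means "one batch, no chunking"; that reading and the `batched` helper are assumptions for illustration, not sift's code.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(alerts: Iterable[dict], chunk_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most chunk_size alerts without loading everything."""
    it = iter(alerts)
    if chunk_size <= 0:
        # Assumption: non-positive size disables chunking (single batch).
        everything = list(it)
        if everything:
            yield everything
        return
    # islice pulls only chunk_size items per step, so peak memory is bounded
    # by one batch as long as the input itself is a lazy iterator.
    while batch := list(islice(it, chunk_size)):
        yield batch
```

Smaller batches lower peak memory per pass at the cost of more merge work afterwards.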

AI Summarization

The --summarize flag adds an AI-generated executive summary and per-cluster recommendations on top of the standard triage output. Without --summarize, sift runs entirely offline with no LLM required.

sift triage alerts.json --summarize --provider anthropic

The summary includes:

  • Executive summary — one paragraph situational assessment across all clusters
  • Per-cluster narrative — what happened, which systems/users are involved, likely attack stage
  • Recommendations — prioritized action items (IMMEDIATE / WITHIN_1H / WITHIN_24H / MONITOR)

Provider Setup

Anthropic (Claude) — recommended

pip install "sift-triage[llm]"
sift config --provider anthropic --api-key sk-ant-...
sift triage alerts.json --summarize

Default model: claude-sonnet-4-20250514. Override with --model:

sift triage alerts.json --summarize --provider anthropic --model claude-opus-4-6

API key resolution order: sift config --api-key (~/.sift/.env) → ANTHROPIC_API_KEY env var.


OpenAI (GPT)

pip install "sift-triage[llm]"
sift config --provider openai --api-key sk-...
sift triage alerts.json --summarize

Default model: gpt-4o-mini. Override with --model gpt-4o.

API key resolution order: sift config --api-key (~/.sift/.env) → OPENAI_API_KEY env var.


Ollama (local, no API key)

Run any local model without sending data to an external API — recommended for sensitive environments.

# Install and start Ollama: https://ollama.com
ollama pull llama3.2

sift config --provider ollama
sift triage alerts.json --summarize

Default model: llama3.2. Default endpoint: http://localhost:11434. Override with:

SIFT_OLLAMA_URL=http://my-server:11434 sift triage alerts.json --summarize --provider ollama --model mistral

Template (default, no LLM)

Generates a structured summary using predefined rules — no API key, no network calls.

sift triage alerts.json --summarize --provider template

Use this for air-gapped environments or to test the summarization pipeline without an LLM.


Provider comparison

| Provider | Install extra | API key required | Data leaves machine | Default model |
|---|---|---|---|---|
| template | - | No | No | - |
| mock | - | No | No | - (testing only) |
| anthropic | [llm] | Yes | Yes (Anthropic API) | claude-sonnet-4-20250514 |
| openai | [llm] | Yes | Yes (OpenAI API) | gpt-4o-mini |
| ollama | - | No | No (local) | llama3.2 |

Enrichment (barb + vex)

The --enrich flag enriches extracted IOCs using the sister tools:

| Tool | PyPI | What it does | Required |
|---|---|---|---|
| barb | barb-phish | Heuristic phishing URL analysis | No (local) |
| vex | vex-ioc | VirusTotal IOC reputation lookup | API key via VT_API_KEY |

# Install enrichment extras
pip install "sift-triage[enrich]"

# Run with enrichment
sift triage alerts.json --enrich

# Barb only (no API key needed)
sift triage alerts.json --enrich --enrich-mode barb

# Skip consent prompt
sift triage alerts.json --enrich --yes

sift limits enrichment to 20 IOCs per run to avoid API rate limits.


Ticketing

Create incident tickets directly from triage output — no copy-paste required.

| Provider | Auth | Ticket type |
|---|---|---|
| TheHive 5 | Bearer token | Alert (analyst can promote to Case) |
| Jira Service Management | Email + API token | Issue (configurable type) |
| dry-run | none | JSON preview to stdout or file |

Setup

# Install HTTP dependency
pip install "sift-triage[ticket]"

# TheHive
sift config --ticket-provider thehive --ticket-url https://thehive.example.com
sift config --ticket-token <THEHIVE_API_TOKEN>

# Jira
sift config --ticket-provider jira \
            --ticket-url https://company.atlassian.net \
            --ticket-project SOC \
            --ticket-jira-email analyst@company.com
sift config --ticket-token <JIRA_API_TOKEN>

API tokens are stored in ~/.sift/.env (mode 600) — never in config.yaml.

Usage

# Create ticket for top-priority cluster (uses configured default provider)
# Create a TheHive ticket for the top-priority cluster

# Jira ticket
sift triage alerts.json --ticket jira

# Preview ticket JSON without sending
sift triage alerts.json --ticket dry-run
sift triage alerts.json --ticket-output ticket.json

# One ticket per HIGH/CRITICAL cluster
sift triage alerts.json --ticket thehive --ticket-all

# Check connectivity
sift doctor

Ticket content

Each ticket contains:

  • Title: [sift] {SEVERITY} | {cluster label}
  • Summary: LLM narrative (if --summarize) or auto-generated description
  • Timeline: alerts sorted chronologically (up to 10 entries)
  • IOCs: all unique indicators from the cluster
  • ATT&CK: technique IDs mapped from alerts
  • Recommendations: actionable checklist from AI summary
  • Confidence: clustering confidence score (0–100 %)

TheHive: IOCs are automatically mapped as Observables (IP / hash / URL / domain).

Jira: description uses Atlassian Document Format with checkbox task lists for recommendations.


Output Formats

| Flag | Output |
|---|---|
| rich (default) | Color-coded cluster table in the terminal |
| console | Plain-text output, safe for logging |
| json | Structured JSON with all cluster and IOC data |
| csv | Flat CSV suitable for SIEM import or spreadsheets |
| stix | STIX 2.1 bundle JSON for threat intelligence platforms |

Use -f / --format to select output format, and -o / --output to write to a file.


Advanced Usage

Alert Filtering

Use --filter to apply a boolean DSL to the cluster list after triage. Only matching clusters are included in the output.

# Only HIGH and CRITICAL clusters
sift triage alerts.json --filter 'priority >= HIGH'

# Malware or phishing clusters with more than 3 IOCs
sift triage alerts.json --filter 'category IN (malware, phishing) AND ioc_count > 3'

# Exclude low-signal categories
sift triage alerts.json --filter 'NOT category IN (false_positive)'

# Combine priority and alert count conditions
sift triage alerts.json --filter 'priority >= MEDIUM AND alert_count >= 5'

Supported fields: priority, category, ioc_count, alert_count. Supported operators: >=, <=, >, <, =, IN (...), NOT, AND, OR.
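
Comparisons such as `priority >= HIGH` imply an ordering over the five tiers. The sketch below shows what two of the filter expressions above mean when written as plain Python predicates. The `Priority` enum, the `Cluster` dataclass, and both function names are hypothetical stand-ins for illustration; they are not sift's data model or its DSL parser.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Ordered from lowest to highest, matching the tier list NOISE..CRITICAL.
    NOISE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Cluster:
    # Hypothetical shape, limited to the four filterable fields.
    priority: Priority
    category: str
    ioc_count: int
    alert_count: int


def high_or_critical(c: Cluster) -> bool:
    """Equivalent of: priority >= HIGH"""
    return c.priority >= Priority.HIGH


def malware_or_phishing_with_iocs(c: Cluster) -> bool:
    """Equivalent of: category IN (malware, phishing) AND ioc_count > 3"""
    return c.category in {"malware", "phishing"} and c.ioc_count > 3
```

Using an integer-backed enum keeps tier comparisons explicit instead of relying on string ordering, where "CRITICAL" would sort before "HIGH".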

Result Caching

Use --cache to cache triage results by SHA-256 fingerprint of the input. Repeated runs over the same input return instantly from the cache (1-hour TTL, stored in ~/.sift/cache/).

# First run: processes and caches the result
sift triage alerts.json --cache

# Subsequent runs with the same file: returns from cache
sift triage alerts.json --cache

# Combine with other flags; cache stores the full triage output
sift triage alerts.json --cache --summarize --provider anthropic
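
Fingerprint-based caching with a TTL can be sketched in a few lines. This is an illustration under stated assumptions, not sift's implementation: the one-file-per-fingerprint layout, the use of file modification time for expiry, and the `cached_triage` and `run_triage` names are all hypothetical.

```python
import hashlib
import json
import time
from pathlib import Path

TTL_SECONDS = 3600  # the 1-hour TTL described above


def fingerprint(path: Path) -> str:
    """SHA-256 of the file contents, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def cached_triage(path: Path, cache_dir: Path, run_triage, now=time.time):
    """Return a fresh cached result for this input, else compute and store it."""
    entry = cache_dir / f"{fingerprint(path)}.json"
    if entry.is_file() and now() - entry.stat().st_mtime < TTL_SECONDS:
        return json.loads(entry.read_text())
    result = run_triage(path)  # any callable returning JSON-serializable data
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry.write_text(json.dumps(result))
    return result
```

Keying on content rather than file name means a renamed copy of the same export still hits the cache, while any edit to the file produces a new fingerprint.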

STIX 2.1 Export Pipeline

Export triage results as a STIX 2.1 threat intelligence bundle for ingestion into SIEM or TIP platforms.

# Export to STIX bundle file
sift triage alerts.json -f stix -o bundle.json

# Combined enrichment and STIX export
sift triage alerts.json --enrich -f stix -o enriched_bundle.json

# Pipe STIX output to another tool
sift triage alerts.json -f stix | jq '.objects | length'
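
For downstream checks that go beyond the jq one-liner, a bundle can be inspected in Python. The sketch relies only on the STIX 2.1 bundle layout (a top-level `type` of `bundle` and an `objects` array whose members each carry a `type`). Which object types sift actually emits is not stated here, so the helper names and any sample types are illustrative.

```python
import json
from collections import Counter


def load_bundle(path: str) -> dict:
    """Read a STIX bundle JSON file, such as one written with -f stix -o."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_stix_types(bundle: dict) -> Counter:
    """Tally the objects in a STIX 2.1 bundle by their 'type' property."""
    if bundle.get("type") != "bundle":
        raise ValueError("not a STIX bundle")
    return Counter(obj["type"] for obj in bundle.get("objects", []))
```

A call such as `count_stix_types(load_bundle("bundle.json")).most_common()` gives a quick sanity check before importing the bundle into a TIP.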

Max Clusters

Limit the number of clusters returned by the pipeline using max_clusters in ~/.sift/config.yaml. When the cluster count exceeds the limit, only the highest-priority clusters are retained. This is useful for large alert volumes where downstream tooling has per-report limits.

clustering:
  max_clusters: 50
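
The retention rule can be illustrated as a top-N selection by priority. The sketch assumes clusters are dicts with a `priority` string. That shape and the function name are hypothetical, and how sift breaks ties between clusters of equal priority is not specified here.

```python
import heapq

# Rank of each tier, lowest to highest.
PRIORITY_RANK = {"NOISE": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def keep_top_clusters(clusters: list[dict], max_clusters: int) -> list[dict]:
    """Keep at most max_clusters clusters, highest priority first."""
    return heapq.nlargest(
        max_clusters, clusters, key=lambda c: PRIORITY_RANK[c["priority"]]
    )
```

`heapq.nlargest` avoids sorting the full list when the limit is small relative to the number of clusters.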

Metrics

The sift metrics command runs the full normalization, dedup, and clustering pipeline over an alert file and displays summary statistics without generating a triage report.

sift metrics alerts.json

Output includes:

  • Total cluster count and alert count
  • Average cluster size
  • Top alert categories by frequency
  • IOC type distribution (IPs, domains, hashes, URLs)
  • AI summary success rate (if summaries were previously generated)

# Skip deduplication for raw counts
sift metrics alerts.json --no-dedup

# Use a custom config file
sift metrics alerts.json --config /path/to/config.yaml

Validation and Security

sift validates all LLM output against a strict JSON schema. With --validate-only, sift runs only the parse and validate steps, then exits:

# Validate parsed structure without rendering output
sift triage alerts.json --validate-only

A built-in prompt injection detector scans LLM inputs for five pattern categories: instruction overrides, output manipulation, JSON escapes, encoded payloads, and shell injection. Suspicious content is flagged and summarization falls back to the template provider automatically.
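
A pattern-category scan with automatic fallback might look roughly like the sketch below. The five category names come from the paragraph above, but every regular expression here is a hypothetical example chosen for readability. sift's real patterns are not documented in this text and are likely more thorough.

```python
import re

# One illustrative pattern per category; real detectors need many more.
INJECTION_PATTERNS = {
    "instruction_override": re.compile(
        r"ignore (?:all |any )?(?:previous|prior) instructions", re.I
    ),
    "output_manipulation": re.compile(r"(?:respond|reply|answer) only with", re.I),
    "json_escape": re.compile(r"[\"']\s*\}\s*,\s*\{"),
    "encoded_payload": re.compile(r"base64[,:\s]+[A-Za-z0-9+/=]{16,}", re.I),
    "shell_injection": re.compile(
        r"(?:;|\|\||&&|\$\(|`)\s*(?:curl|wget|bash|sh|rm)\b"
    ),
}


def scan_for_injection(text: str) -> list[str]:
    """Return the names of all categories whose pattern matches the text."""
    return [name for name, pat in INJECTION_PATTERNS.items() if pat.search(text)]


def choose_provider(requested: str, alert_text: str) -> str:
    """Fall back to the template provider when anything suspicious is found."""
    return "template" if scan_for_injection(alert_text) else requested
```

Falling back to the template provider keeps the run producing a summary while ensuring the suspicious text never reaches an LLM.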


Exit Codes

| Code | Meaning |
|---|---|
| 0 | Triage complete: no HIGH or CRITICAL clusters found |
| 1 | Triage complete: one or more HIGH or CRITICAL clusters found |
| 2 | Error: invalid input, configuration failure, or LLM error |

Exit code 1 is designed for use in CI pipelines and automated response playbooks.
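
In a playbook or CI job, the exit code can drive the next step. The wrapper below is a sketch: the command line uses only flags shown elsewhere in this document, while the action labels (`clean`, `escalate`, `error`) and the function names are hypothetical.

```python
import subprocess

# Mapping of the documented exit codes to illustrative action labels.
EXIT_ACTIONS = {
    0: "clean",     # no HIGH or CRITICAL clusters
    1: "escalate",  # at least one HIGH or CRITICAL cluster
    2: "error",     # invalid input, configuration failure, or LLM error
}


def classify_exit(code: int) -> str:
    """Translate an exit code into an action; unknown codes count as errors."""
    return EXIT_ACTIONS.get(code, "error")


def run_sift(alert_file: str, report_file: str = "report.json") -> str:
    """Run `sift triage` and return the action implied by its exit status."""
    proc = subprocess.run(
        ["sift", "triage", alert_file, "-f", "json", "-o", report_file]
    )
    return classify_exit(proc.returncode)
```

Note that `subprocess.run` is called without `check=True`, since exit code 1 signals a finding rather than a failure.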


Configuration

sift config --show    # display current configuration
sift doctor           # verify config, LLM connectivity, and dependencies

Configuration is resolved in priority order: CLI flags > environment variables > ~/.sift/.env > ~/.sift/config.yaml > defaults.


Part of the SOC Trilogy

| Tool | Role | PyPI |
|---|---|---|
| barb | Heuristic phishing URL analyzer | barb-phish |
| vex | VirusTotal IOC enrichment | vex-ioc |
| sift | Alert triage summarizer | sift-triage |

License

MIT — see LICENSE for details.

Author: Christian Huhn
