AI-powered alert triage summarizer for SOC teams
Project description
sift
____ ___ _____ _____
/ ___|_ _| ___|_ _|
\___ \| || |_ | |
___) | || _| | |
|____/___|_| |_|
AI-Powered Alert Triage Summarizer for SOC Teams
sift ingests raw security alerts, deduplicates and clusters related events, scores them by priority, and delivers a structured triage summary — with optional AI-generated analysis. Part of the barb → vex → sift SOC workflow trilogy.
Features
- Ingest alerts from generic JSON, Splunk exports (including NDJSON forwarder output), CSV, and Sysmon-format CSV (Image / CommandLine / EventID / ParentImage aliases).
- Deduplicate noisy alert streams before analysis (fingerprint includes host and user so distinct endpoints stay distinct).
- Extract a wide range of IOC types automatically:
- Network: IPv4, IPv6, domains, URLs, email addresses
- File hashes: MD5, SHA1, SHA256, SHA512, ssdeep, TLSH, JARM, JA3 / JA3S (keyword-anchored), imphash
- File observables: Windows executables / scripts (
.exe,.dll,.ps1,.docm, …) including underscore-bearing malware names - Vulnerability and framework references: CVE IDs, MITRE ATT&CK technique IDs (T1xxx / T1xxx.yyy)
- Persistence indicators: Windows registry keys (
HKLM\…,HKCU\…) - Obfuscation indicators: PowerShell encoded blocks (
-enc <b64>,FromBase64String("…")) — surfaced as a SHA-256 stub, never as raw base-64 - Tunnel and cloud-abuse domains: ngrok, serveo, trycloudflare,
Discord webhooks, Telegram bot URLs, paste sites — auto-tagged
highseverity - Defang refang preprocessor:
hxxp://,[.],(.),[at]/[dot], fullwidth Unicode (.,@), zero-width / BOM strips - Null-hash sentinels (Sysmon empty
IMPHASH, hashes-of-empty-bytestring) are silently dropped
- Severity-hint multipliers: PowerShell-encoded execution →
critical, persistence registry keys / tunnel domains / paste sites →high, feeding directly into cluster prioritisation, Jira priority bumps, TheHive tags, and STIX export. - Cluster related alerts by IOC overlap, category + time window, or IP-pair
correlation. Overflow alerts (when
max_clustersis hit) land in an explicitOthercluster instead of being silently dropped. - Score clusters across five priority tiers: NOISE / LOW / MEDIUM / HIGH / CRITICAL
- AI summarization via Anthropic Claude, OpenAI, Ollama (local), or
template-based with no LLM required. The
--no-llmflag forces the template provider for fully offline / keyless triage. - Rich terminal output with priority-colored cluster table and a per-cluster severity-hint column.
- Export to JSON, CSV, or STIX 2.1 for downstream tooling. PowerShell-encoded
payloads are sanitised in every export path by default; pass
--include-raw-payloadfor forensic-mode output. - Filter clusters using a boolean DSL (
--filter 'priority >= HIGH AND ...') - Enrich IOCs via barb (phishing URL analysis) and vex (VirusTotal reputation)
with
--enrich. Bridges run concurrently (ThreadPoolExecutor); IOCs are case-normalised and refanged before the cache dedup pass to collapse duplicate API calls. - Cache triage results by input fingerprint with
--cache(opt-in, 1h TTL, thread-safe SQLite). - Validate LLM output schema, normalize text via NFKC, and detect prompt injection attacks. PowerShell-encoded payloads are sanitised before any LLM submission.
- Ticketing for TheHive 5 and Jira Service Management with severity-hint
aware priority promotion (e.g. a cluster containing PowerShell-encoded
IOCs goes straight to
Highestin Jira). sift metrics <file>command for cluster and IOC distribution statisticssift doctordiagnostics to verify configuration, LLM connectivity, and dependencies- PyPI version check on startup
Installation
pip install sift-triage
Optional extras:
# LLM summarization (Anthropic + OpenAI)
pip install "sift-triage[llm]"
# IOC enrichment via barb/vex
pip install "sift-triage[enrich]"
# Everything
pip install "sift-triage[llm,enrich]"
Kali Linux / Debian
# Recommended: use pipx for isolated CLI tool installation
sudo apt install pipx # or: pip install pipx
pipx install sift-triage
# With LLM support
pipx install "sift-triage[llm]"
# With barb + vex enrichment
pipx install "sift-triage[enrich]"
Note: Python 3.11+ required. Kali Linux 2024+ includes Python 3.12 by default. On older systems:
sudo apt install python3.12 python3.12-venv
Quick Start
Triage a JSON alert file:
sift triage alerts.json
Triage with AI summarization (Anthropic Claude):
sift triage alerts.json --summarize --provider anthropic
Pipe from Splunk or another tool:
cat splunk_export.json | sift triage -
Triage offline / without an LLM (template-only summary):
sift triage alerts.json --no-llm
Forensic-mode export (keep raw PowerShell base-64 payloads):
sift triage alerts.json -f json --include-raw-payload -o forensic.json
Export triage report to JSON:
sift triage alerts.json -f json -o report.json
Export triage report as STIX 2.1 bundle:
sift triage alerts.json -f stix -o bundle.json
Filter to HIGH and CRITICAL clusters only:
sift triage alerts.json --filter 'priority >= HIGH'
Enable result caching (skip reprocessing on repeated runs):
sift triage alerts.json --cache
Show metrics for an alert file:
sift metrics alerts.json
Run diagnostics:
sift doctor
Enrich IOCs via barb (phishing URLs) + vex (VirusTotal):
sift triage alerts.json --enrich --summarize
Enrich only via barb (no VirusTotal API key needed):
sift triage alerts.json --enrich --enrich-mode barb
Correlate alerts across multiple sources:
# Two files — merged before clustering
sift triage firewall.json edr_alerts.json
# Mix of files and a directory (scanned recursively)
sift triage baseline.json new_alerts/ --filter 'priority >= HIGH'
# All .json/.csv files in a folder
sift triage /var/log/siem/ --summarize --provider anthropic
Configuration
sift stores settings in ~/.sift/config.yaml and credentials in ~/.sift/.env (mode 600). Both files are created automatically on first use.
Priority chain: CLI flags > SIFT_LLM_KEY env var > ~/.sift/.env > ~/.sift/config.yaml > defaults
Show current config
sift config --show
Set LLM API key
The API key is stored in ~/.sift/.env and is never written to config.yaml.
sift config --api-key sk-ant-... # Anthropic Claude
sift config --api-key sk-... # OpenAI
sift config --unset-api-key # Remove key
Alternatively, set the SIFT_LLM_KEY environment variable directly.
Set default provider and model
sift config --provider anthropic
sift config --provider openai --model gpt-4o
sift config --provider ollama --model llama3
sift config --provider template # no LLM required (default)
Set output defaults
sift config --quiet # suppress banner by default
sift config --no-quiet # re-enable banner
sift config --default-format json # default output format
sift config --default-format rich # back to Rich table (default)
Set pipeline defaults
sift config --chunk-size 100 # process large batches in chunks of 100
sift config --chunk-size 0 # disable chunking (default)
sift config --cache # enable result caching by default
sift config --no-cache # disable caching (default)
sift config --enrich-consent # pre-approve IOC enrichment (no prompt)
sift config --no-enrich-consent # require prompt before enrichment (default)
Run sift config --help for the full option reference.
Workflow
sift is the third stage of a SOC analyst trilogy. Use barb to score and flag suspicious URLs in incoming data, pass flagged IOCs to vex for VirusTotal enrichment, then feed the enriched alert data into sift for cluster-level triage and summarization. Each tool is useful standalone; together they cover URL analysis → IOC reputation → alert prioritization in a single scriptable pipeline. The --enrich flag automates barb and vex calls directly from within sift triage.
Input Formats
| Format | Description | Notes |
|---|---|---|
| Generic JSON | Array of alert objects or NDJSON | Any field schema; sift normalizes automatically |
| Splunk export | JSON export from Splunk Search | Handles results wrapper and Splunk field names |
| CSV | Comma-separated alert rows | First row treated as header; all fields extracted |
Multiple sources: Pass any number of files and/or directories. sift merges all alerts before dedup and clustering, enabling cross-source correlation:
sift triage firewall.json edr.json ids.csv
sift triage /var/log/siem/ # all .json/.csv/.ndjson/.log files, recursively
sift triage baseline.json new_alerts/
stdin: Pass - as the filename to read from stdin:
splunk-cli export | sift triage -
Large Data & Memory Management
sift uses a per-file streaming pipeline that bounds peak RAM regardless of total input size:
| Input Size | Behavior |
|---|---|
| < 50 MB | File read entirely into memory — fastest |
| 50 MB – 500 MB | Streaming read (5k-line batches), single clustering pass |
| > 500 MB | Sub-file chunking: batches of 100k alerts each run through the full pipeline independently, then merge via IOC-overlap Union-Find |
| Multiple files | Each file processed and freed independently; cross-source correlation restored at merge |
Recommended flags for large datasets
# 1–10 GB: use --drop-raw to halve per-alert RAM (drops 80-column raw dict)
sift triage big_flows.csv --drop-raw
# 10+ GB: combine --drop-raw with explicit chunk size
sift triage *.csv --drop-raw --chunk-size 100000
# Tuning via config (persistent)
sift config --chunk-size 50000 # smaller chunks = less RAM per batch
Scale guidelines
| Scale | Recommendation |
|---|---|
| < 100 MB (< 200k rows) | Works as-is, no tuning needed |
| 100 MB – 1 GB | --chunk-size 100000 recommended |
| 1 GB – 10 GB | --drop-raw --chunk-size 100000 — expect 10–60 min |
| > 10 GB | Pre-filter to specific time windows or attack types first |
| > 50 GB | Use a SIEM (Splunk, Elastic) to aggregate, then export alerts for sift |
Config options
# ~/.sift/config.yaml
clustering:
chunk_size: 100000 # alerts per batch (0 = auto)
sub_chunk_threshold_mb: 500 # files above this get sub-file chunking
sub_chunk_size: 100000 # alerts per sub-file batch
AI Summarization
The --summarize flag adds an AI-generated executive summary and per-cluster recommendations on top of the standard triage output. Without --summarize, sift runs entirely offline with no LLM required.
sift triage alerts.json --summarize --provider anthropic
The summary includes:
- Executive summary — one paragraph situational assessment across all clusters
- Per-cluster narrative — what happened, which systems/users are involved, likely attack stage
- Recommendations — prioritized action items (IMMEDIATE / WITHIN_1H / WITHIN_24H / MONITOR)
Provider Setup
Anthropic (Claude) — recommended
pip install "sift-triage[llm]"
sift config --provider anthropic --api-key sk-ant-...
sift triage alerts.json --summarize
Default model: claude-sonnet-4-6. Override with --model:
sift triage alerts.json --summarize --provider anthropic --model claude-opus-4-6
API key resolution order: sift config --api-key (~/.sift/.env) → ANTHROPIC_API_KEY env var.
OpenAI (GPT)
pip install "sift-triage[llm]"
sift config --provider openai --api-key sk-...
sift triage alerts.json --summarize
Default model: gpt-4o-mini. Override with --model gpt-4o.
API key resolution order: sift config --api-key (~/.sift/.env) → OPENAI_API_KEY env var.
Ollama (local, no API key)
Run any local model without sending data to an external API — recommended for sensitive environments.
# Install and start Ollama: https://ollama.com
ollama pull llama3.2
sift config --provider ollama
sift triage alerts.json --summarize
Default model: llama3.2. Default endpoint: http://localhost:11434. Override with:
SIFT_OLLAMA_URL=http://my-server:11434 sift triage alerts.json --summarize --provider ollama --model mistral
Template (default, no LLM)
Generates a structured summary using predefined rules — no API key, no network calls.
sift triage alerts.json --summarize --provider template
Use this for air-gapped environments or to test the summarization pipeline without an LLM.
Provider comparison
| Provider | Install extra | API key required | Data leaves machine | Default model |
|---|---|---|---|---|
template |
— | No | No | — |
mock |
— | No | No | — (testing only) |
anthropic |
[llm] |
Yes | Yes (Anthropic API) | claude-sonnet-4-6 |
openai |
[llm] |
Yes | Yes (OpenAI API) | gpt-4o-mini |
ollama |
— | No | No (local) | llama3.2 |
Enrichment (barb + vex)
The --enrich flag enriches extracted IOCs using the sister tools:
| Tool | PyPI | What it does | Required |
|---|---|---|---|
| barb | barb-phish |
Heuristic phishing URL analysis | No (local) |
| vex | vex-ioc |
VirusTotal IOC reputation lookup | API key via VT_API_KEY |
# Install enrichment extras
pip install "sift-triage[enrich]"
# Run with enrichment
sift triage alerts.json --enrich
# Barb only (no API key needed)
sift triage alerts.json --enrich --enrich-mode barb
# Skip consent prompt
sift triage alerts.json --enrich --yes
sift limits enrichment to 20 IOCs per run to avoid API rate limits.
Ticketing
Create incident tickets directly from triage output — no copy-paste required.
| Provider | Auth | Ticket type |
|---|---|---|
| TheHive 5 | Bearer token | Alert (analyst can promote to Case) |
| Jira Service Management | Email + API token | Issue (configurable type) |
| dry-run | none | JSON preview to stdout or file |
Setup
# Install HTTP dependency
pip install "sift-triage[ticket]"
# TheHive
sift config --ticket-provider thehive --ticket-url https://thehive.example.com
sift config --ticket-token <THEHIVE_API_TOKEN>
# Jira
sift config --ticket-provider jira \
--ticket-url https://company.atlassian.net \
--ticket-project SOC \
--ticket-jira-email analyst@company.com
sift config --ticket-token <JIRA_API_TOKEN>
API tokens are stored in ~/.sift/.env (mode 600) — never in config.yaml.
Usage
# Create ticket for top-priority cluster (uses configured default provider)
sift triage alerts.json --ticket thehive
# Jira ticket
sift triage alerts.json --ticket jira
# Preview ticket JSON without sending
sift triage alerts.json --ticket dry-run
sift triage alerts.json --ticket-output ticket.json
# One ticket per HIGH/CRITICAL cluster
sift triage alerts.json --ticket thehive --ticket-all
# Check connectivity
sift doctor
Ticket content
Each ticket contains:
- Title:
[sift] {SEVERITY} | {cluster label} - Summary: LLM narrative (if
--summarize) or auto-generated description - Timeline: alerts sorted chronologically (up to 10 entries)
- IOCs: all unique indicators from the cluster
- ATT&CK: technique IDs mapped from alerts
- Recommendations: actionable checklist from AI summary
- Confidence: clustering confidence score (0–100 %)
TheHive: IOCs are automatically mapped as Observables (IP / hash / URL / domain). Jira: description uses Atlassian Document Format with checkbox task lists for recommendations.
Output Formats
| Flag | Output |
|---|---|
rich (default) |
Color-coded cluster table in the terminal |
console |
Plain-text output, safe for logging |
json |
Structured JSON with all cluster and IOC data |
csv |
Flat CSV suitable for SIEM import or spreadsheets |
stix |
STIX 2.1 bundle JSON for threat intelligence platforms |
Use -f / --format to select output format, and -o / --output to write to a file.
Advanced Usage
Alert Filtering
Use --filter to apply a boolean DSL to the cluster list after triage. Only matching clusters are included in the output.
# Only HIGH and CRITICAL clusters
sift triage alerts.json --filter 'priority >= HIGH'
# Malware or phishing clusters with more than 3 IOCs
sift triage alerts.json --filter 'category IN (malware, phishing) AND ioc_count > 3'
# Exclude low-signal categories
sift triage alerts.json --filter 'NOT category IN (false_positive)'
# Combine priority and alert count conditions
sift triage alerts.json --filter 'priority >= MEDIUM AND alert_count >= 5'
Supported fields: priority, category, ioc_count, alert_count.
Supported operators: >=, <=, >, <, =, IN (...), NOT, AND, OR.
Result Caching
Use --cache to cache triage results by SHA-256 fingerprint of the input. Repeated runs over the same input return instantly from the cache (1-hour TTL, stored in ~/.sift/cache/).
# First run: processes and caches the result
sift triage alerts.json --cache
# Subsequent runs with the same file: returns from cache
sift triage alerts.json --cache
# Combine with other flags; cache stores the full triage output
sift triage alerts.json --cache --summarize --provider anthropic
STIX 2.1 Export Pipeline
Export triage results as a STIX 2.1 threat intelligence bundle for ingestion into SIEM or TIP platforms.
# Export to STIX bundle file
sift triage alerts.json -f stix -o bundle.json
# Combined enrichment and STIX export
sift triage alerts.json --enrich -f stix -o enriched_bundle.json
# Pipe STIX output to another tool
sift triage alerts.json -f stix | jq '.objects | length'
Max Clusters
Limit the number of clusters returned by the pipeline using max_clusters in ~/.sift/config.yaml. When the cluster count exceeds the limit, only the highest-priority clusters are retained. This is useful for large alert volumes where downstream tooling has per-report limits.
clustering:
max_clusters: 50
Metrics
The sift metrics command runs the full normalization, dedup, and clustering pipeline over an alert file and displays summary statistics without generating a triage report.
sift metrics alerts.json
Output includes:
- Total cluster count and alert count
- Average cluster size
- Top alert categories by frequency
- IOC type distribution (IPs, domains, hashes, URLs)
- AI summary success rate (if summaries were previously generated)
# Skip deduplication for raw counts
sift metrics alerts.json --no-dedup
# Use a custom config file
sift metrics alerts.json --config /path/to/config.yaml
Validation and Security
sift validates all LLM outputs against a strict JSON schema (--validate-only runs parse and validate only, then exits):
# Validate parsed structure without rendering output
sift triage alerts.json --validate-only
A built-in prompt injection detector scans LLM inputs for five pattern categories: instruction overrides, output manipulation, JSON escapes, encoded payloads, and shell injection. Suspicious content is flagged and summarization falls back to the template provider automatically.
Exit Codes
| Code | Meaning |
|---|---|
0 |
Triage complete — no HIGH or CRITICAL clusters found |
1 |
Triage complete — one or more HIGH or CRITICAL clusters found |
2 |
Error — invalid input, configuration failure, or LLM error |
Exit code 1 is designed for use in CI pipelines and automated response playbooks.
Configuration
sift config --show # display current configuration
sift doctor # verify config, LLM connectivity, and dependencies
Configuration is resolved in priority order: CLI flags > environment variables > ~/.sift/config.yaml > defaults.
Part of the SOC Trilogy
| Tool | Role | PyPI |
|---|---|---|
| barb | Heuristic phishing URL analyzer | barb-phish |
| vex | VirusTotal IOC enrichment | vex-ioc |
| sift | Alert triage summarizer | sift-triage |
License
MIT — see LICENSE for details.
Author: Christian Huhn
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sift_triage-1.1.101.tar.gz.
File metadata
- Download URL: sift_triage-1.1.101.tar.gz
- Upload date:
- Size: 211.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8e5cf8b3f06b50e38d785d364402fef42094e8dbee37dbd5e75ec43bef4768e1
|
|
| MD5 |
19f062d07bd9c0cb06cbd50dd874159b
|
|
| BLAKE2b-256 |
47fa1b830f557e19351664f390eec1b95a35f78c044c5e3d598718c6b1f38ec8
|
Provenance
The following attestation bundles were made for sift_triage-1.1.101.tar.gz:
Publisher:
publish.yml on duathron/sift
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sift_triage-1.1.101.tar.gz -
Subject digest:
8e5cf8b3f06b50e38d785d364402fef42094e8dbee37dbd5e75ec43bef4768e1 - Sigstore transparency entry: 1517388948
- Sigstore integration time:
-
Permalink:
duathron/sift@26297e37934d10c3b0910f8fcd680843761974c3 -
Branch / Tag:
refs/tags/v1.1.101 - Owner: https://github.com/duathron
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@26297e37934d10c3b0910f8fcd680843761974c3 -
Trigger Event:
release
-
Statement type:
File details
Details for the file sift_triage-1.1.101-py3-none-any.whl.
File metadata
- Download URL: sift_triage-1.1.101-py3-none-any.whl
- Upload date:
- Size: 136.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1c1c3f57919dfe4b845106175f7dc5457d9a104ed84e79392fa6e919412f2176
|
|
| MD5 |
b719f5a27fc8fec5fff7bbcad64b9b25
|
|
| BLAKE2b-256 |
375e749cf53c7fc5896d5a4a1f12a6e80c8ff03620b77a2d7f60233e2a8d6ef3
|
Provenance
The following attestation bundles were made for sift_triage-1.1.101-py3-none-any.whl:
Publisher:
publish.yml on duathron/sift
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sift_triage-1.1.101-py3-none-any.whl -
Subject digest:
1c1c3f57919dfe4b845106175f7dc5457d9a104ed84e79392fa6e919412f2176 - Sigstore transparency entry: 1517389061
- Sigstore integration time:
-
Permalink:
duathron/sift@26297e37934d10c3b0910f8fcd680843761974c3 -
Branch / Tag:
refs/tags/v1.1.101 - Owner: https://github.com/duathron
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@26297e37934d10c3b0910f8fcd680843761974c3 -
Trigger Event:
release
-
Statement type: