AI-powered Sigma rule generator using MITRE ATT&CK and RAG
SigmaGen
Generate production-ready Sigma detection rules from MITRE ATT&CK technique IDs or raw attack telemetry — powered by RAG.
Quickstart • How It Works • CLI Reference • REST API • Contributing
The Problem
A persistent gap in enterprise security is the time between a new technique appearing in the wild and a detection rule landing in your SIEM. Most SOC teams don't have enough detection engineers, and writing a high-quality Sigma rule from scratch (with the right logsource, field mappings, false positive filters, and ATT&CK tags) takes 30-60 minutes per technique.
The Solution
SigmaGen closes this gap. Give it a technique ID or describe the attack behavior, and it generates deployable Sigma YAML in seconds:
$ sigmagen generate --technique T1059.001
──────────────────── SigmaGen Rule Generation ────────────────────
Retrieving context from knowledge base...
Retrieved 5 techniques, 5 existing rules
Generating rules via anthropic...
LLM returned 3 rule(s)
──────────────────────── Rule 1 ────────────────────────────────
title: Suspicious PowerShell Encoded Command Execution
id: 7f3a2c1e-84b6-4d9f-a031-5e8c7b2f9d14
status: experimental
logsource:
    category: process_creation
    product: windows
detection:
    selection_image:
        Image|endswith:
            - '\powershell.exe'
            - '\pwsh.exe'
    selection_encoded:
        CommandLine|contains:
            - ' -EncodedCommand '
            - ' -enc '
            - ' -EC '
    filter_known_tools:
        ParentImage|endswith: '\msiexec.exe'
    condition: selection_image and selection_encoded and not filter_known_tools
level: medium
Validation: PASSED
──────────────────────── Summary ───────────────────────────────
3 rules generated | 3 passed | 0 failed
Output: output/
v suspicious_powershell_encoded_command_execution.yml [medium]
v powershell_suspicious_download_cradle_execution.yml [high]
v powershell_amsi_bypass_attempt_detected.yml [high]
Every generated rule includes specific detection logic with field-value conditions, false positive filters, ATT&CK tags, and logsource mappings — not generic templates.
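To make the detection logic in the example rule concrete, here is a minimal sketch of how its selections and condition behave against a single process-creation event. It is not part of SigmaGen and not how Sigma backends work (they compile rules into SIEM queries); the function names are hypothetical. It relies only on standard Sigma semantics: values in a list are OR'd, fields inside one selection are AND'd, and string matching is case-insensitive by default.

```python
# Illustrative only: a tiny evaluator for the example rule's detection section.
# The condition is hard-coded; a real engine would parse it.

RULE_DETECTION = {
    "selection_image": {"Image|endswith": ["\\powershell.exe", "\\pwsh.exe"]},
    "selection_encoded": {
        "CommandLine|contains": [" -EncodedCommand ", " -enc ", " -EC "]
    },
    "filter_known_tools": {"ParentImage|endswith": "\\msiexec.exe"},
}


def field_matches(event, key, expected):
    """Check one `Field|modifier` entry; a list of values is an OR."""
    field, _, modifier = key.partition("|")
    actual = str(event.get(field, "")).lower()  # case-insensitive comparison
    values = expected if isinstance(expected, list) else [expected]
    for value in values:
        value = value.lower()
        if modifier == "endswith" and actual.endswith(value):
            return True
        if modifier == "contains" and value in actual:
            return True
        if modifier == "" and actual == value:
            return True
    return False


def selection_matches(event, selection):
    """Every field inside one selection must match (AND)."""
    return all(field_matches(event, k, v) for k, v in selection.items())


def rule_fires(event, detection=RULE_DETECTION):
    """selection_image and selection_encoded and not filter_known_tools"""
    return (
        selection_matches(event, detection["selection_image"])
        and selection_matches(event, detection["selection_encoded"])
        and not selection_matches(event, detection["filter_known_tools"])
    )
```

An event whose Image ends in \powershell.exe and whose CommandLine contains " -enc " fires the rule unless the parent process is msiexec.exe, which is exactly the false positive filter the rule encodes.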
How It Works
SigmaGen is not a wrapper around "write me a Sigma rule." It's a RAG pipeline that retrieves real ATT&CK detection guidance and existing Sigma rules from a local vector store, then uses that context to ground the LLM's output in production patterns.
    User Input                   Knowledge Base
(T1059.001 or text)           ┌──────────────────┐
        │                     │ ATT&CK (691)     │
        ▼                     │ Sigma (3110)     │
┌────────────────┐            └────────┬─────────┘
│   Retriever    │◄────────────────────┘
│   (ChromaDB)   │  cosine similarity
└───────┬────────┘  + metadata filter
        │
        ▼
┌────────────────┐
│ Prompt Builder │  packs ATT&CK context
│                │  + 3 best Sigma examples
└───────┬────────┘
        │
        ▼
┌────────────────┐
│  Claude / GPT  │  generates 1-3 rules
└───────┬────────┘
        │
        ▼
┌────────────────┐
│   Validator    │  schema + field checks
└───────┬────────┘
        │
        ▼
  .yml files in output/
Ingestion — The ATT&CK STIX bundle (690+ techniques with detection guidance, data sources, tactics) and SigmaHQ's stable rules (3100+ community rules) are parsed and embedded into ChromaDB using all-MiniLM-L6-v2.
Retrieval — Technique IDs hit an exact metadata filter first, then semantic similarity for related context. Free-text queries use pure semantic search across both collections. Results are deduplicated and ranked.
Generation — The prompt includes the ATT&CK technique's detection guidance, data sources, and platforms, plus up to 3 existing Sigma rules as structural examples. The system prompt enforces specific detection conditions — no selection: * or match-all patterns.
Validation — Every rule is checked for required fields, valid levels/statuses, UUID format, logsource structure, detection logic (must have named selections + condition), and ATT&CK tags.
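The validation step can be approximated in a few lines of standard-library Python. The sketch below is an assumption about what such checks might look like, not SigmaGen's actual validator: the required-field list, the function name `validate_rule`, and the error messages are all hypothetical. The level and status values are the ones defined by the Sigma specification.

```python
import re
import uuid

# Assumed required-field list; the real validator may require more or fewer.
REQUIRED_FIELDS = ("title", "id", "status", "level", "logsource", "detection", "tags")
VALID_LEVELS = {"informational", "low", "medium", "high", "critical"}
VALID_STATUSES = {"stable", "test", "experimental", "deprecated", "unsupported"}
ATTACK_TECHNIQUE_TAG = re.compile(r"attack\.t\d{4}(\.\d{3})?")


def validate_rule(rule):
    """Return a list of problems; an empty list means the rule passed."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in rule]

    if "level" in rule and rule["level"] not in VALID_LEVELS:
        errors.append(f"invalid level: {rule['level']}")
    if "status" in rule and rule["status"] not in VALID_STATUSES:
        errors.append(f"invalid status: {rule['status']}")

    if "id" in rule:
        try:
            uuid.UUID(str(rule["id"]))
        except ValueError:
            errors.append("id is not a valid UUID")

    logsource = rule.get("logsource")
    if logsource is not None:
        if not isinstance(logsource, dict):
            errors.append("logsource must be a mapping")
        elif not any(k in logsource for k in ("category", "product", "service")):
            errors.append("logsource needs category, product, or service")

    detection = rule.get("detection")
    if detection is not None:
        if not isinstance(detection, dict):
            errors.append("detection must be a mapping")
        else:
            if "condition" not in detection:
                errors.append("detection has no condition")
            if not [name for name in detection if name != "condition"]:
                errors.append("detection has no named selections")

    if "tags" in rule:
        tags = [str(tag).lower() for tag in (rule["tags"] or [])]
        if not any(ATTACK_TECHNIQUE_TAG.fullmatch(tag) for tag in tags):
            errors.append("no ATT&CK technique tag (e.g. attack.t1059.001)")

    return errors
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with a rule in a single pass.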
Quickstart
Install
git clone https://github.com/sigmagen-project/sigmagen.git
cd sigmagen
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
Setup
The setup wizard handles everything:
$ sigmagen init
╭─ SigmaGen Setup Wizard ─╮
╰────────────────────────╯
Step 1/3 Checking API key...
v ANTHROPIC_API_KEY is set (provider: anthropic)
Step 2/3 Checking knowledge base...
v ATT&CK techniques: 691
v Sigma rules: 3110
Knowledge base is ready.
Step 3/3 Verifying setup...
v Everything looks good!
╭─ Next steps ──────────────────────────────────────────╮
│ Ready to generate. │
│ │
│ sigmagen generate --technique T1059.001 │
│ sigmagen generate --description "certutil download" │
╰───────────────────────────────────────────────────────╯
Or manually:
cp .env.example .env # add your ANTHROPIC_API_KEY or OPENAI_API_KEY
sigmagen ingest all # one-time, ~5 min
Generate
Three ways to generate rules:
# By technique ID (supports tab completion)
sigmagen generate --technique T1059.001
# By description
sigmagen generate --description "attacker used certutil to download a payload"
# Interactive — just run it, get prompted
sigmagen generate
Input type (technique, description, telemetry): technique
Technique ID (e.g. T1059.001): T1059.001
How many rules? (1-5) [1]: 3
Search the Knowledge Base
Query what ATT&CK techniques and existing Sigma rules match your input:
$ sigmagen retrieve --query "credential dumping lsass"
ATT&CK Techniques
┌───────────┬───────────────────────┬───────────────────┬───────┐
│ ID │ Name │ Tactics │ Score │
├───────────┼───────────────────────┼───────────────────┼───────┤
│ T1003.001 │ LSASS Memory │ credential-access │ 0.631 │
│ T1003.004 │ LSA Secrets │ credential-access │ 0.607 │
│ T1547.008 │ LSASS Driver │ persistence │ 0.540 │
│ T1003 │ OS Credential Dumping │ credential-access │ 0.512 │
│ T1110.004 │ Credential Stuffing │ credential-access │ 0.480 │
└───────────┴───────────────────────┴───────────────────┴───────┘
Sigma Rules
┌─────────────────────────────────────┬──────────┬───────────┬───────┐
│ Title │ Level │ Technique │ Score │
├─────────────────────────────────────┼──────────┼───────────┼───────┤
│ Credential Dumping Via LSASS │ medium │ T1003.001 │ 0.787 │
│ LSASS Process Clone │ critical │ T1003 │ 0.765 │
│ Credential Dumping By Python Tool │ high │ T1003.001 │ 0.763 │
│ LSASS SilentProcessExit Technique │ critical │ T1003.001 │ 0.741 │
│ Password Dumper Activity on LSASS │ high │ T1003.001 │ 0.739 │
└─────────────────────────────────────┴──────────┴───────────┴───────┘
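The retrieval order described under How It Works (exact metadata match first, then semantic similarity, then deduplication) can be sketched without ChromaDB. The code below is a pure-Python stand-in under stated assumptions: the document layout (`id`, `technique`, `vec`), the toy two-dimensional vectors, and the `retrieve` function are all hypothetical, and the real tool delegates similarity search to ChromaDB with all-MiniLM-L6-v2 embeddings.

```python
import math

# Toy corpus with made-up 2-D embeddings, for illustration only.
DOCS = [
    {"id": "a", "technique": "T1003.001", "vec": [1.0, 0.0]},
    {"id": "b", "technique": "T1003", "vec": [0.9, 0.1]},
    {"id": "c", "technique": "T1547.008", "vec": [0.0, 1.0]},
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, technique_id=None, k=5):
    """Exact technique matches first, then the rest by similarity, deduplicated."""
    exact = [d for d in docs if technique_id and d["technique"] == technique_id]
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)

    results, seen = [], set()
    for doc in exact + ranked:
        if doc["id"] not in seen:  # drop documents already returned
            seen.add(doc["id"])
            results.append(doc)
    return results[:k]
```

With a technique ID supplied, the exact match is pinned to the top even when its vector is far from the query; free-text queries skip that step and rank purely by similarity.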
Validate
$ sigmagen validate output/suspicious_powershell_encoded_command_execution.yml
──── Validating suspicious_powershell_encoded_command_execution.yml ────
Validation PASSED
Status Dashboard
$ sigmagen status
SigmaGen v0.1.0
───────────────────────────────────────────────────────
Knowledge Base Documents Status
attack_techniques 691 v Ready
sigma_rules 3110 v Ready
LLM Provider anthropic
API Key v Set
Model claude-sonnet-4-6
Embedding Model all-MiniLM-L6-v2
───────────────────────────────────────────────────────
Ready to generate. Run: sigmagen generate --technique T1059.001
Error Handling
Invalid inputs are caught early with clear guidance:
$ sigmagen generate --technique fdsb
x 'fdsb' is not a valid ATT&CK technique ID.
Expected format: T1059 or T1059.001
$ sigmagen generate --technique T1059.001 # before running ingest
x Knowledge base is not ready.
- attack_techniques collection is empty
Run: sigmagen ingest all
$ sigmagen generate # without an API key
x No API key found.
Add ANTHROPIC_API_KEY to your .env file.
Run: cp .env.example .env
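The technique ID check in the first example can be expressed as a single regular expression. This is a sketch of one way to do it, assuming the accepted format is exactly the one in the error message (T1059 or T1059.001); the function name and the case/whitespace normalization are assumptions, not confirmed SigmaGen behavior.

```python
import re

# "T" + four digits, optionally followed by "." + three digits (sub-technique).
TECHNIQUE_ID = re.compile(r"T\d{4}(?:\.\d{3})?")


def parse_technique_id(raw):
    """Return the normalized ID, or raise ValueError with guidance."""
    candidate = raw.strip().upper()  # normalization is an assumption
    if not TECHNIQUE_ID.fullmatch(candidate):
        raise ValueError(
            f"'{raw}' is not a valid ATT&CK technique ID. "
            "Expected format: T1059 or T1059.001"
        )
    return candidate
```

Using `fullmatch` rather than `match` rejects trailing garbage such as T1059.0011 instead of silently accepting a prefix.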
CLI Reference
sigmagen
├── init First-run setup wizard
├── ingest
│ ├── attack Download and embed ATT&CK techniques [--force]
│ ├── sigma Clone and embed Sigma rules [--force] [--full-corpus]
│ └── all Both [--force] [--full-corpus]
├── generate Generate Sigma rules via RAG + LLM
│ ├── -t T1059.001 by technique ID (tab-completable)
│ ├── -d "description" by free text
│ ├── -T ./log.xml by telemetry file
│ ├── -o ./output output directory
│ ├── -n 3 number of rules (1-5)
│ └── -p openai override LLM provider
├── retrieve Search the knowledge base -q "query" [-n 5]
├── validate Validate a Sigma YAML file <path>
├── status Dashboard: collections + config
├── serve Start REST API server [--host] [--port]
└── setup-shell Enable tab completion [bash|zsh|fish|powershell]
REST API
Start with sigmagen serve. Swagger docs at http://localhost:8000/docs.
| Method | Path | Description |
|---|---|---|
| POST | /generate | Generate Sigma rules |
| GET | /retrieve?q=... | Search the knowledge base |
| GET | /status | Collection stats + config |
| GET | /health | Health check |
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{"technique_id": "T1059.001", "n_rules": 2}'
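The same call can be made from Python with only the standard library. The sketch below mirrors the curl command above; the helper names are hypothetical, and the shape of the JSON response is not documented here, so it is returned undecoded beyond JSON parsing.

```python
import json
import urllib.request


def build_generate_request(technique_id, n_rules=1, base_url="http://localhost:8000"):
    """Build the POST /generate request without sending it."""
    payload = json.dumps({"technique_id": technique_id, "n_rules": n_rules})
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(technique_id, n_rules=1):
    """Send the request to a running `sigmagen serve` and return the parsed JSON."""
    request = build_generate_request(technique_id, n_rules)
    # Generous timeout (an arbitrary choice): the server calls an LLM per request.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the payload easy to inspect or unit-test without a server running.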
Tech Stack
| Component | Technology | Why |
|---|---|---|
| CLI | Click + Rich | Subcommands, tab completion, syntax-highlighted output |
| Vector store | ChromaDB (local, persistent) | No external database needed |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Local, no API key, fast |
| LLM | Anthropic Claude / OpenAI GPT | Swappable via env var |
| API | FastAPI + Pydantic | Type-safe, auto-documented |
| Validation | Pure Python + optional sigma-cli | Zero external deps for core validation |
No LangChain. No LlamaIndex. The RAG pipeline is built directly on ChromaDB queries.
Knowledge Bases
| Source | Documents | What's embedded |
|---|---|---|
| MITRE ATT&CK Enterprise | 691 techniques | ID, name, tactics, platforms, data sources, detection guidance |
| SigmaHQ/sigma | 3110 rules | Title, description, logsource, detection logic, level, technique tags |
Both are downloaded and embedded locally during sigmagen ingest all. No data leaves your machine except the LLM API call during generation.
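For a sense of what the ATT&CK side of ingestion has to parse, here is a minimal sketch that pulls the fields listed in the table out of a STIX bundle that is already loaded as a dict. It is not SigmaGen's ingestion code. The property names (`external_references`, `kill_chain_phases`, `x_mitre_platforms`, `x_mitre_data_sources`, `x_mitre_detection`) are the ones used in ATT&CK Enterprise STIX bundles, though newer ATT&CK releases may restructure detection data; skipping revoked and deprecated objects is an assumption.

```python
def extract_techniques(bundle):
    """Return one flat dict per active technique in an ATT&CK STIX bundle."""
    techniques = []
    for obj in bundle.get("objects", []):
        if obj.get("type") != "attack-pattern":
            continue  # bundles also contain groups, software, relationships, ...
        if obj.get("revoked") or obj.get("x_mitre_deprecated"):
            continue  # assumption: retired techniques are not embedded

        # The technique ID (e.g. T1059.001) lives in the mitre-attack reference.
        attack_id = next(
            (
                ref["external_id"]
                for ref in obj.get("external_references", [])
                if ref.get("source_name") == "mitre-attack" and "external_id" in ref
            ),
            None,
        )
        if attack_id is None:
            continue

        techniques.append(
            {
                "id": attack_id,
                "name": obj.get("name", ""),
                "tactics": [
                    phase["phase_name"]
                    for phase in obj.get("kill_chain_phases", [])
                    if phase.get("kill_chain_name") == "mitre-attack"
                ],
                "platforms": obj.get("x_mitre_platforms", []),
                "data_sources": obj.get("x_mitre_data_sources", []),
                "detection": obj.get("x_mitre_detection", ""),
            }
        )
    return techniques
```

Each resulting dict can then be rendered to text and embedded, with the ID and tactics kept as metadata so technique lookups can filter on them exactly.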
Environment Variables
| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | anthropic | anthropic or openai |
| ANTHROPIC_API_KEY | (none) | Required if provider is anthropic |
| ANTHROPIC_MODEL | claude-sonnet-4-6 | Claude model ID |
| OPENAI_API_KEY | (none) | Required if provider is openai |
| OPENAI_MODEL | gpt-4o | OpenAI model ID |
| EMBEDDING_MODEL | all-MiniLM-L6-v2 | Local embedding model |
| SIGMAGEN_DATA_DIR | ./data | Where ATT&CK JSON, Sigma repo, and ChromaDB live |
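The defaults in the table can be read as a small settings loader. This is a sketch of how those variables might be resolved, not SigmaGen's actual configuration code; the `Settings` class and `load_settings` function are hypothetical, while the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    api_key: Optional[str]  # None means the key for the chosen provider is unset
    model: str
    embedding_model: str
    data_dir: str


def load_settings(env=None):
    """Resolve settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "anthropic").lower()
    if provider not in ("anthropic", "openai"):
        raise ValueError(f"LLM_PROVIDER must be anthropic or openai, got {provider!r}")

    if provider == "anthropic":
        api_key = env.get("ANTHROPIC_API_KEY")
        model = env.get("ANTHROPIC_MODEL", "claude-sonnet-4-6")
    else:
        api_key = env.get("OPENAI_API_KEY")
        model = env.get("OPENAI_MODEL", "gpt-4o")

    return Settings(
        llm_provider=provider,
        api_key=api_key,
        model=model,
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        data_dir=env.get("SIGMAGEN_DATA_DIR", "./data"),
    )
```

Accepting the environment as a parameter makes the loader testable with a plain dict instead of mutating `os.environ`.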
Contributing
git clone https://github.com/sigmagen-project/sigmagen.git
cd sigmagen
pip install -e ".[dev]"
pytest tests/ -v # 38 tests, all passing
- Fork the repo
- Create a feature branch
- Make sure tests pass
- Submit a PR
License
MIT