AI-powered Sigma rule generator using MITRE ATT&CK and RAG

SigmaGen

Generate production-ready Sigma detection rules from MITRE ATT&CK technique IDs or raw attack telemetry — powered by RAG.


The Problem

CISA and the major threat intel frameworks point to the same gap in enterprise security: the time between a new technique appearing in the wild and a detection rule landing in your SIEM. Most SOC teams don't have enough detection engineers, and writing a high-quality Sigma rule from scratch, with the right logsource, field mappings, false positive filters, and ATT&CK tags, takes 30 to 60 minutes per technique.

The Solution

SigmaGen closes this gap. Give it a technique ID or describe the attack behavior, and it generates deployable Sigma YAML in seconds:

$ sigmagen generate --technique T1059.001
──────────────────── SigmaGen Rule Generation ────────────────────
Retrieving context from knowledge base...
  Retrieved 5 techniques, 5 existing rules
Generating rules via anthropic...
  LLM returned 3 rule(s)
──────────────────────── Rule 1 ────────────────────────────────
  title: Suspicious PowerShell Encoded Command Execution
  id: 7f3a2c1e-84b6-4d9f-a031-5e8c7b2f9d14
  status: experimental
  logsource:
    category: process_creation
    product: windows
  detection:
    selection_image:
      Image|endswith:
        - '\powershell.exe'
        - '\pwsh.exe'
    selection_encoded:
      CommandLine|contains:
        - ' -EncodedCommand '
        - ' -enc '
        - ' -EC '
    filter_known_tools:
      ParentImage|endswith: '\msiexec.exe'
    condition: selection_image and selection_encoded
              and not filter_known_tools
  level: medium

  Validation: PASSED

──────────────────────── Summary ───────────────────────────────
  3 rules generated  |  3 passed  |  0 failed
  Output: output/
    v suspicious_powershell_encoded_command_execution.yml   [medium]
    v powershell_suspicious_download_cradle_execution.yml   [high]
    v powershell_amsi_bypass_attempt_detected.yml           [high]

Every generated rule includes specific detection logic with field-value conditions, false positive filters, ATT&CK tags, and logsource mappings — not generic templates.
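
To make the detection logic of Rule 1 concrete, here is a minimal sketch of how its condition evaluates against a single process_creation event. It is not SigmaGen code or a Sigma backend: the helper names and the sample events are hypothetical, and it assumes Sigma's usual semantics (values in a list are OR'ed, string matching is case-insensitive).

```python
def endswith_any(value, suffixes):
    """Sigma |endswith: true if the value ends with any listed suffix."""
    value = value.lower()
    return any(value.endswith(s.lower()) for s in suffixes)


def contains_any(value, needles):
    """Sigma |contains: true if any listed substring occurs in the value."""
    value = value.lower()
    return any(n.lower() in value for n in needles)


def matches_rule_1(event):
    """Evaluate Rule 1's condition against one event given as a dict."""
    selection_image = endswith_any(
        event.get("Image", ""), ["\\powershell.exe", "\\pwsh.exe"]
    )
    selection_encoded = contains_any(
        event.get("CommandLine", ""), [" -EncodedCommand ", " -enc ", " -EC "]
    )
    filter_known_tools = endswith_any(
        event.get("ParentImage", ""), ["\\msiexec.exe"]
    )
    # condition: selection_image and selection_encoded and not filter_known_tools
    return selection_image and selection_encoded and not filter_known_tools


# Hypothetical events for illustration only.
suspicious = {
    "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    "CommandLine": "powershell.exe -NoProfile -enc SQBFAFgA",
    "ParentImage": r"C:\Windows\System32\cmd.exe",
}
installer = dict(suspicious, ParentImage=r"C:\Windows\System32\msiexec.exe")

print(matches_rule_1(suspicious))  # True
print(matches_rule_1(installer))   # False
```

The second event is identical except for its parent process, which shows how the filter_known_tools block suppresses a likely false positive without weakening the two selections.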


How It Works

SigmaGen is not a wrapper around "write me a Sigma rule." It's a RAG pipeline that retrieves real ATT&CK detection guidance and existing Sigma rules from a local vector store, then uses that context to ground the LLM's output in production patterns.

          User Input                    Knowledge Base
     (T1059.001 or text)            ┌──────────────────┐
              │                     │  ATT&CK (691)    │
              ▼                     │  Sigma  (3110)   │
     ┌────────────────┐             └────────┬─────────┘
     │   Retriever    │◄────────────────────┘
     │  (ChromaDB)    │   cosine similarity
     └───────┬────────┘   + metadata filter
             │
             ▼
     ┌────────────────┐
     │ Prompt Builder │  packs ATT&CK context
     │                │  + 3 best Sigma examples
     └───────┬────────┘
             │
             ▼
     ┌────────────────┐
     │  Claude / GPT  │  generates 1-3 rules
     └───────┬────────┘
             │
             ▼
     ┌────────────────┐
     │   Validator    │  schema + field checks
     └───────┬────────┘
             │
             ▼
       .yml files in output/

Ingestion — The ATT&CK STIX bundle (690+ techniques with detection guidance, data sources, tactics) and SigmaHQ's stable rules (3100+ community rules) are parsed and embedded into ChromaDB using all-MiniLM-L6-v2.
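
As a rough illustration of the parsing half of ingestion, the sketch below flattens an ATT&CK STIX bundle into per-technique records and the text that would be embedded. It is an assumption about how this could be done, not SigmaGen's actual parser: the function names are hypothetical, the field names follow ATT&CK STIX 2.x bundles as commonly published, and the embedding and ChromaDB steps are left out so the example runs with the standard library alone.

```python
def parse_attack_bundle(bundle):
    """Turn a loaded ATT&CK STIX bundle (a dict) into flat technique records."""
    records = []
    for obj in bundle.get("objects", []):
        if obj.get("type") != "attack-pattern":
            continue  # skip groups, software, relationships, etc.
        if obj.get("revoked") or obj.get("x_mitre_deprecated"):
            continue  # skip techniques that are no longer current
        # The technique ID is stored on the "mitre-attack" external reference.
        technique_id = next(
            (ref.get("external_id")
             for ref in obj.get("external_references", [])
             if ref.get("source_name") == "mitre-attack"),
            None,
        )
        if technique_id is None:
            continue
        records.append({
            "id": technique_id,
            "name": obj.get("name", ""),
            "tactics": [p["phase_name"] for p in obj.get("kill_chain_phases", [])],
            "platforms": obj.get("x_mitre_platforms", []),
            "data_sources": obj.get("x_mitre_data_sources", []),
            "detection": obj.get("x_mitre_detection", ""),
        })
    return records


def to_document(record):
    """Build the text that would be handed to the embedding model."""
    return "\n".join([
        f"{record['id']} {record['name']}",
        f"Tactics: {', '.join(record['tactics'])}",
        f"Platforms: {', '.join(record['platforms'])}",
        f"Data sources: {', '.join(record['data_sources'])}",
        f"Detection: {record['detection']}",
    ])


# A tiny hand-written bundle for illustration; real bundles are much larger.
sample_bundle = {
    "type": "bundle",
    "objects": [
        {
            "type": "attack-pattern",
            "name": "PowerShell",
            "external_references": [
                {"source_name": "mitre-attack", "external_id": "T1059.001"}
            ],
            "kill_chain_phases": [
                {"kill_chain_name": "mitre-attack", "phase_name": "execution"}
            ],
            "x_mitre_platforms": ["Windows"],
            "x_mitre_data_sources": ["Process: Process Creation"],
            "x_mitre_detection": "Monitor for encoded PowerShell commands.",
        },
        {"type": "attack-pattern", "name": "Old technique", "revoked": True},
        {"type": "intrusion-set", "name": "Some group"},
    ],
}

techniques = parse_attack_bundle(sample_bundle)
print(to_document(techniques[0]))
```

Keeping the structured fields alongside the embedded text is what makes an exact technique-ID lookup possible later, since the ID can be stored as metadata next to the vector.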

Retrieval — Technique IDs hit an exact metadata filter first, then semantic similarity for related context. Free-text queries use pure semantic search across both collections. Results are deduplicated and ranked.
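
The two-stage lookup described above can be sketched without a vector store. The example below is a simplified stand-in, not the project's retriever: it uses hand-written three-dimensional vectors in place of all-MiniLM-L6-v2 embeddings, and the function and field names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, technique_id=None, k=5):
    """Exact metadata matches first, then semantic neighbours, deduplicated."""
    exact = [d for d in docs if technique_id and d["technique"] == technique_id]
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    results, seen = [], set()
    for doc in exact + ranked:
        if doc["id"] not in seen:  # drop documents already returned
            seen.add(doc["id"])
            results.append(doc)
    return results[:k]


# Toy documents with made-up vectors, for illustration only.
docs = [
    {"id": "a", "technique": "T1059.001", "vec": [0.2, 0.9, 0.1]},
    {"id": "b", "technique": "T1059.003", "vec": [0.9, 0.1, 0.0]},
    {"id": "c", "technique": "T1003.001", "vec": [0.0, 0.1, 0.9]},
]
query = [1.0, 0.0, 0.0]

print([d["id"] for d in retrieve(query, docs, technique_id="T1059.001")])
print([d["id"] for d in retrieve(query, docs)])
```

With a technique ID, document "a" is returned first even though "b" is the closest vector, which mirrors the "exact filter first, then similarity" ordering. Without one, the ranking is purely by similarity.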

Generation — The prompt includes the ATT&CK technique's detection guidance, data sources, and platforms, plus up to 3 existing Sigma rules as structural examples. The system prompt enforces specific detection conditions — no selection: * or match-all patterns.
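
A prompt of this shape could be assembled roughly as follows. This is only a sketch under stated assumptions: the function name, the record layout, and the wording of the instructions are hypothetical, and the real system prompt is not reproduced here.

```python
def build_prompt(technique, sigma_examples, n_rules=1, max_examples=3):
    """Assemble a grounded user prompt from ATT&CK context and example rules."""
    examples = sigma_examples[:max_examples]  # keep only the best-ranked examples
    lines = [
        f"Write {n_rules} Sigma rule(s) for {technique['id']} ({technique['name']}).",
        f"Platforms: {', '.join(technique['platforms'])}",
        f"Data sources: {', '.join(technique['data_sources'])}",
        f"Detection guidance: {technique['detection']}",
        "",
        "Use these existing rules as structural examples only:",
    ]
    for index, example in enumerate(examples, start=1):
        lines.append(f"--- Example {index} ---")
        lines.append(example.strip())
    lines.append("")
    lines.append(
        "Every selection must name concrete fields and values; "
        "do not use match-all patterns."
    )
    return "\n".join(lines)


# Hypothetical inputs for illustration.
technique = {
    "id": "T1059.001",
    "name": "PowerShell",
    "platforms": ["Windows"],
    "data_sources": ["Process: Process Creation"],
    "detection": "Monitor for encoded PowerShell commands.",
}
retrieved_rules = [
    "title: Example A",
    "title: Example B",
    "title: Example C",
    "title: Example D",
]

prompt = build_prompt(technique, retrieved_rules, n_rules=2)
print(prompt)
```

Capping the examples keeps the prompt small and stops the model from simply copying a long list of existing rules; the fourth retrieved rule above is dropped.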

Validation — Every rule is checked for required fields, valid levels/statuses, UUID format, logsource structure, detection logic (must have named selections + condition), and ATT&CK tags.
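
The checks listed above could look roughly like the sketch below, which validates a rule that has already been parsed from YAML into a dict. It is an illustration, not SigmaGen's validator: the function name and error messages are hypothetical, the allowed levels and statuses are taken from the Sigma specification, and `uuid.UUID` is more lenient than a strict format check (it also accepts IDs without hyphens).

```python
import re
import uuid

REQUIRED_FIELDS = ("title", "id", "status", "logsource", "detection", "level")
VALID_LEVELS = {"informational", "low", "medium", "high", "critical"}
VALID_STATUSES = {"stable", "test", "experimental", "deprecated", "unsupported"}
ATTACK_TAG = re.compile(r"attack\.t\d{4}(\.\d{3})?")


def validate_rule(rule):
    """Return a list of problems found in a parsed Sigma rule (empty if none)."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in rule]

    if "level" in rule and rule["level"] not in VALID_LEVELS:
        errors.append(f"invalid level: {rule['level']}")
    if "status" in rule and rule["status"] not in VALID_STATUSES:
        errors.append(f"invalid status: {rule['status']}")

    if "id" in rule:
        try:
            uuid.UUID(str(rule["id"]))
        except ValueError:
            errors.append("id is not a valid UUID")

    if "logsource" in rule:
        logsource = rule["logsource"]
        if not isinstance(logsource, dict) or not logsource:
            errors.append("logsource must be a non-empty mapping")

    if "detection" in rule:
        detection = rule["detection"]
        if not isinstance(detection, dict):
            errors.append("detection must be a mapping")
        else:
            if "condition" not in detection:
                errors.append("detection has no condition")
            if not [name for name in detection if name != "condition"]:
                errors.append("detection has no named selections")

    tags = rule.get("tags", [])
    if not any(ATTACK_TAG.fullmatch(str(tag)) for tag in tags):
        errors.append("no ATT&CK technique tag (e.g. attack.t1059.001)")
    return errors


# Hypothetical rule for illustration; the tags are not shown in the sample output.
good_rule = {
    "title": "Suspicious PowerShell Encoded Command Execution",
    "id": "7f3a2c1e-84b6-4d9f-a031-5e8c7b2f9d14",
    "status": "experimental",
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {
        "selection_image": {"Image|endswith": ["\\powershell.exe"]},
        "condition": "selection_image",
    },
    "level": "medium",
    "tags": ["attack.execution", "attack.t1059.001"],
}
bad_rule = dict(good_rule, id="not-a-uuid", level="severe", detection={"condition": "x"})

print(validate_rule(good_rule))
print(validate_rule(bad_rule))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a generated rule at once.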


Quickstart

Install

git clone https://github.com/sigmagen-project/sigmagen.git
cd sigmagen
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

Setup

The setup wizard handles everything:

$ sigmagen init
╭─ SigmaGen Setup Wizard ─╮
╰────────────────────────╯

Step 1/3  Checking API key...
  v ANTHROPIC_API_KEY is set (provider: anthropic)

Step 2/3  Checking knowledge base...
  v ATT&CK techniques: 691
  v Sigma rules: 3110
  Knowledge base is ready.

Step 3/3  Verifying setup...
  v Everything looks good!

╭─ Next steps ──────────────────────────────────────────╮
│ Ready to generate.                                    │
│                                                       │
│   sigmagen generate --technique T1059.001               │
│   sigmagen generate --description "certutil download"   │
╰───────────────────────────────────────────────────────╯

Or manually:

cp .env.example .env          # add your ANTHROPIC_API_KEY or OPENAI_API_KEY
sigmagen ingest all              # one-time, ~5 min

Generate

Three ways to generate rules:

# By technique ID (supports tab completion)
sigmagen generate --technique T1059.001

# By description
sigmagen generate --description "attacker used certutil to download a payload"

# Interactive — just run it, get prompted
sigmagen generate
  Input type (technique, description, telemetry): technique
  Technique ID (e.g. T1059.001): T1059.001
  How many rules? (1-5) [1]: 3

Search the Knowledge Base

Query what ATT&CK techniques and existing Sigma rules match your input:

$ sigmagen retrieve --query "credential dumping lsass"
             ATT&CK Techniques
┌───────────┬───────────────────────┬───────────────────┬───────┐
│ ID        │ Name                  │ Tactics           │ Score │
├───────────┼───────────────────────┼───────────────────┼───────┤
│ T1003.001 │ LSASS Memory          │ credential-access │ 0.631 │
│ T1003.004 │ LSA Secrets           │ credential-access │ 0.607 │
│ T1547.008 │ LSASS Driver          │ persistence       │ 0.540 │
│ T1003     │ OS Credential Dumping │ credential-access │ 0.512 │
│ T1110.004 │ Credential Stuffing   │ credential-access │ 0.480 │
└───────────┴───────────────────────┴───────────────────┴───────┘
                 Sigma Rules
┌─────────────────────────────────────┬──────────┬───────────┬───────┐
│ Title                               │ Level    │ Technique │ Score │
├─────────────────────────────────────┼──────────┼───────────┼───────┤
│ Credential Dumping Via LSASS        │ medium   │ T1003.001 │ 0.787 │
│ LSASS Process Clone                 │ critical │ T1003     │ 0.765 │
│ Credential Dumping By Python Tool   │ high     │ T1003.001 │ 0.763 │
│ LSASS SilentProcessExit Technique   │ critical │ T1003.001 │ 0.741 │
│ Password Dumper Activity on LSASS   │ high     │ T1003.001 │ 0.739 │
└─────────────────────────────────────┴──────────┴───────────┴───────┘

Validate

$ sigmagen validate output/suspicious_powershell_encoded_command_execution.yml
──── Validating suspicious_powershell_encoded_command_execution.yml ────
Validation PASSED

Status Dashboard

$ sigmagen status
SigmaGen v0.1.0
───────────────────────────────────────────────────────
Knowledge Base     Documents  Status
attack_techniques        691  v Ready
sigma_rules             3110  v Ready

LLM Provider     anthropic
API Key          v Set
Model            claude-sonnet-4-6
Embedding Model  all-MiniLM-L6-v2
───────────────────────────────────────────────────────
Ready to generate. Run: sigmagen generate --technique T1059.001

Error Handling

Invalid inputs are caught early with clear guidance:

$ sigmagen generate --technique fdsb
x 'fdsb' is not a valid ATT&CK technique ID.
  Expected format: T1059 or T1059.001

$ sigmagen generate --technique T1059.001   # before running ingest
x Knowledge base is not ready.
  - attack_techniques collection is empty
  Run: sigmagen ingest all

$ sigmagen generate                          # without an API key
x No API key found.
  Add ANTHROPIC_API_KEY to your .env file.
  Run: cp .env.example .env
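
The "Expected format" message in the first case suggests a simple pattern check. A minimal sketch, assuming the accepted form is exactly an uppercase T, four digits, and an optional three-digit sub-technique suffix (the regex and function name are assumptions, not taken from the SigmaGen source):

```python
import re

# T1059 or T1059.001: "T", four digits, optional ".", three digits.
TECHNIQUE_ID = re.compile(r"T\d{4}(\.\d{3})?")


def is_valid_technique_id(value):
    """Return True if the whole string looks like an ATT&CK technique ID."""
    return TECHNIQUE_ID.fullmatch(value) is not None


for candidate in ("T1059", "T1059.001", "fdsb", "T1059.1"):
    print(candidate, is_valid_technique_id(candidate))
```

A format check like this only rejects malformed input early; whether the ID actually exists still has to be answered by the knowledge base.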

CLI Reference

sigmagen
├── init             First-run setup wizard
├── ingest
│   ├── attack       Download and embed ATT&CK techniques  [--force]
│   ├── sigma        Clone and embed Sigma rules           [--force] [--full-corpus]
│   └── all          Both                                  [--force] [--full-corpus]
├── generate         Generate Sigma rules via RAG + LLM
│   ├── -t T1059.001       by technique ID (tab-completable)
│   ├── -d "description"   by free text
│   ├── -T ./log.xml       by telemetry file
│   ├── -o ./output        output directory
│   ├── -n 3               number of rules (1-5)
│   └── -p openai          override LLM provider
├── retrieve         Search the knowledge base             -q "query" [-n 5]
├── validate         Validate a Sigma YAML file            <path>
├── status           Dashboard: collections + config
├── serve            Start REST API server                 [--host] [--port]
└── setup-shell      Enable tab completion                 [bash|zsh|fish|powershell]

REST API

Start the server with sigmagen serve. Swagger docs are available at http://localhost:8000/docs.

Method  Path             Description
POST    /generate        Generate Sigma rules
GET     /retrieve?q=...  Search the knowledge base
GET     /status          Collection stats + config
GET     /health          Health check

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"technique_id": "T1059.001", "n_rules": 2}'
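
The same request can be built from Python with the standard library. This sketch mirrors the curl call above; the helper name is hypothetical, and because the response schema is not documented here, the example stops at constructing the request and leaves the actual call commented out.

```python
import json
import urllib.request


def build_generate_request(technique_id, n_rules=1, base_url="http://localhost:8000"):
    """Build a POST /generate request with the JSON body shown in the curl example."""
    payload = json.dumps({"technique_id": technique_id, "n_rules": n_rules})
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("T1059.001", n_rules=2)
print(request.get_method(), request.full_url)

# Sending the request needs a running `sigmagen serve`; uncomment to try it.
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```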

Tech Stack

Component     Technology                                Why
CLI           Click + Rich                              Subcommands, tab completion, syntax-highlighted output
Vector store  ChromaDB (local, persistent)              No external database needed
Embeddings    sentence-transformers (all-MiniLM-L6-v2)  Local, no API key, fast
LLM           Anthropic Claude / OpenAI GPT             Swappable via env var
API           FastAPI + Pydantic                        Type-safe, auto-documented
Validation    Pure Python + optional sigma-cli          Zero external deps for core validation

No LangChain. No LlamaIndex. The RAG pipeline is built directly on ChromaDB queries.


Knowledge Bases

Source                   Documents       What's embedded
MITRE ATT&CK Enterprise  691 techniques  ID, name, tactics, platforms, data sources, detection guidance
SigmaHQ/sigma            3110 rules      Title, description, logsource, detection logic, level, technique tags

Both are downloaded and embedded locally during sigmagen ingest all. No data leaves your machine except the LLM API call during generation.


Environment Variables

Variable           Default            Description
LLM_PROVIDER       anthropic          anthropic or openai
ANTHROPIC_API_KEY  (none)             Required if provider is anthropic
ANTHROPIC_MODEL    claude-sonnet-4-6  Claude model ID
OPENAI_API_KEY     (none)             Required if provider is openai
OPENAI_MODEL       gpt-4o             OpenAI model ID
EMBEDDING_MODEL    all-MiniLM-L6-v2   Local embedding model
SIGMAGEN_DATA_DIR  ./data             Where ATT&CK JSON, Sigma repo, and ChromaDB live
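
To show how these variables and defaults fit together, here is a small sketch of a settings loader. It is an assumption about one reasonable way to read them, not the project's configuration code: the `Settings` class and `load_settings` function are hypothetical, and .env file loading is left out.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    model: str
    api_key: str
    embedding_model: str
    data_dir: str


def load_settings(env=None):
    """Resolve settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "anthropic")
    if provider == "anthropic":
        key_var = "ANTHROPIC_API_KEY"
        model = env.get("ANTHROPIC_MODEL", "claude-sonnet-4-6")
    elif provider == "openai":
        key_var = "OPENAI_API_KEY"
        model = env.get("OPENAI_MODEL", "gpt-4o")
    else:
        raise ValueError(f"LLM_PROVIDER must be anthropic or openai, got {provider!r}")

    api_key = env.get(key_var, "")
    if not api_key:
        # Only the key for the selected provider is required.
        raise ValueError(f"No API key found. Add {key_var} to your .env file.")

    return Settings(
        llm_provider=provider,
        model=model,
        api_key=api_key,
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        data_dir=env.get("SIGMAGEN_DATA_DIR", "./data"),
    )


# Placeholder key for illustration; pass nothing to read the real environment.
settings = load_settings({"ANTHROPIC_API_KEY": "sk-placeholder"})
print(settings.llm_provider, settings.model, settings.data_dir)
```

Passing the environment as a plain mapping keeps the loader easy to test and makes the provider switch explicit: changing LLM_PROVIDER changes which key and model variables are consulted.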

Contributing

git clone https://github.com/sigmagen-project/sigmagen.git
cd sigmagen
pip install -e ".[dev]"
pytest tests/ -v   # 38 tests, all passing

  1. Fork the repo
  2. Create a feature branch
  3. Make sure tests pass
  4. Submit a PR

License

MIT
