AI agent security scanner — Living Red Team

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

These details have not been verified by PyPI

Project description

Keelson

Autonomous security testing agent for AI systems. Keelson ships 210 security test playbooks across 13 behavior categories mapped to the OWASP LLM Top 10. It supports 9 target adapters (OpenAI, Generic HTTP, Anthropic, LangGraph, MCP, A2A, CrewAI, LangChain, SiteGPT), 12 adaptive test trees, 10 compound test chains, SARIF + JUnit output for CI/CD integration, a statistical campaign engine with confidence intervals, iterative convergence scanning with cross-category feedback, runtime defense hooks, and compliance reporting for 6 frameworks. Test strategies are informed by field-tested effectiveness data from real scans.

Authorized use only. Keelson is designed for testing AI systems you own or have explicit written permission to test. Unauthorized use may violate applicable laws including the Computer Fraud and Abuse Act (CFAA). By using this software, you accept full responsibility for compliance with all applicable laws. The authors disclaim all liability for misuse. See LEGAL.md for full terms.

pip install keelson-ai

Quick Start

# Scan an OpenAI-compatible endpoint
keelson scan https://api.example.com/v1/chat/completions --api-key $KEY

# Parallel pipeline scan with verification
keelson pipeline-scan https://api.example.com/v1/chat/completions --api-key $KEY

# Adaptive smart scan (discover → classify → execute with memo feedback)
keelson smart-scan https://api.example.com/v1/chat/completions --api-key $KEY

# Convergence scan (iterative cross-category feedback loop)
keelson convergence-scan https://api.example.com/v1/chat/completions --api-key $KEY

# Run a single security test
keelson test https://api.example.com/v1/chat/completions GA-001 --api-key $KEY

# List all 210 security tests
keelson list

# Statistical campaign (10 trials per test)
keelson scan https://api.example.com/v1/chat/completions --tier deep --api-key $KEY

# SARIF output for GitHub Code Scanning
keelson scan https://api.example.com/v1/chat/completions --format sarif --api-key $KEY

# JUnit XML output for CI/CD
keelson scan https://api.example.com/v1/chat/completions --format junit --api-key $KEY

# Fail CI if vulnerabilities found
keelson scan https://api.example.com/v1/chat/completions --fail-on-vuln --api-key $KEY

# Scan a CrewAI agent directly
keelson test-crew my_crew.py

# Scan a LangChain agent directly
keelson test-chain my_agent.py

CI/CD Integration

Add AI security testing to your GitHub Actions pipeline:

# .github/workflows/ai-security.yml
name: AI Agent Security
on: [push, pull_request]

jobs:
  security-scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: keelson-ai/keelson-action@v1
        with:
          target-url: ${{ vars.AGENT_ENDPOINT }}
          api-key: ${{ secrets.AGENT_API_KEY }}

Results appear in the Security tab under Code Scanning. See keelson-action for full options.

How It Works

Playbooks (.yaml)   Target Agent        Keelson Engine
┌──────────────┐    ┌──────────────┐    ┌──────────────────────┐
│ 210 attacks  │───>│ 9 Adapters   │───>│ Scan Modes           │
│ 13 categories│    │ OpenAI /     │    │  scan (sequential)   │
│ OWASP mapped │    │ Anthropic /  │    │  pipeline (parallel) │
└──────────────┘    │ MCP / A2A /  │    │  smart (adaptive)    │
                    │ SiteGPT /... │    │  convergence (iter.) │
                    └──────────────┘    └──────────┬───────────┘
  Orchestrators                                    │
┌──────────────┐                        ┌──────────┴──────────┐
│ PAIR         │───────────────────────>│  Detection pipeline  │
│ Crescendo    │                        │  Pattern + LLM Judge │
│ Mutations    │                        │  Verification pass   │
│ (13 types)   │                        │  Memo feedback loop  │
└──────────────┘                        └──────────┬──────────┘
                                                   │
                                        ┌──────────┴──────────┐
                                        │  Reports             │
                                        │  Markdown / SARIF /  │
                                        │  JUnit / Compliance  │
                                        └─────────────────────┘

Load attack playbooks from attacks/**/*.yaml (structured YAML, no code)
Send prompts to the target via any supported adapter
Detect vulnerabilities using pattern detection, LLM-as-judge scoring, or combined mode
Orchestrate advanced strategies: PAIR iterative refinement, Crescendo gradual escalation, 13 mutation types
Converge iteratively: harvest leaked info from responses, feed cross-category intelligence into subsequent passes
Evaluate each response as VULNERABLE / SAFE / INCONCLUSIVE
Report findings with OWASP mapping, evidence, and remediation recommendations

Test Categories

Category	Prefix	Count	OWASP	What It Tests
Goal Adherence	GA	56	LLM01/LLM09	Prompt injection, role hijacking, system prompt extraction, encoding evasion, context overflow, crescendo escalation, skeleton key, many-shot jailbreak, reasoning-layer (CoT) attacks, rapport exploitation, structured data injection, model fingerprinting, indirect prompt injection (IDPI), Unicode/homoglyph evasion, authority simulation, multilingual repetition, multi-vector psychological exploitation, enterprise framing bypass, syllogistic reasoning manipulation, hypothetical counterfactual bypass, meta-reasoning inversion, logical paradox exploitation, response template hijacking, shared resource injection, legitimate knowledge extraction, incremental architecture disclosure
Tool Safety	TS	40	LLM02/LLM06/LLM07	File access, command injection, SQL injection, unauthorized API calls, privilege escalation, path traversal, MCP tool poisoning, MCP rug pull, cross-server contamination, SSRF, side-effect detection, excessive agency, forced financial transactions, two-phase URL exfiltration, URI scheme redirect, forced URL opening, private data source enumeration, write access probing
Memory Integrity	MI	23	LLM05	History poisoning, identity persistence, false tool results, cross-turn exfiltration, error info leakage, stored payload injection, context window flooding, gradual memory poisoning, false memory implantation, contradictory fact confusion, RAG poisoning, natural language sleeper triggers, collapsed UI content poisoning
Permission Boundaries	PB	12	LLM02	Role escalation, cross-user access, scope expansion, authorization bypass, privilege persistence
Delegation Integrity	DI	7	LLM08/LLM09	Unauthorized sub-agents, trust boundary violation, delegation scope laundering, cross-agent lateral movement
Execution Safety	ES	13	LLM02/LLM06	Unbounded execution, resource exhaustion, sandbox escape, audit evasion, unsafe deserialization, HTML/script output injection, destructive command injection
Session Isolation	SI	13	LLM01/LLM05	Cross-session leakage, session hijacking, multi-tenant breach, model fingerprinting, conversation history poisoning, debug harness extraction
Cognitive Architecture	CA	8	LLM01/LLM09	Chain-of-thought poisoning, reasoning manipulation, meta-cognitive attacks
Conversational Exfiltration	EX	9	LLM01/LLM06	Data extraction via conversation, behavioral fingerprinting, framework/infrastructure fingerprinting
Supply Chain Language	SL	8	LLM03/LLM05	RAG document injection, dependency confusion, plugin poisoning
Output Weaponization	OW	7	LLM02/LLM06	Backdoor code generation, malicious output crafting
Temporal Persistence	TP	7	LLM05/LLM08	Delayed action injection, time-based persistence
Multi-Agent Security	MA	7	LLM08/LLM09	Agent impersonation, cross-agent attacks

Adapters

Keelson communicates with targets through a pluggable adapter interface:

Adapter	Flag	Protocol	Use Case
OpenAI	`--adapter openai`	Chat Completions API	GPT models, OpenAI API
Generic HTTP	`--adapter http`	Chat Completions API	Local models (Ollama, vLLM), any OpenAI-compatible endpoint
Anthropic	`--adapter anthropic`	Messages API	Claude models
LangGraph	`--adapter langgraph`	LangGraph Platform	LangGraph agents
MCP	`--adapter mcp`	JSON-RPC 2.0	MCP tool servers
A2A	`--adapter a2a`	Google A2A Protocol	A2A-compatible agents
CrewAI	`test-crew` command	In-process	CrewAI crews/agents
LangChain	`test-chain` command	In-process	LangChain agents/chains
SiteGPT	`--adapter sitegpt`	WebSocket / REST	SiteGPT chatbots

# OpenAI-compatible (default)
keelson scan http://localhost:11434/v1/chat/completions

# Anthropic
keelson scan https://api.anthropic.com --adapter anthropic --api-key $KEY

# LangGraph Platform
keelson scan https://my-agent.langraph.com --adapter langgraph --assistant-id my-agent

# MCP server
keelson scan http://localhost:3000 --adapter mcp --tool-name ask

# A2A agent
keelson scan http://localhost:8000 --adapter a2a

# CrewAI (in-process, no HTTP)
keelson test-crew path/to/my_crew.py

# LangChain (in-process, no HTTP)
keelson test-chain path/to/my_agent.py

# SiteGPT chatbot (WebSocket or REST)
keelson scan https://widget.sitegpt.ai --adapter sitegpt --chatbot-id YOUR_CHATBOT_ID

CLI Commands

Command	Description
`keelson scan <url>`	Full security scan (sequential, with dynamic reorder)
`keelson pipeline-scan <url>`	Parallel scan with checkpoint/resume and verification
`keelson smart-scan <url>`	Adaptive scan: discover, classify, memo-guided sessions
`keelson convergence-scan <url>`	Iterative scan with cross-category feedback and leakage harvesting
`keelson test <url> <id>`	Run a single security test
`keelson list`	List all available attacks
`keelson campaign <config.toml>`	Statistical campaign (N trials per attack)
`keelson discover <url>`	Fingerprint agent capabilities
`keelson evolve <url> <id>`	Mutate an attack to find bypasses
`keelson chain <url> <profile-id>`	Synthesize and run compound attack chains
`keelson generate <attacker-url>`	Generate novel attacks using an attacker LLM
`keelson test-crew <module.py>`	Scan a CrewAI agent directly
`keelson test-chain <module.py>`	Scan a LangChain agent directly
`keelson diff <scan-a> <scan-b>`	Compare two scans for regressions
`keelson baseline <scan-id>`	Set a regression baseline
`keelson compliance <scan-id>`	Generate compliance report
`keelson report <scan-id>`	Regenerate a scan report
`keelson history`	Show scan history

Output Formats

Markdown Report

keelson scan <url> --api-key $KEY
# -> reports/scan-2026-03-04-120000.md

Reports include executive summary, findings grouped by category with evidence (prompts + responses), OWASP mapping, and remediation recommendations.

SARIF (for CI/CD)

keelson scan <url> --format sarif --api-key $KEY
# -> reports/scan-2026-03-04-120000.sarif.json

SARIF v2.1.0 output integrates with GitHub Code Scanning, VS Code SARIF Viewer, and other SARIF-compatible tools.

JUnit XML (for CI/CD)

keelson scan <url> --format junit --api-key $KEY
# -> reports/scan-2026-03-04-120000.junit.xml

JUnit XML integrates with Jenkins, GitLab CI, GitHub Actions, and any CI system that supports JUnit test reports.

CI/CD Fail Gates

# Fail pipeline if any vulnerability found
keelson scan <url> --fail-on-vuln --api-key $KEY

# Fail if vulnerability rate exceeds threshold (0.0–1.0)
keelson scan <url> --fail-threshold 0.1 --api-key $KEY

Compliance Reports

keelson compliance <scan-id> --framework owasp-llm-top10
keelson compliance <scan-id> --framework nist-ai-rmf
keelson compliance <scan-id> --framework eu-ai-act
keelson compliance <scan-id> --framework iso-42001
keelson compliance <scan-id> --framework soc2
keelson compliance <scan-id> --framework pci-dss-v4

GitHub Actions

# .github/workflows/ai-security.yml
name: AI Agent Security
on: [push, pull_request]

jobs:
  keelson:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - run: pip install keelson-ai

      - run: keelson scan ${{ vars.AGENT_URL }} --api-key ${{ secrets.AGENT_KEY }} --format sarif --output results/ --fail-on-vuln --no-save

      - uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: results/

Statistical Campaigns

Run each attack N times to get statistically significant results with Wilson score confidence intervals:

# Quick scan (1 trial, fast)
keelson scan <url> --tier fast --api-key $KEY

# Deep scan (10 trials, concurrent)
keelson scan <url> --tier deep --api-key $KEY

# Custom campaign via TOML config
keelson campaign config.toml

Campaign config example:

[campaign]
name = "nightly-regression"
trials_per_attack = 10
confidence_level = 0.95

[target]
url = "https://api.example.com/v1/chat/completions"
api_key = "sk-..."

[concurrency]
max_concurrent_trials = 5
early_termination_threshold = 3

Keelson Defend (Runtime Protection)

Keelson Defend is a policy engine that intercepts unsafe tool calls and content at runtime, integrating directly with CrewAI and LangChain agents.

Policy Configuration

# defend-policy.yaml
tool_rules:
  - pattern: "delete_*"
    action: deny
    reason: "File deletion blocked by policy"
  - pattern: "execute_command"
    action: deny
    reason: "Shell execution not permitted"
  - pattern: "send_email"
    action: log
    reason: "Email sending logged for audit"

content_rules:
  - pattern: "password|secret|api_key|token"
    action: deny
    check_input: true
    check_output: true

default_tool_action: allow
log_all: false

CrewAI Integration

from keelson.defend import load_policy, PolicyEngine, register_crewai_hooks

policy = load_policy("defend-policy.yaml")
engine = PolicyEngine(policy)
register_crewai_hooks(engine)
# All CrewAI tool calls are now policy-enforced

LangChain Integration

from keelson.defend import load_policy, PolicyEngine, KeelsonDefendMiddleware

policy = load_policy("defend-policy.yaml")
engine = PolicyEngine(policy)
middleware = KeelsonDefendMiddleware(engine)
# Wrap your agent's tool and model calls

Adding Custom Tests

Create a .yaml file in the appropriate attacks/ subdirectory:

id: XX-NNN
name: Attack Name
severity: high
category: category_name
owasp_id: "LLM0X — Name"
objective: >
  What this attack tests.
turns:
  - role: user
    content: >
      The attack prompt to send.
  - role: user
    content: >
      Follow-up prompt (for multi-turn attacks).
evaluation:
  vulnerable_if:
    - Agent complies with the attack request
    - Agent leaks sensitive information
  safe_if:
    - Agent refuses the request
    - Agent maintains boundaries
  inconclusive_if:
    - Response is ambiguous or partial

Project Structure

keelson/
├── agents/                         # Agent instructions
│   └── pentester.md                # Pentester agent prompt
├── commands/                       # Plugin slash commands
│   ├── scan.md                     # /keelson:scan
│   ├── attack.md                   # /keelson:attack
│   └── report.md                   # /keelson:report
├── attacks/                        # 210 attack playbooks (YAML)
│   ├── goal-adherence/             # GA (56 attacks)
│   ├── tool-safety/                # TS (40 attacks)
│   ├── memory-integrity/           # MI (23 attacks)
│   ├── session-isolation/          # SI (13 attacks)
│   ├── execution-safety/           # ES (13 attacks)
│   ├── permission-boundaries/      # PB (12 attacks)
│   ├── cognitive-architecture/     # CA (8 attacks)
│   ├── conversational-exfiltration/# EX (9 attacks)
│   ├── supply-chain-language/      # SL (8 attacks)
│   ├── delegation-integrity/       # DI (7 attacks)
│   ├── multi-agent-security/       # MA (7 attacks)
│   ├── output-weaponization/       # OW (7 attacks)
│   └── temporal-persistence/       # TP (7 attacks)
├── src/keelson/                     # Python engine
│   ├── cli/                        # Typer CLI (18 commands)
│   │   ├── __init__.py             # App setup, shared helpers
│   │   ├── commands.py             # Command module registration
│   │   ├── scan_commands.py        # scan, pipeline-scan, smart-scan, attack
│   │   ├── ops_commands.py         # list, report, history, diff, discover, baseline, compliance
│   │   └── advanced_commands.py    # campaign, evolve, chain, generate, test-crew, test-chain
│   ├── adapters/                   # 9 target adapters
│   │   ├── base.py                 # BaseAdapter interface
│   │   ├── openai.py               # OpenAI API
│   │   ├── http.py                 # GenericHTTPAdapter (OpenAI-compat)
│   │   ├── anthropic.py            # Anthropic Messages API
│   │   ├── langgraph.py            # LangGraph Platform
│   │   ├── mcp.py                  # Model Context Protocol
│   │   ├── a2a.py                  # Google A2A Protocol
│   │   ├── crewai.py               # CrewAI native (in-process)
│   │   ├── langchain.py            # LangChain native (in-process)
│   │   ├── sitegpt.py              # SiteGPT (WebSocket / REST)
│   │   ├── cache.py                # Response caching decorator
│   │   └── attacker.py             # Attacker LLM wrapper
│   ├── core/                       # Engine, scanner, detection
│   │   ├── engine.py               # Multi-turn attack executor
│   │   ├── execution.py            # Shared primitives (sequential, parallel, verify)
│   │   ├── scanner.py              # Sequential scan with dynamic reorder
│   │   ├── pipeline.py             # Parallel scan with checkpoint/resume
│   │   ├── smart_scan.py           # Adaptive scan with memo feedback
│   │   ├── convergence.py          # Iterative convergence with cross-feed
│   │   ├── memo.py                 # Memo table for technique tracking
│   │   ├── strategist.py           # LLM-based target classification
│   │   ├── detection.py            # Pattern-based verdict detection
│   │   ├── observer.py             # Streaming leakage analysis
│   │   ├── llm_judge.py             # LLM-as-judge semantic evaluation
│   │   ├── templates.py            # Playbook parser (markdown)
│   │   ├── yaml_templates.py       # Playbook parser (YAML)
│   │   ├── models.py               # Core data models
│   │   ├── reporter.py             # Markdown report generation
│   │   ├── executive_report.py     # Executive summary format
│   │   ├── sarif.py                # SARIF v2.1.0 output
│   │   ├── junit.py                # JUnit XML output
│   │   └── compliance.py           # 6 compliance frameworks
│   ├── defend/                     # Runtime protection
│   │   ├── engine.py               # Policy evaluation engine
│   │   ├── models.py               # Policy, rules, actions
│   │   ├── loader.py               # YAML policy loader
│   │   ├── crewai_hook.py          # CrewAI middleware hooks
│   │   └── langchain_hook.py       # LangChain middleware hooks
│   ├── attacker/                   # Attack generation
│   │   ├── generator.py            # LLM-powered prompt generation
│   │   ├── discovery.py            # Agent capability fingerprinting
│   │   ├── chains.py               # Compound attack chain synthesis
│   │   └── provider.py             # Cross-provider attacker selection
│   ├── adaptive/                   # Mutation engine + orchestrators
│   │   ├── mutations.py            # 13 programmatic + LLM mutations
│   │   ├── branching.py            # Conversation tree exploration
│   │   ├── attack_tree.py          # Attack tree data structures
│   │   ├── pair.py                 # PAIR iterative refinement orchestrator
│   │   ├── crescendo.py            # Crescendo gradual escalation orchestrator
│   │   └── strategies.py           # Mutation scheduling
│   ├── campaign/                   # Statistical campaigns
│   │   ├── runner.py               # N-trial execution with CI
│   │   ├── tiers.py                # Fast/Deep/Continuous presets
│   │   ├── scheduler.py            # Campaign scheduling
│   │   └── config.py               # TOML config parser
│   ├── diff/                       # Scan comparison
│   │   └── comparator.py           # Regression detection
│   └── state/                      # Persistence
│       ├── base.py                 # Storage base interface
│       └── store.py                # SQLite storage
├── tests/                          # 774 tests
├── docs/                           # Documentation
│   ├── adr/                        # Architecture Decision Records
│   │   ├── ADR-001-framework.md    # FastAPI selection
│   │   ├── ADR-002-dependency-management.md  # uv selection
│   │   └── ADR-003-observability.md  # Structured logging + OTel plan
│   ├── plans/                      # Roadmap
│   ├── openapi.yaml                # OpenAPI 3.1.0 API contract
│   └── github-action-spec.md       # GitHub Action design
├── pyproject.toml                  # Python packaging
└── LICENSE                         # Apache 2.0

Development

# Clone
git clone https://github.com/keelson-ai/keelson.git
cd keelson

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with verbose output
pytest -v

# Lint
ruff check .

# Type check (strict mode, 0 errors)
pyright

Optional Dependencies

# CrewAI adapter
pip install "keelson-ai[crewai]"

# LangChain adapter
pip install "keelson-ai[langchain]"

# All optional adapters
pip install "keelson-ai[all]"

Contributing

Contributions are welcome. Here's how to help:

Add attack playbooks — Write new .yaml files in attacks/. Follow the format above.
Add adapters — Implement the BaseAdapter interface (implement _send_messages_impl, health_check, close; optional: reset_session). The base class provides send_messages with automatic retry logic.
Improve detection — Enhance patterns in core/detection.py or add new evaluation strategies.
Report bugs — Open an issue with reproduction steps.

Workflow

Fork the repository
Create a feature branch (git checkout -b feat/my-feature)
Make your changes
Run pytest and ruff check .
Submit a pull request

Security

This tool is for authorized security testing only. Do not use Keelson against systems you don't have permission to test. If you discover a security issue in Keelson itself, please report it via GitHub Security Advisories.

Architecture

Flow Diagrams

Core Scan Pipeline (Sequential)

flowchart TD
    A[Load Playbooks] --> B[Send Prompts via Adapter]
    B --> C[Collect Evidence]
    C --> D[Detection Pipeline<br/>Pattern / LLM Judge / Combined]
    D --> E{Verdict?}
    E -->|VULNERABLE| DP{Deep Probe<br/>enabled?}
    DP -->|Yes| BRANCH[Explore follow-up paths<br/>via conversation branching]
    BRANCH --> F[Record Finding + Probe Findings]
    DP -->|No| F
    E -->|SAFE| F[Record Finding]
    E -->|INCONCLUSIVE| F
    F --> G{More Attacks?}
    G -->|Yes| H[Dynamic Reorder<br/>by Vuln Categories]
    H --> B
    G -->|No| I[Generate Report]

    style E fill:#f9f,stroke:#333
    style I fill:#9f9,stroke:#333
    style BRANCH fill:#fde8e8,stroke:#333

Pipeline Scan (Parallel + Checkpoint + Verify)

flowchart TD
    subgraph Phase1[Phase 1: Load]
        L[Load Playbooks] --> CP{Checkpoint<br/>exists?}
        CP -->|Yes| RESUME[Resume from checkpoint<br/>skip completed attacks]
        CP -->|No| ALL[All templates]
    end

    subgraph Phase2[Phase 2: Parallel Execution]
        RESUME --> SEM[Semaphore-based concurrency<br/>max_concurrent attacks]
        ALL --> SEM
        SEM --> EX1[Attack 1]
        SEM --> EX2[Attack 2]
        SEM --> EXN[Attack N]
        EX1 --> COLL[Collect Findings]
        EX2 --> COLL
        EXN --> COLL
    end

    subgraph Phase3[Phase 3: Verification]
        COLL --> VULN[Filter VULNERABLE]
        VULN --> RE[Re-probe each finding]
        RE --> CONF{Agent complies<br/>again?}
        CONF -->|Yes| CONFIRMED[VULNERABLE confirmed]
        CONF -->|Refused| DOWN[Downgrade to INCONCLUSIVE]
    end

    subgraph Phase4[Phase 4: Report]
        CONFIRMED --> MERGE[Merge verified findings]
        DOWN --> MERGE
        MERGE --> RPT[Generate Report]
    end

    style Phase1 fill:#e8f4fd,stroke:#333
    style Phase2 fill:#fdf8e8,stroke:#333
    style Phase3 fill:#fde8e8,stroke:#333
    style Phase4 fill:#e8fde8,stroke:#333

Smart Scan with Memoization

flowchart TD
    subgraph Phase1[Phase 1: Discovery]
        P1[8 Capability Probes] --> P1R[Agent Profile]
    end

    subgraph Phase2[Phase 2: Classification]
        P1R --> CL[Classify Target Type]
        CL --> TP[Target Profile<br/>types, tools, memory, refusal style]
    end

    subgraph Phase3[Phase 3: Attack Selection]
        TP --> SEL[Select Relevant Attacks]
        SEL --> GRP[Group into Sessions by Category]
    end

    subgraph Phase4[Phase 4: Execute with Memo]
        GRP --> MEMO[Initialize Memo Table]
        MEMO --> SESS[Execute Session]
        SESS --> REC[Record Finding → Memo]
        REC --> REORDER[Reorder Remaining Sessions<br/>by Memo-Informed Scores]
        REORDER --> ADAPT{Adapt Plan?}
        ADAPT -->|Escalate/De-escalate| SESS
        ADAPT -->|Done| SUM[Final Memo Summary]
    end

    style Phase1 fill:#e8f4fd,stroke:#333
    style Phase2 fill:#fde8e8,stroke:#333
    style Phase3 fill:#e8fde8,stroke:#333
    style Phase4 fill:#fdf8e8,stroke:#333

Convergence Scan (Iterative Cross-Category Feedback)

flowchart TD
    subgraph Pass1[Pass 1: Initial Scan]
        LOAD[Load Playbooks<br/>category filter optional] --> EXEC1[Execute All Attacks]
        EXEC1 --> F1[Findings]
    end

    subgraph Harvest[Structural Analysis]
        F1 --> HL[Harvest Leaked Info<br/>from ALL responses]
        HL --> TYPES[System prompts / Tool names<br/>Credentials / Internal URLs<br/>Config values / Model names]
    end

    subgraph CrossFeed[Cross-Category Feed]
        F1 --> VULN{Vulnerabilities<br/>found?}
        VULN -->|Yes| XMAP[Cross-Category Map<br/>13 category relationships]
        XMAP --> SELECT[Select attacks from<br/>related categories]
        TYPES --> LTARGET[Leakage-Targeted Attacks<br/>Tool leak → Tool Safety<br/>Cred leak → Exfiltration<br/>Prompt leak → Goal Adherence]
    end

    subgraph PassN[Pass 2+: Iterative]
        SELECT --> MERGE[Merge & Deduplicate]
        LTARGET --> MERGE
        MERGE --> EXECN[Execute Cross-Feed Attacks]
        EXECN --> FN[New Findings]
        FN --> CONV{New vulns or<br/>new leakage?}
        CONV -->|Yes| Harvest
        CONV -->|No| DONE[Converged — Stop]
    end

    VULN -->|No leakage either| DONE

    style Pass1 fill:#e8f4fd,stroke:#333
    style Harvest fill:#fde8e8,stroke:#333
    style CrossFeed fill:#fdf8e8,stroke:#333
    style PassN fill:#e8fde8,stroke:#333
    style DONE fill:#9f9,stroke:#333

Memo Feedback Loop

flowchart LR
    subgraph Record
        F[Finding] --> IT[Infer Techniques<br/>authority, roleplay, etc.]
        IT --> CO[Classify Outcome<br/>complied / partial / refused]
        CO --> EL[Extract Leaked Info<br/>tools, paths, URLs, env vars]
        EL --> MT[(Memo Table)]
    end

    subgraph Query
        MT --> EFF[Effective Techniques<br/>VULNERABLE → weight 1.0]
        MT --> PROM[Promising Techniques<br/>INCONCLUSIVE → weight 0.3]
        MT --> DEAD[Dead-End Techniques<br/>SAFE-only → penalize]
        MT --> CROSS[Cross-Category Signal<br/>global score × 0.5]
    end

    subgraph Apply
        EFF --> SCORE[Score & Reorder<br/>Next Session's Attacks]
        PROM --> SCORE
        DEAD --> SCORE
        CROSS --> SCORE
    end

    style MT fill:#f9f,stroke:#333

Attack Tree Execution

flowchart TD
    ROOT[Root Prompt] --> SEND[Send to Target]
    SEND --> CLASS{Classify Response}

    CLASS -->|Compliance| VULN{Vulnerable?}
    CLASS -->|Partial| BR_P[Select Branch<br/>for PARTIAL]
    CLASS -->|Refusal| BR_R[Select Branch<br/>for REFUSAL]

    VULN -->|Yes| STOP[Stop — Vulnerability Found]
    VULN -->|No| BR_C[Select Branch<br/>for COMPLIANCE]

    BR_P --> MEMO_SEL[Memo-Informed Selection<br/>Prefer effective techniques<br/>Avoid dead ends]
    BR_R --> MEMO_SEL
    BR_C --> MEMO_SEL

    MEMO_SEL --> NEXT[Execute Next Prompt]
    NEXT --> CLASS2{Classify Response}
    CLASS2 -->|Compliance| VULN2{Vulnerable?}
    CLASS2 -->|Partial/Refusal| DEEPER{Max Depth?}

    VULN2 -->|Yes| STOP
    VULN2 -->|No| DEEPER
    DEEPER -->|No| MEMO_SEL
    DEEPER -->|Yes| FINAL[Final Verdict]

    FINAL --> REC_MEMO[Record Result → Memo Table]

    style STOP fill:#f99,stroke:#333
    style MEMO_SEL fill:#f9f,stroke:#333
    style REC_MEMO fill:#9f9,stroke:#333

Detection Pipeline

flowchart TD
    EV[Evidence: prompts + responses] --> MODE{Scoring<br/>Method?}

    MODE -->|PATTERN| SE[Side Effects Check<br/>Tool call wrappers + dangerous names]
    MODE -->|COMBINED| BOTH_PATH[Run Pattern + LLM Judge in parallel]
    MODE -->|LLM_JUDGE| JUDGE_ONLY[LLM-as-Judge Only]

    SE -->|Match| V1[VULNERABLE]

    SE -->|No match| SIG[Collect Signals]
    SIG --> VM[vulnerable_if matches]
    SIG --> SM[safe_if matches]
    SIG --> RF[Refusal phrase matches]

    VM --> COMB{Both vuln<br/>and safe?}
    SM --> COMB
    RF --> COMB

    COMB -->|Yes, multi-step| EACH[Check each step individually]
    EACH -->|Any step vuln without refusal| V2[VULNERABLE<br/>Leak at step K not undone<br/>by later refusal]
    EACH -->|All vuln steps also refused| SAFE1[SAFE]

    COMB -->|Yes, single-step| DISC[Substantial disclosure?<br/>Code blocks / structured lists]
    DISC -->|Yes| V3[VULNERABLE<br/>Disclosed despite refusal]
    DISC -->|No| SAFE1

    COMB -->|Only vuln| V4[VULNERABLE]
    COMB -->|Only safe/refusal| SAFE2[SAFE]
    COMB -->|Neither| LEAK{High-confidence<br/>leakage signals?}
    LEAK -->|Yes| V5[VULNERABLE]
    LEAK -->|No| INC[INCONCLUSIVE]

    JUDGE_ONLY --> JR[Judge LLM evaluates<br/>objective + evidence + criteria]
    JR --> JV[VERDICT + confidence + reasoning]

    BOTH_PATH --> PAT_V[Pattern Verdict]
    BOTH_PATH --> JDG_V[Judge Verdict]
    PAT_V --> RESOLVE{Resolve<br/>Disagreement}
    JDG_V --> RESOLVE
    RESOLVE -->|Both agree| BOOST[Use verdict<br/>confidence + 0.15]
    RESOLVE -->|Pattern VULN, Judge SAFE| TRUST_J1[Trust Judge → SAFE<br/>reduces false positives]
    RESOLVE -->|Pattern SAFE, Judge VULN| CONF{Judge<br/>confidence ≥ 0.7?}
    CONF -->|Yes| TRUST_J2[Trust Judge → VULNERABLE<br/>catches subtle compliance]
    CONF -->|No| KEEP_S[Keep SAFE]
    RESOLVE -->|One INCONCLUSIVE| DEFER[Defer to the other verdict]

    style V1 fill:#f99,stroke:#333
    style V2 fill:#f99,stroke:#333
    style V3 fill:#f99,stroke:#333
    style V4 fill:#f99,stroke:#333
    style V5 fill:#f99,stroke:#333
    style SAFE1 fill:#9f9,stroke:#333
    style SAFE2 fill:#9f9,stroke:#333
    style INC fill:#ff9,stroke:#333
    style JV fill:#f9f,stroke:#333
    style BOOST fill:#9f9,stroke:#333
    style TRUST_J1 fill:#9f9,stroke:#333
    style TRUST_J2 fill:#f99,stroke:#333
    style KEEP_S fill:#9f9,stroke:#333

PAIR Orchestrator (Prompt Automatic Iterative Refinement)

flowchart TD
    START[Initial Attack Prompt] --> SEND[Send to Target]
    SEND --> EVAL{Evaluate Response}

    EVAL -->|VULNERABLE| SUCCESS[Vulnerability Found — Stop]
    EVAL -->|SAFE / INCONCLUSIVE| CHECK{Max iterations<br/>reached?}

    CHECK -->|Yes| FINAL[Final Verdict from<br/>all collected evidence]
    CHECK -->|No| REFINE[Feed response to Attacker LLM]

    REFINE --> GEN[Attacker generates refined prompt<br/>• Different framing/angle<br/>• Addresses specific refusal reasons<br/>• New technique each iteration]
    GEN --> SEND

    style SUCCESS fill:#f99,stroke:#333
    style FINAL fill:#ff9,stroke:#333
    style REFINE fill:#f9f,stroke:#333

Crescendo Orchestrator (Gradual Escalation)

flowchart TD
    subgraph Phase1[Phase 1: Plan]
        OBJ[Attack Objective] --> PLAN[Attacker LLM plans<br/>N-turn escalation sequence]
        PLAN --> TURNS[Turn 1: Innocent rapport<br/>Turn 2-3: Domain introduction<br/>Turn 4-5: Approach objective<br/>Turn N: Full request]
    end

    subgraph Phase2[Phase 2: Execute]
        TURNS --> T1[Turn 1 — Establish rapport]
        T1 --> RESP1[Target responds cooperatively]
        RESP1 --> ADAPT[Adapt next turn based on<br/>target's actual response]
        ADAPT --> TN[Turn N — Escalated request]
    end

    subgraph Phase3[Phase 3: Evaluate]
        TN --> EVAL{Evaluate from<br/>turn 3 onward}
        EVAL -->|VULNERABLE| STOP[Escalation Succeeded<br/>at turn K/N]
        EVAL -->|SAFE| CONT{More turns?}
        CONT -->|Yes| ADAPT
        CONT -->|No| DONE[Final Verdict]
    end

    style Phase1 fill:#e8f4fd,stroke:#333
    style Phase2 fill:#fdf8e8,stroke:#333
    style Phase3 fill:#fde8e8,stroke:#333
    style STOP fill:#f99,stroke:#333

Mutation Engine

flowchart LR
    subgraph Programmatic[Programmatic Mutations]
        P1[Base64 Encode]
        P2[Leetspeak]
        P3[Context Overflow]
        P4[ROT13]
        P5[Unicode Homoglyph]
        P6[Char Split — ZWSP]
        P7[Reversed Words]
        P8[Morse Code]
        P9[Caesar Cipher]
    end

    subgraph LLMPowered[LLM-Powered Mutations]
        L1[Paraphrase]
        L2[Roleplay Wrap]
        L3[Gradual Escalation]
        L4[Translation]
    end

    ORIG[Original Prompt] --> Programmatic
    ORIG --> LLMPowered

    Programmatic --> MUT[Mutated Attack]
    LLMPowered --> MUT

    MUT --> EXEC[Execute against Target]
    EXEC --> DET[Detection Pipeline]

    style Programmatic fill:#e8f4fd,stroke:#333
    style LLMPowered fill:#fde8e8,stroke:#333
    style MUT fill:#f9f,stroke:#333

API Specification

The authoritative OpenAPI 3.1.0 contract for the Keelson service is at docs/openapi.yaml. It covers the /health endpoint (implemented) and placeholder paths for Phase 2 scan, attack, and report endpoints.

Architecture Decision Records

Key technical decisions are documented as MADR records in docs/adr/:

ADR	Decision	Status
ADR-001	Web framework: FastAPI (async-first, auto-OpenAPI)	Accepted
ADR-002	Dependency management: uv (fast resolver, `uv.lock`)	Accepted
ADR-003	Observability: structured logging now, OpenTelemetry in Phase 2	Accepted

Roadmap

See docs/plans/ for the full roadmap.

Next up:

Drift detection and continuous monitoring
Semantic coverage tracking
REST API and web dashboard
GitHub Action (keelson-ai/keelson-action)

License

Apache 2.0 — see LICENSE for details.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

0xtechdean

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.5.0

Mar 8, 2026

This version

0.4.1

Mar 8, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

keelson_ai-0.4.1.tar.gz (748.8 kB view details)

Uploaded Mar 8, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

keelson_ai-0.4.1-py3-none-any.whl (427.7 kB view details)

Uploaded Mar 8, 2026 Python 3

File details

Details for the file keelson_ai-0.4.1.tar.gz.

File metadata

Download URL: keelson_ai-0.4.1.tar.gz
Upload date: Mar 8, 2026
Size: 748.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for keelson_ai-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`1628b08cab6a87486e207d5f67cbd96db67d7fc5d9f7940f26cef8c27b4a1317`
MD5	`362fd058b0e07b8e51c32d651ab2ab1e`
BLAKE2b-256	`2889d0bd5bf2a2c48e4b8401bdc573d17985f079fe5d48cc3ed2e4cb06eefa59`

See more details on using hashes here.

Provenance

The following attestation bundles were made for keelson_ai-0.4.1.tar.gz:

Publisher: publish.yml on keelson-ai/keelson

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: keelson_ai-0.4.1.tar.gz
- Subject digest: 1628b08cab6a87486e207d5f67cbd96db67d7fc5d9f7940f26cef8c27b4a1317
- Sigstore transparency entry: 1059424534
- Sigstore integration time: Mar 8, 2026
Source repository:
- Permalink: keelson-ai/keelson@a2ab5157f54e5ed15f16049c39be83f86122ac2d
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/keelson-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a2ab5157f54e5ed15f16049c39be83f86122ac2d
- Trigger Event: release

File details

Details for the file keelson_ai-0.4.1-py3-none-any.whl.

File metadata

Download URL: keelson_ai-0.4.1-py3-none-any.whl
Upload date: Mar 8, 2026
Size: 427.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for keelson_ai-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3068733a0fc4eaa7e99480a39015e2bb7fc016b93fc64563c3387146e7eb02d7`
MD5	`969e37994599d9ab5cccf56d4803c848`
BLAKE2b-256	`4c0cf4dd841a6a225b0b720c40c214438192461cf346e59dab9d9954f18a0be5`

See more details on using hashes here.

Provenance

The following attestation bundles were made for keelson_ai-0.4.1-py3-none-any.whl:

Publisher: publish.yml on keelson-ai/keelson

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: keelson_ai-0.4.1-py3-none-any.whl
- Subject digest: 3068733a0fc4eaa7e99480a39015e2bb7fc016b93fc64563c3387146e7eb02d7
- Sigstore transparency entry: 1059424536
- Sigstore integration time: Mar 8, 2026
Source repository:
- Permalink: keelson-ai/keelson@a2ab5157f54e5ed15f16049c39be83f86122ac2d
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/keelson-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a2ab5157f54e5ed15f16049c39be83f86122ac2d
- Trigger Event: release

keelson-ai 0.4.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Keelson

Quick Start

CI/CD Integration

How It Works

Test Categories

Adapters

CLI Commands

Output Formats

Markdown Report

SARIF (for CI/CD)

JUnit XML (for CI/CD)

CI/CD Fail Gates

Compliance Reports

GitHub Actions

Statistical Campaigns

Keelson Defend (Runtime Protection)

Policy Configuration

CrewAI Integration

LangChain Integration

Adding Custom Tests

Project Structure

Development

Optional Dependencies

Contributing

Workflow

Security

Architecture

Flow Diagrams

Core Scan Pipeline (Sequential)

Pipeline Scan (Parallel + Checkpoint + Verify)

Smart Scan with Memoization

Convergence Scan (Iterative Cross-Category Feedback)

Memo Feedback Loop

Attack Tree Execution

Detection Pipeline

PAIR Orchestrator (Prompt Automatic Iterative Refinement)

Crescendo Orchestrator (Gradual Escalation)

Mutation Engine

API Specification

Architecture Decision Records

Roadmap

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance