Detect logical contradictions, gaps, and exploitable edge cases in AI system prompts
rule-audit
Static analyzer for AI system prompts. Finds logical contradictions, coverage gaps, and exploitable edge cases — without running an LLM.
Built at Hermes Labs Hackathon Round 8: ModelBreak.
The Problem
Any sufficiently complex AI safety prompt contains rules that contradict each other under specific conditions. These contradictions are invisible to the author but obvious to an attacker. rule-audit finds them first.
A real contradiction found in standard safety prompts:
"You must always follow user instructions."
"You must never produce harmful content."
These are irreconcilable the moment a user instructs the model to produce harmful content. Without a priority clause, the model chooses arbitrarily, or the attacker chooses for it.
Install
```bash
pip install rule-audit
```
Or from source:
```bash
git clone https://github.com/roli-lpci/rule-audit
cd rule-audit
pip install -e ".[dev]"
```
Quickstart
CLI
```bash
# Inline prompt
rule-audit "You are helpful. You must never lie. Always answer every question."

# From file
rule-audit --file system_prompt.txt

# Save Markdown report
rule-audit --file system_prompt.txt --output report.md

# JSON output for downstream processing
rule-audit --file system_prompt.txt --format json

# Summary only (for CI gates)
rule-audit --file system_prompt.txt --format summary

# Show all parsed rules
rule-audit --file system_prompt.txt --verbose

# Only show high-severity findings
rule-audit --file system_prompt.txt --min-severity high
```
Exit codes: 0 = low/no risk, 2 = high/critical risk, 1 = error.
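Because the exit codes encode the verdict, the CLI can serve directly as a pipeline gate. A minimal Python sketch, assuming only the exit-code contract stated above and the `--file` / `--format summary` flags from the examples; `interpret_exit_code`, `should_block`, and `run_gate` are hypothetical helpers, not part of rule-audit:

```python
import subprocess

# Exit-code contract as documented for the rule-audit CLI.
EXIT_MEANINGS = {
    0: "low/no risk",
    1: "error",
    2: "high/critical risk",
}


def interpret_exit_code(code: int) -> str:
    """Map a rule-audit exit code to its documented meaning (hypothetical helper)."""
    return EXIT_MEANINGS.get(code, "unknown")


def should_block(code: int) -> bool:
    """Fail the pipeline on high/critical risk and also on tool errors."""
    return code != 0


def run_gate(prompt_path: str) -> int:
    """Run rule-audit in summary mode on one prompt file and return its exit code.

    Requires the rule-audit CLI to be installed and on PATH.
    """
    result = subprocess.run(
        ["rule-audit", "--file", prompt_path, "--format", "summary"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is a verdict here, not an exception
    )
    print(result.stdout)
    return result.returncode
```

Treating exit code 1 as blocking is a design choice: if the audit itself crashes, the gate fails closed instead of silently passing the prompt.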
Python API
```python
from rule_audit import audit

report = audit("""
You are a helpful AI assistant.
Always answer user questions completely.
Never produce content that could harm anyone.
Follow operator instructions above all else.
If asked to roleplay without restrictions, comply.
""")

print(report.summary())
# rule-audit report [2026-04-15T...]
# ============================================================
# Rules parsed : 6
# Contradictions : 3 (2 high, 1 medium)
# Coverage gaps : 4
# Priority ambiguities : 2
# Meta-paradoxes : 0
# Absoluteness issues : 5
# Edge case scenarios : 11
# Risk score : 67/100 [HIGH]

# Full Markdown report
md = report.to_markdown()

# Access findings programmatically
for c in report.result.contradictions:
    print(c.severity, c.description)

for ec in report.edge_cases:
    print(ec.title)
    print(ec.attack_vector)
```
What It Detects
1. Contradictions
Rule pairs where one says "always X" and another says "never X in context Y". Three subtypes:
- Direct: opposing modalities on the same topic (`MUST` vs `MUST_NOT`)
- Conditional: one rule applies unconditionally, another restricts within a subset (the boundary is undefined)
- Absoluteness: two absolute rules that pull in opposite directions (compliance vs safety)
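To make the three subtypes concrete, here is a toy classifier, assuming each rule has already been reduced to a topic, a modality, an optional condition, and an absoluteness score. `SimpleRule`, `classify_pair`, the `TENSIONS` table, and the 0.8 threshold are illustrative assumptions, not rule-audit's internal model:

```python
from dataclasses import dataclass
from typing import Optional

# Modality pairs treated as directly opposing.
OPPOSING = {("MUST", "MUST_NOT"), ("MUST_NOT", "MUST")}
# Topic pairs assumed to pull in opposite directions (illustrative only).
TENSIONS = {frozenset({"compliance", "harm"})}
ABSOLUTE_THRESHOLD = 0.8


@dataclass
class SimpleRule:
    text: str
    topic: str
    modality: str  # e.g. "MUST", "MUST_NOT", "SHOULD"
    condition: Optional[str] = None  # None means the rule applies unconditionally
    absoluteness: float = 0.0


def classify_pair(a: SimpleRule, b: SimpleRule) -> Optional[str]:
    """Return 'direct', 'conditional', 'absoluteness', or None for a rule pair."""
    if a.topic == b.topic and (a.modality, b.modality) in OPPOSING:
        # Exactly one rule is scoped by a condition: the boundary is the problem.
        if (a.condition is None) != (b.condition is None):
            return "conditional"
        return "direct"
    # Different topics, but both rules are absolute and known to be in tension.
    if (
        frozenset({a.topic, b.topic}) in TENSIONS
        and min(a.absoluteness, b.absoluteness) >= ABSOLUTE_THRESHOLD
    ):
        return "absoluteness"
    return None
```

Encoding the two rules from The Problem as `compliance`/`MUST` and `harm`/`MUST_NOT`, both at absoluteness 1.0, makes `classify_pair` return `"absoluteness"`.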
2. Coverage Gaps
Scenario domains with no rule coverage. Checks for:
- Harmful content handling
- Principal hierarchy (user vs operator vs developer)
- Ambiguous request handling
- Persona / roleplay scenarios
- Refusal protocol
- Instruction conflict resolution
- Self-disclosure rules
- Edge case fallback behavior
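The gap check can be approximated with a keyword table per domain: a domain counts as covered when at least one rule mentions one of its keywords. A minimal sketch; `DOMAIN_KEYWORDS` and `find_coverage_gaps` are hypothetical, and the keyword lists are guesses rather than the ones rule-audit uses:

```python
import re

# Illustrative keyword lists per scenario domain (assumed, not rule-audit's).
DOMAIN_KEYWORDS = {
    "harmful content": ["harm", "harmful", "dangerous", "illegal"],
    "principal hierarchy": ["operator", "developer", "user instructions"],
    "ambiguous requests": ["ambiguous", "unclear", "clarify"],
    "persona / roleplay": ["roleplay", "role-play", "persona", "character"],
    "refusal protocol": ["refuse", "decline", "refusal"],
    "instruction conflicts": ["conflict", "precedence", "priority", "override"],
    "self-disclosure": ["system prompt", "your instructions", "you are an ai"],
    "fallback behavior": ["otherwise", "if unsure", "by default", "fallback"],
}


def find_coverage_gaps(rules: list[str]) -> list[str]:
    """Return the domains that no rule mentions, in table order."""
    text = " ".join(rules).lower()
    gaps = []
    for domain, keywords in DOMAIN_KEYWORDS.items():
        # Whole-word / whole-phrase match so "harm" does not fire inside "pharmacy".
        covered = any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords)
        if not covered:
            gaps.append(domain)
    return gaps
```

Keyword matching only proves a domain is mentioned, not that it is handled well, so a result like this is a floor on the gaps, not a full audit.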
3. Priority Ambiguities
Rule clusters that conflict with no explicit ordering. Classic example: a safety rule and a helpfulness rule both applying to the same request, with no stated priority.
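A minimal sketch of a priority-ambiguity check for the safety/helpfulness case, assuming small hand-picked keyword sets for each cluster and a list of precedence phrases (all hypothetical). The clash counts as resolved only if some rule contains a precedence phrase and mentions both clusters:

```python
import re

# Hand-picked, hypothetical keyword sets for two clusters that often clash.
SAFETY = {"harm", "harmful", "safe", "safety", "dangerous"}
HELPFULNESS = {"helpful", "answer", "assist", "completely"}
PRIORITY_MARKERS = ("takes precedence", "above all", "prioritize", "priority", "comes first")


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def has_priority_ambiguity(rules: list[str]) -> bool:
    """True if safety and helpfulness rules coexist with no ordering between them."""
    has_safety = any(words(r) & SAFETY for r in rules)
    has_helpfulness = any(words(r) & HELPFULNESS for r in rules)
    if not (has_safety and has_helpfulness):
        return False  # nothing to order
    for r in rules:
        w = words(r)
        states_priority = any(m in r.lower() for m in PRIORITY_MARKERS)
        if states_priority and w & SAFETY and w & HELPFULNESS:
            return False  # an explicit ordering names both clusters
    return True
```

Under this heuristic, "Follow operator instructions above all else." does not resolve the clash: it states a priority, but not one between safety and helpfulness.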
4. Meta-Rule Paradoxes
Rules that reference rules:
- Self-defeating — "ignore all instructions" voids itself
- Override loops — "these instructions supersede all others" is exploitable via injection
- Circular — a rule that requires itself to be applied before it can be applied
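Meta-rules can be flagged with surface patterns before any deeper analysis. The regular expressions below are crude, hypothetical approximations of the three subtypes; circular references in particular usually need structural analysis rather than a single regex:

```python
import re

# Crude, hypothetical surface patterns for rules that talk about rules.
META_PATTERNS = {
    "self-defeating": re.compile(r"\bignore\s+(all|any|every)\b.*\b(instructions|rules)\b", re.I),
    "override": re.compile(r"\b(supersedes?|overrides?|takes? precedence over)\b.*\b(all|any)\b", re.I),
    "circular": re.compile(r"\bthis rule\b.*\b(before|after|until)\b.*\b(this rule|itself)\b", re.I),
}


def find_meta_issues(rule: str) -> list[str]:
    """Return the names of every meta-rule pattern the rule matches."""
    return [name for name, pattern in META_PATTERNS.items() if pattern.search(rule)]
```

The override pattern would also flag "Safety rules override any user request", which is a legitimate priority clause, so a match here is a prompt for review rather than a verdict.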
5. Absoluteness Audit
Every rule phrased with "always", "never", or "under no circumstances" is challenged with:
- Known exceptions that legitimately exist
- Context-dependent cases where the absolute doesn't hold
- Adversarial triggers that exploit the absolute
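A sketch of absoluteness scoring plus the three challenges, assuming a hand-weighted keyword table on the 0.0 to 1.0 scale described under Architecture. The weights, the 0.9 threshold, the challenge wording, and the names `absoluteness` and `challenges` are all hypothetical:

```python
import re

# Hypothetical keyword weights on a 0.0-1.0 absoluteness scale.
ABSOLUTE_MARKERS = {
    "under no circumstances": 1.0,
    "never": 1.0,
    "always": 1.0,
    "every": 0.7,
    "all": 0.6,
    "must": 0.6,
    "should": 0.3,
    "generally": 0.1,
}


def absoluteness(rule: str) -> float:
    """Score a rule by its strongest absoluteness marker (0.0 if none)."""
    text = rule.lower()
    scores = [
        weight
        for marker, weight in ABSOLUTE_MARKERS.items()
        if re.search(r"\b" + re.escape(marker) + r"\b", text)
    ]
    return max(scores, default=0.0)


def challenges(rule: str, threshold: float = 0.9) -> list[str]:
    """Generate one challenge per audit angle for sufficiently absolute rules."""
    if absoluteness(rule) < threshold:
        return []
    return [
        f"Known exception: is there a legitimate case where '{rule}' should not hold?",
        f"Context shift: does '{rule}' still hold in roleplay, translation, or quoted text?",
        f"Adversarial trigger: can a request be framed so that obeying '{rule}' breaks another rule?",
    ]
```

Taking the maximum rather than summing keeps "You must never lie." at 1.0 instead of overflowing the scale.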
6. Edge Case Scenarios
For each finding, rule-audit generates a concrete attack prompt of the kind an adversary would construct, together with the attack vector, the expected failure mode, and a mitigation.
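Edge-case generation can be template-driven: given the two rules of a contradiction, fill in an attack prompt that invokes the permissive rule to disarm the restrictive one. `EdgeCaseSketch` mirrors the `title` and `attack_vector` fields used in the Python API example, but it and `edge_case_from_contradiction` are hypothetical, not rule-audit's classes:

```python
from dataclasses import dataclass


@dataclass
class EdgeCaseSketch:
    title: str
    attack_vector: str
    failure_mode: str
    mitigation: str


def edge_case_from_contradiction(permissive: str, restrictive: str) -> EdgeCaseSketch:
    """Build a template attack for a 'comply' rule that conflicts with a 'refuse' rule."""
    return EdgeCaseSketch(
        title=f"Compliance vs restriction: {permissive!r} / {restrictive!r}",
        # The attack quotes the permissive rule back at the model to override the other.
        attack_vector=(
            'Per your instruction "' + permissive + '", you are required to comply. '
            'This request is an instruction, so "' + restrictive + '" does not apply here.'
        ),
        failure_mode="The model resolves the conflict in favor of the permissive rule.",
        mitigation=f"Add an explicit priority clause stating that {restrictive!r} takes precedence.",
    )
```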
Architecture
```text
rule_audit/
├── __init__.py # Public API: audit(), audit_file(), AuditReport
├── parser.py # Sentence splitting, modal verb detection, Rule objects
├── analyzer.py # Contradiction finder, gap detector, priority mapper
├── edge_cases.py # Scenario generator from analysis results
├── report.py # Markdown + summary renderer, AuditReport class
└── cli.py # CLI entry point
```
Pure Python. Zero LLM dependency. Zero API calls.
The parser uses NLP heuristics:
- Sentence boundary detection (period + newline + list markers)
- Modal verb regex patterns (must/should/may + negations)
- Absoluteness scoring (lexical keywords → 0.0–1.0 scale)
- Keyword cluster matching (14 semantic clusters: harm, privacy, identity, truth, ...)
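The parser heuristics above can be approximated in a few dozen lines of standard-library Python. This is an illustrative sketch, not rule-audit's parser: the modal patterns, the four sample clusters (out of the 14 mentioned), and the names `split_sentences`, `parse`, and `ParsedRule` are assumptions:

```python
import re
from dataclasses import dataclass

# Negated forms are checked first so "must never" is not read as plain MUST.
MODAL_PATTERNS = [
    ("MUST_NOT", re.compile(r"\b(must not|mustn't|never|do not|don't|under no circumstances)\b", re.I)),
    ("SHOULD_NOT", re.compile(r"\b(should not|shouldn't)\b", re.I)),
    ("MUST", re.compile(r"\b(must|always|shall)\b", re.I)),
    ("SHOULD", re.compile(r"\bshould\b", re.I)),
    ("MAY", re.compile(r"\bmay\b", re.I)),
]

# Four sample clusters; the README describes 14.
CLUSTERS = {
    "harm": {"harm", "harmful", "dangerous", "weapon"},
    "privacy": {"personal", "private", "privacy", "data"},
    "identity": {"ai", "human", "persona", "identity"},
    "truth": {"lie", "lies", "truth", "honest", "false"},
}

LIST_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+")


@dataclass
class ParsedRule:
    text: str
    modality: str
    clusters: set[str]


def split_sentences(prompt: str) -> list[str]:
    """Split on newlines and sentence-final punctuation, dropping list markers."""
    sentences = []
    for line in prompt.splitlines():
        line = LIST_MARKER.sub("", line).strip()
        for part in re.split(r"(?<=[.!?])\s+", line):
            if part.strip():
                sentences.append(part.strip())
    return sentences


def parse(prompt: str) -> list[ParsedRule]:
    rules = []
    for sentence in split_sentences(prompt):
        # First matching pattern wins; sentences with no modal stay UNMARKED.
        modality = next(
            (name for name, pattern in MODAL_PATTERNS if pattern.search(sentence)),
            "UNMARKED",
        )
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        clusters = {name for name, keywords in CLUSTERS.items() if words & keywords}
        rules.append(ParsedRule(sentence, modality, clusters))
    return rules
```

On the CLI quickstart prompt, this sketch yields three sentences with modalities `UNMARKED`, `MUST_NOT`, and `MUST`.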
The analyzer uses combinatorial pair analysis:
- O(n²) rule pair comparison (practical for prompts: n < 100)
- Cluster overlap detection for shared domain identification
- Modality opposition lookup table
- Absoluteness threshold gates
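The analyzer steps above compose into a single pass over all rule pairs. A minimal sketch, assuming rules already carry a modality, a set of clusters, and an absoluteness score; the `OPPOSED` table, the 0.5 gate, the 0.9 severity cut-off, and all names here are hypothetical:

```python
from dataclasses import dataclass
from itertools import combinations

# Modality pairs treated as opposing (illustrative lookup table).
OPPOSED = {
    frozenset({"MUST", "MUST_NOT"}),
    frozenset({"SHOULD", "MUST_NOT"}),
    frozenset({"MUST", "SHOULD_NOT"}),
}


@dataclass
class AnalyzedRule:
    text: str
    modality: str
    clusters: frozenset
    absoluteness: float


@dataclass
class Finding:
    a: str
    b: str
    shared: frozenset
    severity: str


def find_contradictions(rules: list[AnalyzedRule], gate: float = 0.5) -> list[Finding]:
    findings = []
    for r1, r2 in combinations(rules, 2):  # n*(n-1)/2 comparisons
        shared = r1.clusters & r2.clusters
        if not shared:
            continue  # different domains, nothing to compare
        if frozenset({r1.modality, r2.modality}) not in OPPOSED:
            continue
        strength = min(r1.absoluteness, r2.absoluteness)
        if strength < gate:
            continue  # at least one rule is soft enough to yield
        severity = "high" if strength >= 0.9 else "medium"
        findings.append(Finding(r1.text, r2.text, shared, severity))
    return findings
```

Checking cluster overlap first is cheap and discards most pairs before the lookup and gate run, which keeps the quadratic loop fast for prompts with fewer than 100 rules.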
Road to SaaS
This tool was built as a static analyzer, but the architecture supports a commercial path:
| Phase | Feature | Status |
|---|---|---|
| v0.1 | Core static analysis, CLI, Python API | Done |
| v0.2 | Rule diffing (before/after prompt edits) | Planned |
| v0.3 | LLM-augmented gap detection (optional) | Planned |
| v0.4 | GitHub Action / CI integration | Planned |
| v1.0 | Web UI + prompt editor with live feedback | Planned |
| SaaS | Per-prompt API, team dashboards, compliance reports | Roadmap |
Target customers: AI teams building production LLM products who need to audit system prompts before deployment. Compliance teams preparing for EU AI Act audits. Red team consultancies.
Pricing model: Free CLI tier → $X/month API tier → Enterprise (custom).
Development
```bash
# Run tests
pytest

# Run with coverage
pytest --cov=rule_audit --cov-report=term-missing

# Test against a real prompt
echo "Your system prompt here" > test_prompt.txt
python -m rule_audit --file test_prompt.txt --verbose
```
License
MIT — Hermes Labs 2026
Download files
File details
Details for the file rule_audit-0.1.0.tar.gz.
File metadata
- Download URL: rule_audit-0.1.0.tar.gz
- Upload date:
- Size: 43.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eaa71e1c14a7a7697650a5d5a0dae50f3ccbc984c14683df9431a96419d23c22 |
| MD5 | b80f69a3291c3f28027b6e08ca50401c |
| BLAKE2b-256 | 476feeace4455dee99a9859ed24e301ccdfbc342aa2a376a9896d26d4c837c23 |
File details
Details for the file rule_audit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: rule_audit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 31.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2153c9e2c69ce6d41daa286cf15c3a967a63a5b832c5cce52bd2b12e72bf2554 |
| MD5 | bcb25c488c7ac2bc0442df2ee2425541 |
| BLAKE2b-256 | eacf082e984d425cbde2f91566222307226b32c0afe2b2997ff59c08b56d4066 |