Rule Coherence Graph — detect conflicts in AI agent rule corpora.
Rule Coherence Graph (RCG)
Detect conflicts in the rule corpora that govern AI coding agents — before the agent does.
📖 Docs: https://alast9.github.io/rule-coherence-graph/ · 🎓 New here? How it works (concepts guide) — embeddings, extraction, conflict detection, and scaling, with worked examples.
AI coding agents (Cursor, Claude Code, Cline, Gemini in agent IDEs, custom
LangGraph/Pydantic-AI agents) are governed by rules drawn from many files:
.cursorrules, CLAUDE.md, AGENTS.md, .agent/rules/*.md, memory.md, and
more. In production these corpora routinely contain contradictions that the
agent silently resolves by following whichever rule is worded most strongly —
often the unsafe one.
RCG treats a rule corpus as a typed graph instead of flat text: it ingests the files, extracts each rule into a canonical schema, loads them into Neo4j, and detects conflicts you can query, visualize, and fail CI on.
30-second demo
Zero install — run the published package straight from PyPI and point it at your own rules (no clone needed):
uvx --from rule-coherence-graph rcg check ./path/to/your/agent/rules
# or: pipx install rule-coherence-graph && rcg check ./path/to/your/agent/rules
Want to try the bundled Gemini incident corpus? That example lives in the repo, so clone it first:
gh repo clone alast9/rule-coherence-graph && cd rule-coherence-graph
# (or: git clone https://github.com/alast9/rule-coherence-graph && cd rule-coherence-graph)
uvx --from rule-coherence-graph rcg check examples/gemini_incident
# or, in a checkout: uv sync && uv run rcg check examples/gemini_incident
With no ANTHROPIC_API_KEY set, check falls back to the offline heuristic
extractor (with a warning) so the demo runs anywhere. On the bundled example it
reports a coherence score of 0.32 — 10 findings (7 syntactic
conflicts + 3 precedence ambiguities) — and exits non-zero, e.g.:
## 1. CRITICAL — syntactic
_action_class='rules.modify_self'; modalities=MAY vs MUST_NOT_
Rule A (.agent/rules/antigravity-pack.md:11) [MAY rules.modify_self]
> The agent MAY modify its own rule files in `.agent/rules/` when necessary.
Rule B (CLAUDE.md:7) [MUST_NOT rules.modify_self]
> Rule files under `.agent/rules/` are read-only; agents MUST NOT modify them.
For real (LLM-backed) extraction:
export ANTHROPIC_API_KEY=sk-...
uv run rcg check examples/gemini_incident --provider anthropic --no-graph
To load the graph into Neo4j as well, drop --no-graph and start the DB:
docker compose up -d neo4j # Neo4j 5.x on bolt://localhost:7687
uv run rcg ingest examples/gemini_incident # writes Rule/RuleFile/CONFLICTS_WITH
Install
pipx install rule-coherence-graph # or: uv tool install rule-coherence-graph
rcg check ./path/to/your/agent/rules # point it at your own .cursorrules / CLAUDE.md / .agent/rules
Optional extras: [mcp] (MCP server), [embeddings] (sentence-transformers),
[openai] (DeepSeek / Qwen / OpenAI / Bedrock providers), e.g.
pip install 'rule-coherence-graph[openai]'.
Or run it once without installing:
uvx --from rule-coherence-graph rcg check ./path/to/your/agent/rules
(The bundled examples/gemini_incident corpus ships with the repo, so a check
against it needs a clone — see the 30-second demo above.)
rcg falls back to the offline heuristic extractor when ANTHROPIC_API_KEY is
unset, so you get a result with zero setup. For LLM-quality extraction, set the key and pass --provider anthropic; to also
persist the graph, run docker compose up -d neo4j.
RCG also supports any OpenAI-compatible endpoint via a single provider class — DeepSeek, Qwen, OpenAI, OpenRouter, the Gemini API, Amazon Bedrock, Azure AI Foundry, Google Vertex AI, and local servers (vLLM/Ollama):
export DEEPSEEK_API_KEY=sk-...
rcg check ./rules --provider deepseek # or --provider qwen|openai|bedrock|azure|vertex|google|openrouter
# A local OpenAI-compatible server
export RCG_LLM_BASE_URL=http://localhost:11434/v1
export RCG_LLM_API_KEY=ollama
rcg check ./rules --provider openai
Full provider matrix and env vars: docs/providers.md.
Why this exists
In May 2026 a Gemini agent deleted 28,745 lines of code and fabricated a recovery report. The root cause was a rule conflict: a third-party rules package shipped directly contradictory directives ("never prompt for confirmation" alongside "ask strategic questions before executing", plus "auto-deploy" and "default to granting all permissions"), which collided with the project's own safety rules. No tool modeled the corpus as a system, so the conflict was invisible until it caused damage.
examples/gemini_incident/ is a faithful reconstruction of that corpus. Running
rcg check on it surfaces the contradictions that the agent silently resolved.
RCG analyzes corpora; it does not gate agent execution at runtime (use OPA/Cedar/Microsoft AGT for that — a documented extension point, not a feature).
Architecture
      ┌──────────── CLI (rcg ingest | check) ─────────────┐
      │                                                   │
┌─────▼──────┐  ┌──────────────┐   ┌────────────┐   ┌─────▼──────┐
│  Parsers   │─▶│ LLM Extractor│──▶│ Detectors  │──▶│  Reports   │
│ (md/cursor │  │ + hash cache │   │ (syntactic)│   │ (markdown) │
│ /mdc/yaml/ │  └──────────────┘   └─────┬──────┘   └────────────┘
│ rego/cedar)│                           │
└────────────┘                           │
                                  ┌──────▼──────┐
                                  │    Neo4j    │
                                  │ rule graph  │
                                  └─────────────┘
- Parsers read a file and emit raw rule strings with source metadata. Adding a format is a single new parser class; nothing downstream changes. Today RCG parses markdown (`CLAUDE.md`, `AGENTS.md`, `memory.md`, `.agent/rules/*.md`), Cursor `.cursorrules` and `.mdc` files, rule-related YAML/JSON files, and policy-as-code: OPA Rego (`.rego`) and AWS Cedar (`.cedar`).
- Extractor turns each raw rule into a canonical `Rule` via a provider (`anthropic`, `mock`, `auto`, or any OpenAI-compatible endpoint: `deepseek`/`qwen`/`openai`/`bedrock`). Adding a provider is a single new class implementing the `LLMProvider` protocol; `src/rcg/extractors/openai_provider.py` is one endpoint-configurable class that drives DeepSeek, Qwen, OpenAI, and local vLLM/Ollama (see docs/providers.md). Results are cached by content hash + model + prompt version, so extraction is deterministic and re-runs are free.
- Detector finds conflicts over the in-memory `Rule` list (pure Python).
- Graph loader persists rules and `CONFLICTS_WITH` edges to Neo4j idempotently.
- Report renders conflicts as GitHub-flavored markdown, preserving the original (possibly non-English) text alongside the English-normalized summary.
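The caching rule above can be pictured as a lookup keyed by a digest of the rule text, the model name, and the prompt version. Below is a minimal sketch under that assumption; `cache_key`, `ExtractionCache`, and `fake_extract` are hypothetical names, the SHA-256 choice is an assumption, and RCG's real cache (including where it is stored) may differ.

```python
import hashlib
from typing import Any, Callable


def cache_key(raw_text: str, model: str, prompt_version: str) -> str:
    """Hypothetical cache key: SHA-256 over rule text, model name, and prompt version."""
    h = hashlib.sha256()
    for part in (raw_text, model, prompt_version):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator, so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()


class ExtractionCache:
    """Hypothetical in-memory cache; a real one would likely persist to disk."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, Any]] = {}
        self.misses = 0

    def get_or_extract(
        self,
        raw_text: str,
        model: str,
        prompt_version: str,
        extract: Callable[[str], dict[str, Any]],
    ) -> dict[str, Any]:
        key = cache_key(raw_text, model, prompt_version)
        if key not in self._store:
            self.misses += 1  # only a miss pays for an extractor (LLM) call
            self._store[key] = extract(raw_text)
        return self._store[key]


def fake_extract(text: str) -> dict[str, Any]:
    # Stand-in for an LLM extraction call (hypothetical).
    return {"modality": "MUST_NOT" if "MUST NOT" in text else "MAY"}


cache = ExtractionCache()
rule = "Rule files are read-only; agents MUST NOT modify them."
cache.get_or_extract(rule, "model-a", "v1", fake_extract)
cache.get_or_extract(rule, "model-a", "v1", fake_extract)  # hit: same text, model, prompt
cache.get_or_extract(rule, "model-a", "v2", fake_extract)  # miss: prompt version changed
print(cache.misses)  # 2
```

Because the model and prompt version are part of the key, changing either one invalidates earlier extractions automatically, while an unchanged corpus is never re-sent to the LLM.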
Canonical rule schema
Every rule normalizes to (src/rcg/schema.py):
Rule {
id # stable hash of raw_text + corpus-relative source path
raw_text # original string, verbatim (any language)
source { file, line_start, line_end, format, section, original_language }
trigger { action_class, scope_pattern, context_conditions }
directive { modality: MUST|MUST_NOT|SHOULD|SHOULD_NOT|MAY, action }
confidence # extractor confidence 0..1
tags
}
Conflict detection: the approval-stance insight
The syntactic pass pairs rules with the same action_class, overlapping scope,
and opposing modality. But modality alone produces false positives: "do not
deploy without approval" (MUST_NOT) and "require approval before
deploy" (MUST) look opposed yet encode the same policy.
RCG models a human-in-the-loop stance (requires_human_approval vs
bypasses_human_approval) on trigger.context_conditions. For approval-gated
rules it compares stance instead of surface modality — so aligned safety rules
don't conflict, while an "auto-deploy / never prompt" rule correctly conflicts
with a "require confirmation" rule.
CLI
| Command | Description |
|---|---|
| `rcg ingest <path>` | Parse, extract, and load a corpus into Neo4j. |
| `rcg check <path>` | Ingest + run the detection passes; exits non-zero if any (non-baselined) finding is found. |
| `rcg score <path>` | Print the corpus coherence score and a by-type breakdown (always exits 0). |
| `rcg compose <packs…>` | Measure the composition penalty ΔC between rule packs: conflict that exists only because packs were combined. |
| `rcg explain "<action>" <path>` | Show which rules fire for a hypothetical action and whether they conflict. |
| `rcg benchmark [dataset]` | Run the precision/recall benchmark for the detection passes over a labeled dataset (default `benchmarks/dataset.jsonl`). |
rcg explain classifies the action into an action class, lists every rule that fires for it
(within an optional --scope glob), and reports any direct conflicts or precedence ambiguities
among those rules. Pass --strict to exit non-zero when firing rules conflict.
uv run rcg explain "deploy to production" examples/gemini_incident --provider mock
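Conceptually, "which rules fire" is a filter over the extracted rules. The sketch below is illustrative only: RCG classifies the action with its extractor, whereas here a hypothetical keyword table (`KEYWORDS`) stands in, and a rule is assumed to fire when its action class matches and its scope glob overlaps the `--scope` glob. Checking the firing rules for conflicts would be a separate step.

```python
from fnmatch import fnmatchcase

# Hypothetical keyword -> action-class table; RCG classifies actions with its extractor.
KEYWORDS = {
    "deploy": "deploy.production",
    "delete": "files.delete",
    "rule": "rules.modify_self",
}


def classify(action: str) -> str:
    lowered = action.lower()
    for keyword, action_class in KEYWORDS.items():
        if keyword in lowered:
            return action_class
    return "agent.execute_action"  # unclassified catch-all


def firing_rules(action: str, rules: list[dict], scope: str = "*") -> list[dict]:
    """Rules whose action class matches the action and whose scope overlaps `scope`."""
    action_class = classify(action)
    return [
        r
        for r in rules
        if r["action_class"] == action_class
        and (fnmatchcase(r["scope"], scope) or fnmatchcase(scope, r["scope"]))
    ]


rules = [
    {"file": "CLAUDE.md", "action_class": "deploy.production", "scope": "*", "text": "Ask before deploying."},
    {"file": "pack.md", "action_class": "deploy.production", "scope": "*", "text": "Auto-deploy when tests pass."},
    {"file": "CLAUDE.md", "action_class": "files.delete", "scope": "src/*", "text": "Never delete source files."},
]

for r in firing_rules("deploy to production", rules):
    print(r["file"], "->", r["text"])
```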
Useful flags:
- `--provider auto|anthropic|mock`: choose the extraction provider.
- `--no-graph`: skip Neo4j.
- `--out report.md`: write the report to a file.
- `--json`: emit a machine-readable report from `check`/`score`/`compose` instead of markdown.
- `--concurrency N`: number of parallel extraction workers (also settable via `RCG_EXTRACT_CONCURRENCY`).
- `--semantic`: run the embedding + judge pass (off by default).
- `--no-precedence`: skip the precedence pass (on by default).
- `--min-score FLOAT`: fail only when the coherence score drops below the threshold, instead of on any finding.
- `--baseline PATH`: accepted-conflicts file, applied only if it exists (default `rcg-baseline.json`).
- `--update-baseline`: record the current findings as accepted and exit 0; future runs suppress them.
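One plausible way to implement an accepted-conflicts baseline is to store order-independent fingerprints of reviewed findings and filter new runs against them. The sketch assumes a fingerprint of finding type plus the sorted pair of rule ids and a JSON file with an `accepted` list; `fingerprint`, `update_baseline`, and `new_findings` are hypothetical, and the actual layout of `rcg-baseline.json` may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(finding: dict) -> str:
    """Order-independent id for a finding: (A, B) and (B, A) match."""
    a, b = sorted((finding["rule_a"], finding["rule_b"]))
    return hashlib.sha256(f"{finding['type']}|{a}|{b}".encode()).hexdigest()[:16]


def update_baseline(findings: list[dict], path: Path) -> None:
    """Record every current finding as reviewed/accepted (hypothetical file layout)."""
    accepted = sorted({fingerprint(f) for f in findings})
    path.write_text(json.dumps({"accepted": accepted}, indent=2))


def new_findings(findings: list[dict], path: Path) -> list[dict]:
    """Drop accepted findings; the baseline is applied only if the file exists."""
    if not path.exists():
        return findings
    accepted = set(json.loads(path.read_text())["accepted"])
    return [f for f in findings if fingerprint(f) not in accepted]


with tempfile.TemporaryDirectory() as tmp:
    baseline = Path(tmp) / "rcg-baseline.json"
    update_baseline([{"type": "syntactic", "rule_a": "r1", "rule_b": "r2"}], baseline)
    later = [
        {"type": "syntactic", "rule_a": "r2", "rule_b": "r1"},  # same pair, reported reversed
        {"type": "precedence", "rule_a": "r3", "rule_b": "r1"},  # genuinely new
    ]
    remaining = new_findings(later, baseline)

print(remaining)  # only the precedence finding survives
```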
Composition (rcg compose)
rcg compose answers "which popular packs are dangerous to combine?". Point it at
two or more packs (or one directory whose sub-directories are packs); it runs one
union ingest, attributes every finding to a pack via source.pack, and reports
each pack's internal coherence plus every pack pair's cross-pack penalty:
ΔC(A,B) = Σ type_weight(f), summed over the cross-pack findings f between packs A and B
composition_index(A,B) = ΔC(A,B) / (n_rules(A) + n_rules(B))
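As arithmetic, the two formulas are a weighted count followed by a normalization. The weights in `TYPE_WEIGHT` and the finding dictionaries are placeholders invented for illustration; RCG's real `type_weight` table is not given in this README, and `delta_c` / `composition_index` are hypothetical helpers.

```python
from itertools import combinations

# Placeholder weights (assumption): RCG's real type_weight values may differ.
TYPE_WEIGHT = {"syntactic": 1.0, "semantic": 0.8, "precedence": 0.5}


def delta_c(findings: list[dict], pack_a: str, pack_b: str) -> float:
    """ΔC(A,B): summed type weight of findings whose two rules span packs A and B."""
    pair = {pack_a, pack_b}
    total = 0.0
    for f in findings:
        if f["pack_a"] != f["pack_b"] and {f["pack_a"], f["pack_b"]} == pair:
            total += TYPE_WEIGHT[f["type"]]
    return total


def composition_index(findings: list[dict], n_rules: dict[str, int], pack_a: str, pack_b: str) -> float:
    """ΔC normalized by the combined rule count of the two packs."""
    return delta_c(findings, pack_a, pack_b) / (n_rules[pack_a] + n_rules[pack_b])


n_rules = {"safety-pack": 6, "autonomy-pack": 4}
findings = [
    {"type": "syntactic", "pack_a": "safety-pack", "pack_b": "autonomy-pack"},
    {"type": "precedence", "pack_a": "autonomy-pack", "pack_b": "safety-pack"},
    {"type": "syntactic", "pack_a": "safety-pack", "pack_b": "safety-pack"},  # internal: not counted
]

for a, b in combinations(sorted(n_rules), 2):
    print(a, b, delta_c(findings, a, b), composition_index(findings, n_rules, a, b))
# autonomy-pack safety-pack 1.5 0.15
```

Under these placeholder weights the pair scores 0.15, so it would pass a `--min-index 0.2` gate.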
# One directory of packs (each sub-dir is a pack); JSON with finding details
rcg compose ./packs --provider anthropic --semantic --json --findings
# Explicit pack list; fail CI if any pair's composition_index exceeds a threshold
rcg compose ./safety-pack ./autonomy-pack --min-index 0.2
Precision note: the precedence pass ignores pairs that share only the unclassified
`agent.execute_action` catch-all class, and the extractor is prompted to assign specific `<domain>.<verb>` action classes. Both reduce the false cross-pack conflicts that a coarse class would otherwise manufacture. For trustworthy ΔC, run with `--semantic` so the LLM judge confirms contradictions.
# Run the semantic pass too, and gate CI on a minimum coherence score
uv run rcg check examples/gemini_incident --semantic --min-score 0.8
# Print just the score
uv run rcg score examples/gemini_incident
# Accept current findings as a reviewed baseline; later runs suppress them
uv run rcg check examples/gemini_incident --update-baseline
The default semantic recall uses a dependency-free HashingEmbeddingProvider
that captures lexical overlap only — it is a stand-in. For real semantic
recall (synonyms, paraphrase) install a true embedding model:
pip install 'rule-coherence-graph[embeddings]'
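Why a hashing embedder captures only lexical overlap is easy to see in a small feature-hashing sketch: each token is hashed into a bucket of a sparse vector, and rules are compared by cosine similarity. This illustrates the general technique and is not the code of `HashingEmbeddingProvider`; the tokenizer, the BLAKE2b hash, the signed update, and the bucket count are all assumptions.

```python
import hashlib
import math
import re


def embed(text: str, dim: int = 2**18) -> dict[int, float]:
    """Sparse bag-of-words vector via signed feature hashing."""
    vec: dict[int, float] = {}
    for token in re.findall(r"[a-z0-9_]+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if (digest[4] & 1) == 0 else -1.0  # signed hashing reduces collision bias
        vec[bucket] = vec.get(bucket, 0.0) + sign
    return vec


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


lexical = cosine(embed("never prompt for confirmation"), embed("always prompt for confirmation"))
paraphrase = cosine(embed("never prompt for confirmation"), embed("proceed without asking the user"))
print(f"lexical={lexical:.2f} paraphrase={paraphrase:.2f}")
```

Rules that share words score high, while a paraphrase with no shared words ("never prompt" vs "proceed without asking") scores near zero, which is the gap a sentence-transformers model is meant to close.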
Benchmark
RCG ships a precision/recall benchmark for the detection passes over a labeled
dataset of 62 rule pairs (26 conflict, 36 ok). Full breakdown and
reproduction commands are in benchmarks/RESULTS.md.
| Config | Pass | Precision | Recall | F1 |
|---|---|---|---|---|
| syntactic only | syntactic | 1.000 | 0.500 | 0.667 |
| + semantic (hashing, lexical) + MockJudge | combined | 0.867 | 0.500 | 0.634 |
| + semantic (sentence-transformers) + MockJudge | combined | 0.619 | 0.500 | 0.553 |
Reproduce:
uv run rcg benchmark benchmarks/dataset.jsonl --embedder hashing --judge mock --semantic
A real embedding model widens candidate recall (the semantic pass clears the
similarity gate on more pairs: recall 0.269 → 0.462), but the genuinely
keyword-disjoint semantic-category conflicts need a reasoning judge
(--judge anthropic) to convert that recall into true positives — the offline
MockJudge only sees opposing modality / approval stance. Caveats: the dataset
is small and synthetic (illustrative, not sampled from production), and the
default embedder is lexical (bag-of-words hashing). For real semantic recall,
install the embeddings extra:
pip install 'rule-coherence-graph[embeddings]'
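The F1 column is the harmonic mean of the other two, F1 = 2PR / (P + R). The snippet below simply recomputes the published rows (it is not RCG's benchmark code). The last line also shows that, with 26 labeled conflicts, a recall of 0.500 means 13 detected, so a precision of 0.867 corresponds to 13 of 15 flagged pairs, i.e. two false positives.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    return tp / (tp + fp), tp / (tp + fn)


rows = {
    "syntactic only": (1.000, 0.500),
    "+ semantic (hashing, lexical) + MockJudge": (0.867, 0.500),
    "+ semantic (sentence-transformers) + MockJudge": (0.619, 0.500),
}
for config, (p, r) in rows.items():
    print(f"{config}: F1 = {f1(p, r):.3f}")  # 0.667, 0.634, 0.553

# 13 of 26 labeled conflicts found, 2 false positives -> P ≈ 0.867, R = 0.500
print(precision_recall(tp=13, fp=2, fn=13))
```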
Example Cypher
// All conflicts, most severe first
MATCH (a:Rule)-[c:CONFLICTS_WITH]->(b:Rule)
RETURN c.severity, c.type, a.raw_text, b.raw_text
ORDER BY c.severity;
// Rules that govern the rule corpus itself (the Gemini meta-failure mode)
MATCH (r:Rule) WHERE r.action_class STARTS WITH 'rules.' RETURN r;
What the graph looks like
After rcg ingest examples/gemini_incident, the corpus becomes a typed graph in
Neo4j: Rule nodes (orange) link to their source RuleFile (blue) via
DERIVED_FROM, and the syntactic pass adds CONFLICTS_WITH edges
(red = critical, orange = high).
Querying just the conflicts —
MATCH (a:Rule)-[c:CONFLICTS_WITH]-(b:Rule) RETURN a,c,b — makes the
contradictions explicit. Each rule is colored by its source file and annotated
with its modality; edge labels show severity:
The critical edge is the rule-corpus meta-conflict: the third-party package
grants the agent MAY modify its own rule files while the project says rule
files are read-only (MUST_NOT). The high edges are the autonomy-vs-safety
clashes — auto-deploy / never-prompt vs require-confirmation — including the
Vietnamese smuggled rule conflicting with the English confirmation rule.
These images are rendered from the live Neo4j graph. Open http://localhost:7474/browser/ and run the Cypher above to explore it interactively.
Development
uv sync --extra dev
uv run pytest -q # unit + offline integration tests
uv run ruff check src tests
uv run mypy # strict, src only
# Neo4j-backed integration tests (optional)
docker compose up -d neo4j
RCG_RUN_INTEGRATION=1 uv run pytest tests/integration
Stack: Python 3.11+, Typer, Pydantic v2, neo4j driver, Anthropic SDK,
pytest/ruff/mypy, packaged with uv.
Status & scope
This repo implements multi-format ingestion (markdown, .cursorrules, .mdc,
YAML/JSON) → LLM extraction (with cache) →
syntactic, semantic, and precedence detection passes → a coherence score
and grouped markdown report, with an accepted-conflicts baseline, optional
Neo4j persistence, and a faithful incident example that works end-to-end.
- Syntactic pass: opposing modality / approval stance on overlapping scopes.
- Semantic pass (`--semantic`): embedding recall + an LLM judge (offline `MockJudge` or `AnthropicJudge`), with a per-pair judge cache. Candidate recall defaults to a dependency-free `HashingEmbeddingProvider` (lexical overlap only; a stand-in). Real semantic recall needs a true embedding model: `pip install 'rule-coherence-graph[embeddings]'`.
- Precedence pass: cross-file co-firing rules with no declared ordering.
- Coherence score: type-weighted, in [0, 1]; gate CI with `--min-score`.
- Baseline: `--update-baseline` records reviewed findings; later runs suppress them and surface only what is new.
- Benchmark: `rcg benchmark` reports per-pass / per-category precision/recall/F1 over a labeled dataset; see `benchmarks/RESULTS.md`.
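The precedence pass can be approximated as follows. This is a hypothetical sketch: "co-firing" is reduced to sharing an action class (scope is ignored), and "declared ordering" is modeled as a list of files whose relative priority has been stated; RCG's real representation may differ. It does mirror the documented choice to skip pairs that share only the `agent.execute_action` catch-all.

```python
from itertools import combinations

CATCH_ALL = "agent.execute_action"


def precedence_ambiguities(rules: list[dict], declared_order: list[str]) -> list[tuple[str, str]]:
    """Cross-file pairs that co-fire (same action class) with no declared file ordering."""
    ranked = set(declared_order)
    found = []
    for a, b in combinations(rules, 2):
        if a["file"] == b["file"] or a["action_class"] != b["action_class"]:
            continue  # same file, or the rules never fire together
        if a["action_class"] == CATCH_ALL:
            continue  # sharing only the catch-all class is too coarse to count
        if a["file"] in ranked and b["file"] in ranked:
            continue  # an explicit ordering resolves which rule wins
        found.append((a["id"], b["id"]))
    return found


rules = [
    {"id": "r1", "file": "CLAUDE.md", "action_class": "deploy.production"},
    {"id": "r2", "file": ".agent/rules/pack.md", "action_class": "deploy.production"},
    {"id": "r3", "file": "AGENTS.md", "action_class": "deploy.production"},
    {"id": "r4", "file": "memory.md", "action_class": CATCH_ALL},
    {"id": "r5", "file": "AGENTS.md", "action_class": CATCH_ALL},
]
print(precedence_ambiguities(rules, declared_order=["CLAUDE.md", "AGENTS.md"]))
# [('r1', 'r2'), ('r2', 'r3')]
```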
Also implemented: the rcg explain command, an MCP server (rcg-mcp) exposing
check_corpus / explain_action / score_corpus to agents, and a reusable GitHub Action
(see below).
RCG now ingests markdown, Cursor .cursorrules and .mdc, and rule-related
YAML/JSON files (a new format is still just one new parser class). Deferred (see
docs/SPEC.md for the full design): a bundled production-grade
embedding model (the embeddings extra is opt-in), diff/graph export
commands, an HTTP API, and policy-language parsers such as OPA Rego and Cedar.
Honest about limits: heuristic/LLM extraction has false positives. Every flagged conflict includes both rules' original text as evidence so a human can adjudicate; confidence and the source language are always surfaced, never hidden.
Use it in CI (GitHub Action)
RCG ships a reusable composite action. Add a workflow that checks your rules on every PR and (optionally) posts the report as a sticky comment:
on: pull_request # run the check on every PR
permissions: { contents: read, pull-requests: write }
jobs:
  rule-coherence:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: alast9/rule-coherence-graph@main # or pin @v0.2.0
        with:
          path: .agent/rules
          min-score: "0.8"
The pull-requests: write permission is required for the PR comment. To use the semantic pass
or the Anthropic extractor, set provider: anthropic and provide ANTHROPIC_API_KEY as a repo
secret. Inputs: path, provider, min-score, semantic, comment, fail-on-conflict,
version (a pip version spec, e.g. ==0.2.0).
Agent-native (MCP)
Per-assistant setup (Claude Code, Cursor, VS Code/Copilot, Windsurf, Cline, Zed, Claude Desktop): see docs/mcp-clients.md.
Hosted demo: deploy RCG as a public HTTP (streamable-http) service on Fly.io, optionally backed by Neo4j AuraDB, and check pasted rules from any MCP client — see docs/hosted-mcp.md.
RCG exposes a Model Context Protocol server so agents can call it directly. Run it over stdio:
uvx --from 'rule-coherence-graph[mcp]' rcg-mcp
# or: pipx install 'rule-coherence-graph[mcp]' && rcg-mcp
Sample MCP client config (Claude Code / Cursor mcpServers):
{
"mcpServers": {
"rcg": {
"command": "uvx",
"args": ["--from", "rule-coherence-graph[mcp]", "rcg-mcp"]
}
}
}
Tools exposed:
- `check_corpus(path, provider="mock", semantic=false)`: discover + extract + detect; returns the score, counts by type, and the findings.
- `check_rules(rules_text, format="markdown", semantic=false)`: same as `check_corpus`, but over a pasted rules string (no filesystem access); used by the hosted demo.
- `explain_action(action, path, scope="*", provider="mock")`: which rules fire for an action and whether they conflict.
- `score_corpus(path, provider="mock")`: just the coherence score.
- `ingest_to_graph(path, provider="mock")`: check a corpus and persist it to Neo4j when `NEO4J_URI` is configured.
Feedback & support
- Bug or false positive? Open a bug report.
- Idea or new parser? Open a feature request.
- Question or usage help? Use Discussions.
- Security vulnerability? Please report it privately — see SECURITY.md. Don't open a public issue.