Detect silent incoherence in AI chain-of-thought reasoning
Project description
cot-coherence
Detect silent incoherence in AI chain-of-thought reasoning.
Most evaluation tools check if each reasoning step is correct. cot-coherence checks if the logical progression between steps holds together — schema-level coherence, not step-level accuracy.
Why?
Chain-of-thought reasoning in LLMs silently degrades. Recent research (arXiv, Feb-Mar 2026) shows:
- CoT faithfulness decays at 70-85% of chain length ("Reasoning Horizon")
- Reasoning tokens have a negative effect past this horizon
- No existing tool detects this — current tools check RAG grounding, linguistic flow, or safety. None check logical coherence between steps.
cot-coherence fills this gap.
Install
pip install cot-coherence
With CLI support:
pip install cot-coherence[cli]
Quick Start
import cot_coherence
report = cot_coherence.analyze("""
Step 1: The user asks about Python performance.
Step 2: Python is interpreted, so it's generally slower than compiled languages.
Step 3: Let me discuss the history of JavaScript frameworks.
Step 4: Therefore, Python is definitely the fastest language available.
""", original_question="Is Python fast?")
print(report.overall_score) # 0.43
print(report.is_coherent) # False
print(len(report.flags)) # 3+ (scope_creep, conclusion_drift, confidence_inflation)
What It Detects
5 Incoherence Patterns
| Pattern | What It Catches | Example |
|---|---|---|
| Premise Abandonment | Premises introduced but never referenced again | "Given PostgreSQL uses RLS..." then never mentions DB security |
| Conclusion Drift | Conclusions that shift topic mid-chain | Concludes about databases, then about Kubernetes |
| Confidence Inflation | Unjustified jumps from hedging to certainty | "might work" → "definitely always works" with no evidence |
| Scope Creep | Reasoning that drifts from the original question | Asked about Python, starts discussing Roman architecture |
| Circular Return | Steps that repeat earlier reasoning | Step 5 restates Step 1 in different words |
Reasoning Horizon
Detects the point in a chain where quality starts to degrade — the "Reasoning Horizon" described in recent research. Reports the estimated horizon position and degradation signals.
CLI
# Analyze a trace file
cot-coherence check trace.txt -q "What is quantum computing?"
# Pipe from stdin
echo "Step 1: ..." | cot-coherence check
# JSON output
cot-coherence check trace.txt --json-output
# Adjust sensitivity (0.0=lenient, 1.0=strict)
cot-coherence check trace.txt -s 0.8
# Disable horizon analysis
cot-coherence check trace.txt --no-horizon
Configuration
from cot_coherence import analyze, CoherenceConfig, IncoherenceType
config = CoherenceConfig(
sensitivity=0.7, # 0.0=lenient, 1.0=strict
enabled_detectors={ # Enable specific detectors
IncoherenceType.PREMISE_ABANDONMENT,
IncoherenceType.CONFIDENCE_INFLATION,
},
analyze_horizon=True, # Enable horizon analysis
weights={ # Custom pattern weights
IncoherenceType.PREMISE_ABANDONMENT: 2.0,
IncoherenceType.CONFIDENCE_INFLATION: 1.5,
},
)
report = analyze("Step 1: ...", config=config)
Trace Formats
Auto-detects 4 formats:
# Numbered (default for most LLMs)
"Step 1: First\nStep 2: Second"
"1. First\n2. Second"
# XML (Claude-style thinking)
"<step>First</step><step>Second</step>"
"<thinking>Reasoning here</thinking>"
# Markdown
"## Step 1\nFirst\n## Step 2\nSecond"
# Newline-separated (fallback)
"First block\n\nSecond block"
Or pass pre-split steps:
report = analyze(steps=["First step", "Second step"])
Custom Parsers
from cot_coherence import register_parser
from cot_coherence.models import ReasoningStep
def my_parser(text):
parts = text.split("|||")
return [ReasoningStep(index=i, text=p.strip(), raw_text=p)
for i, p in enumerate(parts) if p.strip()]
register_parser("pipe", my_parser)
report = analyze("First ||| Second ||| Third", trace_format="pipe")
The CoherenceReport
report = analyze(trace)
report.overall_score # float 0.0-1.0
report.is_coherent # True if score >= 0.7
report.steps # list[ReasoningStep]
report.flags # list[IncoherenceFlag]
report.critical_flags # flags with CRITICAL severity
report.horizon # HorizonAnalysis or None
report.pattern_scores # dict[IncoherenceType, float]
# Each flag contains:
flag.type # IncoherenceType enum
flag.severity # LOW, MEDIUM, HIGH, CRITICAL
flag.confidence # 0.0-1.0
flag.step_range # (start_step, end_step)
flag.summary # Human-readable description
flag.evidence # Specific evidence
flag.suggestion # How to fix it
How It Works
v0.1 uses rule-based heuristics — zero API cost, works offline:
- Premise Abandonment: Extracts premise markers ("given", "assuming"), checks if key entities appear in subsequent steps
- Conclusion Drift: Identifies conclusion markers ("therefore", "thus"), compares topic overlap via Jaccard similarity
- Confidence Inflation: Tracks hedging vs. certainty word ratios, flags unjustified jumps without evidence markers
- Scope Creep: Measures content-word overlap between each step and the original question
- Circular Return: Computes content-word fingerprints, flags high similarity between non-adjacent steps
Scoring: each pattern starts at 1.0, penalties applied per flag based on severity and confidence. Overall score is the weighted average.
Dependencies
Required: pydantic>=2.0
Optional:
[cli]—click,rich(for terminal interface)[llm]—anthropic(for future LLM-powered detection)[dev]—pytest,pytest-cov,ruff,mypy
License
Apache 2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file cot_coherence-0.1.0.tar.gz.
File metadata
- Download URL: cot_coherence-0.1.0.tar.gz
- Upload date:
- Size: 26.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b3c34992f9c0b0f5ae5ffdfdd54553a997d41c2c37d29b5a7d6c295c110be3fc
|
|
| MD5 |
3e61163519b7911835bd2b62fd84bfff
|
|
| BLAKE2b-256 |
042570373588f70348f405fefb4d762f13f19a2a9ad30f181363ac8a493c6d02
|
File details
Details for the file cot_coherence-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cot_coherence-0.1.0-py3-none-any.whl
- Upload date:
- Size: 27.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
32e8ae86bb31512998f9100d90ec74c26d81286f7b38941068827e3dc4c0ee29
|
|
| MD5 |
72b46fa6beb5babf8f3900e96f36f85b
|
|
| BLAKE2b-256 |
f43e12f4396fa3c45978ffe95264e7e98c47ad12aa8d82c267aa8bc2db6dab6c
|