
Project description

cot-coherence

Detect silent incoherence in AI chain-of-thought reasoning.

Most evaluation tools check if each reasoning step is correct. cot-coherence checks if the logical progression between steps holds together — schema-level coherence, not step-level accuracy.

Why?

Chain-of-thought reasoning in LLMs silently degrades. Recent research (arXiv, Feb-Mar 2026) shows:

  • CoT faithfulness decays at 70-85% of chain length ("Reasoning Horizon")
  • Reasoning tokens have a negative effect past this horizon
  • No existing tool detects this — current tools check RAG grounding, linguistic flow, or safety. None check logical coherence between steps.

cot-coherence fills this gap.

Install

pip install cot-coherence

With CLI support:

pip install cot-coherence[cli]

Quick Start

import cot_coherence

report = cot_coherence.analyze("""
Step 1: The user asks about Python performance.
Step 2: Python is interpreted, so it's generally slower than compiled languages.
Step 3: Let me discuss the history of JavaScript frameworks.
Step 4: Therefore, Python is definitely the fastest language available.
""", original_question="Is Python fast?")

print(report.overall_score)  # 0.43
print(report.is_coherent)    # False
print(len(report.flags))     # 3+ (scope_creep, conclusion_drift, confidence_inflation)

What It Detects

5 Incoherence Patterns

  • Premise Abandonment: premises introduced but never referenced again. Example: "Given PostgreSQL uses RLS..." then never mentions DB security.
  • Conclusion Drift: conclusions that shift topic mid-chain. Example: concludes about databases, then about Kubernetes.
  • Confidence Inflation: unjustified jumps from hedging to certainty. Example: "might work" → "definitely always works" with no evidence.
  • Scope Creep: reasoning that drifts from the original question. Example: asked about Python, starts discussing Roman architecture.
  • Circular Return: steps that repeat earlier reasoning. Example: Step 5 restates Step 1 in different words.
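The confidence-inflation pattern, for example, can be approximated by comparing hedge-word and certainty-word occurrences across steps. The word lists and logic below are an illustrative toy, not the library's actual detector:

```python
# Toy illustration of the confidence-inflation idea: flag a step that asserts
# certainty after earlier hedged steps, with no evidence marker to justify it.
# These word lists are illustrative, not the ones cot-coherence uses.
HEDGES = {"might", "may", "could", "possibly", "perhaps", "generally"}
CERTAINTY = {"definitely", "certainly", "always", "never", "undoubtedly"}
EVIDENCE = {"because", "since", "shows", "measured", "benchmark"}

def confidence_jump(steps):
    """Return indices of steps that jump to certainty after hedged steps,
    without any evidence marker in the certain step."""
    flagged = []
    hedged_seen = False
    for i, step in enumerate(steps):
        words = set(step.lower().split())
        if words & HEDGES:
            hedged_seen = True
        elif words & CERTAINTY and hedged_seen and not (words & EVIDENCE):
            flagged.append(i)
    return flagged

steps = [
    "Python might be slower for CPU-bound loops.",
    "Therefore Python is definitely always the slowest option.",
]
print(confidence_jump(steps))  # [1]
```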

Reasoning Horizon

Detects the point in a chain where quality starts to degrade — the "Reasoning Horizon" described in recent research. Reports the estimated horizon position and degradation signals.
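One way such a horizon could be estimated is to track a per-step quality signal and report the first step that drops well below the best value seen so far. This is a hypothetical sketch of the idea, not cot-coherence's algorithm:

```python
def estimate_horizon(signal, drop=0.5):
    """Return the first step index where the quality signal falls below
    `drop` times the best value seen so far, or None if it never does.
    A toy stand-in for horizon detection, not the library's implementation."""
    best = float("-inf")
    for i, value in enumerate(signal):
        best = max(best, value)
        if value < drop * best:
            return i
    return None

# Per-step quality (e.g. overlap with the question) degrading late in the chain
print(estimate_horizon([0.9, 0.8, 0.85, 0.3, 0.2]))  # 3
```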

CLI

# Analyze a trace file
cot-coherence check trace.txt -q "What is quantum computing?"

# Pipe from stdin
echo "Step 1: ..." | cot-coherence check

# JSON output
cot-coherence check trace.txt --json-output

# Adjust sensitivity (0.0=lenient, 1.0=strict)
cot-coherence check trace.txt -s 0.8

# Disable horizon analysis
cot-coherence check trace.txt --no-horizon

Configuration

from cot_coherence import analyze, CoherenceConfig, IncoherenceType

config = CoherenceConfig(
    sensitivity=0.7,                    # 0.0=lenient, 1.0=strict
    enabled_detectors={                 # Enable specific detectors
        IncoherenceType.PREMISE_ABANDONMENT,
        IncoherenceType.CONFIDENCE_INFLATION,
    },
    analyze_horizon=True,               # Enable horizon analysis
    weights={                           # Custom pattern weights
        IncoherenceType.PREMISE_ABANDONMENT: 2.0,
        IncoherenceType.CONFIDENCE_INFLATION: 1.5,
    },
)

report = analyze("Step 1: ...", config=config)

Trace Formats

Auto-detects 4 formats:

# Numbered (default for most LLMs)
"Step 1: First\nStep 2: Second"
"1. First\n2. Second"

# XML (Claude-style thinking)
"<step>First</step><step>Second</step>"
"<thinking>Reasoning here</thinking>"

# Markdown
"## Step 1\nFirst\n## Step 2\nSecond"

# Newline-separated (fallback)
"First block\n\nSecond block"

Or pass pre-split steps:

report = analyze(steps=["First step", "Second step"])
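The numbered format above, for instance, can be split with a short regex. This is a simplified sketch of the idea, not the library's actual parser:

```python
import re

def split_numbered(trace):
    """Split a numbered trace ("Step 1: ..." or "1. ...") into step texts.
    A simplified sketch; the real format detection is richer than this."""
    parts = re.split(r"(?m)^(?:Step\s+\d+:|\d+\.)\s*", trace)
    return [p.strip() for p in parts if p.strip()]

print(split_numbered("Step 1: First\nStep 2: Second"))  # ['First', 'Second']
print(split_numbered("1. First\n2. Second"))            # ['First', 'Second']
```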

Custom Parsers

from cot_coherence import register_parser
from cot_coherence.models import ReasoningStep

def my_parser(text):
    parts = text.split("|||")
    return [ReasoningStep(index=i, text=p.strip(), raw_text=p)
            for i, p in enumerate(parts) if p.strip()]

register_parser("pipe", my_parser)
report = analyze("First ||| Second ||| Third", trace_format="pipe")

The CoherenceReport

report = analyze(trace)

report.overall_score    # float 0.0-1.0
report.is_coherent      # True if score >= 0.7
report.steps            # list[ReasoningStep]
report.flags            # list[IncoherenceFlag]
report.critical_flags   # flags with CRITICAL severity
report.horizon          # HorizonAnalysis or None
report.pattern_scores   # dict[IncoherenceType, float]

# Each flag contains:
flag.type               # IncoherenceType enum
flag.severity           # LOW, MEDIUM, HIGH, CRITICAL
flag.confidence         # 0.0-1.0
flag.step_range         # (start_step, end_step)
flag.summary            # Human-readable description
flag.evidence           # Specific evidence
flag.suggestion         # How to fix it

How It Works

v0.1 uses rule-based heuristics — zero API cost, works offline:

  • Premise Abandonment: Extracts premise markers ("given", "assuming"), checks if key entities appear in subsequent steps
  • Conclusion Drift: Identifies conclusion markers ("therefore", "thus"), compares topic overlap via Jaccard similarity
  • Confidence Inflation: Tracks hedging vs. certainty word ratios, flags unjustified jumps without evidence markers
  • Scope Creep: Measures content-word overlap between each step and the original question
  • Circular Return: Computes content-word fingerprints, flags high similarity between non-adjacent steps
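Several of these heuristics reduce to content-word overlap. A minimal sketch of Jaccard similarity over content-word sets (the stopword list here is illustrative, not the library's):

```python
STOPWORDS = {"the", "a", "an", "is", "are", "so", "it", "its", "of", "to", "in"}

def content_words(text):
    """Lowercased words with punctuation and stopwords removed."""
    return {w.strip(".,?!") for w in text.lower().split()} - STOPWORDS

def jaccard(a, b):
    """Jaccard similarity of two texts' content words (0.0 disjoint, 1.0 identical)."""
    a, b = content_words(a), content_words(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Low overlap between a step and the question suggests scope creep;
# high overlap between non-adjacent steps suggests a circular return.
print(jaccard("Is Python fast?", "Let me discuss JavaScript frameworks."))  # 0.0
```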

Scoring: each pattern starts at 1.0, and penalties are applied per flag based on severity and confidence. The overall score is the weighted average of the pattern scores.
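The weighted-average step can be sketched as follows; the pattern keys and weight defaults here are illustrative, not the exact library code:

```python
def overall_score(pattern_scores, weights=None):
    """Weighted average of per-pattern scores; unlisted patterns weigh 1.0.
    A sketch of the scoring scheme described above, not the exact code."""
    weights = weights or {}
    total = sum(weights.get(p, 1.0) * s for p, s in pattern_scores.items())
    norm = sum(weights.get(p, 1.0) for p in pattern_scores)
    return total / norm if norm else 0.0

scores = {"premise_abandonment": 0.6, "conclusion_drift": 1.0}
print(overall_score(scores, weights={"premise_abandonment": 2.0}))  # ≈ 0.733
```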

Dependencies

Required: pydantic>=2.0

Optional:

  • [cli]: click, rich (for terminal interface)
  • [llm]: anthropic (for future LLM-powered detection)
  • [dev]: pytest, pytest-cov, ruff, mypy

License

Apache 2.0

Download files

Download the file for your platform.

Source Distribution

cot_coherence-0.1.0.tar.gz (26.9 kB)

Uploaded Source

Built Distribution


cot_coherence-0.1.0-py3-none-any.whl (27.2 kB)

Uploaded Python 3

File details

Details for the file cot_coherence-0.1.0.tar.gz.

File metadata

  • Download URL: cot_coherence-0.1.0.tar.gz
  • Size: 26.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for cot_coherence-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b3c34992f9c0b0f5ae5ffdfdd54553a997d41c2c37d29b5a7d6c295c110be3fc
MD5 3e61163519b7911835bd2b62fd84bfff
BLAKE2b-256 042570373588f70348f405fefb4d762f13f19a2a9ad30f181363ac8a493c6d02


File details

Details for the file cot_coherence-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: cot_coherence-0.1.0-py3-none-any.whl
  • Size: 27.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for cot_coherence-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 32e8ae86bb31512998f9100d90ec74c26d81286f7b38941068827e3dc4c0ee29
MD5 72b46fa6beb5babf8f3900e96f36f85b
BLAKE2b-256 f43e12f4396fa3c45978ffe95264e7e98c47ad12aa8d82c267aa8bc2db6dab6c

