Entropy-based early exit for efficient agent reasoning
Project description
Entroplain
Entropy-based early exit for efficient agent reasoning.
Stop burning tokens. Know when your agent has finished thinking.
What It Does
Entroplain monitors your LLM's predictive entropy — the uncertainty in its output distribution — to detect when reasoning has converged.
High entropy → Model is searching, exploring, uncertain
Low entropy → Model is confident, converged, ready to output
Key insight: Reasoning follows a multi-modal entropy trajectory. Local minima ("valleys") mark reasoning milestones. Exit at the right valley, save 40-60% compute with minimal accuracy loss.
Quick Start
Install
# Python (pip)
pip install entroplain
# Node.js (npm)
npm install entroplain
Requirements
Python: 3.8+
Node.js: 18+
For cloud providers: Set API keys via environment variables:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export NVIDIA_API_KEY=nvapi-...
For local models: Install Ollama or llama.cpp
Use with Any Agent
from entroplain import EntropyMonitor
monitor = EntropyMonitor()
# Stream tokens with entropy tracking
async for token, entropy in monitor.stream(agent.generate()):
print(f"{token} (entropy: {entropy:.3f})")
# Detect reasoning convergence
if monitor.is_converged():
break # Early exit — reasoning complete
How It Works
1. Track Entropy Per Token
Every token has an entropy value derived from the model's output distribution:
entropy = -sum(p * log2(p) for p in probabilities if p > 0)
2. Detect Valleys
Local minima in the entropy trajectory indicate reasoning milestones:
Entropy: 0.8 → 0.6 → 0.3* → 0.5 → 0.2* → 0.1*
↑ ↑
Valley 1 Valley 2
3. Exit at the Right Moment
When valley count plateaus and velocity stabilizes, reasoning is complete.
Experimental Evidence
Tested on Llama-3.1-70b via NVIDIA API:
| Difficulty | Avg Valleys | Avg Entropy | Avg Velocity |
|---|---|---|---|
| Easy | 61.3 | 0.3758 | 0.4852 |
| Medium | 53.0 | 0.3267 | 0.4394 |
| Hard | 70.2 | 0.2947 | 0.4095 |
Finding: Hard problems have more entropy valleys (70.2 vs 61.3) — valleys correlate with reasoning complexity.
Platform Support
| Platform | Support | How to Enable |
|---|---|---|
| Local (llama.cpp, Ollama) | ✅ Full | Built-in, no config |
| OpenAI | ✅ Yes | logprobs: true |
| Anthropic Claude | ✅ Yes (Claude 4) | logprobs: True |
| Google Gemini | ✅ Yes | response_logprobs=True |
| NVIDIA NIM | ✅ Yes | logprobs: true |
| OpenRouter | ⚠️ Partial | ~23% of models support it |
Integration Examples
OpenAI / NVIDIA / OpenRouter
from openai import OpenAI
from entroplain import EntropyMonitor
client = OpenAI()
monitor = EntropyMonitor()
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Solve this step by step..."}],
logprobs=True,
top_logprobs=5,
stream=True
)
for chunk in response:
if chunk.choices[0].delta.content:
token = chunk.choices[0].delta.content
entropy = monitor.calculate_entropy(chunk.choices[0].logprobs)
if monitor.should_exit():
print("\n[Early exit — reasoning converged]")
break
print(token, end="")
Ollama (Local)
import ollama
from entroplain import EntropyMonitor
monitor = EntropyMonitor()
# Ollama exposes logits for local models
response = ollama.generate(
model="llama3.1",
prompt="Think through this carefully...",
options={"num_ctx": 4096}
)
# Direct access to token probabilities
for token_data in response.get("token_probs", []):
entropy = monitor.calculate_from_logits(token_data["logits"])
monitor.track(token_data["token"], entropy)
Anthropic Claude
from anthropic import Anthropic
from entroplain import EntropyMonitor
client = Anthropic()
monitor = EntropyMonitor()
with client.messages.stream(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Analyze this..."}],
) as stream:
for text in stream.text_stream:
entropy = monitor.get_entropy()
if monitor.should_exit():
break
print(text, end="", flush=True)
Agent Frameworks
OpenClaw:
# In your agent config
entropy_monitor:
enabled: true
exit_threshold: 0.15 # Exit when entropy drops below this
min_valleys: 3 # Require at least N reasoning milestones
Claude Code:
{
"hooks": {
"on_token": "entroplain.hooks.track_entropy",
"on_converge": "entroplain.hooks.early_exit"
}
}
Configuration
Environment Variables
# For cloud providers
ENTROPPLAIN_OPENAI_API_KEY=sk-...
ENTROPPLAIN_ANTHROPIC_API_KEY=sk-ant-...
ENTROPPLAIN_NVIDIA_API_KEY=nvapi-...
# For local models
ENTROPPLAIN_LOCAL_PROVIDER=ollama # or llama.cpp
ENTROPPLAIN_LOCAL_MODEL=llama3.1
Exit Conditions
monitor = EntropyMonitor(
# Exit when entropy drops below threshold
entropy_threshold=0.15,
# Require minimum valleys before exit
min_valleys=2,
# Exit when velocity stabilizes (change < this)
velocity_threshold=0.05,
# Don't exit before N tokens
min_tokens=50,
# Custom exit condition
exit_condition="valleys_plateau" # or "entropy_drop", "velocity_zero"
)
CLI Usage
# Analyze a prompt's entropy trajectory
entroplain analyze "What is 2+2?" --model gpt-4o
# Stream with early exit
entroplain stream "Solve this step by step: x^2 = 16" --exit-on-converge
# Benchmark entropy patterns
entroplain benchmark --problems gsm8k --output results.json
# Visualize entropy trajectory
entroplain visualize results.json --output entropy_plot.png
API Reference
EntropyMonitor
class EntropyMonitor:
def __init__(
self,
entropy_threshold: float = 0.15,
min_valleys: int = 2,
velocity_threshold: float = 0.05,
min_tokens: int = 50
): ...
def calculate_entropy(self, logprobs: List[float]) -> float:
"""Calculate Shannon entropy from log probabilities."""
def track(self, token: str, entropy: float) -> None:
"""Track a token and its entropy value."""
def get_valleys(self) -> List[Tuple[int, float]]:
"""Get all entropy valleys (local minima)."""
def get_velocity(self) -> float:
"""Get current entropy velocity (rate of change)."""
def should_exit(self) -> bool:
"""Determine if reasoning has converged."""
def is_converged(self) -> bool:
"""Alias for should_exit()."""
def get_trajectory(self) -> List[float]:
"""Get full entropy trajectory."""
def reset(self) -> None:
"""Clear all tracked data."""
calculate_entropy(logprobs)
from entroplain import calculate_entropy
# From log probabilities
entropy = calculate_entropy([-0.5, -2.1, -0.1, -5.2])
# Returns: 0.847
# From probabilities
entropy = calculate_entropy([0.6, 0.125, 0.9, 0.005], from_probs=True)
Research
Paper
See paper.md for the full research proposal: "Entropy-Based Early Exit for Efficient Agent Reasoning"
Key Findings
- H1 Supported: Entropy valleys correlate with reasoning complexity (70.2 valleys for hard problems vs 61.3 for easy)
- H2 Supported: Entropy velocity differs by difficulty (0.4852 easy vs 0.4095 hard)
- Potential: 40-60% compute reduction with 95%+ accuracy retention
Citation
@software{entroplain2026,
title = {Entroplain: Entropy-Based Early Exit for Efficient Agent Reasoning},
author = {Entroplain Contributors},
year = {2026},
url = {https://github.com/entroplain/entroplain}
}
Roadmap
- v0.1.0 — Core entropy tracking (Python)
- v0.2.0 — Multi-provider support (OpenAI, Anthropic, Gemini, NVIDIA)
- v0.3.0 — Local model support (llama.cpp, Ollama)
- v0.4.0 — Agent framework integrations (OpenClaw, Claude Code)
- v0.5.0 — JavaScript/Node.js SDK
- v1.0.0 — Production release with benchmarks
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
Development Setup
git clone https://github.com/entroplain/entroplain.git
cd entroplain
pip install -e ".[dev]"
pytest
License
MIT License — see LICENSE for details.
Acknowledgments
- Research inspired by early exit architectures in transformers
- Experimental validation using NVIDIA NIM API
- Built for the agent-first future of AI
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file entroplain-0.1.0.tar.gz.
File metadata
- Download URL: entroplain-0.1.0.tar.gz
- Upload date:
- Size: 22.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f3725a45e07510c7a7672865aca583c18d557ed571fb68b61040ce1fb98c0347
|
|
| MD5 |
b2d353b155b25cb7c720e6e2869d0e62
|
|
| BLAKE2b-256 |
a138b8804016124f21fbd7fab1d69915f77141ec7b12da57c7d87d5793669706
|
File details
Details for the file entroplain-0.1.0-py3-none-any.whl.
File metadata
- Download URL: entroplain-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
58bc040bbe9ce0700101e679346f66b788ccbc2e03d4973ccab1f7b2c91778e1
|
|
| MD5 |
6fd031197f2c9a42540549eddd3101ef
|
|
| BLAKE2b-256 |
dc9be4124ce4c23b2cc900699701ef4f18bedc8ebeb8d698fe5e4808753daef1
|