
Transform LLMs into robust problem-solving agents with advanced reasoning strategies


Agent Reasoning: The Thinking Layer


Vision & Purpose

The Reasoning Layer is the cognitive engine of the AI stack. While traditional LLMs excel at token generation, they often struggle with complex planning, logical deduction, and self-correction.

This repository turns standard open-source models (such as gemma3 and llama3) into more capable problem solvers by wrapping them in advanced cognitive architectures. It implements findings from key research papers (CoT, ToT, ReAct) to give models "agency" over their own thinking process.

"From predicting the next token to predicting the next thought."


📦 Installation

From PyPI (Recommended)

pip install agent-reasoning

# With server dependencies (for the reasoning gateway):
pip install "agent-reasoning[server]"

From Source

# Clone the repo
git clone https://github.com/jasperan/agent-reasoning.git
cd agent-reasoning

# Install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -e .

Prerequisite: Ollama must be installed and running, with the base model pulled:

ollama pull gemma3:270m

📓 Notebooks

Interactive Jupyter notebooks demonstrating agent reasoning capabilities:

| Name | Description | Stack | Link |
|---|---|---|---|
| agent_reasoning_demo | Comprehensive demo of all reasoning strategies (CoT, ToT, ReAct, Self-Reflection) with benchmarks and comparisons | Ollama, Gemma3/Llama3, FastAPI | Open Notebook |

🚀 Features

✅ Verified against arXiv papers

  • Plug & Play: Use it as a Python class or as a network proxy.
  • Model Agnostic: Works with any model served by Ollama.
  • Advanced Architectures:
    • 🔗 Chain-of-Thought (CoT) & Self-Consistency: Implements majority voting over $k$ samples drawn with temperature sampling.
    • 🌳 Tree of Thoughts (ToT): BFS strategy with heuristic scoring and pruning.
    • 🛠️ ReAct (Reason + Act): Real-time tool use (web search via scraping, Wikipedia API, calculator) with fallback/mock modes, grounding answers in external information.
    • 🪞 Self-Reflection: Dynamic multi-turn refinement loop (Draft -> Critique -> Improve).
    • 🧩 Decomposition & Least-to-Most: Planning and sub-task execution.
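The BFS strategy listed above can be illustrated with a small, model-free sketch. It assumes two hypothetical callables that are not part of the package's documented API: `propose`, which extends a partial reasoning trace (an LLM call in practice), and `score`, which rates a trace in [0, 1]. At each level every surviving state is expanded, the children are scored, and only the top `width` survive, which is the pruning step.

```python
from typing import Callable, List, Tuple


def tree_of_thoughts_bfs(
    root: str,
    propose: Callable[[str], List[str]],  # hypothetical: LLM call that extends a partial trace
    score: Callable[[str], float],        # hypothetical: heuristic value of a trace in [0, 1]
    depth: int = 3,
    width: int = 2,
) -> Tuple[str, float]:
    """Breadth-first search over reasoning traces, keeping the best `width` per level."""
    frontier: List[Tuple[str, float]] = [(root, 0.0)]
    for _ in range(depth):
        # Expand every surviving state and score each child.
        candidates = [
            (child, score(child))
            for state, _ in frontier
            for child in propose(state)
        ]
        if not candidates:
            break  # nothing left to expand; keep the current frontier
        # Pruning: only the highest-scoring partial traces survive to the next level.
        candidates.sort(key=lambda pair: pair[1], reverse=True)
        frontier = candidates[:width]
    return frontier[0]
```

With a toy proposer that appends "0" or "1" and a score equal to the fraction of ones, `tree_of_thoughts_bfs("", lambda s: [s + "0", s + "1"], lambda s: s.count("1") / len(s))` returns `("111", 1.0)`.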

💻 Usage

1. Interactive CLI (Recommended)

Access all agents, comparisons, and benchmarks via the rich CLI.

# If installed via pip:
agent-reasoning

# Or from source:
python agent_cli.py

Interactive Experience:

╭────────────────────────────────────────────╮
│ AGENT REASONING CLI                        │
│ Advanced Cognitive Architectures (Gemma 3) │
╰────────────────────────────────────────────╯

? Select an Activity:
  Chat with Standard Agent
  Chain of Thought (CoT)
  Tree of Thoughts (ToT)
  ReAct (Tools + Web)
  ⚔️  ARENA: Run All Compare
  Select AI Model
  Exit

2. Python API (For Developers)

Use the ReasoningInterceptor as a drop-in replacement for your LLM client.

from agent_reasoning import ReasoningInterceptor

client = ReasoningInterceptor()

# Append the strategy to the model name with a '+'
response = client.generate(
    model="gemma3:270m+tot",
    prompt="I have a 3-gallon and 5-gallon jug. How do I measure 4 gallons?"
)
print(response["response"])
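The `model+strategy` naming convention can be handled by splitting on the last `+`. The helper below is a hypothetical sketch of that parsing, not the interceptor's actual code; lower-casing the suffix and treating a name without `+` as "no strategy" are assumptions.

```python
from typing import Optional, Tuple


def split_model_strategy(model: str) -> Tuple[str, Optional[str]]:
    """Split 'base+strategy' into (base, strategy); a plain model name has no strategy."""
    base, sep, strategy = model.rpartition("+")
    if not sep:
        # No '+': rpartition returns ('', '', model), so the whole string is the base model.
        return model, None
    # Assumption: strategy names are matched case-insensitively.
    return base, strategy.strip().lower() or None


print(split_model_strategy("gemma3:270m+tot"))  # ('gemma3:270m', 'tot')
print(split_model_strategy("gemma3:270m"))      # ('gemma3:270m', None)
```

Splitting on the last `+` keeps any `:tag` in the base name intact, so the base part can be passed to Ollama unchanged.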

Using agents directly:

from agent_reasoning.agents import CoTAgent, ToTAgent, ReActAgent

# Create an agent
agent = CoTAgent(model="gemma3:270m")

# Stream responses
for chunk in agent.stream("Explain quantum entanglement step by step"):
    print(chunk, end="")

3. Reasoning Gateway Server

Run a proxy server that mimics the Ollama API. Any Ollama-compatible app (LangChain, web UIs) can then gain reasoning capabilities without code changes.

# If installed via pip:
agent-reasoning-server --port 8080

# Or from source:
python server.py

Then configure your app:

  • Base URL: http://localhost:8080
  • Model: gemma3:270m+cot (or +tot, +react, etc.)

Example:

curl http://localhost:8080/api/generate -d '{
  "model": "gemma3:270m+cot",
  "prompt": "Why is the sky blue?"
}'
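The same request can be sent from Python's standard library. This sketch assumes the gateway mirrors Ollama's /api/generate behavior, where the reply streams back as newline-delimited JSON objects, each carrying a `response` fragment, with the last one marked `"done": true`; a single non-streamed JSON object also works, since it is simply a one-line stream. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """POST request for the Ollama-style /api/generate endpoint served by the gateway."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the 'response' fragments of a newline-delimited JSON stream until 'done'."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(base_url: str, model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request; iterating an HTTP response yields its body line by line."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return collect_stream(response)
```

For example, `generate("http://localhost:8080", "gemma3:270m+cot", "Why is the sky blue?")` returns the concatenated answer text. The generous timeout reflects that multi-call strategies such as ToT can take a while.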

🧠 Architectures in Detail

| Architecture | Description | Best For | Papers |
|---|---|---|---|
| Chain-of-Thought | Step-by-step reasoning prompt injection. | Math, Logic, Explanations | Wei et al. (2022) |
| Self-Reflection | Draft -> Critique -> Refine loop. | Creative Writing, High Accuracy | Shinn et al. (2023) |
| ReAct | Interleaves reasoning and tool usage. | Fact-checking, Calculations | Yao et al. (2022) |
| Tree of Thoughts | Explores multiple reasoning branches (BFS/DFS). | Complex Riddles, Strategy | Yao et al. (2023) |
| Decomposed | Breaks complex queries into sub-tasks. | Planning, Long-form answers | Khot et al. (2022) |
| Recursive (RLM) | Uses a Python REPL to recursively process prompt variables. | Long-context processing | Author et al. (2025) |
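The Self-Reflection row can be sketched as a short loop. This is a simplified illustration of the Draft -> Critique -> Refine cycle, not the package's agent and much lighter than the full method of Shinn et al.; `llm` is any hypothetical prompt-to-text function, and both the prompts and the `NO ISSUES` stop signal are assumptions.

```python
from typing import Callable


def self_reflect(
    query: str,
    llm: Callable[[str], str],  # hypothetical: any prompt -> completion function
    max_rounds: int = 2,
    stop_signal: str = "NO ISSUES",  # assumption: critic replies with this when satisfied
) -> str:
    """Draft -> Critique -> Refine until the critic is satisfied or rounds run out."""
    draft = llm(f"Answer the question:\n{query}")
    for _ in range(max_rounds):
        critique = llm(
            f"Critique this answer to: {query}\n"
            f"Reply '{stop_signal}' if it needs no changes.\n\n{draft}"
        )
        if stop_signal.upper() in critique.upper():
            break  # the critic found nothing to fix
        draft = llm(
            "Improve the answer using the critique.\n"
            f"Question: {query}\nAnswer: {draft}\nCritique: {critique}"
        )
    return draft
```

Capping the loop with `max_rounds` bounds the number of LLM calls (at most 1 + 2 × `max_rounds`), which matters for the latency noted under Troubleshooting.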

📚 Appendix A: Extending the System

To add a new reasoning strategy (e.g., "Reviewer-Critic"):

  1. Create a class in src/agent_reasoning/agents/ inheriting from BaseAgent.
  2. Implement the stream(self, query) method.
  3. Register it in AGENT_MAP in src/agent_reasoning/interceptor.py.

from agent_reasoning.agents.base import BaseAgent

class MyNewAgent(BaseAgent):
    def stream(self, query):
        # Yield text chunks as they are produced; callers iterate over them.
        yield "Thinking differently...\n"
        # ... your custom logic ...
        yield "Final Answer"
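The three steps can be seen end to end in a self-contained sketch. Because the real `BaseAgent` and `AGENT_MAP` definitions are not shown here, both are local stand-ins (assumptions), as are the `reviewer` key and the `run` dispatcher. The point is only the shape: subclass, implement `stream`, register under a strategy name, and look the class up from the `+strategy` suffix.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Type


class BaseAgent(ABC):
    """Local stand-in for agent_reasoning.agents.base.BaseAgent (real constructor may differ)."""

    def __init__(self, model: str):
        self.model = model

    @abstractmethod
    def stream(self, query: str) -> Iterator[str]:
        """Yield the answer as text chunks."""


class ReviewerCriticAgent(BaseAgent):
    """Hypothetical 'Reviewer-Critic' strategy (steps 1 and 2)."""

    def stream(self, query: str) -> Iterator[str]:
        yield f"[{self.model}] Reviewing: {query}\n"
        # ... custom reasoning logic would go here ...
        yield "Final Answer"


# Step 3: local stand-in for AGENT_MAP, mapping a strategy suffix to an agent class.
AGENT_MAP: Dict[str, Type[BaseAgent]] = {"reviewer": ReviewerCriticAgent}


def run(model_spec: str, query: str) -> str:
    """Resolve 'base+strategy' through AGENT_MAP and join the streamed chunks."""
    base, _, strategy = model_spec.partition("+")
    agent = AGENT_MAP[strategy](model=base)
    return "".join(agent.stream(query))
```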

🔧 Appendix B: Troubleshooting

  • Model Not Found: Ensure you have pulled the base model (ollama pull gemma3:270m).
  • Timeout / Slow: ToT and Self-Reflection make multiple LLM calls per query, so responses can take a while, especially with larger models (e.g., Llama 3 70B).
  • Hallucinations: The default demo uses gemma3:270m, which is extremely small and prone to logic errors. Switch to gemma2:9b or llama3 for more reliable results.
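For "Model Not Found" errors, you can check which models Ollama has pulled by querying its `GET /api/tags` endpoint, which lists local models under a `models` key. The sketch assumes Ollama's default address `http://localhost:11434`; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; change if yours differs


def local_model_names(tags_json: str) -> List[str]:
    """Model names from the JSON body of Ollama's GET /api/tags."""
    return [entry["name"] for entry in json.loads(tags_json).get("models", [])]


def is_pulled(name: str, names: List[str]) -> bool:
    """Untagged pulls are listed as 'name:latest', so a bare name is checked as ':latest'."""
    wanted = name if ":" in name else f"{name}:latest"
    return wanted in names


def fetch_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> List[str]:
    """Query a running Ollama server; raises URLError if nothing is listening."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return local_model_names(response.read().decode("utf-8"))
```

`is_pulled("gemma3:270m", fetch_local_models())` answers the first question; a connection error from `fetch_local_models` usually means Ollama itself is not running.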

📊 Benchmark Report (Example Outputs)

Below are example outputs (abridged) from the main.py benchmark using gemma3:270m. The small model attempts to follow the reasoning structures, but its logic errors highlight the importance of using larger models (e.g., llama3 or gemma2:9b) in production.

1. Philosophy (Self-Consistency)

Generates multiple reasoning paths and votes for the best answer.

Query: "What is the meaning of life? Answer with a mix of biological and philosophical perspectives."

[ConsistencyAgent]: Processing query via Self-Consistency (k=3)...
  Sample 1: [Detailed biological perspective on propagation...]
  Sample 2: [Philosophical view on existentialism and purpose...]
  Sample 3: [Synthesis of both views...]
Majority Logic: [Aggregated Best Answer from Votes]
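The voting step can be sketched as follows. `sample` (one temperature-sampled chain-of-thought completion) and `extract` (pulls the final answer out of it) are hypothetical stand-ins for LLM calls; the default temperature and the whitespace/case normalization are assumptions. Exact-match voting works best when answers are short and comparable, such as numbers or labels; for open-ended questions like the one above, samples rarely match verbatim, so a different aggregation step would be needed.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_consistency(
    sample: Callable[[float], str],  # hypothetical: one temperature-sampled CoT completion
    extract: Callable[[str], str],   # hypothetical: pulls the final answer out of a completion
    k: int = 3,
    temperature: float = 0.7,
) -> Tuple[str, List[str]]:
    """Sample k reasoning paths and return the majority final answer plus all answers."""
    # Assumption: light normalization so '12' and ' 12 ' count as the same vote.
    answers = [extract(sample(temperature)).strip().lower() for _ in range(k)]
    # most_common breaks ties in favour of the answer seen first.
    winner, _ = Counter(answers).most_common(1)[0]
    return winner, answers
```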

2. Logic (Tree of Thoughts)

Explores multiple branches (BFS) to solve riddles.

Query: "I have a 3-gallon jug and a 5-gallon jug. How can I measure exactly 4 gallons of water?"

[ToTAgent]: Processing query via Tree of Thoughts (BFS)...
Thinking via Tree of Thoughts (Depth=3, Width=2)...

[Step 1/3 - Exploring branches]
  Path Score: 0.0
  Path Score: 1.0

[Step 2/3 - Exploring branches]
  Path Score: 1.0
  Path Score: 1.0
  Path Score: 0.1

[Step 3/3 - Exploring branches]
  Path Score: 1.0 (Found solution state)

[Best Logic Trace selected. Generating Final Answer]
**Final Answer:**
1. Pour water from the 5-gallon jug into the 3-gallon jug.
2. You now have 2 gallons left in the 5-gallon jug.
3. Empty the 3-gallon jug.
4. Pour the 2 gallons from the 5-gallon jug into the 3-gallon jug.
5. Fill the 5-gallon jug again.
6. Pour from the 5-gallon jug into the 3-gallon jug until full (needs 1 gallon).
7. You are left with exactly 4 gallons in the 5-gallon jug.
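Note that step 1 of this answer pours from the 5-gallon jug while it is still empty; a correct plan first fills it. Puzzles with a small, fully known state space can be checked exhaustively. The sketch below is a plain breadth-first search over (small jug, large jug) states, a classical search rather than Tree of Thoughts, since no model proposes or scores the moves; it returns the shortest move sequence. The function name and move labels are illustrative.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]  # (gallons in the small jug, gallons in the large jug)


def solve_jugs(small: int = 3, large: int = 5, target: int = 4) -> Optional[List[str]]:
    """Breadth-first search over jug states; returns the shortest move list, or None."""
    start: State = (0, 0)
    # Maps each visited state to (previous state, move that reached it).
    parent: Dict[State, Tuple[Optional[State], str]] = {start: (None, "")}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if target in (a, b):
            # Walk back through the parent links to rebuild the move sequence.
            moves: List[str] = []
            state: Optional[State] = (a, b)
            while state is not None and parent[state][0] is not None:
                prev, move = parent[state]
                moves.append(move)
                state = prev
            return moves[::-1]
        to_large = min(a, large - b)  # amount that fits when pouring small -> large
        to_small = min(b, small - a)  # amount that fits when pouring large -> small
        for nxt, move in [
            ((small, b), f"fill {small}-gal"),
            ((a, large), f"fill {large}-gal"),
            ((0, b), f"empty {small}-gal"),
            ((a, 0), f"empty {large}-gal"),
            ((a - to_large, b + to_large), f"pour {small}->{large}"),
            ((a + to_small, b - to_small), f"pour {large}->{small}"),
        ]:
            if nxt not in parent:
                parent[nxt] = ((a, b), move)
                queue.append(nxt)
    return None
```

`solve_jugs(3, 5, 4)` returns `['fill 5-gal', 'pour 5->3', 'empty 3-gal', 'pour 5->3', 'fill 5-gal', 'pour 5->3']`: the same moves as the answer above, plus the missing initial fill.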

3. Planning (Decomposed Agent)

Breaks down complex tasks into sub-problems.

Query: "Plan a detailed 3-day itinerary for Tokyo for a history buff who loves samurais and tea."

[DecomposedAgent]: Decomposing the problem...

Sub-tasks Plan:
1.  **Define the Scope:** What historical period and specific area of Tokyo will the itinerary cover?
2.  **Identify Key Historical Sites:** What historical sites will the itinerary focus on?
3.  **Determine Traveler's Interests:** What types of historical sites will the itinerary include?
4.  **Outline the Itinerary:** What activities and attractions will be included in each day?
5.  **Estimate Duration:** How long will the itinerary last?

[DecomposedAgent]: Solving sub-task: 1. Define the Scope...
[DecomposedAgent]: Solving sub-task: 2. Identify Key Historical Sites...
...
Final Answer: [Detailed 3-day plan covering Meiji Shrine, Tea Ceremonies, and Samurai Museum]
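The decomposition flow in this trace can be sketched as two steps: parse the numbered sub-task plan, then solve the sub-tasks in order, feeding earlier answers into later prompts (the least-to-most idea listed under Features). `solve` is a hypothetical stand-in for an LLM call, and the prompt wording is an assumption.

```python
import re
from typing import Callable, List, Tuple

# Matches lines such as "1.  **Define the Scope:** What historical period ...?"
SUBTASK_RE = re.compile(r"^\s*\d+\.\s+(.*\S)\s*$")


def parse_subtasks(plan: str) -> List[str]:
    """Pull the numbered lines out of a plan, dropping markdown bold markers."""
    tasks = []
    for line in plan.splitlines():
        match = SUBTASK_RE.match(line)
        if match:
            tasks.append(match.group(1).replace("**", ""))
    return tasks


def solve_in_order(query: str, plan: str, solve: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Least-to-most style: each sub-task prompt includes the answers solved so far."""
    solved: List[Tuple[str, str]] = []
    for task in parse_subtasks(plan):
        context = "\n".join(f"{t} -> {a}" for t, a in solved)
        answer = solve(f"Goal: {query}\nSolved so far:\n{context}\nNow solve: {task}")
        solved.append((task, answer))
    return solved
```

A final synthesis call over the solved pairs would then produce the combined itinerary.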

4. Tool Use (ReAct Agent)

Interleaves thought, action, and observation to solve problems.

Query: "Who is the current CEO of Google? Calculate the square root of 144."

[ReActAgent]: Processing query with ReAct...

--- Step 1 ---
Agent: Action: web_search[current CEO of Google]
Observation: Sundar Pichai is the current CEO of Google.
Final Answer: Sundar Pichai

Running web_search...
Observation: [1] Sundar Pichai - Wikipedia: ... He is the chief executive officer (CEO) of Alphabet Inc. and its subsidiary Google.
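The trace above answers only the first half of the query; the square root of 144 is never computed. The sketch below shows two pieces a ReAct loop needs: parsing an `Action: tool[argument]` line (the format is inferred from the log above) and a calculator tool that evaluates arithmetic by walking Python's AST instead of calling `eval`. The function names and the supported operators are assumptions, not the package's tool set.

```python
import ast
import math
import operator
import re
from typing import Optional, Tuple

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")


def parse_action(text: str) -> Optional[Tuple[str, str]]:
    """Return (tool, argument) for the first 'Action: tool[arg]' in the text, or None."""
    match = ACTION_RE.search(text)
    return (match.group(1), match.group(2)) if match else None


# Whitelisted operators and functions; anything else is rejected.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt}


def calculator(expr: str) -> float:
    """Evaluate arithmetic safely by walking the AST instead of calling eval()."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and len(node.args) == 1):
            return _FUNCS[node.func.id](walk(node.args[0]))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))
```

For the second half of the query, `calculator("sqrt(144)")` returns `12.0`; in a full loop, the agent would emit `Action: calculator[sqrt(144)]` and receive that value as its next observation.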

📄 License

MIT License - see LICENSE for details.



Download files


Source Distribution

agent_reasoning-1.0.2.tar.gz (52.9 kB)

Uploaded Source

Built Distribution


agent_reasoning-1.0.2-py3-none-any.whl (69.7 kB)

Uploaded Python 3

File details

Details for the file agent_reasoning-1.0.2.tar.gz.

File metadata

  • Download URL: agent_reasoning-1.0.2.tar.gz
  • Upload date:
  • Size: 52.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for agent_reasoning-1.0.2.tar.gz
Algorithm Hash digest
SHA256 87223b627b56bfde10775b061ff5553ffe1b7362e78dc372a80c6b5d4e9971fc
MD5 672a2e5ba338c44e3b840569c403c1cd
BLAKE2b-256 0386a752600bf4f74318ee7300b9b72b6de4849d339074dad39ce506cf3b7573


File details

Details for the file agent_reasoning-1.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for agent_reasoning-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 1f9b702a25fbbdcdde443d5825971c78ae3f3aa227720eddb6c8e8efb0686eb0
MD5 95ecff5f1a8f8b7cf9b5735b12560b39
BLAKE2b-256 027957c8a3f78d5fc4dd7cedf46dedce809c3ab4158139ae8d90056d215c8ce3

