Inference-time reasoning framework that improves LLM accuracy through search and verification

ARES - Inference-Time Reasoning Framework

ARES improves LLM accuracy on complex reasoning tasks through explicit search and verification.

Trade compute for correctness. No training required.

What is ARES?

ARES wraps an existing open-source LLM (Llama, Mixtral, etc.) with:

  1. Multi-candidate generation - Generate N reasoning paths instead of 1
  2. Rule-based verification - Reject bad reasoning steps
  3. Beam search - Use voting + verification to select the best answer

Single LLM:         Problem → LLM → Answer (often wrong)

ARES:               Problem → LLM → [5 candidates]
                                  ↓
                            [Verify each]
                                  ↓
                          [Vote + Score]
                                  ↓
                         Best answer (usually right)

Results

Method                   Accuracy   Time
Baseline (single-shot)   27%        ~10s
ARES v1 (5 candidates)   ~80%+      ~3min

The tradeoff: by the figures above, roughly 18x the wall-clock time (~10s → ~3min) buys roughly 3x the accuracy (27% → ~80%).

Quick Start

# 1. Clone and setup
cd ares
pip install -r requirements.txt

# 2. Configure LLM (OpenRouter or Ollama)
cp .env.example .env
# Edit .env with your API key

# 3. Run ARES via CLI
python -m ares solve "What is 15% of 240?"

CLI Usage

# Solve a problem
python -m ares solve "Your math problem here"

# Solve with options
python -m ares solve "Problem text" --candidates 5 --type math

# Run built-in tests
python -m ares test

# Test specific problem
python -m ares test --id math_006

# Show framework info
python -m ares info

Framework Usage (for developers)

from ares.search import AresSearch

# Create ARES instance
ares = AresSearch(n_candidates=5, temperature=0.8)

# Run on a problem
result = ares.search(
    problem="In a class of 30 students, 18 play soccer...",
    problem_type="math"
)

print(f"ARES answer: {result.predicted_answer}")
print(f"Confidence: {result.best_score:.0%}")

Project Structure

ares/
├── ares/
│   ├── llm.py          # LLM provider (Ollama/OpenRouter)
│   ├── generator.py    # Multi-candidate generation
│   ├── verifier.py     # Rule-based verification
│   ├── search.py       # ARES beam search
│   └── problems.py     # Evaluation problems
├── docs/
│   ├── design.md       # Design document
│   ├── eval_tasks.md   # Evaluation tasks
│   └── twitter_posts.md # Build in public posts
├── results/            # Evaluation results
└── test_*.py           # Test scripts

How It Works

Phase 1: Baseline shows LLMs fail

Single-shot inference on reasoning problems: 27% accuracy

Phase 2: Generate multiple candidates

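Instead of one answer, generate five. Sampling at a non-zero temperature (0.8 in the example above) makes the candidates differ in both reasoning and final answer.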
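
A minimal sketch of this phase, assuming the model is exposed as a plain callable that takes a prompt and a temperature and returns text. `generate_candidates` and `fake_llm` are hypothetical names used for illustration; they are not the API of generator.py.

```python
import random
from typing import Callable, List


def generate_candidates(
    llm: Callable[[str, float], str],
    problem: str,
    n_candidates: int = 5,
    temperature: float = 0.8,
) -> List[str]:
    """Sample the same problem n_candidates times and return the raw responses."""
    return [llm(problem, temperature) for _ in range(n_candidates)]


def fake_llm(prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for a real model call: usually right, sometimes wrong."""
    answer = random.choice(["36", "36", "36", "24", "360"])
    return f"15% of 240 = 0.15 * 240. Answer: {answer}"


candidates = generate_candidates(fake_llm, "What is 15% of 240?")
for text in candidates:
    print(text)
```

Swapping `fake_llm` for a real provider call is the only change needed to run this against an actual model; the candidates are kept as raw text so that a later step can verify them.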

Phase 3: Verify each candidate

Rule-based checks:

  • Is the answer a valid number?
  • Does the reasoning contain math steps?
  • Does the LLM express self-doubt ("doesn't make sense")?
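
The three checks above could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of verifier.py: the function names, the 0.5 penalty weights, the "last number in the text is the answer" heuristic, and every doubt phrase other than "doesn't make sense" are hypothetical.

```python
import re
from typing import Optional, Tuple

# Only the first phrase comes from the README; the others are assumed examples.
DOUBT_PHRASES = ("doesn't make sense", "i'm not sure", "this seems wrong")


def extract_answer(text: str) -> Optional[float]:
    """Return the last number in the response, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(numbers[-1]) if numbers else None


def verify(text: str) -> Tuple[Optional[float], float]:
    """Return (answer, confidence), with confidence between 0.0 and 1.0."""
    answer = extract_answer(text)
    if answer is None:
        return None, 0.0  # Check 1: no valid number, reject outright.
    confidence = 1.0
    # Check 2: look for at least one arithmetic step such as "0.15 * 240".
    if not re.search(r"\d\s*[-+*/×=]\s*\d", text):
        confidence *= 0.5  # assumed penalty
    # Check 3: penalize responses in which the model doubts itself.
    if any(phrase in text.lower() for phrase in DOUBT_PHRASES):
        confidence *= 0.5  # assumed penalty
    return answer, confidence


print(verify("0.15 * 240 = 36"))
print(verify("Hmm, this doesn't make sense. 240 * 0.15 = 40"))
print(verify("I cannot solve this."))
```

Multiplicative penalties keep the confidence in a fixed range and let several weak signals compound, which is one simple way to turn pass/fail rules into a score that a later voting step can use.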

Phase 4: Beam search with voting

Combine verification confidence with voting consensus:

combined_score = confidence × vote_percentage

Pick the answer with the highest combined score.
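
A small sketch of the scoring rule, assuming each candidate has already been reduced to an (answer, confidence) pair. The README does not say how confidence is aggregated when several candidates share an answer, or what the vote denominator is; this sketch assumes the mean confidence and the number of candidates passed in. `select_answer` is a hypothetical name, not the API of search.py.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def select_answer(verified: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Take (answer, confidence) pairs and return (best_answer, combined_score)."""
    if not verified:
        raise ValueError("no verified candidates to choose from")

    # Group the confidences of all candidates that reached the same answer.
    by_answer: Dict[float, List[float]] = defaultdict(list)
    for answer, confidence in verified:
        by_answer[answer].append(confidence)

    total = len(verified)
    # combined_score = confidence × vote_percentage
    scores = {
        answer: (sum(confs) / len(confs)) * (len(confs) / total)
        for answer, confs in by_answer.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


verified = [(36.0, 1.0), (36.0, 1.0), (36.0, 0.5), (24.0, 1.0), (360.0, 0.5)]
best, score = select_answer(verified)
print(best, round(score, 2))
```

In this example the answer 36 gets three of five votes with a mean confidence of about 0.83, so it beats the single fully confident vote for 24: consensus and verification both have to be present for a high score.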

Limitations (v1)

  • Not real beam search over steps - ARES generates complete answers and selects among them; it does not search step by step
  • Rule-based verifier only - No learned verification model
  • Slow - 5 candidates × API latency = minutes per problem
  • Rate limits - Free API tiers throttle the repeated calls that multi-candidate generation requires
  • Math/logic only - Tested on GSM8K-style problems

What ARES is NOT

  • ❌ A new foundation model
  • ❌ A fine-tuned model
  • ❌ AGI
  • ❌ Guaranteed to be correct
  • ❌ Comparable to GPT-4/o1

v2 Roadmap (Future)

  • Monte Carlo Tree Search
  • Learned value functions
  • Step-by-step reasoning search
  • Tool use inside search
  • Self-improving verifier

License

MIT

Credits

Built in public as a learning project to understand inference-time reasoning.
