Reasoning harness framework for Anthropic Claude

upryaga

For now, this is a personal research project whose goal is to understand and build recurrent self-improving harnesses.

The name derives from Упряж, which means "harness" in Ukrainian, Russian, and possibly other Slavic languages.

A reasoning harness framework for Anthropic Claude. Runs a Propose-Critique-Refine-Verify loop over reasoning problems, with Reflexion-style memory that accumulates failure experiences across runs.

Install

pip install upryaga

Or for development:

git clone https://github.com/walnutgeek/upryaga.git
cd upryaga
uv sync --all-extras

Quick Start

import asyncio
from upryaga import Harness, Problem
from upryaga.reasoner import ClaudeReasoner
from upryaga.critic import ClaudeCritic
from upryaga.memory import ReflexionMemory
from upryaga.verifier import MathVerifier

harness = Harness(
    reasoner=ClaudeReasoner(model="claude-sonnet-4-6"),
    critic=ClaudeCritic(model="claude-haiku-4-5-20251001"),
    memory=ReflexionMemory(path="./memory"),
    verifier=MathVerifier(),
    max_iterations=3,
)

solution = asyncio.run(harness.solve(Problem(question="What is 247 * 83?")))
print(solution.answer)

Benchmarking

Evaluate harness configurations against GSM8K or MATH:

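import asyncio

from upryaga.benchmark.runner import BenchmarkRunner
from upryaga.verifier import MathVerifier

# `harness` is the Harness instance built in Quick Start.
runner = BenchmarkRunner(harness=harness, verifier=MathVerifier())
results = asyncio.run(runner.run("gsm8k", split="test", limit=50))
print(results.summary())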
# Dataset: gsm8k
# Accuracy: 41/50 (82.0%)
# Avg iterations: 2.3
# Avg tokens/problem: 4,120
# Total cost: $1.47
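
The summary figures are simple aggregates over per-problem records. A minimal sketch of that arithmetic follows; `ProblemResult` and `summarize` are hypothetical names used for illustration and are not part of upryaga's documented API, and the records are synthetic.

```python
from dataclasses import dataclass


@dataclass
class ProblemResult:
    """Hypothetical per-problem record."""
    correct: bool
    iterations: int
    tokens: int


def summarize(dataset: str, results: list[ProblemResult]) -> str:
    """Aggregate per-problem records into a short text summary."""
    total = len(results)
    if total == 0:
        return f"Dataset: {dataset}\nNo results"
    n_correct = sum(r.correct for r in results)
    avg_iters = sum(r.iterations for r in results) / total
    avg_tokens = sum(r.tokens for r in results) / total
    return "\n".join([
        f"Dataset: {dataset}",
        f"Accuracy: {n_correct}/{total} ({n_correct / total:.1%})",
        f"Avg iterations: {avg_iters:.1f}",
        f"Avg tokens/problem: {avg_tokens:,.0f}",
    ])


# Synthetic data: 41 correct and 9 incorrect records.
records = [ProblemResult(True, 2, 4000)] * 41 + [ProblemResult(False, 3, 5000)] * 9
print(summarize("gsm8k", records))
```

Cost is left out because it depends on per-model token pricing, which this sketch does not assume.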

Architecture

Problem in
    |
    v
 Memory ----> retrieves similar past experiences
    |
    v
 Reasoner --> generates solution (Claude API)
    |
    v
 Critic ----> evaluates solution (Claude API)
    |
    |-- unsatisfactory --> loop back with feedback
    |
    +-- satisfactory ----> final answer
                              |
                              v
                          Verifier --> checks against ground truth
                              |
                              v
                          Memory <--- stores experience + reflection
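
The flow in the diagram can be sketched as a plain async loop. This is a minimal illustration, not upryaga's actual implementation: `solve_loop`, `Attempt`, and the stub callables are hypothetical names, and the real Reasoner and Critic call the Claude API rather than the stubs shown here. Memory retrieval and verification are omitted to keep the sketch short.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class Attempt:
    """One pass through the loop (hypothetical structure)."""
    answer: str
    feedback: str = ""
    satisfactory: bool = False


async def solve_loop(
    question: str,
    propose: Callable[[str, list[str]], Awaitable[str]],
    critique: Callable[[str, str], Awaitable[tuple[bool, str]]],
    max_iterations: int = 3,
) -> list[Attempt]:
    """Propose, critique, and refine until the critic accepts or the budget runs out."""
    feedback_log: list[str] = []
    attempts: list[Attempt] = []
    for _ in range(max_iterations):
        answer = await propose(question, feedback_log)
        ok, feedback = await critique(question, answer)
        attempts.append(Attempt(answer, feedback, ok))
        if ok:
            break
        # Unsatisfactory: carry the feedback into the next proposal.
        feedback_log.append(feedback)
    return attempts


# Stub components standing in for the Claude-backed ones.
async def stub_propose(question: str, feedback: list[str]) -> str:
    # The first try is wrong on purpose; the retry reacts to feedback.
    return "20501" if feedback else "20401"


async def stub_critique(question: str, answer: str) -> tuple[bool, str]:
    ok = answer == str(247 * 83)
    return ok, ("" if ok else "Recheck the multiplication.")


attempts = asyncio.run(solve_loop("What is 247 * 83?", stub_propose, stub_critique))
print([a.answer for a in attempts])
```

With these stubs the loop stops after the second attempt, once the critic accepts the answer.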

All components are pluggable Python Protocols — swap any implementation without changing the rest.
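
Because the components are Protocols, any class with matching methods can stand in for a default implementation, with no inheritance required. The sketch below assumes a hypothetical `Verifier` protocol with a single `verify(answer, expected)` method; the actual method names and signatures in upryaga may differ, so check the package source before relying on them.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class Verifier(Protocol):
    """Hypothetical protocol; upryaga's real signature may differ."""

    def verify(self, answer: str, expected: str) -> bool: ...


class NumericVerifier:
    """Deterministic comparison: parse both sides as numbers, then compare."""

    def __init__(self, tolerance: float = 1e-9) -> None:
        self.tolerance = tolerance

    @staticmethod
    def _to_number(text: str) -> Optional[float]:
        # Drop surrounding whitespace, thousands separators, and a leading dollar sign.
        cleaned = text.strip().replace(",", "").lstrip("$")
        try:
            return float(cleaned)
        except ValueError:
            return None

    def verify(self, answer: str, expected: str) -> bool:
        a, b = self._to_number(answer), self._to_number(expected)
        if a is None or b is None:
            # Fall back to exact string comparison for non-numeric answers.
            return answer.strip() == expected.strip()
        return abs(a - b) <= self.tolerance


verifier = NumericVerifier()
print(isinstance(verifier, Verifier))        # structural match, no subclassing
print(verifier.verify("20,501", "20501.0"))
print(verifier.verify("20401", "20501"))
```

`NumericVerifier` never subclasses `Verifier`; it satisfies the protocol purely by having a compatible `verify` method, which is what makes the swap possible.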

Components

Component   Default Implementation   Purpose
Reasoner    ClaudeReasoner           Generates step-by-step solutions
Critic      ClaudeCritic             Evaluates solution quality
Memory      ReflexionMemory          TF-IDF episodic memory, failure-prioritized
Verifier    MathVerifier             Deterministic answer comparison
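
To make the Memory row concrete, here is a small pure-Python sketch of TF-IDF retrieval that ranks failed episodes ahead of successful ones at equal similarity. It is an illustration under assumptions, not ReflexionMemory's code: the `Episode` fields, the `failure_boost` weight, and the tokenizer are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical stored experience."""
    question: str
    reflection: str
    failed: bool


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by its count times a smoothed inverse document frequency."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: str, episodes: list[Episode], k: int = 2, failure_boost: float = 1.5
) -> list[Episode]:
    """Return up to k episodes by similarity, with failures scored higher."""
    docs = [tokenize(e.question) for e in episodes] + [tokenize(query)]
    *episode_vecs, query_vec = tfidf_vectors(docs)
    scored = [
        (cosine(query_vec, vec) * (failure_boost if ep.failed else 1.0), ep)
        for ep, vec in zip(episodes, episode_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ep for score, ep in scored[:k] if score > 0]


episodes = [
    Episode("What is 12 * 11?", "Correct on first try.", failed=False),
    Episode("What is 12 * 13?", "Dropped a carry digit.", failed=True),
    Episode("Name the capital of France.", "Correct.", failed=False),
]
for ep in retrieve("What is 12 * 14?", episodes):
    print(ep.failed, ep.reflection)
```

The two multiplication episodes are equally similar to the query, so the multiplicative boost is what places the failed one first. A real implementation might instead reserve a fixed share of retrieved slots for failures.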

Development

make install   # uv sync
make lint      # ruff + basedpyright + codespell
make test      # pytest

License

MIT
