Reasoning harness framework for Anthropic Claude
upryaga
The name is derived from Упряж, which means "harness" in Ukrainian, Russian, and possibly other Slavic languages.
A reasoning harness framework for Anthropic Claude. Runs a Propose-Critique-Refine-Verify loop over reasoning problems, with Reflexion-style memory that accumulates failure experiences across runs.
Personal research project exploring recurrent self-improving harnesses — learning by building.
Install
```shell
pip install upryaga
```
Or for development:
```shell
git clone https://github.com/walnutgeek/upryaga.git
cd upryaga
uv sync --all-extras
```
Quick Start
```python
import asyncio

from upryaga import Harness, Problem
from upryaga.reasoner import ClaudeReasoner
from upryaga.critic import ClaudeCritic
from upryaga.memory import ReflexionMemory
from upryaga.verifier import MathVerifier

harness = Harness(
    reasoner=ClaudeReasoner(model="claude-sonnet-4-6"),  # proposes step-by-step solutions
    critic=ClaudeCritic(model="claude-haiku-4-5-20251001"),  # reviews each proposal
    memory=ReflexionMemory(path="./memory"),  # experiences accumulated across runs
    verifier=MathVerifier(),  # deterministic answer comparison
    max_iterations=3,  # maximum number of propose/critique iterations per problem
)

# solve() is async, so drive it with asyncio.run
solution = asyncio.run(harness.solve(Problem(question="What is 247 * 83?")))
print(solution.answer)
```
Benchmarking
Evaluate harness configurations against GSM8K or MATH:
```python
from upryaga.benchmark.runner import BenchmarkRunner

runner = BenchmarkRunner(harness=harness, verifier=MathVerifier())
results = asyncio.run(runner.run("gsm8k", split="test", limit=50))
print(results.summary())
# Dataset: gsm8k
# Accuracy: 41/50 (82.0%)
# Avg iterations: 2.3
# Avg tokens/problem: 4,120
# Total cost: $1.47
```
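`MathVerifier` is described as deterministic answer comparison, but its internals are not documented here. The sketch below is only an assumption of what such a check can look like for GSM8K, whose reference solutions end with `#### <number>`. The helpers `extract_final_number`, `answers_match`, and `summarize` are hypothetical and not part of upryaga; comparing exact `Fraction`s instead of floats avoids rounding surprises.

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical helpers, not part of upryaga's API.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text: str) -> Optional[Fraction]:
    """Return the last number in `text` as an exact Fraction.

    GSM8K reference solutions end with '#### <answer>', so only the text
    after the last '####' marker is searched when one is present.
    """
    if "####" in text:
        text = text.rsplit("####", 1)[-1]
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return Fraction(matches[-1].replace(",", ""))  # drop thousands separators


def answers_match(predicted: str, reference: str) -> bool:
    # Deterministic: exact numeric equality, so "20,501" matches "20501.0".
    expected = extract_final_number(reference)
    return expected is not None and extract_final_number(predicted) == expected


def summarize(outcomes: list[bool]) -> str:
    # Formats a line in the same shape as the summary above.
    if not outcomes:
        return "Accuracy: 0/0"
    correct = sum(outcomes)
    return f"Accuracy: {correct}/{len(outcomes)} ({100 * correct / len(outcomes):.1f}%)"
```

For example, `answers_match("So 247 * 83 = 20,501.", "#### 20501")` is `True`, and 41 correct out of 50 formats as `Accuracy: 41/50 (82.0%)`.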
Architecture
```
Problem in
|
v
Memory ----> retrieves similar past experiences
|
v
Reasoner --> generates solution (Claude API)
|
v
Critic ----> evaluates solution (Claude API)
|
|-- unsatisfactory --> loop back with feedback
|
+-- satisfactory ----> final answer
|
v
Verifier --> checks against ground truth
|
v
Memory <--- stores experience + reflection
```
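The control flow in the diagram can be sketched in plain Python. Everything here is hypothetical: `propose` and `critique` are offline stand-ins for the Claude-backed reasoner and critic, and `ToyMemory` stores bare (question, answer, correct) records rather than Reflexion-style written reflections. It is not upryaga's implementation, only the shape of the loop.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Critique:
    satisfactory: bool
    feedback: str = ""


@dataclass
class ToyMemory:
    # Hypothetical in-process memory of (question, answer, correct) records.
    records: list[tuple[str, str, bool]] = field(default_factory=list)

    def retrieve(self, question: str) -> list[str]:
        return [
            f"past: {q} -> {a} ({'ok' if ok else 'failed'})"
            for q, a, ok in self.records
            if q == question
        ]

    def store(self, question: str, answer: str, correct: bool) -> None:
        self.records.append((question, answer, correct))


async def propose(question: str, context: list[str], feedback: str) -> str:
    # Stand-in reasoner: deliberately wrong until it receives feedback.
    return "20501" if feedback else "20500"


async def critique(question: str, answer: str) -> Critique:
    # Stand-in critic: simply recomputes the product.
    ok = answer == str(247 * 83)
    return Critique(ok, "" if ok else "Recheck the multiplication.")


async def solve(
    question: str, reference: str, memory: ToyMemory, max_iterations: int = 3
) -> tuple[str, int, bool]:
    context = memory.retrieve(question)  # recall similar past experiences
    answer, feedback, iterations = "", "", 0
    for iterations in range(1, max_iterations + 1):
        answer = await propose(question, context, feedback)  # propose / refine
        verdict = await critique(question, answer)  # evaluate the proposal
        if verdict.satisfactory:
            break
        feedback = verdict.feedback  # loop back with feedback
    correct = answer == reference  # verify against ground truth
    memory.store(question, answer, correct)  # store the experience
    return answer, iterations, correct
```

Running `asyncio.run(solve("What is 247 * 83?", "20501", ToyMemory()))` takes two iterations: the critic rejects the first proposal, accepts the refined one, and the verified episode is stored in memory.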
All components are pluggable Python Protocols, so any implementation can be swapped in without changing the rest of the harness.
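Because the components are structural `typing.Protocol`s, a replacement only needs the right methods, not a shared base class. The `Verifier` protocol and its `verify(predicted, reference)` method below are assumptions for illustration; upryaga's actual interface may use different names and signatures.

```python
from typing import Protocol, runtime_checkable


# Hypothetical interface; check upryaga's source for the real Verifier Protocol.
@runtime_checkable
class Verifier(Protocol):
    def verify(self, predicted: str, reference: str) -> bool: ...


class CaseInsensitiveVerifier:
    """A custom verifier. Note that it does not inherit from Verifier."""

    def verify(self, predicted: str, reference: str) -> bool:
        return predicted.strip().lower() == reference.strip().lower()


def check_all(verifier: Verifier, pairs: list[tuple[str, str]]) -> list[bool]:
    # Any object with a matching `verify` method is accepted here.
    return [verifier.verify(p, r) for p, r in pairs]
```

`@runtime_checkable` only lets `isinstance` check that the methods exist; argument and return types are enforced statically by a type checker such as basedpyright, which `make lint` already runs.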
Components
| Component | Default Implementation | Purpose |
|---|---|---|
| Reasoner | ClaudeReasoner | Generates step-by-step solutions |
| Critic | ClaudeCritic | Evaluates solution quality |
| Memory | ReflexionMemory | TF-IDF episodic memory, failure-prioritized |
| Verifier | MathVerifier | Deterministic answer comparison |
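The table describes `ReflexionMemory` as TF-IDF episodic memory, failure-prioritized. One plausible reading is sketched below in standard-library Python: stored questions are ranked by TF-IDF cosine similarity to the new question, and failed episodes get a fixed score bonus so their lessons surface first. `TfidfEpisodicMemory`, `Experience`, and the `failure_bonus` weighting are hypothetical; the real class may weight, filter, or persist episodes differently.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Experience:
    question: str
    reflection: str  # what went wrong, or what to do differently
    success: bool


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Raw term counts weighted by IDF; terms never seen in memory get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


class TfidfEpisodicMemory:
    """Hypothetical sketch: TF-IDF retrieval with a bonus for failed episodes."""

    def __init__(self, failure_bonus: float = 0.2) -> None:
        self.episodes: list[Experience] = []
        self.failure_bonus = failure_bonus

    def store(self, exp: Experience) -> None:
        self.episodes.append(exp)

    def retrieve(self, question: str, k: int = 2) -> list[Experience]:
        docs = [tokenize(e.question) for e in self.episodes]
        n = len(docs)
        df = Counter(t for doc in docs for t in set(doc))
        # Smoothed IDF, same form as scikit-learn's default: ln((1+n)/(1+df)) + 1
        idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        query = tfidf_vector(tokenize(question), idf)
        scored: list[tuple[float, Experience]] = []
        for exp, doc in zip(self.episodes, docs):
            sim = cosine(query, tfidf_vector(doc, idf))
            if sim > 0:  # skip episodes with no shared terms
                bonus = 0.0 if exp.success else self.failure_bonus
                scored.append((sim + bonus, exp))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [exp for _, exp in scored[:k]]
```

With equally similar episodes, the failed one ranks first, which matches the "failure-prioritized" idea: mistakes are the experiences most worth replaying to the reasoner.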
Development
```shell
make install # uv sync
make lint # ruff + basedpyright + codespell
make test # pytest
```
License
MIT
Download files
File details
Details for the file upryaga-0.0.1.tar.gz.
File metadata
- Download URL: upryaga-0.0.1.tar.gz
- Upload date:
- Size: 217.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.8.13
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | ecd3173d79aaaf99969d2ee99616ae5158ad00e8a7222aa49bcfc59516a71a86 |
| MD5 | f0c8f4d992e63a05d812b273e0d933da |
| BLAKE2b-256 | e5f097fbcf93cd8a17a9a8ad4c460209c476f1eb86da3195053fb9e68a942403 |
File details
Details for the file upryaga-0.0.1-py3-none-any.whl.
File metadata
- Download URL: upryaga-0.0.1-py3-none-any.whl
- Upload date:
- Size: 14.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.8.13
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | c98c0be81e275f6794ec771c548d6a779a4e7b21de536f02e217cf874d501624 |
| MD5 | 0e00ef5fa628dbbb0f4c8196db8e6727 |
| BLAKE2b-256 | a4b98a2a8508ec9fbfe9b66048ad97a66e5e971c0a3c498c38aeb7d1bc19b041 |