Framework-agnostic implementation of ReasoningBank (Ouyang et al., ICLR 2026): agents that learn from successful and failed trajectories, with memory-aware test-time scaling (MaTTS).

These details have not been verified by PyPI

Project links

Project description

reasoning-bank

A framework-agnostic implementation of ReasoningBank (Ouyang et al., ICLR 2026): agents that distill lessons from past trajectories and retrieve them on new tasks. Includes MaTTS (memory-aware test-time scaling).

What this is

ReasoningBank is a memory mechanism for LLM agents. After each task, it judges the trajectory, distills generalizable reasoning strategies from both successes and failures, and indexes them. On future tasks it retrieves the relevant strategies and injects them into the agent's context.

This package is a clean, framework-agnostic implementation. It works with any agent loop, the raw Anthropic or OpenAI SDK, or your own ReAct loop. It does not depend on any agent framework. (It is used as the memory layer in bottensor-fleet, but does not require it.)

The reference implementation from the paper is welded to specific benchmark harnesses. This one is a standalone library you can attach to anything.

Install

pip install reasoning-bank

How it plugs into any agent loop

Three calls:

from reasoning_bank import ReasoningBank, Turn

async def my_llm(prompt, *, system=None):
    # wrap your provider however you like
    ...

bank = ReasoningBank(llm=my_llm, scope="my-project")

# 1. retrieve relevant past lessons before your agent runs
memories = await bank.retrieve(task)
system_block = bank.format_as_system_block(memories) if memories else None

# 2. run your agent however you want, optionally prepending system_block
answer = await my_agent(task, system=system_block)

# 3. ingest the trajectory so the bank learns from it
trajectory = [Turn("user", task), Turn("assistant", answer)]
await bank.ingest_trajectory(trajectory, task=task)

The bank accepts any trajectory shape (its own Turn type, plain dicts, or duck-typed objects with .role/.content) and any async LLM callable. Pluggable embedder (MiniLM default) and store (SQLite+vec default, in-memory for tests).

MaTTS

Memory-aware test-time scaling: run k rollouts in parallel, contrast them, distill higher-quality memories.

from reasoning_bank import matts_run

async def rollout():
    answer = await my_agent(task)
    return [Turn("user", task), Turn("assistant", answer)]

trajectories, memories = await matts_run(rollout, task=task, bank=bank, k=3)

Does it actually work?

This is the honest part, and the reason the repo includes a full experiment suite.

The machinery works end-to-end: it judges trajectories, distills sensible transferable lessons, retrieves them, and injects them, with no framework dependency. That is verified.

Whether it produces a measurable performance lift is a separate question, and the answer depends heavily on the task distribution and the model. I ran four controlled experiments to find out, including a SWE-bench-lite harness with a no-retry control, a naive-retry control, a per-instance bank, and a persistent cross-instance bank, scored by the official SWE-bench Docker evaluator.

Headline findings (Haiku 4.5, full methodology in experiments/):

On task suites where the base model is already at its capability ceiling, the bank cannot help, because there is no failure to learn from. Two early experiments hit this.
On SWE-bench-lite (a real spread of difficulty, ~50% baseline), the persistent cross-instance bank produced no measurable lift over a no-retry baseline. The cross-task transfer hypothesis was not supported in this setup (n=45 clean cells on the persistent arm; late instances did not outperform early ones).
The one consistently positive observation was defensive: naive retry (re-prompting with the raw error) hurt performance, and structured-reflection retry recovered it. The value was in how a failure is framed to the model, not in accumulating a memory bank.

In short: the implementation is faithful and the machinery is sound, but I did not find a regime in these experiments where cross-task memory accumulation produced net positive value. The full data, including infrastructure caveats and threats to validity, is in the experiments directory. Negative results are results.

Design

Framework-agnostic: neutral trajectory type and a plain async LLM callable at every boundary. No agent framework required.
Pluggable store: SQLite + sqlite-vec by default; in-memory for tests.
Pluggable embedder: sentence-transformers MiniLM by default.
Memory items are distilled strategies, not raw traces, per the paper: a title, a description, and an actionable lesson, derived from both successes and failures.

Paper

Ouyang et al., ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory, ICLR 2026. arXiv:2509.25140

This package is an independent implementation and is not affiliated with the paper's authors.

License

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

May 31, 2026

0.1.0

May 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

reasoning_bank-0.1.1.tar.gz (22.2 kB view details)

Uploaded May 31, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

reasoning_bank-0.1.1-py3-none-any.whl (31.9 kB view details)

Uploaded May 31, 2026 Python 3

File details

Details for the file reasoning_bank-0.1.1.tar.gz.

File metadata

Download URL: reasoning_bank-0.1.1.tar.gz
Upload date: May 31, 2026
Size: 22.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.9 {"installer":{"name":"uv","version":"0.11.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for reasoning_bank-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`ceccf7e34d1d6b64a6894d029a375bc3b04dd41f9e278bac11ec3c4f39c6d5f7`
MD5	`5d1a6453da52fbef23f33a3eda22c341`
BLAKE2b-256	`5a6831927674d11f8a51954ef2fe16359e350a0311539a41c1127c93be6fc5bd`

See more details on using hashes here.

File details

Details for the file reasoning_bank-0.1.1-py3-none-any.whl.

File metadata

Download URL: reasoning_bank-0.1.1-py3-none-any.whl
Upload date: May 31, 2026
Size: 31.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.9 {"installer":{"name":"uv","version":"0.11.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for reasoning_bank-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c4860ef57be675c757215641e0b8fc56f656ebbe025417fc99f72277288feb05`
MD5	`9b20391e1bbbd7c40b862ce065cbc24b`
BLAKE2b-256	`8ac30b5d40e5f3b20c2db816f7ad8287e328216f55aeaa6ec6ab33631eb6388a`

See more details on using hashes here.

reasoning-bank 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

reasoning-bank

What this is

Install

How it plugs into any agent loop

MaTTS

Does it actually work?

Design

Paper

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes