The neutral benchmark harness for AI memory systems. Run LongMemEval, LOCOMO, and more against any memory system.
Project description
Bench'd Harness
The neutral benchmark harness for AI memory systems. Every score is independently run, cryptographically signed, and verifiable by anyone.
Leaderboard | Docs | Methodology | Submit Results
Quick Start
pip install benchd-harness
# Generate signing keys
benchd keys generate --out ./keys
# Set your LLM API key (for the judge)
export OPENROUTER_API_KEY=sk-or-...
# Run LongMemEval against your MCP-compatible memory system
benchd run -a mcp -b longmemeval-v1 --judge --key ./keys/private.key \
--adapter-config '{"endpoint": "http://localhost:3000/mcp"}'
# Submit results to the leaderboard
benchd submit ./runs/run_xxx/manifest.signed.json
MCP Systems: Zero-Code Testing
If your memory system exposes an MCP server with ingest and query tools, you don't need to write any adapter code:
benchd run -a mcp -b longmemeval-v1 --judge \
--adapter-config '{"endpoint": "http://localhost:3000/mcp"}'
The MCP adapter auto-discovers your tools and maps them to Bench'd's interface.
Available Benchmarks
| Benchmark | Slug | Questions | What it tests |
|---|---|---|---|
| LongMemEval | longmemeval-v1 |
500 | Recall, temporal reasoning, knowledge updates |
| LoCoMo | locomo-v1 |
1,540 | Multi-session conversational memory |
| Smoke | smoke-memory-v0 |
10 | Quick sanity check |
Built-in Adapters
| Adapter | System | Install |
|---|---|---|
mcp |
Any MCP server | Built-in |
mem0-local |
Mem0 OSS | pip install benchd-harness[mem0] |
langchain-memory |
LangChain | pip install benchd-harness[langchain] |
llamaindex-memory |
LlamaIndex | pip install benchd-harness[llamaindex] |
llm-baseline |
Raw LLM (no memory) | pip install openai |
echo |
Test adapter | Built-in |
Writing a Custom Adapter
from benchd_harness.adapters.base import BaseAdapter
class MyAdapter(BaseAdapter):
@property
def name(self) -> str:
return "my-system"
def setup(self) -> None:
self.client = MyMemoryClient()
def ingest(self, turns: list[dict]) -> None:
for turn in turns:
self.client.add(role=turn["role"], content=turn["content"])
def recall(self, query: str) -> str:
return self.client.search(query).text
def reset(self) -> None:
self.client.clear()
Register in benchd_harness/adapters/__init__.py and run with benchd run -a my-system.
Commands
| Command | Description |
|---|---|
benchd run |
Run a benchmark against a memory system |
benchd submit |
Submit signed results to benchd.ai |
benchd verify |
Verify a signed manifest |
benchd keys generate |
Generate Ed25519 signing keys |
benchd list |
List available adapters and benchmarks |
Signing & Verification
Every run produces an Ed25519-signed manifest containing all inputs, outputs, scores, and failure traces. Anyone can verify:
benchd verify ./runs/run_xxx/manifest.signed.json
Current Results (May 2026)
| # | System | LongMemEval | Status |
|---|---|---|---|
| 1 | LlamaIndex | 59.0% | Verified |
| 1 | LangChain | 59.0% | Verified |
| 3 | LLM Baseline | 57.6% | Verified |
| 4 | Mem0 OSS | 32.4% | Verified |
Full results at benchd.ai/leaderboard.
Links
- Website: benchd.ai
- Leaderboard: benchd.ai/leaderboard
- Docs: benchd.ai/docs
- Submit: benchd.ai/submit
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file benchd_harness-0.1.0.tar.gz.
File metadata
- Download URL: benchd_harness-0.1.0.tar.gz
- Upload date:
- Size: 54.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cdb62a15ad52f079313ec67750e3abb44bcdae3004c3e579793593d53c9e1410
|
|
| MD5 |
e68c4b179984c1c2306299d46dc8dc6d
|
|
| BLAKE2b-256 |
61456f3676e3ee78fefeb1be982a8f5d29d19a009aac1d186eb2aec3ff29b410
|
File details
Details for the file benchd_harness-0.1.0-py3-none-any.whl.
File metadata
- Download URL: benchd_harness-0.1.0-py3-none-any.whl
- Upload date:
- Size: 68.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2e69477ef1209fc1716b458fa4ca4e151024abe12df22ab5c4a2ab725cc02fc1
|
|
| MD5 |
677b9c3d21b2e0e298271f9dfdac89c2
|
|
| BLAKE2b-256 |
c8f98569da90b6bc1c6a84b7481909c00ecb534870767de81b64c2b0e321301d
|