Agent-agnostic faithfulness evaluation framework — evaluator → judge → curator pipeline for agent memory quality scoring
dream-eval
Agent-agnostic faithfulness evaluation framework for agent memory quality scoring.
What it does
dream-eval implements the evaluator → judge → curator pipeline pattern:
- Evaluator reads transcripts + soul (interpretive lens), proposes items
- Judge scores against labels WITHOUT reading soul (enforcing objectivity)
- Curator writes results (enforcing separation of concerns)
To our knowledge, this pattern is unique in the agent memory space: no competitor (mem0, Cognee, LangMem) offers automated faithfulness evaluation.
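The separation between the three roles can be illustrated with a minimal, self-contained sketch. It does not use the dream-eval API: `Item`, `evaluator`, `judge`, `curator`, and `run_pipeline` are hypothetical names, and the keyword-matching rule is a toy stand-in for an LLM evaluator. The point it shows is structural: only the evaluator receives the soul, the judge's signature has no way to see it, and only the curator writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a proposed or labeled memory item."""
    id: str
    category: str


def evaluator(transcripts, soul):
    """Propose items. This is the only stage that receives the soul."""
    proposed = []
    for i, line in enumerate(transcripts):
        # Toy rule: a soul keyword found in a transcript line yields one item.
        for keyword, category in soul.items():
            if keyword in line:
                proposed.append(Item(id=f"{category}-{i}", category=category))
    return proposed


def judge(proposed, labels):
    """Score proposals against labels only. There is no soul parameter."""
    label_set = set(labels)
    matched = [p for p in proposed if p in label_set]
    return len(matched) / len(proposed) if proposed else 0.0


def curator(score, store):
    """Persist the result. This is the only stage that writes."""
    store["faithfulness"] = score
    return store


def run_pipeline(transcripts, soul, labels, store):
    proposed = evaluator(transcripts, soul)
    score = judge(proposed, labels)
    return curator(score, store)


if __name__ == "__main__":
    result = run_pipeline(
        transcripts=["user prefers dark mode", "merge after CI passes"],
        soul={"dark mode": "pref", "CI": "workflow"},
        labels=[Item("pref-0", "pref")],
        store={},
    )
    print(result)
```

Keeping the soul out of the judge's signature makes the objectivity constraint a property of the code rather than a convention.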
Install
pip install dream-eval
Quick start
from dream_eval import compute_faithfulness
from dream_eval.types import ProposedItem, LabeledItem

proposed = [
    ProposedItem(id="pref-1", category="pref", content={"key": "dark_mode"}),
    ProposedItem(id="workflow-1", category="workflow", content={"key": "ci_merge"}),
]
labels = [
    LabeledItem(id="pref-1", category="pref"),
    LabeledItem(id="workflow-1", category="workflow"),
]

report = compute_faithfulness(proposed, labels)
print(f"Faithfulness: {report.faithfulness_score}")
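For intuition about what a label-based score can measure, here is a rough sketch of set-based precision and recall over `(id, category)` pairs. This is an assumption about the general technique, not the formula `compute_faithfulness` uses; `precision_recall` is a hypothetical helper and the package's actual scoring may differ.

```python
def precision_recall(proposed, labels):
    """Compare proposed and labeled items as sets of (id, category) pairs.

    precision = matched / proposed  (how many proposals are backed by a label)
    recall    = matched / labels    (how many labels were recovered)
    """
    proposed_set = set(proposed)
    label_set = set(labels)
    matched = proposed_set & label_set
    precision = len(matched) / len(proposed_set) if proposed_set else 0.0
    recall = len(matched) / len(label_set) if label_set else 0.0
    return precision, recall


if __name__ == "__main__":
    proposed = [("pref-1", "pref"), ("workflow-1", "workflow"), ("pref-2", "pref")]
    labels = [
        ("pref-1", "pref"),
        ("workflow-1", "workflow"),
        ("fact-1", "fact"),
        ("fact-2", "fact"),
    ]
    precision, recall = precision_recall(proposed, labels)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch an unlabeled proposal lowers precision, while a label the evaluator missed lowers recall.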
CLI
# Score evaluator report against labels
dream-eval score --report report.json --labels labels.json
# Run deterministic gates
dream-eval gate --labels labels.json --output evaluator_output.txt
# Export to metrics.json format
dream-eval export --input eval_result.json --output metrics.json
Deterministic gates
These fail the eval regardless of LLM scores:
- secret_leak — checks for forbidden patterns (API keys, tokens, passwords)
- hash_determinism — verifies BOM/CRLF normalization produces stable hashes
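The two gates can be approximated as follows. This is a sketch under stated assumptions, not the contents of `gates.py`: the regular expressions in `FORBIDDEN_PATTERNS` are illustrative examples, SHA-256 is an assumed hash choice, and all function names are hypothetical.

```python
import hashlib
import re

# Illustrative patterns only; a real deployment would maintain its own list.
FORBIDDEN_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),          # generic "sk-" style API key
    re.compile(r"AKIA[0-9A-Z]{16}"),             # AWS access key ID shape
    re.compile(r"(?i)password\s*[:=]\s*\S+"),    # inline password assignment
]

UTF8_BOM = b"\xef\xbb\xbf"


def secret_leak_gate(text):
    """Return True when the text passes, i.e. no forbidden pattern matches."""
    return not any(pattern.search(text) for pattern in FORBIDDEN_PATTERNS)


def normalized_hash(raw):
    """Hash bytes after stripping a UTF-8 BOM and converting CRLF to LF."""
    if raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]
    raw = raw.replace(b"\r\n", b"\n")
    return hashlib.sha256(raw).hexdigest()


def hash_determinism_gate(variants):
    """Return True when every encoding variant normalizes to the same hash."""
    return len({normalized_hash(v) for v in variants}) == 1


if __name__ == "__main__":
    print(secret_leak_gate("evaluator output with no secrets"))
    print(secret_leak_gate("password = hunter2"))
    print(hash_determinism_gate([b"a\nb\n", UTF8_BOM + b"a\r\nb\r\n"]))
```

Because both checks are pure functions of their input, they give the same verdict on every run, which is what lets them override model-produced scores.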
Memory backend adapter
dream-eval works with any memory backend via BaseMemoryBackend:
from dream_eval.adapter import BaseMemoryBackend

class MyBackend(BaseMemoryBackend):
    def read_transcripts(self, corpus_path=None):
        # Read from your storage
        ...

    def read_labels(self, labels_path=None):
        # Read ground truth labels
        ...

    def write_eval_result(self, result):
        # Write evaluation results
        ...
A built-in DictMemoryBackend is provided for testing.
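To show what an in-memory backend for tests might look like, the sketch below defines a standalone abstract base with the three methods listed above and a dictionary-free, list-backed implementation. It deliberately does not import dream-eval: `MemoryBackendSketch` and `InMemoryBackend` are hypothetical names, and the real `BaseMemoryBackend` and `DictMemoryBackend` may have different constructors and return types.

```python
from abc import ABC, abstractmethod


class MemoryBackendSketch(ABC):
    """Hypothetical mirror of the three-method backend interface."""

    @abstractmethod
    def read_transcripts(self, corpus_path=None):
        ...

    @abstractmethod
    def read_labels(self, labels_path=None):
        ...

    @abstractmethod
    def write_eval_result(self, result):
        ...


class InMemoryBackend(MemoryBackendSketch):
    """Keeps everything in Python lists; path arguments are ignored."""

    def __init__(self, transcripts=None, labels=None):
        self._transcripts = list(transcripts or [])
        self._labels = list(labels or [])
        self.results = []  # every written result, in order

    def read_transcripts(self, corpus_path=None):
        return list(self._transcripts)

    def read_labels(self, labels_path=None):
        return list(self._labels)

    def write_eval_result(self, result):
        self.results.append(result)


if __name__ == "__main__":
    backend = InMemoryBackend(
        transcripts=["user prefers dark mode"],
        labels=[{"id": "pref-1", "category": "pref"}],
    )
    backend.write_eval_result({"faithfulness_score": 1.0})
    print(backend.read_transcripts(), backend.results)
```

Returning copies from the read methods keeps a test from accidentally mutating the fixture data it was constructed with.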
Architecture
dream-eval/
├── src/dream_eval/
│ ├── __init__.py # Package exports
│ ├── types.py # Pydantic models (EvalResult, FaithfulnessReport, etc.)
│ ├── scoring.py # Faithfulness, precision, recall algorithms
│ ├── gates.py # Deterministic gates (secret_leak, hash_determinism)
│ ├── adapter.py # Abstract BaseMemoryBackend + DictMemoryBackend
│ └── cli.py # CLI entry point
└── tests/ # Test suite
License
MIT