Agent-agnostic faithfulness evaluation framework — evaluator → judge → curator pipeline for agent memory quality scoring

Project description

dream-eval

Agent-agnostic faithfulness evaluation framework for agent memory quality scoring.

What it does

dream-eval implements the evaluator → judge → curator pipeline pattern:

Evaluator: reads the transcripts and the soul (the interpretive lens), then proposes items.
Judge: scores the proposed items against the labels without reading the soul, which enforces objectivity.
Curator: writes the results, which enforces separation of concerns.
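
The three stages above can be sketched in a few lines. This is a toy illustration of the separation, not the package's implementation: `Proposal`, `evaluator`, `judge`, and `curator` are hypothetical names, and the keyword-matching rule is invented for the example. The point it shows is structural: only the evaluator receives the soul, and only the curator writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Hypothetical stand-in for an item the evaluator proposes."""
    id: str
    category: str


def evaluator(transcripts: list[str], soul: str) -> list[Proposal]:
    """Stage 1: the only stage that sees the soul (interpretive lens)."""
    # Toy rule (an assumption): propose an item for each transcript line
    # that mentions one of the soul's keywords.
    keywords = soul.lower().split()
    return [
        Proposal(id=f"pref-{i}", category="pref")
        for i, line in enumerate(transcripts, start=1)
        if any(k in line.lower() for k in keywords)
    ]


def judge(proposals: list[Proposal], labels: set[str]) -> dict[str, bool]:
    """Stage 2: checks proposals against labels; it takes no soul argument."""
    return {p.id: p.id in labels for p in proposals}


def curator(verdicts: dict[str, bool], store: dict) -> None:
    """Stage 3: the only stage that writes results."""
    store["verdicts"] = verdicts
    store["accepted"] = sorted(k for k, ok in verdicts.items() if ok)


store: dict = {}
proposals = evaluator(["User likes dark mode", "Build passed"], soul="dark mode")
curator(judge(proposals, labels={"pref-1"}), store)
print(store["accepted"])
```

Because `judge` has no `soul` parameter, it cannot be influenced by the interpretive lens even by accident, which is one simple way to make the objectivity constraint checkable.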

To the authors' knowledge, this pattern is unique in the agent memory space: none of the comparable tools (mem0, Cognee, LangMem) offers automated faithfulness evaluation.

Install

pip install dream-eval

Quick start

from dream_eval import compute_faithfulness
from dream_eval.types import ProposedItem, LabeledItem

proposed = [
    ProposedItem(id="pref-1", category="pref", content={"key": "dark_mode"}),
    ProposedItem(id="workflow-1", category="workflow", content={"key": "ci_merge"}),
]
labels = [
    LabeledItem(id="pref-1", category="pref"),
    LabeledItem(id="workflow-1", category="workflow"),
]

report = compute_faithfulness(proposed, labels)
print(f"Faithfulness: {report.faithfulness_score}")
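
The README does not say how the scores are derived, so the following is only a sketch under one plausible assumption: proposed and labeled items are matched on their `(id, category)` pair, and precision and recall are computed over those matches. `sketch_scores` is a hypothetical helper and makes no claim about how `compute_faithfulness` works internally.

```python
def sketch_scores(
    proposed: set[tuple[str, str]],
    labeled: set[tuple[str, str]],
) -> dict[str, float]:
    """Precision and recall over (id, category) pairs (assumed matching rule)."""
    matched = proposed & labeled
    # Guard against empty inputs to avoid division by zero.
    precision = len(matched) / len(proposed) if proposed else 0.0
    recall = len(matched) / len(labeled) if labeled else 0.0
    return {"precision": precision, "recall": recall}


proposed = {("pref-1", "pref"), ("workflow-1", "workflow"), ("pref-2", "pref")}
labeled = {("pref-1", "pref"), ("workflow-1", "workflow")}
scores = sketch_scores(proposed, labeled)
print(scores)
```

Here one of the three proposed items has no matching label, so precision drops below 1 while recall stays at 1.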

CLI

# Score evaluator report against labels
dream-eval score --report report.json --labels labels.json

# Run deterministic gates
dream-eval gate --labels labels.json --output evaluator_output.txt

# Export to metrics.json format
dream-eval export --input eval_result.json --output metrics.json
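
For use in a CI job, the commands above could be driven from a small Python wrapper. The sketch below only uses the subcommands and flags shown above; it assumes (the README does not confirm this) that the CLI exits with a non-zero status when a gate or score check fails. `build_commands` and `run_all` are hypothetical helper names.

```python
import shutil
import subprocess


def build_commands(report: str, labels: str, output: str) -> list[list[str]]:
    """Return the gate and score invocations as argv lists."""
    return [
        ["dream-eval", "gate", "--labels", labels, "--output", output],
        ["dream-eval", "score", "--report", report, "--labels", labels],
    ]


def run_all(commands: list[list[str]]) -> bool:
    """Run each command in order; stop at the first non-zero exit status."""
    if shutil.which("dream-eval") is None:
        raise FileNotFoundError("dream-eval is not on PATH; install it first")
    for argv in commands:
        # Assumption: a failing check is reported through the exit status.
        if subprocess.run(argv).returncode != 0:
            return False
    return True


commands = build_commands("report.json", "labels.json", "evaluator_output.txt")
```

Running the gates before scoring keeps a cheap deterministic failure from being masked by a later step.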

Deterministic gates

If any of these gates fails, the evaluation fails regardless of the LLM scores:

secret_leak: checks for forbidden patterns (API keys, tokens, passwords)
hash_determinism: verifies that BOM/CRLF normalization produces stable hashes
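
To make the two gates concrete, here is a minimal sketch of how such checks could be written. The regular expressions, the choice of SHA-256, and all function names are assumptions for illustration; the package's actual pattern list and hashing scheme are not shown in this description.

```python
import hashlib
import re

# Illustrative patterns only; the real forbidden-pattern list is not documented here.
FORBIDDEN_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                  # API-key-like token
    re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),           # inline password assignment
    re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._\-]{16,}"),   # bearer token
]


def secret_leak_gate(text: str) -> bool:
    """Return True (pass) when no forbidden pattern occurs in the text."""
    return not any(p.search(text) for p in FORBIDDEN_PATTERNS)


def normalized_hash(raw: bytes) -> str:
    """SHA-256 of the content after dropping a UTF-8 BOM and converting CRLF to LF."""
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]
    return hashlib.sha256(raw.replace(b"\r\n", b"\n")).hexdigest()


def hash_determinism_gate(variants: list[bytes]) -> bool:
    """Return True (pass) when every variant normalizes to the same hash."""
    return len({normalized_hash(v) for v in variants}) == 1


print(secret_leak_gate("user prefers dark mode"))
print(hash_determinism_gate([b"a\r\nb\r\n", b"\xef\xbb\xbfa\nb\n", b"a\nb\n"]))
```

Both checks are pure functions of their input, which is what makes them deterministic: the same bytes always give the same verdict, with no model call involved.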

Memory backend adapter

dream-eval works with any memory backend via BaseMemoryBackend:

from dream_eval.adapter import BaseMemoryBackend

class MyBackend(BaseMemoryBackend):
    def read_transcripts(self, corpus_path=None):
        # Read from your storage
        ...

    def read_labels(self, labels_path=None):
        # Read ground truth labels
        ...

    def write_eval_result(self, result):
        # Write evaluation results
        ...

A built-in DictMemoryBackend is provided for testing.
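
As an illustration of the adapter pattern, the sketch below implements the three methods against plain Python lists. To stay runnable without the package installed, it defines a local stand-in `BackendInterface` rather than importing `BaseMemoryBackend`; the return types (lists of strings and dicts) and the `InMemoryBackend` name are assumptions, and this is not the source of the built-in `DictMemoryBackend`.

```python
from abc import ABC, abstractmethod


class BackendInterface(ABC):
    """Local stand-in mirroring the three methods shown above (not the real base class)."""

    @abstractmethod
    def read_transcripts(self, corpus_path=None): ...

    @abstractmethod
    def read_labels(self, labels_path=None): ...

    @abstractmethod
    def write_eval_result(self, result): ...


class InMemoryBackend(BackendInterface):
    """Hypothetical list-backed backend for tests; path arguments are ignored."""

    def __init__(self, transcripts, labels):
        self._transcripts = list(transcripts)
        self._labels = list(labels)
        self.results = []

    def read_transcripts(self, corpus_path=None):
        # Return a copy so callers cannot mutate the stored data.
        return list(self._transcripts)

    def read_labels(self, labels_path=None):
        return list(self._labels)

    def write_eval_result(self, result):
        self.results.append(result)


backend = InMemoryBackend(
    ["User asked for dark mode"],
    [{"id": "pref-1", "category": "pref"}],
)
backend.write_eval_result({"faithfulness_score": 1.0})
print(backend.read_labels(), backend.results)
```

With the real package, the same class body would presumably subclass `BaseMemoryBackend` from `dream_eval.adapter` instead of the stand-in.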

Architecture

dream-eval/
├── src/dream_eval/
│   ├── __init__.py      # Package exports
│   ├── types.py         # Pydantic models (EvalResult, FaithfulnessReport, etc.)
│   ├── scoring.py       # Faithfulness, precision, recall algorithms
│   ├── gates.py         # Deterministic gates (secret_leak, hash_determinism)
│   ├── adapter.py       # Abstract BaseMemoryBackend + DictMemoryBackend
│   └── cli.py           # CLI entry point
└── tests/               # Test suite

License

MIT

Project details

This version

0.2.0

Jun 27, 2026

Download files

Source Distribution

dream_eval-0.2.0.tar.gz (16.5 kB)

Uploaded Jun 27, 2026 Source

Built Distribution

dream_eval-0.2.0-py3-none-any.whl (17.3 kB)

Uploaded Jun 27, 2026 Python 3

File details

Details for the file dream_eval-0.2.0.tar.gz.

File metadata

Download URL: dream_eval-0.2.0.tar.gz
Upload date: Jun 27, 2026
Size: 16.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for dream_eval-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`352b7fcf6b969938dbef083213fc0653de894ebe08ad2a4fff6512617a2bcb9c`
MD5	`e47990e9ff97614cbd93782e2b7e240f`
BLAKE2b-256	`4b05e29f51cae2098770ed9615d652a47aca16233ac5c8abe664d3301b5faa5f`

File details

Details for the file dream_eval-0.2.0-py3-none-any.whl.

File metadata

Download URL: dream_eval-0.2.0-py3-none-any.whl
Upload date: Jun 27, 2026
Size: 17.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for dream_eval-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`14c57fd73b63e66389da2b12aecfdd9af410007ec982c7931d75db0b16e1ba9b`
MD5	`fed090434230b159fb74e2c751b55580`
BLAKE2b-256	`4ecb017ea6efd1cccf4487e7b8f7a1012ab0ae9bf5c6c72f6387076e94df607f`
