NeuroDock eval corpora and harness — versioned datasets for translation, skills, and guardrails.

neurodock-evals

The versioned eval corpora and the air-gapped harness that runs ND prompts against them.

The corpus is the strategic asset that makes the translation layer honest. It is how we show that ND-aware prompts help neurodivergent users in real situations, and how we catch regressions when prompts change. The harness gates prompt PRs in CI.
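
One way to picture the CI gate is as a comparison between per-example scores from the last accepted run and scores from the PR under review. The sketch below is an assumption about how such a gate could work, not the harness's actual logic; `regression_gate`, `exit_code`, and the `tolerance` parameter are hypothetical names introduced for illustration.

```python
def regression_gate(
    previous: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.0,
) -> list[str]:
    """Return IDs of examples whose score dropped by more than `tolerance`.

    Hypothetical helper: an example missing from `current` is treated as
    scoring 0.0, so a silently dropped example also counts as a regression.
    """
    return sorted(
        example_id
        for example_id, old_score in previous.items()
        if current.get(example_id, 0.0) < old_score - tolerance
    )


def exit_code(regressions: list[str]) -> int:
    """Map the regression list to a process exit status; non-zero fails the job."""
    return 1 if regressions else 0
```

A CI step would pass the result of `exit_code(...)` to `sys.exit`, so any regression blocks the PR until the prompt change is revisited or the expected blocks are re-rated.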

This package is v0.0.1: the scaffold, the harness, and 6-10 hand-authored seed examples. The seeds are synthesised to demonstrate the format; they are NOT real corporate messages. Real contributed corpora arrive over Phase 2 (target: roughly 300 examples by month 6).

What's here

packages/evals/
├── src/neurodock_evals/        # Harness, anonymiser, deduper, scorer
├── corpora/                    # Versioned YAML eval examples by slice
├── schemas/                    # JSON Schemas for examples + annotations
└── tests/                      # Tests for the harness itself

Quick start

Run the harness against the seed corpora:

uv run python -m neurodock_evals.harness --corpus translation/incoming \
    --tool translate_incoming

Run all four translation slices:

uv run python -m neurodock_evals.harness --ci

Anonymise a contribution before opening a PR:

uv run python -m neurodock_evals.anonymise path/to/example.yaml
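
To give a feel for what an automated anonymisation pass can and cannot catch, here is a minimal sketch that redacts email addresses and phone-like numbers with regular expressions. It is not the logic in anonymise.py; the function name `redact`, both patterns, and the placeholder tokens are assumptions made for this example.

```python
import re

# Hypothetical patterns; the real anonymiser may use entirely different rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    print(redact("Ping jo.bloggs@example.com or call +44 20 7946 0958 today"))
```

Pattern-based redaction of this kind leaves personal names, team names, and project codenames untouched, which is why a contributor still needs to read the result before opening a PR.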

Air-gapped by design

The harness never calls an LLM. It exercises each tool's deterministic baseline (the heuristic layer the translation server returns even before any LLM refinement) and scores the baseline against the human-rated expected block. Any LLM-side eval is a separate concern that the maintainer reviews under a different policy.
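
The idea of scoring a deterministic baseline against an expected block can be sketched as below. This is an illustration under stated assumptions, not the harness's scorer: `score_baseline` uses simple exact-match per field, and `toy_translate_incoming` with its `urgency` and `has_question` fields is a hypothetical stand-in for a real tool's heuristic layer.

```python
from typing import Any, Callable, Mapping


def score_baseline(
    tool: Callable[[str], Mapping[str, Any]],
    example_input: str,
    expected: Mapping[str, Any],
) -> float:
    """Return the fraction of expected fields the baseline reproduces exactly."""
    if not expected:
        return 1.0
    output = tool(example_input)  # pure function call: no network, no LLM
    matched = sum(1 for key, value in expected.items() if output.get(key) == value)
    return matched / len(expected)


def toy_translate_incoming(message: str) -> dict[str, Any]:
    """Hypothetical heuristic: flag urgency by keyword and detect a question mark."""
    urgent = "asap" in message.lower()
    return {
        "urgency": "high" if urgent else "normal",
        "has_question": "?" in message,
    }
```

Because the tool is a plain function of its input, the same example always yields the same score, which is what makes the result usable as a CI signal.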

Privacy

The harness never logs example contents to stdout or to anywhere outside .eval-reports/.
Reports contain example IDs and scores only — never verbatim text.
The contribution pipeline (anonymise.py) is a safety net, NOT a substitute for contributor judgement. See CONTRIBUTING.md.
All corpora are licensed AGPL-3.0-or-later.
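
The "IDs and scores only" rule is easiest to enforce when the report writer never receives example text in the first place. The following is a minimal sketch of that design, assuming a JSON report file; the function name `write_report`, the file name `report.json`, and the payload layout are hypothetical and not taken from the harness.

```python
import json
from pathlib import Path


def write_report(
    scores: dict[str, float],
    report_dir: Path = Path(".eval-reports"),
) -> Path:
    """Write example IDs and scores only; example text is never passed in."""
    report_dir.mkdir(parents=True, exist_ok=True)
    report_path = report_dir / "report.json"
    payload = {
        "results": [
            {"id": example_id, "score": round(score, 4)}
            for example_id, score in sorted(scores.items())
        ]
    }
    report_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return report_path
```

Restricting the function signature to an ID-to-score mapping means a later change cannot leak verbatim text into a report without also changing the interface, which is easy to spot in review.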

Glossary

Term	Meaning
corpus slice	a directory under `corpora/<server>/<slice>/`; the unit of versioning
example	one YAML file under a slice — one input, one `expected` block, multiple ratings
rating	one ND-rater's judgement of how closely the `expected` block matches their read
deterministic baseline	the heuristic output a translation tool returns without invoking an LLM
eval-corpus binding	every `mcp-translation` tool cites the slice that validates it (ADR 0005 §4)
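
The relationship between an example, its `expected` block, and its ratings can be modelled roughly as follows. This is a sketch for orientation only: the authoritative shape is whatever the JSON Schemas under schemas/ define, and every field name here (`example_id`, `slice_path`, `input_text`, `rater_id`, `agreement`) is an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Optional


@dataclass(frozen=True)
class Rating:
    rater_id: str     # hypothetical: pseudonymous ND-rater identifier
    agreement: float  # hypothetical scale: 0.0 (no match) to 1.0 (full match)


@dataclass(frozen=True)
class Example:
    example_id: str
    slice_path: str            # e.g. "translation/incoming"
    input_text: str            # the one input
    expected: dict[str, Any]   # the one expected block
    ratings: tuple[Rating, ...] = ()  # zero or more ratings

    def mean_agreement(self) -> Optional[float]:
        """Average rater agreement, or None when the example is unrated."""
        if not self.ratings:
            return None
        return mean(r.agreement for r in self.ratings)
```

Keeping ratings separate from the `expected` block reflects the glossary's distinction: the block is the reference answer, while each rating records how well one rater thinks that reference matches their own reading.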

Status

v0.0.1 (current): scaffold + harness + 10 synthesised seed examples
v0.0.2 (planned): first contributed corpus
v0.1.0 (planned): HuggingFace publication pipeline under the neurodock org

See CHANGELOG.md for detail.

Release history

0.0.2: May 17, 2026
0.0.1 (this version): May 17, 2026

Download files

Source Distribution

neurodock_evals-0.0.1.tar.gz (25.1 kB, source)

Uploaded May 17, 2026 with uv 0.11.14 (`uv publish`) from CI on Ubuntu 24.04 (noble). Trusted Publishing was not used.

File hashes

Hashes for neurodock_evals-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`b050315d30d7ee44da08f5cb10256ed48c6e8e87d6fb84814ab1e8518152e359`
MD5	`f5388483ff8bfdcaf43af628373e2b02`
BLAKE2b-256	`ba1d1e494043202b0df518cf4fde482536130057f7e8fdfc1943025d71db058c`
