NeuroDock eval corpora and harness — versioned datasets for translation, skills, and guardrails.

neurodock-evals

The versioned eval corpora and the air-gapped harness that runs ND prompts against them.

The corpus is the strategic asset that keeps the translation layer honest: it lets us show that ND-aware prompts help neurodivergent users in real situations, and it catches regressions when prompts change. The harness gates prompt PRs in CI.

This package is v0.0.1: the scaffold, the harness, and 6-10 hand-authored seed examples. The seeds are synthesised to demonstrate the format; they are NOT real corporate messages. Real contributed corpora arrive over Phase 2 (target ~300 examples by month 6).

What's here

packages/evals/
├── src/neurodock_evals/        # Harness, anonymiser, deduper, scorer
├── corpora/                    # Versioned YAML eval examples by slice
├── schemas/                    # JSON Schemas for examples + annotations
└── tests/                      # Tests for the harness itself
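
To make the layout concrete, here is a minimal sketch of how the examples in one slice could be enumerated. It assumes each example is a single .yaml file stored directly under corpora/<server>/<slice>/; the function name list_examples is hypothetical and is not taken from the package.

```python
from pathlib import Path
from typing import List


def list_examples(corpora_root: Path, slice_id: str) -> List[Path]:
    """Return the example files for a slice such as 'translation/incoming'.

    Assumption: a slice is the directory corpora/<server>/<slice>/ and each
    example is one .yaml file directly inside it.
    """
    slice_dir = corpora_root / slice_id
    if not slice_dir.is_dir():
        raise FileNotFoundError(f"no such corpus slice: {slice_id}")
    # Sort so runs are reproducible regardless of filesystem ordering.
    return sorted(slice_dir.glob("*.yaml"))
```

Sorting the paths keeps report ordering stable between runs, which matters when reports are diffed in CI.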

Quick start

Run the harness against the seed corpora:

uv run python -m neurodock_evals.harness --corpus translation/incoming \
    --tool translate_incoming

Run all four translation slices:

uv run python -m neurodock_evals.harness --ci

Anonymise a contribution before opening a PR:

uv run python -m neurodock_evals.anonymise path/to/example.yaml
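
As an illustration of the kind of pass an anonymisation step might perform, the sketch below redacts e-mail addresses and URLs with regular expressions. The patterns, the placeholder tokens, and the redact name are all assumptions made for this example; the package's own anonymiser may work quite differently, and no pattern-based pass replaces a careful manual read.

```python
import re

# Deliberately simple patterns: illustrative only, not exhaustive.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL = re.compile(r"https?://\S+")


def redact(text: str) -> str:
    """Replace URLs and e-mail addresses with fixed placeholder tokens."""
    # URLs first, so an address embedded in a link is not half-replaced.
    text = URL.sub("[URL]", text)
    return EMAIL.sub("[EMAIL]", text)
```

Names, team nicknames, and project codenames will pass straight through a filter like this, which is why contributor judgement still matters.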

Air-gapped by design

The harness never calls an LLM. It exercises each tool's deterministic baseline (the heuristic layer the translation server returns even before any LLM refinement) and scores the baseline against the human-rated expected block. Any LLM-side eval is a separate concern that the maintainer reviews under a different policy.
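
The idea of scoring a deterministic baseline against an expected block can be sketched as below. This assumes the expected block is a flat mapping of field names to strings and uses a made-up metric (the fraction of expected fields reproduced after whitespace and case normalisation). The names score_baseline and toy_translate_incoming are hypothetical stand-ins, not the package's scorer or tools.

```python
from typing import Callable, Dict, Mapping


def score_baseline(
    tool: Callable[[str], Mapping[str, str]],
    example_input: str,
    expected: Mapping[str, str],
) -> float:
    """Score a tool's deterministic output against an expected block.

    Hypothetical metric: the fraction of expected fields whose value the
    baseline reproduces after whitespace and case normalisation.
    """
    def norm(text: str) -> str:
        return " ".join(text.lower().split())

    if not expected:
        raise ValueError("expected block must not be empty")
    actual = tool(example_input)  # a pure function: no network, no LLM
    hits = sum(
        1 for key, want in expected.items()
        if norm(actual.get(key, "")) == norm(want)
    )
    return hits / len(expected)


def toy_translate_incoming(message: str) -> Dict[str, str]:
    """Stand-in heuristic baseline, invented for this example."""
    urgent = "asap" in message.lower()
    return {"urgency": "high" if urgent else "normal", "ask": message.strip()}
```

Because the tool is a plain function of its input, the same example always yields the same score, which is what lets a harness like this gate PRs without network access.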

Privacy

  • The harness never logs example contents to stdout or to anywhere outside .eval-reports/.
  • Reports contain example IDs and scores only — never verbatim text.
  • The contribution pipeline (anonymise.py) is a safety net, NOT a substitute for contributor judgement. See CONTRIBUTING.md.
  • All corpora are licensed AGPL-3.0-or-later.
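
A report writer that honours the "IDs and scores only" rule could look roughly like this. The JSON layout, the file naming, and the write_report name are assumptions for illustration; only the .eval-reports/ directory and the IDs-and-scores constraint come from the text above.

```python
import json
from pathlib import Path
from typing import Dict


def write_report(report_dir: Path, slice_id: str, scores: Dict[str, float]) -> Path:
    """Write a report holding example IDs and scores only, never example text."""
    report_dir.mkdir(parents=True, exist_ok=True)
    payload = {
        "slice": slice_id,
        "results": [
            {"id": example_id, "score": round(score, 4)}
            for example_id, score in sorted(scores.items())
        ],
    }
    # Hypothetical naming scheme: one JSON file per slice.
    out_path = report_dir / (slice_id.replace("/", "__") + ".json")
    out_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out_path
```

The function signature accepts nothing but identifiers and numbers, so example text cannot leak into a report by accident.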

Glossary

Term                     Meaning
corpus slice             a directory under corpora/<server>/<slice>/; the unit of versioning
example                  one YAML file under a slice: one input, one expected block, multiple ratings
rating                   one ND rater's judgement of how closely the expected block matches their own reading
deterministic baseline   the heuristic output a translation tool returns without invoking an LLM
eval-corpus binding      every mcp-translation tool cites the slice that validates it (ADR 0005 §4)
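
The relationship between an example, its expected block, and its ratings can be pictured with a small data model. This is a sketch under assumptions: the field names and the 1-5 rating scale are invented for illustration, and the authoritative shape is whatever the JSON Schemas in schemas/ define.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Rating:
    rater_id: str  # pseudonymous rater identifier (hypothetical field)
    score: int     # assumed 1-5 scale; the real scale is set by the schemas


@dataclass(frozen=True)
class Example:
    example_id: str
    input_text: str
    expected: Dict[str, str]
    ratings: Tuple[Rating, ...] = field(default_factory=tuple)

    def mean_rating(self) -> Optional[float]:
        """Average rater agreement, or None when nobody has rated yet."""
        if not self.ratings:
            return None
        return mean(r.score for r in self.ratings)
```

Keeping ratings attached to the example, rather than to a run, reflects the glossary's framing: raters judge the expected block itself, not a particular tool output.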

Status

  • v0.0.1 (current): scaffold + harness + 10 synthesised seed examples
  • v0.0.2 (planned): first contributed corpus
  • v0.1.0 (planned): HuggingFace publication pipeline under the neurodock org

See CHANGELOG.md for detail.
