NeuroDock eval corpora and harness — versioned datasets for translation, skills, and guardrails.

neurodock-evals

The versioned eval corpora and the air-gapped harness that runs ND prompts against them.

The corpus is the strategic asset that keeps the translation layer honest. It is how we demonstrate that ND-aware prompts help neurodivergent users in real situations, and how we catch regressions when prompts change. The harness gates prompt PRs in CI.

This package is v0.0.1: the scaffold, the harness, and 6-10 hand-authored seed examples. The seeds are synthesised to demonstrate the format; they are NOT real corporate messages. Real contributed corpora arrive over Phase 2 (target ~300 examples by month 6).

What's here

packages/evals/
├── src/neurodock_evals/        # Harness, anonymiser, deduper, scorer
├── corpora/                    # Versioned YAML eval examples by slice
├── schemas/                    # JSON Schemas for examples + annotations
└── tests/                      # Tests for the harness itself

Quick start

Run the harness against the seed corpora:

uv run python -m neurodock_evals.harness --corpus translation/incoming \
    --tool translate_incoming

Run all four translation slices:

uv run python -m neurodock_evals.harness --ci

Anonymise a contribution before opening a PR:

uv run python -m neurodock_evals.anonymise path/to/example.yaml
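
As an illustration of the kind of pass an anonymiser might make, here is a minimal sketch that redacts email addresses and URLs from a string. It is not the actual `anonymise.py` logic: the `redact` function, the two patterns, and the `[EMAIL]` / `[URL]` placeholders are assumptions made for this example, and a real contribution still needs a human read-through.

```python
import re

# Hypothetical patterns for illustration only; the real anonymiser may
# cover far more (names, phone numbers, organisation names, and so on).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://\S+")


def redact(text: str) -> str:
    """Replace URLs and email addresses with fixed placeholders."""
    # URLs go first so that an address embedded in a URL is not half-redacted.
    text = URL_RE.sub("[URL]", text)
    return EMAIL_RE.sub("[EMAIL]", text)


if __name__ == "__main__":
    print(redact("Mail jo.bloggs@example.com or see https://example.com/x today"))
```

Running a pass like this twice gives the same result as running it once, which makes it safe to re-run on a file that was already cleaned.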

Air-gapped by design

The harness never calls an LLM. It exercises each tool's deterministic baseline (the heuristic layer the translation server returns even before any LLM refinement) and scores the baseline against the human-rated expected block. Any LLM-side eval is a separate concern that the maintainer reviews under a different policy.
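
To make the idea concrete, the following is a minimal sketch of scoring a deterministic baseline against an expected block with no network or model call. Everything in it is an assumption for illustration: the `Example` fields, the keyword rules in `baseline_translate`, and the Jaccard overlap used as the score are not taken from the harness itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    # Hypothetical shape: the real schema lives under schemas/.
    example_id: str
    input_text: str
    expected_labels: frozenset[str]


def baseline_translate(text: str) -> frozenset[str]:
    """Stand-in heuristic: fixed keyword rules, no LLM, no I/O."""
    lowered = text.lower()
    labels = set()
    if "asap" in lowered or "urgent" in lowered:
        labels.add("urgency")
    if "?" in text:
        labels.add("question")
    return frozenset(labels)


def jaccard(predicted: frozenset[str], expected: frozenset[str]) -> float:
    """Overlap between two label sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not predicted and not expected:
        return 1.0
    return len(predicted & expected) / len(predicted | expected)


def score_example(example: Example) -> tuple[str, float]:
    """Return (example ID, score) for one example."""
    predicted = baseline_translate(example.input_text)
    return example.example_id, jaccard(predicted, example.expected_labels)


if __name__ == "__main__":
    seed = Example("seed-001", "Can you send this ASAP?", frozenset({"urgency", "question"}))
    print(score_example(seed))
```

Because the baseline is a pure function of its input, the same corpus always yields the same scores, which is what lets a harness of this kind act as a CI gate.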

Privacy

  • The harness never logs example contents to stdout or to anywhere outside .eval-reports/.
  • Reports contain example IDs and scores only — never verbatim text.
  • The contribution pipeline (anonymise.py) is a safety net, NOT a substitute for contributor judgement. See CONTRIBUTING.md.
  • All corpora are licensed AGPL-3.0-or-later.
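
The "IDs and scores only" rule can be sketched as a report writer that never receives example text in the first place. This is an illustrative sketch, not the harness's actual report code: the `write_report` function, the `report.json` file name, and the JSON layout are assumptions.

```python
import json
from pathlib import Path


def write_report(scores: dict[str, float], report_dir: Path) -> Path:
    """Write a report containing example IDs and scores, nothing else.

    The function only accepts an ID-to-score mapping, so verbatim example
    text cannot leak into the report by construction.
    """
    report_dir.mkdir(parents=True, exist_ok=True)
    rows = [
        {"id": example_id, "score": round(score, 4)}
        for example_id, score in sorted(scores.items())
    ]
    path = report_dir / "report.json"  # hypothetical file name
    path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    write_report({"seed-001": 1.0, "seed-002": 0.5}, Path(".eval-reports"))
```

Keeping example contents out of the function signature is a simple way to enforce the privacy rule structurally instead of relying on reviewers to spot a stray log line.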

Glossary

Term | Meaning
--- | ---
corpus slice | a directory under corpora/<server>/<slice>/; the unit of versioning
example | one YAML file under a slice: one input, one expected block, multiple ratings
rating | one ND rater's judgement of how closely the expected block matches their own reading
deterministic baseline | the heuristic output a translation tool returns without invoking an LLM
eval-corpus binding | every mcp-translation tool cites the slice that validates it (ADR 0005 §4)
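
The slice layout above suggests how a `--corpus` value such as `translation/incoming` could map onto a directory of examples. The sketch below assumes that mapping and a `.yaml` file extension; the `slice_dir` and `list_examples` helpers are hypothetical and not part of the package's API.

```python
from pathlib import Path


def slice_dir(package_root: Path, corpus: str) -> Path:
    """Map '<server>/<slice>' onto corpora/<server>/<slice>/ under the root."""
    parts = corpus.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '<server>/<slice>', got {corpus!r}")
    server, slice_name = parts
    return package_root / "corpora" / server / slice_name


def list_examples(package_root: Path, corpus: str) -> list[Path]:
    """Return the slice's YAML example files in a stable, sorted order."""
    return sorted(slice_dir(package_root, corpus).glob("*.yaml"))


if __name__ == "__main__":
    for path in list_examples(Path("packages/evals"), "translation/incoming"):
        print(path.name)
```

Sorting the file list keeps runs reproducible across filesystems, which matters when reports are compared between CI runs.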

Status

  • v0.0.1 (current): scaffold + harness + 10 synthesised seed examples
  • v0.0.2 (planned): first contributed corpus
  • v0.1.0 (planned): HuggingFace publication pipeline under the neurodock org

See CHANGELOG.md for detail.
