Local open-source evaluation tooling for rubric validation, linting, and deterministic scoring.
AuraOne EvalKit
AuraOne EvalKit is a standalone local Python package for rubric validation, rubric linting, and deterministic scoring. It installs as auraone-evalkit, imports as auraone_evalkit, and exposes the evalkit CLI.
EvalKit does not require an AuraOne account, API key, hosted tenant, database, or private reviewer pool. The files in examples/tutorial/ are synthetic tutorial data only. They are not expert-authored, human-validated, or benchmark-grade, and they are not safety certifications or claims about model quality.
Package Distinction
EvalKit is separate from AuraOne's hosted SDKs and hosted CLI:
| Tool | Package or binary | Purpose |
|---|---|---|
| EvalKit | auraone-evalkit, auraone_evalkit, evalkit | Local open-source rubric tools. No API key. |
| Hosted Python SDK | auraone-sdk | Hosted AuraOne API client. Uses hosted services. |
| Hosted TypeScript SDK | @auraone/sdk | Hosted AuraOne API client for Node/TypeScript. Uses hosted services. |
| Hosted API CLI | aura | Hosted AuraOne command line workflows. Separate from evalkit. |
Use evalkit for local files and tutorial workflows. Use auraone-sdk, @auraone/sdk, or aura only when you intend to call hosted AuraOne services.
Install
From this repository:
cd opensource/evalkit
python -m pip install -e .
After install:
evalkit --help
evalkit --version
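The same check can be made from Python. The sketch below is not part of EvalKit; `check_install` is a hypothetical helper that uses only the standard library to report whether the distribution, the import package, and the CLI named above are visible in the current environment.

```python
import shutil
from importlib import metadata, util


def check_install(dist_name: str, module_name: str, cli_name: str) -> dict:
    """Report whether a distribution, its import package, and its CLI are visible."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # distribution is not installed in this environment
    return {
        "version": version,
        "importable": util.find_spec(module_name) is not None,
        "cli_on_path": shutil.which(cli_name) is not None,
    }


if __name__ == "__main__":
    # Names taken from the package description above.
    print(check_install("auraone-evalkit", "auraone_evalkit", "evalkit"))
```

If `importable` is true but `cli_on_path` is false, the environment's scripts directory is probably missing from PATH.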
Five-Minute Quickstart
Validate the synthetic tutorial rubric:
evalkit validate-rubric examples/tutorial/rubric.jsonl
Lint the same rubric:
evalkit lint-rubric examples/tutorial/rubric.jsonl
Score the synthetic tutorial model outputs. If --labels is omitted, EvalKit looks for labels.jsonl next to the responses file.
evalkit score \
  --rubric examples/tutorial/rubric.jsonl \
  --responses examples/tutorial/model_outputs.jsonl \
  --out /tmp/evalkit-tutorial-scores.json
Expected summary for the bundled tutorial data:
{
  "average_score": 0.645833,
  "pass_rate": 0.666667,
  "scored_outputs": 3
}
The full deterministic expected output is stored in examples/tutorial/expected_scores.json.
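The --labels fallback described above amounts to a sibling-file lookup. The following is a minimal sketch of that rule, not EvalKit's actual code; `resolve_labels_path` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def resolve_labels_path(responses: str, labels: Optional[str] = None) -> Path:
    """Return the explicit labels path, or labels.jsonl beside the responses file."""
    if labels is not None:
        return Path(labels)
    # Same directory as the responses file, fixed file name.
    return Path(responses).with_name("labels.jsonl")


if __name__ == "__main__":
    print(resolve_labels_path("examples/tutorial/model_outputs.jsonl"))
```

For the tutorial command, this resolves to examples/tutorial/labels.jsonl, the same file that the explicit --labels example passes.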
Commands
evalkit validate-rubric
Validates JSONL or JSON-array rubric files against the AuraOne EvalKit rubric contract.
evalkit validate-rubric examples/tutorial/rubric.jsonl --format json
Validation errors include row number, field, message, and a suggested fix.
evalkit lint-rubric
Runs rubric quality checks that catch common authoring problems before scoring.
evalkit lint-rubric examples/tutorial/rubric.jsonl --format json
The v0.1 linter includes rules for compound criteria, vague wording, missing examples, missing weight, duplicate IDs, duplicate text, inconsistent severity, unscorable language, unavailable context, unclear scoring boundaries, and weight totals.
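To give a feel for what deterministic lint rules of this kind look like, here is a rough sketch of three of them: duplicate IDs, duplicate text, and missing weight. It is illustrative only. The rule names, the finding shape, and the text normalization are assumptions, and the real linter may differ on all three.

```python
from collections import Counter


def _normalize_text(text) -> str:
    """Lowercase and collapse whitespace so near-identical criteria compare equal."""
    return " ".join(str(text).lower().split())


def lint_rows(rows):
    """Return findings for duplicate IDs, duplicate criterion text, and missing weight."""
    id_counts = Counter(row.get("criterion_id") for row in rows)
    text_counts = Counter(_normalize_text(row.get("criterion", "")) for row in rows)
    findings = []
    for index, row in enumerate(rows, start=1):
        if id_counts[row.get("criterion_id")] > 1:
            findings.append({"row": index, "rule": "duplicate-id"})  # hypothetical rule name
        if text_counts[_normalize_text(row.get("criterion", ""))] > 1:
            findings.append({"row": index, "rule": "duplicate-text"})  # hypothetical rule name
        if row.get("weight") is None:
            findings.append({"row": index, "rule": "missing-weight"})  # hypothetical rule name
    return findings


if __name__ == "__main__":
    sample = [
        {"criterion_id": "c1", "criterion": "Cites a source.", "weight": 1.0},
        {"criterion_id": "c1", "criterion": "States the units.", "weight": 2.0},
        {"criterion_id": "c2", "criterion": "cites  a SOURCE."},
    ]
    for finding in lint_rows(sample):
        print(finding)
```

Because each rule is a pure function of the rubric rows, the same file always yields the same findings.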
evalkit score
Aggregates per-criterion labels into deterministic weighted scores.
evalkit score \
  --rubric examples/tutorial/rubric.jsonl \
  --responses examples/tutorial/model_outputs.jsonl \
  --labels examples/tutorial/labels.jsonl \
  --format json \
  --out /tmp/evalkit-tutorial-scores.json
Supported output formats are json, jsonl, csv, and report-json.
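The summary fields shown in the quickstart (average_score, pass_rate, scored_outputs) can be thought of as a simple reduction over per-output scores. The sketch below is an assumption about that reduction, not EvalKit's implementation: the pass threshold is a caller-supplied parameter because this document does not state one, and the six-decimal rounding is inferred from the quickstart output. The input scores are made up.

```python
def summarize(scores, pass_threshold):
    """Reduce per-output scores to an average, a pass rate, and a count."""
    if not scores:
        raise ValueError("at least one scored output is required")
    count = len(scores)
    passed = sum(1 for score in scores if score >= pass_threshold)
    return {
        "average_score": round(sum(scores) / count, 6),
        "pass_rate": round(passed / count, 6),
        "scored_outputs": count,
    }


if __name__ == "__main__":
    # Hypothetical scores and threshold, not the tutorial's values.
    print(summarize([1.0, 0.5, 0.25], pass_threshold=0.5))
```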
Data Contracts
Rubric rows are JSON objects with required fields:
- criterion_id
- domain
- task_type
- criterion
- weight
- severity
- scoring_type
- examples
- edge_cases
- disagreement_risk
See docs/schema/rubric-schema.md for the full schema and examples.
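A required-field check over a JSONL rubric can be sketched in a few lines. This is a simplified illustration and not the validator shipped with EvalKit. The field names are read from the list above, and the error keys (row, field, message, suggested_fix) are hypothetical names modeled on the error contents described under validate-rubric. It checks presence only, not types or allowed values.

```python
import json

# Field names as read from the required-fields list in this README.
REQUIRED_FIELDS = (
    "criterion_id", "domain", "task_type", "criterion", "weight",
    "severity", "scoring_type", "examples", "edge_cases", "disagreement_risk",
)


def check_required_fields(jsonl_text: str):
    """Return one error dict per malformed row or missing required field."""
    errors = []
    for row_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append({
                "row": row_number, "field": None,
                "message": f"invalid JSON: {exc.msg}",
                "suggested_fix": "Write exactly one JSON object per line.",
            })
            continue
        if not isinstance(row, dict):
            errors.append({
                "row": row_number, "field": None,
                "message": "row is not a JSON object",
                "suggested_fix": "Wrap the row's fields in a JSON object.",
            })
            continue
        for field in REQUIRED_FIELDS:
            if field not in row:
                errors.append({
                    "row": row_number, "field": field,
                    "message": f"missing required field '{field}'",
                    "suggested_fix": f"Add '{field}' to this row.",
                })
    return errors
```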
Scoring labels use:
- output_id
- criterion_id
- score
- optional applicable
- optional rationale
Scores are normalized by scoring type, multiplied by criterion weight, and divided by the applicable rubric weight. Missing labels are reported in every output record. In --strict mode, missing labels fail the command.
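The aggregation rule can be made concrete with a small worked example. Everything below is a sketch under stated assumptions rather than EvalKit's scorer: the scoring type names binary and scale_1_5 and their normalizations are hypothetical (the real ones are defined in docs/schema/rubric-schema.md), and a criterion with a missing label is assumed to be left out of both numerator and denominator.

```python
def normalize(score, scoring_type):
    """Map a raw label score onto [0, 1]. Type names here are hypothetical."""
    if scoring_type == "binary":
        return float(score)
    if scoring_type == "scale_1_5":
        return (score - 1) / 4
    raise ValueError(f"unknown scoring type: {scoring_type}")


def score_output(rubric, labels, strict=False):
    """Weighted score for one output, divided by the applicable rubric weight."""
    by_criterion = {label["criterion_id"]: label for label in labels}
    weighted_sum = 0.0
    applicable_weight = 0.0
    missing = []
    for criterion in rubric:
        label = by_criterion.get(criterion["criterion_id"])
        if label is None:
            missing.append(criterion["criterion_id"])  # assumption: excluded from the score
            continue
        if not label.get("applicable", True):
            continue  # non-applicable criteria do not count toward the denominator
        weighted_sum += normalize(label["score"], criterion["scoring_type"]) * criterion["weight"]
        applicable_weight += criterion["weight"]
    if strict and missing:
        raise ValueError(f"missing labels for: {', '.join(missing)}")
    score = weighted_sum / applicable_weight if applicable_weight else 0.0
    return {"score": round(score, 6), "missing_labels": missing}


if __name__ == "__main__":
    rubric = [
        {"criterion_id": "c1", "weight": 2.0, "scoring_type": "binary"},
        {"criterion_id": "c2", "weight": 1.0, "scoring_type": "scale_1_5"},
        {"criterion_id": "c3", "weight": 1.0, "scoring_type": "binary"},
    ]
    labels = [
        {"output_id": "o1", "criterion_id": "c1", "score": 1},
        {"output_id": "o1", "criterion_id": "c2", "score": 4},
        {"output_id": "o1", "criterion_id": "c3", "score": 0, "applicable": False},
    ]
    # (1.0 * 2 + 0.75 * 1) / (2 + 1) = 0.916667; c3 is not applicable.
    print(score_output(rubric, labels))
```

Dividing by the applicable weight rather than the total weight means a non-applicable criterion neither rewards nor penalizes an output.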
Documentation
- docs/architecture/two-package-architecture.md
- docs/schema/rubric-schema.md
- Repository roadmap context: ../../opensource.md
- Public AuraOne open resources: https://auraone.ai/open
Limitations
- v0.1 ships local tooling and synthetic tutorial fixtures only.
- The tutorial data is not a benchmark and should not be used to compare vendors or publish model claims.
- The linter is a deterministic authoring aid, not a replacement for domain review.
- The scorer aggregates labels supplied by the user. It does not generate labels, call LLM judges, or contact AuraOne hosted services.
Development
Run focused checks from opensource/evalkit:
python -m pytest tests/test_package_imports.py tests/schema/test_rubric_schema.py tests/scoring/test_score_cli.py tests/linting/test_rules.py tests/examples/test_tutorial_dataset.py
python -m pip wheel . --no-deps -w /tmp/evalkit-wheel
File details
Details for the file auraone_evalkit-0.1.1.tar.gz.
File metadata
- Download URL: auraone_evalkit-0.1.1.tar.gz
- Upload date:
- Size: 47.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35709bcd2bf89148667f8bef463ea151db45bfa5e4ed96e88d00e483bda78234 |
| MD5 | c6e02fc7f976019ce5782c7c59aac733 |
| BLAKE2b-256 | 138bf9f6e872629b4cd540d53ac5137cc0014b436db6a4545355e2224a06e9d2 |
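A downloaded file can be checked against the published SHA256 digest with the standard library. This is a generic sketch, not an EvalKit feature; `sha256_of` and `matches_digest` are hypothetical helper names, and the local path in the usage note is assumed to be where the file was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("auraone_evalkit-0.1.1.tar.gz", "35709bcd2bf89148667f8bef463ea151db45bfa5e4ed96e88d00e483bda78234")` should return True for an intact copy of the source distribution. The same check applies to the wheel with its own digest.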
Provenance
The following attestation bundles were made for auraone_evalkit-0.1.1.tar.gz:
Publisher: release-python.yml on auraoneai/open
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: auraone_evalkit-0.1.1.tar.gz
  - Subject digest: 35709bcd2bf89148667f8bef463ea151db45bfa5e4ed96e88d00e483bda78234
- Sigstore transparency entry: 1509633231
- Sigstore integration time:
- Permalink: auraoneai/open@1e5cd8f5204d10f15060e484398fc07b413fb4e1
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/auraoneai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-python.yml@1e5cd8f5204d10f15060e484398fc07b413fb4e1
- Trigger Event: push
File details
Details for the file auraone_evalkit-0.1.1-py3-none-any.whl.
File metadata
- Download URL: auraone_evalkit-0.1.1-py3-none-any.whl
- Upload date:
- Size: 62.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5138bcf6aacc12d52dd182ccfbfc7a7596bd8f75b1b8dcd20e1818e749ca78aa |
| MD5 | 431f177ae2c2d6df5f93d90f98ce1ab4 |
| BLAKE2b-256 | 3b6499c5122f36267eb1f5016811d201582ecaac41638bfa2fb335efd3f2e4eb |
Provenance
The following attestation bundles were made for auraone_evalkit-0.1.1-py3-none-any.whl:
Publisher: release-python.yml on auraoneai/open
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: auraone_evalkit-0.1.1-py3-none-any.whl
  - Subject digest: 5138bcf6aacc12d52dd182ccfbfc7a7596bd8f75b1b8dcd20e1818e749ca78aa
- Sigstore transparency entry: 1509633444
- Sigstore integration time:
- Permalink: auraoneai/open@1e5cd8f5204d10f15060e484398fc07b413fb4e1
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/auraoneai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-python.yml@1e5cd8f5204d10f15060e484398fc07b413fb4e1
- Trigger Event: push