Defensive verifier framework and helpers for Harbor evaluations

These details have not been verified by PyPI

Project description

graded 🍳

graded is a defensive verifier and grading framework for agent evaluations, particularly Harbor agent evaluations. It lets you declare structured grading criteria, run LLM judges with automatic tracing, and safely manage evaluation artifacts.

Installation

Install graded directly from PyPI (or your internal registry):

```shell
pip install graded
```

Or with uv:

```shell
uv pip install graded
```

Quick Start

Create an evaluation script (e.g. verify.py) to grade a task workspace:

```python
from pathlib import Path
from graded import Evaluator

# Initialize the evaluator
ev = Evaluator(
    workspace="/workspace",
    output_path="/logs/verifier/reward.json",
    auto_save_artifacts=True,
)

# 1. Declare a standard criterion
@ev.criterion(name="has_output_file", weight=1.0)
def check_output(workspace: Path) -> bool:
    return (workspace / "output.txt").is_file()

# 2. Declare a fatal criterion (short-circuits score to 0.0 if failed)
@ev.criterion(name="no_syntax_errors", weight=2.0, fatal=True)
def check_syntax(workspace: Path) -> bool:
    # Placeholder: return True or False (or a float between 0.0 and 1.0)
    return True

# 3. Declare a fractional scoring criterion
@ev.criterion(name="test_pass_rate", weight=3.0)
def check_tests(workspace: Path) -> float:
    # Returns a score between 0.0 and 1.0
    return 0.8  # e.g., 80% of tests passed

# Run the evaluation and write outputs
if __name__ == "__main__":
    ev.run()
```
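
The `check_syntax` criterion above is a placeholder that always returns `True`. As a minimal sketch of what a real check body might look like, the function below parses every `.py` file under the workspace with the standard library's `ast` module. The name `all_python_files_parse` is hypothetical and not part of graded; it only shows one plain function with the `workspace: Path -> bool` shape that the decorator expects.

```python
import ast
from pathlib import Path


def all_python_files_parse(workspace: Path) -> bool:
    """Return True only if every .py file under workspace parses cleanly."""
    for source_file in sorted(workspace.rglob("*.py")):
        try:
            source = source_file.read_text(encoding="utf-8")
            ast.parse(source, filename=str(source_file))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            # Unparseable or undecodable source counts as a failure.
            return False
    return True
```

A workspace with no Python files passes trivially, so a stricter check may want to require at least one file.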

Core Features

1. Criteria Declarations (`@ev.criterion`)

Define check functions with the `@ev.criterion` decorator.

- `name`: The unique identifier for the criterion.
- `weight`: The criterion's relative weight in the final weighted-average score.
- `fatal`: If `True`, a result of `False` or `0.0` from this criterion immediately short-circuits the final score to `0.0`.
- Return value: The function must return a `bool`, `int`, or `float`. Anything else raises a `ValueError`.
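
The scoring rules above can be sketched in plain Python. This is not graded's implementation: `CriterionResult`, `to_score`, and `weighted_reward` are hypothetical names, and dividing by the sum of the weights is an assumption about how the weighted average is normalized.

```python
from dataclasses import dataclass
from typing import Union

Score = Union[bool, int, float]


@dataclass
class CriterionResult:
    name: str
    weight: float
    raw: Score  # value returned by the check function
    fatal: bool = False


def to_score(raw: object) -> float:
    """Coerce a criterion return value to a float, rejecting other types."""
    if isinstance(raw, (bool, int, float)):
        return float(raw)
    raise ValueError(f"criterion returned unsupported type: {type(raw).__name__}")


def weighted_reward(results: list[CriterionResult]) -> float:
    """Weighted average of scores; a failed fatal criterion forces 0.0."""
    scored = [(result, to_score(result.raw)) for result in results]
    if any(result.fatal and score == 0.0 for result, score in scored):
        return 0.0
    total_weight = sum(result.weight for result, _ in scored)
    if total_weight == 0:
        return 0.0
    return sum(result.weight * score for result, score in scored) / total_weight
```

Under these assumptions, a fatal criterion that passes contributes to the average like any other criterion; only a failing one overrides the result.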

2. LLM Judge with Automatic Tracing

graded integrates with instructor to run structured, schema-validated LLM grading prompts. Each call's prompt, parameters, response schema, and LLM response are logged automatically to `traces.json`.

```python
from pydantic import BaseModel, Field

class Rubric(BaseModel):
    score: float = Field(description="Score between 0.0 and 1.0 based on correctness.")
    reasoning: str = Field(description="Detailed reasoning for the score.")

# In your criterion (`ev` is the Evaluator instance from the Quick Start):
result = ev.llm_judge(
    model="google/gemini-3.5-flash",
    response_model=Rubric,
    system="You are a strict code correctness evaluator.",
    prompt="Compare the student's solution in code.py with the requirements...",
)

print(f"LLM Score: {result.score}")
print(f"Reasoning: {result.reasoning}")
```
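
To illustrate what automatic tracing means in practice, here is a minimal offline sketch of a wrapper that records each judge call and appends it to a JSON list on disk. It is not how graded or instructor implement tracing: `traced_call` and `fake_judge` are hypothetical names, the record fields are assumptions, and the stub judge stands in for a real LLM call so the example runs without network access.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable


def traced_call(
    judge: Callable[..., dict[str, Any]],
    trace_path: Path,
    **params: Any,
) -> dict[str, Any]:
    """Call judge(**params) and append a record of the call to trace_path."""
    started = time.perf_counter()
    response = judge(**params)
    record = {
        "params": params,  # must be JSON-serializable in this sketch
        "response": response,
        "latency_s": round(time.perf_counter() - started, 6),
    }
    if trace_path.exists():
        traces = json.loads(trace_path.read_text(encoding="utf-8"))
    else:
        traces = []
    traces.append(record)
    trace_path.write_text(json.dumps(traces, indent=2), encoding="utf-8")
    return response


def fake_judge(prompt: str, system: str = "") -> dict[str, Any]:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return {"score": 1.0, "reasoning": f"stub verdict for: {prompt}"}
```

Calling `traced_call(fake_judge, Path("traces.json"), prompt="...")` twice leaves a two-element list in the file. A real tracer would also have to serialize the response schema rather than pass a model class through `json.dumps`.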

3. File & Artifact Management

Safely access files and copy evaluation artifacts to the logs directory for post-evaluation review.

- `ev.read_file(filename)`: Safely reads content as a string. Auto-saves a copy to artifacts.
- `ev.load_json(filename)`: Safely parses JSON file content. Auto-saves a copy to artifacts.
- `ev.save_file(filename, content)`: Saves arbitrary text/data to the artifacts directory.
- `ev.save_dir(dirname)`: Copies an entire directory from the workspace to the artifacts directory.
- `ev.load_trajectory(path)`: Loads and parses an agent's ATIF `trajectory.json` file into a typed `Trajectory` object.
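
The README does not spell out what "safely" covers. One plausible reading, sketched below with the standard library only, is to confine reads to the workspace, tolerate undecodable bytes, and copy the file into an artifacts directory. `read_and_archive` is a hypothetical helper, not graded's API, and the path-confinement rule is an assumption. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import shutil
from pathlib import Path


def read_and_archive(workspace: Path, artifacts: Path, filename: str) -> str:
    """Read workspace/filename as text and copy it into artifacts/."""
    root = workspace.resolve()
    target = (root / filename).resolve()
    # Refuse paths that escape the workspace, e.g. "../secrets.txt".
    if not target.is_relative_to(root):
        raise ValueError(f"{filename!r} is outside the workspace")
    # Replace undecodable bytes instead of raising on malformed output.
    content = target.read_text(encoding="utf-8", errors="replace")
    destination = artifacts / target.relative_to(root)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(target, destination)
    return content
```

Resolving both paths before comparing them means symlinks that point outside the workspace are rejected as well.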

Outputs

When `ev.run()` completes, the following files are written to the directory containing your configured `output_path`:

- `reward.json`: A flat JSON dictionary containing the final calculated reward and the individual scores for each criterion:
```
{
  "reward": 0.75,
  "has_output_file": 1.0,
  "no_syntax_errors": 1.0,
  "test_pass_rate": 0.8
}
```
- `reward.txt`: A text file containing just the final reward as a float (e.g. `0.7500\n`).
- `traces.json`: A list of structured records for the LLM calls made via `ev.llm_judge`, detailing inputs, responses, latencies, and metadata.
- `metadata.json`: (Optional) Contains evaluator-level and run-level metadata.
- `artifacts/`: Subfolder containing the files copied back from the workspace during the evaluation run.
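
Because `reward.json` is flat, a consumer has to separate the `reward` key from the per-criterion scores itself. A minimal sketch, assuming the layout shown above (`load_reward` is a hypothetical helper, not part of graded):

```python
import json
from pathlib import Path


def load_reward(output_dir: Path) -> tuple[float, dict[str, float]]:
    """Split reward.json into the final reward and per-criterion scores."""
    data = json.loads((output_dir / "reward.json").read_text(encoding="utf-8"))
    reward = float(data.pop("reward"))
    scores = {name: float(score) for name, score in data.items()}
    return reward, scores
```

One consequence of the flat layout is that a criterion named `reward` would collide with the final score, so that name is probably best avoided.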

Project details

Release history

- 1.0.5: Jun 12, 2026
- 1.0.4: Jun 11, 2026
- 1.0.3: Jun 11, 2026 (this version)
- 1.0.2: Jun 11, 2026
- 1.0.1: Jun 11, 2026
- 1.0.0: Jun 11, 2026

Download files

Source Distribution

graded-1.0.3.tar.gz (11.7 kB)

Uploaded Jun 11, 2026 Source

Built Distribution

graded-1.0.3-py3-none-any.whl (8.3 kB)

Uploaded Jun 11, 2026 Python 3

File details

Details for the file graded-1.0.3.tar.gz.

File metadata

Download URL: graded-1.0.3.tar.gz
Upload date: Jun 11, 2026
Size: 11.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for graded-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`a8a3a70f95563b8d64139346708928c00ab9118a3db9d11f0571d83b020b998e`
MD5	`3f6e62c55123f6353630be3007792541`
BLAKE2b-256	`a8e6a43163cd816448e755a0e77cebffc4d42f6c44a5e62c04d1c3129379a6c0`
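
A downloaded file can be checked against the SHA256 digest listed above using Python's `hashlib`. This is a minimal sketch: the helper names are hypothetical, and it assumes the archive has already been downloaded to a local path.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for graded-1.0.3.tar.gz.
PUBLISHED_SHA256 = "a8a3a70f95563b8d64139346708928c00ab9118a3db9d11f0571d83b020b998e"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected: str = PUBLISHED_SHA256) -> bool:
    """Compare the file's SHA256 with the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `matches_digest(Path("graded-1.0.3.tar.gz"))` should return `True` only for a byte-identical copy of the published source distribution.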

Provenance

The following attestation bundles were made for graded-1.0.3.tar.gz:

Publisher: ci.yml on ivanleomk/graded

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: graded-1.0.3.tar.gz
- Subject digest: a8a3a70f95563b8d64139346708928c00ab9118a3db9d11f0571d83b020b998e
- Sigstore transparency entry: 1787219621
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: ivanleomk/graded@7a6dfbf506e42f7d85639482d3bfa00a857fd611
- Branch / Tag: refs/tags/v1.0.3
- Owner: https://github.com/ivanleomk
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@7a6dfbf506e42f7d85639482d3bfa00a857fd611
- Trigger Event: push

File details

Details for the file graded-1.0.3-py3-none-any.whl.

File metadata

Download URL: graded-1.0.3-py3-none-any.whl
Upload date: Jun 11, 2026
Size: 8.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for graded-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7d26c53434e3930031ce9a05c51215bdfcaaa0ed5a1e432c63c6f2192ef6b784`
MD5	`37a20eb9ecceab2dfad8af7ff37c0139`
BLAKE2b-256	`016b82f07dbf1e4b976135138623c9fb46519b49d00476d2953cdb0eb6c366e7`

Provenance

The following attestation bundles were made for graded-1.0.3-py3-none-any.whl:

Publisher: ci.yml on ivanleomk/graded

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: graded-1.0.3-py3-none-any.whl
- Subject digest: 7d26c53434e3930031ce9a05c51215bdfcaaa0ed5a1e432c63c6f2192ef6b784
- Sigstore transparency entry: 1787219733
- Sigstore integration time: Jun 11, 2026
Source repository:
- Permalink: ivanleomk/graded@7a6dfbf506e42f7d85639482d3bfa00a857fd611
- Branch / Tag: refs/tags/v1.0.3
- Owner: https://github.com/ivanleomk
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@7a6dfbf506e42f7d85639482d3bfa00a857fd611
- Trigger Event: push
