Skip to main content

rubric

Project description

PyPI version Python versions License

Rubric

A Python library for LLM-based evaluation using weighted rubrics.

Installation

uv add rubric

Usage

  1. Set up environment variables:
export OPENAI_API_KEY=your_api_key_here
  1. Run the example below
import asyncio
import os
from openai import AsyncOpenAI
from rubric import Rubric
from rubric.autograders import PerCriterionGrader

async def generate_with_async_openai(system_prompt: str, user_prompt: str) -> str:
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=400,
        temperature=0.0,
    )
    return response.choices[0].message.content or ""

async def main():
    rubric = Rubric.from_dict([
        {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
        {"weight": 8.0, "requirement": "Explicitly uses Shapley attribution for decomposition"},
        {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"}
    ])

    grader = PerCriterionGrader(
        generate_fn=generate_with_async_openai,
        system_prompt="This overrides the default system prompt",
    )

    result = await rubric.grade(
        to_grade="Your text to evaluate...",
        autograder=grader
    )

    print(f"Score: {result.score}/100")
    for criterion in result.report:
        print(f"  {criterion.verdict}: {criterion.requirement}")

asyncio.run(main())

Autograder Strategies

  • PerCriterionGrader - Evaluates each criterion in parallel LLM calls
  • PerCriterionOneShotGrader - Evaluates all criteria in a single LLM call
  • RubricAsJudgeGrader - Holistic evaluation, LLM returns final score directly

Default System Prompts

Each autograder uses a specialized system prompt optimized for its evaluation approach:

PerCriterionGrader - Detailed criterion-by-criterion evaluation with strict JSON formatting requirements. The prompt instructs the LLM to evaluate each criterion independently, handling both positive and negative criteria with specific response formats.

PerCriterionOneShotGrader - Streamlined prompt for evaluating all criteria in a single response. Focuses on providing verdicts (MET/UNMET) and explanations for each criterion in a structured JSON format.

RubricAsJudgeGrader - Holistic evaluation prompt that asks the LLM to consider the output as a whole and provide a single overall score from 0-100, taking into account the weights of all criteria.

You can view the complete default prompts in the source files:

Customizing System Prompts: You can override the default system prompt by passing a system_prompt parameter to any autograder:

grader = PerCriterionGrader(
    generate_fn=your_function,
    system_prompt="Your custom system prompt here"
)

Customization

You can customize grading at multiple levels:

1. Custom generate_fn (most common) Pass any function that takes (system_prompt, user_prompt) and returns a string. Use any LLM provider (OpenAI, Anthropic, local models, etc.):

grader = PerCriterionGrader(generate_fn=your_custom_function)

2. Override specific methods Subclass any autograder and override:

  • judge() - Orchestrates LLM calls to evaluate criteria and parse responses into structured results
  • generate() - Wraps your generate_fn to customize how prompts are sent to the LLM
  • aggregation() - Transforms individual criterion results into a final score and optional report

3. Full control Override the entire grade() method for complete end-to-end control over the grading process.

Loading Rubrics

# Direct construction
rubric = Rubric([
    Criterion(weight=10.0, requirement="States Q4 2023 base margin as 17.2%"),
    Criterion(weight=8.0, requirement="Explicitly uses Shapley attribution for decomposition"),
    Criterion(weight=-15.0, requirement="Uses total deliveries instead of cash-only deliveries")
])

# From list of dictionaries
rubric = Rubric.from_dict([
    {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
    {"weight": 8.0, "requirement": "Explicitly uses Shapley attribution for decomposition"},
    {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"}
])

# From JSON string
rubric = Rubric.from_json('[{"weight": 10.0, "requirement": "Example requirement"}]')

# From YAML string
yaml_data = '''
- weight: 10.0
  requirement: "Example requirement"
'''
rubric = Rubric.from_yaml(yaml_data)

# From files
rubric = Rubric.from_file('rubric.json')
rubric = Rubric.from_file('rubric.yaml')

JSON Format

[
  {
    "weight": 10.0,
    "requirement": "States Q4 2023 base margin as 17.2%"
  },
  {
    "weight": 8.0,
    "requirement": "Explicitly uses Shapley attribution for decomposition"
  },
  {
    "weight": -15.0,
    "requirement": "Uses total deliveries instead of cash-only deliveries"
  }
]

YAML Format

- weight: 10.0
  requirement: "States Q4 2023 base margin as 17.2%"
- weight: 8.0
  requirement: "Explicitly uses Shapley attribution for decomposition"
- weight: -15.0
  requirement: "Uses total deliveries instead of cash-only deliveries"

Requirements

  • Python 3.11+
  • An LLM API (e.g., OpenAI, Anthropic, OpenRouter) - set appropriate API keys as environment variables

License

MIT License - see LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rubric-1.1.8.tar.gz (8.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rubric-1.1.8-py3-none-any.whl (13.1 kB view details)

Uploaded Python 3

File details

Details for the file rubric-1.1.8.tar.gz.

File metadata

  • Download URL: rubric-1.1.8.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.5

File hashes

Hashes for rubric-1.1.8.tar.gz
Algorithm Hash digest
SHA256 c0f020828fde0d93332a810a9190f8ed0a46730a4facd9bb6bd97bf5549c4186
MD5 710953fc5fab8889b8d8106181b0e279
BLAKE2b-256 a8b0c4d3e2850c32b77f2842180021b621c372351e52294481f4071eea87ec23

See more details on using hashes here.

File details

Details for the file rubric-1.1.8-py3-none-any.whl.

File metadata

  • Download URL: rubric-1.1.8-py3-none-any.whl
  • Upload date:
  • Size: 13.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.5

File hashes

Hashes for rubric-1.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 7f093cb9892cb804bb25c7cf416f6fb20b42d3bcc381747d8547fcc932c80f2f
MD5 8da1f26664e2d3e038f4b17d7bcac756
BLAKE2b-256 d7e8b646342c585e84676953af0526e07a83a3b2840f67c63b450c5dddfa21bc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page