rubric

Rubrics

A Python library for LLM-based evaluation using weighted rubrics.

Installation

uv add rubric

Usage

import asyncio
import os
from openai import AsyncOpenAI
from rubric import Rubric
from autograders.per_criterion_grader import PerCriterionGrader

async def generate_with_async_openai(system_prompt: str, user_prompt: str) -> str:
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=400,
        temperature=0.0,
    )
    return response.choices[0].message.content or ""

async def main():
    rubric = Rubric.from_dict([
        {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
        {"weight": 8.0, "requirement": "Explicitly uses Shapley attribution for decomposition"},
        {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"}
    ])

    grader = PerCriterionGrader(generate_fn=generate_with_async_openai)
    
    result = await rubric.grade(
        to_grade="Your text to evaluate...",
        autograder=grader
    )

    print(f"Score: {result.score}/100")
    for criterion in result.report:
        print(f"  {criterion.verdict}: {criterion.requirement}")

asyncio.run(main())

Autograder Strategies

  • PerCriterionGrader - Evaluates each criterion in parallel LLM calls
  • PerCriterionOneShotGrader - Evaluates all criteria in a single LLM call
  • RubricAsJudgeGrader - Holistic evaluation; the LLM returns the final score directly
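To make the "parallel LLM calls" idea concrete, here is a minimal sketch of how a per-criterion strategy could fan out one call per requirement with asyncio.gather. It does not use the library's internals: fake_generate, judge_per_criterion, the prompt layout, and the MET/UNMET labels are all hypothetical stand-ins chosen for illustration.

```python
import asyncio

# Hypothetical stand-in for an LLM call: answers "MET" when the requirement's
# key phrase appears in the text. A real generate_fn would call a model.
async def fake_generate(system_prompt: str, user_prompt: str) -> str:
    requirement, _, text = user_prompt.partition("\n---\n")
    return "MET" if requirement.lower() in text.lower() else "UNMET"

async def judge_per_criterion(requirements: list[str], text: str) -> list[str]:
    # Build one call per criterion, then await them all concurrently.
    calls = [
        fake_generate("You are a grader.", f"{req}\n---\n{text}")
        for req in requirements
    ]
    return await asyncio.gather(*calls)

verdicts = asyncio.run(
    judge_per_criterion(["17.2%", "Shapley"], "Base margin was 17.2%.")
)
print(verdicts)  # ['MET', 'UNMET']
```

A one-shot strategy would instead place every requirement in a single prompt, trading per-criterion isolation for fewer calls.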

Customization

You can customize grading at multiple levels:

1. Custom generate_fn (most common): pass any function that takes (system_prompt, user_prompt) and returns a string. This works with any LLM provider (OpenAI, Anthropic, local models, etc.):

grader = PerCriterionGrader(generate_fn=your_custom_function)
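Because a generate_fn is just a callable with this two-argument shape, it can be wrapped before being handed to a grader. The sketch below adds simple retries around any such function. It assumes the function is async, as in the Usage example; with_retries and flaky are hypothetical names, not part of the library.

```python
import asyncio
from typing import Awaitable, Callable

GenerateFn = Callable[[str, str], Awaitable[str]]

def with_retries(generate_fn: GenerateFn, attempts: int = 3, delay: float = 0.0) -> GenerateFn:
    """Wrap a generate_fn so that failed calls are retried."""
    async def wrapped(system_prompt: str, user_prompt: str) -> str:
        last_error: Exception | None = None
        for _ in range(attempts):
            try:
                return await generate_fn(system_prompt, user_prompt)
            except Exception as exc:  # narrow this to your provider's error types
                last_error = exc
                await asyncio.sleep(delay)
        raise RuntimeError(f"generate_fn failed after {attempts} attempts") from last_error
    return wrapped

# Hypothetical flaky provider, used only to exercise the wrapper.
calls = {"n": 0}

async def flaky(system_prompt: str, user_prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "MET"

result = asyncio.run(with_retries(flaky)("system", "user"))
print(result, calls["n"])  # MET 3
```

The wrapped function keeps the same signature, so it could be passed wherever a generate_fn is expected.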

2. Override specific methods: subclass any autograder and override:

  • judge() - How rubric criteria are evaluated against the text
  • generate() - How prompts are constructed and the LLM is called; typically calls the generate_fn
  • aggregation() - How individual criterion scores are combined

3. Full control: override the entire grade() method for complete end-to-end control over the grading process.
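As an illustration of what a custom aggregation step might compute, the sketch below combines weighted verdicts into a 0-100 score. The formula (earned weight divided by the sum of positive weights, clamped to the range) is an assumption for this example and not necessarily what the library does; CriterionResult and aggregate are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class CriterionResult:  # hypothetical shape, not the library's report type
    weight: float
    met: bool

def aggregate(results: list[CriterionResult]) -> float:
    """Weighted score on a 0-100 scale (assumed formula, see lead-in)."""
    max_positive = sum(r.weight for r in results if r.weight > 0)
    if max_positive == 0:
        return 0.0
    # Met criteria contribute their weight; negative weights act as penalties.
    earned = sum(r.weight for r in results if r.met)
    return max(0.0, min(100.0, 100.0 * earned / max_positive))

results = [
    CriterionResult(weight=10.0, met=True),
    CriterionResult(weight=8.0, met=False),
    CriterionResult(weight=-15.0, met=False),
]
print(round(aggregate(results), 1))  # 55.6
```

Under this scheme a triggered negative-weight criterion subtracts from the earned total, and the clamp keeps the final score from dropping below zero.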

Loading Rubrics

rubric = Rubric.from_dict([...])
rubric = Rubric.from_json('{"criteria": [...]}')
rubric = Rubric.from_yaml('...')
rubric = Rubric.from_file('rubric.yaml')
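For reference, a rubric document could be assembled and sanity-checked before loading, as in the sketch below. It assumes a top-level "criteria" key (as the from_json call above suggests) and that each entry carries the same weight and requirement fields used with from_dict; the validate helper is hypothetical and not part of the library.

```python
import json

criteria = [
    {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
    {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"},
]

# Assumed layout: a top-level "criteria" key wrapping the list of criteria.
rubric_json = json.dumps({"criteria": criteria}, indent=2)

def validate(doc: str) -> list[dict]:
    """Check that each criterion has a numeric weight and a non-empty requirement."""
    items = json.loads(doc)["criteria"]
    for item in items:
        if not isinstance(item.get("weight"), (int, float)):
            raise ValueError(f"missing or non-numeric weight: {item}")
        if not item.get("requirement"):
            raise ValueError(f"missing requirement: {item}")
    return items

print(len(validate(rubric_json)))  # 2
```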

Requirements

  • Python 3.11+
  • Access to an LLM API (e.g., OpenAI, OpenRouter), with the appropriate API keys set as environment variables

License

MIT License - see LICENSE file for details.
