rubric

Rubrics

A Python library for LLM-based evaluation using weighted rubrics.

Installation

uv add rubric

Usage

import asyncio
import os
from openai import AsyncOpenAI
from rubric import Rubric
from autograders.per_criterion_grader import PerCriterionGrader

async def generate_with_async_openai(system_prompt: str, user_prompt: str) -> str:
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=400,
        temperature=0.0,
    )
    return response.choices[0].message.content or ""

async def main():
    rubric = Rubric.from_dict([
        {"weight": 1.0, "requirement": "Output includes a clear introduction"},
        {"weight": 2.0, "requirement": "Analysis is supported by specific examples"},
        {"weight": -1.0, "requirement": "Contains factual errors"}
    ])

    grader = PerCriterionGrader(generate_fn=generate_with_async_openai)
    
    result = await rubric.grade(
        to_grade="Your text to evaluate...",
        autograder=grader
    )

    print(f"Score: {result.score}/100")
    for criterion in result.report:
        print(f"  {criterion.verdict}: {criterion.requirement}")

asyncio.run(main())

Autograder Strategies

  • PerCriterionGrader - Evaluates each criterion in its own LLM call, with the calls run in parallel
  • PerCriterionOneShotGrader - Evaluates all criteria in a single LLM call
  • RubricAsJudgeGrader - Holistic evaluation; the LLM returns the final score directly

Customization

You can customize grading at multiple levels:

1. Custom generate_fn (most common). Pass any function that takes (system_prompt, user_prompt) and returns a string. This lets you use any LLM provider (OpenAI, Anthropic, local models, etc.):

grader = PerCriterionGrader(generate_fn=your_custom_function)
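
As a minimal sketch of what such a function could look like, the wrapper below adapts a blocking, synchronous model call to the async (system_prompt, user_prompt) -> str shape used in the usage example. The names call_my_model and make_generate_fn are hypothetical and not part of rubric. The sketch also assumes that generate_fn is expected to be awaitable, as the async example above suggests.

```python
import asyncio
from typing import Awaitable, Callable


def call_my_model(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical blocking call; replace with a real provider SDK call."""
    return f"[stub reply to: {user_prompt}]"


def make_generate_fn(
    sync_call: Callable[[str, str], str],
) -> Callable[[str, str], Awaitable[str]]:
    """Wrap a blocking callable so it can be awaited (assumed generate_fn shape)."""

    async def generate(system_prompt: str, user_prompt: str) -> str:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(sync_call, system_prompt, user_prompt)

    return generate


your_custom_function = make_generate_fn(call_my_model)
```

Running the blocking call in a thread keeps parallel per-criterion calls from serializing behind a single synchronous client.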

2. Override specific methods. Subclass any autograder and override:

  • judge() - How rubric criteria are evaluated against the text
  • generate() - How prompts are constructed and the LLM is called; typically calls generate_fn
  • aggregation() - How individual criterion scores are combined
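
To make the aggregation step concrete, here is one plausible way to combine weighted verdicts into a 0-100 score. This is an illustrative assumption, not the library's documented formula: CriterionResult and aggregate are hypothetical names, and the rule (sum of weights for met criteria, divided by the total positive weight, then clamped) is just one reasonable choice when negative weights act as penalties.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """Hypothetical stand-in for one graded criterion."""
    weight: float
    met: bool


def aggregate(results: list[CriterionResult]) -> float:
    """Return a 0-100 score: weights of met criteria over total positive weight."""
    positive_total = sum(r.weight for r in results if r.weight > 0)
    if positive_total == 0:
        return 0.0
    # Met negative-weight criteria subtract from the earned total.
    earned = sum(r.weight for r in results if r.met)
    # Clamp so penalties cannot push the score outside the 0-100 range.
    return max(0.0, min(100.0, 100.0 * earned / positive_total))


results = [
    CriterionResult(weight=1.0, met=True),
    CriterionResult(weight=2.0, met=True),
    CriterionResult(weight=-1.0, met=True),  # penalty criterion was triggered
]
score = aggregate(results)
```

With the three weights from the usage example and every criterion met, the penalty removes one of the three available points.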

3. Full control. Override the entire grade() method for complete end-to-end control over the grading process.

Loading Rubrics

rubric = Rubric.from_dict([...])
rubric = Rubric.from_json('{"criteria": [...]}')
rubric = Rubric.from_yaml('...')
rubric = Rubric.from_file('rubric.yaml')
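
As a small sketch of preparing input for from_json, the snippet below serializes a criteria list into a JSON string with a top-level "criteria" key. It assumes each entry uses the same weight and requirement fields as the from_dict example in the Usage section. The full schema is not spelled out here, so treat the shape as an assumption.

```python
import json

# Same per-criterion fields as the from_dict example (assumed to carry over).
criteria = [
    {"weight": 1.0, "requirement": "Output includes a clear introduction"},
    {"weight": -1.0, "requirement": "Contains factual errors"},
]

# Assumed shape: a top-level "criteria" key, as in the from_json line above.
rubric_json = json.dumps({"criteria": criteria}, indent=2)

# The resulting string could then be passed on, e.g. Rubric.from_json(rubric_json).
```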

Requirements

  • Python 3.13+
  • An LLM API (e.g., OpenAI, OpenRouter) - set appropriate API keys as environment variables

License

MIT License - see LICENSE file for details.
