rubric

Rubrics

A Python library for LLM-based evaluation using weighted rubrics.

Installation

uv add rubric

Usage

import asyncio
import os
from openai import AsyncOpenAI
from rubric import Rubric
from autograders.per_criterion_grader import PerCriterionGrader

async def generate_with_async_openai(system_prompt: str, user_prompt: str) -> str:
    """Send one system/user prompt pair to OpenAI and return the reply text."""
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=400,
        temperature=0.0,
    )
    # message.content can be None, so fall back to an empty string.
    return response.choices[0].message.content or ""

async def main():
    # Each criterion pairs a requirement with a weight; the negative weight
    # here marks something the text should not do.
    rubric = Rubric.from_dict([
        {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
        {"weight": 8.0, "requirement": "Explicitly uses Shapley attribution for decomposition"},
        {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"}
    ])

    # The grader reaches the LLM through generate_fn.
    grader = PerCriterionGrader(generate_fn=generate_with_async_openai)

    result = await rubric.grade(
        to_grade="Your text to evaluate...",
        autograder=grader
    )

    # Print the overall score, then the verdict for each criterion.
    print(f"Score: {result.score}/100")
    for criterion in result.report:
        print(f"  {criterion.verdict}: {criterion.requirement}")

asyncio.run(main())
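
The usage example reports a score out of 100, but this page does not state how the weights are combined. The block below is only a sketch of one plausible scheme, not the library's formula. It assumes that met criteria contribute their weight (so a met negative criterion subtracts), that the total is divided by the sum of positive weights, and that the result is clamped to 0..100. The names Criterion and weighted_score are hypothetical and are not part of the rubric package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Hypothetical stand-in for one rubric entry plus a judge's verdict."""
    weight: float
    requirement: str
    met: bool


def weighted_score(criteria: list[Criterion]) -> float:
    """Assumed formula: earned weight / total positive weight, clamped to 0..100."""
    max_positive = sum(c.weight for c in criteria if c.weight > 0)
    if max_positive == 0:
        return 0.0
    # Met criteria add their weight; a met negative criterion subtracts.
    earned = sum(c.weight for c in criteria if c.met)
    return max(0.0, min(100.0, 100.0 * earned / max_positive))


graded = [
    Criterion(10.0, "States Q4 2023 base margin as 17.2%", met=True),
    Criterion(8.0, "Explicitly uses Shapley attribution for decomposition", met=False),
    Criterion(-15.0, "Uses total deliveries instead of cash-only deliveries", met=False),
]
print(f"Score: {weighted_score(graded):.1f}/100")
```

Under these assumptions the text earns 10 of a possible 18 positive points, which is about 55.6. If the penalty criterion were also met, the raw total would be negative and the clamp would report 0.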

Autograder Strategies

  • PerCriterionGrader - Evaluates each criterion in its own LLM call; the calls run in parallel
  • PerCriterionOneShotGrader - Evaluates all criteria in a single LLM call
  • RubricAsJudgeGrader - Holistic evaluation; the LLM returns the final score directly
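
To make the difference between the first two strategies concrete, here is a rough sketch that counts LLM calls using a stub in place of a real model. It does not reproduce the library's graders: make_stub_generate, grade_per_criterion, grade_one_shot, and the prompt wording are all hypothetical, chosen only to show "one call per criterion, run concurrently" next to "one call for everything".

```python
import asyncio

CallLog = list[tuple[str, str]]


def make_stub_generate(log: CallLog):
    """Build a fake generate_fn that records each call instead of calling an LLM."""
    async def stub_generate(system_prompt: str, user_prompt: str) -> str:
        log.append((system_prompt, user_prompt))
        await asyncio.sleep(0)  # yield control, as a real network call would
        return "MET"
    return stub_generate


async def grade_per_criterion(generate_fn, requirements: list[str], text: str) -> list[str]:
    """One call per criterion, issued concurrently with asyncio.gather."""
    calls = [
        generate_fn(f"Judge this requirement: {req}", text)
        for req in requirements
    ]
    return await asyncio.gather(*calls)


async def grade_one_shot(generate_fn, requirements: list[str], text: str) -> str:
    """A single call whose prompt lists every criterion."""
    numbered = "\n".join(f"{i}. {req}" for i, req in enumerate(requirements, 1))
    return await generate_fn(f"Judge all requirements:\n{numbered}", text)


requirements = [
    "States Q4 2023 base margin as 17.2%",
    "Explicitly uses Shapley attribution for decomposition",
]
text = "Your text to evaluate..."
per_criterion_log: CallLog = []
one_shot_log: CallLog = []

verdicts = asyncio.run(
    grade_per_criterion(make_stub_generate(per_criterion_log), requirements, text)
)
asyncio.run(grade_one_shot(make_stub_generate(one_shot_log), requirements, text))
print(len(per_criterion_log), "calls vs", len(one_shot_log), "call")
```

The trade-off this illustrates is cost versus isolation: separate calls keep each judgment independent but multiply requests, while a single call is cheaper but asks the model to weigh every criterion at once.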

Customization

You can customize grading at multiple levels:

1. Custom generate_fn (most common). Pass any function that takes (system_prompt, user_prompt) and returns a string. This lets you use any LLM provider (OpenAI, Anthropic, local models, etc.):

grader = PerCriterionGrader(generate_fn=your_custom_function)

2. Override specific methods. Subclass any autograder and override:

  • judge() - How rubric criteria are evaluated against the text
  • generate() - How prompts are constructed and the LLM is called; typically calls generate_fn
  • aggregation() - How individual criterion scores are combined

3. Full control. Override the entire grade() method for end-to-end control over the grading process.
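
Because a generate_fn is just a function from (system_prompt, user_prompt) to a string, behaviour such as retries can be layered on without subclassing anything. The sketch below assumes an async generate_fn, as in the usage example. with_retries and FlakyBackend are hypothetical helpers written for this illustration; they are not part of the rubric package, and the fake backend stands in for a real provider so the block runs offline.

```python
import asyncio
from collections.abc import Awaitable, Callable

GenerateFn = Callable[[str, str], Awaitable[str]]


def with_retries(generate_fn: GenerateFn, attempts: int = 3, delay_s: float = 0.0) -> GenerateFn:
    """Return a generate_fn that retries the wrapped one on any exception."""
    async def wrapped(system_prompt: str, user_prompt: str) -> str:
        for attempt in range(1, attempts + 1):
            try:
                return await generate_fn(system_prompt, user_prompt)
            except Exception:
                if attempt == attempts:
                    raise  # out of attempts: surface the last error
                await asyncio.sleep(delay_s)
        raise ValueError("attempts must be at least 1")
    return wrapped


class FlakyBackend:
    """Fake provider that fails a fixed number of times before answering."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    async def generate(self, system_prompt: str, user_prompt: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("transient error")
        return "MET"


backend = FlakyBackend(failures=2)
robust_generate = with_retries(backend.generate, attempts=3)
answer = asyncio.run(robust_generate("system prompt", "user prompt"))
print(answer, "after", backend.calls, "calls")
```

The wrapped function keeps the same two-argument shape, so it could be passed as generate_fn=robust_generate in the same way as any other custom function.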

Loading Rubrics

rubric = Rubric.from_dict([...])
rubric = Rubric.from_json('{"criteria": [...]}')
rubric = Rubric.from_yaml('...')
rubric = Rubric.from_file('rubric.yaml')
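
The from_json line above shows a top-level "criteria" key but not the shape of each entry. As a hedged sketch, the block below builds such a string with the standard json module, assuming each entry uses the same weight and requirement keys that from_dict accepts in the usage example. That schema is an assumption, so the Rubric call is left as a comment and the block runs without the package installed.

```python
import json

criteria = [
    {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
    {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"},
]

# Assumed schema: the same entries that from_dict accepts, under a "criteria" key.
rubric_json = json.dumps({"criteria": criteria}, indent=2)
print(rubric_json)

# With the package installed, this string would be passed as:
# rubric = Rubric.from_json(rubric_json)
```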

Requirements

  • Python 3.13+
  • Access to an LLM API (e.g., OpenAI, OpenRouter), with the appropriate API keys set as environment variables

License

MIT License - see LICENSE file for details.

Download files

Source Distribution

rubric-1.1.5.tar.gz (4.3 kB view details)

Uploaded Source

Built Distribution

rubric-1.1.5-py3-none-any.whl (5.2 kB view details)

Uploaded Python 3

File details

Details for the file rubric-1.1.5.tar.gz.

File metadata

  • Download URL: rubric-1.1.5.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for rubric-1.1.5.tar.gz
Algorithm Hash digest
SHA256 55f777a19c005409d1b5de8d2387a0ed8518eb7a086e16d721a0498628021fa5
MD5 32c41dc9644c1e83f84ef5c8000c4995
BLAKE2b-256 ed3159943cdec967882640d61069cc619728029f9b70c6047e6b4cefad7e196a

File details

Details for the file rubric-1.1.5-py3-none-any.whl.

File metadata

  • Download URL: rubric-1.1.5-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for rubric-1.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 2e99166cd2506c5b5cc5db0600ef9f183805d3ba027cfd6a6d10e9e47efc7ced
MD5 37192727fdd3a10750f896d7d266b1a9
BLAKE2b-256 72131e89e54a7896491cf539f0f96be2cd4f2006b5eeefae787011050c65f8cf
