Rubric

A Python library for LLM-based evaluation using weighted rubrics.

Installation

uv add rubric

Usage

import asyncio
import os
from openai import AsyncOpenAI
from rubric import Rubric
from rubric.autograders import PerCriterionGrader

# Create the client once so that parallel grading calls share one connection pool.
client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

async def generate_with_async_openai(system_prompt: str, user_prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=400,
        temperature=0.0,
    )
    return response.choices[0].message.content or ""

async def main():
    rubric = Rubric.from_dict([
        {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
        {"weight": 8.0, "requirement": "Explicitly uses Shapley attribution for decomposition"},
        {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"}
    ])

    grader = PerCriterionGrader(
        generate_fn=generate_with_async_openai,
        system_prompt="This overrides the default system prompt",
    )

    result = await rubric.grade(
        to_grade="Your text to evaluate...",
        autograder=grader
    )

    print(f"Score: {result.score}/100")
    for criterion in result.report:
        print(f"  {criterion.verdict}: {criterion.requirement}")

if __name__ == "__main__":
    asyncio.run(main())

Autograder Strategies

  • PerCriterionGrader - Evaluates each criterion with its own LLM call; the calls run in parallel
  • PerCriterionOneShotGrader - Evaluates all criteria in a single LLM call
  • RubricAsJudgeGrader - Holistic evaluation; the LLM returns the final score directly

Customization

You can customize grading at multiple levels:

1. Custom generate_fn (most common). Pass any function that takes (system_prompt, user_prompt) and returns a string. Use any LLM provider (OpenAI, Anthropic, local models, etc.):

grader = PerCriterionGrader(generate_fn=your_custom_function)
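Because generate_fn is just a function of (system_prompt, user_prompt), it can be wrapped before it is handed to a grader. The sketch below adds retries with exponential backoff. It assumes generate_fn is async, as in the Usage example; with_retries, attempts, and base_delay are hypothetical names for illustration and are not part of the rubric library.

```python
import asyncio
from typing import Awaitable, Callable

# Shape of a generate function: (system_prompt, user_prompt) -> awaitable str.
GenerateFn = Callable[[str, str], Awaitable[str]]


def with_retries(
    generate_fn: GenerateFn, attempts: int = 3, base_delay: float = 0.5
) -> GenerateFn:
    """Wrap generate_fn so that failed calls are retried with backoff.

    Hypothetical helper, not part of the rubric library.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    async def wrapped(system_prompt: str, user_prompt: str) -> str:
        for attempt in range(attempts):
            try:
                return await generate_fn(system_prompt, user_prompt)
            except Exception:
                # Re-raise once the final attempt has failed.
                if attempt == attempts - 1:
                    raise
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                await asyncio.sleep(base_delay * 2**attempt)
        raise AssertionError("unreachable")

    return wrapped
```

The wrapped function keeps the same signature, so it could be passed as generate_fn=with_retries(your_custom_function).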

2. Override specific methods. Subclass any autograder and override:

  • judge() - Orchestrates LLM calls to evaluate criteria and parse responses into structured results
  • generate() - Wraps your generate_fn to customize how prompts are sent to the LLM
  • aggregation() - Transforms individual criterion results into a final score and optional report

3. Full control. Override the entire grade() method for complete end-to-end control over the grading process.
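To make the aggregation step concrete, here is a standalone sketch of one way per-criterion verdicts could be turned into a single score. The formula (weights of met criteria divided by total positive weight, clamped to 0-100) is an assumption for illustration, not the library's documented behavior, and CriterionResult and weighted_score are hypothetical names rather than rubric types.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    """Hypothetical stand-in for one graded criterion (not a rubric type)."""

    weight: float
    requirement: str
    met: bool


def weighted_score(results: list[CriterionResult]) -> float:
    """Return a 0-100 score from per-criterion verdicts.

    Assumed formula: sum the weights of all met criteria (negative weights
    act as penalties), divide by the total positive weight, clamp to [0, 100].
    """
    total_positive = sum(r.weight for r in results if r.weight > 0)
    if total_positive == 0:
        return 0.0
    earned = sum(r.weight for r in results if r.met)
    return max(0.0, min(100.0, 100.0 * earned / total_positive))


results = [
    CriterionResult(10.0, "States Q4 2023 base margin as 17.2%", met=True),
    CriterionResult(8.0, "Explicitly uses Shapley attribution for decomposition", met=False),
    CriterionResult(-15.0, "Uses total deliveries instead of cash-only deliveries", met=False),
]
print(f"Score: {weighted_score(results):.1f}/100")
```

A custom aggregation() override could apply a different rule, such as not clamping or normalizing by the absolute sum of weights; the method's actual signature should be checked against the library source.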

Loading Rubrics

rubric = Rubric([Criterion(...)])
rubric = Rubric.from_dict([...])
rubric = Rubric.from_json('{"criteria": [...]}')
rubric = Rubric.from_yaml('...')
rubric = Rubric.from_file('rubric.yaml')
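As a sketch of what a serialized rubric might look like, the snippet below builds a JSON string in the {"criteria": [...]} shape shown for from_json. It assumes each entry uses the same weight and requirement keys as the from_dict example in Usage; that mapping is inferred, not confirmed by the library's documentation.

```python
import json

# Same criterion fields as the from_dict example in Usage.
criteria = [
    {"weight": 10.0, "requirement": "States Q4 2023 base margin as 17.2%"},
    {"weight": -15.0, "requirement": "Uses total deliveries instead of cash-only deliveries"},
]

# Assumption: from_json expects the list under a top-level "criteria" key.
rubric_json = json.dumps({"criteria": criteria}, indent=2)
print(rubric_json)

# The resulting string could then be passed to Rubric.from_json(rubric_json).
```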

Requirements

  • Python 3.11+
  • An LLM API (e.g., OpenAI, Anthropic, OpenRouter) - set appropriate API keys as environment variables

License

MIT License - see LICENSE file for details.
