A Python library encapsulating best practices for rubric-based evaluation of LLM/VLM outputs using LLM-as-a-judge.

AutoRubric

A Python library for evaluating text outputs against weighted criteria using LLM-as-a-judge.

  @misc{rao2026autorubric,
        title={Autorubric: A Unified Framework for Rubric-Based LLM Evaluation},
        author={Delip Rao and Chris Callison-Burch},
        year={2026},
        eprint={2603.00077},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        url={https://arxiv.org/abs/2603.00077},
  }

Installation

pip install autorubric

Quick Example

import asyncio
from autorubric import Rubric, LLMConfig
from autorubric.graders import CriterionGrader

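async def main():
    # The LLM judge that grades the response against each criterion.
    grader = CriterionGrader(llm_config=LLMConfig(model="openai/gpt-5.1-mini"))

    # Positive weights mark required content; the negative weight marks a factual error to penalize.
    rubric = Rubric.from_dict([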
        {"weight": 10.0, "requirement": "States NMC cell-level energy density in the 250-300 Wh/kg range"},
        {"weight": 8.0, "requirement": "Identifies LFP thermal runaway threshold (~270°C) as higher than NMC (~210°C)"},
        {"weight": 6.0, "requirement": "States LFP cycle life advantage (2000-5000 cycles vs 1000-2000 for NMC)"},
        {"weight": -15.0, "requirement": "Incorrectly claims LFP has higher gravimetric energy density than NMC"}
    ])

    result = await rubric.grade(
        to_grade="""NMC cathodes (LiNixMnyCozO2) achieve 250-280 Wh/kg at the cell level,
        while LFP (LiFePO4) typically reaches 150-205 Wh/kg. However, LFP offers superior
        thermal stability with decomposition onset at ~270°C compared to ~210°C for NMC,
        and delivers 2000-5000 charge cycles versus 1000-2000 for NMC.""",
        grader=grader,
        query="Compare NMC and LFP cathode materials for EV battery applications.",
    )

    # Print the overall score, then each criterion's verdict next to its requirement.
    print(f"Score: {result.score:.2f}")
    for criterion in result.report:
        print(f"  [{criterion.final_verdict}] {criterion.criterion.requirement}")

asyncio.run(main())
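
To make the role of the weights concrete, here is a minimal sketch of one way weighted verdicts could be combined into a single score. It is an assumption for illustration, not AutoRubric's documented formula: the `Verdict` class and `weighted_score` function are hypothetical names, and the rule (earned weight divided by total positive weight, clamped to the range 0 to 1) is just one plausible choice.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    weight: float  # positive = desired content, negative = penalty
    met: bool      # whether the judge decided the requirement was met


def weighted_score(verdicts: list[Verdict]) -> float:
    """Hypothetical aggregation: earned weight over total positive weight, clamped to [0, 1]."""
    total_positive = sum(v.weight for v in verdicts if v.weight > 0)
    if total_positive == 0:
        return 0.0
    # A met negative-weight criterion subtracts from the earned total.
    earned = sum(v.weight for v in verdicts if v.met)
    return max(0.0, min(1.0, earned / total_positive))


# Same weights as the rubric above: all positive criteria met, penalty not triggered.
clean = [Verdict(10.0, True), Verdict(8.0, True), Verdict(6.0, True), Verdict(-15.0, False)]
# Same response, but the judge also finds the penalized error.
penalized = [Verdict(10.0, True), Verdict(8.0, True), Verdict(6.0, True), Verdict(-15.0, True)]

print(f"{weighted_score(clean):.3f}")      # 1.000
print(f"{weighted_score(penalized):.3f}")  # 0.375
```

Under this assumed rule, a single triggered penalty of -15 removes more than half of the 24 available points, which is why a large negative weight suits errors that should dominate the grade.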

Documentation

Full documentation, API reference, and a cookbook with several dozen recipes are available at autorubric.org.

| Resource | Link |
|---|---|
| Project site | autorubric.org |
| API reference | autorubric.org/docs/api |
| Cookbook | autorubric.org/docs/cookbook |

Features

| Feature | Description |
|---|---|
| Weighted criteria | Positive and negative weights with explicit requirements |
| Per-criterion explanations | Every verdict includes the judge's reasoning |
| 100+ LLM providers | OpenAI, Anthropic, Google, Azure, Groq, Ollama, and more via LiteLLM |
| Ensemble judging | Combine multiple LLM judges with configurable aggregation strategies |
| Few-shot calibration | Provide labeled examples to improve grading consistency |
| Multi-choice criteria | Ordinal and nominal scales beyond binary met/unmet verdicts |
| Batch evaluation | High-throughput EvalRunner with checkpointing and resumption |
| Metrics & validation | Agreement metrics, bootstrap confidence intervals, distribution analysis |
| Length penalty | Configurable penalty for overly long responses |
| Thinking/reasoning support | Budget-controlled extended thinking for supported models |
| Response caching | Disk-based caching to avoid redundant LLM calls |
| Dataset support | Structured datasets with per-item rubrics, prompts, and ground truth |
| YAML configuration | Define rubrics, LLM configs, and datasets in YAML |
| Meta-rubric evaluation | Evaluate and automatically improve rubric quality |
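
Two of the ideas listed above, ensemble aggregation and bootstrap confidence intervals, can be illustrated with the standard library alone. The sketch below does not use AutoRubric's API: `majority_vote` and `bootstrap_ci` are hypothetical helper names, the "MET"/"UNMET" labels are illustrative, and majority voting and the percentile bootstrap are each only one of several possible strategies.

```python
import random
from collections import Counter
from statistics import mean


def majority_vote(verdicts: list[str]) -> str:
    """Return the most common verdict; ties go to the verdict seen first."""
    return Counter(verdicts).most_common(1)[0][0]


def bootstrap_ci(scores: list[float], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap interval for the mean of per-item scores."""
    rng = random.Random(seed)  # seeded so repeated runs give the same interval
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lo = means[round((alpha / 2) * n_resamples)]
    hi = means[round((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Three judges disagree on one criterion; the majority verdict wins.
print(majority_vote(["MET", "UNMET", "MET"]))  # MET

# Uncertainty around the mean score of a small evaluation set.
scores = [0.9, 0.75, 1.0, 0.4, 0.85, 0.6, 0.95, 0.7]
lo, hi = bootstrap_ci(scores)
print(f"mean={mean(scores):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

With only a handful of graded items the interval will be wide, which is a useful reminder that a single mean score from a small evaluation set says little on its own.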

License

MIT License - see LICENSE file for details.

Acknowledgments

This research was developed with funding from the Defense Advanced Research Projects Agency’s (DARPA) SciFy program (Agreement No. HR00112520300). The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
