Skip to main content

A flexible library for testing LLM responses against predefined rubrics using OpenAI's API for automated scoring

Project description

LLM Regression Tester

PyPI version Python 3.8+ MIT License

A Python library for testing LLM responses against predefined rubrics using OpenAI's API for automated scoring. Features easy assert methods for clean, readable tests. Simple, focused, and powerful.

🤔 Why This Library?

Traditional LLM-as-a-judge approaches lacked sophisticated scoring mechanisms - they couldn't properly weight different issues or apply negative marking, making accurate grading challenging. This library solves that problem by implementing a college-style rubric system with negative marking, enabling precise and nuanced evaluation of LLM responses.

Key Innovation:

  • Weighted Scoring: Different rubric criteria can have different point values based on importance
  • Negative Marking: Incorrect or missing elements can deduct points, not just give zero
  • Flexible Rubrics: Define custom evaluation criteria that reflect real-world grading standards
  • Accurate Assessment: More precise scoring that mirrors human evaluation processes

🚀 Features

  • OpenAI Integration: Seamless integration with OpenAI's API
  • Flexible Rubric System: Define custom evaluation criteria and scoring rules
  • Automated Scoring: AI-powered evaluation of responses against guidelines
  • Easy Assert Methods: Simple assert_pass(), assert_fail(), and assert_score() methods for testing
  • Environment Variables: Support for .env files and environment variables
  • Simple API: Easy to use with minimal configuration
  • Type Hints: Full type annotation support for better IDE experience
  • Comprehensive Testing: Built-in test examples and utilities

📦 Installation

Basic Installation

pip install llm-regression-tester

Environment Variables

The library uses .env files to securely store your API keys:

.env File Setup (Required)

# Create a .env file in your project root
echo "OPENAI_API_KEY=your-openai-api-key-here" > .env

# Edit the .env file with your actual API key
# OPENAI_API_KEY=sk-your-actual-api-key-here

Important: The library automatically loads .env files. Make sure .env is in your .gitignore to keep your keys secure.

🔧 Quick Start

1. Create a Rubric File

Create a JSON file defining your evaluation criteria:

[
  {
    "name": "customer_support_response",
    "min_score_to_pass": 7,
    "guidelines": [
      {
        "id": "polite",
        "description": "Response is polite and professional",
        "correct_score": 3,
        "incorrect_score": 0
      },
      {
        "id": "accurate",
        "description": "Response provides accurate information",
        "correct_score": 2,
        "incorrect_score": 0
      },
      {
        "id": "helpful",
        "description": "Response offers specific help or next steps",
        "correct_score": 2,
        "incorrect_score": 0
      }
    ]
  }
]

2. Basic Usage

from llm_regression_tester import LLMRegressionTester

# Option 1: Initialize with API key parameter
tester = LLMRegressionTester(
    rubric_file_path="rubrics.json",
    openai_api_key="your-openai-api-key"
)

# Option 2: Initialize with .env file (recommended for security)
# Create a .env file with: OPENAI_API_KEY=your-actual-api-key
tester = LLMRegressionTester(
    rubric_file_path="rubrics.json"
    # API key will be automatically loaded from .env file
)

# Test a response
result = tester.test_response("customer_support_response", "Thank you for your question...")
print(f"Score: {result['total_score']}/{result['min_score_to_pass']}")
print(f"Pass: {result['pass_status']}")

# Or use easy assert methods for testing
tester.assert_pass("customer_support_response", "Thank you for your question...")
tester.assert_fail("customer_support_response", "This is a terrible response.")
tester.assert_score("customer_support_response", "Good response", 7)

3. Easy Assert Methods for Testing

The library provides simple assert methods that make testing LLM responses intuitive and readable:

from llm_regression_tester import LLMRegressionTester

tester = LLMRegressionTester("rubrics.json")

# Assert that a response passes the rubric
tester.assert_pass("customer_support", good_response)

# Assert that a response fails the rubric
tester.assert_fail("customer_support", bad_response)

# Assert a specific score
tester.assert_score("customer_support", response, 8)

# Custom error messages
tester.assert_pass("rubric", response, "Professional response should pass quality check")

Benefits:

  • Clean Syntax: One-line assertions instead of multiple lines of result checking
  • Clear Errors: Helpful error messages showing exactly what failed and why
  • Flexible: Optional custom messages for better test documentation
  • Powerful: Supports pass/fail/score assertions for comprehensive testing

4. Using .env Files

from llm_regression_tester import LLMRegressionTester

# Create a .env file with your API key:
# OPENAI_API_KEY=your-actual-api-key

# Initialize without API key parameter
tester = LLMRegressionTester("rubrics.json")
# API key will be automatically loaded from .env file

result = tester.test_response("customer_support_response", "Hello, how can I help?")
print(f"Score: {result['total_score']}/{result['min_score_to_pass']}")

5. Test Examples

The library includes comprehensive test examples showing how to use the assert methods in practice:

# Run the test examples
python test_examples.py

This demonstrates:

  • Customer service response quality testing
  • Code review evaluation
  • Content moderation
  • Practical usage patterns with the assert methods

Running Tests

# Run all tests
pytest

# Run specific test examples
pytest tests/test_basic.py::test_assert_pass_method -v

# Run the example demonstration
python test_examples.py

📋 API Reference

LLMRegressionTester

Constructor

LLMRegressionTester(
    rubric_file_path: str,
    openai_api_key: Optional[str] = None,
    openai_model: str = "gpt-4o-mini"
)

Parameters:

  • rubric_file_path: Path to JSON file containing rubrics
  • openai_api_key: OpenAI API key. If None, will check OPENAI_API_KEY environment variable
  • openai_model: OpenAI model to use (default: gpt-4o-mini)

Methods

test_response(name: str, response: str) -> Dict[str, Any]

Test a response against a specific rubric.

Returns:

{
    "total_score": int,
    "pass_status": bool,
    "min_score_to_pass": int,
    "details": [
        {
            "id": str,
            "description": str,
            "meets": bool,
            "score": int
        }
    ]
}
get_available_rubrics() -> List[str]

Get list of available rubric names.

get_rubric_details(name: str) -> Optional[Dict[str, Any]]

Get details of a specific rubric.

assert_pass(rubric_name: str, response: str, message: str = None) -> None

Assert that a response passes the specified rubric test. Raises AssertionError if the test fails.

assert_fail(rubric_name: str, response: str, message: str = None) -> None

Assert that a response fails the specified rubric test. Raises AssertionError if the test passes when it should fail.

assert_score(rubric_name: str, response: str, expected_score: int, message: str = None) -> None

Assert that a response achieves a specific score. Raises AssertionError if the score doesn't match.

🏗️ Architecture

The library has a simple, focused architecture:

LLMRegressionTester
├── Rubric Management (JSON file loading and validation)
├── OpenAI Integration (direct API calls)
├── Response Evaluation (automated scoring)
└── Assert Methods (easy testing with assert_pass/fail/score)

Key Components:

  • Rubric System: JSON-based evaluation criteria
  • OpenAI Client: Direct integration with OpenAI's API
  • Assert Methods: Simple assert_pass(), assert_fail(), and assert_score() methods
  • Environment Support: Automatic loading from .env files and environment variables
  • Error Handling: Comprehensive validation and error reporting

📝 Rubric Format

Rubrics are defined in JSON format:

{
  "name": "rubric_name",
  "min_score_to_pass": 7,
  "guidelines": [
    {
      "id": "unique_id",
      "description": "Description of the criterion",
      "correct_score": 2,
      "incorrect_score": 0
    }
  ]
}

🔐 Environment Variables

The library supports the following environment variables for API keys:

  • OPENAI_API_KEY: OpenAI API key for GPT models

Security Best Practices:

  • Never commit API keys to version control
  • Use environment variables or secure credential management
  • Rotate keys regularly
  • Use different keys for development and production

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Guidelines

  1. Follow the existing code style and patterns
  2. Add comprehensive tests for new features
  3. Update documentation for any changes
  4. Ensure backward compatibility when possible
  5. Test with different rubric configurations

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Happy Testing! 🎉

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_regression_tester-0.1.2.tar.gz (23.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_regression_tester-0.1.2-py3-none-any.whl (9.9 kB view details)

Uploaded Python 3

File details

Details for the file llm_regression_tester-0.1.2.tar.gz.

File metadata

  • Download URL: llm_regression_tester-0.1.2.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for llm_regression_tester-0.1.2.tar.gz
Algorithm Hash digest
SHA256 4d8178d4843366b07ebeaf8c28317aa5733ff4c9d5b29504a46f8243990c3ed5
MD5 5581512b3e3747fc0e293b48fe22b488
BLAKE2b-256 ff0c7b93e8a821822b7027555fec951f4e985bfaa8729dd391e2a5efb0c26dd9

See more details on using hashes here.

File details

Details for the file llm_regression_tester-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for llm_regression_tester-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7ea8c2ad4cb4b821f9968cc3d2393122260a45ade240fa1534499bfae91a0734
MD5 d169fc49128946b75bd87873f555c51b
BLAKE2b-256 c62739a345de2b46d8688373ef7d7536add21e5a4b012c27749ac4124cb4e44c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page