A flexible library for testing LLM responses against predefined rubrics using OpenAI's API for automated scoring

These details have not been verified by PyPI

Project links

Project description

LLM Regression Tester

A Python library for testing LLM responses against predefined rubrics using OpenAI's API for automated scoring. Features easy assert methods for clean, readable tests. Simple, focused, and powerful.

🤔 Why This Library?

Traditional LLM-as-a-judge approaches lacked sophisticated scoring mechanisms - they couldn't properly weight different issues or apply negative marking, making accurate grading challenging. This library solves that problem by implementing a college-style rubric system with negative marking, enabling precise and nuanced evaluation of LLM responses.

Key Innovation:

Weighted Scoring: Different rubric criteria can have different point values based on importance
Negative Marking: Incorrect or missing elements can deduct points, not just give zero
Flexible Rubrics: Define custom evaluation criteria that reflect real-world grading standards
Accurate Assessment: More precise scoring that mirrors human evaluation processes

🚀 Features

OpenAI Integration: Seamless integration with OpenAI's API
Flexible Rubric System: Define custom evaluation criteria and scoring rules
Automated Scoring: AI-powered evaluation of responses against guidelines
Easy Assert Methods: Simple assert_pass(), assert_fail(), and assert_score() methods for testing
Environment Variables: Support for .env files and environment variables
Simple API: Easy to use with minimal configuration
Type Hints: Full type annotation support for better IDE experience
Comprehensive Testing: Built-in test examples and utilities

📦 Installation

Basic Installation

pip install llm-regression-tester

Environment Variables

The library uses .env files to securely store your API keys:

.env File Setup (Required)

# Create a .env file in your project root
echo "OPENAI_API_KEY=your-openai-api-key-here" > .env

# Edit the .env file with your actual API key
# OPENAI_API_KEY=sk-your-actual-api-key-here

Important: The library automatically loads .env files. Make sure .env is in your .gitignore to keep your keys secure.

🔧 Quick Start

1. Create a Rubric File

Create a JSON file defining your evaluation criteria:

[
  {
    "name": "customer_support_response",
    "min_score_to_pass": 7,
    "guidelines": [
      {
        "id": "polite",
        "description": "Response is polite and professional",
        "correct_score": 3,
        "incorrect_score": 0
      },
      {
        "id": "accurate",
        "description": "Response provides accurate information",
        "correct_score": 2,
        "incorrect_score": 0
      },
      {
        "id": "helpful",
        "description": "Response offers specific help or next steps",
        "correct_score": 2,
        "incorrect_score": 0
      }
    ]
  }
]

2. Basic Usage

from llm_regression_tester import LLMRegressionTester

# Option 1: Initialize with API key parameter
tester = LLMRegressionTester(
    rubric_file_path="rubrics.json",
    openai_api_key="your-openai-api-key"
)

# Option 2: Initialize with .env file (recommended for security)
# Create a .env file with: OPENAI_API_KEY=your-actual-api-key
tester = LLMRegressionTester(
    rubric_file_path="rubrics.json"
    # API key will be automatically loaded from .env file
)

# Test a response
result = tester.test_response("customer_support_response", "Thank you for your question...")
print(f"Score: {result['total_score']}/{result['min_score_to_pass']}")
print(f"Pass: {result['pass_status']}")

# Or use easy assert methods for testing
tester.assert_pass("customer_support_response", "Thank you for your question...")
tester.assert_fail("customer_support_response", "This is a terrible response.")
tester.assert_score("customer_support_response", "Good response", 7)

3. Easy Assert Methods for Testing

The library provides simple assert methods that make testing LLM responses intuitive and readable:

from llm_regression_tester import LLMRegressionTester

tester = LLMRegressionTester("rubrics.json")

# Assert that a response passes the rubric
tester.assert_pass("customer_support", good_response)

# Assert that a response fails the rubric
tester.assert_fail("customer_support", bad_response)

# Assert a specific score
tester.assert_score("customer_support", response, 8)

# Custom error messages
tester.assert_pass("rubric", response, "Professional response should pass quality check")

Benefits:

Clean Syntax: One-line assertions instead of multiple lines of result checking
Clear Errors: Helpful error messages showing exactly what failed and why
Flexible: Optional custom messages for better test documentation
Powerful: Supports pass/fail/score assertions for comprehensive testing

4. Using .env Files

from llm_regression_tester import LLMRegressionTester

# Create a .env file with your API key:
# OPENAI_API_KEY=your-actual-api-key

# Initialize without API key parameter
tester = LLMRegressionTester("rubrics.json")
# API key will be automatically loaded from .env file

result = tester.test_response("customer_support_response", "Hello, how can I help?")
print(f"Score: {result['total_score']}/{result['min_score_to_pass']}")

5. Test Examples

The library includes comprehensive test examples showing how to use the assert methods in practice:

# Run the test examples
python test_examples.py

This demonstrates:

Customer service response quality testing
Code review evaluation
Content moderation
Practical usage patterns with the assert methods

Running Tests

# Run all tests
pytest

# Run specific test examples
pytest tests/test_basic.py::test_assert_pass_method -v

# Run the example demonstration
python test_examples.py

📋 API Reference

LLMRegressionTester

Constructor

LLMRegressionTester(
    rubric_file_path: str,
    openai_api_key: Optional[str] = None,
    openai_model: str = "gpt-4o-mini"
)

Parameters:

rubric_file_path: Path to JSON file containing rubrics
openai_api_key: OpenAI API key. If None, will check OPENAI_API_KEY environment variable
openai_model: OpenAI model to use (default: gpt-4o-mini)

Methods

`test_response(name: str, response: str) -> Dict[str, Any]`

Test a response against a specific rubric.

Returns:

{
    "total_score": int,
    "pass_status": bool,
    "min_score_to_pass": int,
    "details": [
        {
            "id": str,
            "description": str,
            "meets": bool,
            "score": int
        }
    ]
}

`get_available_rubrics() -> List[str]`

Get list of available rubric names.

`get_rubric_details(name: str) -> Optional[Dict[str, Any]]`

Get details of a specific rubric.

`assert_pass(rubric_name: str, response: str, message: str = None) -> None`

Assert that a response passes the specified rubric test. Raises AssertionError if the test fails.

`assert_fail(rubric_name: str, response: str, message: str = None) -> None`

Assert that a response fails the specified rubric test. Raises AssertionError if the test passes when it should fail.

`assert_score(rubric_name: str, response: str, expected_score: int, message: str = None) -> None`

Assert that a response achieves a specific score. Raises AssertionError if the score doesn't match.

🏗️ Architecture

The library has a simple, focused architecture:

LLMRegressionTester
├── Rubric Management (JSON file loading and validation)
├── OpenAI Integration (direct API calls)
├── Response Evaluation (automated scoring)
└── Assert Methods (easy testing with assert_pass/fail/score)

Key Components:

Rubric System: JSON-based evaluation criteria
OpenAI Client: Direct integration with OpenAI's API
Assert Methods: Simple assert_pass(), assert_fail(), and assert_score() methods
Environment Support: Automatic loading from .env files and environment variables
Error Handling: Comprehensive validation and error reporting

📝 Rubric Format

Rubrics are defined in JSON format:

{
  "name": "rubric_name",
  "min_score_to_pass": 7,
  "guidelines": [
    {
      "id": "unique_id",
      "description": "Description of the criterion",
      "correct_score": 2,
      "incorrect_score": 0
    }
  ]
}

🔐 Environment Variables

The library supports the following environment variables for API keys:

OPENAI_API_KEY: OpenAI API key for GPT models

Security Best Practices:

Never commit API keys to version control
Use environment variables or secure credential management
Rotate keys regularly
Use different keys for development and production

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Guidelines

Follow the existing code style and patterns
Add comprehensive tests for new features
Update documentation for any changes
Ensure backward compatibility when possible
Test with different rubric configurations

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Happy Testing! 🎉

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.2

Sep 6, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_regression_tester-0.1.2.tar.gz (23.3 kB view details)

Uploaded Sep 6, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llm_regression_tester-0.1.2-py3-none-any.whl (9.9 kB view details)

Uploaded Sep 6, 2025 Python 3

File details

Details for the file llm_regression_tester-0.1.2.tar.gz.

File metadata

Download URL: llm_regression_tester-0.1.2.tar.gz
Upload date: Sep 6, 2025
Size: 23.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for llm_regression_tester-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`4d8178d4843366b07ebeaf8c28317aa5733ff4c9d5b29504a46f8243990c3ed5`
MD5	`5581512b3e3747fc0e293b48fe22b488`
BLAKE2b-256	`ff0c7b93e8a821822b7027555fec951f4e985bfaa8729dd391e2a5efb0c26dd9`

See more details on using hashes here.

File details

Details for the file llm_regression_tester-0.1.2-py3-none-any.whl.

File metadata

Download URL: llm_regression_tester-0.1.2-py3-none-any.whl
Upload date: Sep 6, 2025
Size: 9.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for llm_regression_tester-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7ea8c2ad4cb4b821f9968cc3d2393122260a45ade240fa1534499bfae91a0734`
MD5	`d169fc49128946b75bd87873f555c51b`
BLAKE2b-256	`c62739a345de2b46d8688373ef7d7536add21e5a4b012c27749ac4124cb4e44c`

See more details on using hashes here.

llm-regression-tester 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

LLM Regression Tester

🤔 Why This Library?

🚀 Features

📦 Installation

Basic Installation

Environment Variables

.env File Setup (Required)

🔧 Quick Start

1. Create a Rubric File

2. Basic Usage

3. Easy Assert Methods for Testing

4. Using .env Files

5. Test Examples

Running Tests

📋 API Reference

LLMRegressionTester

Constructor

Methods

test_response(name: str, response: str) -> Dict[str, Any]

get_available_rubrics() -> List[str]

get_rubric_details(name: str) -> Optional[Dict[str, Any]]

assert_pass(rubric_name: str, response: str, message: str = None) -> None

assert_fail(rubric_name: str, response: str, message: str = None) -> None

assert_score(rubric_name: str, response: str, expected_score: int, message: str = None) -> None

🏗️ Architecture

📝 Rubric Format

🔐 Environment Variables

🤝 Contributing

Development Guidelines

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`test_response(name: str, response: str) -> Dict[str, Any]`

`get_available_rubrics() -> List[str]`

`get_rubric_details(name: str) -> Optional[Dict[str, Any]]`

`assert_pass(rubric_name: str, response: str, message: str = None) -> None`

`assert_fail(rubric_name: str, response: str, message: str = None) -> None`

`assert_score(rubric_name: str, response: str, expected_score: int, message: str = None) -> None`