LLM Regression Tester
A Python library for testing LLM responses against predefined rubrics using OpenAI's API for automated scoring. Features easy assert methods for clean, readable tests. Simple, focused, and powerful.
🤔 Why This Library?
Traditional LLM-as-a-judge approaches lack sophisticated scoring mechanisms: they cannot properly weight different issues or apply negative marking, which makes accurate grading difficult. This library solves that problem with a college-style rubric system that supports negative marking, enabling precise, nuanced evaluation of LLM responses.
Key Innovation:
- Weighted Scoring: Different rubric criteria can have different point values based on importance
- Negative Marking: Incorrect or missing elements can deduct points, not just give zero
- Flexible Rubrics: Define custom evaluation criteria that reflect real-world grading standards
- Accurate Assessment: More precise scoring that mirrors human evaluation processes
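To illustrate how weighted scoring with negative marking differs from plain pass/fail grading, here is a minimal sketch. The guideline structure mirrors the rubric format used by the library; the scoring function itself is a hypothetical illustration, not the library's internal implementation:

```python
# Hypothetical illustration of weighted scoring with negative marking.
# Each guideline carries its own point values; a missed criterion can
# subtract points (incorrect_score < 0) instead of just contributing zero.

def score_response(guidelines, meets):
    """Sum correct_score for met criteria and incorrect_score for missed ones."""
    return sum(
        g["correct_score"] if meets[g["id"]] else g["incorrect_score"]
        for g in guidelines
    )

guidelines = [
    {"id": "polite",   "correct_score": 3, "incorrect_score": 0},
    {"id": "accurate", "correct_score": 4, "incorrect_score": -2},  # negative marking
]

# Accurate but impolite: 0 + 4 = 4
print(score_response(guidelines, {"polite": False, "accurate": True}))   # 4
# Polite but inaccurate: 3 - 2 = 1, not 3 -- the factual error costs points
print(score_response(guidelines, {"polite": True, "accurate": False}))   # 1
```

Because "accurate" both weighs more and carries a penalty, a factually wrong answer scores lower than a merely brusque one, which is the behavior a flat pass/fail judge cannot express.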
🚀 Features
- OpenAI Integration: Seamless integration with OpenAI's API
- Flexible Rubric System: Define custom evaluation criteria and scoring rules
- Automated Scoring: AI-powered evaluation of responses against guidelines
- Easy Assert Methods: Simple assert_pass(), assert_fail(), and assert_score() methods for testing
- Environment Variables: Support for .env files and environment variables
- Simple API: Easy to use with minimal configuration
- Type Hints: Full type annotation support for better IDE experience
- Comprehensive Testing: Built-in test examples and utilities
📦 Installation
Basic Installation
pip install llm-regression-tester
Environment Variables
The library uses .env files to securely store your API keys:
.env File Setup (Required)
# Create a .env file in your project root
echo "OPENAI_API_KEY=your-openai-api-key-here" > .env
# Edit the .env file with your actual API key
# OPENAI_API_KEY=sk-your-actual-api-key-here
Important: The library automatically loads .env files. Make sure .env is in your .gitignore to keep your keys secure.
🔧 Quick Start
1. Create a Rubric File
Create a JSON file defining your evaluation criteria:
[
{
"name": "customer_support_response",
"min_score_to_pass": 7,
"guidelines": [
{
"id": "polite",
"description": "Response is polite and professional",
"correct_score": 3,
"incorrect_score": 0
},
{
"id": "accurate",
"description": "Response provides accurate information",
"correct_score": 2,
"incorrect_score": 0
},
{
"id": "helpful",
"description": "Response offers specific help or next steps",
"correct_score": 2,
"incorrect_score": 0
}
]
}
]
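Before wiring a rubric into tests, it can help to sanity-check that the pass threshold is actually reachable. A quick sketch using only the standard library (the rubric is the one from the step above, embedded as a JSON string; with a file you would call json.load on an open file handle instead):

```python
import json

# The rubric from step 1, parsed from JSON text.
rubrics = json.loads("""
[{"name": "customer_support_response",
  "min_score_to_pass": 7,
  "guidelines": [
    {"id": "polite",   "description": "Response is polite and professional",
     "correct_score": 3, "incorrect_score": 0},
    {"id": "accurate", "description": "Response provides accurate information",
     "correct_score": 2, "incorrect_score": 0},
    {"id": "helpful",  "description": "Response offers specific help or next steps",
     "correct_score": 2, "incorrect_score": 0}]}]
""")

rubric = rubrics[0]
max_score = sum(g["correct_score"] for g in rubric["guidelines"])
print(f"{rubric['name']}: max {max_score}, need {rubric['min_score_to_pass']} to pass")
assert rubric["min_score_to_pass"] <= max_score, "rubric can never pass"
```

Here the maximum score equals the threshold (7), so a response must satisfy every criterion to pass.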
2. Basic Usage
from llm_regression_tester import LLMRegressionTester
# Option 1: Initialize with API key parameter
tester = LLMRegressionTester(
rubric_file_path="rubrics.json",
openai_api_key="your-openai-api-key"
)
# Option 2: Initialize with .env file (recommended for security)
# Create a .env file with: OPENAI_API_KEY=your-actual-api-key
tester = LLMRegressionTester(
rubric_file_path="rubrics.json"
# API key will be automatically loaded from .env file
)
# Test a response
result = tester.test_response("customer_support_response", "Thank you for your question...")
print(f"Score: {result['total_score']}/{result['min_score_to_pass']}")
print(f"Pass: {result['pass_status']}")
# Or use easy assert methods for testing
tester.assert_pass("customer_support_response", "Thank you for your question...")
tester.assert_fail("customer_support_response", "This is a terrible response.")
tester.assert_score("customer_support_response", "Good response", 7)
3. Easy Assert Methods for Testing
The library provides simple assert methods that make testing LLM responses intuitive and readable:
from llm_regression_tester import LLMRegressionTester
tester = LLMRegressionTester("rubrics.json")
# Assert that a response passes the rubric
tester.assert_pass("customer_support", good_response)
# Assert that a response fails the rubric
tester.assert_fail("customer_support", bad_response)
# Assert a specific score
tester.assert_score("customer_support", response, 8)
# Custom error messages
tester.assert_pass("rubric", response, "Professional response should pass quality check")
Benefits:
- Clean Syntax: One-line assertions instead of multiple lines of result checking
- Clear Errors: Helpful error messages showing exactly what failed and why
- Flexible: Optional custom messages for better test documentation
- Powerful: Supports pass/fail/score assertions for comprehensive testing
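For a sense of what the one-line assertion replaces, here is the long-hand equivalent written against the documented test_response return format. The result dict below is a hand-built sample in that shape, not a real API response or the library's internals:

```python
# Manual result checking that tester.assert_pass(...) replaces.
# A hand-built sample result in the documented return format:
result = {
    "total_score": 8,
    "pass_status": True,
    "min_score_to_pass": 7,
    "details": [
        {"id": "polite", "description": "Polite and professional",
         "meets": True, "score": 3},
    ],
}

# The long-hand version of an assert_pass call:
assert result["pass_status"], (
    f"Expected pass, got {result['total_score']}/{result['min_score_to_pass']}"
)
print("passed with", result["total_score"], "points")
```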
4. Using .env Files
from llm_regression_tester import LLMRegressionTester
# Create a .env file with your API key:
# OPENAI_API_KEY=your-actual-api-key
# Initialize without API key parameter
tester = LLMRegressionTester("rubrics.json")
# API key will be automatically loaded from .env file
result = tester.test_response("customer_support_response", "Hello, how can I help?")
print(f"Score: {result['total_score']}/{result['min_score_to_pass']}")
5. Test Examples
The library includes comprehensive test examples showing how to use the assert methods in practice:
# Run the test examples
python test_examples.py
This demonstrates:
- Customer service response quality testing
- Code review evaluation
- Content moderation
- Practical usage patterns with the assert methods
Running Tests
# Run all tests
pytest
# Run specific test examples
pytest tests/test_basic.py::test_assert_pass_method -v
# Run the example demonstration
python test_examples.py
📋 API Reference
LLMRegressionTester
Constructor
LLMRegressionTester(
rubric_file_path: str,
openai_api_key: Optional[str] = None,
openai_model: str = "gpt-4o-mini"
)
Parameters:
- rubric_file_path: Path to JSON file containing rubrics
- openai_api_key: OpenAI API key. If None, will check OPENAI_API_KEY environment variable
- openai_model: OpenAI model to use (default: gpt-4o-mini)
Methods
test_response(name: str, response: str) -> Dict[str, Any]
Test a response against a specific rubric.
Returns:
{
"total_score": int,
"pass_status": bool,
"min_score_to_pass": int,
"details": [
{
"id": str,
"description": str,
"meets": bool,
"score": int
}
]
}
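When a response fails, the details list identifies which criteria were missed. A short sketch that summarizes a result in this format (the result dict is a hand-built example in the documented shape):

```python
# Summarize a test_response result: list the criteria that were not met.
result = {
    "total_score": 3,
    "pass_status": False,
    "min_score_to_pass": 7,
    "details": [
        {"id": "polite",   "description": "Polite and professional",
         "meets": True,  "score": 3},
        {"id": "accurate", "description": "Accurate information",
         "meets": False, "score": 0},
        {"id": "helpful",  "description": "Specific next steps",
         "meets": False, "score": 0},
    ],
}

missed = [d["id"] for d in result["details"] if not d["meets"]]
print(f"{result['total_score']}/{result['min_score_to_pass']} "
      f"-- missed: {', '.join(missed)}")
```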
get_available_rubrics() -> List[str]
Get list of available rubric names.
get_rubric_details(name: str) -> Optional[Dict[str, Any]]
Get details of a specific rubric.
assert_pass(rubric_name: str, response: str, message: Optional[str] = None) -> None
Assert that a response passes the specified rubric test. Raises AssertionError if the test fails.
assert_fail(rubric_name: str, response: str, message: Optional[str] = None) -> None
Assert that a response fails the specified rubric test. Raises AssertionError if the test passes when it should fail.
assert_score(rubric_name: str, response: str, expected_score: int, message: Optional[str] = None) -> None
Assert that a response achieves a specific score. Raises AssertionError if the score doesn't match.
🏗️ Architecture
The library has a simple, focused architecture:
LLMRegressionTester
├── Rubric Management (JSON file loading and validation)
├── OpenAI Integration (direct API calls)
├── Response Evaluation (automated scoring)
└── Assert Methods (easy testing with assert_pass/fail/score)
Key Components:
- Rubric System: JSON-based evaluation criteria
- OpenAI Client: Direct integration with OpenAI's API
- Assert Methods: Simple assert_pass(), assert_fail(), and assert_score() methods
- Environment Support: Automatic loading from .env files and environment variables
- Error Handling: Comprehensive validation and error reporting
📝 Rubric Format
Rubrics are defined in JSON format:
{
"name": "rubric_name",
"min_score_to_pass": 7,
"guidelines": [
{
"id": "unique_id",
"description": "Description of the criterion",
"correct_score": 2,
"incorrect_score": 0
}
]
}
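A rubric dict can be checked against this format before use with a small helper. This is a hypothetical validator written for illustration, not part of the library's API:

```python
def validate_rubric(rubric):
    """Return a list of problems with a rubric dict; empty means valid."""
    errors = []
    for field in ("name", "min_score_to_pass", "guidelines"):
        if field not in rubric:
            errors.append(f"missing field: {field}")
    for i, g in enumerate(rubric.get("guidelines", [])):
        for field in ("id", "description", "correct_score", "incorrect_score"):
            if field not in g:
                errors.append(f"guideline {i}: missing {field}")
    return errors

rubric = {
    "name": "rubric_name",
    "min_score_to_pass": 7,
    "guidelines": [
        {"id": "unique_id", "description": "Description of the criterion",
         "correct_score": 2, "incorrect_score": 0},
    ],
}
print(validate_rubric(rubric))  # [] -- no errors
```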
🔐 Environment Variables
The library supports the following environment variables for API keys:
OPENAI_API_KEY: OpenAI API key for GPT models
Security Best Practices:
- Never commit API keys to version control
- Use environment variables or secure credential management
- Rotate keys regularly
- Use different keys for development and production
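The key-resolution order described in the constructor reference (explicit parameter first, then the OPENAI_API_KEY environment variable) can be sketched as follows. This is a hypothetical helper illustrating the precedence, not the library's actual code:

```python
import os

def resolve_api_key(explicit_key=None):
    """Explicit argument wins; otherwise fall back to the environment."""
    key = explicit_key or os.getenv("OPENAI_API_KEY")
    if not key:
        raise ValueError("No API key: pass openai_api_key or set OPENAI_API_KEY")
    return key

os.environ["OPENAI_API_KEY"] = "sk-from-env"   # stand-in value for demonstration
print(resolve_api_key())                       # sk-from-env
print(resolve_api_key("sk-explicit"))          # sk-explicit
```

Note that python-dotenv-style .env loading simply populates the same environment variables before this lookup runs.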
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Guidelines
- Follow the existing code style and patterns
- Add comprehensive tests for new features
- Update documentation for any changes
- Ensure backward compatibility when possible
- Test with different rubric configurations
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
Happy Testing! 🎉