A tool for using SOLAR as a judge in coding competitions
Solar-as-Judge
Solar-as-Judge is a Python package for evaluating and comparing AI-generated responses. It uses the Upstage API to have a judge model score each response, and it repeats every judgment over several trials to keep the results consistent.
Features
- Evaluate individual AI responses on a scale of 1-5
- Compare two AI responses and determine a winner
- Utilize ground truth for more accurate evaluations
- Ensure consistency through multiple trials
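This README does not say how the results of multiple trials are combined. As a minimal sketch of one common approach, the snippet below takes the lower median of repeated 1-5 scores. The `aggregate_trials` helper and the `score_once` callable are hypothetical names used for illustration only; they are not part of the package and may not match what it does internally.

```python
import statistics
from typing import Callable


def aggregate_trials(score_once: Callable[[], int], trials: int = 3) -> int:
    """Run a scorer `trials` times and return the lower median of the scores.

    `score_once` is a hypothetical stand-in for one judge call that
    returns an integer score between 1 and 5.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    scores = [score_once() for _ in range(trials)]
    for s in scores:
        if not 1 <= s <= 5:
            raise ValueError(f"score out of range 1-5: {s}")
    # median_low always returns a value that was actually observed,
    # so the result stays an integer even when `trials` is even.
    return statistics.median_low(scores)
```

A median is less sensitive to a single outlier judgment than a mean, which is one reason to run an odd number of trials.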
Installation
pip install solar-as-judge
Quick Start
Single Answer Scoring
import solar_as_judge as saj
prompt = "Please extract one keyword from this text: I love you so much"
ground_truth = "love"
answer = "love"
score = saj.get_judge_score(prompt, answer, ground_truth)
print(f"Score: {score}")
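To score more than one answer, the single-answer call can be applied across a small dataset. The sketch below assumes a scorer with the same `(prompt, answer, ground_truth)` argument order as `saj.get_judge_score`; the `mean_score` helper and the `exact_match_scorer` stand-in are hypothetical and exist only so the example runs without an API key.

```python
from typing import Callable, Iterable, Tuple

# One evaluation example: (prompt, answer, ground_truth)
Example = Tuple[str, str, str]


def mean_score(examples: Iterable[Example],
               scorer: Callable[[str, str, str], int]) -> float:
    """Average the scores a scorer assigns to a collection of examples."""
    scores = [scorer(prompt, answer, truth) for prompt, answer, truth in examples]
    if not scores:
        raise ValueError("no examples to score")
    return sum(scores) / len(scores)


def exact_match_scorer(prompt: str, answer: str, ground_truth: str) -> int:
    """Offline stand-in for a judge: 5 on a case-insensitive exact match, else 1.

    This is NOT how the package judges answers; it only mimics the 1-5 range.
    """
    return 5 if answer.strip().lower() == ground_truth.strip().lower() else 1


examples = [
    ("Please extract one keyword from this text: I love you so much", "love", "love"),
    ("Please extract one keyword from this text: I love you so much", "so much", "love"),
]
print(mean_score(examples, exact_match_scorer))
```

With the package installed and `UPSTAGE_API_KEY` set, passing `saj.get_judge_score` as `scorer` in place of the stand-in should work under the same assumption about argument order.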
A/B Testing
import solar_as_judge as saj
prompt = "Please extract one keyword from this text: I love you so much"
ground_truth = "love"
A_answer = "love"
B_answer = "so much"
a_score, b_score = saj.judge(prompt, A_answer, B_answer, ground_truth)
print(f"Scores: A={a_score}, B={b_score}")
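The pair of scores returned by `saj.judge` can be reduced to a verdict in user code. The following is a minimal sketch, assuming both scores are numbers on the same scale; `verdict_from_scores` is a hypothetical helper, not part of the package, and unlike `saj.get_winner` it reports ties explicitly.

```python
def verdict_from_scores(a_score: float, b_score: float) -> str:
    """Return "A", "B", or "tie" depending on which score is higher."""
    if a_score > b_score:
        return "A"
    if b_score > a_score:
        return "B"
    return "tie"


# Illustrative values only; real scores come from saj.judge(...)
print(verdict_from_scores(5, 2))
```

Handling the tie case separately avoids silently favouring one side when the judge rates both answers equally.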
API Reference
saj.get_judge_score(prompt, answer, ground_truth_answer, judge_llm=None, trials=3)
Evaluates a single AI response.
- Returns: Integer score (1-5)
saj.judge(prompt, A_answer, B_answer, ground_truth=None, judge_llm=None, trials=7)
Evaluates and compares two AI responses.
- Returns: Tuple of scores (A_score, B_score)
saj.get_winner(prompt, A_answer, B_answer, ground_truth_answer, judge_llm=None, trials=3)
Determines the winner between two AI responses.
- Returns: String ("A" or "B")
Configuration
Set the UPSTAGE_API_KEY environment variable to your API key from the Upstage console.
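A script can check for the key up front so that a missing configuration fails with a clear message rather than an API error later. This is a minimal sketch; `require_api_key` is a hypothetical helper, and only the variable name `UPSTAGE_API_KEY` comes from this README.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "UPSTAGE_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the judge")
    return key
```

Calling `require_api_key()` with no arguments reads from `os.environ`; passing a dictionary makes the check easy to exercise in tests.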
Examples
Check out the test.py file in the repository for more usage examples and test cases.
Contributing
We welcome contributions! Please see our Contributing Guidelines for more details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
For issues, feature requests, or questions, please open an issue on our GitHub repository.
File details
Details for the file solar_as_judge-0.3.0.tar.gz.
File metadata
- Download URL: solar_as_judge-0.3.0.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 10c1d7e207efbd2d0e541884d8a02199bc98f022976dba63defc9b4f31c89406 |
| MD5 | 130c94e0c8cc40cfcc54e23831c1ac0a |
| BLAKE2b-256 | f88c480f07facb29c8f1b2279477c8c7fdf8bc8dcecf1af4d91715421c0223a7 |
File details
Details for the file solar_as_judge-0.3.0-py3-none-any.whl.
File metadata
- Download URL: solar_as_judge-0.3.0-py3-none-any.whl
- Upload date:
- Size: 4.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 34715573b92f2fefea5abab6f3415a8114073c2af347146ce4a9911310c7f2c7 |
| MD5 | f92c762ac872abcd2300fb9d7cb645b4 |
| BLAKE2b-256 | 9ab6f0085d5b157d18c5bd6ba147a92eef1472c52b73cbb05867d41ed8d67723 |