A tool for using SOLAR as a judge in coding competitions

Solar-as-Judge

Solar-as-Judge is a Python package for scoring and comparing AI-generated responses. It uses the Upstage API to judge the quality of AI outputs, and repeats each judgment over several trials to make the result more consistent.

Features

  • Evaluate individual AI responses on a scale of 1-5
  • Compare two AI responses and determine a winner
  • Utilize ground truth for more accurate evaluations
  • Ensure consistency through multiple trials
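
How the package combines repeated trials is not documented here, so the following is only a minimal sketch of one common approach: a majority vote over repeated judge calls. The names `aggregate_trials` and `fake_judge` are hypothetical and are not part of solar-as-judge.

```python
from collections import Counter
from typing import Callable


def aggregate_trials(judge_once: Callable[[], int], trials: int = 3) -> int:
    """Call a judge several times and return the most common score.

    Ties are broken in favour of the lowest score, a conservative
    choice made for this sketch only.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    scores = [judge_once() for _ in range(trials)]
    counts = Counter(scores)
    top = max(counts.values())
    return min(score for score, n in counts.items() if n == top)


# Hypothetical stand-in for a noisy LLM judge: returns 4, 5, 4 in turn.
_replies = iter([4, 5, 4])


def fake_judge() -> int:
    return next(_replies)


print(aggregate_trials(fake_judge, trials=3))  # prints 4
```

An odd number of trials makes ties less likely when scores cluster around two values, which may be why the documented defaults are odd.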

Installation

pip install solar-as-judge

Quick Start

Single Answer Scoring

import solar_as_judge as saj

prompt = "Please extract one keyword from this text: I love you so much"
ground_truth = "love"
answer = "love"

score = saj.get_judge_score(prompt, answer, ground_truth)
print(f"Score: {score}")

A/B Testing

import solar_as_judge as saj

prompt = "Please extract one keyword from this text: I love you so much"
ground_truth = "love"

A_answer = "love"
B_answer = "so much"

a_score, b_score = saj.judge(prompt, A_answer, B_answer, ground_truth)
print(f"Scores: A={a_score}, B={b_score}")

API Reference

saj.get_judge_score(prompt, answer, ground_truth_answer, judge_llm=None, trials=3)

Evaluates a single AI response.

  • Returns: Integer score (1-5)

saj.judge(prompt, A_answer, B_answer, ground_truth=None, judge_llm=None, trials=7)

Evaluates and compares two AI responses.

  • Returns: Tuple of scores (A_score, B_score)

saj.get_winner(prompt, A_answer, B_answer, ground_truth_answer, judge_llm=None, trials=3)

Determines the winner between two AI responses.

  • Returns: String ("A" or "B")
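
To compare two systems over more than one prompt, the winner of each example can be tallied. This is a rough sketch rather than part of the package: `tally_winners` and `exact_match_winner` are hypothetical helpers, and the judge is passed in as a callable that takes the same four positional arguments as `saj.get_winner`, so the example runs offline.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# (prompt, A_answer, B_answer, ground_truth_answer)
Example = Tuple[str, str, str, str]


def tally_winners(
    examples: Iterable[Example],
    get_winner: Callable[[str, str, str, str], str],
) -> Counter:
    """Count how often "A" and "B" win across a set of examples."""
    wins = Counter({"A": 0, "B": 0})
    for prompt, a_answer, b_answer, ground_truth in examples:
        winner = get_winner(prompt, a_answer, b_answer, ground_truth)
        wins[winner] += 1
    return wins


def exact_match_winner(prompt: str, a: str, b: str, truth: str) -> str:
    """Offline stand-in judge (hypothetical): A wins unless only B matches."""
    return "B" if b == truth and a != truth else "A"


examples = [
    ("Please extract one keyword from this text: I love you so much",
     "love", "so much", "love"),
    ("Please extract one keyword from this text: I love you so much",
     "you", "love", "love"),
]
print(dict(tally_winners(examples, exact_match_winner)))  # {'A': 1, 'B': 1}
```

With the package installed and `UPSTAGE_API_KEY` set, passing `saj.get_winner` in place of the stand-in should work the same way, with each example then triggering calls to the Upstage API.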

Configuration

Set the UPSTAGE_API_KEY environment variable with your key from the Upstage console.
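
A missing key is easier to diagnose if it is checked before any judging call is made. The snippet below is a small sketch of such a check; `require_upstage_key` is a hypothetical helper, not a function provided by solar-as-judge.

```python
import os
from typing import Mapping


def require_upstage_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Upstage API key or raise a clear error if it is unset."""
    key = env.get("UPSTAGE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "UPSTAGE_API_KEY is not set; export it before using solar_as_judge."
        )
    return key


# Demonstrated with an explicit mapping so the sketch runs without a real key.
print(require_upstage_key({"UPSTAGE_API_KEY": "example-key"}))  # prints example-key
```

In a real script, calling `require_upstage_key()` with no arguments reads from the process environment.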

Examples

Check out the test.py file in the repository for more usage examples and test cases.

Contributing

We welcome contributions! Please see our Contributing Guidelines for more details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues, feature requests, or questions, please open an issue on our GitHub repository.
