Guide an LLM to judge answers better using grading notes.

Grading Notes

Grading Notes is a Python package that uses Large Language Models (LLMs) as automated judges to evaluate AI-generated answers against human-written grading criteria. The repo is based on the awesome post by Databricks. The idea is to guide LLMs with simple grading notes rather than full ground-truth answers, thereby lowering the cost of creating ground-truth data.
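
To make the idea concrete, here is a minimal sketch of how a short grading note might be folded into a judge prompt in place of a full reference answer. The function build_judge_prompt and the prompt wording are hypothetical, written only for illustration; they are not taken from this package's source.

```python
# Hypothetical illustration only: not the package's actual prompt or API.
def build_judge_prompt(question: str, grading_note: str, answer: str) -> str:
    """Combine a question, a short grading note, and a candidate answer
    into one prompt that asks an LLM judge for a Good/Bad verdict."""
    return (
        "You are grading an answer to a question.\n"
        f"Question: {question}\n"
        f"Grading note: {grading_note}\n"
        f"Answer: {answer}\n"
        "Reply with 'Good' if the answer satisfies the grading note, otherwise 'Bad'."
    )


prompt = build_judge_prompt(
    question="What is the capital of France?",
    grading_note="Answer is 'Paris'. Accept case-insensitive variations.",
    answer="paris",
)
print(prompt)
```

The grading note carries only the criteria a human reviewer cares about, so it is much shorter to write than a complete model answer.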

Key Features

  • LLM-powered Evaluation: Harness the intelligence of LLMs guided by humans to evaluate AI-generated answers.
  • Flexible AI Providers: Support for multiple LLM providers, including Anthropic's Claude and OpenAI's GPT models.
  • Bulk Processing: Evaluate multiple questions and answers efficiently using CSV files.

Installation

You can install the Grading Notes package using pip:

pip install grading-notes

Quick Start

from grading_notes import GradingNote, evaluate, get_judge

# set ANTHROPIC_API_KEY and OPENAI_API_KEY env. variables

# create a judge
judge = get_judge(provider="anthropic", model="claude-3-5-sonnet-20240620")

# create a grading note
grade_note = GradingNote(
    question="What is the capital of France?",
    grading_note="Answer is 'Paris'. Accept case-insensitive variations."
)

# evaluate an answer
result = evaluate(judge=judge, grading_note=grade_note, answer="Paris")
print(result) # True

More complex examples

Here is a more complex example inspired by the Databricks post (answer generated by Gemini):

grade_note = GradingNote(
    question="How do I drop all tables in a Unity Catalog schema?",
    grading_note=(
        "The response should contain steps to get all table names then "
        "drop each of them. Alternatively the response can suggest dropping the entire "
        "schema with risks explained. The response should not treat tables as views."
    ),
)

answer = """Unfortunately, Unity Catalog doesn't currently provide a direct 
way to drop all tables within a schema with a single command. You'll need to 
list the tables in the schema and then drop them individually.

Here's a breakdown of how you could approach this using Databricks SQL:

1. List the Tables in the Schema using Python notebook

You can use the SHOW TABLES command to get a list of tables within your schema:

%sql
SHOW TABLES IN your_db_name.your_schema_name;

2. Drop Each Table
You can use the DROP TABLE command to drop each table:

tables = spark.sql("SHOW TABLES IN your_db_name.your_schema_name").collect()

for table in tables:
    table_name = table["tableName"]
    spark.sql(f"DROP TABLE your_db_name.your_schema_name.{table_name}")
"""

result = evaluate(judge=judge, grading_note=grade_note, answer=answer)
print(result) # True

Main Components

  • evaluate: Function to evaluate an answer against a grading note. Optionally takes an answer.
  • evaluate_from_csv: Function to evaluate multiple questions and answers from a CSV file. Optionally takes an answer.
  • GradingNote: Represents the grading criteria for a specific question.
  • Judge: Represents the judge client for different AI providers.
  • Evaluation: Represents the evaluation result (Good or Bad).
  • get_judge: Function to create a Judge for different AI providers.

CSV Evaluation

You can evaluate multiple questions and answers using a CSV file:

from grading_notes import get_judge, evaluate_from_csv
judge = get_judge(provider="openai", model="gpt-4-turbo-preview")
results = evaluate_from_csv(judge=judge, csv_file="path/to/your/csv_file.csv")

The CSV file should have columns question, grading_note, and answer.
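
As a rough sketch, an input file with those three columns could be produced with Python's standard csv module. The column names come from this README; the row contents and the output path are made-up examples.

```python
import csv
import os
import tempfile

# Column names come from the README; the row contents are made-up examples.
rows = [
    {
        "question": "What is the capital of France?",
        "grading_note": "Answer is 'Paris'. Accept case-insensitive variations.",
        "answer": "Paris",
    },
    {
        "question": "How do I drop all tables in a Unity Catalog schema?",
        "grading_note": "The response should list the tables, then drop each of them.",
        "answer": "Run SHOW TABLES, then DROP TABLE for each result.",
    },
]

# Hypothetical output location; any writable path will do.
csv_path = os.path.join(tempfile.mkdtemp(), "grading_notes_input.csv")

with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "grading_note", "answer"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm the header matches the expected columns.
with open(csv_path, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    header = reader.fieldnames
    loaded = list(reader)

print(header)
print(len(loaded))
```

The resulting path can then be passed as csv_file to evaluate_from_csv. Using csv.DictWriter takes care of quoting, which matters because grading notes and answers often contain commas and quotes.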

Customization

The repo currently supports Anthropic and OpenAI through the instructor library.

Environment Variables

Make sure to set the following environment variables:

  • ANTHROPIC_API_KEY: Your Anthropic API key
  • OPENAI_API_KEY: Your OpenAI API key
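
A small pre-flight check can catch a missing key before any judge is created. The sketch below uses only the standard library; the helper missing_api_keys and the provider-to-variable mapping are hypothetical conveniences and are not part of the package.

```python
import os

# Variable names come from the README; the helper itself is hypothetical.
REQUIRED_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_api_keys(providers, environ=None):
    """Return the env variable names that are unset or empty for the given providers."""
    env = os.environ if environ is None else environ
    return [REQUIRED_KEYS[p] for p in providers if not env.get(REQUIRED_KEYS[p])]


# Example with an explicit mapping so the result does not depend on your shell.
example_env = {"ANTHROPIC_API_KEY": "sk-example"}
print(missing_api_keys(["anthropic", "openai"], environ=example_env))
```

Called without the environ argument, the helper reads from os.environ, so it reports on the variables set in the current process.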

License

This project is licensed under the Apache 2.0 License.

Contributing

We welcome contributions! Please see our Contributing Guide for more details.
