
Qualitatively evaluate LLMs



Gauge - LLM Evaluation


Gauge is a Python library for evaluating and comparing large language models (LLMs). Compare models on complex, custom tasks alongside numeric measurements like latency and cost.

How does it work?

Gauge uses a model-on-model approach to evaluate LLMs qualitatively. An advanced arbiter model (GPT-4) judges the output of smaller LLMs on specific tasks and assigns each a numeric score. This lets users create custom benchmarks for their own tasks and obtain qualitative evaluations of different LLMs. Gauge is useful for evaluating and ranking LLMs on a wide range of complex and subjective tasks, such as creative writing, staying in character, formatting outputs, extracting information, and translating text.
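To make the model-on-model idea concrete, here is a minimal sketch of how an arbiter prompt and score extraction could work. The prompt template, `build_arbiter_prompt`, and `parse_score` are illustrative assumptions, not Gauge's actual internals:

```python
# Sketch of the model-on-model idea: an arbiter model (e.g. GPT-4) is
# asked to score a candidate model's output on a 0-10 scale, and the
# numeric score is parsed out of its structured reply.

def build_arbiter_prompt(task: str, response: str) -> str:
    """Compose the prompt sent to the arbiter model (hypothetical format)."""
    return (
        "You are judging a language model's answer.\n"
        f"Task: {task}\n"
        f"Answer: {response}\n"
        "Rate the answer from 0 to 10 and explain briefly.\n"
        "Reply as: SCORE: <number> | EXPLANATION: <text>"
    )

def parse_score(arbiter_reply: str) -> float:
    """Extract the numeric score from the arbiter's structured reply."""
    score_part = arbiter_reply.split("|")[0]      # e.g. "SCORE: 8 "
    return float(score_part.split(":")[1].strip())

# Example of a well-formed arbiter reply and the score recovered from it:
reply = "SCORE: 8 | EXPLANATION: Accurate and well formatted."
print(parse_score(reply))  # 8.0
```

In practice the arbiter's free-form explanation is kept alongside the score, which is what allows Gauge to display both a number and a rationale per model.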

Features

  • Evaluate and compare multiple LLMs using custom benchmarks
  • Straightforward API for running and evaluating LLMs
  • Extensible architecture for including additional models

Installation

To install Gauge, run the following command:

pip install gauge-llm

Before using Gauge, set the HUGGINGFACE_TOKEN and REPLICATE_API_TOKEN environment variables, then import the openai library and set openai.api_key:

import os
import openai

os.environ["HUGGINGFACE_TOKEN"] = "your_huggingface_token"
os.environ["REPLICATE_API_TOKEN"] = "your_replicate_api_token"
openai.api_key = "your_openai_api_key"

Examples

Information Extraction: Historical Event

import gauge

query = "Extract the main points from the following paragraph: On July 20, 1969, American astronauts Neil Armstrong and Buzz Aldrin became the first humans to land on the Moon. Armstrong stepped onto the lunar surface and described the event as 'one small step for man, one giant leap for mankind.'"
gauge.evaluate(query)

Staying in Character: Detective's Monologue

import gauge

query = "Write a monologue for a detective character in a film noir setting."
gauge.evaluate(query)

Translation: English to Spanish

import gauge

query = "Translate the following English text to Spanish: 'The quick brown fox jumps over the lazy dog.'"
gauge.evaluate(query)

Formatting Output: Recipe Conversion

import gauge

query = "Convert the following recipe into a shopping list: 2 cups flour, 1 cup sugar, 3 eggs, 1/2 cup milk, 1/4 cup butter."
gauge.evaluate(query)

These examples will display a table with the results for each model, including their name, response, score, explanation, latency, and cost.

API

gauge.run(model, query)

Runs the specified model with the given query and returns the output, latency, and cost.

Parameters:

  • model: A dictionary containing the model's information (type, name, id, and price_per_second).
  • query: The input query for the model.

Returns:

  • output: The generated output from the model.
  • latency: The time taken to run the model.
  • cost: The cost of running the model.
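As a rough illustration of what `run` returns, the sketch below times a stand-in model callable and derives cost from the model's `price_per_second`. The `fake_model` function and the model dictionary values are hypothetical; only the dictionary keys (`type`, `name`, `id`, `price_per_second`) come from the documented parameter shape:

```python
import time

# Illustrative model dict using the documented keys; the values are
# made up for this sketch.
model = {
    "type": "replicate",
    "name": "example-model",
    "id": "org/example-model:abc123",
    "price_per_second": 0.0023,
}

def fake_model(query: str) -> str:
    """Stand-in for a real hosted model call."""
    return f"echo: {query}"

def run_sketch(model: dict, query: str):
    """Mimic run()'s return shape: (output, latency, cost)."""
    start = time.time()
    output = fake_model(query)
    latency = time.time() - start                 # seconds elapsed
    cost = latency * model["price_per_second"]    # dollars, per-second pricing
    return output, latency, cost

output, latency, cost = run_sketch(model, "Hello")
print(output)  # echo: Hello
```

The real `run` dispatches to the provider named in `type` rather than a local callable, but the latency/cost bookkeeping follows this pattern.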

gauge.evaluate(query)

Evaluates multiple LLMs using the given query and displays a table with the results, including the model's name, response, score, explanation, latency, and cost.

Parameters:

  • query: The input query for the models.

Contributing

Contributions to Gauge are welcome! If you'd like to add a new model or improve the existing code, please submit a pull request. If you encounter issues or have suggestions, open an issue on GitHub.

License

Gauge is released under the MIT License.

Acknowledgements

This project was created by Killian Lucas and Roger Hu during the AI Tinkerers Summer Hackathon, held on June 10th, 2023 in Seattle at Create 33. The event was sponsored by AWS Startups, Cohere, and Madrona Venture Group, and supported by Pinecone, Weaviate, and Blueprint AI. Gauge reached the semi-finals.
