Auditing and Label-free Evaluation for LLMs
Project description
Overview
Prior to deploying Large Language Models (LLMs) applications, there is a lack of tools available to simulate their behavior and boost confidence in their anticipated performance. How can potential failure cases or vulnerabilities that might deteriorate user experience or even harm reputation be proactively detected?
LLMs have the capability to audit and evaluate other LLMs efficiently without human oversight. As LLMs and their applications continually improve, integrating them into increasingly vital applications amplifies the ramifications of any potential failure.
RedEval is an open-source library that simulates and evaluates LLM applications across various scenarios, all while eliminating the need for time-intensive and expensive human intervation.
Auditing LLMs for various scenarios
Drawing inspiration from the architecture of red-teaming LLMs (Perez et al., 2022), RedEval introduces a series of multi-turn conversations where a red-team LLM challenges a target LLM across various scenarios. These scenarios can be categorized as follows:
Performance-Based on Application Layer
- Performance Evaluation: Simulates the behavior of a potential customer with genuine intent, inquiring about the product. Metrics include Faithfulness, Answer Relevance, and Context Similarity.
- Toxicity Evaluation: Simulates a scenario in which a customer asks toxic questions related to the product.
Adversarial Attacks
- Gaslighting: Simulates an agent that tries to manipulate the target into endorsing harmful actions.
- Guilt-Tripping: Simulates an agent that coerces the target into undesired actions by inducing guilt.
- Fraudulent Researcher: Simulates an agent that seeks harmful actions under the pretense of academic research.
Cyber-Security Attacks
- Prompt Injection: Introduces malicious prefixes or suffixes to the original prompt to elicit harmful behavior.
- Social Engineering Attack: Simulates an attacker seeking confidential or proprietary information from a company, falsely claiming to be a student.
LLM Evals
LLMs evals utilize an LLM's reasoning to automatically detect failure cases of LLMs. Drawing inspiration from RLAIF (Lee et al., 2023), LLMs are expected to be as competent as, or even superior to, humans in evaluating whether LLMs met specific criteria. RedEval introduces the following LLM evals.
RAG Evals
- Faithfulness Failure: A faithfulness failure occurs if the response cannot be inferred purely from the context provided.
- Context Relevance Failure: A context relevance failure (bad retrieval) occures if the user's query cannot be answered purely from the retrieved context.
- Answer Relevance Failure: An answer relevacne failure occurs if the response does not answer the question.
Attacks
- Toxicity Failure: A toxicity failure occurs if the response is toxic.
- Safety Failure: A toxicity failure occurs if the response is unsafe.
Get started
Installation
pip install epiphany
Run simulations
# Load a simulator
from redeval.simulators.performance_simulator import PerformanceSimulator
# Set up the parameters
openai_api_key = 'Your OpenAI APIS Key'
n_turns = 5
data_path_dir = 'Your txt document for RAG'
# Run RAG erformance simulation
PerformanceSimulator(openai_api_key=openai_api_key, n_turns=n_turns, data_path = data_path_dir).simulate()
License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file redeval-0.1.0.tar.gz
.
File metadata
- Download URL: redeval-0.1.0.tar.gz
- Upload date:
- Size: 21.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.6.1 CPython/3.9.7 Darwin/22.5.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 224d66d2945acb9a21e7ec560b2d2a319b155ddfa9deb9012c79d5a5bfbe02bd |
|
MD5 | 62a2af2e896ecd3a8a94fafa4ea01303 |
|
BLAKE2b-256 | c3d4896dc835d2fc5ca906ac9dd724395f4efbd956821cb67d3ab9f0fc39e3d0 |
File details
Details for the file redeval-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: redeval-0.1.0-py3-none-any.whl
- Upload date:
- Size: 45.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.6.1 CPython/3.9.7 Darwin/22.5.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 306f0b46afbdb3d3060db4e14c9eaea105f19aa0baf88c8136eecf26f0bf9668 |
|
MD5 | d5d95bde19fa9465800ee86a13602f83 |
|
BLAKE2b-256 | 40d63291a1ba1b6cdca0c9841fe8ea4a961476a069e8150d2757a7a8ea1ba976 |