ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models
Project description
ReFeR: Reason Feedback Review
ReFeR (Reason Feedback Review) is a framework of LLM or VLM agents for conducting comprehensive evaluation or reasoning using a peer-review mechanism and a Hierarchy of Models. It lets you set up multiple peer models and AC (Area Chair) models, with control over prompts, hyperparameters, and the number of peers and ACs.
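The peer-review flow described above can be pictured with a minimal, library-free sketch. Several peer "models" answer the same input, their responses are collected, and an Area Chair "model" reviews the input together with those responses to produce the final output. The `Model` alias, `run_hierarchy`, and the stub models are hypothetical stand-ins for real LLM calls. They are not ReFeR's API or internals.

```python
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt string, returns text.
Model = Callable[[str], str]


def run_hierarchy(user_input: str, peers: List[Model], area_chair: Model,
                  peer_prompt: str, ac_prompt: str) -> str:
    """Sketch of a peer-review hierarchy: peers first, then one Area Chair."""
    # Each peer sees the same prompt with the user input filled in.
    filled_peer_prompt = peer_prompt.replace("{{user_input}}", user_input)
    peer_responses = [peer(filled_peer_prompt) for peer in peers]

    # The Area Chair sees the original input plus all peer responses.
    joined = "\n\n".join(
        f"Peer {i + 1}: {resp}" for i, resp in enumerate(peer_responses)
    )
    filled_ac_prompt = (ac_prompt.replace("{{user_input}}", user_input)
                                 .replace("{{peer_response}}", joined))
    return area_chair(filled_ac_prompt)


# Stub models so the sketch runs without any API keys.
def lenient_peer(prompt: str) -> str:
    return "Score: 4"


def strict_peer(prompt: str) -> str:
    return "Score: 2"


def echo_ac(prompt: str) -> str:
    # A real AC would reason over the peer reviews; this stub just echoes them.
    return prompt.split("Reviews:\n", 1)[1]
```

For example, `run_hierarchy("Some text", [lenient_peer, strict_peer], echo_ac, "Rate: {{user_input}}", "Input: {{user_input}}\nReviews:\n{{peer_response}}")` returns the two peer reviews as seen by the AC.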
Key Features
- Multi-Platform Support: Integrates with multiple AI platforms including OpenAI, Mistral, TogetherAI, Google (Gemini), and Groq.
- Flexible Model Configuration: Easily set up multiple peer models and AC models with customizable parameters.
- Optimized Prompt Generation: Utilizes AutoPrompt to generate optimized prompts based on user-provided task instructions and examples.
- Batch Processing: Supports batch inference with optional multi-threading for improved performance.
- Multimodal Capabilities: Handles both text and image inputs for versatile tasks (multimodal inputs are currently supported only for OpenAI and Google models).
- Customizable Response Processing: Allows for regex patterns or custom functions to process peer responses before passing them to the AC model.
- Comprehensive Logging: Detailed logging and error handling for easy debugging and monitoring.
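Batch inference with optional multi-threading typically follows the pattern below. Each input gets one inference call on a thread pool, results stay in input order, and they can optionally be written to a JSON file. This is a generic standard-library sketch: `batch_run` and `infer_one` are hypothetical names, and ReFeR's own `batch_infer` may be implemented differently.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def batch_run(inputs: List[str], infer_one: Callable[[str], str],
              use_threading: bool = True, max_workers: int = 4,
              output_file: Optional[str] = None) -> List[dict]:
    """Hypothetical batch runner: one call per input, results in input order."""
    if use_threading:
        # Executor.map yields results in the same order as the inputs,
        # even though the calls themselves may finish out of order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            outputs = list(pool.map(infer_one, inputs))
    else:
        outputs = [infer_one(x) for x in inputs]

    results = [{"input": x, "output": y} for x, y in zip(inputs, outputs)]
    if output_file is not None:
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(results, f, indent=2)
    return results
```

Threads suit this workload because API calls spend most of their time waiting on the network rather than using the CPU.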
Installation
You can install ReFeR directly through pip or from this repository:
pip install refer-agents
or
git clone https://github.com/yaswanth-iitkgp/ReFeR
cd ReFeR
pip install .
Requirements
- Python 3.12 or later
- API keys for supported platforms (OpenAI, Mistral, TogetherAI, Google, Groq)
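Rather than hard-coding API keys, you may prefer to read them from environment variables and pass each one to ReFeR's `set_api_key`. The sketch below assumes hypothetical variable names (`OPENAI_API_KEY`, `MISTRAL_API_KEY`, `TOGETHER_API_KEY`, `GOOGLE_API_KEY`, `GROQ_API_KEY`); ReFeR itself does not require these names. The platform strings match those used in the examples on this page.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping from ReFeR platform names to environment variables.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "togetherai": "TOGETHER_API_KEY",
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
}


def collect_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return {platform: key} for every platform whose variable is set and non-empty."""
    env = os.environ if env is None else env
    return {platform: env[var] for platform, var in ENV_VARS.items() if env.get(var)}
```

You could then loop over `collect_api_keys().items()` and call `refer.set_api_key(platform, key)` for each pair.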
Basic Usage
Here's a simple example of how to use ReFeR:
from refer_agents.core import ReFeR
# Initialize ReFeR
refer = ReFeR(log_level='INFO')
# Set API keys
refer.set_api_key('openai', 'your-openai-api-key')
refer.set_api_key('mistral', 'your-mistral-api-key')
refer.set_api_key('togetherai', 'your-togetherai-api-key')
refer.set_api_key('groq', 'your-groq-api-key')
# Configure models
refer.set_num_peers(3)
refer.set_num_acs(1)
refer.add_peer(model_name='meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo', platform='togetherai')
refer.add_peer(model_name='open-mistral-nemo', platform='mistral')
refer.add_peer(model_name='gemma2-9b-it', platform='groq')
refer.set_ac_model(model_name='gpt-4o-mini', platform='openai')
# Set prompts and generate optimized versions
prompt = "Your peer prompt here"
refer.set_peer_prompt(prompt)
optimized_peer_prompt, optimized_ac_prompt = refer.generate_optimized_prompts()
# Skip optimization if you already have optimized prompts.
# The peer prompt must contain the {{user_input}} placeholder; the AC prompt must contain both {{user_input}} and {{peer_response}}.
optimized_peer_prompt = "Your optimized peer prompt here"
optimized_ac_prompt = "Your optimized AC prompt here"
# Run inference
user_input = "The content to be evaluated"
result = refer.infer(user_input, optimized_peer_prompt, optimized_ac_prompt)
print(result)
# Run batch inference
user_inputs = ["Input 1", "Input 2", "Input 3"]
results = refer.batch_infer(
    user_inputs,
    optimized_peer_prompt,
    optimized_ac_prompt,
    use_threading=True,
    max_workers=4,
    output_file='results.json'
)
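Because both prompts rely on placeholders, it can help to check them before calling `infer` or `batch_infer`. The sketch below assumes the placeholders are filled by plain string substitution of `{{user_input}}` and `{{peer_response}}`. `check_prompts` and `fill` are hypothetical helpers, not part of ReFeR, whose own substitution may differ.

```python
REQUIRED_PEER = ("{{user_input}}",)
REQUIRED_AC = ("{{user_input}}", "{{peer_response}}")


def check_prompts(peer_prompt: str, ac_prompt: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for ph in REQUIRED_PEER:
        if ph not in peer_prompt:
            problems.append(f"peer prompt is missing {ph}")
    for ph in REQUIRED_AC:
        if ph not in ac_prompt:
            problems.append(f"AC prompt is missing {ph}")
    return problems


def fill(template: str, **values: str) -> str:
    """Replace each {{name}} with values[name] (plain string substitution)."""
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", value)
    return template
```

The placeholder strings in the example above ("Your optimized peer prompt here") would be flagged, since they contain neither placeholder.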
Advanced Usage
Multimodal Evaluation
ReFeR supports multimodal inputs, allowing you to evaluate image-text pairs:
# Configure multimodal models (OpenAI and Google only)
refer.set_api_key('google', 'your-google-api-key')
refer.add_peer(model_name='gpt-4o-mini', platform='openai')
refer.add_peer(model_name='gemini-1.5-flash', platform='google')
refer.set_ac_model(model_name='gpt-4o', platform='openai')
# Use your optimized prompts.
# The peer prompt must contain {{user_input}}; the AC prompt must contain both {{user_input}} and {{peer_response}}.
optimized_peer_prompt = "Your optimized peer prompt here"
optimized_ac_prompt = "Your optimized AC prompt here"
# Prepare inputs
inputs = ["Text description 1", "Text description 2"]
image_paths = ["path/to/image1.jpg", "path/to/image2.jpg"]
# Run multimodal batch inference
results = refer.batch_infer_multimodal(
    inputs,
    image_paths,
    optimized_peer_prompt,
    optimized_ac_prompt,
    sleep_time=1,
    output_file='multimodal_results.json'
)
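Image inputs for OpenAI-style chat APIs are commonly sent as base64-encoded data URLs. The sketch below pairs each text input with its image path, checks that the two lists have equal length, and encodes each image. `prepare_multimodal` and `to_data_url` are hypothetical helpers, and whether ReFeR encodes images this way is an assumption.

```python
import base64
import mimetypes
from pathlib import Path
from typing import List, Tuple


def to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL, e.g. data:image/jpeg;base64,..."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {image_path}")
    data = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{data}"


def prepare_multimodal(inputs: List[str], image_paths: List[str]) -> List[Tuple[str, str]]:
    """Pair each text input with its encoded image; lists must line up 1:1."""
    if len(inputs) != len(image_paths):
        raise ValueError(
            f"{len(inputs)} inputs but {len(image_paths)} image paths"
        )
    return [(text, to_data_url(path)) for text, path in zip(inputs, image_paths)]
```

Checking the lengths up front matters because `zip` would otherwise silently drop unmatched items.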
Custom Response Processing
You can set a custom function that processes each peer response before it is passed to the AC model:
def custom_processor(response):
    # Replace with your own logic; this placeholder only strips surrounding whitespace.
    processed_response = response.strip()
    return processed_response
refer.set_peer_response_processing_function(custom_processor)
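As a concrete illustration of a processing function, the sketch below keeps only the numeric score from a peer response. The `Score: <number>` format is an assumed response convention that ReFeR does not enforce. Falling back to the stripped response when no score is found is a design choice of this sketch.

```python
import re

# Assumed peer output convention: the response contains text like "Score: 4".
SCORE_PATTERN = re.compile(r"Score:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_score(response: str) -> str:
    """Return just the score if one is found, else the stripped response."""
    match = SCORE_PATTERN.search(response)
    return match.group(1) if match else response.strip()
```

It would be registered the same way as any other processor: `refer.set_peer_response_processing_function(extract_score)`.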
Setting AC Mode
Choose between 'Lite' and 'Turbo' modes for the AC model. Turbo mode is supported only when the Area Chair is an OpenAI model; it generates 20 AC responses by default.
refer.set_ac_mode('Lite')  # or 'Turbo'
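Turbo mode samples several AC responses. One common way to combine such samples is to parse a score from each and average them, as in the sketch below. The `Score: <number>` format and the averaging rule are assumptions for illustration, and ReFeR's actual Turbo aggregation may differ. With the OpenAI API, multiple samples can be requested in a single chat completion call via the `n` parameter.

```python
import re
from statistics import mean
from typing import List, Optional

# Assumed AC output convention: each sample contains text like "Score: 4".
SCORE_RE = re.compile(r"Score:\s*(\d+(?:\.\d+)?)")


def aggregate_scores(responses: List[str]) -> Optional[float]:
    """Average the scores parsed from several AC samples; skip unparseable ones."""
    scores = [float(m.group(1)) for r in responses if (m := SCORE_RE.search(r))]
    return mean(scores) if scores else None
```

Skipping unparseable samples instead of failing keeps one malformed response from discarding the whole batch.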
Setting Hyperparameters
You can set hyperparameters for the AC model:
refer.set_hyperparameters(temperature=0.7)
Example Use Cases
ReFeR can be applied to various evaluation tasks, such as:
- Mathematical Problem Solving: Evaluate solutions to complex math problems (see example_usage_gsm8k.py).
- Conversational Engagement: Rate the engagingness of responses in a conversation (see example_usage_topicalchat.py).
- Image-Text Alignment: Assess how well text descriptions match given images (see example_multimodal.py).
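For the math use case, evaluation often reduces to comparing a final numeric answer. GSM8K reference solutions end with a line of the form `#### <answer>`. The sketch below extracts that reference and takes the last number in a model's output as its prediction. The last-number heuristic and the helper names are assumptions for illustration, not taken from `example_usage_gsm8k.py`.

```python
import re
from typing import Optional

# Matches integers or decimals, allowing thousands separators like 1,200.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def reference_answer(gsm8k_solution: str) -> str:
    """GSM8K solutions end with '#### <answer>'; return the text after it."""
    return gsm8k_solution.split("####")[-1].strip().replace(",", "")


def predicted_answer(model_output: str) -> Optional[str]:
    """Heuristic: treat the last number in the output as the prediction."""
    numbers = NUMBER_RE.findall(model_output)
    return numbers[-1].replace(",", "") if numbers else None


def is_correct(model_output: str, gsm8k_solution: str) -> bool:
    """Compare prediction and reference numerically."""
    pred = predicted_answer(model_output)
    return pred is not None and float(pred) == float(reference_answer(gsm8k_solution))
```

Comparing as floats treats answers such as "72" and "72.0" as equal.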
Error Handling and Logging
ReFeR includes comprehensive error handling and logging. Set the logging level when initializing:
refer = ReFeR(log_level='INFO') # Options: 'INFO', 'WARNING', 'ERROR'
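If ReFeR's `log_level` argument maps onto Python's standard `logging` module (an assumption; this page does not say so), the level names correspond to the numeric thresholds shown below. Only messages at or above the chosen level are emitted. The sketch configures a hypothetical logger named `refer_sketch` to show the effect.

```python
import logging

# Standard numeric values of the level names listed above.
for name in ("INFO", "WARNING", "ERROR"):
    print(name, getattr(logging, name))  # prints INFO 20, then WARNING 30, then ERROR 40


def make_logger(level_name: str) -> logging.Logger:
    """Hypothetical logger configured from a level name such as 'WARNING'."""
    logger = logging.getLogger("refer_sketch")
    logger.setLevel(getattr(logging, level_name.upper()))
    return logger


logger = make_logger("WARNING")
logger.info("suppressed: below WARNING")
logger.warning("emitted: at or above WARNING")
```

Choosing 'WARNING' or 'ERROR' in production keeps per-request INFO messages out of the logs during large batch runs.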
Contributing
We welcome contributions! For major changes, please open an issue first to discuss what you'd like to change.
License
Credits
The codebase was developed by Yaswanth Narsupalli and Sreevatsa Muppirala.
For any issues, doubts, or questions regarding the codebase, please feel free to contact us (yasshu.yaswanth@gmail.com, sreevatsa2002@gmail.com). We are happy to help with any concerns or clarifications you may need.
Citation
If you use this software in your research, please cite the paper as follows:
@misc{narsupalli2024reviewfeedbackreasonrefernovelframework,
title={ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models},
author={Yaswanth Narsupalli and Abhranil Chandra and Sreevatsa Muppirala and Manish Gupta and Pawan Goyal},
year={2024},
eprint={2407.12877},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.12877},
}
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file refer_agents-0.1.2.tar.gz.
File metadata
- Download URL: refer_agents-0.1.2.tar.gz
- Upload date:
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1cc0d11e9d39ae72b6c38b17732e82ecb1b8825478b5709177e8cf9d7eb391c5 |
| MD5 | 1050647ba84957d9f07ba9a659bf7c4a |
| BLAKE2b-256 | 17717bbbb8c4f423d12853c421b2c6495cefa14ae47585d612b710df05c19f64 |
File details
Details for the file refer_agents-0.1.2-py3-none-any.whl.
File metadata
- Download URL: refer_agents-0.1.2-py3-none-any.whl
- Upload date:
- Size: 16.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4f76e300470a7770d93048368c13ec920d74825cf0560e52d32f4c1635ae4304 |
| MD5 | 9f5f6085a9574ba13d2919c240f04ec2 |
| BLAKE2b-256 | 5f46802c85d702893307e3c513fd01179079a345721dd18ff6c359c994de4765 |