Python SDK to configure and run evaluations for your LLM-based application
Overview
athina-evals is a framework to help you quickly set up evaluations and monitoring for your LLM-powered application.
It's difficult to know if your LLM response is good or bad. Most developers start out by simply eyeballing the responses. This is fine when you're building a prototype and testing on 5-10 examples.
But once you start optimizing for reliability in production, this approach breaks down.
Evals can help you:
- Detect regressions
- Measure the performance of your model (as defined by your goals)
- A/B test different models and prompts rapidly
- Monitor production data with confidence
- Run quantifiable experiments against ambiguous conversations
Think of evals like unit tests for your LLM app.
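To make the unit-test analogy concrete, here is a minimal sketch in plain Python. It does not use the athina-evals API: the `Example` class, the keyword-overlap check, and the sample data are all hypothetical stand-ins for a real evaluator, chosen only to show the shape of "run a check over examples, get a pass rate".

```python
from dataclasses import dataclass


@dataclass
class Example:
    query: str
    response: str


def mentions_query_keyword(example: Example) -> bool:
    """Hypothetical toy eval: pass if the response shares a keyword with the query."""
    query_words = {
        w.lower().strip("?.,!") for w in example.query.split() if len(w) > 3
    }
    response_words = {w.lower().strip("?.,!") for w in example.response.split()}
    return bool(query_words & response_words)


def pass_rate(examples, eval_fn) -> float:
    """Fraction of examples for which eval_fn returns True."""
    results = [eval_fn(e) for e in examples]
    return sum(results) / len(results)


examples = [
    Example("What is the capital of France?", "The capital of France is Paris."),
    Example("How do I reset my password?", "I like turtles."),
]

rate = pass_rate(examples, mentions_query_keyword)
print(f"pass rate: {rate:.0%}")
```

Comparing `rate` against the value from a previous run is the simplest form of regression detection; a real evaluator would replace the keyword heuristic with something far more robust.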
Documentation
See https://docs.athina.ai for the complete documentation.
Quick Start
1. Install the package
pip install athina-evals
2. Get an Athina API key
Sign up at athina.ai to get an API key.
(free, and only takes about 30 seconds)
3. Set API keys
import os
from athina.keys import AthinaApiKey, OpenAiApiKey
# Read both keys from environment variables rather than hard-coding them
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))
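Because `os.getenv` returns `None` when a variable is unset, it can help to check the environment before setting the keys. The following is a small standard-library sketch; the `missing_keys` helper is hypothetical and not part of athina-evals.

```python
import os

# Environment variable names used in the step above
REQUIRED_KEYS = ("OPENAI_API_KEY", "ATHINA_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```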
4. Run evals
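# Import paths below are assumed from the package layout; see the docs if they differ
from athina.loaders import RagLoader
from athina.evals import DoesResponseAnswerQuery
# Load the data from CSV, JSON, Athina or Dictionary (json_file is the path to your JSON dataset)
dataset = RagLoader().load_json(json_file)
# Run the DoesResponseAnswerQuery evaluator on the dataset
DoesResponseAnswerQuery().run_batch(data=dataset)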
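The snippet above leaves `json_file` undefined. As a rough sketch, a dataset file could be prepared with the standard library as shown below. The field names `query`, `context`, and `response` are assumptions about what the loader expects and are not confirmed by this page; check https://docs.athina.ai for the actual schema.

```python
import json
import os
import tempfile

# Assumed record shape: one dict per example (field names are an assumption)
records = [
    {
        "query": "What is the capital of Greece?",
        "context": ["Athens is the capital and largest city of Greece."],
        "response": "Athens",
    },
    {
        "query": "What is the capital of France?",
        "context": ["Paris is the capital of France."],
        "response": "Paris",
    },
]

# Write the records to a temporary file and keep its path
json_file = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(json_file, "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)

# Read the file back to confirm it round-trips cleanly
with open(json_file, encoding="utf-8") as f:
    loaded = json.load(f)
```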
Why should I use Athina's Evals instead of writing my own?
You could build your own eval system from scratch, but here's why Athina might be better for you:
- Athina provides you with plug-and-play preset evals that have been well-tested
- Athina evals can run in both development and production, giving you consistent metrics for evaluating model performance and drift.
- Athina removes the need for your team to write boilerplate loaders, implement LLMs, normalize data formats, etc.
- Athina offers a modular, extensible framework for writing and running evals
- Athina calculates analytics like pass rate and flakiness, and allows you to batch run evals against live production data or dev datasets
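For intuition about the analytics mentioned in the last point, here is a plain-Python sketch of the two metrics. It is not Athina's implementation: the definitions are assumptions made for illustration, with pass rate taken as the share of passing runs and flakiness as the share of examples whose repeated runs disagree.

```python
from collections import defaultdict


def pass_rate(runs):
    """Fraction of all eval runs that passed."""
    return sum(passed for _, passed in runs) / len(runs)


def flakiness(runs):
    """Fraction of examples that both passed and failed across repeated runs."""
    outcomes_by_example = defaultdict(set)
    for example_id, passed in runs:
        outcomes_by_example[example_id].add(passed)
    flaky = [eid for eid, seen in outcomes_by_example.items() if len(seen) > 1]
    return len(flaky) / len(outcomes_by_example)


# Hypothetical results: (example id, passed?) with each example evaluated twice
runs = [
    ("q1", True), ("q1", True),
    ("q2", True), ("q2", False),
    ("q3", False), ("q3", False),
    ("q4", True), ("q4", True),
]

print(f"pass rate: {pass_rate(runs):.1%}, flakiness: {flakiness(runs):.1%}")
```

Here only `q2` changes outcome between runs, so one of the four examples counts as flaky under this definition.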
Need Production Monitoring and Evals? We've got you covered...
- Athina eval runs automatically write to the Athina Dashboard, so you can view results and analytics in a beautiful UI.
- Athina tracks your experiments automatically, so you can view a historical record of previous eval runs.
- Athina calculates analytics segmented at every level possible, so you can view and compare your model performance at very granular levels.
About Athina
Athina is building an end-to-end LLM monitoring and evaluation platform.
Contact us at hello@athina.ai for any questions about the eval library.