
Python SDK to configure and run evaluations for your LLM-based application


Overview

athina-evals is a framework that helps you quickly set up evaluations and monitoring for your LLM-powered application.

It's difficult to know if your LLM response is good or bad. Most developers start out by simply eyeballing the responses. This is fine when you're building a prototype and testing on 5-10 examples.

But once you optimize for reliability in production, this method breaks down.

Evals can help you:

  • Detect regressions
  • Measure model performance (as defined by your goals)
  • A/B test different models and prompts rapidly
  • Monitor production data with confidence
  • Run quantifiable experiments against ambiguous conversations

Think of evals like unit tests for your LLM app.
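
To make the unit-test analogy concrete, here is a minimal sketch in plain Python. It does not use the athina API: `EvalResult` and `contains_required_terms` are hypothetical names invented for illustration, and a real evaluator would usually be LLM-graded rather than a keyword check.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Outcome of one eval on one response (hypothetical structure)."""
    name: str
    passed: bool
    reason: str


def contains_required_terms(response: str, required_terms: list[str]) -> EvalResult:
    """Pass only if every required term appears in the response (case-insensitive)."""
    lowered = response.lower()
    missing = [term for term in required_terms if term.lower() not in lowered]
    if missing:
        return EvalResult("contains_required_terms", False, f"missing terms: {missing}")
    return EvalResult("contains_required_terms", True, "all required terms present")


# Like a unit test, the eval is a repeatable check with a clear pass/fail outcome.
good = contains_required_terms("Paris is the capital of France.", ["Paris", "France"])
bad = contains_required_terms("I am not sure.", ["Paris"])
```

Running the same checks after every prompt or model change is what lets evals catch regressions the way a test suite does.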

Documentation

See https://docs.athina.ai for the complete documentation.

Quick Start

1. Install the package

pip install athina-evals

2. Get an Athina API key

Sign up at athina.ai to get an API key.

Signing up is free and takes about 30 seconds.

3. Set API keys

import os

from athina.keys import AthinaApiKey, OpenAiApiKey

# Read both keys from environment variables
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))
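
Note that `os.getenv` returns `None` when a variable is not set, so a missing key would be passed along silently. A small helper can fail early instead. This is a sketch only: `require_env` is a hypothetical name and is not part of athina.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

You could then write `OpenAiApiKey.set_key(require_env('OPENAI_API_KEY'))` so that a missing key surfaces immediately rather than at the first eval run.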

4. Run evals

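# Import paths below are assumed from the package layout; check the docs for your version
from athina.loaders import RagLoader
from athina.evals import DoesResponseAnswerQuery

# Load the data from CSV, JSON, Athina or a dictionary
# (json_file is the path to your JSON dataset)
dataset = RagLoader().load_json(json_file)

# Run the DoesResponseAnswerQuery evaluator on the dataset
DoesResponseAnswerQuery().run_batch(data=dataset)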



Why should I use Athina's Evals instead of writing my own?

You could build your own eval system from scratch, but here's why Athina might be better for you:

  • Athina provides you with plug-and-play preset evals that have been well-tested
  • Athina evals can run on both development and production, giving you consistent metrics for evaluating model performance and drift.
  • Athina removes the need for your team to write boilerplate loaders, implement LLMs, normalize data formats, etc.
  • Athina offers a modular, extensible framework for writing and running evals
  • Athina calculates analytics like pass rate and flakiness, and allows you to batch run evals against live production data or dev datasets
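
As a rough illustration of the analytics mentioned in the last point, the sketch below computes a pass rate and a flakiness score from repeated eval runs. The definitions are assumptions made for this example (flakiness here means the fraction of datapoints whose repeated runs disagree); Athina's own definitions may differ, and `pass_rate` and `flakiness` are hypothetical helper names.

```python
def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of eval outcomes that passed (0.0 for an empty list)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def flakiness(runs_by_datapoint: dict[str, list[bool]]) -> float:
    """Fraction of datapoints whose repeated runs did not all agree (assumed definition)."""
    if not runs_by_datapoint:
        return 0.0
    flaky = sum(1 for outcomes in runs_by_datapoint.values() if len(set(outcomes)) > 1)
    return flaky / len(runs_by_datapoint)


# Hypothetical results: each datapoint was evaluated three times.
runs = {
    "q1": [True, True, True],
    "q2": [True, False, True],
    "q3": [False, False, False],
    "q4": [True, True, False],
}
all_outcomes = [outcome for outcomes in runs.values() for outcome in outcomes]
overall_pass_rate = pass_rate(all_outcomes)
overall_flakiness = flakiness(runs)
```

In this made-up data, two of the four datapoints give inconsistent results across runs, which is the kind of signal that is hard to spot by eyeballing individual responses.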

Athina Evals Platform

Need Production Monitoring and Evals? We've got you covered...

  • Athina eval runs are written automatically to the Athina Dashboard, so you can view results and analytics in a beautiful UI.
  • Athina tracks your experiments automatically, so you can view a historical record of previous eval runs.
  • Athina calculates analytics segmented at every level possible, so you can view and compare your model performance at very granular levels.

About Athina

Athina is building an end-to-end LLM monitoring and evaluation platform.

Contact us at hello@athina.ai for any questions about the eval library.
