
Python SDK to configure and run evaluations for your LLM-based application


Overview

athina-evals is a framework to help you quickly set up evaluations and monitoring for your LLM-powered application.

It's difficult to know whether your LLM's responses are good or bad. Most developers start out by simply eyeballing the responses, which is fine when you're building a prototype and testing on 5-10 examples.

But once you start optimizing for reliability in production, this method breaks down.

Evals can help you:

  • Detect regressions
  • Measure model performance (as defined by your goals)
  • A/B test different models and prompts rapidly
  • Monitor production data with confidence
  • Run quantifiable experiments against ambiguous conversations
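Detecting regressions and A/B testing both come down to comparing results on the same datapoints across two runs. A minimal sketch, assuming each run is a plain dict mapping a datapoint id to a pass/fail boolean (a hypothetical format, not the result type athina-evals returns):

```python
# Compare two eval runs on the same datapoints.
# ASSUMPTION: a run is a dict of datapoint id -> passed (bool); this is a
# hypothetical format for illustration, not athina-evals' result type.


def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, list[str]]:
    """Return datapoint ids that regressed (pass -> fail) or were fixed (fail -> pass)."""
    shared = sorted(baseline.keys() & candidate.keys())
    regressions = [k for k in shared if baseline[k] and not candidate[k]]
    fixes = [k for k in shared if not baseline[k] and candidate[k]]
    return {"regressions": regressions, "fixes": fixes}


if __name__ == "__main__":
    prompt_a = {"q1": True, "q2": True, "q3": False}
    prompt_b = {"q1": True, "q2": False, "q3": True}
    print(compare_runs(prompt_a, prompt_b))
```

In this example both prompts pass two of three datapoints, so an aggregate score alone would hide the regression on `q2`; comparing per datapoint surfaces it.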

Think of evals like unit tests for your LLM app.
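To make the unit-test analogy concrete, here is a minimal sketch of a single eval written as a test. The evaluator `mentions_expected_terms` is a hypothetical deterministic string check, not one of Athina's preset evals, so it needs no API key:

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    passed: bool
    reason: str


def mentions_expected_terms(response: str, expected_terms: list[str]) -> EvalResult:
    """Hypothetical eval: pass if every expected term appears in the response (case-insensitive)."""
    text = response.lower()
    missing = [t for t in expected_terms if t.lower() not in text]
    if missing:
        return EvalResult(False, f"missing terms: {missing}")
    return EvalResult(True, "all expected terms present")


def test_refund_policy_answer():
    # Like a unit test: a fixed input and an expected property of the output.
    response = "You can request a refund within 30 days of purchase."
    result = mentions_expected_terms(response, ["refund", "30 days"])
    assert result.passed, result.reason


if __name__ == "__main__":
    test_refund_policy_answer()
    print("ok")
```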

Documentation

See https://docs.athina.ai for the complete documentation.

Quick Start

1. Install the package

pip install athina-evals

2. Get an Athina API key

Sign up at athina.ai to get an API key (it's free and only takes about 30 seconds).

3. Set API keys

import os

from athina.keys import AthinaApiKey, OpenAiApiKey

OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))
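`os.getenv` returns `None` when a variable is unset, so a missing key would otherwise be passed silently to `set_key`. A minimal sketch of failing fast instead; `require_env` is a hypothetical helper, not part of athina-evals:

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage with the key setters from step 3:
# OpenAiApiKey.set_key(require_env("OPENAI_API_KEY"))
# AthinaApiKey.set_key(require_env("ATHINA_API_KEY"))
```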

4. Run evals

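# Import paths as of this release; check the docs if they have moved
from athina.evals import DoesResponseAnswerQuery
from athina.loaders import RagLoader

# Load the data from CSV, JSON, Athina or a dictionary
json_file = "your_dataset.json"  # path to your own dataset
dataset = RagLoader().load_json(json_file)

# Run the DoesResponseAnswerQuery evaluator on the dataset
DoesResponseAnswerQuery().run_batch(data=dataset)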
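The loader step reads a dataset from a file you supply. A minimal sketch for preparing one with the standard library, assuming a JSON array of records with `query`, `context`, and `response` fields; that schema is an illustrative assumption, so check the documentation for the exact format `RagLoader` expects:

```python
import json
from pathlib import Path

# ASSUMPTION: hypothetical schema for illustration; verify against the Athina docs.
REQUIRED_FIELDS = ("query", "context", "response")


def write_dataset(records: list[dict], path: str) -> None:
    """Validate that each record has the assumed fields, then write them as a JSON array."""
    for i, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")


if __name__ == "__main__":
    json_file = "rag_dataset.json"  # hypothetical path
    write_dataset(
        [
            {
                "query": "What is the refund window?",
                "context": ["Refunds are accepted within 30 days of purchase."],
                "response": "You can get a refund within 30 days.",
            }
        ],
        json_file,
    )
```

Validating before writing keeps a malformed record from surfacing later as a confusing loader or evaluator error.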



Why should I use Athina's Evals instead of writing my own?

You could build your own eval system from scratch, but here's why Athina might be better for you:

  • Athina provides you with plug-and-play preset evals that have been well-tested
  • Athina evals can run in both development and production, giving you consistent metrics for evaluating model performance and drift.
  • Athina removes the need for your team to write boilerplate loaders, integrate LLMs, normalize data formats, etc.
  • Athina offers a modular, extensible framework for writing and running evals
  • Athina calculates analytics like pass rate and flakiness, and allows you to batch run evals against live production data or dev datasets
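Pass rate and flakiness can be sketched in a few lines. The definitions here are assumptions for illustration: pass rate is the share of passing results, and a datapoint counts as flaky when repeated runs of the same eval on it disagree. Athina's own formulas may differ:

```python
# ASSUMED definitions (illustrative, not necessarily Athina's):
#   pass rate = passing results / all results
#   flakiness = share of datapoints whose repeated runs include both a pass and a fail


def pass_rate(results: list[bool]) -> float:
    return sum(results) / len(results) if results else 0.0


def flakiness(repeated: dict[str, list[bool]]) -> float:
    """`repeated` maps datapoint id -> results from running the same eval several times."""
    if not repeated:
        return 0.0
    flaky = sum(1 for runs in repeated.values() if len(set(runs)) > 1)
    return flaky / len(repeated)


if __name__ == "__main__":
    runs = {
        "q1": [True, True, True],
        "q2": [True, False, True],
        "q3": [False, False, False],
    }
    all_results = [r for rs in runs.values() for r in rs]
    print(f"pass rate: {pass_rate(all_results):.2f}")
    print(f"flakiness: {flakiness(runs):.2f}")
```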

Athina Evals Platform

Need Production Monitoring and Evals? We've got you covered...

  • Athina eval runs automatically write into Athina Dashboard, so you can view results and analytics in a beautiful UI.
  • Athina tracks your experiments automatically, so you can view a historical record of previous eval runs.
  • Athina segments analytics at many levels, so you can compare model performance at a fine-grained level.
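Segmented analytics amounts to grouping eval results by some attribute and computing a metric per group. A minimal sketch with hypothetical record fields (`model`, `prompt_version`, `passed`):

```python
from collections import defaultdict


def pass_rate_by(records: list[dict], key: str) -> dict[str, float]:
    """Group records by `key` and compute the pass rate within each group."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for record in records:
        segment = record[key]
        totals[segment] += 1
        passes[segment] += int(record["passed"])
    return {segment: passes[segment] / totals[segment] for segment in totals}


if __name__ == "__main__":
    # Hypothetical eval results tagged with the attributes to segment by.
    records = [
        {"model": "model-a", "prompt_version": "v1", "passed": True},
        {"model": "model-a", "prompt_version": "v2", "passed": False},
        {"model": "model-b", "prompt_version": "v1", "passed": True},
        {"model": "model-b", "prompt_version": "v2", "passed": True},
    ]
    print(pass_rate_by(records, "model"))
    print(pass_rate_by(records, "prompt_version"))
```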

Athina Observe Platform

About Athina

Athina is building an end-to-end LLM monitoring and evaluation platform.

Website | Demo Video

Contact us at hello@athina.ai for any questions about the eval library.

Download files


Source Distribution

athina_evals-0.0.3.tar.gz (19.3 kB)

Uploaded Source

Built Distribution

athina_evals-0.0.3-py3-none-any.whl (30.3 kB)

Uploaded Python 3

File details

Details for the file athina_evals-0.0.3.tar.gz.

File metadata

  • Download URL: athina_evals-0.0.3.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.9.16 Darwin/23.0.0

File hashes

Hashes for athina_evals-0.0.3.tar.gz
Algorithm Hash digest
SHA256 a64adbafac9040c096147b26d18a136fd14effc3c2e5c32344287d3b05e99f27
MD5 6c1a7390fd5e5fad99bbc92f862f4e1a
BLAKE2b-256 5e47fd0f73e2f03d7d5ded5717916537a46e167527785203ec21efdbe578bbf5
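To check that a downloaded archive matches the SHA256 digest listed above, a minimal sketch using Python's standard `hashlib`; the archive is assumed to be in the current directory:

```python
import hashlib
from pathlib import Path

# SHA256 digest of athina_evals-0.0.3.tar.gz, copied from the table above.
EXPECTED_SHA256 = "a64adbafac9040c096147b26d18a136fd14effc3c2e5c32344287d3b05e99f27"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Stream the file in chunks and return its hex SHA256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = "athina_evals-0.0.3.tar.gz"  # assumed download location
    if Path(archive).exists():
        print("OK" if verify(archive) else "HASH MISMATCH")
```

pip can also enforce this in hash-checking mode: pin `athina-evals==0.0.3 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.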


File details

Details for the file athina_evals-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: athina_evals-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 30.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.9.16 Darwin/23.0.0

File hashes

Hashes for athina_evals-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 82cc04c664143472b81439356e6a47ed41c3f394ca22e5ad6dc7a98d83add2b6
MD5 184c70f4a0bec39cea77728dd7c2b79b
BLAKE2b-256 4debf1ca2e83023938b76819dbda65f96ca8431e2e9c79e471fea60e13df775d

