athina

Python SDK to configure and run evaluations for your LLM-based application

These details have not been verified by PyPI

Project description

Overview

athina-evals is a framework that helps you quickly set up evaluations and monitoring for your LLM-powered application.

It's difficult to know if your LLM response is good or bad. Most developers start out by simply eyeballing the responses. This is fine when you're building a prototype and testing on 5-10 examples.

But once you optimize for reliability in production, this method breaks down.

Evals can help you:

Detect regressions
Measure the performance of your model (as defined by your goals)
A/B test different models and prompts rapidly
Monitor production data with confidence
Run quantifiable experiments against ambiguous conversations

Think of evals like unit tests for your LLM app.
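
To make the unit-test analogy concrete, here is a minimal sketch in plain Python. It does not use the Athina SDK: the `EvalResult` type, the `contains_answer` check, and the sample data are all hypothetical and only illustrate the pass/fail shape of an eval.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    passed: bool
    reason: str


def contains_answer(response: str, expected_keyword: str) -> EvalResult:
    """Hypothetical eval: pass if the response mentions the expected keyword."""
    passed = expected_keyword.lower() in response.lower()
    reason = "keyword found" if passed else f"missing keyword: {expected_keyword!r}"
    return EvalResult(name="contains_answer", passed=passed, reason=reason)


# Illustrative examples; in practice the responses come from your LLM app.
examples = [
    {"response": "The capital of France is Paris.", "expected_keyword": "Paris"},
    {"response": "I am not sure.", "expected_keyword": "Paris"},
]

results = [contains_answer(**example) for example in examples]
for result in results:
    status = "PASS" if result.passed else "FAIL"
    print(result.name, status, "-", result.reason)
```

Like a unit test, each eval returns a clear verdict plus a reason, so a regression shows up as a changed verdict rather than a vague impression.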

Documentation

See https://docs.athina.ai for the complete documentation.

Quick Start

1. Install the package

pip install athina-evals

2. Get an Athina API key

(free, and only takes about 30 seconds)

3. Set API keys

import os

from athina.keys import AthinaApiKey, OpenAiApiKey

# Read both keys from environment variables
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))
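
Note that `os.getenv` returns `None` when a variable is unset, so a missing key may only surface later as a confusing error. A small guard like the one below can fail fast instead. `require_env` is a hypothetical helper, not part of the Athina SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, the calls above could be written as, for example, `OpenAiApiKey.set_key(require_env('OPENAI_API_KEY'))`.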

4. Run evals

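from athina.evals import DoesResponseAnswerQuery
from athina.loaders import RagLoader

# Load the data from CSV, JSON, Athina or Dictionary
# (json_file is the path to your own JSON dataset)
dataset = RagLoader().load_json(json_file)

# Run the DoesResponseAnswerQuery evaluator on the dataset
DoesResponseAnswerQuery().run_batch(data=dataset)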
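
The snippet above expects `json_file` to point at an existing dataset. As a rough sketch, the following writes a small JSON file using only the standard library. The field names (`query`, `context`, `response`) and the file name are assumptions for illustration, not a confirmed schema; check the documentation for the format your SDK version expects.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records: the field names are an assumption, not a confirmed schema.
records = [
    {
        "query": "What is the capital of France?",
        "context": ["France is a country in Europe. Its capital is Paris."],
        "response": "The capital of France is Paris.",
    },
]

# Write the dataset to a temporary location (hypothetical file name).
json_file = Path(tempfile.gettempdir()) / "athina_sample_dataset.json"
json_file.write_text(json.dumps(records, indent=2), encoding="utf-8")

# Read it back to confirm the file is valid JSON.
loaded = json.loads(json_file.read_text(encoding="utf-8"))
print(f"Wrote {len(loaded)} record(s) to {json_file}")
```

If the loader expects a string path rather than a `Path` object, pass `str(json_file)`.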

Why should I use Athina's Evals instead of writing my own?

You could build your own eval system from scratch, but here's why Athina might be better for you:

Athina provides you with plug-and-play preset evals that have been well-tested
Athina evals can run in both development and production, giving you consistent metrics for evaluating model performance and drift.
Athina removes the need for your team to write boilerplate loaders, implement LLMs, normalize data formats, etc.
Athina offers a modular, extensible framework for writing and running evals
Athina calculates analytics like pass rate and flakiness, and allows you to batch run evals against live production data or dev datasets
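
To give a feel for the analytics mentioned in the last point, here is a minimal sketch of how pass rate and flakiness could be computed by hand. The definitions are assumptions for illustration (pass rate as the share of passing results, flakiness as the share of datapoints whose repeated runs disagree) and may differ from how Athina computes them. The sample data is hypothetical.

```python
from typing import Dict, List

# Hypothetical outcomes: each datapoint maps to pass/fail results
# from repeated runs of the same eval.
runs: Dict[str, List[bool]] = {
    "datapoint-1": [True, True, True],
    "datapoint-2": [True, False, True],
    "datapoint-3": [False, False, False],
    "datapoint-4": [True, True, True],
}


def pass_rate(runs: Dict[str, List[bool]]) -> float:
    """Share of all individual results that passed (assumed definition)."""
    all_results = [result for results in runs.values() for result in results]
    return sum(all_results) / len(all_results)


def flakiness(runs: Dict[str, List[bool]]) -> float:
    """Share of datapoints whose repeated runs disagree (assumed definition)."""
    flaky = sum(1 for results in runs.values() if len(set(results)) > 1)
    return flaky / len(runs)


print(f"pass rate: {pass_rate(runs):.2f}")
print(f"flakiness: {flakiness(runs):.2f}")
```

In this sample, 8 of 12 results pass and 1 of 4 datapoints gives inconsistent verdicts, so the eval passes about two thirds of the time and is flaky on a quarter of the data.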

Athina Evals Platform

Need Production Monitoring and Evals? We've got you covered...

Athina eval runs are automatically written to the Athina Dashboard, so you can view results and analytics in a beautiful UI.
Athina tracks your experiments automatically, so you can view a historical record of previous eval runs.
Athina calculates analytics segmented at every level possible, so you can view and compare your model performance at very granular levels.

Athina Observe Platform

About Athina

Athina is building an end-to-end LLM monitoring and evaluation platform.

Website | Demo Video

Project details

Release history

1.6.10

Nov 11, 2024

1.6.9

Nov 5, 2024

1.6.8

Nov 5, 2024

1.6.7

Oct 22, 2024

1.6.6

Oct 16, 2024

1.6.5

Oct 14, 2024

1.6.4

Oct 11, 2024

1.6.3

Aug 17, 2024

1.6.2

Aug 16, 2024

1.6.1

Aug 12, 2024

1.6.0

Aug 11, 2024

1.5.30

Oct 11, 2024

1.5.29

Oct 10, 2024

1.5.28

Oct 9, 2024

1.5.27

Oct 8, 2024

1.5.26

Oct 4, 2024

1.5.25

Oct 2, 2024

1.5.24

Oct 2, 2024

1.5.23

Sep 30, 2024

1.5.22

Sep 26, 2024

1.5.21

Sep 26, 2024

1.5.20

Sep 25, 2024

1.5.19

Sep 25, 2024

1.5.18

Sep 20, 2024

1.5.17

Sep 20, 2024

1.5.16

Sep 19, 2024

1.5.15

Sep 18, 2024

1.5.14

Sep 16, 2024

1.5.13

Sep 13, 2024

1.5.12

Aug 28, 2024

1.5.11

Aug 22, 2024

1.5.10

Aug 22, 2024

1.5.9

Aug 20, 2024

1.5.8

Aug 5, 2024

1.5.7

Aug 5, 2024

1.5.6

Aug 3, 2024

1.5.5

Aug 2, 2024

1.5.4

Aug 1, 2024

1.5.3

Aug 1, 2024

1.5.2

Jul 30, 2024

1.5.1

Jul 23, 2024

1.5.0

Jul 19, 2024

1.4.28

Jul 17, 2024

1.4.27

Jul 12, 2024

1.4.26

Jul 7, 2024

1.4.25

Jul 5, 2024

1.4.24

Jul 3, 2024

1.4.22

Jul 3, 2024

1.4.21

Jul 3, 2024

1.4.20

Jul 2, 2024

1.4.19

Jun 25, 2024

1.4.18

Jun 25, 2024

1.4.17

Jun 21, 2024

1.4.16

Jun 21, 2024

1.4.15

Jun 18, 2024

1.4.14

Jun 13, 2024

1.4.13

Jun 12, 2024

1.4.12

Jun 12, 2024

1.4.11

Jun 12, 2024

1.4.10

Jun 11, 2024

1.4.9

Jun 6, 2024

1.4.8

Jun 5, 2024

1.4.7

Jun 4, 2024

1.4.6

Jun 3, 2024

1.4.5

Jun 3, 2024

1.4.4

Jun 2, 2024

1.4.3

Jun 1, 2024

1.4.2

Jun 1, 2024

1.4.1

May 30, 2024

1.4.0

May 29, 2024

1.3.3

May 28, 2024

1.3.2

May 27, 2024

1.3.1

May 25, 2024

1.3.0

May 22, 2024

1.2.19

May 16, 2024

1.2.18

May 14, 2024

1.2.17

May 14, 2024

1.2.16

May 11, 2024

1.2.15

May 3, 2024

1.2.14

Apr 19, 2024

1.2.13

Apr 19, 2024

1.2.12

Apr 16, 2024

1.2.11

Apr 16, 2024

1.2.10

Apr 13, 2024

1.2.9

Apr 13, 2024

1.2.8

Apr 7, 2024

1.2.7

Mar 31, 2024

1.2.6

Mar 30, 2024

1.2.5

Mar 27, 2024

1.2.4

Mar 27, 2024

1.2.3

Mar 27, 2024

1.2.2

Mar 27, 2024

1.2.1

Mar 20, 2024

1.2.0

Mar 20, 2024

1.1.5

Mar 20, 2024

1.1.4

Mar 6, 2024

1.1.3

Mar 6, 2024

1.1.2

Mar 6, 2024

1.1.1

Mar 3, 2024

1.1.0

Mar 3, 2024

1.0.4

Feb 29, 2024

1.0.3

Feb 28, 2024

1.0.2

Feb 13, 2024

1.0.1

Feb 5, 2024

1.0.0

Jan 30, 2024

0.3.7

Jan 30, 2024

0.3.6

Jan 25, 2024

0.3.5

Jan 24, 2024

0.3.4

Jan 24, 2024

0.3.3

Jan 24, 2024

0.3.2

Jan 24, 2024

0.3.1

Jan 22, 2024

0.3.0

Jan 19, 2024

0.2.0

Jan 17, 2024

0.1.9

Jan 16, 2024

0.1.8

Jan 15, 2024

0.1.7

Jan 15, 2024

0.1.6

Jan 14, 2024

0.1.5

Jan 11, 2024

0.1.4

Dec 26, 2023

0.1.3

Dec 26, 2023

0.1.2

Dec 21, 2023

0.1.1

Dec 21, 2023

This version

0.1.0

Dec 15, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

athina-0.1.0.tar.gz (25.1 kB)

Uploaded Dec 15, 2023 Source

Built Distribution

athina-0.1.0-py3-none-any.whl (38.7 kB)

Uploaded Dec 15, 2023 Python 3

Hashes for athina-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ea38ee6ff16cf8bbc82a4761d57bbab09fefaaffa058bfb9d63e8b7ad661def0`
MD5	`7eeed678dd6782b6f75d3c2669af9829`
BLAKE2b-256	`107b2034b28c494d3ba511537196ab43604765fb45eb283c28e2cf292bce659a`

Hashes for athina-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`27a66e6a541b869a1b311a78ab0ecf8553313eae19de635f081771f8b14f8c00`
MD5	`84bcd6d92d34ad7aafbef66167384b8a`
BLAKE2b-256	`6bb09ee599747fd7320f5816e5cee1fe2c7aca7d6cc6174be5ce07e17db71004`
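
If you download one of these files manually, you can compare its digest against the SHA256 value listed above. The sketch below uses only Python's standard `hashlib` module; `sha256_of` and `verify_download` are hypothetical helper names, and the local file path is whatever location you saved the file to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify_download(Path("athina-0.1.0-py3-none-any.whl"), "27a66e6a541b869a1b311a78ab0ecf8553313eae19de635f081771f8b14f8c00")` should return `True` for an intact copy of the wheel.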