athina

Python SDK to configure and run evaluations for your LLM-based application

Project description

Overview

Athina is an open-source library with plug-and-play preset evals, designed to help engineers systematically improve the reliability and performance of their LLM applications through eval-driven development.

Why you need evals

Evaluations (evals) play a crucial role in assessing the performance of LLM responses, especially when scaling from prototyping to production.

They are akin to unit tests for LLM applications, allowing developers to:

Catch and prevent hallucinations and bad outputs
Measure model performance
Run quantifiable experiments against ambiguous, unstructured text data
A/B test different models and prompts rapidly
Detect regressions before they get to production
Monitor production data with confidence
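
To make the unit-test analogy concrete, here is a minimal sketch in plain Python. It does not use the Athina API: the `Example` type, the `response_is_non_empty` check, and the `run_eval` helper are all hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    """One query/response pair to evaluate (hypothetical shape)."""
    query: str
    response: str


def response_is_non_empty(example: Example) -> bool:
    """Hypothetical eval: passes when the response contains any text."""
    return bool(example.response.strip())


def run_eval(evaluator: Callable[[Example], bool], examples: List[Example]) -> List[Example]:
    """Run the evaluator over every example and return the ones that fail."""
    return [example for example in examples if not evaluator(example)]


examples = [
    Example("What is the capital of France?", "The capital of France is Paris."),
    Example("Who wrote Hamlet?", ""),
]

# Like a unit test, the eval gives a pass/fail signal per example,
# so a regression shows up as a new entry in the failures list.
failures = run_eval(response_is_non_empty, examples)
for failed in failures:
    print(f"FAILED: {failed.query!r}")
```

A real eval would usually be less trivial than a non-empty check (for instance, an LLM-graded judgment), but the shape is the same: a function applied uniformly to a dataset, with failures collected for review.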

🔴 Problem: Flaws with Current LLM Developer Workflows

The journey from a demo AI to a reliable production application is not easy.

Developers usually start iterating on performance by manually inspecting the outputs. Eventually they progress to using spreadsheets, CSVs, or evaluating against a golden dataset.

Each method has its own drawbacks and requires different tooling and evaluation methods.

Setting up good infrastructure for running evals takes a lot of manual effort: creating a dataset, reviewing the responses, writing evals, building internal tooling and dashboards, and tracking experiment parameters and metrics for the historical record.

Eventually, every LLM developer realizes they need evals, along with infrastructure to run them consistently and track iterations, in order to improve performance and reliability systematically.

🟢 Solution: Athina Evals

Github | Watch Demo Video | Docs

Athina is an open-source library that offers a system for eval-driven development, overcoming the limitations of traditional workflows.

Our solution supports rapid experimentation and customizable evaluators with consistent metrics.

Here’s why this is better than building in-house eval infrastructure:

Plug-and-Play Preset Evals: Ready-to-use evals for immediate application.
Integrated Dashboard: Track experiments and inspect the results in a web UI.
Custom Evaluators: A flexible framework for crafting tailored evals.
Consistent Metrics: Uniform evaluation standards across all stages. Evaluate your model in dev and prod using a consistent set of metrics.
Historical Record: Automatic tracking of every prompt iteration.
Quick Start: Easy 5-minute setup.

Here’s a demo video.

Quick Start

The easiest way to get started is to use one of our Example Notebooks as a starting point.

To get started with Athina Evals:

1. Install the athina package

pip install athina

2. Set your API keys

If you are using the Python SDK, you can set the API keys like this:

import os

from athina.keys import AthinaApiKey, OpenAiApiKey

# Read both keys from environment variables rather than hard-coding them
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

If you are using the CLI, then run athina init, and enter the API keys when prompted.
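
Note that `os.getenv` returns `None` when a variable is unset, so in the SDK snippet above a missing key would be passed along silently. One way to fail early is a small helper like the sketch below; `require_env` is a hypothetical name, not part of the athina package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, the calls above could be written as `OpenAiApiKey.set_key(require_env('OPENAI_API_KEY'))`, so a missing key surfaces as a clear error at startup rather than later during an eval run.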

3. Load your dataset

The example below loads a JSON file. You can also load data from a CSV file or a Python dictionary.

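from athina.loaders import RagLoader

# Placeholder path: point this at your own JSON dataset file
json_filepath = 'path/to/dataset.json'
dataset = RagLoader().load_json(json_filepath)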
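
This page does not show what the JSON file should contain. As a rough sketch, the snippet below writes a small dataset file using only the standard library. The field names `query`, `context`, and `response` are an assumption about what a RAG dataset record might look like, not a confirmed schema; check the documentation for the fields `RagLoader` actually expects.

```python
import json
import os
import tempfile

# ASSUMPTION: the record shape (field names and types) is illustrative only.
records = [
    {
        "query": "What is the capital of France?",
        "context": ["France is a country in Europe. Its capital is Paris."],
        "response": "The capital of France is Paris.",
    },
]

# Write the records to a temporary JSON file
json_filepath = os.path.join(tempfile.mkdtemp(), "rag_dataset.json")
with open(json_filepath, "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)

# Read the file back to confirm it round-trips as a list of records
with open(json_filepath, encoding="utf-8") as f:
    loaded = json.load(f)
```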

4. Run evals like this:

from athina.evals import DoesResponseAnswerQuery

DoesResponseAnswerQuery().run_batch(data=dataset)

For more detailed guides, you can follow the links below to get started running evals using Athina.

Preset Evals

You can use our preset evaluators to add evaluation to your dev stack rapidly.

Here are the preset evaluators in this library:

RAG Evals

These evals are useful for evaluating LLM applications with Retrieval Augmented Generation (RAG).

We have also built other evaluators that are not yet part of this library (but will be soon). You can find more information about these in our documentation.

Summarization Accuracy Evals:

These evals are useful for evaluating LLM-powered summarization performance.

More Evals

Other Evals

Custom Evals

See this page for more information on how to write your own custom evals.

Why should I use Athina's Evals instead of writing my own?

You could build your own eval system from scratch, but here's why Athina might be better for you:

Athina provides you with plug-and-play preset evals that have been well-tested
Athina evals can run in both development and production, giving you consistent metrics for evaluating model performance and drift
Athina removes the need for your team to write boilerplate loaders, implement LLMs, normalize data formats, etc.
Athina offers a modular, extensible framework for writing and running evals
Athina calculates analytics like pass rate and flakiness, and allows you to batch run evals against live production data or dev datasets
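
To illustrate what analytics such as pass rate and flakiness can mean, here is a small plain-Python sketch. The definitions are assumptions made for illustration (pass rate as the fraction of passing runs, flakiness as the fraction of examples whose repeated runs disagree) and may differ from how Athina computes them; the function names are hypothetical.

```python
from typing import Dict, List


def pass_rate(results: List[bool]) -> float:
    """Fraction of eval runs that passed (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(results) / len(results)


def flakiness(runs_per_example: Dict[str, List[bool]]) -> float:
    """Fraction of examples whose repeated runs did not all agree."""
    if not runs_per_example:
        return 0.0
    flaky = sum(1 for runs in runs_per_example.values() if len(set(runs)) > 1)
    return flaky / len(runs_per_example)


# Each example was evaluated three times; True means the eval passed.
runs = {
    "example-1": [True, True, True],
    "example-2": [True, False, True],
    "example-3": [False, False, False],
    "example-4": [True, True, False],
}

all_results = [result for results in runs.values() for result in results]
overall_pass_rate = pass_rate(all_results)
flaky_fraction = flakiness(runs)
print(f"pass rate: {overall_pass_rate:.2f}, flaky examples: {flaky_fraction:.2f}")
```

Tracking both numbers is useful because a high pass rate can hide examples that pass only some of the time, which is a common symptom of nondeterministic model output or an unstable LLM-graded eval.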

Need Production Monitoring and Evals? We've got you covered...

Athina eval runs are automatically written to the Athina Dashboard, so you can view results and analytics in a beautiful UI.
Athina tracks your experiments automatically, so you can view a historical record of previous eval runs.
Athina calculates analytics segmented at every level possible, so you can view and compare your model performance at very granular levels.

Athina Observe Platform

About Athina

Athina is building an end-to-end LLM monitoring and evaluation platform.

Website | Demo Video

Download files

Source Distribution

athina-0.1.4.tar.gz (36.2 kB view details)

Uploaded Dec 26, 2023 Source

Built Distribution

athina-0.1.4-py3-none-any.whl (54.4 kB view details)

Uploaded Dec 26, 2023 Python 3

File details

Details for the file athina-0.1.4.tar.gz.

File metadata

Download URL: athina-0.1.4.tar.gz
Upload date: Dec 26, 2023
Size: 36.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.6.1 CPython/3.9.16 Darwin/23.0.0

File hashes

Hashes for athina-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`506cf11d1f9b3befbd727dec6f18091d6f18dafec1368ea473da3365ee23bf76`
MD5	`795f3d77bbd5c04be987a29906364e99`
BLAKE2b-256	`343e21611c553fc9b78b8cc06bcc848c0f3f04ca3b5d51a0c057d605188689e6`

File details

Details for the file athina-0.1.4-py3-none-any.whl.

File metadata

Download URL: athina-0.1.4-py3-none-any.whl
Upload date: Dec 26, 2023
Size: 54.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.6.1 CPython/3.9.16 Darwin/23.0.0

File hashes

Hashes for athina-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`711796b5caf601ebbba6fb08242f06a29c61ab8f7f21f96c7f881ffacf10f274`
MD5	`f4d3ee5e5b82ee90fafeae9676dc1059`
BLAKE2b-256	`afcdfadde4e1ac7e8361b305e5e3ef23e3230c59b4112bd4d1274506cd985c51`
