# LastMile AI Eval
An SDK to measure evaluation criteria (e.g., faithfulness) of generative AI outputs.

Specifically, we evaluate based on this triplet of information:

- User query
- Data that goes into the LLM
- LLM's output response
The method `get_rag_eval_scores()` takes these 3 arguments (along with others, such as `api_token`) and outputs a faithfulness score between 0 and 1.
## Usage

To use this library, add the following to your code, replacing `queries`, `data`, and `responses` with your own values.
```python
from lastmile_eval.rag import get_rag_eval_scores

statement1 = "the sky is red"
statement2 = "the sky is blue"

queries = ["what color is the sky?", "is the sky blue?"]
data = [statement1, statement1]
responses = [statement1, statement2]
api_token = "<lastmile-api-token>"  # replace with your LastMile API token

result = get_rag_eval_scores(
    queries,
    data,
    responses,
    api_token,
)

# result will look something like:
# {'p_faithful': [0.9955534338951111, 6.857347034383565e-05]}
```
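Each entry in `p_faithful` is a probability that the corresponding response is faithful to the supplied data, so a common next step is to threshold the scores into pass/fail labels. Below is a minimal sketch of that post-processing; `FAITHFULNESS_THRESHOLD` and `label_faithfulness` are hypothetical helpers for illustration, not part of the SDK.

```python
# Hypothetical post-processing of get_rag_eval_scores() output.
# Scores near 1 mean the response is supported by the data;
# scores near 0 mean it likely is not.

FAITHFULNESS_THRESHOLD = 0.5  # assumed cutoff; tune for your application


def label_faithfulness(result: dict, threshold: float = FAITHFULNESS_THRESHOLD) -> list:
    """Convert p_faithful probabilities into boolean faithful/unfaithful labels."""
    return [score >= threshold for score in result["p_faithful"]]


# Using scores shaped like the example output above:
result = {"p_faithful": [0.9955534338951111, 6.857347034383565e-05]}
print(label_faithfulness(result))  # [True, False]
```

In the example above, the first response ("the sky is red") matches the retrieved data exactly and passes, while the second ("the sky is blue") contradicts it and fails.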