
A library with LangChain instrumentation to evaluate LLM-based applications.

Project description

Welcome to TruLens-Eval!


Evaluate and track your LLM experiments with TruLens. As you work on your models and prompts, TruLens-Eval supports the iterative development of a wide range of LLM applications by wrapping your application to log key metadata across the entire chain (or outside of a chain, if your project does not use chains) on your local machine.

Using feedback functions, you can objectively evaluate the quality of the responses an LLM provides to your requests. Evaluation adds minimal latency, since it runs as a sequential call alongside your application, and the results are logged to your local machine. Finally, we provide an easy-to-use Streamlit dashboard, run locally on your machine, to help you better understand your LLM's performance.
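For example, once your app has logged a few records, the dashboard can be started from two lines of Python. A minimal sketch, assuming the Tru session object from the trulens_eval package (API as of the 0.x releases):

    from trulens_eval import Tru

    tru = Tru()          # connects to the local database of records and feedback results
    tru.run_dashboard()  # serves the Streamlit dashboard at a local URL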

Value Propositions

TruLens-Eval has two key value propositions:

  1. Evaluation:
    • TruLens supports the evaluation of inputs, outputs, and internals of your LLM application using any model (including LLMs).
    • A number of feedback functions for evaluation are implemented out of the box, such as groundedness, relevance, and toxicity. The framework is also easily extensible for custom evaluation requirements (see the sketch after this list).
  2. Tracking:
    • TruLens contains instrumentation for any LLM application including question answering, retrieval-augmented generation, agent-based applications and more. This instrumentation allows for the tracking of a wide variety of usage metrics and metadata. Read more in the instrumentation overview.
    • TruLens' instrumentation can be applied to any LLM application without being tied down to a given framework. Additionally, deep integrations with LangChain and Llama-Index allow the capture of internal metadata and text.
    • Anything that is tracked by the instrumentation can be evaluated!
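To make both value propositions concrete, below is a minimal sketch of tracking and evaluating a LangChain app. It assumes the Feedback, Huggingface, and TruChain classes from the trulens_eval quickstarts, working OPENAI_API_KEY and HUGGINGFACE_API_KEY environment variables, and a toy LLMChain standing in for your real app:

    from langchain.chains import LLMChain
    from langchain.llms import OpenAI
    from langchain.prompts import PromptTemplate

    from trulens_eval import Feedback, TruChain
    from trulens_eval.feedback import Huggingface

    # A toy LangChain app to instrument.
    chain = LLMChain(
        llm=OpenAI(),
        prompt=PromptTemplate.from_template("Answer briefly: {question}"),
    )

    # Out-of-the-box feedback function: is the response in the same language
    # as the prompt? Evaluated via the Hugging Face inference API.
    hugs = Huggingface()
    f_lang_match = Feedback(hugs.language_match).on_input_output()

    # Wrap the chain so every call is logged and evaluated locally.
    truchain = TruChain(chain, app_id="Chain1", feedbacks=[f_lang_match])
    truchain("What is the capital of France?")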

The process for building your evaluated and tracked LLM application with TruLens is below 👇

Architecture Diagram

Installation and Setup

Install the trulens-eval pip package from PyPI.

    pip install trulens-eval

Setting Keys

In any of the quickstarts, you will need OpenAI and Hugging Face API keys. You can add keys by setting the environment variables:

    import os

    os.environ["OPENAI_API_KEY"] = "..."
    os.environ["HUGGINGFACE_API_KEY"] = "..."

Quick Usage

TruLens supports evaluation and tracking for any LLM app framework. Choose a framework below to get started:

Langchain

langchain_quickstart.ipynb

langchain_quickstart.py

Llama-Index

llama_index_quickstart.ipynb

llama_index_quickstart.py

No Framework

text2text_quickstart.ipynb

text2text_quickstart.py
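Even without a framework, a plain text-to-text Python function can be wrapped and evaluated. A minimal sketch along the lines of the text2text quickstart; it assumes the TruBasicApp wrapper and the OpenAI feedback provider from trulens_eval, and the exact recording API may differ between versions:

    import openai

    from trulens_eval import Feedback, TruBasicApp
    from trulens_eval.feedback import OpenAI as OpenAIProvider

    # Any text-in, text-out Python function can serve as the app.
    def llm_standalone(prompt: str) -> str:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"]

    # LLM-based check that the response is relevant to the prompt.
    f_relevance = Feedback(OpenAIProvider().relevance).on_input_output()

    # Wrap the function; calls made through the wrapper are logged and evaluated.
    app = TruBasicApp(llm_standalone, app_id="standalone_app", feedbacks=[f_relevance])
    app.call_with_record("What is the capital of France?")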

💡 Contributing

Interested in contributing? See our contribution guide for more details.


Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distribution


trulens_eval-0.13.0a0-py3-none-any.whl (585.5 kB, uploaded for Python 3)

File details

Details for the file trulens_eval-0.13.0a0-py3-none-any.whl.

File hashes

Algorithm    Hash digest
SHA256       6fca30ebc415d63bdfa9ecf6710a44b224f8aa1a97532a1b630fff68c9b77c36
MD5          11a1ce9fce398bad2446cd09a70d39ce
BLAKE2b-256  7188560302417f1325450cd7680d8050588e2915bf04436b6f381258ff34959b
