The LLM Evaluation Framework


DeepEval is a simple-to-use, open-source LLM evaluation framework for evaluating and testing large language model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs on metrics such as G-Eval, task completion, answer relevancy, and hallucination, which use LLM-as-a-judge and other NLP models that run locally on your machine for evaluation.

Whether your LLM applications are AI agents, RAG pipelines, or chatbots, implemented via LangChain or OpenAI, DeepEval has you covered. With it, you can determine the optimal models, prompts, and architecture to improve your RAG pipelines and agentic workflows, prevent prompt drift, or even transition from OpenAI to hosting your own DeepSeek R1 with confidence.

Important: Need a place for your DeepEval testing data to live 🏡❤️? Sign up for the DeepEval platform to compare iterations of your LLM app, generate & share testing reports, and more.

Want to talk LLM evaluation, need help picking metrics, or just want to say hi? Come join our Discord.

🔥 Metrics and Features

🥳 You can now share DeepEval's test results on the cloud directly on Confident AI

- Supports both end-to-end and component-level LLM evaluation.
- Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by ANY LLM of your choice, statistical methods, or NLP models that run locally on your machine:
  - G-Eval
  - DAG (deep acyclic graph)
  - RAG metrics:
    - Answer Relevancy
    - Faithfulness
    - Contextual Recall
    - Contextual Precision
    - Contextual Relevancy
    - RAGAS
  - Agentic metrics:
    - Task Completion
    - Tool Correctness
  - Others:
    - Hallucination
    - Summarization
    - Bias
    - Toxicity
  - Conversational metrics:
    - Knowledge Retention
    - Conversation Completeness
    - Conversation Relevancy
    - Role Adherence
  - etc.
- Build your own custom metrics that are automatically integrated with DeepEval's ecosystem.
- Generate synthetic datasets for evaluation.
- Integrates seamlessly with ANY CI/CD environment.
- Red team your LLM application for 40+ safety vulnerabilities in a few lines of code, using 10+ advanced attack enhancement strategies such as prompt injection. Vulnerabilities include:
  - Toxicity
  - Bias
  - SQL Injection
  - etc.
- Easily benchmark ANY LLM on popular LLM benchmarks in under 10 lines of code, including:
  - MMLU
  - HellaSwag
  - DROP
  - BIG-Bench Hard
  - TruthfulQA
  - HumanEval
  - GSM8K
- 100% integrated with Confident AI for the full evaluation & observability lifecycle:
  - Curate/annotate evaluation datasets on the cloud
  - Benchmark your LLM app using datasets, and compare with previous iterations to find which models/prompts work best
  - Fine-tune metrics for custom results
  - Debug evaluation results via LLM traces
  - Monitor & evaluate LLM responses in production to improve datasets with real-world data
  - Repeat until perfection

Note: DeepEval is available on Confident AI, an LLM evals platform for AI observability and quality. Create an account on Confident AI to get started.

🔌 Integrations

🦄 LlamaIndex, to unit test RAG applications in CI/CD
🤗 Hugging Face, to enable real-time evaluations during LLM fine-tuning

🚀 QuickStart

Let's pretend your LLM application is a RAG-based customer support chatbot; here's how DeepEval can help test what you've built.

Installation

DeepEval works with Python 3.9+.

pip install -U deepeval

Create an account (highly recommended)

Using the DeepEval platform lets you generate shareable testing reports on the cloud. It is free, takes no additional code to set up, and we highly recommend giving it a try.

To login, run:

deepeval login

Follow the instructions in the CLI to create an account, copy your API key, and paste it into the CLI. All test cases will automatically be logged (see our docs for more information on data privacy).

Writing your first test case

Create a test file:

touch test_chatbot.py

Open test_chatbot.py and write your first test case to run an end-to-end evaluation using DeepEval, which treats your LLM app as a black box:

import pytest
from deepeval import assert_test
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

def test_case():
    correctness_metric = GEval(
        name="Correctness",
        criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
        evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
        threshold=0.5
    )
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # Replace this with the actual output from your LLM application
        actual_output="You have 30 days to get a full refund at no extra cost.",
        expected_output="We offer a 30-day full refund at no extra costs.",
        retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
    )
    assert_test(test_case, [correctness_metric])

Set your OPENAI_API_KEY as an environment variable (you can also evaluate using your own custom model; see our docs for details):

export OPENAI_API_KEY="..."

And finally, run test_chatbot.py in the CLI:

deepeval test run test_chatbot.py

Congratulations! Your test case should have passed ✅. Let's break down what happened.

The variable input mimics a user input, and actual_output is a placeholder for what your application is supposed to output based on this input.
The variable expected_output represents the ideal answer for a given input, and GEval is a research-backed metric provided by deepeval for evaluating your LLM outputs on any custom criteria with human-like accuracy.
In this example, the metric's criterion is the correctness of actual_output relative to the provided expected_output.
All metric scores range from 0 to 1, and the threshold=0.5 threshold ultimately determines whether your test passes.
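To make the 0 to 1 score and the threshold more concrete, here is a simplified sketch of G-Eval-style scoring. The G-Eval paper (Liu et al., 2023) proposes weighting each candidate score the judge LLM could emit by the probability the judge assigns to it, instead of keeping a single sampled score; as we understand it, DeepEval's GEval does something similar when token log-probabilities are available and normalizes the result to 0 to 1. This is not DeepEval's implementation: the function names, the 0 to 10 judge scale, and the input format are assumptions for illustration. The pass/fail check shown is for higher-is-better metrics; some metrics may treat the threshold as an upper bound instead, so check each metric's docs.

```python
def weighted_score(score_probs: dict[int, float], max_score: int = 10) -> float:
    """Probability-weighted judge score, normalized to [0, 1] (illustrative sketch).

    score_probs maps each candidate integer score (0..max_score) to the
    probability the judge assigned to it. Probabilities are renormalized,
    e.g. for the case where only the top-k tokens were returned.
    """
    total = sum(score_probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    for s in score_probs:
        if not 0 <= s <= max_score:
            raise ValueError(f"score {s} outside 0..{max_score}")
    # Expected score under the judge's (renormalized) distribution.
    expected = sum(s * p for s, p in score_probs.items()) / total
    return expected / max_score


def passes(score: float, threshold: float = 0.5) -> bool:
    """Higher-is-better check: pass when the score reaches the threshold."""
    return score >= threshold


# The judge leans towards 7 out of 10, with some mass on 8 and 9.
score = weighted_score({7: 0.5, 8: 0.3, 9: 0.2})
print(round(score, 2), passes(score))  # prints: 0.77 True
```

Weighting by probability gives a finer-grained score than a single integer, which helps separate outputs the judge would otherwise rate identically.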

Read our documentation for more options for running end-to-end evaluations, using additional metrics, and creating your own custom metrics, as well as tutorials on integrating with other tools like LangChain and LlamaIndex.

Evaluating Nested Components

If you wish to evaluate individual components within your LLM app, you need to run component-level evals - a powerful way to evaluate any component within an LLM system.

Simply trace "components" such as LLM calls, retrievers, tool calls, and agents within your LLM application using the @observe decorator to apply metrics at the component level. Tracing with deepeval is non-intrusive (see our docs to learn more) and helps you avoid rewriting your codebase just for evals:

from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
from deepeval.dataset import Golden
from deepeval.metrics import GEval
from deepeval import evaluate

correctness = GEval(
    name="Correctness",
    criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
)

@observe(metrics=[correctness])
def inner_component():
    # Component can be anything from an LLM call, retrieval, agent, tool use, etc.
    # The metric above needs both actual_output and expected_output on the test case.
    update_current_span(test_case=LLMTestCase(input="...", actual_output="...", expected_output="..."))
    return

@observe
def llm_app(input: str):
    inner_component()
    return

evaluate(observed_callback=llm_app, goldens=[Golden(input="Hi!")])

You can learn everything about component-level evaluations in our docs.

Evaluating Without Pytest Integration

Alternatively, you can evaluate without Pytest, which is more suited for a notebook environment.

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.7)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra costs.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
)
evaluate([test_case], [answer_relevancy_metric])

Using Standalone Metrics

DeepEval is extremely modular, making it easy for anyone to use any of our metrics. Continuing from the previous example:

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.7)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra costs.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
)

answer_relevancy_metric.measure(test_case)
print(answer_relevancy_metric.score)
# All metrics also offer an explanation
print(answer_relevancy_metric.reason)

Note that some metrics are for RAG pipelines, while others are for fine-tuning. Make sure to use our docs to pick the right one for your use case.
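Because every metric follows the same measure-then-read pattern shown above (call measure(), then read score and reason), a simple non-LLM scorer can be written in the same shape. The sketch below is purely illustrative: `ToyCase` and `ExactMatchSketch` are hypothetical classes that do not subclass or plug into deepeval; to build a real custom metric, follow the custom-metric guide in our docs.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyCase:
    """Hypothetical stand-in for a test case with actual and expected outputs."""
    actual_output: str
    expected_output: str


class ExactMatchSketch:
    """Toy non-LLM metric mirroring the measure() -> score/reason pattern."""

    def __init__(self, threshold: float = 1.0) -> None:
        self.threshold = threshold
        self.score: Optional[float] = None
        self.reason: Optional[str] = None

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase, drop punctuation, and collapse whitespace before comparing.
        return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())

    def measure(self, case: ToyCase) -> float:
        match = self._normalize(case.actual_output) == self._normalize(case.expected_output)
        self.score = 1.0 if match else 0.0
        self.reason = (
            "Normalized outputs are identical." if match else "Normalized outputs differ."
        )
        return self.score

    def is_successful(self) -> bool:
        # Not successful until measure() has produced a score at or above the threshold.
        return self.score is not None and self.score >= self.threshold


metric = ExactMatchSketch()
metric.measure(ToyCase("Hello,  World!", "hello world"))
print(metric.score, metric.reason)
```

Deterministic scorers like this are cheap and reproducible, but they only suit tasks with a single acceptable answer; open-ended outputs are what LLM-as-a-judge metrics such as G-Eval are for.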

Evaluating a Dataset / Test Cases in Bulk

In DeepEval, a dataset is simply a collection of test cases. Here is how you can evaluate these in bulk:

import pytest
from deepeval import assert_test
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def your_llm_app(input: str) -> str:
    # Placeholder: replace this with a call to your actual LLM application
    return "..."

dataset = EvaluationDataset(goldens=[Golden(input="What's the weather like today?")])

# Turn each golden into a test case by running your app on its input
for golden in dataset.goldens:
    test_case = LLMTestCase(
        input=golden.input,
        actual_output=your_llm_app(golden.input)
    )
    dataset.add_test_case(test_case)

@pytest.mark.parametrize(
    "test_case",
    dataset.test_cases,
)
def test_customer_chatbot(test_case: LLMTestCase):
    answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)
    assert_test(test_case, [answer_relevancy_metric])

Run this in the CLI (you can also add the optional -n flag to run tests in parallel):

deepeval test run test_<filename>.py -n 4

Alternatively, although we recommend using deepeval test run, you can evaluate a dataset/test cases without using our Pytest integration:

from deepeval import evaluate
...

evaluate(dataset, [answer_relevancy_metric])
# or
dataset.evaluate([answer_relevancy_metric])
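The bulk example builds test cases by calling your app once per golden, one after another. If your app is I/O-bound (for example, waiting on an LLM API), one option is to generate the actual outputs concurrently first. The framework-agnostic sketch below shows this with a thread pool; `ToyGolden`, `ToyTestCase`, and `build_test_cases` are hypothetical stand-ins rather than deepeval classes, and it assumes your app function is safe to call from multiple threads.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToyGolden:
    """Hypothetical stand-in for a golden: an input plus an optional ideal answer."""
    input: str
    expected_output: Optional[str] = None


@dataclass(frozen=True)
class ToyTestCase:
    """Hypothetical stand-in for a test case: a golden plus your app's actual output."""
    input: str
    actual_output: str
    expected_output: Optional[str] = None


def build_test_cases(
    goldens: list[ToyGolden],
    app: Callable[[str], str],
    max_workers: int = 4,
) -> list[ToyTestCase]:
    """Run the app on every golden input concurrently; results keep the goldens' order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, regardless of completion order.
        outputs = list(pool.map(app, [g.input for g in goldens]))
    return [
        ToyTestCase(input=g.input, actual_output=out, expected_output=g.expected_output)
        for g, out in zip(goldens, outputs)
    ]
```

For example, `build_test_cases([ToyGolden("a"), ToyGolden("b", "B")], str.upper)` returns two test cases with actual outputs "A" and "B", in the same order as the goldens.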

A Note on Env Variables (.env / .env.local)

DeepEval auto-loads .env.local and then .env from the current working directory at import time. Precedence, highest first: process env -> .env.local -> .env, so a variable already set in your shell is not overridden by either file. Opt out with DEEPEVAL_DISABLE_DOTENV=1.

cp .env.example .env.local
# then edit .env.local (ignored by git)
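To make the precedence rule concrete, here is a minimal pure-Python sketch of how such a merge can work. It is not DeepEval's loader: `parse_env_file` and `resolve_env` are hypothetical, the parser only handles simple KEY=VALUE lines (no `export` prefixes, multi-line values, or escapes), and treating only the value "1" as opting out is an assumption.

```python
import os
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values  # a missing file is simply ignored
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and simple single/double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_env(cwd: Path, process_env: dict[str, str]) -> dict[str, str]:
    """Merge with precedence (highest first): process env -> .env.local -> .env."""
    if process_env.get("DEEPEVAL_DISABLE_DOTENV") == "1":
        return dict(process_env)  # opted out: use the real environment only
    merged = parse_env_file(cwd / ".env")               # lowest precedence
    merged.update(parse_env_file(cwd / ".env.local"))   # .env.local beats .env
    merged.update(process_env)                          # real environment beats both files
    return merged


effective = resolve_env(Path.cwd(), dict(os.environ))
```

Applying the sources from lowest to highest precedence and letting later updates win keeps the rule easy to verify: whatever is applied last takes effect.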

DeepEval With Confident AI

DeepEval is available on Confident AI, an evals & observability platform that allows you to:

- Curate/annotate evaluation datasets on the cloud
- Benchmark your LLM app using datasets, and compare with previous iterations to find which models/prompts work best
- Fine-tune metrics for custom results
- Debug evaluation results via LLM traces
- Monitor & evaluate LLM responses in production to improve datasets with real-world data
- Repeat until perfection

Everything about Confident AI, including how to use it, is covered in the Confident AI documentation.

To begin, login from the CLI:

deepeval login

Follow the instructions to log in, create your account, and paste your API key into the CLI.

Now, run your test file again:

deepeval test run test_chatbot.py

You should see a link displayed in the CLI once the test has finished running. Paste it into your browser to view the results!

Configuration

Environment variables via .env files

Using .env.local or .env is optional. If they are missing, DeepEval uses your existing environment variables. When present, dotenv environment variables are auto-loaded at import time (unless you set DEEPEVAL_DISABLE_DOTENV=1).


Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Roadmap

Features:

Integration with Confident AI
Implement G-Eval
Implement RAG metrics
Implement Conversational metrics
Evaluation Dataset Creation
Red-Teaming
DAG custom metrics
Guardrails

Authors

Built by the founders of Confident AI. Contact jeffreyip@confident-ai.com for all enquiries.

License

DeepEval is licensed under Apache 2.0 - see the LICENSE.md file for details.

Release history

This version

3.8.4

Feb 4, 2026

3.8.3

Jan 31, 2026

3.8.2

Jan 29, 2026

3.8.1

Jan 22, 2026

3.8.0

Jan 15, 2026

3.7.9

Jan 6, 2026

3.7.8

Jan 2, 2026

3.7.7

Dec 29, 2025

3.7.6

Dec 17, 2025

3.7.5

Dec 9, 2025

3.7.4

Dec 3, 2025

3.7.3

Dec 1, 2025

3.7.2

Nov 17, 2025

3.7.1

Nov 16, 2025

3.7.0

Nov 4, 2025

3.6.9

Oct 28, 2025

3.6.8

Oct 27, 2025

3.6.7

Oct 15, 2025

3.6.6

Oct 8, 2025

3.6.5

Oct 8, 2025

3.6.4

Oct 7, 2025

3.6.3

Oct 7, 2025

3.6.2

Oct 4, 2025

3.6.1

Oct 2, 2025

3.6.0

Sep 30, 2025

3.5.9

Sep 26, 2025

3.5.8

Sep 25, 2025

3.5.7

Sep 25, 2025

3.5.6

Sep 23, 2025

3.5.5

Sep 22, 2025

3.5.4

Sep 20, 2025

3.5.3

Sep 19, 2025

3.5.2

Sep 17, 2025

3.5.1

Sep 16, 2025

3.5.0

Sep 16, 2025

3.4.9

Sep 12, 2025

3.4.8

Sep 8, 2025

3.4.7

Sep 5, 2025

3.4.6

Sep 4, 2025

3.4.5

Sep 4, 2025

3.4.4

Sep 4, 2025

3.4.3

Sep 2, 2025

3.4.2

Aug 29, 2025

3.4.1

Aug 25, 2025

3.4.0

Aug 18, 2025

3.3.9

Aug 10, 2025

3.3.6

Aug 6, 2025

3.3.5

Aug 4, 2025

3.3.4

Aug 2, 2025

3.3.3

Jul 30, 2025

3.3.2

Jul 23, 2025

3.3.1

Jul 23, 2025

3.3.0

Jul 18, 2025

3.2.9

Jul 18, 2025

3.2.8

Jul 16, 2025

3.2.6

Jul 11, 2025

3.2.5

Jul 10, 2025

3.2.4

Jul 7, 2025

3.2.3

Jul 2, 2025

3.2.2

Jun 30, 2025

3.2.1

Jun 27, 2025

3.2.0

Jun 25, 2025

3.1.9

Jun 25, 2025

3.1.8

Jun 23, 2025

3.1.7

Jun 21, 2025

3.1.6

Jun 19, 2025

3.1.5

Jun 19, 2025

3.1.4

Jun 17, 2025

3.1.3

Jun 16, 2025

3.1.0

Jun 12, 2025

3.0.9

Jun 12, 2025

3.0.8

Jun 10, 2025

3.0.7

Jun 8, 2025

3.0.6

Jun 7, 2025

3.0.5

Jun 5, 2025

3.0.4

Jun 5, 2025

3.0.3

Jun 2, 2025

3.0.2

May 30, 2025

3.0.1

May 28, 2025

3.0.0

May 27, 2025

2.9.7

May 23, 2025

2.9.6

May 22, 2025

2.9.5

May 22, 2025

2.9.4

May 20, 2025

2.9.3

May 18, 2025

2.9.2

May 16, 2025

2.9.1

May 15, 2025

2.9.0

May 15, 2025

2.8.9

May 12, 2025

2.8.8

May 8, 2025

2.8.6

May 8, 2025

2.8.5

May 7, 2025

2.8.4

May 6, 2025

2.8.2

Apr 30, 2025

2.8.1

Apr 29, 2025

2.8.0

Apr 29, 2025

2.7.9

Apr 28, 2025

2.7.8

Apr 28, 2025

2.7.6

Apr 23, 2025

2.7.5

Apr 18, 2025

2.7.4

Apr 17, 2025

2.7.3

Apr 15, 2025

2.7.2

Apr 14, 2025

2.7.1

Apr 9, 2025

2.7.0

Apr 7, 2025

2.6.9

Apr 7, 2025

2.6.8

Apr 7, 2025

2.6.7

Apr 6, 2025

2.6.6

Apr 2, 2025

2.6.5

Mar 26, 2025

2.6.4

Mar 21, 2025

2.6.3

Mar 20, 2025

2.6.2

Mar 20, 2025

2.6.1

Mar 19, 2025

2.6.0

Mar 19, 2025

2.5.9

Mar 18, 2025

2.5.8

Mar 18, 2025

2.5.7

Mar 17, 2025

2.5.6

Mar 16, 2025

2.5.5

Mar 14, 2025

2.5.4

Mar 14, 2025

2.5.3

Mar 12, 2025

2.5.2

Mar 8, 2025

2.5.1

Mar 6, 2025

2.5.0

Mar 5, 2025

2.4.9

Mar 3, 2025

2.4.8

Feb 28, 2025

2.4.7

Feb 27, 2025

2.4.6

Feb 26, 2025

2.4.5

Feb 25, 2025

2.4.4

Feb 25, 2025

2.4.3

Feb 24, 2025

2.4.2

Feb 22, 2025

2.4.1

Feb 20, 2025

2.4.0

Feb 20, 2025

2.3.9

Feb 20, 2025

2.3.8

Feb 17, 2025

2.3.7

Feb 13, 2025

2.3.6

Feb 10, 2025

2.3.4

Feb 7, 2025

2.3.3

Feb 6, 2025

2.3.2

Feb 5, 2025

2.3.1

Feb 3, 2025

2.3.0

Feb 1, 2025

2.2.9

Jan 31, 2025

2.2.8

Jan 31, 2025

2.2.7

Jan 31, 2025

2.2.6

Jan 24, 2025

2.2.5

Jan 23, 2025

2.2.4

Jan 23, 2025

2.2.3

Jan 22, 2025

2.2.2

Jan 22, 2025

2.2.1

Jan 22, 2025

2.2.0

Jan 22, 2025

2.1.9

Jan 20, 2025

2.1.8

Jan 16, 2025

2.1.7

Jan 15, 2025

2.1.6

Jan 10, 2025

2.1.5

Jan 9, 2025

2.1.4

Jan 9, 2025

2.1.3

Jan 9, 2025

2.1.2

Jan 9, 2025

2.1.1

Jan 2, 2025

2.1.0

Jan 2, 2025

2.0.9

Dec 21, 2024

2.0.8

Dec 20, 2024

2.0.7

Dec 20, 2024

2.0.6

Dec 18, 2024

2.0.5

Dec 9, 2024

2.0.4

Dec 9, 2024

2.0.3

Dec 6, 2024

2.0.2

Dec 6, 2024

2.0.1

Dec 2, 2024

2.0

Nov 27, 2024

1.6.2

Nov 26, 2024

1.6.1

Nov 26, 2024

1.6.0

Nov 26, 2024

1.5.9

Nov 22, 2024

1.5.8

Nov 19, 2024

1.5.7

Nov 19, 2024

1.5.6

Nov 18, 2024

1.5.5

Nov 17, 2024

1.5.4

Nov 16, 2024

1.5.3

Nov 13, 2024

1.5.2

Nov 13, 2024

1.5.1

Nov 13, 2024

1.5.0

Nov 6, 2024

1.4.9

Nov 6, 2024

1.4.8

Nov 3, 2024

1.4.7

Oct 31, 2024

1.4.6

Oct 28, 2024

1.4.5

Oct 25, 2024

1.4.4

Oct 22, 2024

1.4.3

Oct 22, 2024

1.4.2

Oct 20, 2024

1.4.1

Oct 18, 2024

1.4.0

Oct 18, 2024

1.3.9

Oct 16, 2024

1.3.8

Oct 15, 2024

1.3.7

Oct 15, 2024

1.3.6

Oct 14, 2024

1.3.5

Oct 8, 2024

1.3.4

Oct 5, 2024

1.3.3

Oct 4, 2024

1.3.2

Sep 27, 2024

1.3.1

Sep 26, 2024

1.3.0

Sep 24, 2024

1.2.9

Sep 23, 2024

1.2.8

Sep 23, 2024

1.2.7

Sep 23, 2024

1.2.6

Sep 22, 2024

1.2.5

Sep 22, 2024

1.2.4

Sep 21, 2024

1.2.3

Sep 20, 2024

1.2.2

Sep 17, 2024

1.2.1

Sep 17, 2024

1.2.0

Sep 16, 2024

1.1.9

Sep 13, 2024

1.1.8

Sep 12, 2024

1.1.7

Sep 12, 2024

1.1.6

Sep 2, 2024

1.1.5

Sep 2, 2024

1.1.4

Aug 30, 2024

1.1.3

Aug 29, 2024

1.1.2

Aug 27, 2024

1.1.1

Aug 26, 2024

1.1.0

Aug 23, 2024

1.0.9

Aug 22, 2024

1.0.8

Aug 22, 2024

1.0.7

Aug 22, 2024

1.0.6

Aug 18, 2024

1.0.5

Aug 18, 2024

1.0.4

Aug 16, 2024

1.0.3

Aug 15, 2024

1.0.2

Aug 15, 2024

1.0.1

Aug 14, 2024

1.0.0

Aug 12, 2024

0.21.78

Aug 10, 2024

0.21.77

Aug 9, 2024

0.21.76

Aug 9, 2024

0.21.75

Aug 9, 2024

0.21.74

Jul 30, 2024

0.21.73

Jul 24, 2024

0.21.72

Jul 24, 2024

0.21.71

Jul 23, 2024

0.21.70

Jul 23, 2024

0.21.69

Jul 23, 2024

0.21.68

Jul 20, 2024

0.21.67

Jul 18, 2024

0.21.66

Jul 16, 2024

0.21.65

Jul 10, 2024

0.21.64

Jul 3, 2024

0.21.63

Jul 3, 2024

0.21.62

Jun 25, 2024

0.21.61

Jun 25, 2024

0.21.60

Jun 23, 2024

0.21.59

Jun 20, 2024

0.21.58

Jun 20, 2024

0.21.57

Jun 18, 2024

0.21.56

Jun 17, 2024

0.21.55

Jun 12, 2024

0.21.54

Jun 12, 2024

0.21.53

Jun 11, 2024

0.21.52

Jun 11, 2024

0.21.51

Jun 7, 2024

0.21.50

Jun 5, 2024

0.21.49

Jun 4, 2024

0.21.48

May 28, 2024

0.21.47

May 27, 2024

0.21.46

May 27, 2024

0.21.45

May 22, 2024

0.21.44

May 22, 2024

0.21.43

May 20, 2024

0.21.42

May 14, 2024

0.21.41

May 13, 2024

0.21.40

May 13, 2024

0.21.39

May 8, 2024

0.21.38

May 8, 2024

0.21.37

May 7, 2024

0.21.36

Apr 28, 2024

0.21.35

Apr 26, 2024

0.21.34

Apr 25, 2024

0.21.33

Apr 24, 2024

0.21.32

Apr 22, 2024

0.21.31

Apr 21, 2024

0.21.30

Apr 19, 2024

0.21.29

Apr 17, 2024

0.21.28

Apr 16, 2024

0.21.27

Apr 16, 2024

0.21.26

Apr 14, 2024

0.21.25

Apr 12, 2024

0.21.24

Apr 10, 2024

0.21.23

Apr 7, 2024

0.21.22

Apr 7, 2024

0.21.21

Apr 4, 2024

0.21.20

Apr 4, 2024

0.21.19

Apr 4, 2024

0.21.18

Apr 3, 2024

0.21.17

Apr 2, 2024

0.21.16

Apr 1, 2024

0.21.15

Mar 31, 2024

0.21.14

Mar 31, 2024

0.21.13

Mar 28, 2024

0.21.12

Mar 26, 2024

0.21.11

Mar 26, 2024

0.21.1

Mar 26, 2024

0.21.0

Mar 20, 2024

0.20.99

Mar 19, 2024

0.20.98

Mar 19, 2024

0.20.97

Mar 16, 2024

0.20.96

Mar 16, 2024

0.20.95

Mar 16, 2024

0.20.94

Mar 16, 2024

0.20.93

Mar 16, 2024

0.20.92

Mar 15, 2024

0.20.91

Mar 15, 2024

0.20.90

Mar 14, 2024

0.20.89

Mar 11, 2024

0.20.88

Mar 11, 2024

0.20.87

Mar 11, 2024

0.20.86

Mar 11, 2024

0.20.85

Mar 9, 2024

0.20.84

Mar 9, 2024

0.20.83

Mar 9, 2024

0.20.82

Mar 9, 2024

0.20.81

Mar 4, 2024

0.20.80

Mar 4, 2024

0.20.79

Mar 4, 2024

0.20.78

Mar 1, 2024

0.20.77

Feb 28, 2024

0.20.76

Feb 27, 2024

0.20.75

Feb 27, 2024

0.20.74

Feb 25, 2024

0.20.73

Feb 25, 2024

0.20.72

Feb 25, 2024

0.20.71

Feb 23, 2024

0.20.70

Feb 22, 2024

0.20.69

Feb 22, 2024

0.20.68

Feb 21, 2024

0.20.67

Feb 21, 2024

0.20.66

Feb 19, 2024

0.20.65

Feb 15, 2024

0.20.64

Feb 14, 2024

0.20.63

Feb 11, 2024

0.20.62

Feb 9, 2024

0.20.61

Feb 9, 2024

0.20.60

Feb 8, 2024

0.20.59

Feb 8, 2024

0.20.58

Feb 7, 2024

0.20.57

Feb 6, 2024

0.20.56

Jan 30, 2024

0.20.55

Jan 29, 2024

0.20.54

Jan 29, 2024

0.20.53

Jan 25, 2024

0.20.52

Jan 23, 2024

0.20.51

Jan 23, 2024

0.20.50

Jan 21, 2024

0.20.49

Jan 19, 2024

0.20.48

Jan 16, 2024

0.20.47

Jan 16, 2024

0.20.46

Jan 12, 2024

0.20.45

Jan 12, 2024

0.20.44

Jan 3, 2024

0.20.43

Dec 26, 2023

0.20.42

Dec 21, 2023

0.20.41

Dec 20, 2023

0.20.40

Dec 19, 2023

0.20.39

Dec 16, 2023

0.20.38

Dec 16, 2023

0.20.37

Dec 15, 2023

0.20.36

Dec 15, 2023

0.20.35

Dec 14, 2023

0.20.34

Dec 14, 2023

0.20.33

Dec 12, 2023

0.20.32

Dec 12, 2023

0.20.31

Dec 12, 2023

0.20.30

Dec 10, 2023

0.20.29

Dec 7, 2023

0.20.28

Dec 6, 2023

0.20.27

Dec 4, 2023

0.20.26

Dec 4, 2023

0.20.25

Dec 2, 2023

0.20.24

Nov 28, 2023

0.20.23

Nov 23, 2023

0.20.22

Nov 22, 2023

0.20.21

Nov 22, 2023

0.20.20

Nov 22, 2023

0.20.19

Nov 16, 2023

0.20.18

Nov 14, 2023

0.20.17

Nov 13, 2023

0.20.16

Nov 7, 2023

0.20.15

Nov 6, 2023

0.20.14

Nov 5, 2023

0.20.13

Oct 27, 2023

0.20.12

Oct 23, 2023

0.20.11

Oct 20, 2023

0.20.10

Oct 18, 2023

0.20.8

Oct 13, 2023

0.20.7

Oct 12, 2023

0.20.6

Oct 12, 2023

0.20.5

Oct 12, 2023

0.20.4

Oct 11, 2023

0.20.3

Oct 11, 2023

0.20.2

Oct 10, 2023

0.20.1

Oct 6, 2023

0.20.0

Oct 2, 2023

0.19.2

Oct 1, 2023

0.19.1

Oct 1, 2023

0.19.0

Oct 1, 2023

0.18.0

Sep 29, 2023

0.17.8

Sep 28, 2023

0.17.7

Sep 27, 2023

0.17.6

Sep 27, 2023

0.17.5

Sep 25, 2023

0.17.4

Sep 24, 2023

0.17.3

Sep 24, 2023

0.17.2

Sep 24, 2023

0.17.1

Sep 24, 2023

0.17.0

Sep 24, 2023

0.16.4

Sep 22, 2023

0.16.3

Sep 22, 2023

0.16.2

Sep 22, 2023

0.16.1

Sep 22, 2023

0.16.0

Sep 22, 2023

0.15.2

Sep 20, 2023

0.15.0

Sep 20, 2023

0.14.1

Sep 12, 2023

0.14.0

Sep 12, 2023

0.13.0

Sep 10, 2023

0.12.4

Sep 8, 2023

0.12.3

Sep 8, 2023

0.12.2

Sep 7, 2023

0.12.1

Sep 7, 2023

0.12.0

Sep 4, 2023

0.11.5

Sep 2, 2023

0.11.4

Sep 2, 2023

0.11.3

Sep 2, 2023

0.11.2

Sep 2, 2023

0.11.1

Sep 2, 2023

0.11.0

Sep 1, 2023

0.10.13

Aug 31, 2023

0.10.12

Aug 29, 2023

0.10.11

Aug 29, 2023

0.10.10

Aug 29, 2023

0.10.9

Aug 29, 2023

0.10.8

Aug 29, 2023

0.10.7

Aug 29, 2023

0.10.6

Aug 29, 2023

0.10.5

Aug 29, 2023

0.10.4

Aug 28, 2023

0.10.3

Aug 28, 2023

0.10.2

Aug 28, 2023

0.10.1

Aug 28, 2023

0.10.0

Aug 27, 2023

0.9.18

Aug 25, 2023

0.9.16

Aug 25, 2023

0.9.15

Aug 25, 2023

0.9.13

Aug 25, 2023

0.9.12

Aug 25, 2023

0.9.11

Aug 25, 2023

0.9.10

Aug 25, 2023

0.9.9

Aug 25, 2023

0.9.8

Aug 24, 2023

0.9.7

Aug 24, 2023

0.9.6

Aug 24, 2023

0.9.5

Aug 24, 2023

0.9.4

Aug 24, 2023

0.9.2

Aug 24, 2023

0.9.1

Aug 23, 2023

0.9.0

Aug 23, 2023

0.8.0

Aug 22, 2023

0.7.1

Aug 22, 2023

0.7.0

Aug 22, 2023

0.6.1

Aug 21, 2023

0.6.0

Aug 20, 2023

0.5.0

Aug 18, 2023

0.4.2

Aug 17, 2023

0.4.1

Aug 17, 2023

0.4.0

Aug 17, 2023

0.3.1

Aug 16, 2023

0.2.2

Aug 15, 2023

0.2.1

Aug 15, 2023

0.2.0

Aug 15, 2023

Download files

Source Distribution

deepeval-3.8.4.tar.gz (593.4 kB)

Uploaded Feb 4, 2026 Source

Built Distribution

deepeval-3.8.4-py3-none-any.whl (819.3 kB)

Uploaded Feb 4, 2026 Python 3

File details

Details for the file deepeval-3.8.4.tar.gz.

File metadata

Download URL: deepeval-3.8.4.tar.gz
Upload date: Feb 4, 2026
Size: 593.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.6.1 CPython/3.11.9 Darwin/22.4.0

File hashes

Hashes for deepeval-3.8.4.tar.gz
Algorithm	Hash digest
SHA256	`cb60c4a53488970136e8149de4beb3a0e1272feb6b8ec2a8ee633ca2ad3bbce7`
MD5	`4bd91443d63c0b8a25278353a25bcfe2`
BLAKE2b-256	`e6ef577e7dc254f9c81bb122b2e137ec1b09d9c80b674fdea33f24923cbc8342`

File details

Details for the file deepeval-3.8.4-py3-none-any.whl.

File metadata

Download URL: deepeval-3.8.4-py3-none-any.whl
Upload date: Feb 4, 2026
Size: 819.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.6.1 CPython/3.11.9 Darwin/22.4.0

File hashes

Hashes for deepeval-3.8.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4435aeb854a454bba00d6f4d6fb31ae3e70420d2be065ce9a966f09bb691fed7`
MD5	`61e504ea94796ef3eba4367873c6b93a`
BLAKE2b-256	`7a48feab76e41dc8c34a26a22289005b62f1a2d43b9cc0d95e0a2d68e4332d73`
