Evaluation framework for RAG and LLM applications

Supercharge Your LLM Application Evaluations 🚀

Documentation | Quick start | Join Discord | Blog | Newsletter | Careers

Objective metrics, intelligent test generation, and data-driven insights for LLM apps

Ragas is your ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications. Say goodbye to time-consuming, subjective assessments and hello to data-driven, efficient evaluation workflows. Don't have a test dataset ready? We also do production-aligned test set generation.

[!NOTE] Need help setting up evals for your AI application? We'd love to help! We hold office hours every week; you can sign up here.

Key Features

  • 🎯 Objective Metrics: Evaluate your LLM applications with precision using both LLM-based and traditional metrics.
  • 🧪 Test Data Generation: Automatically create comprehensive test datasets covering a wide range of scenarios (see the sketch after this list).
  • 🔗 Seamless Integrations: Works flawlessly with popular LLM frameworks like LangChain and major observability tools.
  • 📊 Build feedback loops: Leverage production data to continually improve your LLM applications.
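
For example, test data generation can be seeded from your own documents. Below is a minimal sketch assuming the TestsetGenerator API and the llm_factory/embedding_factory helpers from recent Ragas releases, together with a LangChain document loader; exact names and signatures may differ in your installed version:

from langchain_community.document_loaders import DirectoryLoader
from ragas.embeddings import embedding_factory
from ragas.llms import llm_factory
from ragas.testset import TestsetGenerator

# Load your own corpus; the loader and path here are illustrative.
docs = DirectoryLoader("data/", glob="**/*.md").load()

# Both factories assume OPENAI_API_KEY is set in the environment.
generator = TestsetGenerator(
    llm=llm_factory("gpt-4o"),
    embedding_model=embedding_factory(),
)

# Synthesize a small, production-aligned test set from the documents.
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
print(testset.to_pandas().head())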

🛡️ Installation

From PyPI:

pip install ragas

Alternatively, from source:

pip install git+https://github.com/vibrantlabsai/ragas

🔥 Quickstart

Clone a Complete Example Project

The fastest way to get started is to use the ragas quickstart command:

# List available templates
ragas quickstart

# Create a RAG evaluation project
ragas quickstart rag_eval

# Specify where you want to create it
ragas quickstart rag_eval -o ./my-project

Available templates:

  • rag_eval - Evaluate RAG systems

Coming Soon:

  • agent_evals - Evaluate AI agents
  • benchmark_llm - Benchmark and compare LLMs
  • prompt_evals - Evaluate prompt variations
  • workflow_eval - Evaluate complex workflows

Evaluate your LLM App

This is a simple example evaluating a summary for accuracy:

import asyncio

from ragas.llms import llm_factory
from ragas.metrics.collections import AspectCritic

# Set up your evaluator LLM (reads OPENAI_API_KEY from the environment)
llm = llm_factory("gpt-4o")

# Create a metric
metric = AspectCritic(
    name="summary_accuracy",
    definition="Verify if the summary is accurate and captures key information.",
    llm=llm,
)

# Sample to evaluate
test_data = {
    "user_input": "summarise given text\nThe company reported an 8% rise in Q3 2024, driven by strong performance in the Asian market. Sales in this region have significantly contributed to the overall growth. Analysts attribute this success to strategic marketing and product localization. The positive trend in the Asian market is expected to continue into the next quarter.",
    "response": "The company experienced an 8% increase in Q3 2024, largely due to effective marketing strategies and product adaptation, with expectations of continued growth in the coming quarter.",
}


async def main():
    # ascore is a coroutine, so run it inside an event loop
    score = await metric.ascore(
        user_input=test_data["user_input"],
        response=test_data["response"],
    )
    print(f"Score: {score.value}")
    print(f"Reason: {score.reason}")


asyncio.run(main())

Note: Make sure your OPENAI_API_KEY environment variable is set.
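
If you prefer to set the key from code (for example in a notebook), you can export it before creating the LLM; the value below is a placeholder:

import os

# Placeholder; substitute your real key, or set it in your shell instead.
os.environ["OPENAI_API_KEY"] = "sk-..."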

Find the complete Quickstart Guide in the documentation.

Want help improving your AI application using evals?

Over the past two years, we have helped improve many AI applications using evals. If you want help improving and scaling your AI application with evals, we would love to hear from you.

🔗 Book a slot or drop us a line: founders@vibrantlabs.com.

🫂 Community

If you want to get more involved with Ragas, check out our Discord server. It's a fun community where we geek out about LLMs, retrieval, production issues, and more.

Contributors

+----------------------------------------------------------------------------+
|     +----------------------------------------------------------------+     |
|     | Developers: Those who build with `ragas`.                      |     |
|     | (You have `import ragas` somewhere in your project)            |     |
|     |     +----------------------------------------------------+     |     |
|     |     | Contributors: Those who make `ragas` better.       |     |     |
|     |     | (You make PRs to this repo)                        |     |     |
|     |     +----------------------------------------------------+     |     |
|     +----------------------------------------------------------------+     |
+----------------------------------------------------------------------------+

We welcome contributions from the community! Whether it's bug fixes, feature additions, or documentation improvements, your input is valuable.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

🔍 Open Analytics

At Ragas, we believe in transparency. We collect minimal, anonymized usage data to improve our product and guide our development efforts.

✅ No personal or company-identifying information

✅ Open-source data collection code

✅ Publicly available aggregated data

To opt out, set the RAGAS_DO_NOT_TRACK environment variable to true.
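
For example, to opt out from code, a minimal sketch (setting the variable in your shell works just as well; here it is set before the first ragas import so the flag is seen from the start):

import os

# Disable anonymized usage reporting before ragas is first imported.
os.environ["RAGAS_DO_NOT_TRACK"] = "true"

import ragas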

Cite Us

@misc{ragas2024,
  author       = {VibrantLabs},
  title        = {Ragas: Supercharge Your LLM Application Evaluations},
  year         = {2024},
  howpublished = {\url{https://github.com/vibrantlabsai/ragas}},
}

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ragas-0.4.3.tar.gz (44.0 MB)

Built Distribution

ragas-0.4.3-py3-none-any.whl (466.5 kB)

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file ragas-0.4.3.tar.gz.

File metadata

  • Download URL: ragas-0.4.3.tar.gz
  • Upload date:
  • Size: 44.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ragas-0.4.3.tar.gz:

  • SHA256: 1eb1f61dbc8613ad014fdb8d630cbe9a1caec1ea01664a106993cb756128c001
  • MD5: deeb66e9da13945dd56659ae3395b267
  • BLAKE2b-256: d2bc3234517692ac0ffae1ec2ec940992e4057844c49ee6c51c07ce385bb98f1

File details

Details for the file ragas-0.4.3-py3-none-any.whl.

File metadata

  • Download URL: ragas-0.4.3-py3-none-any.whl
  • Upload date:
  • Size: 466.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ragas-0.4.3-py3-none-any.whl:

  • SHA256: ef1d75f674c294e9a6e7d8e9ad261b6bf4697dad1c9cbd1a756ba7a6b4849a38
  • MD5: 962e14c6d5b43a4352b9f0e7569b2f9b
  • BLAKE2b-256: 4de01fecd22c93d3ed66453cbbdefd05528331af4d33b2b76a370d751231912c
