
Autonomous Agentic QA System for testing RAG pipelines and LLM systems.


🛡️ Agentic QA: Autonomous Multi-Agent Testing for RAG & LLMs

Agentic QA is a Python library that autonomously generates adversarial test cases, executes them against your RAG/LLM system, evaluates the results, and refines its test coverage across iterations, all without human intervention.

Unlike evaluation frameworks such as RAGAS or TruLens, which score outputs against static, human-written inputs, Agentic QA acts as an active red team: it dynamically generates the tricky edge cases needed to break your system.



🚀 Installation

Install the library directly via pip:

pip install agentic-qa

Note: You will need an OpenAI API key for the agents to operate.

export OPENAI_API_KEY="sk-..."
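Because the agents cannot operate without the key, a small pre-flight check can fail fast with a clear message before any test run starts. The following is a minimal sketch using only the standard library; `require_openai_key` is a hypothetical helper and is not part of agentic_qa.

```python
import os


def require_openai_key(env=None) -> str:
    """Return the OpenAI API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the agentic_qa package.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the agents."
        )
    return key
```

Calling `require_openai_key()` at the top of a test script surfaces a missing key immediately instead of partway through a run.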

📖 How to Use

Agentic QA provides two ways to test your systems: a High-Level Simple API for quick tests, and a Low-Level Advanced API for custom setups (like Jupyter Notebooks).

Option 1: High-Level API (Recommended)

You can test any RAG or LLM pipeline in just a few lines of code using the run_autonomous_test wrapper.

Testing a Python Function

import agentic_qa

# 1. Your existing RAG or Chatbot function
def my_custom_rag(query: str) -> str:
    # Example: return my_langchain_pipeline.invoke(query)
    return "This is my AI response."

# 2. Run the autonomous testing loop
final_state = agentic_qa.run_autonomous_test(
    target_function=my_custom_rag,
    system_name="YouTube Video Q&A",
    system_description="A chatbot that answers questions about YouTube transcripts.",
    domain="video content",
    max_iterations=3,          # How many times agents learn and retry
    tests_per_iteration=5      # Tests generated per round
)

# 3. View the generated report
print(final_state["final_report"])
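The example above passes a function that takes a query string and returns a string. If your pipeline returns something richer, such as a dict with an answer and its sources, a thin wrapper can adapt it to that shape. The sketch below assumes a hypothetical pipeline that returns a dict with an "answer" key. `as_text_target` and `fake_pipeline` are illustrative names, not part of agentic_qa, and whether the library handles exceptions raised by the target is not stated here, so the wrapper converts them to text itself.

```python
from typing import Any, Callable


def as_text_target(
    pipeline: Callable[[str], Any], answer_key: str = "answer"
) -> Callable[[str], str]:
    """Adapt a pipeline with a richer return type to a str -> str function."""

    def target(query: str) -> str:
        try:
            result = pipeline(query)
        except Exception as exc:
            # Return the failure as text so a crash shows up as a test output.
            return f"[pipeline error] {type(exc).__name__}: {exc}"
        if isinstance(result, dict):
            # Assumption: the answer text lives under `answer_key`.
            return str(result.get(answer_key, ""))
        return str(result)

    return target


def fake_pipeline(query: str) -> dict:
    """Stand-in for a real RAG pipeline (hypothetical)."""
    return {"answer": f"Echo: {query}", "sources": []}


my_custom_rag = as_text_target(fake_pipeline)
```

The resulting `my_custom_rag` can then be passed as `target_function` in the call shown above.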

Testing an API Endpoint

If your system is deployed behind a REST API (FastAPI, Flask, LangServe), pass its URL as api_endpoint instead of a target function:

import agentic_qa

final_state = agentic_qa.run_autonomous_test(
    api_endpoint="http://localhost:8000/api/chat",
    system_name="Customer Support Bot",
    system_description="An AI that resolves customer support tickets.",
    domain="customer support"
)
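The request and response format used with `api_endpoint` is not described here. If your service has its own JSON schema, one alternative is to wrap the HTTP call in a plain function and pass that as `target_function`, as in the earlier example. The sketch below uses only the standard library and assumes a hypothetical endpoint that accepts `{"query": ...}` and replies with `{"answer": ...}`; both field names and all helper names are assumptions to adjust for your service.

```python
import json
import urllib.request


def build_request(url: str, query: str) -> urllib.request.Request:
    """Build a JSON POST request; the 'query' field name is an assumption."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_answer(raw: bytes) -> str:
    """Extract the answer text; the 'answer' field name is an assumption."""
    body = json.loads(raw.decode("utf-8"))
    return str(body.get("answer", ""))


def make_http_target(url: str, timeout: float = 30.0):
    """Return a str -> str function that queries the endpoint at `url`."""

    def target(query: str) -> str:
        request = build_request(url, query)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_answer(response.read())

    return target
```

With this in place, `make_http_target("http://localhost:8000/api/chat")` yields a function that can be passed as `target_function`, which keeps the payload shape under your control.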

Option 2: Low-Level Advanced API

If you need fine-grained control over the SUT (System Under Test) adapters or want to integrate the workflow directly into a Jupyter Notebook or a custom LangGraph pipeline, use the adapter classes directly.

import os
from agentic_qa.sut import CallableAdapter, set_active_sut
from agentic_qa.graph.workflow import run_qa_pipeline

# 1. Configure testing environment variables
os.environ["MAX_ITERATIONS"] = "3"
os.environ["TESTS_PER_ITERATION"] = "3"

# 2. Define your RAG function
def ask_research_paper(query: str) -> str:
    return "Attention is a mechanism..."

# 3. Wrap your RAG function in the CallableAdapter
adapter = CallableAdapter(
    fn=ask_research_paper,
    system_name="Research Paper RAG",
    description="A Retrieval-Augmented Generation system that answers questions about machine learning research papers.",
    domain="Academic Research / Machine Learning"
)

# 4. Set it as the active System Under Test
set_active_sut(adapter)

# 5. Launch the autonomous multi-agent pipeline
print("Launching Autonomous Multi-Agent QA...\n")
final_state = run_qa_pipeline()

# 6. Extract metrics and the final Markdown report
print(f"Coverage Score: {final_state.get('coverage_score', 0):.0%}")
print(final_state.get("final_report", "No report generated."))
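Because this example sets the iteration limits through environment variables, values set in a long-lived notebook session carry over into later runs. A small context manager can scope them to a single run. This is a standard-library sketch; `temporary_env` is a hypothetical helper, and it assumes the pipeline reads the variables when it is called rather than at import time.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides: str):
    """Set environment variables for the duration of a with-block, then restore."""
    previous = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old_value in previous.items():
            if old_value is None:
                # The variable did not exist before; remove it again.
                os.environ.pop(name, None)
            else:
                os.environ[name] = old_value


with temporary_env(MAX_ITERATIONS="3", TESTS_PER_ITERATION="3"):
    # A call such as run_qa_pipeline() would go here.
    print(os.environ["MAX_ITERATIONS"], os.environ["TESTS_PER_ITERATION"])
```

After the with-block exits, both variables return to whatever they were before, including being unset.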

🏗️ Architecture

The framework is powered by 5 autonomous agents built with LangGraph:

START ──▶ 🔴 Red-Team Agent ──▶ ⚡ Executor Agent ──▶ ⚖️ Judge Agent ──▶ Decision
              ▲                                                            │
              │                                                      ┌─────┴─────┐
              │                                                      ▼           ▼
              └──────────────── 🔧 Refiner Agent               📊 Reporter Agent
                                   (loop back)                       (END)

Agent Roles

Agent | Role
🔴 Red-Team | Generates adversarial test inputs targeting edge cases (prompt injections, boundary values, etc.).
⚡ Executor | Runs tests through the target system and captures the outputs.
⚖️ Judge | Evaluates the outputs using an LLM-as-a-Judge pattern with strict pass/fail criteria.
🔧 Refiner | Analyzes the judge's failure patterns and instructs the Red-Team on how to exploit weaknesses in the next iteration.
📊 Reporter | Compiles a comprehensive final Markdown QA report.
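To make the control flow concrete, the loop can be sketched in plain Python with stub functions standing in for the LLM-backed agents. Nothing here is Agentic QA's actual implementation, which is built with LangGraph. The function names, the shape of the state, and the stopping rule (stop once the iteration budget is used up) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def red_team(hints: List[str], n: int) -> List[str]:
    """Stub: produce n test inputs, steered by the latest refiner hint."""
    focus = hints[-1] if hints else "general edge cases"
    return [f"adversarial input {i} ({focus})" for i in range(n)]


def execute(target: Callable[[str], str], tests: List[str]) -> List[Dict]:
    """Run each test input through the system under test."""
    return [{"input": t, "output": target(t)} for t in tests]


def judge(results: List[Dict]) -> List[Dict]:
    """Stub judge: a real one would call an LLM; this one fails empty outputs."""
    return [{**r, "passed": bool(r["output"].strip())} for r in results]


def refine(verdicts: List[Dict]) -> str:
    """Summarise failures into a hint for the next red-team round."""
    failures = [v for v in verdicts if not v["passed"]]
    return f"probe further around {len(failures)} failing inputs"


def report(history: List[List[Dict]]) -> str:
    """Compile a one-line summary of all rounds."""
    total = sum(len(verdicts) for verdicts in history)
    failed = sum(not v["passed"] for verdicts in history for v in verdicts)
    return f"{total} tests run, {failed} failed over {len(history)} iterations"


def run_loop(target, max_iterations: int = 3, tests_per_iteration: int = 5) -> str:
    hints: List[str] = []
    history: List[List[Dict]] = []
    for iteration in range(max_iterations):
        tests = red_team(hints, tests_per_iteration)
        verdicts = judge(execute(target, tests))
        history.append(verdicts)
        if iteration == max_iterations - 1:
            break  # Decision: budget exhausted, hand off to the reporter
        hints.append(refine(verdicts))  # Decision: loop back via the refiner
    return report(history)
```

The two parameters mirror `max_iterations` and `tests_per_iteration` from the High-Level API, so the sketch shows why total test volume is the product of the two.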

📊 Streamlit Dashboard

If you prefer a visual dashboard to monitor the agents in real time, run the included Streamlit app:

streamlit run app.py

From the UI, you can connect your API endpoint or use the built-in mock system for a demonstration, and watch the agents produce verdicts and failure patterns as they run.

📡 LangSmith Monitoring

If LangSmith is configured in your .env file, all agent interactions are traced automatically:

LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key
LANGCHAIN_PROJECT=agentic-qa
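When traces do not appear, a quick check of which of these variables are actually visible to the Python process can help. The sketch below uses only the standard library and assumes the .env values have already been loaded into the process environment; `missing_langsmith_vars` is a hypothetical helper, not part of agentic_qa or LangSmith.

```python
import os

# Variable names taken from the .env example above.
LANGSMITH_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_langsmith_vars(env=None):
    """Return the listed variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in LANGSMITH_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_langsmith_vars()
    if missing:
        print("LangSmith tracing may be inactive; unset:", ", ".join(missing))
```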

📄 License

MIT

