Autonomous Agentic QA System for testing RAG pipelines and LLM systems.
Agentic QA: Autonomous Multi-Agent Testing for RAG & LLMs
Agentic QA is a Python library that autonomously generates adversarial test cases, executes them against your RAG/LLM system, evaluates the results, and self-improves its testing coverage, all without human intervention.
Unlike traditional testing frameworks (like RAGAS or TruLens) that evaluate outputs against static, human-written inputs, Agentic QA acts as an active red-team, dynamically generating the tricky edge cases needed to break your system.
Quick Start
1. Installation
Install the library locally:
git clone https://github.com/yourusername/multi-agent-qa.git
cd multi-agent-qa
pip install -e .
Ensure you have your .env configured with your API keys:
cp .env.example .env
# Edit .env and provide OPENAI_API_KEY
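Before starting a run, it can help to confirm that the key is actually visible. The following is a minimal sketch using only the standard library; `read_env_file` and `missing_keys` are hypothetical helper names, not part of Agentic QA, and the parser only handles plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(required: list[str], path: str = ".env") -> list[str]:
    """Return required keys set neither in the environment nor in the file."""
    file_values = read_env_file(path)
    return [k for k in required if not os.environ.get(k) and not file_values.get(k)]
```

Calling `missing_keys(["OPENAI_API_KEY"])` from the project root returns an empty list when the key is set and non-empty.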
2. Using the Python Library
You can test any RAG or LLM pipeline in just a few lines of code.
Option A: Testing a Python Function
If your RAG system is a Python function in your codebase:
import agentic_qa

# Your existing RAG or Chatbot function
def my_custom_rag(query: str) -> str:
    # Example: return my_langchain_pipeline.invoke(query)
    return "This is my AI response."

# Run the autonomous testing loop
report = agentic_qa.run_autonomous_test(
    target_function=my_custom_rag,
    system_name="YouTube Video Q&A",
    system_description="A chatbot that answers questions about YouTube transcripts.",
    domain="video content",
    max_iterations=3,       # How many times agents learn and retry
    tests_per_iteration=5,  # Tests generated per round
)
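If your pipeline is an object rather than a plain function, a thin wrapper can give it the `query: str -> str` shape shown above. This is a sketch under assumptions: `ToyPipeline` and `as_target_function` are hypothetical names for illustration, and the `{"question": ...}` / `{"answer": ...}` payload is made up. Whether the library itself catches exceptions raised by the target is not stated here, so the wrapper turns them into text.

```python
from typing import Callable


class ToyPipeline:
    """Hypothetical stand-in for a real RAG pipeline object."""

    def __init__(self, documents: dict[str, str]) -> None:
        self.documents = documents

    def invoke(self, payload: dict) -> dict:
        # Naive keyword lookup in place of real retrieval and generation.
        question = payload["question"].lower()
        for topic, text in self.documents.items():
            if topic in question:
                return {"answer": text}
        return {"answer": "I don't know."}


def as_target_function(pipeline: ToyPipeline) -> Callable[[str], str]:
    """Adapt pipeline.invoke to a function that maps a query to a response."""

    def target(query: str) -> str:
        try:
            return str(pipeline.invoke({"question": query})["answer"])
        except Exception as exc:
            # Report the failure as text so one bad input does not stop the run.
            return f"ERROR: {exc}"

    return target


my_custom_rag = as_target_function(ToyPipeline({"refund": "Refunds take 5 days."}))
print(my_custom_rag("How do refunds work?"))
```

The resulting `my_custom_rag` can then be passed as `target_function`.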
Option B: Testing an API Endpoint
If your system is deployed behind a REST API (FastAPI, Flask, LangServe):
import agentic_qa

report = agentic_qa.run_autonomous_test(
    api_endpoint="http://localhost:8000/api/chat",
    system_name="Customer Support Bot",
    system_description="An AI that resolves customer support tickets.",
    domain="customer support",
)
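Before pointing the agents at an endpoint, a one-request smoke test can confirm that it is reachable and returns text. The sketch below uses only the standard library. The request and response shapes are assumptions: the payload `{"query": ...}` and the reply fields `response`, `answer`, and `output` are guesses for illustration, and the format the library's API adapter actually sends is not documented in this section.

```python
import json
import urllib.request


def build_request(url: str, query: str) -> urllib.request.Request:
    """Build a JSON POST carrying the query (payload shape is an assumption)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(raw: bytes) -> str:
    """Pull the reply text out of a JSON body, falling back to the raw text."""
    text = raw.decode("utf-8", errors="replace")
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return text
    if isinstance(parsed, dict):
        for key in ("response", "answer", "output"):  # assumed field names
            if key in parsed:
                return str(parsed[key])
    return text


def smoke_test(url: str, query: str = "ping", timeout: float = 10.0) -> str:
    """Send one query to the endpoint and return the reply text."""
    with urllib.request.urlopen(build_request(url, query), timeout=timeout) as resp:
        return extract_text(resp.read())
```

With the server running, `smoke_test("http://localhost:8000/api/chat")` should return a non-empty string; adjust the payload and field names to match your API.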
3. Using the Streamlit UI
If you prefer a visual dashboard for monitoring the agents in real time, run the included Streamlit app:
streamlit run app.py
From the UI, you can connect your API endpoint or use the built-in mock system for a demonstration.
Architecture
The framework is powered by 5 autonomous agents built with LangGraph:
START ──▶ Red-Team Agent ──▶ Executor Agent ──▶ Judge Agent ──▶ Decision
               ▲                                                   │
               │                                          ┌────────┴────────┐
               │                                          ▼                 ▼
               └─────────────────────────────────── Refiner Agent    Reporter Agent
                                                     (loop back)          (END)
Agent Roles
| Agent | Role |
|---|---|
| Red-Team | Generates adversarial test inputs targeting edge cases (prompt injections, boundary values, etc.). |
| Executor | Runs tests through the target system and captures the outputs. |
| Judge | Evaluates the outputs using an LLM-as-a-Judge pattern with strict pass/fail criteria. |
| Refiner | Analyzes the judge's failure patterns and instructs the Red-Team on how to exploit weaknesses in the next iteration. |
| Reporter | Compiles a comprehensive final Markdown QA report. |
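The control flow between the five roles can be sketched in plain Python. This is not the library's LangGraph implementation: every function is a hypothetical stand-in, the judge is a trivial rule instead of an LLM call, and the stopping rule (stop once `max_iterations` rounds have run) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class QAState:
    iteration: int = 0
    hints: list[str] = field(default_factory=list)     # Refiner feedback for the Red-Team
    results: list[dict] = field(default_factory=list)  # verdicts from all rounds


def red_team(state: QAState, n: int) -> list[str]:
    # Stand-in: a real agent would prompt an LLM, steered by the latest hint.
    focus = state.hints[-1] if state.hints else "baseline"
    return [f"[{focus}] probe {state.iteration}-{i}" for i in range(n)]


def judge(query: str, output: str) -> bool:
    # Stand-in for an LLM judge: pass only non-empty, non-error outputs.
    return bool(output.strip()) and not output.startswith("ERROR")


def run_loop(
    target: Callable[[str], str],
    max_iterations: int = 3,
    tests_per_iteration: int = 5,
) -> str:
    state = QAState()
    while True:
        tests = red_team(state, tests_per_iteration)   # Red-Team
        outputs = [(q, target(q)) for q in tests]      # Executor
        verdicts = [                                   # Judge
            {"query": q, "output": o, "passed": judge(q, o)} for q, o in outputs
        ]
        state.results.extend(verdicts)
        state.iteration += 1
        if state.iteration >= max_iterations:          # Decision (assumed rule)
            break
        failures = [v for v in verdicts if not v["passed"]]
        # Refiner: turn this round's outcome into a hint for the next round.
        state.hints.append("repeat failures" if failures else "new edge cases")
    # Reporter: summarize all rounds as Markdown.
    passed = sum(v["passed"] for v in state.results)
    total = len(state.results)
    return f"# QA Report\n\nPassed {passed}/{total} tests in {state.iteration} iteration(s)"
```

For example, `run_loop(lambda q: "ok", max_iterations=2, tests_per_iteration=3)` runs two rounds of three tests each and returns a short Markdown summary.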
What Makes This Novel
| Traditional Testing Tools (RAGAS, TruLens) | Agentic QA |
|---|---|
| Measures outputs against static inputs | Generates the adversarial inputs autonomously |
| Human writes test cases | AI agents write and refine test cases |
| One-shot evaluation | Self-improving loop with pattern learning |
| Relies heavily on reference data | Relies on behavioral boundaries and edge-case testing |
Project Structure
multi-agent-qa/
├── agentic_qa/
│   ├── __init__.py     # Clean developer API (run_autonomous_test)
│   ├── agents/         # 5 LangGraph agent definitions
│   ├── graph/          # State definitions and LangGraph flow
│   ├── schemas/        # Pydantic validation models
│   ├── sut/            # Adapters (API, Callable, Base)
│   └── utils/          # Prompt templates
├── setup.py            # Package configuration
├── app.py              # Streamlit Dashboard UI
├── .env                # API Keys configuration
└── README.md
LangSmith Monitoring
All agent interactions are automatically traced via LangSmith when the following variables are set in .env:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key
LANGCHAIN_PROJECT=agentic-qa
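If you would rather set these variables from Python (for example in a notebook), the sketch below does so with `os.environ.setdefault`, so values already present in the environment are not overwritten. `configure_tracing` is a hypothetical helper, not part of Agentic QA; the variable names are the ones listed above. Call it before starting the test loop.

```python
import os
from typing import Optional

# Variable names as listed above; the values are the example defaults.
TRACING_DEFAULTS = {
    "LANGCHAIN_TRACING_V2": "true",
    "LANGCHAIN_PROJECT": "agentic-qa",
}


def configure_tracing(api_key: Optional[str] = None) -> bool:
    """Set tracing variables if absent; return True when an API key is present."""
    if api_key:
        os.environ.setdefault("LANGCHAIN_API_KEY", api_key)
    for name, value in TRACING_DEFAULTS.items():
        os.environ.setdefault(name, value)
    return bool(os.environ.get("LANGCHAIN_API_KEY"))
```

A `False` return value means no LangSmith API key was found, so traces would not be uploaded.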
License
MIT