LLM Quality Gate - A provider-agnostic evaluation framework for LLM applications

LLMQ

Regression Testing & Quality Gates for LLM Applications

Open-source framework to catch silent LLM failures before they reach production.

The Problem

LLM applications fail silently. There's no stack trace when your summarizer starts hallucinating and no exception when your classifier drifts. A prompt change that works in development can quietly degrade production, and a model update can break existing functionality overnight.

Without systematic testing, these regressions go undetected until users complain.

Common regressions LLMQ catches:

- Prompt optimization improves one task but degrades another
- Model updates change response formats, breaking downstream parsing
- Provider API changes affect response quality
- Temperature adjustments reduce output consistency
- Context length changes truncate important information

Quick Start

Get running in under 5 minutes:

# Install
pip install llmq-gate

# Initialize project
llmq init

# Set your API key
echo "GROQ_API_KEY=your_key_here" >> .env

# Run your first evaluation
llmq eval --provider groq

View results in the browser:

llmq dashboard
# → http://localhost:8000

How It Works

Dataset → LLM Provider → Metrics Engine → Quality Gates → Pass / Fail

1. Define test cases in evals/dataset.json with inputs, expected outputs, and context
2. Run evaluations against any supported provider
3. Metrics are computed automatically: task success, relevance, hallucination, consistency
4. Quality gates pass or fail based on your configured thresholds
5. Results are stored for historical tracking and comparison
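
The flow above can be sketched in a few lines of Python. This is an illustration only, not LLMQ's implementation: `EvalCase`, `exact_match`, `run_pipeline`, and `stub_provider` are hypothetical names, the stub stands in for a real provider call, and only a simple exact-match metric is scored.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    """Hypothetical container mirroring a subset of the dataset fields."""
    id: str
    input: str
    expected_output: str


def exact_match(response: str, expected: str) -> float:
    """Return 1.0 when the normalized strings are equal, else 0.0."""
    return 1.0 if response.strip().lower() == expected.strip().lower() else 0.0


def run_pipeline(
    cases: List[EvalCase],
    provider: Callable[[str], str],
    threshold: float,
) -> Tuple[float, bool]:
    """Dataset -> provider -> metric -> gate. Returns (mean score, gate passed)."""
    if not cases:
        raise ValueError("at least one test case is required")
    scores = [exact_match(provider(c.input), c.expected_output) for c in cases]
    mean_score = sum(scores) / len(scores)
    return mean_score, mean_score >= threshold


def stub_provider(prompt: str) -> str:
    """Stand-in for a real LLM call; knows exactly one answer."""
    answers = {"What is the capital of France?": "Paris"}
    return answers.get(prompt, "unknown")


cases = [
    EvalCase("example_1", "What is the capital of France?", "Paris"),
    EvalCase("example_2", "What is the capital of Peru?", "Lima"),
]
score, passed = run_pipeline(cases, stub_provider, threshold=0.8)
print(f"task_success={score:.2f} gate={'PASS' if passed else 'FAIL'}")
# prints: task_success=0.50 gate=FAIL
```

The stub answers one of two questions, so the mean score is 0.5 and a 0.8 threshold fails the gate.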

Supported Providers

Provider	Models	API Key	Cost
Groq	Llama 3.1, Mixtral	Required	Free tier
OpenAI	GPT-3.5, GPT-4	Required	Paid
Claude	Claude 3 Haiku / Sonnet	Required	Paid
Gemini	Gemini 1.5 Flash / Pro	Required	Free tier
HuggingFace	Open models	Required	Free
OpenRouter	100+ models	Required	Varies
Ollama	Local models	—	Free
LocalAI	Local models	—	Free

CI/CD Integration

Add quality gates to your pull request workflow. Builds fail automatically when LLM performance drops below your thresholds.

# .github/workflows/llm-quality-gate.yml
name: LLM Quality Gate

on:
  pull_request:
    paths: ['prompts/**', 'llm/**', 'llmq.yaml']

jobs:
  quality-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install LLMQ
        run: pip install llmq-gate

      - name: Run Quality Gate
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
        run: llmq eval --provider groq --fail-on-gate

Metrics

Metric	Method	Description
Task Success	Exact match + semantic similarity	Did the model produce the correct answer?
Relevance	Embedding-based cosine similarity	Is the response relevant to the input?
Hallucination	LLM-as-judge detection	Did the model fabricate information?
Consistency	Multi-run variance analysis	Are responses stable across runs?
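
To make two of these methods concrete, here is a rough sketch of cosine similarity between embedding vectors and of variance across repeated runs. It assumes the embeddings are already available as plain lists of floats and that each run yields a single numeric score; how LLMQ obtains embeddings or aggregates runs is not described here, so the function names are illustrative.

```python
import math
from statistics import pvariance
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def run_variance(run_scores: Sequence[float]) -> float:
    """Population variance of per-run scores; 0.0 means perfectly stable runs."""
    if not run_scores:
        raise ValueError("at least one run score is required")
    return pvariance(run_scores)


# Toy vectors standing in for real embeddings
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(run_variance([1.0, 1.0, 1.0]))              # stable across runs
```

A relevance gate would compare the similarity to a threshold such as the 0.7 used in the configuration example, while a lower variance indicates more consistent output.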

Dashboard

llmq dashboard

The interactive web dashboard provides historical performance tracking, provider comparison charts, quality gate pass/fail trends, and test case drill-down analysis.

v0.1.1 Highlights

- Unified configuration filename: llmq.yaml everywhere.
- Config auto-discovery from the current directory upward (similar to pyproject.toml lookup).
- llmq eval now supports standalone mode: it falls back to the local engine if the dashboard API is unavailable.
- CLI exit codes are standardized:
  - 0: quality gate passed
  - 1: quality gate failed or runtime error
  - 2: configuration error (e.g., missing llmq.yaml)
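
Upward config discovery of the kind described above usually works by checking the starting directory and then each parent in turn. The following is a minimal sketch of that general technique, not LLMQ's actual lookup code; `find_config` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, filename: str = "llmq.yaml") -> Optional[Path]:
    """Return the nearest `filename` at or above `start`, or None if absent."""
    start = start.resolve()
    # Check the starting directory first, then walk up to the filesystem root.
    for directory in [start, *start.parents]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


found = find_config(Path.cwd())
print(found if found is not None else "no llmq.yaml found")
```

With a lookup like this, running a command from a nested directory such as evals/ would still pick up an llmq.yaml at the project root.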

Configuration

llmq.yaml — project-level settings:

llm:
  default_provider: "groq"
  temperature: 0.0
  max_tokens: 1000

providers:
  groq:
    api_key_env: "GROQ_API_KEY"
    model: "llama-3.1-8b-instant"
  openai:
    api_key_env: "OPENAI_API_KEY"
    model: "gpt-3.5-turbo"

quality_gates:
  task_success_threshold: 0.8
  relevance_threshold: 0.7
  hallucination_threshold: 0.1
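
As a sketch of how thresholds like these might be applied, the snippet below compares a set of measured metrics against the quality_gates values. It rests on two assumptions not stated above: that task success and relevance are minimums (higher is better) and that the hallucination threshold is a maximum rate (lower is better). The `evaluate_gates` function and the metric keys are hypothetical.

```python
from typing import Dict

# Values copied from the quality_gates section of the example llmq.yaml
THRESHOLDS = {
    "task_success_threshold": 0.8,
    "relevance_threshold": 0.7,
    "hallucination_threshold": 0.1,
}


def evaluate_gates(metrics: Dict[str, float], thresholds: Dict[str, float]) -> Dict[str, bool]:
    """Return a per-gate pass/fail map for the measured metrics."""
    return {
        # Assumption: these two are minimum scores
        "task_success": metrics["task_success"] >= thresholds["task_success_threshold"],
        "relevance": metrics["relevance"] >= thresholds["relevance_threshold"],
        # Assumption: hallucination is a rate, so the threshold is a maximum
        "hallucination": metrics["hallucination"] <= thresholds["hallucination_threshold"],
    }


measured = {"task_success": 0.85, "relevance": 0.75, "hallucination": 0.15}
results = evaluate_gates(measured, THRESHOLDS)
overall = all(results.values())
print(results, "PASS" if overall else "FAIL")
```

Here the hallucination rate of 0.15 exceeds the 0.1 limit, so the overall gate fails even though the other two metrics clear their thresholds.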

evals/dataset.json — test cases:

{
  "test_cases": [
    {
      "id": "example_1",
      "task_type": "question_answering",
      "input": "What is the capital of France?",
      "expected_output": "Paris",
      "context": "Geography question",
      "reference": "Paris is the capital of France."
    }
  ]
}
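
Before running an evaluation it can help to sanity-check the dataset file. The sketch below parses JSON in the shape shown above and verifies each test case. Treating `id`, `input`, and `expected_output` as required, and rejecting duplicate ids, are assumptions made for illustration; LLMQ's own validation rules may differ.

```python
import json
from typing import Dict, List

# Assumption: these fields are mandatory; the others are treated as optional
REQUIRED_FIELDS = ("id", "input", "expected_output")


def load_test_cases(text: str) -> List[Dict[str, str]]:
    """Parse dataset JSON and check that every test case is usable."""
    data = json.loads(text)
    cases = data.get("test_cases", [])
    for index, case in enumerate(cases):
        missing = [field for field in REQUIRED_FIELDS if field not in case]
        if missing:
            raise ValueError(f"test case {index} is missing fields: {missing}")
    ids = [case["id"] for case in cases]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate test case ids")
    return cases


sample = """
{
  "test_cases": [
    {
      "id": "example_1",
      "task_type": "question_answering",
      "input": "What is the capital of France?",
      "expected_output": "Paris",
      "context": "Geography question",
      "reference": "Paris is the capital of France."
    }
  ]
}
"""
cases = load_test_cases(sample)
print(len(cases), cases[0]["id"])
```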

CLI Reference

# Setup
llmq init                                    # Initialize new project
llmq doctor                                  # Check system health

# Evaluation
llmq eval --provider groq                    # Run evaluation
llmq eval --provider openai --fail-on-gate   # CI mode (exit 1 on gate failure)
llmq compare                                 # Compare providers side-by-side

# Management
llmq providers                               # List provider status
llmq runs --limit 10                         # View recent runs
llmq dashboard                               # Start web dashboard
llmq settings --set '{"quality_gates": {"task_success_threshold": 0.9}}'

Migration Guide (<=0.1.0 -> 0.1.1)

- Rename your existing config.yaml to llmq.yaml.
- Update scripts to use --config-path (or continue using --config) when you need an explicit location.
- Remove any hard dependency on llmq dashboard for CLI evaluations; llmq eval now runs standalone if the API is unavailable.
- If you parse CLI statuses in CI, adopt the documented exit codes (0/1/2).
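
For scripts that wrap the CLI, the exit codes can be mapped to messages along these lines. The command and flags are the ones shown in the CLI reference; `describe_exit` and `run_gate` are hypothetical helper names, and `run_gate` requires llmq to be installed and on the PATH.

```python
import subprocess

# Exit codes as documented for v0.1.1
EXIT_MEANINGS = {
    0: "quality gate passed",
    1: "quality gate failed or runtime error",
    2: "configuration error",
}


def describe_exit(code: int) -> str:
    """Translate an llmq exit code into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_gate(provider: str) -> int:
    """Run the evaluation in CI mode and return the process exit code."""
    result = subprocess.run(["llmq", "eval", "--provider", provider, "--fail-on-gate"])
    return result.returncode


# Example (not executed here because it needs llmq and an API key):
#   code = run_gate("groq")
#   print(describe_exit(code))
print(describe_exit(2))
```

Because exit code 1 covers both a failed gate and a runtime error, a wrapper that needs to tell them apart would have to inspect the command output as well.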

API

# Start an evaluation
curl -X POST http://localhost:8000/api/v1/evaluate \
  -H "Content-Type: application/json" \
  -d '{"provider": "groq"}'

# Get run history
curl http://localhost:8000/api/v1/runs

# Compare providers
curl http://localhost:8000/api/v1/compare
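
The same evaluate call can be made from Python with only the standard library. This sketch mirrors the first curl command above; it assumes the dashboard is running on localhost:8000 and that the endpoint returns a JSON body, which is not confirmed here. `build_evaluate_request` and `send` are illustrative names.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000/api/v1"


def build_evaluate_request(provider: str) -> request.Request:
    """Build the POST request equivalent to the curl evaluate example."""
    body = json.dumps({"provider": provider}).encode("utf-8")
    return request.Request(
        f"{BASE_URL}/evaluate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_evaluate_request("groq")
print(req.get_method(), req.full_url)
# To actually start an evaluation, with the dashboard running:
#   result = send(req)
```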

Contributing

Contributions are welcome — whether it's a bug fix, new provider integration, docs improvement, or feature request.

git clone https://github.com/Emart29/llm-quality-gate.git
cd llm-quality-gate
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e .
llmq doctor               # Verify setup

1. Fork the repository
2. Create a feature branch: git checkout -b feature-name
3. Make changes and add tests
4. Run tests: python -m pytest tests/ -v
5. Submit a pull request

Roadmap

v1.1 — Custom metric plugins · Slack/Discord webhooks · A/B testing framework · Performance benchmarking

v1.2 — Multi-language datasets · Advanced regression analysis · Cost tracking per provider · Distributed evaluation

v2.0 — Visual prompt debugging · Automated prompt optimization · Enterprise SSO · Advanced analytics

License

MIT — see LICENSE for details.

Release history

- 0.2.0 (Mar 19, 2026)
- 0.1.2 (Mar 18, 2026)
- 0.1.1 (Feb 15, 2026), this version
- 0.1.0 (Feb 12, 2026)

Download files

Source Distribution

llmq_gate-0.1.1.tar.gz (85.1 kB), uploaded Feb 15, 2026, Source

Built Distribution

llmq_gate-0.1.1-py3-none-any.whl (91.9 kB), uploaded Feb 15, 2026, Python 3

File details

Details for the file llmq_gate-0.1.1.tar.gz.

File metadata

Download URL: llmq_gate-0.1.1.tar.gz
Upload date: Feb 15, 2026
Size: 85.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmq_gate-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`6a2ce79bef490989c208e4b5ccea65773fdb76ca1b71f9617721a83e32e0193c`
MD5	`9a49bf298f27f7848e1fe4bf6280617b`
BLAKE2b-256	`4cf16712f1f580f0c92e154efcd2e9ce4a5c78961a8db8e0a7363192bd3aa16e`

File details

Details for the file llmq_gate-0.1.1-py3-none-any.whl.

File metadata

Download URL: llmq_gate-0.1.1-py3-none-any.whl
Upload date: Feb 15, 2026
Size: 91.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for llmq_gate-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9bd98fd3636749ce5af8e39409b085a133c97d5ce4e5b3c3ae7e9e03ee8975ff`
MD5	`71bd39705fea60a08e4c5bd771dd722f`
BLAKE2b-256	`c1df8526fd89a7521827394720eaaccc722e1671ab0e1099a3eb03328d7b6301`
