LLM Quality Gate - A provider-agnostic evaluation framework for LLM applications

LLMQ

An open-source LLM regression testing & CI quality gate framework.

Prevent prompt and model regressions before they reach production with automated testing across 8 LLM providers.

Why LLMQ?

LLM applications fail silently. A prompt change that works in development can degrade performance in production. Model updates can break existing functionality. Without systematic testing, these regressions go undetected until users complain.

Common LLM Regression Examples:

Prompt optimization improves one task but breaks another
Model updates change response format, breaking downstream parsing
Provider API changes affect response quality
Temperature adjustments reduce consistency
Context length changes truncate important information

LLMQ catches these issues before deployment with automated regression testing and quality gates.

Quick Start

Get running in 5 minutes:

# 1. Install from a clone of the repository (or from PyPI: pip install llmq-gate)
pip install -e .

# 2. Initialize project
llmq init

# 3. Set API key (copy .env.example to .env)
echo "GROQ_API_KEY=your_key_here" >> .env

# 4. Run evaluation
llmq eval --provider groq

View results at http://localhost:8000 after running llmq dashboard.

CI Integration

Add to .github/workflows/llm-quality-gate.yml:

name: LLM Quality Gate

on:
  pull_request:
    paths: ['prompts/**', 'llm/**', 'llmq.yaml']

jobs:
  quality-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install LLMQ
        run: pip install -e .
      
      - name: Run Quality Gate
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
        run: |
          llmq eval --provider groq --fail-on-gate

Architecture

Flow: Dataset → Provider → Metrics → Quality Gates → Results
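
The flow above can be pictured as a small pipeline. The sketch below is illustrative only and is not LLMQ's internal code: `EvalCase`, `run_pipeline`, `exact_match`, and `stub_provider` are hypothetical names, and the provider is a stub so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """Hypothetical in-memory form of one dataset entry."""
    id: str
    input: str
    expected_output: str


def exact_match(output: str, expected: str) -> float:
    """Toy metric: 1.0 if the expected answer appears in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def run_pipeline(
    dataset: List[EvalCase],
    provider: Callable[[str], str],
    threshold: float,
) -> Dict[str, object]:
    # Dataset -> Provider: send each input to the model.
    outputs = [provider(case.input) for case in dataset]
    # Provider -> Metrics: score each output against its expected answer.
    scores = [
        exact_match(output, case.expected_output)
        for output, case in zip(outputs, dataset)
    ]
    # Metrics -> Quality Gates: compare the mean score with the threshold.
    mean_score = sum(scores) / len(scores) if scores else 0.0
    # Quality Gates -> Results: report the aggregate and the verdict.
    return {"task_success": mean_score, "passed": mean_score >= threshold}


def stub_provider(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "The capital of France is Paris."


result = run_pipeline(
    [EvalCase("example_1", "What is the capital of France?", "Paris")],
    stub_provider,
    threshold=0.8,
)
print(result)  # {'task_success': 1.0, 'passed': True}
```

Passing the provider as a plain callable keeps the pipeline independent of any one vendor, which is one simple way to get the provider-agnostic behavior described here.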

Supported Providers

Provider	Models	API Key Required	Cost
Groq	Llama 3.1, Mixtral	✅	Free tier
OpenAI	GPT-3.5, GPT-4	✅	Paid
Claude	Claude 3 Haiku/Sonnet	✅	Paid
Gemini	Gemini 1.5 Flash/Pro	✅	Free tier
HuggingFace	Open models	✅	Free
OpenRouter	100+ models	✅	Varies
Ollama	Local models	❌	Free
LocalAI	Local models	❌	Free

CLI Commands

# Project setup
llmq init                           # Initialize new project
llmq doctor                         # Check system health

# Evaluation
llmq eval --provider groq           # Run evaluation
llmq eval --provider openai --fail-on-gate  # CI mode
llmq compare                        # Compare providers

# Management
llmq providers                      # List provider status
llmq runs --limit 10                # View recent runs
llmq dashboard                      # Start web interface
# Update quality gate thresholds
llmq settings --set '{"quality_gates": {"task_success_threshold": 0.9}}'
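
The JSON argument to `llmq settings --set` is easy to get wrong when quoted by hand. A minimal sketch, assuming you drive the CLI from a Python script: `settings_command` is a hypothetical helper, and only the `llmq settings --set` invocation itself comes from the command list above.

```python
import json
import shlex
from typing import List


def settings_command(overrides: dict) -> List[str]:
    """Build the argv list for `llmq settings --set <json>`."""
    # json.dumps guarantees valid JSON; passing argv as a list avoids shell quoting.
    return ["llmq", "settings", "--set", json.dumps(overrides)]


argv = settings_command({"quality_gates": {"task_success_threshold": 0.9}})

# shlex.join shows the equivalent, safely quoted shell command line.
print(shlex.join(argv))

# To actually run it (requires llmq on PATH):
#   import subprocess
#   subprocess.run(argv, check=True)
```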

Dashboard

🎬 Interactive Demo — See the full CLI + Dashboard walkthrough.

Features:

Historical performance tracking
Provider comparison charts
Quality gate pass/fail trends
Test case drill-down analysis

Configuration

llmq.yaml:

llm:
  default_provider: "groq"
  temperature: 0.0
  max_tokens: 1000

providers:
  groq:
    api_key_env: "GROQ_API_KEY"
    model: "llama-3.1-8b-instant"
  openai:
    api_key_env: "OPENAI_API_KEY"
    model: "gpt-3.5-turbo"

quality_gates:
  task_success_threshold: 0.8
  relevance_threshold: 0.7
  hallucination_threshold: 0.1
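
To make the `quality_gates` values concrete, here is a rough sketch of how such thresholds might be applied. It assumes that task success and relevance are minimums (higher is better) and that the hallucination threshold is a maximum (lower is better). That reading is an assumption, and `failed_gates` is a hypothetical function, not part of LLMQ.

```python
from typing import Dict, List

# Thresholds copied from the quality_gates section of llmq.yaml.
THRESHOLDS = {
    "task_success_threshold": 0.8,
    "relevance_threshold": 0.7,
    "hallucination_threshold": 0.1,
}


def failed_gates(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of the gates that the given scores do not satisfy."""
    failures = []
    # Assumed: task success and relevance must reach at least their threshold.
    if scores["task_success"] < thresholds["task_success_threshold"]:
        failures.append("task_success")
    if scores["relevance"] < thresholds["relevance_threshold"]:
        failures.append("relevance")
    # Assumed: the hallucination rate must stay at or below its threshold.
    if scores["hallucination"] > thresholds["hallucination_threshold"]:
        failures.append("hallucination")
    return failures


example_scores = {"task_success": 0.85, "relevance": 0.65, "hallucination": 0.05}
print(failed_gates(example_scores, THRESHOLDS))  # ['relevance']
```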

evals/dataset.json:

{
  "test_cases": [
    {
      "id": "example_1",
      "task_type": "question_answering",
      "input": "What is the capital of France?",
      "expected_output": "Paris",
      "context": "Geography question",
      "reference": "Paris is the capital of France."
    }
  ]
}
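
A dataset in this shape can be sanity-checked before an evaluation run. The sketch below is not LLMQ's loader: `EvalCase` and `parse_dataset` are hypothetical names, and treating `id`, `task_type`, `input`, and `expected_output` as required (with `context` and `reference` optional) is an assumption made for illustration.

```python
import json
from dataclasses import dataclass
from typing import List

# Assumed required keys; the README does not say which fields are mandatory.
REQUIRED_FIELDS = ("id", "task_type", "input", "expected_output")


@dataclass
class EvalCase:
    id: str
    task_type: str
    input: str
    expected_output: str
    context: str = ""
    reference: str = ""


def parse_dataset(text: str) -> List[EvalCase]:
    """Parse dataset JSON, rejecting missing fields and duplicate ids."""
    data = json.loads(text)
    cases: List[EvalCase] = []
    seen_ids = set()
    for index, raw in enumerate(data["test_cases"]):
        missing = [name for name in REQUIRED_FIELDS if name not in raw]
        if missing:
            raise ValueError(f"test case {index} is missing fields: {missing}")
        if raw["id"] in seen_ids:
            raise ValueError(f"duplicate test case id: {raw['id']}")
        seen_ids.add(raw["id"])
        cases.append(
            EvalCase(
                id=raw["id"],
                task_type=raw["task_type"],
                input=raw["input"],
                expected_output=raw["expected_output"],
                context=raw.get("context", ""),
                reference=raw.get("reference", ""),
            )
        )
    return cases


SAMPLE = """
{
  "test_cases": [
    {
      "id": "example_1",
      "task_type": "question_answering",
      "input": "What is the capital of France?",
      "expected_output": "Paris",
      "context": "Geography question",
      "reference": "Paris is the capital of France."
    }
  ]
}
"""

cases = parse_dataset(SAMPLE)
print(len(cases), cases[0].expected_output)  # 1 Paris
```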

Metrics

Task Success: Exact match + semantic similarity
Relevance: Embedding-based cosine similarity
Hallucination: LLM-as-judge detection
Consistency: Multi-run variance analysis
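
For intuition, three of these metrics can be approximated in a few lines of standard-library Python. This is a simplified sketch, not LLMQ's implementation: the function names are hypothetical, the embeddings are assumed to be plain lists of floats produced by whatever embedding model you use, and population variance is only one possible reading of "multi-run variance".

```python
import math
from statistics import pvariance
from typing import Sequence


def exact_match(output: str, expected: str) -> bool:
    """Exact match after trimming whitespace and ignoring case."""
    return output.strip().lower() == expected.strip().lower()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_variance(scores: Sequence[float]) -> float:
    """Population variance of per-run scores; lower means more consistent."""
    return pvariance(scores)


print(exact_match(" paris ", "Paris"))             # True
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
print(score_variance([1.0, 1.0, 1.0]))             # 0.0
```

Hallucination detection with an LLM as judge is left out because it needs a live model call.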

API

# Start evaluation
curl -X POST http://localhost:8000/api/v1/evaluate \
  -H "Content-Type: application/json" \
  -d '{"provider": "groq"}'

# Get results
curl http://localhost:8000/api/v1/runs

# Provider comparison
curl http://localhost:8000/api/v1/compare
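
The same calls can be made from Python with the standard library. A minimal sketch, assuming the dashboard server is running on localhost:8000 and that the endpoints return JSON (the response format is not documented here). `build_evaluate_request` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_evaluate_request(provider: str) -> urllib.request.Request:
    """Build the POST request that mirrors the first curl example."""
    body = json.dumps({"provider": provider}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/evaluate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url):
    """Send a request (or GET a URL) and decode the body, assumed to be JSON."""
    with urllib.request.urlopen(request_or_url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_evaluate_request("groq")
print(request.get_method(), request.full_url)

# With the server running, the three curl calls map to:
#   fetch_json(request)                  # start an evaluation
#   fetch_json(f"{BASE_URL}/runs")       # get results
#   fetch_json(f"{BASE_URL}/compare")    # provider comparison
```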

Contributing

Fork the repository
Create a feature branch: git checkout -b feature-name
Make changes and add tests
Run tests: python -m pytest tests/ -v
Submit a pull request

Development setup:

git clone https://github.com/Emart29/llm-quality-gate.git
cd llm-quality-gate
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e .
llmq doctor  # Verify setup

Roadmap

v1.1

Custom metric plugins
Slack/Discord webhooks
A/B testing framework
Performance benchmarking

v1.2

Multi-language datasets
Advanced regression analysis
Cost tracking per provider
Distributed evaluation

v2.0

Visual prompt debugging
Automated prompt optimization
Enterprise SSO integration
Advanced analytics

License: MIT | Python: 3.8+ | Status: Production Ready

Release history

0.2.0 (Mar 19, 2026)
0.1.2 (Mar 18, 2026)
0.1.1 (Feb 15, 2026)
0.1.0 (Feb 12, 2026), this version
