CI for AI-generated code — measures production readiness, not just correctness
QualBench — CI for AI-Generated Code
AI Cost Tracking
This project uses AI-generated code. Total cost: $0.1500 across 1 AI commit.
Generated on 2026-04-04 using openrouter/qwen/qwen3-coder-next
Correct code is not the same as mergeable code. Think eslint plus code review, but for AI-generated changes. Add it to your pipeline in 2 minutes.
60 seconds to your first score
```shell
pip install qualbench
qualbench quickstart
```
No config, no API keys. QualBench evaluates your current diff and prints a Quality Score.
Add to CI in 2 minutes
```yaml
# .github/workflows/qualbench.yml
name: QualBench
on: [pull_request]
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: softreck/qualbench-action@v1
        with:
          tool: prollama
          fail_on_score: 70
```
Every AI-generated PR gets a quality review comment. Set `fail_on_score` and the pipeline fails when the score drops below your threshold.
🧠 QualBench Review
Quality Score: 78/100
❌ Complexity increased (+12%)
⚠ Security: 1 new medium-severity finding
✔ Tests pass, no regressions
Verdict: needs_review
The problem
AI coding tools resolve 70–80% of benchmark tasks. But most AI-generated PRs are not mergeable without human fixes. Every existing benchmark asks "do tests pass?" — nobody asks "would a senior developer approve this PR?"
Six dimensions of production readiness
| Dimension | What it measures | Weight |
|---|---|---|
| Correctness | All tests pass, no regressions | 25% |
| Mergeability | Would a senior dev merge this? (1–5) | 25% |
| Security | New vulnerabilities introduced | 15% |
| Code quality | Complexity delta, dead code | 15% |
| Iterations | Attempts to reach acceptable output | 10% |
| Cost efficiency | USD per successful patch | 10% |
Verdicts: ready_to_merge (≥85), needs_review (65–84), not_merge_ready (<65).
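The weights in the table and the verdict cutoffs above combine straightforwardly. A minimal sketch of that scoring model follows; the function names and dictionary keys are illustrative, not QualBench's actual API:

```python
# Dimension weights from the table above (they sum to 1.0).
# Names are placeholders; see QualBench's own docs for the real schema.
WEIGHTS = {
    "correctness": 0.25,
    "mergeability": 0.25,
    "security": 0.15,
    "code_quality": 0.15,
    "iterations": 0.10,
    "cost_efficiency": 0.10,
}

def quality_score(dimension_scores: dict) -> float:
    """Weighted sum of per-dimension scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)

def verdict(score: float) -> str:
    """Map a 0-100 quality score to the three verdict buckets."""
    if score >= 85:
        return "ready_to_merge"
    if score >= 65:
        return "needs_review"
    return "not_merge_ready"
```

For example, a PR scoring 100 on every dimension gets a quality score of 100 and `ready_to_merge`, while the 78/100 review shown earlier lands in `needs_review`.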
CLI
```shell
qualbench run --tool prollama         # score current diff
qualbench run --tool prollama --json  # portable JSON output
qualbench run --mode cheap            # lowest-cost models
qualbench quickstart                  # first score in 60 seconds
qualbench compare my_tool             # vs leaderboard
qualbench info                        # dataset summary
qualbench doctor                      # check dependencies
```
One portable format everywhere
CLI, API, GitHub Action — same JSON schema. See docs/schema.md.
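Because the output is plain JSON, a CI step can consume it directly. A sketch of that pattern follows; the field names here are placeholders for illustration, since the real schema is defined in docs/schema.md:

```python
import json

# Hypothetical example of the kind of portable result the tools emit.
# Treat these keys as assumptions; docs/schema.md is authoritative.
result_json = """
{
  "tool": "prollama",
  "quality_score": 78,
  "verdict": "needs_review",
  "dimensions": {"correctness": 100, "security": 60}
}
"""

result = json.loads(result_json)

# A custom gate can mirror the Action's fail_on_score behavior:
threshold = 70
passed = result["quality_score"] >= threshold
print(f"{result['tool']}: {result['quality_score']}/100 -> {result['verdict']}")
```

The same parsing works whether the JSON came from the CLI's `--json` flag, the API, or the GitHub Action, which is the point of the shared schema.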
Adding your tool
```shell
cp runners/template.py runners/my_tool.py
# Implement run() → return portable schema
qualbench run --tool my_tool
# Submit PR with results
```
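A runner is just a module exposing `run()`. The sketch below shows the general shape, assuming `run()` receives a task description and returns a dict in the portable schema; the argument and return fields are invented for illustration, since the real interface is defined in runners/template.py:

```python
# runners/my_tool.py -- minimal runner sketch (hypothetical interface).

def run(task: dict) -> dict:
    """Invoke your tool on one task and report results in the
    portable schema (placeholder keys shown here)."""
    patch = my_tool_generate_patch(task["repo"], task["issue"])
    return {
        "tool": "my_tool",
        "task_id": task["id"],
        "patch": patch,
        "iterations": 1,
        "cost_usd": 0.002,
    }

def my_tool_generate_patch(repo: str, issue: str) -> str:
    # Stand-in for the actual call into your coding tool.
    return f"diff for {repo}: {issue}"
```

With the file in place, `qualbench run --tool my_tool` picks it up by name.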
License
Licensed under Apache-2.0.
File details
Details for the file qualbench-0.2.1.tar.gz.
File metadata
- Download URL: qualbench-0.2.1.tar.gz
- Upload date:
- Size: 59.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `67b4289df237f94b1936e5d40f557c8d5d6b419b9c62fcac9be5aa09e9ee61d9` |
| MD5 | `d627d1bc4b7bf7eb5ebbca0b415ba8f7` |
| BLAKE2b-256 | `6fba1d3425ee824d76a4fa900606d0573517632c18507c910d7b8204835b2c05` |
File details
Details for the file qualbench-0.2.1-py3-none-any.whl.
File metadata
- Download URL: qualbench-0.2.1-py3-none-any.whl
- Upload date:
- Size: 14.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f882def8aa68e3ca27b1f2d4bf2e0d24e9e7c8d4e7cb88ecf3ca90328d050740` |
| MD5 | `e89632a72bf8479ab4f35f543b94145a` |
| BLAKE2b-256 | `2522a03c546abdfb12cf4cb01fe953cabec3d4ca54950e0cb8c06785855db77c` |