
CI for AI-generated code — measures production readiness, not just correctness


QualBench — CI for AI-Generated Code

AI Cost Tracking


This project uses AI-generated code. Total cost: $0.1500 across 1 AI commit.

Generated on 2026-04-04 using openrouter/qwen/qwen3-coder-next


Correct code is not the same as mergeable code. Think eslint plus code review, but for AI-generated code. Add it to your pipeline in 2 minutes.

License: Apache 2.0 | Dataset: v0


60 seconds to your first score

pip install qualbench
qualbench quickstart

No config, no API keys. QualBench evaluates your current diff and prints a Quality Score.

Add to CI in 2 minutes

# .github/workflows/qualbench.yml
name: QualBench
on: [pull_request]
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: softreck/qualbench-action@v1
        with:
          tool: prollama
          fail_on_score: 70

With this workflow, every pull request gets a quality review comment. Set fail_on_score and the job fails whenever the Quality Score is below that threshold.

🧠 QualBench Review

Quality Score: 78/100

  ❌ Complexity increased (+12%)
  ⚠ Security: 1 new medium-severity finding
  ✔ Tests pass, no regressions

Verdict: needs_review
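
To get the same pass/fail behaviour before you push, a minimal local sketch runs the CLI with --json and compares the result against your threshold. Two assumptions, neither confirmed by this page: the JSON report exposes the score under a top-level key called `score` (a placeholder; check docs/schema.md for the real field name), and a score exactly equal to fail_on_score passes, since only scores below the threshold fail.

```python
import json
import subprocess
import sys

FAIL_ON_SCORE = 70  # same threshold as fail_on_score in the workflow


def passes_gate(report: dict, threshold: int, key: str = "score") -> bool:
    """Return True when the report's score is at or above the threshold.

    `key` is a placeholder; the real field name is defined in docs/schema.md.
    """
    return float(report[key]) >= threshold


def main() -> int:
    """Score the current diff with the CLI and return a process exit code."""
    # check=True raises CalledProcessError if qualbench itself exits non-zero.
    result = subprocess.run(
        ["qualbench", "run", "--tool", "prollama", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    report = json.loads(result.stdout)
    if passes_gate(report, FAIL_ON_SCORE):
        print("QualBench gate passed")
        return 0
    print("QualBench gate failed: score below threshold", file=sys.stderr)
    return 1
```

Call `sys.exit(main())` from a pre-push hook or a small wrapper script; it exits non-zero when the score falls short, mirroring fail_on_score in the Action.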

The problem

AI coding tools now resolve 70–80% of benchmark tasks, yet most AI-generated PRs still need human fixes before they can be merged. Existing benchmarks typically ask "do the tests pass?" rather than "would a senior developer approve this PR?"

Six dimensions of production readiness

Dimension        What it measures                      Weight
Correctness      All tests pass, no regressions        25%
Mergeability     Would a senior dev merge this? (1–5)  25%
Security         New vulnerabilities introduced        15%
Code quality     Complexity delta, dead code           15%
Iterations       Attempts to reach acceptable output   10%
Cost efficiency  USD per successful patch              10%

Verdicts: ready_to_merge (≥85), needs_review (65–84), not_merge_ready (<65).
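
The weights and verdict bands above can be combined into a single Quality Score. The sketch below is one plausible reading, not QualBench's documented internals: it assumes every dimension is first normalised to a 0 to 100 sub-score (higher is better) and that the 1–5 mergeability rating maps linearly onto that range. Both normalisations are assumptions; only the weights and band thresholds come from this page.

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "correctness": 0.25,
    "mergeability": 0.25,
    "security": 0.15,
    "code_quality": 0.15,
    "iterations": 0.10,
    "cost_efficiency": 0.10,
}


def mergeability_to_percent(rating: int) -> float:
    """Map a 1-5 reviewer rating linearly onto 0-100 (assumed mapping)."""
    if not 1 <= rating <= 5:
        raise ValueError("mergeability rating must be between 1 and 5")
    return (rating - 1) / 4 * 100


def quality_score(subscores: dict[str, float]) -> float:
    """Weighted sum of 0-100 sub-scores, one per dimension."""
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise KeyError(f"missing dimensions: {sorted(missing)}")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)


def verdict(score: float) -> str:
    """Apply the verdict bands: >=85, 65-84, <65."""
    if score >= 85:
        return "ready_to_merge"
    if score >= 65:
        return "needs_review"
    return "not_merge_ready"
```

For example, sub-scores of 100 (correctness), 75 (a mergeability rating of 4), 70 (security), 60 (code quality), 80 (iterations) and 50 (cost efficiency) give a weighted score of 76.25, which lands in needs_review.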

CLI

qualbench run --tool prollama          # score current diff
qualbench run --tool prollama --json   # portable JSON output
qualbench run --mode cheap             # lowest-cost models
qualbench quickstart                   # first score in 60 seconds
qualbench compare my_tool              # vs leaderboard
qualbench info                         # dataset summary
qualbench doctor                       # check dependencies

One portable format everywhere

The CLI, API and GitHub Action all use the same JSON schema; see docs/schema.md.

Adding your tool

cp runners/template.py runners/my_tool.py
# Implement run() so it returns results in the portable schema
qualbench run --tool my_tool
# Submit PR with results
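
A runner only has to turn one task into a result in the portable format. The sketch below is hypothetical: the run() signature, the task and result fields, and the `my-tool` command are all placeholders, and the authoritative contract is whatever runners/template.py and docs/schema.md define.

```python
"""runners/my_tool.py: hypothetical runner sketch, not the real template."""
import subprocess
import time
from typing import Any


def run(task: dict[str, Any]) -> dict[str, Any]:
    """Run a hypothetical `my-tool` CLI on one task and report the outcome.

    The keys read from `task` and written to the result are placeholders;
    replace them with the fields required by docs/schema.md.
    """
    started = time.monotonic()
    proc = subprocess.run(
        ["my-tool", "--repo", task["repo_path"], "--prompt", task["prompt"]],
        capture_output=True,
        text=True,
    )
    return {
        "tool": "my_tool",
        "task_id": task["id"],
        "success": proc.returncode == 0,
        "patch": proc.stdout,  # assumed: the tool prints its diff to stdout
        "duration_s": round(time.monotonic() - started, 2),
    }
```

Keeping the runner a pure function of one task makes it easy to test against recorded tool output before running `qualbench run --tool my_tool`.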

License

Licensed under Apache-2.0.


Download files


Source Distribution

qualbench-0.2.1.tar.gz (59.5 kB)

Uploaded Source

Built Distribution


qualbench-0.2.1-py3-none-any.whl (14.5 kB)

Uploaded Python 3

File details

Details for the file qualbench-0.2.1.tar.gz.

File metadata

  • Download URL: qualbench-0.2.1.tar.gz
  • Size: 59.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for qualbench-0.2.1.tar.gz
Algorithm Hash digest
SHA256 67b4289df237f94b1936e5d40f557c8d5d6b419b9c62fcac9be5aa09e9ee61d9
MD5 d627d1bc4b7bf7eb5ebbca0b415ba8f7
BLAKE2b-256 6fba1d3425ee824d76a4fa900606d0573517632c18507c910d7b8204835b2c05


File details

Details for the file qualbench-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: qualbench-0.2.1-py3-none-any.whl
  • Size: 14.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for qualbench-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f882def8aa68e3ca27b1f2d4bf2e0d24e9e7c8d4e7cb88ecf3ca90328d050740
MD5 e89632a72bf8479ab4f35f543b94145a
BLAKE2b-256 2522a03c546abdfb12cf4cb01fe953cabec3d4ca54950e0cb8c06785855db77c

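
To check a downloaded distribution against the SHA256 digests listed here, a short standard-library sketch is enough. It streams the file in chunks so large archives never need to fit in memory, and looks up the expected digest by file name; it assumes the file keeps its original name after download.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for qualbench 0.2.1.
EXPECTED_SHA256 = {
    "qualbench-0.2.1.tar.gz": "67b4289df237f94b1936e5d40f557c8d5d6b419b9c62fcac9be5aa09e9ee61d9",
    "qualbench-0.2.1-py3-none-any.whl": "f882def8aa68e3ca27b1f2d4bf2e0d24e9e7c8d4e7cb88ecf3ca90328d050740",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file against the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

`verify(Path("qualbench-0.2.1.tar.gz"))` should return True for an untampered download. If you install with pip instead, its hash-checking mode (a `--hash=sha256:...` entry in a requirements file, optionally enforced with `--require-hashes`) performs the same check at install time.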
