CI for AI-generated code — measures production readiness, not just correctness
QualBench — CI for AI-Generated Code
AI Cost Tracking
This project uses AI-generated code. Total cost: $0.1500 across 1 AI commit.
Generated on 2026-04-04 using openrouter/qwen/qwen3-coder-next
Correct code is not the same as mergeable code. Think eslint plus code review, but for AI-generated changes. Add it to your pipeline in 2 minutes.
60 seconds to your first score
```shell
pip install qualbench
qualbench quickstart
```
No config, no API keys. QualBench evaluates your current diff and prints a Quality Score.
Add to CI in 2 minutes
```yaml
# .github/workflows/qualbench.yml
name: QualBench
on: [pull_request]
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: softreck/qualbench-action@v1
        with:
          tool: prollama
          fail_on_score: 70
```
Every AI-generated PR gets a quality review comment. Set `fail_on_score` and the pipeline fails when the score drops below your threshold.
🧠 QualBench Review
Quality Score: 78/100
❌ Complexity increased (+12%)
⚠ Security: 1 new medium-severity finding
✔ Tests pass, no regressions
Verdict: needs_review
The problem
AI coding tools resolve 70–80% of benchmark tasks. But most AI-generated PRs are not mergeable without human fixes. Every existing benchmark asks "do tests pass?" — nobody asks "would a senior developer approve this PR?"
Six dimensions of production readiness
| Dimension | What it measures | Weight |
|---|---|---|
| Correctness | All tests pass, no regressions | 25% |
| Mergeability | Would a senior dev merge this? (1–5) | 25% |
| Security | New vulnerabilities introduced | 15% |
| Code quality | Complexity delta, dead code | 15% |
| Iterations | Attempts to reach acceptable output | 10% |
| Cost efficiency | USD per successful patch | 10% |
Verdicts: ready_to_merge (≥85), needs_review (65–84), not_merge_ready (<65).
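The weights in the table and the verdict cutoffs above combine straightforwardly. A minimal sketch of that scoring model follows; the function names and dictionary keys are illustrative, not QualBench's actual API:

```python
# Dimension weights from the table above (they sum to 1.0).
# Names are placeholders; see QualBench's own docs for the real schema.
WEIGHTS = {
    "correctness": 0.25,
    "mergeability": 0.25,
    "security": 0.15,
    "code_quality": 0.15,
    "iterations": 0.10,
    "cost_efficiency": 0.10,
}

def quality_score(dimension_scores: dict) -> float:
    """Weighted sum of per-dimension scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)

def verdict(score: float) -> str:
    """Map a 0-100 quality score to the three verdict buckets."""
    if score >= 85:
        return "ready_to_merge"
    if score >= 65:
        return "needs_review"
    return "not_merge_ready"
```

For example, a PR scoring 100 on every dimension gets a quality score of 100 and `ready_to_merge`, while the 78/100 review shown earlier lands in `needs_review`.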
CLI
```shell
qualbench run --tool prollama         # score current diff
qualbench run --tool prollama --json  # portable JSON output
qualbench run --mode cheap            # lowest-cost models
qualbench quickstart                  # first score in 60 seconds
qualbench compare my_tool             # vs leaderboard
qualbench info                        # dataset summary
qualbench doctor                      # check dependencies
```
One portable format everywhere
CLI, API, GitHub Action — same JSON schema. See docs/schema.md.
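Because the output is plain JSON, a CI step can consume it directly. A sketch of that pattern follows; the field names here are placeholders for illustration, since the real schema is defined in docs/schema.md:

```python
import json

# Hypothetical example of the kind of portable result the tools emit.
# Treat these keys as assumptions; docs/schema.md is authoritative.
result_json = """
{
  "tool": "prollama",
  "quality_score": 78,
  "verdict": "needs_review",
  "dimensions": {"correctness": 100, "security": 60}
}
"""

result = json.loads(result_json)

# A custom gate can mirror the Action's fail_on_score behavior:
threshold = 70
passed = result["quality_score"] >= threshold
print(f"{result['tool']}: {result['quality_score']}/100 -> {result['verdict']}")
```

The same parsing works whether the JSON came from the CLI's `--json` flag, the API, or the GitHub Action, which is the point of the shared schema.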
Adding your tool
```shell
cp runners/template.py runners/my_tool.py
# Implement run() → return portable schema
qualbench run --tool my_tool
# Submit PR with results
```
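A runner is just a module exposing `run()`. The sketch below shows the general shape, assuming `run()` receives a task description and returns a dict in the portable schema; the argument and return fields are invented for illustration, since the real interface is defined in runners/template.py:

```python
# runners/my_tool.py -- minimal runner sketch (hypothetical interface).

def run(task: dict) -> dict:
    """Invoke your tool on one task and report results in the
    portable schema (placeholder keys shown here)."""
    patch = my_tool_generate_patch(task["repo"], task["issue"])
    return {
        "tool": "my_tool",
        "task_id": task["id"],
        "patch": patch,
        "iterations": 1,
        "cost_usd": 0.002,
    }

def my_tool_generate_patch(repo: str, issue: str) -> str:
    # Stand-in for the actual call into your coding tool.
    return f"diff for {repo}: {issue}"
```

With the file in place, `qualbench run --tool my_tool` picks it up by name.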
License
Licensed under Apache-2.0.
File details
Details for the file qualbench-0.2.1.tar.gz.
File metadata
- Download URL: qualbench-0.2.1.tar.gz
- Upload date:
- Size: 59.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `67b4289df237f94b1936e5d40f557c8d5d6b419b9c62fcac9be5aa09e9ee61d9` |
| MD5 | `d627d1bc4b7bf7eb5ebbca0b415ba8f7` |
| BLAKE2b-256 | `6fba1d3425ee824d76a4fa900606d0573517632c18507c910d7b8204835b2c05` |
File details
Details for the file qualbench-0.2.1-py3-none-any.whl.
File metadata
- Download URL: qualbench-0.2.1-py3-none-any.whl
- Upload date:
- Size: 14.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f882def8aa68e3ca27b1f2d4bf2e0d24e9e7c8d4e7cb88ecf3ca90328d050740` |
| MD5 | `e89632a72bf8479ab4f35f543b94145a` |
| BLAKE2b-256 | `2522a03c546abdfb12cf4cb01fe953cabec3d4ca54950e0cb8c06785855db77c` |