Empirical Safety Harness for agentic AI coding systems. Scores AI-generated code on 5 metrics across 5 vendor conditions against one fixed spec.
AI Code Quality Auditor — the Referee Tool
An empirical Safety Harness for agentic AI coding systems. Quantifies where AI-assisted development fails at governance, security, and ethical alignment — before the code reaches production.
🟢 Try it in 30 seconds:
pipx install ai-code-quality-auditor
auditor --help
🚀 Or wire it into your CI with one short job (.github/workflows/auditor.yml):
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dominicrume/NEW-enterprise-ai-code-quality-auditor@main
        with:
          run-id: ${{ github.run_id }}
          conditions: claude_code,cursor_agent
📊 Live dashboard: https://auditor-dashboard.fly.dev (pending deploy — see below)
This is the experimental instrument for the MSc dissertation "AI-Assisted Coding Assessment Tool: Evaluating LLM Performance, Governance, and Security in an Agent Education System" (Aston University, MSc AI & Business Strategy). The same instrument is the working prototype for the PhD extension at the Aston-Capgemini Centre of Excellence for Enterprise AI.
What it does
Given a fixed specification (the "spec box"), the Auditor:
- Runs five experimental conditions against the same task, including a human control, a visualisation→Claude→Replit pipeline, the Cursor IDE, and an autonomous agent.
- Captures every output and every interaction event.
- Scores each result on five empirical metrics: security vulnerability density, cyclomatic complexity, code duplication, hallucination frequency (features outside spec), and keystroke dynamics (correction frequency).
- Emits CSV/JSON reports for statistical comparison.
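To make the metric definitions above more concrete, here is a minimal Python sketch of how three of them could be computed. It is an illustration under stated assumptions, not the package's actual analyzers: the function names, the set of AST node types counted as decision points, and the "findings per 1,000 lines" normalisation are all hypothetical choices made for this example.

```python
import ast

# Node types counted as decision points in this simplified sketch.
# Assumption: the real analyzer may count a different set of constructs.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.BoolOp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + the number of decision points."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, _BRANCH_NODES) for node in ast.walk(tree))


def hallucinated_features(spec_features: set[str], implemented: set[str]) -> set[str]:
    """Features present in the generated code but absent from the spec."""
    return implemented - spec_features


def vulnerability_density(findings: int, lines_of_code: int) -> float:
    """Security findings per 1,000 lines of code (hypothetical normalisation)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return 1000 * findings / lines_of_code


if __name__ == "__main__":
    sample = (
        "def f(x):\n"
        "    if x > 0:\n"
        "        return 1\n"
        "    for i in range(3):\n"
        "        pass\n"
        "    return 0\n"
    )
    print(cyclomatic_complexity(sample))
    print(hallucinated_features({"login", "quiz"}, {"login", "quiz", "chat"}))
    print(vulnerability_density(3, 1500))
```

The hallucination check is deliberately reduced to a set difference: it assumes that features have already been extracted from both the spec and the generated code, which is the hard part in practice.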
Quick start
cp .env.example .env
pip install -e .
auditor run --spec specs/agent_education_system.yaml --workflow human_control
auditor report --out data/reports/
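Once reports have been written to data/reports/, comparing conditions is a matter of grouping scores by condition. The sketch below shows one way this might look using only the standard library. The column names `condition`, `metric`, and `value` are assumptions made for this example (the actual report schema is not documented here), and the sample rows are placeholder numbers, not measured results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_condition(csv_text: str, metric: str) -> dict[str, float]:
    """Average one metric per experimental condition.

    Assumes a long-format CSV with hypothetical columns:
    condition, metric, value.
    """
    grouped: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["metric"] == metric:
            grouped[row["condition"]].append(float(row["value"]))
    return {condition: mean(values) for condition, values in grouped.items()}


# Placeholder rows for illustration only; these are not experimental results.
SAMPLE = """condition,metric,value
human_control,cyclomatic_complexity,4
human_control,cyclomatic_complexity,6
cursor_agent,cyclomatic_complexity,9
"""

if __name__ == "__main__":
    print(mean_by_condition(SAMPLE, "cyclomatic_complexity"))
```

To use it on a real report, read the file contents with `pathlib.Path(...).read_text()` and adjust the column names to whatever the emitted CSV actually contains.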
Read in this order
- docs/ARCHITECTURE.md: how the pieces fit
- docs/METHODOLOGY.md: how an experiment is run
- docs/METRICS.md: what each metric means and how it's computed
- docs/ETHICS.md: GDPR, synthetic data, academic integrity
- docs/DISSERTATION_LINKAGE.md: which folder serves which proposal section
- docs/ROADMAP.md: the PhD extension (API security + enterprise risk)
Principles
- One analyzer per metric. One adapter per AI workflow. Single responsibility.
- The spec is data, not code: externalised in specs/ for reproducibility.
- Synthetic data only. No PII, no proprietary corporate records, ever.
- Every analyzer has a test. Green tests = trustable experiment.
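The "one analyzer per metric" principle can be sketched as a small interface that each analyzer implements. This is a hypothetical illustration of the pattern, not the package's real class hierarchy: `Analyzer`, `MetricResult`, `DuplicationAnalyzer`, and `run_analyzers` are names invented for this example, and the duplication measure (share of repeated non-blank lines) is a toy stand-in for a real duplication detector.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class MetricResult:
    """One score produced by one analyzer (hypothetical structure)."""
    metric: str
    value: float


class Analyzer(Protocol):
    """Each analyzer owns exactly one metric."""
    name: str

    def analyze(self, source: str) -> MetricResult: ...


class DuplicationAnalyzer:
    """Toy duplication metric: fraction of non-blank lines that repeat an earlier line."""
    name = "code_duplication"

    def analyze(self, source: str) -> MetricResult:
        lines = [line.strip() for line in source.splitlines() if line.strip()]
        if not lines:
            return MetricResult(self.name, 0.0)
        repeated = len(lines) - len(set(lines))
        return MetricResult(self.name, repeated / len(lines))


def run_analyzers(source: str, analyzers: Iterable[Analyzer]) -> dict[str, MetricResult]:
    """Run every analyzer independently and key the results by metric name."""
    return {analyzer.name: analyzer.analyze(source) for analyzer in analyzers}


if __name__ == "__main__":
    print(run_analyzers("a = 1\nb = 2\na = 1\n", [DuplicationAnalyzer()]))
```

Because each analyzer is independent and side-effect free in this sketch, each can be unit-tested in isolation, which is what makes the "every analyzer has a test" rule cheap to enforce.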
File details
Details for the file ai_code_quality_auditor-0.2.0.tar.gz.
File metadata
- Download URL: ai_code_quality_auditor-0.2.0.tar.gz
- Upload date:
- Size: 42.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93aaf0a1970b2e8db7568f402af1e25c813179b54f1308064f72cc4ee104e640 |
| MD5 | dda23fb06c035ba2a12971edfed304c9 |
| BLAKE2b-256 | e2f3da0c497824acbfddcc1282008ac4ef5638fcb1b76fef03cdbf45d6bee4bd |
File details
Details for the file ai_code_quality_auditor-0.2.0-py3-none-any.whl.
File metadata
- Download URL: ai_code_quality_auditor-0.2.0-py3-none-any.whl
- Upload date:
- Size: 48.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e97aef0e6b5fed2843a34a6166f501a047dd040f3c4b4d514d300ccde0656aa |
| MD5 | e6f45189605d941666165dd247a506e9 |
| BLAKE2b-256 | 221a7ddc4db9f4b680b2e3f21ce8d20a2238b7b311ee7f496c21c491141e67fc |