Arbiter
Agent-aware code quality system for multi-agent codebases.
In 2026, code is written by fleets of AI agents. Arbiter attributes each commit to its author, human or AI, and scores quality accordingly.
What Makes Arbiter Different
| Feature | Traditional Tools | Arbiter |
|---|---|---|
| Agent attribution | None | First-class: tracks Claude, Codex, Gemini, Copilot, humans |
| Per-commit scoring | Repo-wide only | Scores each commit's changed files individually |
| Diff analysis | N/A | Score only what changed in a PR/branch |
| Transparency | Opaque score | Every score decomposes into lint + security + complexity |
| Agent-specific gates | N/A | Different quality thresholds per agent trust tier |
| Tool integration | Proprietary | Wraps tools you already trust: ruff, bandit, radon, vulture |
| Dashboard | SaaS login | Single HTML file with per-agent timelines, commit feed, fleet view |
| Dependencies | Heavy | Analysis tools only; core is stdlib Python |
Quick Start
git clone https://github.com/hummbl-dev/arbiter.git
cd arbiter
# Install (makes `arbiter` command available)
pip install ".[analyzers]"
# Quick score (no persistence)
arbiter score /path/to/your/repo
# Full analysis with per-commit agent attribution
arbiter analyze /path/to/your/repo
# Score only files changed since main
arbiter diff /path/to/your/repo --base main
# Agent leaderboard
arbiter agents
# Start dashboard
arbiter serve --port 8080
# Open http://localhost:8080
Without install (PYTHONPATH)
PYTHONPATH=src python -m arbiter score /path/to/your/repo
With Docker
docker build -t arbiter .
docker run -p 8080:8080 -v /path/to/repo:/repo:ro arbiter
Architecture
Git Repo ──→ [Git Historian] ──→ [Analyzer Runner] ──→ [Scoring Engine] ──→ [SQLite Store]
                    │                    │                    │                   │
            agent attribution     tool invocation      weighted rubric        trend data
            (Co-Authored-By,      (ruff, radon,        (lint 35%,                 │
             email matching)       vulture, bandit)     security 30%,             ├──→ REST API
                                                        complexity 35%)           └──→ Dashboard
                                 ┌─────────────┐
                                 │Diff Analyzer│ ←── v0.2: scores only changed files per commit/branch
                                 └─────────────┘
Per-Commit Scoring (v0.2)
Every commit is scored against only the files it changed, not the entire repo. This makes the agent leaderboard meaningful — a commit that touches 1 clean file scores differently than one that touches 10 messy files.
Diff Mode (v0.2)
arbiter diff scores only files changed since a base branch. Ideal for CI/PR quality gates — fast, scoped, actionable.
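How `arbiter diff` collects the changed files is not documented here. The sketch below shows one plausible way to do it, assuming plain `git diff --name-only` against the merge base and keeping only Python files. Both function names are hypothetical and are not part of Arbiter's API.

```python
from __future__ import annotations

import subprocess
from pathlib import PurePosixPath


def parse_changed_python_files(name_only_output: str) -> list[str]:
    """Turn `git diff --name-only` output into a sorted, de-duplicated list of .py paths."""
    paths = (line.strip() for line in name_only_output.splitlines())
    return sorted({p for p in paths if p and PurePosixPath(p).suffix == ".py"})


def changed_python_files(repo: str, base: str = "main") -> list[str]:
    """List Python files changed between the merge base with `base` and HEAD."""
    result = subprocess.run(
        # --diff-filter=d drops deleted files; base...HEAD diffs from the merge base.
        ["git", "-C", repo, "diff", "--name-only", "--diff-filter=d", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_changed_python_files(result.stdout)
```

Scoping the analyzers to this list is what would keep a PR gate fast: the cost tracks the size of the change, not the size of the repo.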
Agent Attribution
Arbiter identifies which agent authored each commit:
- Co-Authored-By trailer, e.g. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Author email: maps noreply@anthropic.com → claude, codex@openai.com → codex
- Default: "human" if no agent pattern matches
Configure in agents.yml:
agents:
  - name: claude
    emails: [noreply@anthropic.com]
    co_author_patterns: ["Claude\\s+(Opus|Sonnet|Haiku)"]
    trust_tier: verified
    quality_threshold: 70.0
  - name: gemini
    trust_tier: probation
    quality_threshold: 80.0  # Higher bar for probationary agents
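The attribution rules and per-agent thresholds above can be sketched as follows. This is an illustration, not Arbiter's implementation: `AgentRule`, `attribute_commit`, and `passes_gate` are hypothetical names, the trailer-then-email precedence is assumed from the order of the list, and the `>=` comparison against `quality_threshold` is also an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class AgentRule:
    """One entry of agents.yml (hypothetical in-memory form)."""
    name: str
    emails: list[str] = field(default_factory=list)
    co_author_patterns: list[str] = field(default_factory=list)
    quality_threshold: float = 70.0


def attribute_commit(message: str, author_email: str, rules: list[AgentRule]) -> str:
    """Return the agent name for a commit, or "human" if nothing matches."""
    # 1. Co-Authored-By trailers in the commit message.
    trailers = re.findall(
        r"^Co-Authored-By:\s*(.+)$", message, flags=re.IGNORECASE | re.MULTILINE
    )
    for trailer in trailers:
        for rule in rules:
            if any(re.search(p, trailer) for p in rule.co_author_patterns):
                return rule.name
    # 2. Author email.
    for rule in rules:
        if author_email.lower() in (e.lower() for e in rule.emails):
            return rule.name
    # 3. Default.
    return "human"


def passes_gate(score: float, rule: AgentRule) -> bool:
    """Assumed gate: a commit passes when its score meets the agent's threshold."""
    return score >= rule.quality_threshold
```

With the config above, a commit scoring 75 would pass for `claude` (threshold 70.0) but fail for `gemini` (threshold 80.0).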
Analyzers (pluggable)
| Analyzer | Tool | What It Finds |
|---|---|---|
| Lint | ruff | Style violations, import errors, bugbear patterns |
| Complexity | radon | Cyclomatic complexity (grade A-F per function) |
| Security | bandit | Hardcoded secrets, shell injection, dangerous patterns |
| Dead Code | vulture | Unused functions, imports, variables |
| Duplication | AST hash | Near-duplicate function bodies |
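To give a feel for the "AST hash" row, here is a minimal sketch of duplicate detection with Python's `ast` module. It is an assumption about the general technique, not Arbiter's analyzer: it only groups functions whose bodies are structurally identical (formatting and comments are ignored, but identifiers, constants, and docstrings still count), so true near-duplicate detection would need extra normalization. The function names are hypothetical.

```python
from __future__ import annotations

import ast
import hashlib
from collections import defaultdict


def body_fingerprint(func: ast.FunctionDef | ast.AsyncFunctionDef) -> str:
    """Hash a function body's AST, ignoring the function name and source positions."""
    dumped = "".join(ast.dump(stmt, include_attributes=False) for stmt in func.body)
    return hashlib.sha256(dumped.encode("utf-8")).hexdigest()


def find_duplicate_functions(source: str) -> list[list[str]]:
    """Group function names whose bodies share the same fingerprint."""
    groups: dict[str, list[str]] = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            groups[body_fingerprint(node)].append(node.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Hashing the dumped AST rather than the raw text is what makes whitespace and comment differences irrelevant.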
Scoring
Deterministic. Same code → same score. Always.
Overall = 0.35 × Lint + 0.30 × Security + 0.35 × Complexity
Penalty points by severity:
CRITICAL: 50 | HIGH: 20 | MEDIUM: 5 | LOW: 1
Score = 100 - (total_penalty / LOC) * normalization_factor
Grades: A (90+) | B (80+) | C (70+) | D (60+) | F (<60)
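The rubric above can be sketched in a few lines. Several details here are assumptions, since the README does not state them: that the penalty formula is applied per category, that scores are clamped to 0..100, that an empty file set scores 100, and the value of `normalization_factor`, which the caller must supply. The function names are hypothetical.

```python
PENALTY = {"CRITICAL": 50, "HIGH": 20, "MEDIUM": 5, "LOW": 1}
WEIGHTS = {"lint": 0.35, "security": 0.30, "complexity": 0.35}


def category_score(severities, loc, *, normalization_factor):
    """Score = 100 - (total_penalty / LOC) * normalization_factor, clamped to 0..100."""
    if loc <= 0:
        return 100.0  # assumption: nothing to score means nothing to penalize
    total_penalty = sum(PENALTY[s] for s in severities)
    raw = 100.0 - (total_penalty / loc) * normalization_factor
    return max(0.0, min(100.0, raw))


def overall_score(scores):
    """Weighted sum of the lint, security, and complexity category scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def grade(score):
    """Map a 0..100 score to a letter grade using the cut-offs listed above."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= floor:
            return letter
    return "F"
```

Because every step is a pure function of the findings and the line count, the same input always yields the same score, which is the determinism property claimed above.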
Dashboard (v2)
Single HTML file with Chart.js. No build step, no React, no npm.
- Score Card — Big number + breakdown bars
- Agent Leaderboard — Who writes the best code? Color-coded by agent
- Per-Agent Quality Timeline — Score over time per agent (not just repo-wide)
- Commit Feed — Recent commits with agent, score, changes, timestamp
- Hotspot Files — Ranked by finding count
- Fleet View — Multi-repo quality grid with color-coded scores
- Tabbed UI — Overview, Commits, Fleet tabs
API
GET /api/score Current repo score
GET /api/agents Agent leaderboard
GET /api/agents/{name}/trend Per-agent quality over time
GET /api/trend?days=30 Quality over time
GET /api/worst?limit=20 Worst files
GET /api/commits Recent commits with scores
GET /api/commits/{hash} Detail for one commit
GET /api/fleet Fleet report (multi-repo)
GET /api/health System health
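A small standard-library client for these endpoints might look like the sketch below. It assumes the server started by `arbiter serve --port 8080` is reachable at `http://localhost:8080` and that responses are JSON; the response schema is not documented here, so the helpers just return the decoded payload. `api_url` and `get_json` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def api_url(path, base="http://localhost:8080", **params):
    """Build an endpoint URL, adding query parameters such as days=30."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base.rstrip('/')}{path}{query}"


def get_json(path, **params):
    """GET an endpoint and decode the body, assuming it is JSON."""
    with urlopen(api_url(path, **params), timeout=10) as response:
        return json.load(response)


# Example calls (require a running `arbiter serve`):
#   get_json("/api/health")
#   get_json("/api/trend", days=30)
#   get_json("/api/worst", limit=20)
```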
CLI Commands
arbiter analyze <repo> # Full analysis + per-commit scoring + persist
arbiter score <repo> [--json] [--exclude] # Quick score (no persist)
arbiter diff <repo> [--base main] [--json] # Score only changed files vs base branch
arbiter agents # Agent leaderboard
arbiter trend [--days 30] # Quality trend
arbiter worst [--limit 20] # Worst files
arbiter commits [--agent claude] # Recent commits
arbiter audit-fleet <directory> # Audit all repos in a directory
arbiter fleet-report # Fleet quality summary
arbiter triage # Auto-classify repos: green/yellow/red/archive
arbiter fix <repo> [--dry-run] # Auto-fix ruff findings + before/after score
arbiter serve [--port 8080] # API + dashboard
Tests
pip install ".[test]"
PYTHONPATH=src python -m pytest tests/ -v
# 78 tests, <7 seconds
Requirements
- Python 3.11+
- git (for historian)
- Optional: ruff, radon, vulture, bandit (for full analysis)
- Optional: Docker (for containerized deployment)
License
MIT — see LICENSE.
Built by HUMMBL LLC from production experience coordinating Claude, Codex, Gemini, and human engineers on a 6,000+ test codebase.