Arbiter

Agent-aware code quality system for multi-agent codebases.

In 2026, code is written by fleets of AI agents. Arbiter attributes each commit to its author, whether a human or a specific AI agent, and scores quality accordingly.

What Makes Arbiter Different

Feature	Traditional Tools	Arbiter
Agent attribution	None	First-class: tracks Claude, Codex, Gemini, Copilot, humans
Per-commit scoring	Repo-wide only	Scores each commit's changed files individually
Diff analysis	N/A	Score only what changed in a PR/branch
Transparency	Opaque score	Every score decomposes into lint + security + complexity
Agent-specific gates	N/A	Different quality thresholds per agent trust tier
Tool integration	Proprietary	Wraps tools you already trust: ruff, Bandit, radon, vulture
Dashboard	SaaS login	Single HTML file with per-agent timelines, commit feed, fleet view
Dependencies	Heavy	Analysis tools only; core is stdlib Python

Quick Start

git clone https://github.com/hummbl-dev/arbiter.git
cd arbiter

# Install (makes `arbiter` command available)
pip install ".[analyzers]"

# Quick score (no persistence)
arbiter score /path/to/your/repo

# Full analysis with per-commit agent attribution
arbiter analyze /path/to/your/repo

# Score only files changed since main
arbiter diff /path/to/your/repo --base main

# Agent leaderboard
arbiter agents

# Start dashboard
arbiter serve --port 8080
# Open http://localhost:8080

Without install (PYTHONPATH)

PYTHONPATH=src python -m arbiter score /path/to/your/repo

With Docker

docker build -t arbiter .
docker run -p 8080:8080 -v /path/to/repo:/repo:ro arbiter

Architecture

Git Repo ──→ [Git Historian] ──→ [Analyzer Runner] ──→ [Scoring Engine] ──→ [SQLite Store]
                  │                      │                     │                    │
           agent attribution      tool invocation        weighted rubric       trend data
           (Co-Authored-By,       (ruff, radon,          (lint 35%,             │
            email matching)        vulture, bandit)        security 30%,        ├──→ REST API
                                                           complexity 35%)     └──→ Dashboard
             ┌────────────┐
             │Diff Analyzer│ ←── v0.2: scores only changed files per commit/branch
             └────────────┘

Per-Commit Scoring (v0.2)

Every commit is scored against only the files it changed, not the entire repo. This makes the agent leaderboard meaningful: a commit that touches one clean file scores differently from one that touches ten messy files.

Diff Mode (v0.2)

The arbiter diff command scores only the files changed since a base branch, which keeps CI/PR quality gates fast, scoped, and actionable.
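As a rough illustration of what "files changed since a base branch" means, the sketch below lists changed Python files with plain git. It is not Arbiter's implementation: the function names `parse_changed_files` and `changed_files` are hypothetical, and the choice of a three-dot diff (changes since the merge base) and the `.py` filter are assumptions.

```python
import subprocess
from pathlib import Path


def parse_changed_files(diff_output: str, suffix: str = ".py") -> list[str]:
    """Turn `git diff --name-only` output into a sorted, de-duplicated path list."""
    paths = {line.strip() for line in diff_output.splitlines() if line.strip()}
    return sorted(p for p in paths if p.endswith(suffix))


def changed_files(repo: Path, base: str = "main") -> list[str]:
    """List Python files that differ between the merge base with `base` and HEAD."""
    result = subprocess.run(
        # --diff-filter=d excludes deleted files, which cannot be analyzed.
        ["git", "diff", "--name-only", "--diff-filter=d", f"{base}...HEAD"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_changed_files(result.stdout)


if __name__ == "__main__":
    for path in changed_files(Path("."), base="main"):
        print(path)
```

Only the resulting file list would then be handed to the analyzers, which is why a diff-scoped run stays quick on large repositories.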

Agent Attribution

Arbiter identifies which agent authored each commit:

Co-Authored-By trailer — Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author email — maps noreply@anthropic.com → claude, codex@openai.com → codex
Default — "human" if no agent pattern matches

Configure in agents.yml:

agents:
  - name: claude
    emails: [noreply@anthropic.com]
    co_author_patterns: ["Claude\\s+(Opus|Sonnet|Haiku)"]
    trust_tier: verified
    quality_threshold: 70.0
  - name: gemini
    trust_tier: probation
    quality_threshold: 80.0  # Higher bar for probationary agents
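The lookup order described above can be sketched as follows. This is an illustration under assumptions, not Arbiter's code: the `AgentRule` class, the `attribute` function, and the hard-coded `RULES` list are hypothetical, and the sketch assumes the three checks run in the order listed (trailer, then author email, then the "human" default).

```python
import re
from dataclasses import dataclass, field


@dataclass
class AgentRule:
    """Hypothetical in-memory form of one agents.yml entry."""
    name: str
    emails: list[str] = field(default_factory=list)
    co_author_patterns: list[str] = field(default_factory=list)


RULES = [
    AgentRule("claude", ["noreply@anthropic.com"], [r"Claude\s+(Opus|Sonnet|Haiku)"]),
    AgentRule("codex", ["codex@openai.com"]),
]

# Matches one "Co-Authored-By: ..." line anywhere in the commit message.
TRAILER = re.compile(r"^Co-Authored-By:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def attribute(author_email: str, message: str, rules: list[AgentRule] = RULES) -> str:
    """Return the agent name for a commit, or "human" if nothing matches."""
    # 1. Co-Authored-By trailers (assumed to take priority over the email).
    for trailer in TRAILER.findall(message):
        for rule in rules:
            if any(re.search(p, trailer) for p in rule.co_author_patterns):
                return rule.name
    # 2. Exact match on the author email.
    email = author_email.strip().lower()
    for rule in rules:
        if email in rule.emails:
            return rule.name
    # 3. Default.
    return "human"
```

For example, a commit authored by a human address but carrying the Claude trailer shown earlier would resolve to "claude" under this ordering.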

Analyzers (pluggable)

Analyzer	Tool	What It Finds
Lint	ruff	Style violations, import errors, bugbear patterns
Complexity	radon	Cyclomatic complexity (grade A-F per function)
Security	bandit	Hardcoded secrets, shell injection, dangerous patterns
Dead Code	vulture	Unused functions, imports, variables
Duplication	AST hash	Near-duplicate function bodies

Scoring

Deterministic. Same code → same score. Always.

Overall = 0.35 * Lint + 0.30 * Security + 0.35 * Complexity

Penalty points by severity:
  CRITICAL: 50 | HIGH: 20 | MEDIUM: 5 | LOW: 1

Score = 100 - (total_penalty / LOC) * normalization_factor

Grades: A (90+) | B (80+) | C (70+) | D (60+) | F (<60)
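The rubric above can be worked through in a few lines of Python. This is a sketch under stated assumptions rather than the scoring engine itself: the value of `NORMALIZATION_FACTOR` is not given in this README and is set to 100 purely for illustration, clamping to the 0 to 100 range is assumed, and the function names are hypothetical.

```python
PENALTY = {"CRITICAL": 50, "HIGH": 20, "MEDIUM": 5, "LOW": 1}
WEIGHTS = {"lint": 0.35, "security": 0.30, "complexity": 0.35}

# Assumption: the real normalization factor is not documented here.
NORMALIZATION_FACTOR = 100.0


def category_score(severities: list[str], loc: int,
                   factor: float = NORMALIZATION_FACTOR) -> float:
    """Score = 100 - (total_penalty / LOC) * normalization_factor."""
    if loc <= 0:
        return 100.0  # Assumption: nothing to analyze means nothing to penalize.
    total_penalty = sum(PENALTY[s] for s in severities)
    raw = 100.0 - (total_penalty / loc) * factor
    return max(0.0, min(100.0, raw))  # Assumption: scores are clamped to 0..100.


def overall(scores: dict[str, float]) -> float:
    """Weighted sum of the three category scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def grade(score: float) -> str:
    """Map a numeric score to the letter grades listed above."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= floor:
            return letter
    return "F"
```

With the assumed factor of 100, a 1,000-line change with ten LOW and two MEDIUM lint findings carries a penalty of 20 and a lint score of 98. Because every step is pure arithmetic on the findings, the same input always yields the same score.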

Dashboard (v2)

Single HTML file with Chart.js. No build step, no React, no npm.

Score Card — Big number + breakdown bars
Agent Leaderboard — Who writes the best code? Color-coded by agent
Per-Agent Quality Timeline — Score over time per agent (not just repo-wide)
Commit Feed — Recent commits with agent, score, changes, timestamp
Hotspot Files — Ranked by finding count
Fleet View — Multi-repo quality grid with color-coded scores
Tabbed UI — Overview, Commits, Fleet tabs

API

GET /api/score                  Current repo score
GET /api/agents                 Agent leaderboard
GET /api/agents/{name}/trend    Per-agent quality over time
GET /api/trend?days=30          Quality over time
GET /api/worst?limit=20         Worst files
GET /api/commits                Recent commits with scores
GET /api/commits/{hash}         Detail for one commit
GET /api/fleet                  Fleet report (multi-repo)
GET /api/health                 System health
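A small client for these endpoints might look like the sketch below. It assumes the server was started with arbiter serve --port 8080 on localhost and that responses are JSON; the response shapes are not documented here, so the example prints them as-is. The helper names `endpoint` and `fetch` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local server started with `arbiter serve --port 8080`.
BASE_URL = "http://localhost:8080"


def endpoint(path: str, **params) -> str:
    """Build a full URL, appending only the query parameters that are set."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def fetch(path: str, **params):
    """GET one endpoint and decode the body, assuming it is JSON."""
    with urlopen(endpoint(path, **params), timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    print(fetch("/api/health"))
    print(fetch("/api/trend", days=30))
    print(fetch("/api/worst", limit=20))
```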

CLI Commands

arbiter analyze <repo>                     # Full analysis + per-commit scoring + persist
arbiter score <repo> [--json] [--exclude]  # Quick score (no persist)
arbiter diff <repo> [--base main] [--json] # Score only changed files vs base branch
arbiter agents                             # Agent leaderboard
arbiter trend [--days 30]                  # Quality trend
arbiter worst [--limit 20]                 # Worst files
arbiter commits [--agent claude]           # Recent commits
arbiter audit-fleet <directory>            # Audit all repos in a directory
arbiter fleet-report                       # Fleet quality summary
arbiter triage                             # Auto-classify repos: green/yellow/red/archive
arbiter fix <repo> [--dry-run]             # Auto-fix ruff findings + before/after score
arbiter serve [--port 8080]                # API + dashboard

Tests

pip install ".[test]"
PYTHONPATH=src python -m pytest tests/ -v
# 78 tests, <7 seconds

Requirements

Python 3.11+
git (for historian)
Optional: ruff, radon, vulture, bandit (for full analysis)
Optional: Docker (for containerized deployment)

License

MIT — see LICENSE.

Built by HUMMBL LLC from production experience coordinating Claude, Codex, Gemini, and human engineers on a 6,000+ test codebase.

Release history

0.2.0 (this version), released Mar 24, 2026

Download files

Source Distribution

arbiter_dev-0.2.0.tar.gz (32.8 kB)

Uploaded Mar 24, 2026 Source

Built Distribution

arbiter_dev-0.2.0-py3-none-any.whl (30.3 kB)

Uploaded Mar 24, 2026 Python 3

File details

Details for the file arbiter_dev-0.2.0.tar.gz.

File metadata

Download URL: arbiter_dev-0.2.0.tar.gz
Upload date: Mar 24, 2026
Size: 32.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for arbiter_dev-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`90dae7b25bace48b1ec1b1c245ab5183f364974b36b7c4f35a6177aeea08606a`
MD5	`f3c315b8e143d057ef2c61e548ca4306`
BLAKE2b-256	`7987a21a2a443bc6744c371c0ab4bf9f90f47922a007951fa1645e178cf65fe6`

File details

Details for the file arbiter_dev-0.2.0-py3-none-any.whl.

File metadata

Download URL: arbiter_dev-0.2.0-py3-none-any.whl
Upload date: Mar 24, 2026
Size: 30.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for arbiter_dev-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d3360f2f9f95a12cdb9df1951f22b288d5b857c16c83358dbf483f096cd3dea2`
MD5	`f7a807810d6d0e8ae0a4c9fb1d1def57`
BLAKE2b-256	`05ddb30cac372a192cec09a4f3a8acaedc79281f70e7f4414f619afbbbed7a1c`
