
reviewability

A CI/CD quality gate that scores pull requests by how hard they are to review.

Catch diffs that are too large, too tangled, or too scattered to review safely — before they merge.


It doesn't matter how fast AI generates code — the bottleneck is the human reviewer.

Installation

pip install reviewability

Requires Python 3.12+.

The Idea

A pull request can be hard to review not because the code is poorly written, but because of how the changes are combined. Mixing renames, code moves, and logic changes in one PR makes each change harder to verify. This is especially common with AI-generated code. Unlike linters, Reviewability does not analyze the code itself — only how the changes are structured.


A clean-code change can still turn into a reviewability disaster when refactors, renames, and behavior updates are mixed together.

When a diff scores low, the typical remedies are splitting it into focused pull requests or deferring non-essential changes.

Reviewability computes metrics for individual hunks, for files, and for the whole diff. These feed into Reviewability Scores (0.0 = hardest to review, 1.0 = easiest), with configurable thresholds for what counts as problematic.

Key Concepts

  • Hunk — a contiguous block of changes within a single file (the smallest unit of analysis)
  • Metric — a calculated value attached to a hunk, a file, or the whole diff
  • Score — a float [0.0, 1.0] representing reviewability at hunk, file, or diff level

Extensibility

The metric system is designed to be extended:

  • Add a metric — subclass HunkMetric, FileMetric, or OverallMetric, implement calculate(), register via registry.add()
  • Adjust scoring — provide a custom ReviewabilityScorer implementation
  • Adjust thresholds — edit the default config or provide your own reviewability.toml
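A custom metric might look like the sketch below. The names HunkMetric, calculate(), and registry.add() come from the list above, but the exact signatures are assumptions; minimal stand-in classes are defined here so the sketch runs standalone.

```python
from dataclasses import dataclass

# Stand-ins for illustration only: the real base class and registry live in
# the reviewability package, and their actual signatures may differ.
@dataclass
class Hunk:
    added_lines: int
    removed_lines: int
    context_lines: int

class HunkMetric:
    name: str
    def calculate(self, hunk: Hunk) -> float:
        raise NotImplementedError

class ContextRatio(HunkMetric):
    """Hypothetical metric: fraction of the hunk that is unchanged context.
    More surrounding context generally makes a change easier to follow."""
    name = "hunk.context_ratio"

    def calculate(self, hunk: Hunk) -> float:
        total = hunk.added_lines + hunk.removed_lines + hunk.context_lines
        return hunk.context_lines / total if total else 0.0

registry: list[HunkMetric] = []
registry.append(ContextRatio())  # the real API is registry.add(...)

hunk = Hunk(added_lines=4, removed_lines=2, context_lines=6)
print(registry[0].calculate(hunk))  # 0.5
```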

Usage

# Analyze a range of commits
reviewability HEAD~1 HEAD

# Analyze from stdin
git diff HEAD~1 | reviewability --from-stdin

# Use a custom config
reviewability --config path/to/reviewability.toml HEAD~1 HEAD

# Include per-file and per-hunk breakdowns
reviewability --detailed HEAD~1 HEAD

Output is JSON. Exit code is 0 if the gate passes, 1 if it fails.
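Because the gate is expressed as an exit code, it can fail a CI job directly. A hypothetical GitHub Actions step (the step names and layout are illustrative, not part of the package):

```yaml
- name: Check out with full history (needed to diff against the base)
  uses: actions/checkout@v4
  with:
    fetch-depth: 0

- name: Reviewability gate
  run: |
    pip install reviewability
    # A non-zero exit code fails this step and blocks the merge
    reviewability ${{ github.event.pull_request.base.sha }} ${{ github.sha }}
```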

Claude Code Skill

If you use Claude Code, a /reviewability skill is included. It runs the tool on the current diff, summarizes the results, and attempts to address any recommendations directly.

Configuration

All thresholds and limits are configured via a single reviewability.toml file. The tool looks for it in the current directory, or you can specify a path explicitly:

reviewability -c path/to/reviewability.toml HEAD~1 HEAD

If no config file is found, the built-in default is used. You can edit that built-in default directly to change its values, or copy it into your project root as a starting point. A custom config must contain all mandatory fields — there is no merging with the defaults.

# Scores below these thresholds mark hunks/files as problematic
hunk_score_threshold = 0.5
file_score_threshold = 0.5

# Size limits (used for score normalization)
max_diff_lines = 500
max_hunk_lines = 50

# Gate: fail if overall score drops below this (provisional, based on calibration)
min_overall_score = 0.7

# Optional limits (remove a line to disable that check)
max_problematic_hunks = 3
max_problematic_files = 2
max_file_hunk_count = 5
max_files_changed = 10
max_added_lines = 400

[movement_detection]
hunk_min_lines = 8
file_min_lines = 15
similarity_threshold = 0.95

Movement Detection

Moved code is easy to review — the logic hasn't changed, only the location. The tool detects when a block of code is deleted from one place and inserted elsewhere (accounting for reindentation and package/import changes), and treats those hunks and files as relocations.

Relocations receive a perfect score and are excluded from the size calculations that drive the overall score. A diff that is large only because of relocations is not penalized.
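The detection described above can be approximated with a sequence similarity check, mirroring the hunk_min_lines and similarity_threshold settings from the [movement_detection] config. This is an illustration using difflib, not the package's actual detector (which also accounts for package and import changes):

```python
import difflib

def looks_like_relocation(removed: list[str], added: list[str],
                          min_lines: int = 8,
                          similarity_threshold: float = 0.95) -> bool:
    """Rough sketch: a deleted block and an inserted block count as a
    relocation when both are long enough and, after stripping indentation,
    their line sequences are nearly identical."""
    if len(removed) < min_lines or len(added) < min_lines:
        return False
    a = [line.strip() for line in removed]
    b = [line.strip() for line in added]
    ratio = difflib.SequenceMatcher(a=a, b=b).ratio()
    return ratio >= similarity_threshold

block = [f"x{i} = compute({i})" for i in range(10)]
moved = ["    " + line for line in block]  # same code, reindented
print(looks_like_relocation(block, moved))  # True
```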

Metrics

Metrics are calculated at three levels: hunk, file, and overall diff.

Hunk-level

Metric Description
hunk.lines_changed Total lines added and removed in a hunk
hunk.added_lines Lines added in a hunk
hunk.removed_lines Lines removed in a hunk
hunk.context_lines Unchanged context lines surrounding the change
hunk.change_balance Ratio of added lines to total changed lines (0.0 = pure deletion, 1.0 = pure addition)
hunk.is_likely_moved Whether this hunk is a movement of code from another location
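The change_balance metric is a simple ratio, as described in the table above. A sketch (the 0.5 default for an empty hunk is an assumption, not documented behavior):

```python
def change_balance(added: int, removed: int) -> float:
    """Ratio of added lines to total changed lines:
    0.0 = pure deletion, 1.0 = pure addition, 0.5 = balanced edit."""
    total = added + removed
    return added / total if total else 0.5  # empty-hunk default assumed

print(change_balance(10, 0))  # 1.0 (pure addition)
print(change_balance(0, 10))  # 0.0 (pure deletion)
print(change_balance(5, 5))   # 0.5 (balanced)
```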

File-level

Metric Description
file.lines_changed Total lines added and removed across all hunks in a file
file.added_lines Total lines added in a file
file.removed_lines Total lines removed in a file
file.hunk_count Number of separate change regions in a file
file.max_hunk_lines Lines changed in the largest single hunk within a file
file.is_likely_moved Whether this file is a movement from another path

Overall-level

Metric Description
overall.lines_changed Total lines changed across the entire diff
overall.added_lines Total lines added across the entire diff
overall.removed_lines Total lines removed across the entire diff
overall.files_changed Number of files changed
overall.moved_lines Total lines in hunks identified as code movements
overall.change_entropy Shannon entropy of the distribution of changes across files
overall.largest_file_ratio Fraction of total diff lines in the most-changed file
overall.scatter_factor Normalized entropy of how changes are distributed across files (0.0 = all in one file, 1.0 = evenly spread)
overall.problematic_hunk_count Hunks with a score below the configured threshold
overall.problematic_file_count Files with a score below the configured threshold

Overall Scoring

score = max(0, 1 − effective_size_ratio × (1 + scatter_factor))

effective_size_ratio = (lines_changed − moved_lines) / max_diff_lines   [capped at 1.0]

The score is driven by effective diff size and scatter. Moved lines are excluded from the size count — relocations are easy to review and should not penalize the score.

scatter_factor measures how evenly changes are spread across files (normalized entropy, 0.0 = all in one file, 1.0 = evenly spread). It amplifies the size penalty: a large diff that touches many files evenly scores worse than an equally large diff concentrated in a few files.

A large but focused diff (e.g. a bulk rename in one file) or a scattered but small diff each score better than a diff that is both large and scattered.
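The formula above transcribes directly into code. The example inputs are illustrative, not calibration data:

```python
def overall_score(lines_changed: int, moved_lines: int,
                  scatter_factor: float, max_diff_lines: int = 500) -> float:
    """score = max(0, 1 - effective_size_ratio * (1 + scatter_factor)),
    where relocated lines are subtracted before normalizing against
    max_diff_lines and the ratio is capped at 1.0."""
    effective = (lines_changed - moved_lines) / max_diff_lines
    effective_size_ratio = min(effective, 1.0)
    return max(0.0, 1 - effective_size_ratio * (1 + scatter_factor))

# Large but focused: 400 lines, almost all in one file
print(round(overall_score(400, 0, scatter_factor=0.1), 2))   # 0.12
# Same size, spread evenly across files: the scatter penalty bites
print(round(overall_score(400, 0, scatter_factor=0.9), 2))   # 0.0
# Large diff that is mostly relocations: barely penalized
print(round(overall_score(400, 300, scatter_factor=0.1), 2)) # 0.78
```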

Validation

The scoring formula was calibrated against ~2,000 pull requests from 15 permissively licensed open-source repositories. Ground truth labels were derived from review outcomes (change requests, revision cycles, comment density). Metrics that did not improve prediction over a naive size baseline were removed from the formula.

Research

Metrics are informed by peer-reviewed research on code review effectiveness. Most are heuristics derived from research concepts rather than variables defined directly in the papers.

Download files

Source Distribution

reviewability-0.2.1.tar.gz (31.3 kB)

Built Distribution


reviewability-0.2.1-py3-none-any.whl (49.4 kB)

File details

Details for the file reviewability-0.2.1.tar.gz.

File metadata

  • Download URL: reviewability-0.2.1.tar.gz
  • Size: 31.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for reviewability-0.2.1.tar.gz

  • SHA256: ad1a7065c46cd9477c652ce4731743aef4811915ee063b53c3a00571c5ce87ae
  • MD5: 8ac44c59d5ac4d5ada939146a61635c3
  • BLAKE2b-256: 6b8a4c6b4d0e6a9102fe4da682e26fd928fbab241afe0d496ea261a2f6e514cc


Provenance

The following attestation bundles were made for reviewability-0.2.1.tar.gz:

Publisher: publish.yml on Kirvolque/reviewability


File details

Details for the file reviewability-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: reviewability-0.2.1-py3-none-any.whl
  • Size: 49.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for reviewability-0.2.1-py3-none-any.whl

  • SHA256: 67bd0efac441c43c1af43fa1dca8d418edc8a7d32b3918a8e30f381c5fc2c675
  • MD5: 17497d8449d60ebe055d5ab95f709ee8
  • BLAKE2b-256: 6eac06a123599b1cd02a7a9a8e1d97ff070a3120d50f7abfb9c0a7573b6761a6


Provenance

The following attestation bundles were made for reviewability-0.2.1-py3-none-any.whl:

Publisher: publish.yml on Kirvolque/reviewability

