3-state quality gate for code review
code-forge
A 5-step code review pipeline for AI coding assistants. Treats review as a state machine: three independent passes per cycle, three consecutive clean cycles required, any finding resets the counter. The minimum path to a commit is 9 static review passes plus a runtime smoke test.
Why
AI coding assistants ship code that compiles, runs, and looks right. Single-pass review (Copilot, Cursor, CodeRabbit, etc.) catches the obvious defects but misses two failure modes:
- Author and reviewer collapse. When the same model writes and reviews the change, it inherits its own blind spots. code-forge runs three independent review perspectives (qodo, expert, adversarial) and treats their findings as untrusted claims that must be reproduced before any fix.
- Self-claimed completion. Hooks that gate on "I finished" markers are bypassable by any agent that can write a string. code-forge gates on actual state: a real pre-commit hook running the test suite, a mutation runner proving the tests catch regressions, and a coverage heuristic detecting drift across components.
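To make "untrusted claims that must be reproduced" concrete, here is a minimal sketch of the idea. It is not code-forge's implementation: the `Finding`, `Disposition`, and `triage` names, and the `parse_port` function under review, are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable


class Disposition(Enum):
    CONFIRMED = "confirmed"   # the defect was reproduced
    REJECTED = "rejected"     # the claim could not be reproduced


@dataclass(frozen=True)
class Finding:
    reviewer: str                  # e.g. "qodo", "expert", "adversarial"
    claim: str                     # what the reviewer says is wrong
    reproduce: Callable[[], bool]  # hypothetical check: True if the defect is observed


def triage(findings: Iterable[Finding]) -> dict[str, Disposition]:
    """Confirm only the claims whose reproduction check observes the defect."""
    result: dict[str, Disposition] = {}
    for finding in findings:
        observed = finding.reproduce()
        result[finding.claim] = (
            Disposition.CONFIRMED if observed else Disposition.REJECTED
        )
    return result


def parse_port(value: str) -> int:
    """Hypothetical code under review."""
    return int(value)


def crashes_on_empty_string() -> bool:
    # Real defect: int("") raises ValueError.
    try:
        parse_port("")
    except ValueError:
        return True
    return False


def rejects_padded_input() -> bool:
    # Plausible-sounding but false claim: int(" 80") is accepted by Python.
    try:
        parse_port(" 80")
    except ValueError:
        return True
    return False
```

In this sketch the first claim would be confirmed and the second rejected, so only the first would reach the fix step. The point of the design is that a reviewer's wording never decides the outcome; the reproduction does.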
Quick start
pip install code-review-forge
code-forge install-skill
The first command installs the CLI (Python >=3.12). The second copies the
6 review skills into ~/.claude/skills/. Then in Claude Code, run the
full pipeline:
/code-forge
Or invoke individual passes:
/qodo-review # change-aware pre-review (Pass 1)
/code-review-expert # SOLID, architecture, security (Pass 2)
/adversarial-qe # red-team QE, 12 attack dimensions (Pass 3)
/kernel-fp-verify # false-positive verification (Step 3.5)
/smoke-test # runtime verification (Step 4)
Other agent targets:
code-forge install-skill --target vscode # <cwd>/.claude/skills/
code-forge install-skill --target universal # <cwd>/.agents/skills/
code-forge install-skill --dest /path/to/dir # explicit location
code-forge install-skill --skill code-forge # one skill only
code-forge install-skill --force # overwrite existing
The pipeline
Code Change
|
v
[Step 0] Syntax (0a) + Lint (0b) + Non-ASCII (0c)
|
v
[Cycle 1] Pass 1: qodo-review
Pass 2: code-review-expert
Pass 3: adversarial-qe
|
| zero findings -> counter += 1
| any finding -> fix, counter = 0, restart Cycle 1
v
[Cycle 2] (same 3 passes)
|
v
[Cycle 3] (same 3 passes)
| counter = 3
v
[Step 3.5] kernel-fp-verify (if fixes were applied during cycles)
|
v
[Step 4] smoke-test (runtime verification)
|
v
[COMMIT GATE] # post-review-c3
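The cycle counter in the diagram above can be sketched as a small state machine. This is an illustration under stated assumptions, not the orchestrator's actual code: `run_gate`, `GateResult`, and the `max_cycles` safety limit are hypothetical, and the sketch assumes all three passes of a cycle run before any fix is applied.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A review pass is modelled as a callable returning a list of findings.
ReviewPass = Callable[[], list[str]]

REQUIRED_CLEAN_CYCLES = 3  # three consecutive clean cycles, per the pipeline


@dataclass
class GateResult:
    clean_cycles: int
    cycles_run: int
    passes_run: int


def run_gate(
    passes: Sequence[ReviewPass],
    fix: Callable[[list[str]], None],
    max_cycles: int = 20,  # hypothetical safety limit, not from the source
) -> GateResult:
    """Repeat review cycles until the clean-cycle counter reaches the target."""
    counter = 0
    passes_run = 0
    for cycle in range(1, max_cycles + 1):
        findings: list[str] = []
        for review in passes:
            findings.extend(review())
            passes_run += 1
        if findings:
            fix(findings)
            counter = 0   # any finding resets the counter
        else:
            counter += 1  # zero findings across all passes
        if counter == REQUIRED_CLEAN_CYCLES:
            return GateResult(counter, cycle, passes_run)
    raise RuntimeError("review did not converge within max_cycles")
```

With three passes that never report anything, the sketch finishes after 3 cycles and 9 passes, which matches the minimum path described at the top of this README. A single finding in the first cycle pushes that to 4 cycles and 12 passes.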
What ships
| Skill | Step | Purpose |
|---|---|---|
| code-forge | Orchestrator | Runs the full 5-step pipeline |
| qodo-review | Pass 1 | Change-aware pre-review with feature-grouped walkthrough |
| code-review-expert | Pass 2 | SOLID, architecture, security analysis |
| adversarial-qe | Pass 3 | Red-team QE with 12 attack dimensions |
| kernel-fp-verify | Step 3.5 | 10-step false-positive verification protocol |
| smoke-test | Step 4 | Runtime verification with bash assertion primitives |
What code-forge does that others don't
- Multi-pass convergence. Three consecutive clean cycles from three independent perspectives. Any finding resets the counter to zero. Copilot, CodeRabbit, Cursor, and Devin are single-pass.
- Anti-hallucination gates. code-forge treats LLM review output as untrusted claims. Parser-deterministic findings auto-confirm; LLM findings require falsification before disposition; Step 4 runs the actual code. Prompt-only mitigations cap at 15% hallucination reduction; tool grounding reaches 65-80% (CodeAnt and Suprmind data, 2026).
- Real commit gate (R1). A real .git/hooks/pre-commit that runs the test suite and blocks on NEW failures vs a baseline. Gates on diff content and test results, not a self-claimed marker. Closes the terminal-and-IDE bypass that PreToolUse hooks cannot reach.
- Mutation-gated review (R2). Diff-scoped mutation runs after static review and before the verdict. Each mutant introduced into the changed code is run against the test suite; a surviving mutant flags tests that cannot catch the change. Toothless tests block the same cycle that finds the defect.
- Cross-component coverage heuristic (R3). Detects diffs that span multiple source areas with a changed function signature. An opt-in components mapping raises an uncertain finding when a hub and a dependent both change in the same diff and no integration test under the dependent's paths matches the configured test patterns.
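The "NEW failures vs a baseline" rule in R1 comes down to a set difference. The sketch below shows only that comparison. The function names and test IDs are hypothetical, and how code-forge records its baseline is not described here.

```python
def new_failures(baseline: set[str], current: set[str]) -> list[str]:
    """Tests failing now that were not already failing in the baseline run."""
    return sorted(current - baseline)


def commit_allowed(baseline: set[str], current: set[str]) -> bool:
    """Block only on failures the change introduced, not on pre-existing ones."""
    return not new_failures(baseline, current)


# Hypothetical test IDs: one failure predates the change, one is new.
baseline = {"tests/test_io.py::test_legacy_encoding"}
current = {
    "tests/test_io.py::test_legacy_encoding",
    "tests/test_cli.py::test_install",
}

print(new_failures(baseline, current))    # ['tests/test_cli.py::test_install']
print(commit_allowed(baseline, current))  # False
```

Comparing against a baseline keeps a known-red test from blocking every commit while still stopping any regression the diff introduces.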
Honest limitations
- No cross-repo impact. code-forge reviews a single repository. Multi-repo dependency analysis requires CodeRabbit-style tooling or Chromium's Cq-Depend.
- No feedback learning. code-forge does not adapt to dismissed findings or developer preferences. Each review is independent.
- No long-term maintainability scoring. code-forge does not assess technical debt accumulation. SonarQube's tech-debt tracking is the closest automated approximation.
- No performance regression suite. No benchmark harness equivalent to Rust's perf.rust-lang.org.
- R3 is artifact-presence, not coverage proof. The cross-component check confirms an integration test file exists under the expected path; it does not verify that the test exercises the specific code that changed. A present-but-stale test passes the gate.
Static review (3-cycle convergence) is one layer. code-forge learned from its own Phase 2 experience where 9 static passes and 639 mock tests missed 3 bugs that dynamic verification caught. Verification grounding (test suite + mutation + e2e coverage check) is the thesis -- not a passes count.
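To show why mutation is part of that grounding, here is a toy mutation runner. It is a sketch under stated assumptions, not code-forge's mutation runner: it only swaps strict and non-strict comparison operators, mutates a source string rather than a diff, and all names in it are hypothetical.

```python
import ast
from typing import Callable

# Each operator is swapped with its strict or non-strict counterpart.
SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}


class SwapOneComparison(ast.NodeTransformer):
    """Swap the comparison operator at index `target` in visit order."""

    def __init__(self, target: int) -> None:
        self.target = target
        self.seen = 0

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            if type(op) in SWAPS:
                if self.seen == self.target:
                    node.ops[i] = SWAPS[type(op)]()
                self.seen += 1
        return node


def count_sites(source: str) -> int:
    """Number of comparison operators this sketch knows how to mutate."""
    return sum(
        type(op) in SWAPS
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Compare)
        for op in node.ops
    )


def surviving_mutants(
    source: str, func_name: str, test: Callable[[Callable], bool]
) -> list[str]:
    """Return the source of every mutant that the test fails to catch."""
    survivors: list[str] = []
    for target in range(count_sites(source)):
        tree = SwapOneComparison(target).visit(ast.parse(source))
        ast.fix_missing_locations(tree)
        namespace: dict = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        try:
            passed = test(namespace[func_name])
        except Exception:
            passed = False  # a crashing test run counts as killing the mutant
        if passed:
            survivors.append(ast.unparse(tree))
    return survivors


SOURCE = "def is_adult(age):\n    return age >= 18\n"


def weak_test(f: Callable) -> bool:
    return f(30) is True and f(5) is False  # never probes the boundary


def strong_test(f: Callable) -> bool:
    return weak_test(f) and f(18) is True   # boundary case kills the mutant
```

Here `surviving_mutants(SOURCE, "is_adult", weak_test)` would report the `age > 18` mutant as a survivor, while the same call with `strong_test` would report none. A surviving mutant is the signal that the tests would also miss a real regression at that spot.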
Requirements
- Python 3.12 or newer
- jq for the bash smoke primitives
- Claude Code or a compatible AI coding assistant for skill invocation
Installation alternatives
git clone
git clone https://github.com/HouMinXi/forge.git
cd forge
./install.sh
install.sh symlinks each of the 6 skills from ~/.claude/skills/<name> to this
repo's skills/<name>. Hook installation is manual -- see
hooks/README.md and hooks/settings-snippet.json.
Hooks (reference implementations)
| Hook | Trigger | Purpose |
|---|---|---|
| check_worktree.sh | PreToolUse Edit/Write | Block edits in main worktree |
| check_non_ascii.sh | PreToolUse Write/Edit | Non-ASCII character detection |
| check_read_before_edit.sh | PreToolUse Edit | 1:1 read-before-edit ratio |
| check_review_tracker.sh | PostToolUse Bash | Review cycle state machine |
| check_git_commit_review.sh | PreToolUse Bash | Block unreviewed commits |
| check_git_push_review.sh | PreToolUse Bash | Block unreviewed pushes |
Some hooks contain environment-specific logic (Kerberos auth, pattern
matching) you will need to adapt. See hooks/README.md.
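As a starting point for adapting the non-ASCII check, the detection logic can be sketched as below. The shipped hook is a shell script; this Python version is an assumption about what such a check does, not a port of check_non_ascii.sh, and `find_non_ascii` is a hypothetical name.

```python
def find_non_ascii(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for each non-ASCII character, 1-indexed."""
    hits: list[tuple[int, int, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:  # anything outside the 7-bit ASCII range
                hits.append((line_no, column, char))
    return hits


# The escape keeps this example itself ASCII-only.
sample = 'x = "a"\ny = "caf\u00e9"'
print(find_non_ascii(sample))  # [(2, 9, 'é')]
```

Reporting the line and column, rather than a bare yes or no, makes it practical to tell a stray smart quote from legitimate non-ASCII content that should be allowed.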
Bash smoke primitives
skills/smoke-test/test-library/shell/ ships 19 reusable bash assertion
functions with no dependencies beyond jq:
- run_and_capture, run_concurrent, concurrent_wait
- assert_success, assert_failure, assert_exit_code
- assert_output_contains, assert_output_not_contains
- assert_stderr_contains, assert_stderr_empty
- assert_file_exists, assert_file_not_exists, assert_file_contains
- assert_json_valid
- assert_no_zombie, assert_temp_clean
- assert_no_command_exec, assert_no_command_exec_json, assert_no_path_traversal
A backward-compatible symlink at test-library/ points to
skills/smoke-test/test-library/ for users migrating from
bash-smoke-primitives.
Documentation
- evidence/cross-model-complementarity.md -- why 3 different review passes
- evidence/design-iterations.md -- how the pipeline evolved
- evidence/ground-truth-verification.md -- why smoke tests must inject bugs
- evidence/shell-assertion-footguns.md -- 5 bash-specific traps
- evidence/v9-model-coverage-matrix.md -- 4-model coverage data
- hooks/README.md -- hook installation and adaptation guide
Contributing
Issues and discussion: https://github.com/HouMinXi/forge/issues.
License
Apache-2.0