Deterministic, offline static-analysis tool that carves any git repo into units and extracts per-unit factual signals (LLM call sites, regex parse sites, frameworks, integrations, Taxi schemas) with file:line evidence. Exposes analysis as an MCP server.
CodeWalker
Deterministic, offline code exploration — ground truth for AI agents.
CodeWalker carves any git repo into navigable units and extracts factual
signals about what the code calls, imports and integrates — LLM call sites
(provider + model), frameworks, integrations, prompts, git recency, Taxi schemas
— each with file:line evidence, no LLM, no code execution.
It's the deterministic half of a two-step idea:
① CodeWalker explores the repo → facts, every one a file:line.
② Your LLM reads those facts and maps the codebase → grounded, cited, not guessing.
It exposes its analysis as an MCP server (so Claude Code or any other agent can use it as ground truth) plus a local web explorer.
Philosophy
- Signals are ground truth, derived from AST / regex / filesystem / git — never an LLM guess. Classification is a navigational heuristic; the evidence is the truth.
- Every signal carries evidence: a file:line you can jump to. No claim without a location.
- Cheap deterministic substrate: any LLM "judgment" runs on top of this output, not inside it.
- Generalize: nothing is hardcoded to a repo, company or domain — every target is configuration.
- First-class Taxi: .taxi schemas are parsed into a semantic-type usage graph (which models/services consume which semantic types).
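To make "no claim without a location" concrete, here is a minimal sketch of an AST-derived signal that carries its own file:line evidence. It uses only Python's standard `ast` module. The `Signal` record and the `import_signals` helper are hypothetical names chosen for illustration, not CodeWalker's internal API.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One extracted fact plus the location that proves it (hypothetical shape)."""
    kind: str   # e.g. "import"
    value: str  # e.g. "openai"
    file: str
    line: int

    @property
    def evidence(self) -> str:
        return f"{self.file}:{self.line}"


def import_signals(source: str, file: str) -> list[Signal]:
    """Collect every import in `source` as a Signal, ordered by line number."""
    signals = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                signals.append(Signal("import", alias.name, file, node.lineno))
        elif isinstance(node, ast.ImportFrom) and node.module:
            signals.append(Signal("import", node.module, file, node.lineno))
    return sorted(signals, key=lambda s: (s.line, s.value))


demo = "import os\nfrom openai import OpenAI\n"
for s in import_signals(demo, "app/client.py"):
    print(s.kind, s.value, s.evidence)
```

The source is parsed, never executed, so the same input always yields the same signals.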
What it extracts (per unit)
Size & languages, models referenced (gpt-*, claude-*, …), LLM call sites
(provider + model + via + file:line), regex parse sites, frameworks,
integrations, an AST-derived call graph (imports / subprocess / http / aws / db
/ tools / a2a / mcp), prompts + a request table, output formats, key deps, git
recency, a heuristic classification + status, a deterministic plain-English
summary, an architecture graph, and a full Taxi schema where present.
Run it in 30 seconds
No install needed — point uv at the repo:
# explore any repo in your browser (clones it if you pass a git URL):
uvx --from "git+https://github.com/jhammant/codewalker" codewalker web /path/to/repo
# or run the MCP server so Claude Code can use it:
uvx --from "git+https://github.com/jhammant/codewalker" codewalker mcp /path/to/repo
Install
# from the repo (works today):
pip install "git+https://github.com/jhammant/codewalker"
# from PyPI (once published — see .github/workflows/release.yml):
pip install codewalker # or: uv tool install codewalker
# from source (for development):
git clone https://github.com/jhammant/codewalker && cd codewalker
uv venv --python 3.13 .venv && uv pip install -e ".[dev]" && pytest
Requires Python 3.10+. tree-sitter grammars (for deep AST analysis) install
automatically via tree-sitter-language-pack.
Use
# Analyze one repo (caches OUTSIDE the repo, in ~/.cache/codewalker)
codewalker analyze /path/to/repo
codewalker analyze /path/to/repo --reindex # force rebuild
# Estate report / map
codewalker report /path/to/repo
codewalker map /path/to/repo --group-by package
# Taxi schema (global merged "router" view, or a single unit's)
codewalker taxi /path/to/repo
codewalker taxi /path/to/repo <unit_id>
# Agent Pack — a deterministic markdown briefing for a coding agent
codewalker pack /path/to/repo # whole repo
codewalker pack /path/to/repo <unit_id> # one unit
codewalker pack /path/to/repo --out PACK.md
# Run the MCP server over stdio (primary interface)
codewalker mcp /path/to/repo
# Web explorer (live API at http://localhost:8765) — single repo OR a workspace
codewalker web /path/to/repo
codewalker web ~/dev --max-repos 40 # a whole folder of repos
# Bake a shareable offline static bundle
codewalker bake /path/to/repo --out ./bundle
Explore a whole developer's work (a "workspace")
Point CodeWalker at a directory of repos (or a GitHub user) and get a portfolio — every child git repo analyzed, with cross-repo aggregates and a per-repo + portfolio Agent Pack:
codewalker explore ~/dev # local folder of repos (offline)
codewalker explore ~/dev --max-repos 20
codewalker explore ~/dev --pack # portfolio Agent Pack (markdown)
codewalker explore gh:someuser # clone+analyze a user's repos (needs the gh CLI)
In the browser you can also point at a new repo on the fly — the topbar's Open box takes a local path or a git URL (cloned + cached), no restart needed.
Screenshots
A unit's deterministic Explained view (every signal a file:line) and the
Taxi semantic-type graph:
MCP tools
analyze, explore (workspace portfolio), list_units, get_unit, search,
read_file, report, map, taxi_schema, agent_pack, tech_profile — plus
read-only resources unit://<id>, report://estate, taxi://<unit_id>. Tool
descriptions state explicitly that results are deterministic static analysis with
file:line evidence — facts to verify, not prose to trust.
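One way to register the server with an MCP client is sketched below. It assumes the client reads a JSON file with an `mcpServers` map (for example, a project-level `.mcp.json` in Claude Code), and that `codewalker` is on `PATH`. The helper name `mcp_server_entry` and the server key `"codewalker"` are illustrative choices.

```python
import json


def mcp_server_entry(repo_path: str) -> dict:
    """Build an `mcpServers` entry that launches `codewalker mcp <repo>` over stdio."""
    return {
        "mcpServers": {
            "codewalker": {
                "command": "codewalker",
                "args": ["mcp", repo_path],
            }
        }
    }


# Print the config; redirect it into the file your MCP client expects.
print(json.dumps(mcp_server_entry("/path/to/repo"), indent=2))
```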
The Agent Pack (agent_pack / codewalker pack) is the deep, deterministic
counterpart to a GitHub-metadata "agent pack": a concise markdown briefing of
what the code actually does (units, what each calls, the Taxi semantic-type web,
frameworks), every claim backed by file:line.
The LLM-analysis layer (run it via Claude Code or any agent)
CodeWalker is the deterministic substrate; the judgment (audits, reviews, dossiers) is an LLM pass on top. Three ways to drive it:
- Claude Code plugin (plugin/): bundles the MCP server + slash-commands and a code-analysis skill. Install:
  /plugin marketplace add jhammant/codewalker
  /plugin install codewalker
  /cw-audit # audit · /cw-explain <unit> · /cw-taxi-review · /cw-dossier <dir|gh:org>
- RECIPES.md: copy-paste analysis prompts for any MCP-capable agent (BYO).
- Agent SDK (examples/agent_sdk_audit.py): a headless Python harness that runs CodeWalker + an LLM audit pass, CI/cron friendly.
Each keeps the contract: facts come from CodeWalker (with file:line), the model
adds judgment and verifies before concluding.
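The "verifies before concluding" step can be as simple as resolving each cited file:line before repeating the claim. The following is a rough sketch under that assumption. `verify_citation` is a hypothetical helper, not part of CodeWalker, and it treats a citation as a repo-relative `path:line` string.

```python
import tempfile
from pathlib import Path
from typing import Optional


def verify_citation(repo_root: Path, citation: str) -> Optional[str]:
    """Return the cited line's text, or None if `path:line` does not resolve."""
    path_part, _, line_part = citation.rpartition(":")
    if not path_part or not line_part.isdecimal():
        return None
    root = repo_root.resolve()
    target = (root / path_part).resolve()
    # Refuse citations that escape the repo root or point at a missing file.
    if not target.is_relative_to(root) or not target.is_file():
        return None
    lines = target.read_text(encoding="utf-8", errors="replace").splitlines()
    line_no = int(line_part)
    return lines[line_no - 1] if 1 <= line_no <= len(lines) else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("import os\nclient = OpenAI()\n", encoding="utf-8")
    print(verify_citation(root, "app.py:2"))   # text of the cited line
    print(verify_citation(root, "app.py:99"))  # None: line out of range
```

An agent harness could drop or flag any claim whose citation comes back as `None`.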
Coverage honesty
Languages with a tree-sitter grammar (Python, JS/TS, Go, Java, Rust, Ruby, C#, Kotlin, Scala, Shell, Rego, SQL, HCL) get the deep AST signals; other languages are line-counted and regex-scanned only, and each unit summary says so.
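For a sense of what "line-counted and regex-scanned only" might look like, here is a small sketch of a shallow scan that still attaches file:line evidence to every hit. The pattern, the `regex_scan` helper and the `"depth"` label are assumptions for illustration, not CodeWalker's actual pattern set or output schema.

```python
import re

# Illustrative pattern only: model names shaped like gpt-* or claude-*.
MODEL_RE = re.compile(r"\b(?:gpt|claude)-[A-Za-z0-9.\-]+")


def regex_scan(text: str, file: str) -> dict:
    """Line-count a file and record model-name matches with file:line evidence."""
    lines = text.splitlines()
    hits = [
        (match.group(0), f"{file}:{number}")
        for number, line in enumerate(lines, start=1)
        for match in MODEL_RE.finditer(line)
    ]
    # "depth" records that no AST was available, so the summary can say so.
    return {"file": file, "lines": len(lines), "depth": "regex-only", "hits": hits}


sample = '-- config.lua\nlocal model = "gpt-4o"\nlocal backup = "claude-3-haiku"\n'
print(regex_scan(sample, "config.lua"))
```

Recording the scan depth next to the hits lets a reader tell a shallow result from a deep AST-backed one.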
Contributing & license
Contributions welcome — see CONTRIBUTING.md (keep it
deterministic, offline, evidence-backed). New detection patterns are usually just
a data edit in config.py. Licensed MIT (see LICENSE).