
Deterministic, offline static-analysis tool that carves any git repo into units and extracts per-unit factual signals (LLM call sites, regex parse sites, frameworks, integrations, Taxi schemas) with file:line evidence. Exposes analysis as an MCP server.


CodeWalker

Deterministic, offline code exploration — ground truth for AI agents. CodeWalker carves any git repo into navigable units and extracts factual signals about what the code calls, imports and integrates — LLM call sites (provider + model), frameworks, integrations, prompts, git recency, Taxi schemas — each with file:line evidence, no LLM, no code execution.

It's the deterministic half of a two-step idea:

① CodeWalker explores the repo → facts, every one a file:line.
② Your LLM reads those facts and maps the codebase → grounded, cited, not guessing.

It exposes its analysis as an MCP server (so Claude Code or any other agent can use it as ground truth) and as a local web explorer.

CodeWalker — explore the code, ground the LLM

Philosophy

  • Signals are ground truth, derived from AST / regex / filesystem / git — never an LLM guess. Classification is a navigational heuristic; the evidence is the truth.
  • Every signal carries evidence: a file:line you can jump to. No claim without a location.
  • Cheap deterministic substrate: any LLM "judgment" runs on top of this output, not inside it.
  • Generalize: nothing is hardcoded to a repo, company or domain — every target is configuration.
  • First-class Taxi: .taxi schemas are parsed into a semantic-type usage graph (which models/services consume which semantic types).
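As a rough illustration of the last point, the sketch below builds a semantic-type usage graph from Taxi source with plain regexes. It is not CodeWalker's parser: the function name `semantic_type_usage`, the sample schema, and the simplified handling (flat `model` blocks only, no nested braces, comments or services) are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical sample schema, written for this example only.
TAXI_SOURCE = """
type CustomerId inherits String
type EmailAddress inherits String

model Customer {
  id : CustomerId
  email : EmailAddress
}

model Order {
  customer : CustomerId
}
"""

# A model block: `model Name { ...fields... }` (no nested braces in this sketch).
MODEL_RE = re.compile(r"\bmodel\s+(\w+)\s*\{([^}]*)\}")
# A field line: `name : TypeName`.
FIELD_RE = re.compile(r"^\s*(\w+)\s*:\s*([\w.]+)", re.MULTILINE)


def semantic_type_usage(source: str) -> dict:
    """Map each semantic type to the sorted list of models that use it."""
    usage = defaultdict(set)
    for model in MODEL_RE.finditer(source):
        model_name, body = model.group(1), model.group(2)
        for field in FIELD_RE.finditer(body):
            usage[field.group(2)].add(model_name)
    return {name: sorted(models) for name, models in sorted(usage.items())}


if __name__ == "__main__":
    for type_name, models in semantic_type_usage(TAXI_SOURCE).items():
        print(f"{type_name} <- {', '.join(models)}")
```

Reading the graph in this direction (type to consumers) answers "which models touch `CustomerId`?" directly, which is the question the usage graph is meant to serve.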

What it extracts (per unit)

Size & languages, models referenced (gpt-*, claude-*, …), LLM call sites (provider + model + via + file:line), regex parse sites, frameworks, integrations, an AST-derived call graph (imports / subprocess / http / aws / db / tools / a2a / mcp), prompts + a request table, output formats, key deps, git recency, a heuristic classification + status, a deterministic plain-English summary, an architecture graph, and a full Taxi schema where present.
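To make "provider + model + file:line" concrete, here is a minimal sketch of a regex scan that reports model-name references with their location. The `Signal` dataclass, the `scan_model_references` function and the provider mapping are hypothetical; the patterns only cover the `gpt-*` and `claude-*` families named above and are not CodeWalker's actual rules.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; real detection rules would be more thorough.
MODEL_PATTERNS = {
    "openai": re.compile(r"\bgpt-[\w.\-]+"),
    "anthropic": re.compile(r"\bclaude-[\w.\-]+"),
}


@dataclass(frozen=True)
class Signal:
    provider: str
    model: str
    file: str
    line: int  # 1-based, so it matches editor line numbers

    @property
    def evidence(self) -> str:
        return f"{self.file}:{self.line}"


def scan_model_references(file: str, text: str) -> list:
    """Return one Signal per model-name match, in line order."""
    signals = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for provider, pattern in MODEL_PATTERNS.items():
            for match in pattern.finditer(line):
                signals.append(Signal(provider, match.group(0), file, line_no))
    return signals


if __name__ == "__main__":
    sample = 'import os\nMODEL = "gpt-4o"\nFALLBACK = "claude-3-haiku"\n'
    for signal in scan_model_references("app/llm.py", sample):
        print(signal.evidence, signal.provider, signal.model)
```

Because the scan is a pure function of the file contents, running it twice on the same commit yields the same signals, which is the property the word "deterministic" refers to throughout.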

Run it in 30 seconds

No install needed — point uv at the repo:

# explore any repo in your browser (clones it if you pass a git URL):
uvx --from "git+https://github.com/jhammant/codewalker" codewalker web /path/to/repo

# or run the MCP server so Claude Code can use it:
uvx --from "git+https://github.com/jhammant/codewalker" codewalker mcp /path/to/repo

Install

# from the repo (works today):
pip install "git+https://github.com/jhammant/codewalker"

# from PyPI (once published — see .github/workflows/release.yml):
pip install codewalker          # or:  uv tool install codewalker

# from source (for development):
git clone https://github.com/jhammant/codewalker && cd codewalker
uv venv --python 3.13 .venv && uv pip install -e ".[dev]" && pytest

Requires Python 3.10+. tree-sitter grammars (for deep AST analysis) install automatically via tree-sitter-language-pack.

Use

# Analyze one repo (caches OUTSIDE the repo, in ~/.cache/codewalker)
codewalker analyze /path/to/repo
codewalker analyze /path/to/repo --reindex     # force rebuild

# Estate report / map
codewalker report /path/to/repo
codewalker map /path/to/repo --group-by package

# Taxi schema (global merged "router" view, or a single unit's)
codewalker taxi /path/to/repo
codewalker taxi /path/to/repo <unit_id>

# Agent Pack — a deterministic markdown briefing for a coding agent
codewalker pack /path/to/repo                  # whole repo
codewalker pack /path/to/repo <unit_id>        # one unit
codewalker pack /path/to/repo --out PACK.md

# Run the MCP server over stdio (primary interface)
codewalker mcp /path/to/repo

# Web explorer (live API at http://localhost:8765) — single repo OR a workspace
codewalker web /path/to/repo
codewalker web ~/dev --max-repos 40            # a whole folder of repos

# Bake a shareable offline static bundle
codewalker bake /path/to/repo --out ./bundle

Explore a developer's whole body of work (a "workspace")

Point CodeWalker at a directory of repos (or a GitHub user) and get a portfolio — every child git repo analyzed, with cross-repo aggregates and a per-repo + portfolio Agent Pack:

codewalker explore ~/dev                       # local folder of repos (offline)
codewalker explore ~/dev --max-repos 20
codewalker explore ~/dev --pack                # portfolio Agent Pack (markdown)
codewalker explore gh:someuser                 # clone+analyze a user's repos (needs the gh CLI)
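The workspace idea starts with finding the git repos under a folder. Here is a minimal sketch of that discovery step, assuming a repo is any directory containing a `.git` entry and that only direct children are considered. `find_child_repos` and its `max_repos` parameter are hypothetical and simply mirror the `--max-repos` flag above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_child_repos(workspace: Path, max_repos: Optional[int] = None) -> list:
    """Return direct child directories that look like git repos, sorted by name."""
    repos = [
        child
        for child in sorted(workspace.iterdir())
        # `.git` is a directory in a normal clone and a file in a worktree or submodule.
        if child.is_dir() and (child / ".git").exists()
    ]
    return repos if max_repos is None else repos[:max_repos]


if __name__ == "__main__":
    # Build a throwaway workspace so the demo runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "alpha" / ".git").mkdir(parents=True)
        (root / "notes").mkdir()  # not a repo, so it is skipped
        print([repo.name for repo in find_child_repos(root)])
```

Sorting before truncating keeps the selection stable between runs, so a capped portfolio always covers the same repos.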

In the browser you can also point at a new repo on the fly — the topbar's Open box takes a local path or a git URL (cloned + cached), no restart needed.

Screenshots

A unit's deterministic Explained view (every signal a file:line) and the Taxi semantic-type graph:

[Screenshot: CodeWalker web explorer, a unit explained, every signal a file:line]
[Screenshot: CodeWalker web explorer, the Taxi semantic-type usage graph]

MCP tools

analyze, explore (workspace portfolio), list_units, get_unit, search, read_file, report, map, taxi_schema, agent_pack, tech_profile — plus read-only resources unit://<id>, report://estate, taxi://<unit_id>. Tool descriptions state explicitly that results are deterministic static analysis with file:line evidence — facts to verify, not prose to trust.
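For clients that read a JSON server registry, the stdio server can be registered with a small config entry. The sketch below generates one in the `mcpServers` shape used by Claude Code's project-level `.mcp.json`. The server name `codewalker` and the repo path are placeholders, and the config file location for other clients is an assumption to check against their documentation.

```python
import json


def mcp_server_entry(repo_path: str, name: str = "codewalker") -> dict:
    """Build a stdio MCP server entry that launches `codewalker mcp <repo>`."""
    return {
        "mcpServers": {
            name: {
                "command": "codewalker",
                "args": ["mcp", repo_path],
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into the client's MCP config file.
    print(json.dumps(mcp_server_entry("/path/to/repo"), indent=2))
```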

The Agent Pack (agent_pack / codewalker pack) is the deep, deterministic counterpart to a GitHub-metadata "agent pack": a concise markdown briefing of what the code actually does (units, what each calls, the Taxi semantic-type web, frameworks), every claim backed by file:line.

The LLM-analysis layer (run it via Claude Code or any agent)

CodeWalker is the deterministic substrate; the judgment (audits, reviews, dossiers) is an LLM pass on top. Three ways to drive it:

  • Claude Code plugin (plugin/): bundles the MCP server + slash-commands and a code-analysis skill. Install:
    /plugin marketplace add jhammant/codewalker
    /plugin install codewalker
    /cw-audit
    /cw-explain <unit>
    /cw-taxi-review
    /cw-dossier <dir|gh:org>
    
  • RECIPES.md: copy-paste analysis prompts for any MCP-capable agent (BYO).
  • Agent SDK (examples/agent_sdk_audit.py): a headless Python harness that runs CodeWalker + an LLM audit pass — CI/cron friendly.

Each keeps the contract: facts come from CodeWalker (with file:line), the model adds judgment and verifies before concluding.
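The "verify before concluding" half of that contract can be mechanical. As a sketch, assuming evidence strings have the form `path:line` relative to the repo root, the hypothetical helper `verify_evidence` below re-reads the cited line and checks that it still contains the expected text.

```python
import tempfile
from pathlib import Path


def verify_evidence(repo_root: Path, evidence: str, expected: str) -> bool:
    """Return True if the line cited as `path:line` contains `expected`."""
    path_part, _, line_part = evidence.rpartition(":")
    if not path_part or not line_part.isdecimal():
        return False  # not a well-formed file:line reference
    target = repo_root / path_part
    if not target.is_file():
        return False
    lines = target.read_text(encoding="utf-8", errors="replace").splitlines()
    line_no = int(line_part)
    # Evidence lines are 1-based; reject anything outside the file.
    if not 1 <= line_no <= len(lines):
        return False
    return expected in lines[line_no - 1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app.py").write_text('import os\nMODEL = "gpt-4o"\n', encoding="utf-8")
        print(verify_evidence(root, "app.py:2", "gpt-4o"))
        print(verify_evidence(root, "app.py:1", "gpt-4o"))
```

A check like this is cheap enough to run on every cited fact before an audit or dossier is written, so stale line numbers surface as failures rather than as confident prose.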

Coverage honesty

Languages with a tree-sitter grammar (Python, JS/TS, Go, Java, Rust, Ruby, C#, Kotlin, Scala, Shell, Rego, SQL, HCL) get the deep AST signals; other languages are line-counted and regex-scanned only, and each unit summary says so.
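The two tiers of coverage can be pictured as a dispatch on whether a parser is available. The sketch below uses Python's standard `ast` module as a stand-in for tree-sitter and falls back to a line-oriented regex for everything else. The function names, the `depth` labels and the regex are illustrative assumptions, not CodeWalker internals.

```python
import ast
import re

# Crude fallback: matches lines starting with `import x` or `from x import ...`.
IMPORT_LINE_RE = re.compile(r"^\s*(?:import|from)\s+([\w.]+)")


def imports_via_ast(source: str) -> list:
    """Deep path: walk the syntax tree and collect (module, line) pairs."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.extend((alias.name, node.lineno) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.append((node.module, node.lineno))
    return sorted(found, key=lambda item: item[1])


def imports_via_regex(source: str) -> list:
    """Shallow path: scan line by line, with no understanding of syntax."""
    found = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        match = IMPORT_LINE_RE.match(line)
        if match:
            found.append((match.group(1), line_no))
    return found


def extract_imports(filename: str, source: str) -> dict:
    """Pick the deep path for .py files and record which depth was used."""
    if filename.endswith(".py"):
        return {"depth": "ast", "imports": imports_via_ast(source)}
    return {"depth": "regex-only", "imports": imports_via_regex(source)}


if __name__ == "__main__":
    print(extract_imports("app.py", "import os\nfrom pathlib import Path\n"))
    print(extract_imports("app.lua", "import foo\n-- from bar import baz\n"))
```

Carrying the `depth` label alongside the result is what lets a summary state honestly how a signal was obtained, instead of presenting regex matches as if they were parsed facts.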

Contributing & license

Contributions welcome — see CONTRIBUTING.md (keep it deterministic, offline, evidence-backed). New detection patterns are usually just a data edit in config.py. Licensed MIT (see LICENSE).
