Code evidence retrieval and grounded review for documentation workflows. AST chunking, hybrid search (BM25 + vector), and API surface extraction.
code-finder
AST-based code indexing and hybrid search (BM25 + vector) for retrieving code evidence from repositories. Built to answer natural-language questions about a codebase with ranked, source-grounded results.
Import name: The package installs as code-finder, but the Python import is claude_context, not code_finder.
Install
pip install code-finder
Or run ephemerally without installing:
uv run --with code-finder code-finder-evidence --repo /path/to/repo --query "how does auth work?"
What it does
code-finder parses source code into AST-aware chunks, embeds them with a local sentence-transformer model, and stores them in a Milvus Lite vector database. At query time it runs a BM25 keyword search and a vector similarity search, then merges the two ranked lists with reciprocal rank fusion (RRF) to return the code snippets most relevant to a natural-language question.
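Reciprocal rank fusion gives each result a score of 1 / (k + rank) from every ranked list it appears in and sums those contributions, so a chunk ranked well by either retriever rises to the top without having to calibrate BM25 scores against cosine similarities. The sketch below is a generic RRF implementation, not code-finder's internal code; the constant k = 60 (a commonly used default), the function name, and the chunk IDs are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first ranked lists of IDs into one list sorted by RRF score.

    A document at 1-based rank r in a list contributes 1 / (k + r) to its fused score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs returned by each retriever, best match first.
bm25_hits = ["auth.py:login", "auth.py:logout", "config.py:load"]
vector_hits = ["auth.py:verify_token", "auth.py:login", "session.py:create"]

for chunk_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
    print(f"{chunk_id}: {score:.4f}")
```

Here auth.py:login comes out first because both retrievers returned it, while chunks found by only one retriever are ordered by how high that retriever ranked them.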
Three capabilities are exposed as both CLI commands and Python functions:
| Capability | CLI command | What it returns |
|---|---|---|
| Code evidence retrieval | code-finder-evidence | Ranked code snippets matching a query |
| Code-grounded review | code-finder-review | Per-claim verdicts for a draft document |
| API surface extraction | code-finder-api-surface | Public classes, functions, and signatures |
CLI usage
Code evidence retrieval
Search a repo with a natural-language question:
code-finder-evidence \
--repo /path/to/repo \
--query "how does authentication work?" \
--limit 5
Filter by chunk type or file path:
code-finder-evidence \
--repo /path/to/repo \
--query "error handling" \
--filter-types function,method \
--filter-paths src/auth,src/config
Force a re-index after code changes:
code-finder-evidence --repo /path/to/repo --query "config loading" --reindex
Code-grounded review
Validate a draft document's factual claims against the source code:
code-finder-review \
--repo /path/to/repo \
--draft docs/getting-started.md
Each claim gets a verdict: supported, partially_supported, unsupported, or no_evidence_found.
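In a docs CI job you may want to fail the build when review finds claims the code does not back up. The sketch below assumes you have already pulled the verdict strings out of the report returned by grounded_review; the report's structure is not documented here, so the input shape, the function name summarize_verdicts, and the pass/fail policy in BLOCKING are all hypothetical. Only the four verdict values come from code-finder.

```python
from collections import Counter
from enum import Enum


class Verdict(str, Enum):
    # The four verdict values documented for code-finder-review.
    SUPPORTED = "supported"
    PARTIALLY_SUPPORTED = "partially_supported"
    UNSUPPORTED = "unsupported"
    NO_EVIDENCE_FOUND = "no_evidence_found"


# Hypothetical policy: any claim with one of these verdicts fails the check.
BLOCKING = frozenset({Verdict.UNSUPPORTED, Verdict.NO_EVIDENCE_FOUND})


def summarize_verdicts(verdicts: list[str]) -> tuple[Counter, bool]:
    """Count verdicts and report whether the draft passes the policy.

    Raises ValueError for any string that is not a known verdict.
    """
    counts = Counter(Verdict(v) for v in verdicts)
    passed = not any(counts[v] for v in BLOCKING)
    return counts, passed


counts, passed = summarize_verdicts(["supported", "partially_supported", "unsupported"])
print({v.value: n for v, n in counts.items()}, "PASS" if passed else "FAIL")
```

Converting each string through Verdict makes a typo or an unexpected verdict fail loudly instead of silently counting as a pass.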
API surface extraction
Extract the public API from source files. This is deterministic (no LLM, no indexing):
code-finder-api-surface --target src/mypackage/
# Single file
code-finder-api-surface --target src/mypackage/client.py
# Include private members
code-finder-api-surface --target src/mypackage/ --include-private
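To illustrate what "deterministic" means here, the sketch below lists the public functions, classes, and methods of a Python module using only the standard-library ast module, so the same source always yields the same output. It is not code-finder's implementation (which parses with tree-sitter and covers more languages); the function names are hypothetical, and treating a leading underscore as "private" is an assumption chosen to mirror the --include-private switch.

```python
import ast

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _is_public(name: str, include_private: bool) -> bool:
    # Assumed convention: a leading underscore marks a private name.
    return include_private or not name.startswith("_")


def _signature(fn) -> str:
    keyword = "async def" if isinstance(fn, ast.AsyncFunctionDef) else "def"
    # ast.unparse (Python 3.9+) renders the argument list back to source text.
    return f"{keyword} {fn.name}({ast.unparse(fn.args)})"


def public_api(source: str, include_private: bool = False) -> list[str]:
    """List top-level functions and classes (with their methods) in a module."""
    entries: list[str] = []
    for node in ast.parse(source).body:
        if isinstance(node, FUNC_TYPES) and _is_public(node.name, include_private):
            entries.append(_signature(node))
        elif isinstance(node, ast.ClassDef) and _is_public(node.name, include_private):
            entries.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, FUNC_TYPES) and _is_public(item.name, include_private):
                    entries.append("    " + _signature(item))
    return entries


SAMPLE = '''
def connect(host, port=443):
    pass

def _retry():
    pass

class Client:
    def get(self, path):
        pass
'''

for line in public_api(SAMPLE):
    print(line)
```

Note that under this naive underscore rule, dunder methods such as __init__ are also hidden unless include_private is set; a real extractor would likely special-case them.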
Python API
from claude_context.skills.evidence_retrieval import retrieve_evidence
results = retrieve_evidence(
repo_path="/path/to/repo",
query="how does hybrid search combine BM25 and vector results?",
limit=10,
filter_types=["function", "method"],
filter_paths=["src/auth", "src/config"],
)
for r in results:
    # Each result is a dict; print its location, fused score, and signature.
    print(f"{r['file_path']}:{r['start_line']} ({r['combined_score']:.3f})")
    print(f"  {r['signature']}")
from claude_context.skills.grounded_review import grounded_review
report = grounded_review(
repo_path="/path/to/repo",
draft_path="docs/getting-started.md",
max_evidence_per_claim=5,
)
from claude_context.skills.api_surface import extract_api_surface
surface = extract_api_surface(
target_path="src/mypackage/",
languages=["python"],
include_private=False,
include_docstrings=True,
)
Index caching
On first run, code-finder builds an index of the repository (AST chunking + embeddings). This takes 1-3 minutes depending on repo size. The index is cached at:
{repo}/.vibe2doc/index.db
Subsequent runs reuse the cached index. Pass --reindex (CLI) or reindex=True (Python) after significant code changes. API surface extraction does not use the index.
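Deciding when a change is "significant" is up to you. One rough heuristic, sketched below under stated assumptions, is to treat the index as stale when {repo}/.vibe2doc/index.db is missing or any file in the repo has a newer modification time. The function name and the SKIP_DIRS list are hypothetical, and mtime checks can over-trigger (a file touched without edits counts as changed).

```python
import os
from pathlib import Path

# Cache location documented for code-finder.
INDEX_RELATIVE_PATH = Path(".vibe2doc") / "index.db"

# Hypothetical: directories whose contents should not trigger a re-index.
SKIP_DIRS = {".git", ".vibe2doc", "node_modules", "__pycache__", ".venv"}


def index_is_stale(repo: str) -> bool:
    """Return True if the cached index is missing or older than any file in the repo."""
    repo_path = Path(repo)
    index = repo_path / INDEX_RELATIVE_PATH
    if not index.exists():
        return True
    index_mtime = index.stat().st_mtime
    for dirpath, dirnames, filenames in os.walk(repo_path):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if (Path(dirpath) / name).stat().st_mtime > index_mtime:
                return True
    return False
```

You could then pass the result through as the reindex argument, for example reindex=index_is_stale("/path/to/repo"), assuming reindex accepts a plain boolean.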
Filtering
Path filtering
Restrict results to specific directories using --filter-paths (CLI) or filter_paths (Python). Paths are relative to the repo root:
code-finder-evidence --repo /path/to/repo --query "auth" --filter-paths src/auth,src/middleware
Type filtering
Restrict to specific chunk types using --filter-types (CLI) or filter_types (Python): function, method, class, module, import, decorator.
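The sketch below shows one plausible reading of how path and type filters combine: a chunk passes only if its type is in the type list (when one is given) and its file lies under one of the listed directories (when any are given). code-finder applies its filters internally; the chunk_type key, the function name matches_filters, and the directory-containment rule are assumptions for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional


def matches_filters(chunk: dict, filter_types: Optional[list[str]] = None,
                    filter_paths: Optional[list[str]] = None) -> bool:
    """Return True if a chunk satisfies both the type filter and the path filter."""
    if filter_types and chunk["chunk_type"] not in filter_types:
        return False
    if filter_paths:
        path = PurePosixPath(chunk["file_path"])
        # Directory containment rather than string prefix, so "src/auth"
        # does not match "src/authz/...".
        if not any(path.is_relative_to(p) for p in filter_paths):
            return False
    return True


# Hypothetical chunks with repo-relative paths.
chunks = [
    {"file_path": "src/auth/login.py", "chunk_type": "function"},
    {"file_path": "src/authz/policy.py", "chunk_type": "function"},
    {"file_path": "src/auth/models.py", "chunk_type": "class"},
]
kept = [c["file_path"] for c in chunks
        if matches_filters(c, ["function", "method"], ["src/auth", "src/config"])]
print(kept)  # → ['src/auth/login.py']
```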
Language filtering
Restrict to specific languages: python, javascript, typescript, go, and others.
Supported languages
Python, JavaScript, TypeScript, Go (AST-parsed via tree-sitter). Additionally indexes Markdown, JSON, YAML, TOML, HTML, CSS, shell scripts, SQL, and other text formats.
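code-finder parses these languages with tree-sitter. The sketch below swaps in Python's built-in ast module to show the same chunking idea for Python files only: each top-level function, class, and method becomes one chunk with a 1-based line range, which is what lets results point at file:line locations. The Chunk fields and the chunk_python name are hypothetical, and the real chunker handles more node types (the chunk types listed under Type filtering also include module, import, and decorator).

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_type: str  # "function", "class", or "method"
    name: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_python(source: str) -> list[Chunk]:
    """Split a Python module into function, class, and method chunks."""
    lines = source.splitlines()
    chunks: list[Chunk] = []

    def emit(node, chunk_type: str, name: str) -> None:
        # Start at the first decorator, if any, so the snippet reads as it does in the file.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        end = node.end_lineno  # available since Python 3.8
        chunks.append(Chunk(chunk_type, name, start, end, "\n".join(lines[start - 1:end])))

    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            emit(node, "function", node.name)
        elif isinstance(node, ast.ClassDef):
            emit(node, "class", node.name)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    emit(item, "method", f"{node.name}.{item.name}")
    return chunks


SOURCE = '''import os


def load(path):
    return open(path).read()


class Store:
    @staticmethod
    def key(name):
        return name.lower()
'''

for c in chunk_python(SOURCE):
    print(c.chunk_type, c.name, c.start_line, c.end_line)
```

In this simple version a class chunk overlaps its method chunks; an indexer might instead emit only the class header to avoid embedding the same code twice.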
Used by
redhat-docs-agent-tools uses code-finder as the backend for its code-evidence, grounded-review, and api-surface skills. If you're using those skills, code-finder is installed automatically as a dependency.
Origin
code-finder was built from a fork of claude-context by Zilliz, which provides Milvus-backed code search for Claude. It was extended within vibe2doc with enhanced AST chunking, path filtering, grounded review, and API surface extraction, then extracted as a standalone package. The vibe2doc README describes the full doc generation workflow; this package provides only the code analysis and search layer.
License
Apache-2.0
Download files
File details
Details for the file code_finder-0.1.1.tar.gz.
File metadata
- Download URL: code_finder-0.1.1.tar.gz
- Upload date:
- Size: 145.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5887671d01edbc5e55a5f15c0e2e6d6f96e6d3ab052ad9a2f594ed9c109ebef6 |
| MD5 | a1d04dc92aa425f2385612ab1c9d098c |
| BLAKE2b-256 | 13bf442f87948b593b8766122ef8f40943e5fe64a1393eabb6635496d18fcb66 |
File details
Details for the file code_finder-0.1.1-py3-none-any.whl.
File metadata
- Download URL: code_finder-0.1.1-py3-none-any.whl
- Upload date:
- Size: 146.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b31c76e2b19bab4a3a9f397a689a0bacab82a4d3dadff4972293527566aeee44 |
| MD5 | a850c248de66d0a209a6e5932473e612 |
| BLAKE2b-256 | d885bdb6b3bb9fa171edc588402db32c123a947d857c840bee761c277d93e479 |