Code evidence retrieval and grounded review for documentation workflows. AST chunking, hybrid search (BM25 + vector), and API surface extraction.

code-finder

AST-based code indexing and hybrid search (BM25 + vector) for retrieving code evidence from repositories. Built to answer natural-language questions about a codebase with ranked, source-grounded results.

Import name: The package installs as code-finder but the Python import is claude_context, not code_finder.

Install

pip install code-finder

Or run ephemerally without installing:

uv run --with code-finder code-finder-evidence --repo /path/to/repo --query "how does auth work?"

What it does

code-finder parses source code into AST-aware chunks, embeds them with a local sentence-transformer model, and stores them in a Milvus Lite vector database. At query time it runs BM25 keyword search and vector similarity search, then merges the two rankings with reciprocal rank fusion to return the most relevant code snippets for a natural-language question.
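Reciprocal rank fusion can be illustrated with a short, self-contained sketch. This is not code-finder's implementation: the function name, the chunk IDs, and the constant k=60 (a commonly used default) are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumption, not a value taken
    from code-finder.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk IDs returned by the two retrievers.
bm25_ranking = ["auth.py:login", "auth.py:logout", "config.py:load"]
vector_ranking = ["auth.py:login", "session.py:refresh", "auth.py:logout"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
for chunk_id, score in fused:
    print(f"{chunk_id}  {score:.4f}")
```

A chunk that appears near the top of both lists outranks one that appears in only a single list, which is why fusion tends to favor results that both keyword and semantic search agree on.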

Three capabilities are exposed as both CLI commands and Python functions:

| Capability | CLI command | What it returns |
| --- | --- | --- |
| Code evidence retrieval | code-finder-evidence | Ranked code snippets matching a query |
| Code-grounded review | code-finder-review | Per-claim verdicts for a draft document |
| API surface extraction | code-finder-api-surface | Public classes, functions, and signatures |

CLI usage

Code evidence retrieval

Search a repo with a natural-language question:

code-finder-evidence \
  --repo /path/to/repo \
  --query "how does authentication work?" \
  --limit 5

Filter by chunk type or file path:

code-finder-evidence \
  --repo /path/to/repo \
  --query "error handling" \
  --filter-types function,method \
  --filter-paths src/auth,src/config

Force a re-index after code changes:

code-finder-evidence --repo /path/to/repo --query "config loading" --reindex
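When the CLI is driven from a script, it can help to assemble the argument list programmatically. The sketch below uses only the flags shown in the examples above; the helper name `build_evidence_command` is hypothetical, and it assumes the flags can be combined freely in one invocation.

```python
import shlex


def build_evidence_command(repo, query, limit=None, filter_types=None,
                           filter_paths=None, reindex=False):
    """Build an argv list for code-finder-evidence from the documented flags."""
    argv = ["code-finder-evidence", "--repo", str(repo), "--query", query]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if filter_types:
        # Multiple values are comma-separated, as in the examples above.
        argv += ["--filter-types", ",".join(filter_types)]
    if filter_paths:
        argv += ["--filter-paths", ",".join(filter_paths)]
    if reindex:
        argv.append("--reindex")
    return argv


argv = build_evidence_command(
    "/path/to/repo",
    "error handling",
    limit=5,
    filter_types=["function", "method"],
    filter_paths=["src/auth", "src/config"],
)
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` rather than a shell string avoids quoting problems when the query contains spaces or punctuation.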

Code-grounded review

Validate a draft document's factual claims against the source code:

code-finder-review \
  --repo /path/to/repo \
  --draft docs/getting-started.md

Each claim gets a verdict: supported, partially_supported, unsupported, or no_evidence_found.
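A review run is usually followed by a tally of which claims need attention. The sketch below shows one way to do that with the four verdict labels listed above. The input shape (a list of dicts with "claim" and "verdict" keys) is hypothetical, since the real report structure is not described here, and the sample data is illustrative only.

```python
from collections import Counter

# The four verdict labels listed above.
VERDICTS = ("supported", "partially_supported", "unsupported", "no_evidence_found")


def summarize_verdicts(claims):
    """Count claims per verdict and list the claims that are not fully supported.

    `claims` is a HYPOTHETICAL shape: a list of dicts with "claim" and
    "verdict" keys. The real report structure may differ.
    """
    counts = Counter(c["verdict"] for c in claims)
    unknown = set(counts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    needs_attention = [c["claim"] for c in claims if c["verdict"] != "supported"]
    return {v: counts.get(v, 0) for v in VERDICTS}, needs_attention


# Illustrative data only, not output from code-finder-review.
sample_claims = [
    {"claim": "Tokens expire after one hour", "verdict": "supported"},
    {"claim": "Config is read from settings.yaml", "verdict": "partially_supported"},
    {"claim": "Retries use exponential backoff", "verdict": "no_evidence_found"},
]

counts, needs_attention = summarize_verdicts(sample_claims)
print(counts)
print(needs_attention)
```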

API surface extraction

Extract the public API from source files. This is deterministic (no LLM, no indexing):

code-finder-api-surface --target src/mypackage/

# Single file
code-finder-api-surface --target src/mypackage/client.py

# Include private members
code-finder-api-surface --target src/mypackage/ --include-private
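To show why this step can be deterministic, here is a minimal sketch of API surface extraction for Python sources using the standard-library `ast` module. It is not code-finder's implementation (which parses via tree-sitter); the function name and the rule that any name starting with an underscore counts as private (including dunder methods) are simplifying assumptions.

```python
import ast


def extract_public_api(source, include_private=False):
    """Return signatures of top-level functions, classes, and their methods."""
    def visible(name):
        # Simplification: any leading underscore counts as private.
        return include_private or not name.startswith("_")

    def signature(node, prefix=""):
        keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
        # ast.unparse (Python 3.9+) renders the argument list back to source.
        return f"{keyword} {prefix}{node.name}({ast.unparse(node.args)})"

    functions = (ast.FunctionDef, ast.AsyncFunctionDef)
    surface = []
    for node in ast.parse(source).body:
        if isinstance(node, functions) and visible(node.name):
            surface.append(signature(node))
        elif isinstance(node, ast.ClassDef) and visible(node.name):
            surface.append(f"class {node.name}")
            for item in node.body:
                if isinstance(item, functions) and visible(item.name):
                    surface.append(signature(item, prefix=f"{node.name}."))
    return surface


SAMPLE = '''
class Client:
    def connect(self, url, timeout=30):
        pass

    def _handshake(self):
        pass

def load_config(path):
    pass

def _internal():
    pass
'''

for line in extract_public_api(SAMPLE):
    print(line)
```

Because the output depends only on the parsed syntax tree, the same input always produces the same surface, with no model or index involved.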

Python API

Code evidence retrieval:

from claude_context.skills.evidence_retrieval import retrieve_evidence

results = retrieve_evidence(
    repo_path="/path/to/repo",
    query="how does hybrid search combine BM25 and vector results?",
    limit=10,
    filter_types=["function", "method"],
    filter_paths=["src/auth", "src/config"],
)

for r in results:
    print(f"{r['file_path']}:{r['start_line']} ({r['combined_score']:.3f})")
    print(f"  {r['signature']}")

Code-grounded review:

from claude_context.skills.grounded_review import grounded_review

report = grounded_review(
    repo_path="/path/to/repo",
    draft_path="docs/getting-started.md",
    max_evidence_per_claim=5,
)

API surface extraction:

from claude_context.skills.api_surface import extract_api_surface

surface = extract_api_surface(
    target_path="src/mypackage/",
    languages=["python"],
    include_private=False,
    include_docstrings=True,
)
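Retrieved results are often easier to read when grouped by file. The sketch below post-processes a list of result dicts using only the fields printed in the retrieval example above (file_path, start_line, combined_score, signature). The helper name is hypothetical, the sample data is illustrative, and it assumes a higher combined_score means a better match.

```python
from collections import defaultdict


def group_by_file(results):
    """Group result dicts by file, ordering files by their best-scoring hit."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["file_path"]].append(r)
    for hits in grouped.values():
        # Assumption: a higher combined_score means a better match.
        hits.sort(key=lambda r: r["combined_score"], reverse=True)
    return sorted(
        grouped.items(),
        key=lambda kv: kv[1][0]["combined_score"],
        reverse=True,
    )


# Illustrative data shaped like the fields used in the example above.
sample_results = [
    {"file_path": "src/auth/login.py", "start_line": 12,
     "combined_score": 0.031, "signature": "def login(user, password)"},
    {"file_path": "src/config/loader.py", "start_line": 40,
     "combined_score": 0.033, "signature": "def load(path)"},
    {"file_path": "src/auth/login.py", "start_line": 88,
     "combined_score": 0.016, "signature": "def logout(session)"},
]

grouped = group_by_file(sample_results)
for file_path, hits in grouped:
    print(file_path)
    for r in hits:
        print(f"  line {r['start_line']}: {r['signature']} ({r['combined_score']:.3f})")
```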

Index caching

On first run, code-finder builds an index of the repository (AST chunking + embeddings). This takes 1-3 minutes depending on repo size. The index is cached at:

{repo}/.vibe2doc/index.db

Subsequent runs reuse the cached index. Pass --reindex (CLI) or reindex=True (Python) after significant code changes. API surface extraction does not use the index.
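Deciding when a change is "significant" is left to the user. One rough heuristic, sketched below, is to compare the modification time of the cached index with that of the source files. This is not something code-finder is documented to do itself: the helper names and the suffix list are assumptions, and only the index location comes from the text above.

```python
import os
import tempfile
from pathlib import Path

# Source-file suffixes to compare against the index; an illustrative choice.
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".go"}


def index_path(repo):
    """Location of the cached index, as documented above."""
    return Path(repo) / ".vibe2doc" / "index.db"


def needs_reindex(repo):
    """Return True if no index exists or any source file is newer than it."""
    index = index_path(repo)
    if not index.exists():
        return True
    index_mtime = index.stat().st_mtime
    for path in Path(repo).rglob("*"):
        if ".vibe2doc" in path.parts:
            continue  # skip the cache directory itself
        if (path.is_file() and path.suffix in SOURCE_SUFFIXES
                and path.stat().st_mtime > index_mtime):
            return True
    return False


# Demonstration in a throwaway directory with explicit timestamps.
with tempfile.TemporaryDirectory() as repo:
    src = Path(repo) / "app.py"
    src.write_text("print('hi')\n")
    before_indexing = needs_reindex(repo)   # no index file yet

    index = index_path(repo)
    index.parent.mkdir()
    index.write_bytes(b"")
    os.utime(src, (1000, 1000))             # source older than index
    os.utime(index, (2000, 2000))
    after_indexing = needs_reindex(repo)

    os.utime(src, (3000, 3000))             # source edited after indexing
    after_edit = needs_reindex(repo)

print(before_indexing, after_indexing, after_edit)
```

If the check returns True, pass --reindex (CLI) or reindex=True (Python) on the next run.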

Filtering

Path filtering

Restrict results to specific directories using --filter-paths (CLI) or filter_paths (Python). Paths are relative to the repo root:

code-finder-evidence --repo /path/to/repo --query "auth" --filter-paths src/auth,src/middleware
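The matching rule behind path filtering is not spelled out here. A plausible reading, sketched below, is that a file passes if it lies under any of the listed repo-relative directories. The function name and the directory-prefix semantics are assumptions, not confirmed behavior of code-finder.

```python
from pathlib import PurePosixPath


def matches_filter_paths(file_path, filter_paths):
    """True if file_path lies under any of the repo-relative directories."""
    path = PurePosixPath(file_path)
    for prefix in filter_paths:
        base = PurePosixPath(prefix)
        # Compare whole path components, so "src/auth" does not match "src/authz".
        if path == base or base in path.parents:
            return True
    return False


filters = ["src/auth", "src/middleware"]
for candidate in ["src/auth/login.py", "src/authz/policy.py", "src/middleware/cors/headers.py"]:
    print(candidate, matches_filter_paths(candidate, filters))
```

Comparing path components rather than raw string prefixes avoids accidental matches between sibling directories that share a name prefix.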

Type filtering

Restrict to specific chunk types: function, method, class, module, import, decorator.
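To make the chunk types concrete, the sketch below splits a Python source string into function, method, and class chunks with the standard-library `ast` module and then filters by type. code-finder itself parses via tree-sitter, so this is a stand-in technique; the function name and the tuple layout are assumptions.

```python
import ast


def chunk_python_source(source):
    """Return (chunk_type, name, start_line, end_line) for classes and defs."""
    chunks = []

    def visit(node, inside_class):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                chunks.append(("class", child.name, child.lineno, child.end_lineno))
                visit(child, inside_class=True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if inside_class else "function"
                chunks.append((kind, child.name, child.lineno, child.end_lineno))
                # Defs nested inside a function are treated as plain functions.
                visit(child, inside_class=False)
            else:
                visit(child, inside_class)

    visit(ast.parse(source), inside_class=False)
    return chunks


SAMPLE = """class Cache:
    def get(self, key):
        return None

def build_cache():
    return Cache()
"""

chunks = chunk_python_source(SAMPLE)
# Equivalent in spirit to --filter-types function,method.
callables = [c for c in chunks if c[0] in {"function", "method"}]
for chunk in callables:
    print(chunk)
```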

Language filtering

Restrict to specific languages: python, javascript, typescript, go, and others.

Supported languages

Python, JavaScript, TypeScript, and Go are AST-parsed via tree-sitter. code-finder also indexes Markdown, JSON, YAML, TOML, HTML, CSS, shell scripts, SQL, and other text formats.

Used by

redhat-docs-agent-tools uses code-finder as the backend for its code-evidence, grounded-review, and api-surface skills. If you're using those skills, code-finder is installed automatically as a dependency.

Origin

code-finder was built from a fork of claude-context by Zilliz, which provides Milvus-backed code search for Claude. It was extended within vibe2doc with enhanced AST chunking, path filtering, grounded review, and API surface extraction, then extracted as a standalone package. The vibe2doc README describes the full doc generation workflow; this package provides only the code analysis and search layer.

License

Apache-2.0


Source Distribution

code_finder-0.1.1.tar.gz (145.9 kB view details)

Uploaded Source

Built Distribution

code_finder-0.1.1-py3-none-any.whl (146.2 kB view details)

Uploaded Python 3

File details

Details for the file code_finder-0.1.1.tar.gz.

File metadata

  • Download URL: code_finder-0.1.1.tar.gz
  • Size: 145.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for code_finder-0.1.1.tar.gz
Algorithm Hash digest
SHA256 5887671d01edbc5e55a5f15c0e2e6d6f96e6d3ab052ad9a2f594ed9c109ebef6
MD5 a1d04dc92aa425f2385612ab1c9d098c
BLAKE2b-256 13bf442f87948b593b8766122ef8f40943e5fe64a1393eabb6635496d18fcb66

File details

Details for the file code_finder-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: code_finder-0.1.1-py3-none-any.whl
  • Size: 146.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for code_finder-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b31c76e2b19bab4a3a9f397a689a0bacab82a4d3dadff4972293527566aeee44
MD5 a850c248de66d0a209a6e5932473e612
BLAKE2b-256 d885bdb6b3bb9fa171edc588402db32c123a947d857c840bee761c277d93e479
