wells-index

Fast structural repository indexer for wells-coding-harness. Uses Tree-sitter for language parsing, SQLite for storage, and BLAKE3 for incremental hashing.

Features

  • Multi-language symbol extraction — Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
  • Incremental indexing — Only re-parses changed files (via BLAKE3 hashing)
  • Compressed storage — SQLite database with LZ4 compression
  • Fast queries — O(1) symbol lookups, reference finding, call site discovery
  • 98% token reduction — Compared to grep-based code retrieval
  • PyO3 bindings — Native Python extension for seamless integration

Building

Prerequisites

  • Rust 1.70+
  • Python 3.12+
  • maturin >= 1.7

Setup

  1. Clone the repository and navigate to the wells-index directory:

    cd Wells-Coding-Harness/wells-index
    
  2. Vendor tree-sitter grammar sources (one-time setup):

    The build system expects tree-sitter grammar C sources in grammars/<language>/src/:

    # Create grammar directories
    mkdir -p grammars/{python,javascript,typescript/typescript,go,rust,java,c,cpp}/src
    
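    # Copy parser.c and scanner.c from the official tree-sitter repos
    # Example for Python (-f makes curl fail on HTTP errors instead of saving an error page):
    curl -fsSL https://raw.githubusercontent.com/tree-sitter/tree-sitter-python/master/src/parser.c \
        -o grammars/python/src/parser.c
    curl -fsSL https://raw.githubusercontent.com/tree-sitter/tree-sitter-python/master/src/scanner.c \
        -o grammars/python/src/scanner.c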
    

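    Or use git submodules instead of the mkdir and curl commands above (git submodule add refuses a target path that already exists with content, and each grammar repository ships its own src/ directory):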

    git submodule add https://github.com/tree-sitter/tree-sitter-python.git grammars/python
    git submodule add https://github.com/tree-sitter/tree-sitter-javascript.git grammars/javascript
    # ... etc for other languages
    
  3. Build the extension:

    maturin develop
    
  4. Verify installation:

    python -c "from wells_index import IndexEngine; print(IndexEngine.__doc__)"
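
If the build in step 3 fails, one thing worth checking is whether the vendored grammar sources are where the build expects them. The following is a minimal sketch and not part of wells-index: the helper name missing_grammar_sources is hypothetical, the directory names are taken from the mkdir command in step 2, and it only checks for parser.c (some grammars also need scanner.c).

```python
from pathlib import Path

# Directory names taken from the mkdir command in step 2.
GRAMMAR_DIRS = [
    "python", "javascript", "typescript/typescript",
    "go", "rust", "java", "c", "cpp",
]


def missing_grammar_sources(root="grammars"):
    """Return the grammar directories that lack src/parser.c (hypothetical helper)."""
    base = Path(root)
    return [
        name for name in GRAMMAR_DIRS
        if not (base / name / "src" / "parser.c").is_file()
    ]


if __name__ == "__main__":
    missing = missing_grammar_sources()
    if missing:
        print("Missing parser.c for:", ", ".join(missing))
    else:
        print("All grammar sources found.")
```

Run it from the wells-index directory so that the relative grammars/ path resolves.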
    

Usage

Command Line

# Index the current directory
python -c "from wells_index import IndexEngine; e = IndexEngine('.'); e.index(); print(e.stats())"

Python API

from wells_index import IndexEngine

# Create indexer for workspace
engine = IndexEngine("/path/to/repo")

# Build/update index
stats = engine.index()
print(f"Indexed {stats['files_indexed']} files")

# Query the index
symbols = engine.find_symbol("MyClass")
for sym in symbols:
    print(f"{sym['file_path']}:{sym['start_line']} - {sym['name']} ({sym['kind']})")

# Find all references to a symbol
refs = engine.find_references("authenticate")
for ref in refs:
    print(f"{ref['file_path']}:{ref['start_line']}")

# Find all callers of a function
callers = engine.find_callers("process_request")
for caller in callers:
    print(f"{caller['file_path']}:{caller['start_line']} calls process_request")

# Prefix/substring search
results = engine.search_symbols("MyClass", limit=20)
for r in results:
    print(r)

# List all symbols in a file
symbols = engine.list_in_file("src/main.py")
for sym in symbols:
    print(f"  {sym['name']} ({sym['kind']})")

# Get repository stats
stats = engine.stats()
print(f"Total files: {stats['total_files']}")
print(f"Total symbols: {stats['total_symbols']}")
print(f"Total edges: {stats['total_edges']}")

# Clear index
engine.clear()
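
The query results above are accessed as records with file_path and start_line keys. As a small sketch of post-processing them, the snippet below groups hits by file. The helper name group_by_file is hypothetical, and the sample records are made up so the snippet runs without wells-index installed.

```python
from collections import defaultdict


def group_by_file(records):
    """Group result records by file_path, with sorted line numbers per file."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["file_path"]].append(rec["start_line"])
    return {path: sorted(lines) for path, lines in sorted(grouped.items())}


# Made-up records standing in for engine.find_callers("process_request").
sample = [
    {"file_path": "src/server.py", "start_line": 88},
    {"file_path": "src/cli.py", "start_line": 12},
    {"file_path": "src/server.py", "start_line": 41},
]

for path, lines in group_by_file(sample).items():
    print(f"{path}: lines {lines}")
```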

Index Format

The index is stored in .wells_index/index.db (relative to the workspace root):

  • Compressed: LZ4 compression applied on flush
  • Incremental: BLAKE3 file hashes skip unchanged files
  • Portable: SQLite database readable by standard tools
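
Since the index is described as a SQLite database, it should be possible to look at it with Python's built-in sqlite3 module. This is a sketch under assumptions: the schema is not documented here, so the code only lists table names, and if the file on disk is LZ4-compressed as a whole it would need to be decompressed before SQLite can open it. The helper name list_tables is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Path from the section above, relative to the workspace root.
db_path = Path(".wells_index") / "index.db"
if db_path.exists():
    print(list_tables(db_path))
else:
    print(f"{db_path} not found; build the index first")
```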

Grammar Support

Language    Extensions                    Tree-sitter repo
Python      .py                           tree-sitter-python
JavaScript  .js, .mjs, .cjs               tree-sitter-javascript
TypeScript  .ts, .tsx                     tree-sitter-typescript
Go          .go                           tree-sitter-go
Rust        .rs                           tree-sitter-rust
Java        .java                         tree-sitter-java
C           .c, .h                        tree-sitter-c
C++         .cpp, .cc, .cxx, .hpp, .hh    tree-sitter-cpp
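
The extension list above can be expressed as a lookup table. The sketch below is illustrative only and is not the mapping used inside wells-index (which is written in Rust). The names EXTENSION_TO_LANGUAGE and detect_language are hypothetical, and lowercasing the suffix is an assumption.

```python
from pathlib import Path

# Extensions copied from the Grammar Support table.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript", ".mjs": "javascript", ".cjs": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".go": "go",
    ".rs": "rust",
    ".java": "java",
    ".c": "c", ".h": "c",
    ".cpp": "cpp", ".cc": "cpp", ".cxx": "cpp", ".hpp": "cpp", ".hh": "cpp",
}


def detect_language(path):
    """Return the language for a path based on its extension, or None if unsupported."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


for name in ["src/main.py", "lib/util.hh", "README.md"]:
    print(name, "->", detect_language(name))
```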

Symbol Kinds

  • class — Class/struct/interface definition
  • function — Top-level function definition
  • method — Method within a class
  • variable — Variable/field definition
  • module — Module/file-level scope

Edge Kinds

  • calls — Function/method call
  • references — Symbol reference or usage
  • inherits — Class inheritance or trait impl
  • imports — Module import
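
To make the two vocabularies concrete, here is one way the symbol and edge kinds could be modelled on the Python side. This is a sketch, not the types wells-index exposes: the class names SymbolKind, EdgeKind, and Edge, and the Edge fields, are hypothetical. Only the string values come from the lists above.

```python
from dataclasses import dataclass
from enum import Enum


class SymbolKind(str, Enum):
    CLASS = "class"
    FUNCTION = "function"
    METHOD = "method"
    VARIABLE = "variable"
    MODULE = "module"


class EdgeKind(str, Enum):
    CALLS = "calls"
    REFERENCES = "references"
    INHERITS = "inherits"
    IMPORTS = "imports"


@dataclass(frozen=True)
class Edge:
    source: str      # name of the symbol the edge starts from (assumed field)
    target: str      # name of the symbol the edge points to (assumed field)
    kind: EdgeKind


# Made-up edges for illustration.
edges = [
    Edge("handle", "process_request", EdgeKind.CALLS),
    Edge("Server", "BaseServer", EdgeKind.INHERITS),
]

# Keep only call edges, e.g. to answer "who calls process_request?".
calls = [e for e in edges if e.kind is EdgeKind.CALLS]
print([e.source for e in calls])
```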

Performance

On a typical large repository (10k+ files, 1M+ symbols):

  • Initial indexing: 5-10 seconds on modern hardware
  • Incremental update: <100ms for changed files
  • Query latency: <1ms for symbol lookups
  • Storage: ~10-20% of source code size
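
These figures depend on hardware and repository shape, so it may be worth measuring on your own workspace. The sketch below times any zero-argument callable and reports the median. The helper name measure_latency_ms is hypothetical, and the dictionary lookup is only a stand-in workload so the snippet runs without wells-index installed.

```python
import statistics
import time


def measure_latency_ms(fn, repeats=100):
    """Call fn() `repeats` times and return the median wall-clock time in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in workload; with wells-index installed this could instead be
#   lambda: engine.find_symbol("MyClass")
lookup = {"MyClass": ["src/main.py"]}
median_ms = measure_latency_ms(lambda: lookup.get("MyClass"))
print(f"median latency: {median_ms:.4f} ms")
```

The median is used rather than the mean so that a single slow call (for example, a cold cache on the first query) does not dominate the result.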

Architecture

  • Language detection: File extension-based with shebang fallback
  • Parsing: Tree-sitter C library (via Rust bindings)
  • Parallelism: rayon for multi-core file scanning and parsing
  • Storage: rusqlite with integer-mapped symbol names
  • Hashing: BLAKE3 for incremental change detection
  • Compression: LZ4 on database at rest

Known Limitations

  • Parser extraction is stubbed (TODO: tree-sitter queries per language)
  • No semantic analysis (only syntax-based extraction)
  • Symbol deduplication not yet implemented
  • Cross-file call resolution is name-based (not type-aware)
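
The last limitation has a practical consequence: when two symbols share a name, a name-based lookup cannot tell which one a call refers to. The toy example below illustrates the effect with made-up data. It is not how wells-index stores symbols, and resolve_call is a hypothetical helper.

```python
from collections import defaultdict

# Made-up definitions: (symbol name, defining file) pairs.
definitions = [
    ("save", "models/user.py"),
    ("save", "models/order.py"),
    ("render", "views/page.py"),
]

# Index definitions by bare name, ignoring the enclosing class or type.
by_name = defaultdict(list)
for name, path in definitions:
    by_name[name].append(path)


def resolve_call(callee_name):
    """Return every file defining a symbol with this name (no type information)."""
    return by_name.get(callee_name, [])


print(resolve_call("save"))    # both definitions match
print(resolve_call("render"))  # unambiguous: a single definition
```

A call such as order.save() would be linked to both definitions, so results for common method names may include false positives.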

Development

Run tests:

maturin develop
pytest tests/ -v

Build in release mode:

maturin build --release

License

MIT OR Apache-2.0

