Fast structural repository indexer for wells-coding-harness

wells-index

Fast structural repository indexer for wells-coding-harness. Uses Tree-sitter for language parsing, SQLite for storage, and BLAKE3 for incremental hashing.

Features

  • Multi-language symbol extraction — Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
  • Incremental indexing — Only re-parses changed files (via BLAKE3 hashing)
  • Compressed storage — SQLite database with LZ4 compression
  • Fast queries for indexed symbol lookups, reference finding, and call-site discovery
  • 98% token reduction — Compared to grep-based code retrieval
  • PyO3 bindings — Native Python extension for seamless integration

Building

Prerequisites

  • Rust 1.70+
  • Python 3.12+
  • maturin >= 1.7

Setup

  1. Clone the repository and navigate to the wells-index directory:

    git clone https://github.com/corbybender/Wells-Coding-Harness.git
    cd Wells-Coding-Harness/wells-index
    
  2. Vendor tree-sitter grammar sources (one-time setup):

    The build system expects tree-sitter grammar C sources in grammars/<language>/src/:

    # Create grammar directories (the {a,b,c} brace expansion needs bash or zsh)
    mkdir -p grammars/{python,javascript,typescript/typescript,go,rust,java,c,cpp}/src
    
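    # Copy parser.c (and scanner.c, where the grammar has one) from the official tree-sitter repos
    # Example for Python:
    curl -fsSL https://raw.githubusercontent.com/tree-sitter/tree-sitter-python/master/src/parser.c \
        -o grammars/python/src/parser.c
    curl -fsSL https://raw.githubusercontent.com/tree-sitter/tree-sitter-python/master/src/scanner.c \
        -o grammars/python/src/scanner.c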
    

    Alternatively, add the grammar repositories as git submodules instead of using the manual mkdir/curl steps above:

    git submodule add https://github.com/tree-sitter/tree-sitter-python.git grammars/python
    git submodule add https://github.com/tree-sitter/tree-sitter-javascript.git grammars/javascript
    # ... etc for other languages
    
  3. Build the extension:

    maturin develop
    
  4. Verify installation:

    python -c "from wells_index import IndexEngine; print(IndexEngine.__doc__)"
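Before running maturin develop, it can help to confirm the grammar layout from step 2. The following is a minimal sketch, not a tool shipped with wells-index: it assumes each grammar directory needs at least src/parser.c, reuses the directory list from the mkdir command above, and the name missing_grammars is hypothetical.

```python
from pathlib import Path

# Directory list taken from the mkdir command in step 2; TypeScript sits in a
# nested typescript/ folder inside its grammar repository.
EXPECTED_GRAMMAR_DIRS = [
    "python", "javascript", "typescript/typescript",
    "go", "rust", "java", "c", "cpp",
]


def missing_grammars(root: str = "grammars") -> list[str]:
    """Return the grammar directories under `root` that lack src/parser.c."""
    base = Path(root)
    return [
        name
        for name in EXPECTED_GRAMMAR_DIRS
        if not (base / name / "src" / "parser.c").is_file()
    ]
```

Run from the wells-index directory, print(missing_grammars()) should print an empty list once every grammar's parser.c is in place.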
    

Usage

Command Line

# Index the current directory
python -c "from wells_index import IndexEngine; e = IndexEngine('.'); e.index(); print(e.stats())"

Python API

from wells_index import IndexEngine

# Create indexer for workspace
engine = IndexEngine("/path/to/repo")

# Build/update index
stats = engine.index()
print(f"Indexed {stats['files_indexed']} files")

# Query the index
symbols = engine.find_symbol("MyClass")
for sym in symbols:
    print(f"{sym['file_path']}:{sym['start_line']} - {sym['name']} ({sym['kind']})")

# Find all references to a symbol
refs = engine.find_references("authenticate")
for ref in refs:
    print(f"{ref['file_path']}:{ref['start_line']}")

# Find all callers of a function
callers = engine.find_callers("process_request")
for caller in callers:
    print(f"{caller['file_path']}:{caller['start_line']} calls process_request")

# Prefix/substring search
results = engine.search_symbols("MyClass", limit=20)
for r in results:
    print(r)

# List all symbols in a file
symbols = engine.list_in_file("src/main.py")
for sym in symbols:
    print(f"  {sym['name']} ({sym['kind']})")

# Get repository stats
stats = engine.stats()
print(f"Total files: {stats['total_files']}")
print(f"Total symbols: {stats['total_symbols']}")
print(f"Total edges: {stats['total_edges']}")

# Clear index
engine.clear()
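Query results are plain dictionaries, so they combine with ordinary Python. As a sketch, assuming every row carries the file_path and start_line keys used in the examples above, the hypothetical helper below (not part of wells-index) condenses references or callers into one sorted line list per file:

```python
from collections import defaultdict
from collections.abc import Iterable, Mapping


def group_by_file(rows: Iterable[Mapping]) -> dict[str, list[int]]:
    """Group result rows by 'file_path', collecting their 'start_line' values."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for row in rows:
        grouped[row["file_path"]].append(row["start_line"])
    # Sort each file's line numbers so hits read top to bottom.
    return {path: sorted(lines) for path, lines in grouped.items()}


# Usage against a live index (requires the built extension):
#   from wells_index import IndexEngine
#   engine = IndexEngine(".")
#   engine.index()
#   for path, lines in group_by_file(engine.find_references("authenticate")).items():
#       print(path, lines)
```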

Index Format

The index is stored in .wells_index/index.db (relative to the workspace root):

  • Compressed: LZ4 compression applied on flush
  • Incremental: BLAKE3 file hashes skip unchanged files
  • Portable: SQLite database readable by standard tools
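The incremental behaviour described above can be sketched in a few lines of Python. This is an illustration, not the wells-index Rust implementation: the standard library has no BLAKE3, so hashlib.blake2b stands in for it, and the file_hashes table and function names are hypothetical. A file counts as changed only when its digest differs from the one stored on the previous run:

```python
import hashlib
import sqlite3
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file's contents in 64 KiB chunks.

    Stand-in for BLAKE3: hashlib ships BLAKE2b, used here with a 32-byte digest.
    """
    h = hashlib.blake2b(digest_size=32)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(db: sqlite3.Connection, paths: list[Path]) -> list[Path]:
    """Return the paths whose digest differs from the stored one, then store the new digests."""
    # Hypothetical table; not the real wells-index schema.
    db.execute(
        "CREATE TABLE IF NOT EXISTS file_hashes (path TEXT PRIMARY KEY, digest TEXT NOT NULL)"
    )
    changed = []
    for path in paths:
        digest = file_digest(path)
        row = db.execute(
            "SELECT digest FROM file_hashes WHERE path = ?", (str(path),)
        ).fetchone()
        if row is None or row[0] != digest:
            changed.append(path)  # new or modified: needs re-parsing
            db.execute(
                "INSERT OR REPLACE INTO file_hashes (path, digest) VALUES (?, ?)",
                (str(path), digest),
            )
    db.commit()
    return changed
```

Unchanged files never reach the parser, which is what keeps repeat runs cheap.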

Grammar Support

Language    Extensions                    Tree-sitter repo
Python      .py                           tree-sitter-python
JavaScript  .js, .mjs, .cjs               tree-sitter-javascript
TypeScript  .ts, .tsx                     tree-sitter-typescript
Go          .go                           tree-sitter-go
Rust        .rs                           tree-sitter-rust
Java        .java                         tree-sitter-java
C           .c, .h                        tree-sitter-c
C++         .cpp, .cc, .cxx, .hpp, .hh    tree-sitter-cpp

Symbol Kinds

  • class — Class/struct/interface definition
  • function — Top-level function definition
  • method — Method within a class
  • variable — Variable/field definition
  • module — Module/file-level scope

Edge Kinds

  • calls — Function/method call
  • references — Symbol reference or usage
  • inherits — Class inheritance or trait impl
  • imports — Module import
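The two lists above describe a simple graph: symbols are nodes, and typed edges connect them. The sketch below models that in Python to show how a query like find_callers can be answered from calls edges. The class and field names are assumptions for illustration, not the actual Rust types or SQLite schema:

```python
from dataclasses import dataclass
from enum import Enum


class SymbolKind(Enum):
    CLASS = "class"
    FUNCTION = "function"
    METHOD = "method"
    VARIABLE = "variable"
    MODULE = "module"


class EdgeKind(Enum):
    CALLS = "calls"
    REFERENCES = "references"
    INHERITS = "inherits"
    IMPORTS = "imports"


@dataclass(frozen=True)
class Symbol:
    name: str
    kind: SymbolKind
    file_path: str
    start_line: int


@dataclass(frozen=True)
class Edge:
    source: Symbol  # e.g. the calling function
    target: str     # target symbol name (resolution is name-based)
    kind: EdgeKind


def callers_of(edges: list[Edge], name: str) -> list[Symbol]:
    """Return the source symbols of every `calls` edge that targets `name`."""
    return [e.source for e in edges if e.kind is EdgeKind.CALLS and e.target == name]
```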

Performance

On a typical large repository (10k+ files, 1M+ symbols):

  • Initial indexing: 5-10 seconds on modern hardware
  • Incremental update: <100ms for changed files
  • Query latency: <1ms for symbol lookups
  • Storage: ~10-20% of source code size
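To check the query-latency figure on your own repository, a small timing helper like the hypothetical one below (not part of wells-index) can wrap any query call. It reports the median rather than the mean, so a slow first call has less influence on the result:

```python
import statistics
import time
from collections.abc import Callable


def median_latency_ms(query: Callable[[], object], runs: int = 100) -> float:
    """Call `query` `runs` times and return the median wall-clock latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Against a real index (requires the built extension):
#   from wells_index import IndexEngine
#   engine = IndexEngine("/path/to/repo")
#   engine.index()
#   print(median_latency_ms(lambda: engine.find_symbol("MyClass")))
```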

Architecture

  • Language detection: File extension-based with shebang fallback
  • Parsing: Tree-sitter C library (via Rust bindings)
  • Parallelism: rayon for multi-core file scanning and parsing
  • Storage: rusqlite with integer-mapped symbol names
  • Hashing: BLAKE3 for incremental change detection
  • Compression: LZ4 on database at rest
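"Integer-mapped symbol names" suggests that each distinct name is stored once and referenced elsewhere by an integer id. A minimal sketch of that interning pattern with Python's sqlite3 module; the table layout and helper names are assumptions, not the wells-index schema:

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE names (id INTEGER PRIMARY KEY, text TEXT UNIQUE NOT NULL);
        CREATE TABLE symbols (
            name_id INTEGER NOT NULL REFERENCES names(id),
            kind TEXT NOT NULL,
            file_path TEXT NOT NULL,
            start_line INTEGER NOT NULL
        );
        CREATE INDEX symbols_by_name ON symbols(name_id);
        """
    )
    return db


def intern_name(db: sqlite3.Connection, text: str) -> int:
    """Return the integer id for `text`, inserting it the first time it is seen."""
    db.execute("INSERT OR IGNORE INTO names (text) VALUES (?)", (text,))
    return db.execute("SELECT id FROM names WHERE text = ?", (text,)).fetchone()[0]


def add_symbol(db: sqlite3.Connection, name: str, kind: str, file_path: str, start_line: int) -> None:
    db.execute(
        "INSERT INTO symbols VALUES (?, ?, ?, ?)",
        (intern_name(db, name), kind, file_path, start_line),
    )


def find_symbol(db: sqlite3.Connection, name: str) -> list[tuple[str, int, str]]:
    # Resolve the name once via the names table, then use the integer index.
    return db.execute(
        "SELECT s.file_path, s.start_line, s.kind FROM symbols s "
        "JOIN names n ON n.id = s.name_id WHERE n.text = ? "
        "ORDER BY s.file_path, s.start_line",
        (name,),
    ).fetchall()
```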

Known Limitations

  • Parser extraction is stubbed (TODO: tree-sitter queries per language)
  • No semantic analysis (only syntax-based extraction)
  • Symbol deduplication not yet implemented
  • Cross-file call resolution is name-based (not type-aware)
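To see what name-based resolution means in practice: a call like obj.save() is matched only on the bare name save, so every definition called save becomes a candidate regardless of the type of obj. A hypothetical illustration in Python (the helpers and example names are invented for this purpose):

```python
from collections import defaultdict


def build_name_table(definitions: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each bare name (e.g. 'save') to every qualified definition with that name."""
    table: dict[str, list[str]] = defaultdict(list)
    for qualified, bare in definitions:
        table[bare].append(qualified)
    return dict(table)


def resolve_call(table: dict[str, list[str]], callee: str) -> list[str]:
    """Name-based resolution: without type information, all same-named definitions match."""
    return table.get(callee, [])


# Two unrelated classes that both define save() (hypothetical example).
TABLE = build_name_table(
    [("User.save", "save"), ("Invoice.save", "save"), ("send_email", "send_email")]
)
```

Here resolve_call(TABLE, "save") returns both User.save and Invoice.save; a type-aware resolver could narrow that to one.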

Development

Run tests:

maturin develop
pytest tests/ -v

Build in release mode:

maturin build --release

License

MIT OR Apache-2.0

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • wells_index-0.1.1-cp313-cp313-win_amd64.whl (1.7 MB): CPython 3.13, Windows x86-64
  • wells_index-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB): CPython 3.13, manylinux (glibc 2.17+) x86-64
  • wells_index-0.1.1-cp313-cp313-macosx_11_0_arm64.whl (1.7 MB): CPython 3.13, macOS 11.0+ ARM64
  • wells_index-0.1.1-cp312-cp312-win_amd64.whl (1.7 MB): CPython 3.12, Windows x86-64
  • wells_index-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB): CPython 3.12, manylinux (glibc 2.17+) x86-64
  • wells_index-0.1.1-cp312-cp312-macosx_11_0_arm64.whl (1.7 MB): CPython 3.12, macOS 11.0+ ARM64

File details

Details for the file wells_index-0.1.1-cp313-cp313-win_amd64.whl.

File hashes

Hashes for wells_index-0.1.1-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 df47d2c0d82b88d56bb23abfc415c08e57faebaec4410d6413e5f98a21ccc5b1
MD5 ee8e4d3e23e3f44a13fb5f3aa6ba1372
BLAKE2b-256 014b1b9e2cb33c9680eaa7240975946276c2b9d26ee515d7332c1003caa941fa

Provenance

The following attestation bundles were made for wells_index-0.1.1-cp313-cp313-win_amd64.whl:

Publisher: release-index.yml on corbybender/Wells-Coding-Harness

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file wells_index-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for wells_index-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 cf31ce6dc2b4a1ab92b84c28cb5efac6f594516bf0c22bf89c0e2077668faf6a
MD5 6e78569d03f5a1736b28cf7d3cc9cbe4
BLAKE2b-256 0a5134474a47e48c71c7057c25259842aaf96b44a4b9478f1aca625e3a080822

Provenance

The following attestation bundles were made for wells_index-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release-index.yml on corbybender/Wells-Coding-Harness

File details

Details for the file wells_index-0.1.1-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for wells_index-0.1.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b9117aef90a5d2b7b5699140cb5c75aea780e90039a09f3ea8ceabe987220201
MD5 3ac3c1fa72278a1768e55d87a15c7d90
BLAKE2b-256 7dee639cb7d5ce710dd145115627a51f20477b2199d91c31d5f503a0f8a89851

Provenance

The following attestation bundles were made for wells_index-0.1.1-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release-index.yml on corbybender/Wells-Coding-Harness

File details

Details for the file wells_index-0.1.1-cp312-cp312-win_amd64.whl.

File hashes

Hashes for wells_index-0.1.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 ca413b57eb4da28aa7fa54772a871c472ac6a602bb6bb49e9a5186923b5cecc8
MD5 de852ead4632226947a338de63129c99
BLAKE2b-256 7ebf6f4fccf4a7003cb293c69d60f459e01b460d24b15e9fc8b3b503bf9b7b14

Provenance

The following attestation bundles were made for wells_index-0.1.1-cp312-cp312-win_amd64.whl:

Publisher: release-index.yml on corbybender/Wells-Coding-Harness

File details

Details for the file wells_index-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for wells_index-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d8a1acf39aa92537af01f22f36205ffd6d8ea4315aebd9038ed2e54dc873acc1
MD5 a5a2973cd0b8d935d6d1b1dd133df3c4
BLAKE2b-256 e24d69d5de13c28ebf2944a02cde44105fab38f0e5a5a9ca4233a5fef4efc5ee

Provenance

The following attestation bundles were made for wells_index-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release-index.yml on corbybender/Wells-Coding-Harness

File details

Details for the file wells_index-0.1.1-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for wells_index-0.1.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f067d8b0bd573ded31b9b18aab097361ab165ce67e40ed106993cd52fcc90bd2
MD5 ad52c9f0907cf885633972ac2128a01d
BLAKE2b-256 0b0cf096a817542a42fddfc34e7bc1b18669893bb7ca873c2839f14fa83db43b

Provenance

The following attestation bundles were made for wells_index-0.1.1-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release-index.yml on corbybender/Wells-Coding-Harness
