Fast structural repository indexer for wells-coding-harness
Project description
wells-index
Fast structural repository indexer for wells-coding-harness. Uses Tree-sitter for language parsing, SQLite for storage, and BLAKE3 for incremental hashing.
Features
- Multi-language symbol extraction — Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
- Incremental indexing — Only re-parses changed files (via BLAKE3 hashing)
- Compressed storage — SQLite database with LZ4 compression
- Fast queries — O(1) symbol lookups, reference finding, call site discovery
- 98% token reduction — Compared to grep-based code retrieval
- PyO3 bindings — Native Python extension for seamless integration
Building
Prerequisites
- Rust 1.70+
- Python 3.12+
maturin>= 1.7
Setup
-
Clone the repository and navigate to the
wells-indexdirectory:cd Wells-Coding-Harness/wells-index
-
Vendor tree-sitter grammar sources (one-time setup):
The build system expects tree-sitter grammar C sources in
grammars/<language>/src/:# Create grammar directories mkdir -p grammars/{python,javascript,typescript/typescript,go,rust,java,c,cpp}/src # Copy parser.c and scanner.c from official tree-sitter repos # Example for Python: curl https://raw.githubusercontent.com/tree-sitter/tree-sitter-python/master/src/parser.c \ -o grammars/python/src/parser.c
Or use git submodules:
git submodule add https://github.com/tree-sitter/tree-sitter-python.git grammars/python git submodule add https://github.com/tree-sitter/tree-sitter-javascript.git grammars/javascript # ... etc for other languages
-
Build the extension:
maturin develop -
Verify installation:
python -c "from wells_index import IndexEngine; print(IndexEngine.__doc__)"
Usage
Command Line
# Index the current directory
python -c "from wells_index import IndexEngine; e = IndexEngine('.'); e.index(); print(e.stats())"
Python API
from wells_index import IndexEngine
# Create indexer for workspace
engine = IndexEngine("/path/to/repo")
# Build/update index
stats = engine.index()
print(f"Indexed {stats['files_indexed']} files")
# Query the index
symbols = engine.find_symbol("MyClass")
for sym in symbols:
print(f"{sym['file_path']}:{sym['start_line']} - {sym['name']} ({sym['kind']})")
# Find all references to a symbol
refs = engine.find_references("authenticate")
for ref in refs:
print(f"{ref['file_path']}:{ref['start_line']}")
# Find all callers of a function
callers = engine.find_callers("process_request")
for caller in callers:
print(f"{caller['file_path']}:{caller['start_line']} calls process_request")
# Prefix/substring search
results = engine.search_symbols("MyClass", limit=20)
for r in results:
print(r)
# List all symbols in a file
symbols = engine.list_in_file("src/main.py")
for sym in symbols:
print(f" {sym['name']} ({sym['kind']})")
# Get repository stats
stats = engine.stats()
print(f"Total files: {stats['total_files']}")
print(f"Total symbols: {stats['total_symbols']}")
print(f"Total edges: {stats['total_edges']}")
# Clear index
engine.clear()
Index Format
The index is stored in .wells_index/index.db (relative to the workspace root):
- Compressed: LZ4 compression applied on flush
- Incremental: BLAKE3 file hashes skip unchanged files
- Portable: SQLite database readable by standard tools
Grammar Support
| Language | Status | Extension | Tree-sitter Repo |
|---|---|---|---|
| Python | ✓ | .py |
tree-sitter-python |
| JavaScript | ✓ | .js, .mjs, .cjs |
tree-sitter-javascript |
| TypeScript | ✓ | .ts, .tsx |
tree-sitter-typescript |
| Go | ✓ | .go |
tree-sitter-go |
| Rust | ✓ | .rs |
tree-sitter-rust |
| Java | ✓ | .java |
tree-sitter-java |
| C | ✓ | .c, .h |
tree-sitter-c |
| C++ | ✓ | .cpp, .cc, .cxx, .hpp, .hh |
tree-sitter-cpp |
Symbol Kinds
class— Class/struct/interface definitionfunction— Top-level function definitionmethod— Method within a classvariable— Variable/field definitionmodule— Module/file-level scope
Edge Kinds
calls— Function/method callreferences— Symbol reference or usageinherits— Class inheritance or trait implimports— Module import
Performance
On a typical large repository (10k+ files, 1M+ symbols):
- Initial indexing: 5-10 seconds on modern hardware
- Incremental update: <100ms for changed files
- Query latency: <1ms for symbol lookups
- Storage: ~10-20% of source code size
Architecture
- Language detection: File extension-based with shebang fallback
- Parsing: Tree-sitter C library (via Rust bindings)
- Parallelism:
rayonfor multi-core file scanning and parsing - Storage:
rusqlitewith integer-mapped symbol names - Hashing: BLAKE3 for incremental change detection
- Compression: LZ4 on database at rest
Known Limitations
- Parser extraction is stubbed (TODO: tree-sitter queries per language)
- No semantic analysis (only syntax-based extraction)
- Symbol deduplication not yet implemented
- Cross-file call resolution is name-based (not type-aware)
Development
Run tests:
maturin develop
pytest tests/ -v
Build in release mode:
maturin build --release
License
MIT OR Apache-2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file wells_index-0.1.0-cp313-cp313-win_amd64.whl.
File metadata
- Download URL: wells_index-0.1.0-cp313-cp313-win_amd64.whl
- Upload date:
- Size: 1.1 MB
- Tags: CPython 3.13, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b760f88be5f50f7639a40676e4a13e892584a765c792d5a54eb1fdb82c36d304
|
|
| MD5 |
34596a94c95ac31f8d3c929c7ac9a411
|
|
| BLAKE2b-256 |
9f6c9f710820e40644a07e76505e9108d0dada2ca81f417c98f9abdb097529ca
|
File details
Details for the file wells_index-0.1.0-cp312-cp312-win_amd64.whl.
File metadata
- Download URL: wells_index-0.1.0-cp312-cp312-win_amd64.whl
- Upload date:
- Size: 1.1 MB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7c67f8561fb95c94dc61a3c8b7262154eb570deb45700490c8af40151ec58962
|
|
| MD5 |
3a371335dd004ba3fc84e88313a610cc
|
|
| BLAKE2b-256 |
83592a9c9229402aa102d703f96be99da4ff0f78bfda41bebd92cb3d16fb97b9
|
File details
Details for the file wells_index-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: wells_index-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
16de28fca10efa0da7be0a49bc8c941bb28b9969d86fbe822cebd8e6d62100b6
|
|
| MD5 |
851dd44366e5fdecc56b132c14e5123d
|
|
| BLAKE2b-256 |
76b72edd088fcc9f4c1602d4105fcd8e15401b28ec4023bafdcc1b08569d2bfe
|
File details
Details for the file wells_index-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: wells_index-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a2dc3498db3a3914c0ca983afac412133ac5b34556926ff6a1422746e782e9d5
|
|
| MD5 |
3def54e72b94abf7af94aa55003d5e76
|
|
| BLAKE2b-256 |
a8348b64030b7779d979168d26a89cbfee649b6ef137de01f3ea329591de4f5c
|