Local-first codebase intelligence with semantic search, multi-hop research, and 12-language AST support
Sia Code
Local-first codebase search with semantic understanding and multi-hop code discovery.
Benchmark Results
89.9% Recall@5 on RepoEval benchmark (1,600 queries, 8 repositories)
- +12.9 percentage points better than cAST (77.0%)
- Lexical-only search outperforms hybrid (BM25 > BM25+embeddings)
- Publication-quality results with ±1.5% confidence interval
See docs/BENCHMARK_RESULTS.md for full analysis.
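For readers new to the metric, Recall@5 can be computed roughly as follows. This is a minimal sketch that assumes one common definition (a query counts as a hit when at least one relevant item appears in its top 5 results); the definition used for the benchmark may differ. The function name `recall_at_k` and the sample chunk identifiers are illustrative and are not taken from sia-code.

```python
from typing import Sequence, Set


def recall_at_k(
    ranked: Sequence[Sequence[str]],
    relevant: Sequence[Set[str]],
    k: int = 5,
) -> float:
    """Fraction of queries with at least one relevant item in the top-k results."""
    if len(ranked) != len(relevant):
        raise ValueError("ranked and relevant must have one entry per query")
    if not ranked:
        return 0.0
    hits = sum(
        1
        for results, gold in zip(ranked, relevant)
        if any(item in gold for item in results[:k])
    )
    return hits / len(ranked)


# Illustrative data only: three queries with hypothetical chunk identifiers.
ranked_results = [
    ["auth.py:login", "db.py:connect", "api.py:route"],
    ["util.py:slugify", "api.py:route"],
    ["db.py:connect", "auth.py:logout"],
]
gold_chunks = [{"auth.py:login"}, {"cache.py:get"}, {"auth.py:logout"}]

score = recall_at_k(ranked_results, gold_chunks, k=5)
print(f"Recall@5 = {score:.3f}")
```

Two of the three sample queries have a relevant chunk in their top 5, so the sketch reports two thirds.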
Features
- 89.9% Recall@5 - State-of-the-art code search performance on RepoEval benchmark
- Lexical-First Search - BM25 + FTS5 optimized for code queries (outperforms semantic-only)
- Multi-Hop Research - Automatically discover code relationships and call graphs
- AST-Aware Chunking - Tree-sitter preserves function/class boundaries
- Project Auto-Detection - Automatic language detection and indexing strategy
- Tiered Search - Filter by project code, dependencies, or both
- 12 Languages - Python, JS/TS, Go, Rust, Java, C/C++, C#, Ruby, PHP (full AST support)
- Watch Mode - Auto-reindex on file changes with incremental updates
- Portable Index - Usearch HNSW + SQLite FTS5 in .sia-code/ directory
Installation
# From PyPI (recommended)
pip install sia-code
# Or with uv
uv tool install sia-code
# Or from source
uv tool install git+https://github.com/DxTa/sia-code.git
# Try without installing (ephemeral run)
uvx sia-code --version
uvx sia-code search "authentication logic"
# Verify installation
sia-code --version
Quick Start
# Initialize and index
sia-code init
sia-code index .
# Search
sia-code search "authentication logic" # Hybrid search (default: BM25 + semantic)
sia-code search --regex "def.*login" # Lexical-only search (BM25)
sia-code search --semantic-only "handle errors" # Semantic-only search
# Multi-hop research (discover relationships)
sia-code research "how does the API handle errors?"
# Check index health
sia-code status
Commands
| Command | Description |
|---|---|
| sia-code init | Initialize index in current directory |
| sia-code index . | Index codebase |
| sia-code index --update | Re-index changed files only |
| sia-code index --watch | Auto-reindex on file changes |
| sia-code search "query" | Hybrid search (BM25 + semantic) |
| sia-code search --regex "pattern" | Lexical-only search |
| sia-code search --semantic-only "query" | Semantic-only search |
| sia-code research "question" | Multi-hop code discovery |
| sia-code status | Index health and staleness metrics |
| sia-code compact | Remove stale chunks |
| sia-code memory list | List timeline/changelogs/decisions |
| sia-code memory changelog | Generate changelog from git |
| sia-code memory sync-git | Import events from git history |
| sia-code config show | Display configuration |
| sia-code interactive | Live search mode |
See docs/CLI_FEATURES.md for complete command reference with all options and examples.
Configuration
Recommended: Lexical-only search (best performance, no API key needed)
sia-code init
sia-code index .
# Search uses BM25 by default (89.9% Recall@5)
Optional: Hybrid search (adds semantic embeddings):
export OPENAI_API_KEY=sk-your-key-here
sia-code config set embedding.enabled true
sia-code config set search.vector_weight 0.0 # 0.0 = lexical-only (recommended!)
sia-code index --clean
Edit config at .sia-code/config.json to:
- Set vector_weight (0.0 = lexical-only, 0.5 = hybrid, 1.0 = semantic-only)
- Change embedding model (BAAI/bge-small-en-v1.5, openai-small)
- Exclude patterns (node_modules/, __pycache__/, etc.)
- Adjust chunk sizes (max_chunk_size, min_chunk_size)
View config: sia-code config show
AI Summarization (optional, enhances git changelogs):
{
"summarization": {
"enabled": true,
"model": "google/flan-t5-base",
"max_commits": 20
}
}
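If you would rather script this change than edit the file by hand, the fragment above can be merged into an existing config along these lines. This is a sketch under assumptions: the helpers `deep_merge` and `update_config` are hypothetical names, not part of sia-code, and the code assumes the config file is plain JSON whose other keys should be left untouched.

```python
import json
from pathlib import Path
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with patch merged in; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def update_config(path: Path, patch: Dict[str, Any]) -> Dict[str, Any]:
    """Merge patch into the JSON file at path, creating the file if it is absent."""
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = deep_merge(current, patch)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged


# The summarization fragment from this section, expressed as a Python dict.
SUMMARIZATION_PATCH = {
    "summarization": {
        "enabled": True,
        "model": "google/flan-t5-base",
        "max_commits": 20,
    }
}
```

Calling `update_config(Path(".sia-code/config.json"), SUMMARIZATION_PATCH)` from the project root would apply the fragment; running `sia-code config show` afterwards is a reasonable way to check the result.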
Output Formats
sia-code search "query" --format json # JSON output
sia-code search "query" --format table # Rich table
sia-code search "query" --format csv # CSV for Excel
sia-code search "query" --output results.json # Save to file
Supported Languages
Full AST Support (12): Python, JavaScript, TypeScript, JSX, TSX, Go, Rust, Java, C, C++, C#, Ruby, PHP
Recognized: Kotlin, Groovy, Swift, Bash, Vue, Svelte, and more (indexed as text)
Troubleshooting
| Issue | Solution |
|---|---|
| No API key warning | Normal - searches fall back to lexical mode |
| Index growing large | Run sia-code compact to remove stale chunks |
| Slow indexing | Use sia-code index --update for incremental |
| Stale search results | Run sia-code index --clean to rebuild |
How It Works
- Parse - Tree-sitter generates language-agnostic AST for each file
- Chunk - AST-aware chunking preserves function/class boundaries (max 1200 chars)
- Index - Usearch HNSW (vectors) + SQLite FTS5 (lexical search with BM25)
- Store - Portable .sia-code/ directory (17-25 MB per repo)
- Search - Lexical-first (BM25) with optional hybrid fusion (RRF)
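To make the Chunk step concrete, here is a minimal sketch of boundary-preserving chunking. It is not sia-code's implementation: it swaps Tree-sitter for Python's standard-library `ast` module so that it runs without extra dependencies, handles Python source only, and the function names are hypothetical. The 1200-character limit is the one quoted in the list above.

```python
import ast
from typing import List

MAX_CHUNK_CHARS = 1200  # limit quoted in the Chunk step


def split_by_lines(text: str, limit: int) -> List[str]:
    """Fallback: split text at line boundaries into pieces of at most `limit`
    characters (a single line longer than `limit` is kept whole)."""
    pieces: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > limit:
            pieces.append(current)
            current = ""
        current += line
    if current:
        pieces.append(current)
    return pieces


def chunk_python_source(source: str, limit: int = MAX_CHUNK_CHARS) -> List[str]:
    """Return one chunk per top-level function or class, splitting only when oversized.

    Other top-level statements (imports, constants) are skipped in this sketch,
    and decorators are not included in the extracted segment.
    """
    tree = ast.parse(source)
    chunks: List[str] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node) or ""
            if len(segment) <= limit:
                chunks.append(segment)
            else:
                chunks.extend(split_by_lines(segment, limit))
    return chunks


SAMPLE = '''import os

def login(user):
    return user.is_active

class Session:
    def close(self):
        pass
'''

chunks = chunk_python_source(SAMPLE)
for chunk in chunks:
    print(repr(chunk.splitlines()[0]))
```

The point of the design is that a function or class is never cut in the middle unless it exceeds the size limit, so each search hit maps to a complete unit of code.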
Key Innovation: Lexical-only search (BM25) outperforms hybrid (BM25+embeddings) for code queries because code contains precise identifiers that benefit from exact keyword matching.
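The relationship between lexical-only and hybrid ranking can be illustrated with a weighted form of reciprocal rank fusion (RRF). This is a sketch under assumptions, not sia-code's code: how the tool applies vector_weight internally is not documented here, the function name `weighted_rrf` is hypothetical, and the constant 60 is the value commonly used for RRF rather than a confirmed sia-code setting.

```python
from typing import Dict, List, Sequence

RRF_K = 60  # commonly used RRF smoothing constant; an assumption here


def weighted_rrf(
    lexical: Sequence[str],
    semantic: Sequence[str],
    vector_weight: float = 0.0,
    k: int = RRF_K,
) -> List[str]:
    """Fuse two rankings; each list contributes weight / (k + rank) per item."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0.0 and 1.0")
    weighted_rankings = ((lexical, 1.0 - vector_weight), (semantic, vector_weight))
    scores: Dict[str, float] = {}
    for ranking, weight in weighted_rankings:
        if weight == 0.0:
            continue  # a zero-weight ranking contributes nothing
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest score first; ties broken by identifier for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative rankings with hypothetical identifiers.
lexical_hits = ["parse_config", "load_config", "Config"]
semantic_hits = ["Config", "parse_config", "read_settings"]

print(weighted_rrf(lexical_hits, semantic_hits, vector_weight=0.0))
print(weighted_rrf(lexical_hits, semantic_hits, vector_weight=0.5))
print(weighted_rrf(lexical_hits, semantic_hits, vector_weight=1.0))
```

With a weight of 0.0 the fused order is exactly the lexical ranking and with 1.0 it is exactly the semantic ranking, which matches the 0.0 / 0.5 / 1.0 meanings given for vector_weight in the Configuration section.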
Documentation
Architecture & Implementation
- docs/ARCHITECTURE.md - System design, data structures, and technology stack
- docs/CODE_STRUCTURE.md - Codebase organization and key classes
- docs/INDEXING.md - Indexing pipeline and AST-aware chunking
- docs/QUERYING.md - Search methods and hybrid fusion
Benchmark Results
- docs/BENCHMARK_RESULTS.md - 89.9% Recall@5 full results and analysis
- docs/BENCHMARK_METHODOLOGY.md - RepoEval benchmark setup
- docs/PERFORMANCE_ANALYSIS.md - Why sia-code outperforms cAST by +12.9 pts
Usage & Configuration
- docs/CLI_FEATURES.md - Complete CLI reference and examples
- examples/ - Test results and usage examples
- ROADMAP.md - Development progress
- KNOWN_LIMITATIONS.md - Current limitations and workarounds
License
MIT