Local-first codebase intelligence with semantic search, multi-hop research, and 12-language AST support

Sia Code

Local-first codebase search with semantic understanding and multi-hop code discovery.

Benchmark Results

89.9% Recall@5 on RepoEval benchmark (1,600 queries, 8 repositories)

  • +12.9 percentage points better than cAST (77.0%)
  • Lexical-only search outperforms hybrid (BM25 > BM25+embeddings)
  • Publication-quality results with ±1.5% confidence interval

See docs/BENCHMARK_RESULTS.md for full analysis.
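
As a rough sanity check on the numbers above, the sketch below computes a hit-rate style Recall@5 on toy data and the half-width of a confidence interval for the reported score. It rests on two assumptions that the text here does not confirm: that Recall@5 counts a query as a hit when any relevant item is in the top 5, and that the interval is a 95% normal approximation over the 1,600 queries. The document ids are hypothetical.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Return 1.0 if any relevant id appears in the top-k results, else 0.0."""
    top_k = ranked_ids[:k]
    return 1.0 if any(doc_id in relevant_ids for doc_id in top_k) else 0.0


def normal_ci_half_width(p, n, z=1.96):
    """Half-width of a normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Two toy queries with hypothetical ids
runs = [
    (["a", "b", "c", "d", "e", "f"], {"c"}),  # relevant item at rank 3: hit
    (["a", "b", "c", "d", "e", "f"], {"f"}),  # relevant item at rank 6: miss
]
mean_recall = sum(recall_at_k(ranked, relevant) for ranked, relevant in runs) / len(runs)
print(mean_recall)  # 0.5

half_width = normal_ci_half_width(p=0.899, n=1600)
print(f"+/- {half_width * 100:.1f} percentage points")  # +/- 1.5 percentage points
```

Under those assumptions the interval comes out at roughly 1.5 percentage points, which is consistent with the figure quoted above.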

Features

  • 89.9% Recall@5 - State-of-the-art code search performance on RepoEval benchmark
  • Lexical-First Search - BM25 + FTS5 optimized for code queries (outperforms semantic-only)
  • Multi-Hop Research - Automatically discover code relationships and call graphs
  • AST-Aware Chunking - Tree-sitter preserves function/class boundaries
  • Project Auto-Detection - Automatic language detection and indexing strategy
  • Tiered Search - Filter by project code, dependencies, or both
  • 12 Languages - Python, JS/TS, Go, Rust, Java, C/C++, C#, Ruby, PHP (full AST support)
  • Watch Mode - Auto-reindex on file changes with incremental updates
  • Portable Index - Usearch HNSW + SQLite FTS5 in .sia-code/ directory

Installation

# From PyPI (recommended)
pip install sia-code

# Or with uv
uv tool install sia-code

# Or from source
uv tool install git+https://github.com/DxTa/sia-code.git

# Try without installing (ephemeral run)
uvx sia-code --version
uvx sia-code search "authentication logic"

# Verify installation
sia-code --version

Quick Start

# Initialize and index
sia-code init
sia-code index .

# Search
sia-code search "authentication logic"           # Hybrid search (default: BM25 + semantic)
sia-code search --regex "def.*login"             # Lexical-only search (BM25)
sia-code search --semantic-only "handle errors"  # Semantic-only search

# Multi-hop research (discover relationships)
sia-code research "how does the API handle errors?"

# Check index health
sia-code status

Commands

Command                                    Description
sia-code init                              Initialize index in current directory
sia-code index .                           Index codebase
sia-code index --update                    Re-index changed files only
sia-code index --watch                     Auto-reindex on file changes
sia-code search "query"                    Hybrid search (BM25 + semantic)
sia-code search --regex "pattern"          Lexical-only search
sia-code search --semantic-only "query"    Semantic-only search
sia-code research "question"               Multi-hop code discovery
sia-code status                            Index health and staleness metrics
sia-code compact                           Remove stale chunks
sia-code memory list                       List timeline/changelogs/decisions
sia-code memory changelog                  Generate changelog from git
sia-code memory sync-git                   Import events from git history
sia-code config show                       Display configuration
sia-code interactive                       Live search mode

See docs/CLI_FEATURES.md for complete command reference with all options and examples.

Configuration

Recommended: Lexical-only search (best performance, no API key needed)

sia-code init
sia-code index .
# Search uses BM25 by default (89.9% Recall@5)

Optional: Hybrid search (adds semantic embeddings):

export OPENAI_API_KEY=sk-your-key-here
sia-code config set embedding.enabled true
sia-code config set search.vector_weight 0.5  # 0.5 = hybrid; 0.0 = lexical-only (recommended)
sia-code index --clean

Edit config at .sia-code/config.json to:

  • Set vector_weight (0.0 = lexical-only, 0.5 = hybrid, 1.0 = semantic-only)
  • Change embedding model (BAAI/bge-small-en-v1.5, openai-small)
  • Exclude patterns (node_modules/, __pycache__/, etc.)
  • Adjust chunk sizes (max_chunk_size, min_chunk_size)

View config: sia-code config show
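
To make the vector_weight values above concrete, here is a minimal sketch of one way a weight between 0.0 and 1.0 could blend a lexical and a semantic ranking, using weighted reciprocal rank fusion. How sia-code actually applies vector_weight is not specified here, so the function name fuse_rankings, the constant k=60, and the file names are illustrative assumptions only.

```python
def fuse_rankings(lexical, semantic, vector_weight=0.5, k=60):
    """Weighted reciprocal rank fusion of two ranked lists of chunk ids.

    Assumed semantics: 0.0 uses only the lexical ranking, 1.0 only the
    semantic ranking, and values in between blend the two.
    """
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0.0 and 1.0")
    scores = {}
    weighted = ((1.0 - vector_weight, lexical), (vector_weight, semantic))
    for weight, ranking in weighted:
        if weight == 0.0:
            continue  # this ranker is switched off
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + weight / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


lexical_hits = ["auth.py", "session.py", "util.py"]      # hypothetical BM25 order
semantic_hits = ["session.py", "util.py", "errors.py"]   # hypothetical vector order

print(fuse_rankings(lexical_hits, semantic_hits, vector_weight=0.0))
# ['auth.py', 'session.py', 'util.py']
print(fuse_rankings(lexical_hits, semantic_hits, vector_weight=0.5))
# ['session.py', 'util.py', 'auth.py', 'errors.py']
```

With a weight of 0.0 the semantic list is ignored entirely, which matches the lexical-only reading of that setting.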

Git worktrees: by default, sia-code auto-detects worktrees and stores a single shared index in the git common dir. You can override with SIA_CODE_INDEX_SCOPE or set an explicit path with SIA_CODE_INDEX_DIR.

# Force shared index even outside worktrees
export SIA_CODE_INDEX_SCOPE=shared

# Or disable auto-detection (per-worktree index)
export SIA_CODE_INDEX_SCOPE=worktree

sia-code init
sia-code index .

AI Summarization (optional, enhances git changelogs):

{
  "summarization": {
    "enabled": true,
    "model": "google/flan-t5-base",
    "max_commits": 20
  }
}
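
If you prefer to apply a fragment like the one above from a script instead of editing by hand, the following sketch merges it into a JSON config file without discarding other sections. The helper merge_config is hypothetical, and the pre-existing "search" section in the demo is an assumed layout inferred from the search.vector_weight key. The demo writes to a temporary file, not to .sia-code/config.json.

```python
import json
import tempfile
from pathlib import Path


def merge_config(path, fragment):
    """Merge `fragment` into the JSON file at `path`, one nesting level deep."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    for section, values in fragment.items():
        if isinstance(values, dict) and isinstance(config.get(section), dict):
            config[section].update(values)  # keep keys the fragment does not mention
        else:
            config[section] = values
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


summarization = {
    "summarization": {
        "enabled": True,
        "model": "google/flan-t5-base",
        "max_commits": 20,
    }
}

# Demonstrated on a throwaway file with assumed existing content
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    demo_path.write_text(json.dumps({"search": {"vector_weight": 0.0}}))
    merged = merge_config(demo_path, summarization)
    print(sorted(merged))  # ['search', 'summarization']
```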

Output Formats

sia-code search "query" --format json            # JSON output
sia-code search "query" --format table           # Rich table
sia-code search "query" --format csv             # CSV for Excel
sia-code search "query" --output results.json    # Save to file

Supported Languages

Full AST Support (12): Python, JavaScript, TypeScript, JSX, TSX, Go, Rust, Java, C, C++, C#, Ruby, PHP

Recognized: Kotlin, Groovy, Swift, Bash, Vue, Svelte, and more (indexed as text)

Troubleshooting

Issue                  Solution
No API key warning     Normal; searches fall back to lexical mode
Index growing large    Run sia-code compact to remove stale chunks
Slow indexing          Use sia-code index --update for incremental re-indexing
Stale search results   Run sia-code index --clean to rebuild the index

How It Works

  1. Parse - Tree-sitter generates a language-agnostic AST for each file
  2. Chunk - AST-aware chunking preserves function/class boundaries (max 1200 chars)
  3. Index - Usearch HNSW (vectors) + SQLite FTS5 (lexical search with BM25)
  4. Store - Portable .sia-code/ directory (17-25 MB per repo)
  5. Search - Lexical-first (BM25) with optional hybrid fusion (RRF)
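
The chunking step can be illustrated with a small sketch. It uses Python's built-in ast module as a stand-in for Tree-sitter, so it handles Python source only and is not sia-code's implementation. The function name chunk_python_source and the fixed-size fallback for oversized definitions are assumptions. Only the 1200-character limit comes from the steps above.

```python
import ast

MAX_CHUNK_CHARS = 1200  # the limit quoted in step 2


def chunk_python_source(source, max_chars=MAX_CHUNK_CHARS):
    """Split Python source into chunks aligned to top-level defs and classes."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # this sketch ignores imports and module-level statements
        segment = ast.get_source_segment(source, node)
        if len(segment) <= max_chars:
            chunks.append(segment)
        else:
            # Oversized definition: fall back to fixed-size slices (assumed policy)
            chunks.extend(
                segment[i:i + max_chars] for i in range(0, len(segment), max_chars)
            )
    return chunks


example = '''
import os

def login(user, password):
    return check(user, password)

class Session:
    def close(self):
        pass
'''

for chunk in chunk_python_source(example):
    print(chunk.splitlines()[0])
# def login(user, password):
# class Session:
```

Each chunk starts and ends on a definition boundary, so a search hit returns a whole function or class instead of an arbitrary window of text.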

Key Innovation: Lexical-only search (BM25) outperforms hybrid (BM25+embeddings) for code queries because code contains precise identifiers that benefit from exact keyword matching.
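
To show what exact keyword matching on identifiers looks like with SQLite FTS5 and BM25, here is a minimal sketch using Python's standard sqlite3 module. It assumes the local SQLite build includes FTS5. The table name chunks, its columns, and the sample rows are hypothetical and do not reflect sia-code's actual schema.

```python
import sqlite3


def build_index(chunks):
    """Create an in-memory FTS5 table from (path, body) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path UNINDEXED, body)")
    conn.executemany("INSERT INTO chunks (path, body) VALUES (?, ?)", chunks)
    return conn


def to_match_query(text):
    """Quote each whitespace-separated term and OR the terms together."""
    terms = ['"' + term.replace('"', '""') + '"' for term in text.split()]
    return " OR ".join(terms)


def search(conn, text, limit=5):
    """Return paths ordered by BM25; FTS5's bm25() is lower-is-better."""
    rows = conn.execute(
        "SELECT path FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (to_match_query(text), limit),
    )
    return [path for (path,) in rows]


conn = build_index([
    ("auth.py", "def login(user, password): return check_password(user, password)"),
    ("errors.py", "def handle_error(exc): logger.error(exc)"),
    ("util.py", "def slugify(text): return text.lower()"),
])

print(search(conn, "handle_error"))  # ['errors.py']
print(search(conn, "login"))         # ['auth.py']
```

The default tokenizer splits handle_error into the adjacent tokens handle and error, and quoting the term turns it into a phrase query, so the identifier still matches as a unit.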

Documentation

Architecture & Implementation

Benchmark Results

Usage & Configuration

License

MIT

Download files

Source Distribution: sia_code-0.7.0.tar.gz (114.6 kB)

Built Distribution: sia_code-0.7.0-py3-none-any.whl (116.7 kB)
