Local-first codebase intelligence with semantic search, multi-hop research, and 12-language AST support

Sia Code

Local-first codebase search with semantic understanding and multi-hop code discovery.

Benchmark Results

89.9% Recall@5 on RepoEval benchmark (1,600 queries, 8 repositories)

  • +12.9 percentage points better than cAST (77.0%)
  • Lexical-only search outperforms hybrid (BM25 > BM25+embeddings)
  • Publication-quality results with a ±1.5 percentage-point confidence interval
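
The ±1.5% figure can be sanity-checked from the numbers above. The sketch below assumes a 95% normal-approximation (Wald) interval for a proportion; the benchmark summary does not say which interval method or confidence level was used, so both are assumptions here.

```python
import math


def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a 95% level (an assumption, not stated in the source).
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Figures from the benchmark summary: 89.9% Recall@5 over 1,600 queries.
half_width = wald_half_width(p=0.899, n=1600)
print(f"±{half_width * 100:.2f} percentage points")  # prints "±1.48 percentage points"
```

Under these assumptions the half-width comes out just under 1.5 points, which is consistent with the reported interval.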

See docs/BENCHMARK_RESULTS.md for full analysis.
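
For readers unfamiliar with the metric, here is a minimal sketch of Recall@k. It uses one common definition (the share of relevant items that show up in the top k results, averaged over queries); the exact definition used for the RepoEval numbers is in docs/BENCHMARK_RESULTS.md and may differ. The file names are made up for illustration.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant items that appear among the top-k ranked results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Average Recall@k over a list of (ranked results, relevant set) pairs."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical queries: (ranked results, ground-truth relevant files)
runs = [
    (["a.py", "b.py", "c.py"], {"a.py"}),          # recall 1.0
    (["x.py", "y.py"], {"z.py"}),                  # recall 0.0
    (["m.py", "n.py", "o.py"], {"m.py", "q.py"}),  # recall 0.5
]
print(mean_recall_at_k(runs))  # prints 0.5
```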

Features

  • 89.9% Recall@5 - State-of-the-art code search performance on RepoEval benchmark
  • Lexical-First Search - BM25 + FTS5 optimized for code queries (outperforms semantic-only)
  • Multi-Hop Research - Automatically discover code relationships and call graphs
  • AST-Aware Chunking - Tree-sitter preserves function/class boundaries
  • Project Auto-Detection - Automatic language detection and indexing strategy
  • Tiered Search - Filter by project code, dependencies, or both
  • 12 Languages - Python, JS/TS, Go, Rust, Java, C/C++, C#, Ruby, PHP (full AST support)
  • Watch Mode - Auto-reindex on file changes with incremental updates
  • Portable Index - Usearch HNSW + SQLite FTS5 in .sia-code/ directory

Installation

# From PyPI (recommended)
pip install sia-code

# Or with uv
uv tool install sia-code

# Or from source
uv tool install git+https://github.com/DxTa/sia-code.git

# Try without installing (ephemeral run)
uvx sia-code --version
uvx sia-code search "authentication logic"

# Verify installation
sia-code --version

Quick Start

# Initialize and index
sia-code init
sia-code index .

# Search
sia-code search "authentication logic"           # Hybrid search (default: BM25 + semantic)
sia-code search --regex "def.*login"             # Lexical-only search (BM25)
sia-code search --semantic-only "handle errors"  # Semantic-only search

# Multi-hop research (discover relationships)
sia-code research "how does the API handle errors?"

# Check index health
sia-code status
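
To give a feel for what "multi-hop" means, the sketch below expands outward from a starting symbol through a call graph, one hop at a time. This is a generic breadth-first traversal, not sia-code's actual research algorithm; the graph, the function name `multi_hop`, and the `max_hops` parameter are all hypothetical.

```python
from collections import deque
from typing import Dict, List


def multi_hop(graph: Dict[str, List[str]], start: str, max_hops: int = 2) -> Dict[str, int]:
    """Breadth-first expansion from `start`.

    Returns every symbol reachable within `max_hops` calls, mapped to its hop distance.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for callee in graph.get(node, []):
            if callee not in distance:
                distance[callee] = distance[node] + 1
                queue.append(callee)
    return distance


# Hypothetical call graph: caller -> callees
call_graph = {
    "handle_request": ["validate", "dispatch"],
    "dispatch": ["handle_error"],
    "handle_error": ["log_error"],
}
print(multi_hop(call_graph, "handle_request", max_hops=2))
# {'handle_request': 0, 'validate': 1, 'dispatch': 1, 'handle_error': 2}
```

With a limit of two hops, `log_error` (three hops away) is not reached.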

Commands

Command                                  Description
sia-code init                            Initialize index in current directory
sia-code index .                         Index codebase
sia-code index --update                  Re-index changed files only
sia-code index --watch                   Auto-reindex on file changes
sia-code search "query"                  Hybrid search (BM25 + semantic)
sia-code search --regex "pattern"        Lexical-only search
sia-code search --semantic-only "query"  Semantic-only search
sia-code research "question"             Multi-hop code discovery
sia-code status                          Index health and staleness metrics
sia-code compact                         Remove stale chunks
sia-code memory list                     List timeline/changelogs/decisions
sia-code memory changelog                Generate changelog from git
sia-code memory sync-git                 Import events from git history
sia-code config show                     Display configuration
sia-code interactive                     Live search mode

See docs/CLI_FEATURES.md for complete command reference with all options and examples.

Configuration

Recommended: Lexical-only search (best performance, no API key needed)

sia-code init
sia-code index .
# With no embedding API key configured, search falls back to lexical BM25 (89.9% Recall@5)

Optional: Hybrid search (adds semantic embeddings):

export OPENAI_API_KEY=sk-your-key-here
sia-code config set embedding.enabled true
sia-code config set search.vector_weight 0.0  # 0.0 = lexical-only (recommended); use 0.5 for hybrid, 1.0 for semantic-only
sia-code index --clean

Edit config at .sia-code/config.json to:

  • Set vector_weight (0.0 = lexical-only, 0.5 = hybrid, 1.0 = semantic-only)
  • Change embedding model (BAAI/bge-small-en-v1.5, openai-small)
  • Exclude patterns (node_modules/, __pycache__/, etc.)
  • Adjust chunk sizes (max_chunk_size, min_chunk_size)

View config: sia-code config show
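
One way to picture what vector_weight does is a weighted reciprocal rank fusion (RRF) of the lexical and semantic result lists. RRF is the fusion method named in How It Works, but how sia-code applies the weight internally is not documented here, so the weighting scheme below is an assumption. The constant k = 60 is the conventional RRF default from the literature, and the file names are illustrative.

```python
from typing import Dict, List


def weighted_rrf(
    lexical: List[str],
    semantic: List[str],
    vector_weight: float,
    k: int = 60,
) -> List[str]:
    """Fuse two ranked lists with reciprocal rank fusion.

    Lexical ranks contribute (1 - vector_weight) / (k + rank) and semantic ranks
    contribute vector_weight / (k + rank). Ties are broken alphabetically.
    """
    scores: Dict[str, float] = {}
    for rank, doc in enumerate(lexical, start=1):
        scores[doc] = scores.get(doc, 0.0) + (1.0 - vector_weight) / (k + rank)
    for rank, doc in enumerate(semantic, start=1):
        scores[doc] = scores.get(doc, 0.0) + vector_weight / (k + rank)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


lexical_hits = ["auth.py", "login.py", "utils.py"]
semantic_hits = ["session.py", "auth.py"]

print(weighted_rrf(lexical_hits, semantic_hits, vector_weight=0.0))
# ['auth.py', 'login.py', 'utils.py', 'session.py']  (lexical order wins)
print(weighted_rrf(lexical_hits, semantic_hits, vector_weight=0.5))
# ['auth.py', 'session.py', 'login.py', 'utils.py']
```

At 0.0 the semantic list has no influence on ordering, which matches the "lexical-only" description of that setting.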

AI Summarization (optional, enhances git changelogs):

{
  "summarization": {
    "enabled": true,
    "model": "google/flan-t5-base",
    "max_commits": 20
  }
}
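
If you would rather script this change than edit the file by hand, the following sketch merges the summarization block into an existing JSON config while leaving other keys untouched. It assumes summarization is a top-level key, as the fragment above suggests; the helper name `enable_summarization` is hypothetical and not part of sia-code.

```python
import json
from pathlib import Path


def enable_summarization(
    config_path: Path,
    model: str = "google/flan-t5-base",
    max_commits: int = 20,
) -> dict:
    """Merge a `summarization` block into a JSON config file and write it back."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}  # start from an empty config if the file is missing
    config["summarization"] = {
        "enabled": True,
        "model": model,
        "max_commits": max_commits,
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (path taken from the Configuration section):
# enable_summarization(Path(".sia-code/config.json"))
```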

Output Formats

sia-code search "query" --format json            # JSON output
sia-code search "query" --format table           # Rich table
sia-code search "query" --format csv             # CSV for Excel
sia-code search "query" --output results.json    # Save to file
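
The JSON format makes it straightforward to call the CLI from a script. The sketch below only uses the `search` subcommand and `--format` flag shown above. It assumes the JSON is written to stdout with nothing else around it, and it makes no claim about the shape of the parsed result, since the schema is not documented in this README. The helper names are hypothetical.

```python
import json
import shutil
import subprocess
from typing import Any, List, Optional

ALLOWED_FORMATS = {"json", "table", "csv"}  # formats listed under Output Formats


def build_search_command(query: str, output_format: str = "json") -> List[str]:
    """Assemble the argv list for `sia-code search` with an explicit output format."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {output_format!r}")
    return ["sia-code", "search", query, "--format", output_format]


def run_search(query: str) -> Optional[Any]:
    """Run a search and parse stdout as JSON; returns None if sia-code is not on PATH."""
    if shutil.which("sia-code") is None:
        return None
    result = subprocess.run(
        build_search_command(query, "json"),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


print(build_search_command("authentication logic"))
# ['sia-code', 'search', 'authentication logic', '--format', 'json']
```

Passing the command as a list avoids shell quoting problems when the query contains spaces or quotes.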

Supported Languages

Full AST Support (12): Python, JavaScript, TypeScript, JSX, TSX, Go, Rust, Java, C, C++, C#, Ruby, PHP

Recognized: Kotlin, Groovy, Swift, Bash, Vue, Svelte, and more (indexed as text)

Troubleshooting

Issue                 Solution
No API key warning    Normal - searches fall back to lexical mode
Index growing large   Run sia-code compact to remove stale chunks
Slow indexing         Use sia-code index --update for incremental re-indexing
Stale search results  Run sia-code index --clean to rebuild
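
To illustrate why an incremental update is cheaper than a clean rebuild, here is a sketch of content-hash change detection: only files whose hash changed need re-indexing, and files that disappeared leave stale chunks behind. How sia-code actually detects changes is not described in this README; the functions `snapshot` and `diff_snapshots` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set, Tuple


def snapshot(root: Path, suffixes: Tuple[str, ...] = (".py",)) -> Dict[str, str]:
    """Map each matching file (path relative to root) to the SHA-256 of its contents."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.suffix in suffixes
    }


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Return (files that are new or modified, files that were removed)."""
    changed = {path for path, digest in new.items() if old.get(path) != digest}
    removed = set(old) - set(new)
    return changed, removed
```

In this picture, the first set corresponds to the work done by an incremental update and the second to the stale chunks that compaction would remove.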

How It Works

  1. Parse - Tree-sitter generates a language-agnostic AST for each file
  2. Chunk - AST-aware chunking preserves function/class boundaries (max 1200 chars)
  3. Index - Usearch HNSW (vectors) + SQLite FTS5 (lexical search with BM25)
  4. Store - Portable .sia-code/ directory (17-25 MB per repo)
  5. Search - Lexical-first (BM25) with optional hybrid fusion (RRF)
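
The Parse and Chunk steps can be sketched with Python's built-in `ast` module standing in for Tree-sitter, so this example handles Python source only and is not sia-code's implementation. It emits one chunk per top-level function or class and splits on line boundaries only when a definition exceeds the size limit. The 1200-character limit comes from step 2; everything else, including the function names, is an assumption.

```python
import ast
from typing import List

MAX_CHUNK_CHARS = 1200  # limit quoted in step 2 above


def _split_by_lines(text: str, max_chars: int) -> List[str]:
    """Greedily group whole lines into pieces of at most max_chars.

    A single line longer than max_chars is kept whole as its own piece.
    """
    pieces, current = [], ""
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > max_chars:
            pieces.append(current)
            current = ""
        current += line
    if current:
        pieces.append(current)
    return pieces


def chunk_python_source(source: str, max_chars: int = MAX_CHUNK_CHARS) -> List[str]:
    """One chunk per top-level function or class; module-level statements are skipped."""
    lines = source.splitlines()
    chunks: List[str] = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # Include decorators, which start above the `def` / `class` line.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        text = "\n".join(lines[start - 1 : node.end_lineno])
        if len(text) <= max_chars:
            chunks.append(text)
        else:
            chunks.extend(_split_by_lines(text, max_chars))
    return chunks


sample = '''import os

def login(user):
    return user.is_active

class Session:
    def close(self):
        pass
'''
for chunk in chunk_python_source(sample):
    print(repr(chunk))
```

Keeping a whole definition in one chunk means a search hit returns a complete function or class rather than an arbitrary window of text.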

Key Innovation: Lexical-only search (BM25) outperforms hybrid (BM25+embeddings) for code queries because code contains precise identifiers that benefit from exact keyword matching.
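
The effect of exact identifier matching can be seen with a small SQLite FTS5 table, the same storage engine named in step 3. This is a standalone sketch rather than sia-code's schema: the table name, column names, and sample rows are hypothetical, and it assumes your Python's SQLite build includes FTS5 (most do).

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
conn.executemany(
    "INSERT INTO chunks (path, body) VALUES (?, ?)",
    [
        ("auth.py", "def login_user(name, password): return check_password(name, password)"),
        ("errors.py", "def handle_error(exc): log_error(exc)"),
        ("utils.py", "def slugify(text): return text.lower()"),
    ],
)


def search(query: str, limit: int = 5) -> List[Tuple[str, float]]:
    """Return (path, score) pairs, best match first.

    FTS5's bm25() returns lower (more negative) values for better matches,
    so an ascending sort puts the best hit at the top.
    """
    return conn.execute(
        "SELECT path, bm25(chunks) AS score FROM chunks "
        "WHERE chunks MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


for path, score in search("login"):
    print(path, round(score, 3))
```

The default FTS5 tokenizer splits `login_user` at the underscore, so a query for `login` finds `auth.py` through an exact token match with no embedding model involved.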

Documentation

Architecture & Implementation

Benchmark Results

Usage & Configuration

License

MIT
