Local-first codebase intelligence with semantic search, multi-hop research, and 12-language AST support

Sia Code

Local-first codebase search with semantic understanding and multi-hop code discovery.

Benchmark Results

89.9% Recall@5 on the RepoEval benchmark (1,600 queries, 8 repositories)

  • 12.9 percentage points higher than cAST (77.0%)
  • Lexical-only search outperforms hybrid (BM25 > BM25+embeddings)
  • Publication-quality results with ±1.5% confidence interval

See docs/BENCHMARK_RESULTS.md for full analysis.
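
For readers unfamiliar with the metric, Recall@5 is commonly defined as the share of a query's relevant items that appear among the top five results, averaged over all queries. The sketch below follows that common definition. The benchmark's actual scoring script is not shown in this README, so the function names and sample data here are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int = 5) -> float:
    """Fraction of one query's relevant items found in its top-k results."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("each query needs at least one relevant item")
    hits = relevant_set.intersection(ranked[:k])
    return len(hits) / len(relevant_set)


def mean_recall_at_k(runs: Sequence[Tuple[Sequence[str], Iterable[str]]], k: int = 5) -> float:
    """Average recall@k over (ranked_results, relevant_items) pairs."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores)


# Hypothetical sample data: two queries with their ranked results and ground truth.
runs = [
    (["a.py", "b.py", "c.py"], ["b.py"]),          # the single relevant file is retrieved
    (["x.py", "y.py"], ["x.py", "z.py"]),          # one of two relevant files is retrieved
]
print(mean_recall_at_k(runs, k=5))  # 0.75
```

Under this definition, the reported gap is simply 89.9 - 77.0 = 12.9 percentage points.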

Features

  • 89.9% Recall@5 - State-of-the-art code search performance on the RepoEval benchmark
  • Lexical-First Search - BM25 ranking via SQLite FTS5, optimized for code queries (outperforms semantic-only)
  • Multi-Hop Research - Automatically discover code relationships and call graphs
  • AST-Aware Chunking - Tree-sitter preserves function/class boundaries
  • Project Auto-Detection - Automatic language detection and indexing strategy
  • Tiered Search - Filter by project code, dependencies, or both
  • 12 Languages - Python, JS/TS, Go, Rust, Java, C/C++, C#, Ruby, PHP (full AST support)
  • Watch Mode - Auto-reindex on file changes with incremental updates
  • Portable Index - Usearch HNSW + SQLite FTS5 in .sia-code/ directory
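
To illustrate what preserving function/class boundaries means in the AST-aware chunking feature above, here is a minimal sketch that uses Python's built-in `ast` module as a stand-in for Tree-sitter. It is not Sia Code's implementation: the function name `chunk_python_source` is hypothetical, it handles Python source only, and the 1200-character default mirrors the limit mentioned under How It Works.

```python
import ast


def chunk_python_source(source: str, max_chars: int = 1200) -> list:
    """Split Python source so that no chunk cuts through a top-level def or class.

    Each top-level function or class becomes its own chunk, kept whole even if
    it is longer than max_chars. Other top-level statements (imports,
    assignments) are packed together until max_chars would be exceeded.
    Blank lines and comments between top-level statements are dropped.
    """
    lines = source.splitlines(keepends=True)
    chunks = []
    buffer = ""
    for node in ast.parse(source).body:
        # Decorators sit above node.lineno, so start from the earliest line.
        decorators = getattr(node, "decorator_list", [])
        first_line = min([node.lineno, *(d.lineno for d in decorators)])
        text = "".join(lines[first_line - 1 : node.end_lineno])

        is_definition = isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        )
        # Flush pending statements before a definition or when the buffer is full.
        if is_definition or len(buffer) + len(text) > max_chars:
            if buffer:
                chunks.append(buffer)
            buffer = ""
        if is_definition:
            chunks.append(text)
        else:
            buffer += text
    if buffer:
        chunks.append(buffer)
    return chunks


sample = (
    "import os\n"
    "\n"
    "def a():\n"
    "    return 1\n"
    "\n"
    "class B:\n"
    "    def m(self):\n"
    "        return 2\n"
)
for chunk in chunk_python_source(sample):
    print(repr(chunk))
```

The sample yields three chunks: the import, the function `a`, and the whole class `B` including its method. A fixed-size splitter, by contrast, could cut a function in half.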

Installation

# From PyPI (recommended)
pip install sia-code

# Or with uv
uv tool install sia-code

# Or from source
uv tool install git+https://github.com/DxTa/sia-code.git

# Try without installing (ephemeral run)
uvx sia-code --version
uvx sia-code search "authentication logic"

# Verify installation
sia-code --version

Quick Start

# Initialize and index
sia-code init
sia-code index .

# Search
sia-code search "authentication logic"           # Hybrid search (default: BM25 + semantic)
sia-code search --regex "def.*login"             # Lexical-only search (BM25)
sia-code search --semantic-only "handle errors"  # Semantic-only search

# Multi-hop research (discover relationships)
sia-code research "how does the API handle errors?"

# Check index health
sia-code status

Commands

Command                                  Description
sia-code init                            Initialize index in current directory
sia-code index .                         Index codebase
sia-code index --update                  Re-index changed files only
sia-code index --watch                   Auto-reindex on file changes
sia-code search "query"                  Hybrid search (BM25 + semantic)
sia-code search --regex "pattern"        Lexical-only search
sia-code search --semantic-only "query"  Semantic-only search
sia-code research "question"             Multi-hop code discovery
sia-code status                          Index health and staleness metrics
sia-code compact                         Remove stale chunks
sia-code memory list                     List timeline/changelogs/decisions
sia-code memory changelog                Generate changelog from git
sia-code memory sync-git                 Import events from git history
sia-code config show                     Display configuration
sia-code interactive                     Live search mode

See docs/CLI_FEATURES.md for complete command reference with all options and examples.

Configuration

Recommended: Lexical-only search (best performance, no API key needed)

sia-code init
sia-code index .
# Search uses BM25 by default (89.9% Recall@5)

Optional: Hybrid search (adds semantic embeddings):

export OPENAI_API_KEY=sk-your-key-here
sia-code config set embedding.enabled true
sia-code config set search.vector_weight 0.0  # 0.0 = lexical-only (recommended); set 0.5 for hybrid
sia-code index --clean

Edit config at .sia-code/config.json to:

  • Set vector_weight (0.0 = lexical-only, 0.5 = hybrid, 1.0 = semantic-only)
  • Change embedding model (BAAI/bge-small-en-v1.5, openai-small)
  • Exclude patterns (node_modules/, __pycache__/, etc.)
  • Adjust chunk sizes (max_chunk_size, min_chunk_size)

View config: sia-code config show

AI Summarization (optional, enhances git changelogs):

{
  "summarization": {
    "enabled": true,
    "model": "google/flan-t5-base",
    "max_commits": 20
  }
}

Output Formats

sia-code search "query" --format json            # JSON output
sia-code search "query" --format table           # Rich table
sia-code search "query" --format csv             # CSV for Excel
sia-code search "query" --output results.json    # Save to file

Supported Languages

Full AST Support (12): Python, JavaScript, TypeScript, JSX, TSX, Go, Rust, Java, C, C++, C#, Ruby, PHP

Recognized: Kotlin, Groovy, Swift, Bash, Vue, Svelte, and more (indexed as text)

Troubleshooting

Issue                 Solution
No API key warning    Normal - searches fall back to lexical mode
Index growing large   Run sia-code compact to remove stale chunks
Slow indexing         Use sia-code index --update for incremental re-indexing
Stale search results  Run sia-code index --clean to rebuild

How It Works

  1. Parse - Tree-sitter generates a language-agnostic AST for each file
  2. Chunk - AST-aware chunking preserves function/class boundaries (max 1200 chars)
  3. Index - Usearch HNSW (vectors) + SQLite FTS5 (lexical search with BM25)
  4. Store - Portable .sia-code/ directory (17-25 MB per repo)
  5. Search - Lexical-first (BM25) with optional hybrid fusion (RRF)
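
The lexical half of the Index and Search steps above can be sketched with Python's built-in `sqlite3` module, assuming the local SQLite build includes FTS5. The table name, column names, and sample rows are made up for illustration and do not reflect Sia Code's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical FTS5 table holding one row per chunk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path, body)")
conn.executemany(
    "INSERT INTO chunks (path, body) VALUES (?, ?)",
    [
        ("auth.py", "def login(user, password): return verify_password(user, password)"),
        ("errors.py", "class ApiError(Exception): pass"),
        ("views.py", "def render_page(request): return template('index.html')"),
    ],
)


def search(db: sqlite3.Connection, query: str, limit: int = 5) -> list:
    """Return (path, score) rows for a full-text query, best match first.

    FTS5's bm25() returns lower values for better matches, so the rows are
    sorted in ascending order of score.
    """
    return db.execute(
        "SELECT path, bm25(chunks) AS score "
        "FROM chunks WHERE chunks MATCH ? "
        "ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


for path, score in search(conn, "login"):
    print(path, score)
```

With these sample rows, only `auth.py` contains the token `login`, so it is the single result. Exact identifier matches of this kind are what the lexical-first design relies on.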

Key Innovation: Lexical-only search (BM25) outperforms hybrid (BM25+embeddings) for code queries because code contains precise identifiers that benefit from exact keyword matching.
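
The optional hybrid fusion step uses RRF (reciprocal rank fusion), which can be sketched as follows. This is a generic weighted variant, not Sia Code's code. How `vector_weight` enters the fusion is an assumption based on the 0.0 / 0.5 / 1.0 settings described under Configuration, and `k = 60` is the constant conventionally used with RRF rather than a value taken from this project.

```python
from typing import Dict, List, Sequence


def rrf_fuse(
    lexical: Sequence[str],
    semantic: Sequence[str],
    vector_weight: float = 0.5,
    k: int = 60,
) -> List[str]:
    """Fuse two ranked lists with weighted reciprocal rank fusion.

    Each list contributes weight / (k + rank) per item, with ranks starting at 1.
    Assumed mapping: vector_weight 0.0 uses only the lexical list, 1.0 only
    the semantic list.
    """
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0.0 and 1.0")

    scores: Dict[str, float] = {}
    weighted_lists = ((1.0 - vector_weight, lexical), (vector_weight, semantic))
    for weight, ranking in weighted_lists:
        if weight == 0.0:
            continue  # a zero-weight list contributes no candidates at all
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)

    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a BM25 search and an embedding search.
lexical = ["auth.py", "views.py", "errors.py"]
semantic = ["errors.py", "auth.py", "handlers.py"]

print(rrf_fuse(lexical, semantic, vector_weight=0.0))  # ['auth.py', 'views.py', 'errors.py']
print(rrf_fuse(lexical, semantic, vector_weight=0.5))  # ['auth.py', 'errors.py', 'views.py', 'handlers.py']
```

Because RRF works on ranks rather than raw scores, BM25 and embedding similarity do not need to be normalized to a common scale before fusion.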

Documentation

Architecture & Implementation

Benchmark Results

Usage & Configuration

License

MIT
