File Compass
Semantic file search for AI workstations using HNSW vector indexing and local embeddings.
Features
- Semantic Search: Find files by describing what you're looking for, not just keywords
- Quick Search: Instant filename and symbol search (no embedding required)
- Multi-Language AST Parsing: Tree-sitter support for Python, JavaScript, TypeScript, Rust, Go
- Result Explanations: Understand why each result matched your query
- Local Embeddings: Uses Ollama with nomic-embed-text (no API keys needed)
- Fast Search: HNSW indexing for sub-second queries across thousands of files
- Git-Aware: Optionally filter to only git-tracked files
- MCP Server: Integrates with Claude Code and other MCP clients
- Security Hardened: Input validation, path traversal protection, sanitized errors
Requirements
- Python 3.10+
- Ollama with the nomic-embed-text model
Installation
# Clone the repository
git clone https://github.com/mikeyfrilot/file-compass.git
cd file-compass
# Create virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# or: source venv/bin/activate # Linux/Mac
# Install dependencies
pip install -e .
# Pull the embedding model
ollama pull nomic-embed-text
Quick Start
1. Build the Index
# Index a directory
file-compass index -d "C:/Projects"
# Index multiple directories
file-compass index -d "C:/Projects" "D:/Code"
2. Search Files
# Semantic search
file-compass search "database connection handling"
# Filter by file type
file-compass search "training loop" --types python
# Git-tracked files only
file-compass search "API endpoints" --git-only
3. Quick Search (No Embeddings Required)
# Build the quick index used for filename and symbol search
file-compass scan -d "C:/Projects"
4. Check Status
file-compass status
MCP Server
File Compass includes an MCP server for integration with Claude Code and other AI assistants.
Available Tools
| Tool | Description |
|---|---|
| file_search | Semantic search with explanations for why results matched |
| file_preview | Get visual code preview with syntax highlighting |
| file_quick_search | Fast filename/symbol search (no embedding required) |
| file_quick_index_build | Build the quick search index |
| file_actions | Perform actions: context, usages, related, history, symbols |
| file_index_status | Check index statistics |
| file_index_scan | Build or rebuild the full semantic index |
Claude Code Integration
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "file-compass": {
      "command": "python",
      "args": ["-m", "file_compass.gateway"],
      "cwd": "C:/path/to/file-compass"
    }
  }
}
Configuration
Configuration is managed via environment variables or the FileCompassConfig class:
| Variable | Default | Description |
|---|---|---|
| FILE_COMPASS_DIRECTORIES | F:/AI | Comma-separated directories to index |
| FILE_COMPASS_OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| FILE_COMPASS_EMBEDDING_MODEL | nomic-embed-text | Embedding model name |
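As an illustration of how the variables above might be read, here is a minimal sketch. The Settings class and load_settings function are hypothetical names for this example and are not the actual FileCompassConfig API; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class Settings:
    """Hypothetical stand-in for FileCompassConfig; field names are illustrative."""
    directories: List[str]
    ollama_url: str
    embedding_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_dirs = env.get("FILE_COMPASS_DIRECTORIES", "F:/AI")
    # The variable is comma-separated; drop empty entries and stray spaces.
    directories = [d.strip() for d in raw_dirs.split(",") if d.strip()]
    return Settings(
        directories=directories,
        ollama_url=env.get("FILE_COMPASS_OLLAMA_URL", "http://localhost:11434"),
        embedding_model=env.get("FILE_COMPASS_EMBEDDING_MODEL", "nomic-embed-text"),
    )
```

Passing a plain dict instead of os.environ, for example load_settings({"FILE_COMPASS_DIRECTORIES": "C:/Projects, D:/Code"}), makes the parsing easy to check without touching the real environment.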
How It Works
- Scanning: Discovers files matching configured extensions, respecting .gitignore
- Chunking: Splits files into semantic pieces:
  - Python/JS/TS/Rust/Go: AST-aware via tree-sitter (functions, classes, methods)
  - Markdown: Heading-based sections
  - JSON/YAML: Top-level keys
  - Other: Sliding window with overlap
- Embedding: Generates 768-dim vectors via Ollama's nomic-embed-text
- Indexing: Stores vectors in HNSW index, metadata in SQLite
- Search: Embeds query, finds nearest neighbors, returns ranked results with explanations
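Two of the chunking strategies listed above, heading-based Markdown sections and the sliding window with overlap, can be sketched with the standard library alone. This is an assumption-laden illustration, not the code in chunker.py: the function names, the default window size of 40 lines, and the overlap of 10 lines are all made up for the example, and the Markdown splitter does not account for heading-like lines inside code blocks.

```python
import re
from typing import List, Tuple

# ATX-style Markdown headings: one to six '#' followed by whitespace.
HEADING = re.compile(r"^#{1,6}\s")


def markdown_sections(text: str) -> List[str]:
    """Split Markdown so that each section starts at a heading line."""
    sections: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def sliding_window_chunks(
    lines: List[str], size: int = 40, overlap: int = 10
) -> List[Tuple[int, int, str]]:
    """Return (start_line, end_line, text) windows with 1-based line numbers.

    Consecutive windows share `overlap` lines, so content that straddles a
    window boundary still appears intact in at least one chunk.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[Tuple[int, int, str]] = []
    start = 0
    while start < len(lines):
        end = min(start + size, len(lines))
        chunks.append((start + 1, end, "\n".join(lines[start:end])))
        if end == len(lines):
            break
        start += step
    return chunks
```

With size=4 and overlap=1, a ten-line file yields windows covering lines 1-4, 4-7, and 7-10. A larger overlap improves recall at boundaries but increases the number of chunks to embed.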
Project Structure
file-compass/
├── file_compass/
│ ├── __init__.py # Package init, default paths
│ ├── config.py # Configuration management
│ ├── embedder.py # Ollama embedding client with retry logic
│ ├── scanner.py # File discovery with gitignore support
│ ├── chunker.py # Multi-language AST chunking (tree-sitter)
│ ├── indexer.py # HNSW + SQLite index
│ ├── quick_index.py # Fast filename/symbol search
│ ├── explainer.py # Result explanation generation
│ ├── merkle.py # Incremental update tracking
│ ├── gateway.py # MCP server with security hardening
│ └── cli.py # Command-line interface
├── tests/ # 298 tests, 91% coverage
├── pyproject.toml
├── README.md
└── LICENSE
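The tree lists merkle.py for incremental update tracking. The general idea, re-embedding only files whose content has changed, can be illustrated with flat content hashing. This sketch is an assumption: it uses a plain dictionary of SHA-256 digests rather than a Merkle tree, and file_digest and diff_snapshot are hypothetical names, not functions from the package.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Set, Tuple


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, read in blocks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def diff_snapshot(
    previous: Dict[str, str], paths: Iterable[Path]
) -> Tuple[Dict[str, str], Set[str], Set[str]]:
    """Compare current digests with a previous snapshot.

    Returns (current_snapshot, changed_or_new_paths, removed_paths).
    """
    current = {str(p): file_digest(p) for p in paths}
    changed = {p for p, digest in current.items() if previous.get(p) != digest}
    removed = set(previous) - set(current)
    return current, changed, removed
```

Only the paths in the changed set would need re-chunking and re-embedding; the paths in the removed set would be dropped from the index.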
Security
File Compass includes several security measures:
- Input Validation: All MCP tool inputs are validated (length limits, type checks)
- Path Traversal Protection: Files outside allowed directories cannot be accessed
- SQL Injection Prevention: All database queries use parameterized statements
- Error Sanitization: Internal errors are not exposed to clients
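Two of the measures above, path traversal protection and parameterized queries, follow well-known patterns that can be sketched with the standard library. This is not the code in gateway.py or indexer.py: is_allowed and find_chunks are hypothetical names, and the chunks table with its id and path columns is an assumed schema for illustration.

```python
import sqlite3
from pathlib import Path
from typing import Iterable, List


def is_allowed(candidate: str, allowed_roots: Iterable[str]) -> bool:
    """True only if the fully resolved path lies inside an allowed root.

    Resolving first collapses '..' segments and follows symlinks, so a path
    such as 'root/../secret.txt' is checked against where it really points.
    """
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(Path(root).resolve()) for root in allowed_roots)


def find_chunks(conn: sqlite3.Connection, path: str) -> List[tuple]:
    """Look up chunks by path using a bound parameter.

    The '?' placeholder keeps user input out of the SQL text, so quotes in
    `path` are treated as data rather than as part of the statement.
    """
    cursor = conn.execute("SELECT id, path FROM chunks WHERE path = ?", (path,))
    return cursor.fetchall()
```

A request for a file would call is_allowed before any read, and a lookup for the string a.py' OR '1'='1 simply matches no rows instead of altering the query.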
Performance
- Index Size: ~1KB per chunk (embedding + metadata)
- Search Latency: <100ms for 10K+ chunks
- Quick Search: <10ms for filename/symbol search
- Embedding Speed: ~3-4 seconds per chunk (sequential, local)
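To see what these figures imply for a first full index, the arithmetic can be written out. The function below is a hypothetical helper, not part of the package; its defaults take the midpoint of the 3-4 second embedding range and the ~1KB per chunk estimate quoted above.

```python
def estimate_index_cost(
    num_chunks: int,
    seconds_per_chunk: float = 3.5,
    bytes_per_chunk: int = 1024,
) -> dict:
    """Rough cost of a first full index, from the figures quoted above."""
    return {
        # Embedding is sequential, so total time scales linearly with chunks.
        "embedding_hours": num_chunks * seconds_per_chunk / 3600,
        "index_megabytes": num_chunks * bytes_per_chunk / (1024 * 1024),
    }


estimate = estimate_index_cost(10_000)
print(f"{estimate['embedding_hours']:.1f} hours, {estimate['index_megabytes']:.1f} MB")
```

For 10,000 chunks this works out to roughly 9.7 hours of embedding and about 9.8 MB of index, so the initial build dominates the cost while searches stay fast. For comparison, a 768-dimensional vector stored as 32-bit floats occupies 3,072 bytes before any metadata, so the size estimate depends on how vectors are actually stored.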
Development
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=file_compass --cov-report=term-missing
# Type checking (optional)
mypy file_compass/
License
MIT License - see LICENSE for details.
Acknowledgments
- Ollama for local LLM inference
- hnswlib for fast vector search
- nomic-embed-text for embeddings
- tree-sitter for multi-language AST parsing