
Semantic file search for AI workstations using HNSW indexing


File Compass

Semantic file search for AI workstations using HNSW vector indexing and local embeddings.


Features

  • Semantic Search: Find files by describing what you're looking for, not just keywords
  • Quick Search: Instant filename and symbol search (no embedding required)
  • Multi-Language AST Parsing: Tree-sitter support for Python, JavaScript, TypeScript, Rust, Go
  • Result Explanations: Understand why each result matched your query
  • Local Embeddings: Uses Ollama with nomic-embed-text (no API keys needed)
  • Fast Search: HNSW indexing for sub-second queries across thousands of files
  • Git-Aware: Optionally filter to only git-tracked files
  • MCP Server: Integrates with Claude Code and other MCP clients
  • Security Hardened: Input validation, path traversal protection, sanitized errors

Requirements

  • Python 3.10+
  • Ollama with nomic-embed-text model

Installation

# Clone the repository
git clone https://github.com/mikeyfrilot/file-compass.git
cd file-compass

# Create virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
# or: source venv/bin/activate  # Linux/Mac

# Install dependencies
pip install -e .

# Pull the embedding model
ollama pull nomic-embed-text
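Before you build an index, you can check that Ollama is serving the model by requesting a single embedding over Ollama's HTTP API. The following is a minimal sketch and is not part of File Compass. It assumes Ollama's default address and its /api/embed endpoint. The helper names build_embed_request and embed are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default address.
OLLAMA_URL = "http://localhost:11434"


def build_embed_request(text: str, model: str = "nomic-embed-text",
                        base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embed endpoint (hypothetical helper)."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list[float]:
    """Send the request and return the first vector from the "embeddings" list."""
    with urllib.request.urlopen(build_embed_request(text), timeout=30) as resp:
        body = json.load(resp)
    return body["embeddings"][0]
```

With Ollama running, `len(embed("database connection handling"))` should equal the model's embedding dimension, which How It Works gives as 768 for nomic-embed-text.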

Quick Start

1. Build the Index

# Index a directory
file-compass index -d "C:/Projects"

# Index multiple directories
file-compass index -d "C:/Projects" "D:/Code"

2. Search Files

# Semantic search
file-compass search "database connection handling"

# Filter by file type
file-compass search "training loop" --types python

# Git-tracked files only
file-compass search "API endpoints" --git-only

3. Quick Search (No Embeddings Required)

# Build the quick index used for filename and symbol search
file-compass scan -d "C:/Projects"  # Build quick index
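Quick search works without embeddings because it only matches names. The sketch below shows the idea using an in-memory dictionary and Python's standard ast module to extract function and class names. It is a hypothetical illustration, not the implementation in quick_index.py, and it only handles Python sources.

```python
import ast
from collections import defaultdict
from pathlib import PurePath


def python_symbols(source: str) -> list[str]:
    """Collect function and class names from Python source with the stdlib ast module."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]


class QuickIndex:
    """Hypothetical in-memory index mapping lowercased file and symbol names to paths."""

    def __init__(self) -> None:
        self._names: defaultdict[str, set[str]] = defaultdict(set)

    def add_file(self, path: str, source: str) -> None:
        # Index the bare filename plus every function/class name defined in the file.
        self._names[PurePath(path).name.lower()].add(path)
        for name in python_symbols(source):
            self._names[name.lower()].add(path)

    def search(self, term: str) -> list[str]:
        """Case-insensitive substring match over indexed names; returns sorted paths."""
        term = term.lower()
        hits = {p for name, paths in self._names.items() if term in name for p in paths}
        return sorted(hits)
```

Because lookups never touch an embedding model, this kind of search stays fast even when Ollama is not running.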

4. Check Status

file-compass status

MCP Server

File Compass includes an MCP server for integration with Claude Code and other AI assistants.

Available Tools

  • file_search: Semantic search with explanations for why results matched
  • file_preview: Get a visual code preview with syntax highlighting
  • file_quick_search: Fast filename/symbol search (no embedding required)
  • file_quick_index_build: Build the quick search index
  • file_actions: Perform actions: context, usages, related, history, symbols
  • file_index_status: Check index statistics
  • file_index_scan: Build or rebuild the full semantic index
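MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below shows the shape of that message. The tool name is one of those listed above, but the argument key `query` is an assumption, because this README does not document the tools' input schemas. A client can retrieve the real schemas with a `tools/list` request.

```python
import json
from typing import Any


def tool_call_message(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: the real parameter names of file_search may differ.
msg = tool_call_message(1, "file_search", {"query": "database connection handling"})
```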

Claude Code Integration

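Add the server to your MCP client configuration, for example claude_desktop_config.json for Claude Desktop or a project-level .mcp.json for Claude Code: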

{
  "mcpServers": {
    "file-compass": {
      "command": "python",
      "args": ["-m", "file_compass.gateway"],
      "cwd": "C:/path/to/file-compass"
    }
  }
}

Configuration

Configuration is managed via environment variables or the FileCompassConfig class:

  • FILE_COMPASS_DIRECTORIES (default: F:/AI): Comma-separated directories to index
  • FILE_COMPASS_OLLAMA_URL (default: http://localhost:11434): Ollama server URL
  • FILE_COMPASS_EMBEDDING_MODEL (default: nomic-embed-text): Embedding model name
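The variables above could be read into a settings object along these lines. This is a hypothetical sketch: CompassSettings and load_settings are illustrative names, not the FileCompassConfig API. Splitting on commas and trimming whitespace are assumptions based on the description "Comma-separated directories".

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass
class CompassSettings:
    """Hypothetical stand-in for FileCompassConfig; defaults mirror the list above."""
    directories: list[str] = field(default_factory=lambda: ["F:/AI"])
    ollama_url: str = "http://localhost:11434"
    embedding_model: str = "nomic-embed-text"


def load_settings(env: Mapping[str, str] = os.environ) -> CompassSettings:
    """Read the FILE_COMPASS_* variables, falling back to the defaults when unset."""
    defaults = CompassSettings()
    raw_dirs = env.get("FILE_COMPASS_DIRECTORIES")
    directories = (
        [d.strip() for d in raw_dirs.split(",") if d.strip()]
        if raw_dirs else defaults.directories
    )
    return CompassSettings(
        directories=directories,
        ollama_url=env.get("FILE_COMPASS_OLLAMA_URL", defaults.ollama_url),
        embedding_model=env.get("FILE_COMPASS_EMBEDDING_MODEL", defaults.embedding_model),
    )
```

For example, setting FILE_COMPASS_DIRECTORIES to "C:/Projects, D:/Code" yields the two directories ["C:/Projects", "D:/Code"], and the other two settings keep their defaults.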

How It Works

  1. Scanning: Discovers files matching configured extensions, respecting .gitignore
  2. Chunking: Splits files into semantic pieces:
    • Python/JS/TS/Rust/Go: AST-aware via tree-sitter (functions, classes, methods)
    • Markdown: Heading-based sections
    • JSON/YAML: Top-level keys
    • Other: Sliding window with overlap
  3. Embedding: Generates 768-dim vectors via Ollama's nomic-embed-text
  4. Indexing: Stores vectors in HNSW index, metadata in SQLite
  5. Search: Embeds query, finds nearest neighbors, returns ranked results with explanations
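The "Other: Sliding window with overlap" fallback and the nearest-neighbor search step can both be sketched in plain Python. This illustrates the ideas under stated assumptions and is not File Compass's code. Window size and overlap are measured in lines, with arbitrary values. The search is an exact brute-force cosine scan, which gives the result an HNSW index approximates; HNSW is much faster on large collections.

```python
import math


def sliding_window_chunks(lines: list[str], size: int = 40,
                          overlap: int = 10) -> list[tuple[int, int, str]]:
    """Split lines into overlapping windows of (start_line, end_line, text), 1-based inclusive."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + size]
        chunks.append((start + 1, start + len(window), "\n".join(window)))
        if start + size >= len(lines):
            break  # this window already reaches the end of the file
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: dict[str, list[float]],
          k: int = 5) -> list[tuple[str, float]]:
    """Exact nearest neighbors by cosine similarity, best first."""
    scored = [(key, cosine(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

The overlap ensures that code spanning a window boundary appears intact in at least one chunk, at the cost of embedding some lines twice.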

Project Structure

file-compass/
├── file_compass/
│   ├── __init__.py      # Package init, default paths
│   ├── config.py        # Configuration management
│   ├── embedder.py      # Ollama embedding client with retry logic
│   ├── scanner.py       # File discovery with gitignore support
│   ├── chunker.py       # Multi-language AST chunking (tree-sitter)
│   ├── indexer.py       # HNSW + SQLite index
│   ├── quick_index.py   # Fast filename/symbol search
│   ├── explainer.py     # Result explanation generation
│   ├── merkle.py        # Incremental update tracking
│   ├── gateway.py       # MCP server with security hardening
│   └── cli.py           # Command-line interface
├── tests/               # 298 tests, 91% coverage
├── pyproject.toml
├── README.md
└── LICENSE
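This README describes merkle.py only as incremental update tracking. One common approach is sketched here as an assumption, not as a description of File Compass's implementation: hash each file's contents, combine the sorted per-file digests into a single root hash, and re-embed only the files whose digest changed. The combination shown is a flat, one-level Merkle-style hash, and all names are hypothetical.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """SHA-256 digest of one file's contents."""
    return hashlib.sha256(data).hexdigest()


def root_hash(digests: dict[str, str]) -> str:
    """Combine per-file digests, in sorted path order, into one root digest."""
    h = hashlib.sha256()
    for path in sorted(digests):
        h.update(path.encode("utf-8") + b"\0" + digests[path].encode("ascii") + b"\n")
    return h.hexdigest()


def changed_files(old: dict[str, str],
                  new: dict[str, str]) -> tuple[set[str], set[str], set[str]]:
    """Return (added, modified, removed) paths between two snapshots."""
    added = set(new.keys() - old.keys())
    removed = set(old.keys() - new.keys())
    modified = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    return added, modified, removed
```

If the stored root hash matches the freshly computed one, nothing changed and the per-file comparison can be skipped entirely.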

Security

File Compass includes several security measures:

  • Input Validation: All MCP tool inputs are validated (length limits, type checks)
  • Path Traversal Protection: Files outside allowed directories cannot be accessed
  • SQL Injection Prevention: All database queries use parameterized statements
  • Error Sanitization: Internal errors are not exposed to clients
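The path traversal and SQL injection measures follow standard patterns that can be sketched with the standard library. This illustration rests on assumptions and is not gateway.py's code: resolve_inside and chunks_for_file are hypothetical helpers, and the chunks table schema is invented for the example. The error message deliberately omits the requested path, in keeping with error sanitization.

```python
import sqlite3
from pathlib import Path


def resolve_inside(requested: str, allowed_roots: list[Path]) -> Path:
    """Resolve a client-supplied path and reject it unless it lies under an allowed root."""
    candidate = Path(requested).resolve()  # collapses ".." segments and follows symlinks
    for root in allowed_roots:
        if candidate.is_relative_to(root.resolve()):
            return candidate
    # Generic message: do not echo the requested path back to the client.
    raise PermissionError("path is outside the allowed directories")


def chunks_for_file(conn: sqlite3.Connection, path: str) -> list[tuple[str, int]]:
    """Parameterized query: the path is bound via '?', never formatted into the SQL."""
    cur = conn.execute(
        "SELECT path, start_line FROM chunks WHERE path = ? ORDER BY start_line",
        (path,),
    )
    return cur.fetchall()
```

Resolving before comparing is the important step: a path such as root/../secret.txt looks as if it lies under root until ".." is collapsed.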

Performance

  • Index Size: ~1KB per chunk (embedding + metadata)
  • Search Latency: <100ms for 10K+ chunks
  • Quick Search: <10ms for filename/symbol search
  • Embedding Speed: ~3-4 seconds per chunk (sequential, local)
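For planning, the figures above give a rough estimate of what a first full index costs. The hypothetical helper below multiplies them out. It assumes the listed ~1KB per chunk, 3 to 4 seconds per chunk, and strictly sequential embedding, and it does not account for hardware differences.

```python
def index_estimate(chunks: int, kb_per_chunk: float = 1.0,
                   seconds_per_chunk: tuple[float, float] = (3.0, 4.0)) -> dict[str, float]:
    """Back-of-envelope index size (MB) and sequential embedding time (hours)."""
    low, high = seconds_per_chunk
    return {
        "index_mb": chunks * kb_per_chunk / 1000,
        "embed_hours_low": chunks * low / 3600,
        "embed_hours_high": chunks * high / 3600,
    }
```

For 10,000 chunks, this gives about 10 MB of index and roughly 8.3 to 11.1 hours of embedding time, so the initial build is best run unattended.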

Development

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=file_compass --cov-report=term-missing

# Type checking (optional)
mypy file_compass/

License

MIT License - see LICENSE for details.

Acknowledgments



Download files


Source Distribution

file_compass-0.1.0.tar.gz (71.9 kB)

Uploaded Source

Built Distribution


file_compass-0.1.0-py3-none-any.whl (45.8 kB)

Uploaded Python 3

File details

Details for the file file_compass-0.1.0.tar.gz.

File metadata

  • Download URL: file_compass-0.1.0.tar.gz
  • Size: 71.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for file_compass-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c30451badf6a5bf3c84573124dd3f21daec2a9519f48ccd937fb701042250908
MD5 bc379f135ff04c313c173400aed194ab
BLAKE2b-256 dda7724280b307ed41af63059121ad1bd6625c38aae97a3d7420452e12e5f991


Provenance

The following attestation bundles were made for file_compass-0.1.0.tar.gz:

Publisher: publish.yml on mikeyfrilot/file-compass

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file file_compass-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: file_compass-0.1.0-py3-none-any.whl
  • Size: 45.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for file_compass-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 72328c48280b698c8e34833b6902cdb927e4898f2d314cea179d2e616f6b9e2f
MD5 299debff0645094f1c5f44a246ca517d
BLAKE2b-256 7a9332b4c3ffcf5de45f4bbbef2acb850f6b471fbd41719fc2ff750474428e23


Provenance

The following attestation bundles were made for file_compass-0.1.0-py3-none-any.whl:

Publisher: publish.yml on mikeyfrilot/file-compass

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
