Self-hosted local file indexing MCP server with semantic search

mcp-trove-crunchtools

Self-hosted local file indexing MCP server with semantic search. Index any local directory (pCloud ~/AutoSync/, rclone mounts, ~/Documents/, anything) and search over the contents using hybrid vector + keyword search.

Features

  • Hybrid search — Combines semantic vector similarity with FTS5 keyword matching
  • Multiple file formats — PDF, DOCX, Markdown, plain text, source code
  • Local-first — No cloud services, no per-seat fees, your data stays on your machine
  • Lightweight embeddings — Uses fastembed (ONNX runtime) instead of PyTorch (~22MB vs ~2GB)
  • Incremental indexing — SHA-256 checksum-based change detection
  • Background mode — --index CLI mode for systemd timer automation
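
The README does not say how the vector and FTS5 rankings are merged into one result list. One common way to do it is reciprocal rank fusion (RRF), and the sketch below illustrates that technique only as an assumption. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample chunk IDs are hypothetical and not taken from mcp-trove.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Assumption: a generic fusion technique,
    not necessarily what mcp-trove does internally.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical chunk IDs, as if returned by a vector index and an FTS5 query.
vector_hits = ["chunk-7", "chunk-2", "chunk-9"]
keyword_hits = ["chunk-2", "chunk-4", "chunk-7"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Chunks that appear in both lists accumulate two contributions, so they tend to rise above chunks found by only one of the two searches.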

Install

uvx (recommended)

uvx mcp-trove-crunchtools

pip

pip install mcp-trove-crunchtools

Container

podman run -v trove-data:/data -v ~/Documents:/docs:ro quay.io/crunchtools/mcp-trove

Claude Code Integration

claude mcp add mcp-trove-crunchtools -- uvx mcp-trove-crunchtools

Tools (8)

Search (2)

Tool           Description
trove_search   Hybrid semantic + FTS5 search. Returns ranked chunks with file paths, scores, and content.
trove_similar  Find files similar to a given indexed file using its average embedding.
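
As a rough illustration of what "average embedding" could mean for trove_similar, the sketch below averages per-chunk vectors into one vector per file and ranks the other files by cosine similarity. The helper names `mean_vector` and `cosine_similarity`, the file names, and the toy 3-dimensional vectors are all hypothetical; the actual implementation may differ.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors (one per chunk)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy chunk embeddings for three hypothetical files.
files = {
    "notes.md": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "report.pdf": [[0.9, 0.1, 0.0]],
    "main.py": [[0.0, 0.0, 1.0], [0.0, 0.2, 0.8]],
}
averages = {path: mean_vector(chunks) for path, chunks in files.items()}

# Rank the other files by similarity to the query file's average embedding.
query = "notes.md"
ranked = sorted(
    (path for path in averages if path != query),
    key=lambda path: cosine_similarity(averages[query], averages[path]),
    reverse=True,
)
print(ranked)
```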

Index Management (3)

Tool           Description
trove_index    Index a specific file or directory. Skips unchanged files (checksum-based).
trove_reindex  Force re-index ignoring checksums. If no path given, reindexes everything.
trove_remove   Remove a file or directory from the index.
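
The checksum-based skip described for trove_index can be pictured as follows: hash each file with SHA-256 and compare the digest with the one recorded on the previous run. This is a minimal sketch, not the project's code. The names `sha256_of` and `needs_indexing` and the in-memory `known_checksums` dict are hypothetical; a real index would more likely store digests in its SQLite database.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, block_size=65536):
    """Hash a file in fixed-size blocks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_indexing(path, known_checksums):
    """Return True if the file is new or its content changed since last run.

    `known_checksums` is a hypothetical dict of path -> hex digest.
    """
    checksum = sha256_of(path)
    if known_checksums.get(str(path)) == checksum:
        return False
    known_checksums[str(path)] = checksum
    return True


# Demonstration with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "doc.txt"
    doc.write_text("first version")
    seen = {}
    first = needs_indexing(doc, seen)    # new file
    second = needs_indexing(doc, seen)   # unchanged content
    doc.write_text("second version")
    third = needs_indexing(doc, seen)    # content changed
```

A forced reindex, as trove_reindex describes, would correspond to skipping the comparison and always returning True.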

Status (3)

Tool              Description
trove_status      Index statistics: total files, chunks, disk usage, model info.
trove_list        List indexed files with metadata (size, type, chunk count).
trove_get_chunks  Show the text chunks for a specific indexed file.

Environment Variables

Variable                Default                            Description
TROVE_DB                ~/.local/share/mcp-trove/trove.db  SQLite database path
TROVE_PATHS             (none)                             Colon-separated directories to index in background mode
TROVE_INDEX_WORKERS     2                                  Concurrent embedding workers
TROVE_INDEX_BATCH       50                                 Files per indexing batch
TROVE_EMBEDDING_MODEL   BAAI/bge-small-en-v1.5             fastembed model name
TROVE_EXCLUDE_PATTERNS  *.iso,*.zip,...                    Glob patterns to skip
TROVE_CHUNK_SIZE        1000                               Characters per text chunk
TROVE_CHUNK_OVERLAP     200                                Overlap between chunks
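
To make the interaction between TROVE_CHUNK_SIZE and TROVE_CHUNK_OVERLAP concrete, here is a minimal sketch of fixed-size character windows with overlap, using the defaults 1000 and 200. It assumes the overlap is measured in characters and that chunks are plain sliding windows; the real splitter may respect paragraph or sentence boundaries. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, size=1000, overlap=200):
    """Split text into windows of `size` characters, each sharing
    `overlap` characters with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 2500
chunks = chunk_text(sample)
lengths = [len(chunk) for chunk in chunks]
print(lengths)
```

With these defaults each new chunk starts 800 characters after the previous one, so a 2500-character file yields three chunks.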

Background Indexing

To keep your index fresh, run the indexer in --index mode on a schedule, for example from a systemd timer:

TROVE_PATHS=~/Documents:~/AutoSync mcp-trove-crunchtools --index
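
In the command above the shell expands each ~ before the program starts. systemd unit files do not perform tilde expansion, so a value set there arrives unexpanded. The README does not say whether mcp-trove expands ~ itself; the sketch below only shows how a colon-separated value like TROVE_PATHS could be split and expanded. The function name `parse_paths` is hypothetical.

```python
import os
from pathlib import Path


def parse_paths(value):
    """Split a colon-separated list and expand a leading ~ in each entry."""
    return [Path(entry).expanduser() for entry in value.split(":") if entry]


# Fall back to an empty string when the variable is unset (default: none).
paths = parse_paths(os.environ.get("TROVE_PATHS", ""))

# The same value as in the command above, parsed without a shell.
example = parse_paths("~/Documents:~/AutoSync")
print([str(path) for path in example])
```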

License

AGPL-3.0-or-later

Download files

Source Distribution

mcp_trove_crunchtools-0.3.0.tar.gz (48.8 kB)

Uploaded Source

Built Distribution

mcp_trove_crunchtools-0.3.0-py3-none-any.whl (36.2 kB)

Uploaded Python 3

File details

Details for the file mcp_trove_crunchtools-0.3.0.tar.gz.

File metadata

  • Download URL: mcp_trove_crunchtools-0.3.0.tar.gz
  • Size: 48.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mcp_trove_crunchtools-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       e7cb32391f4ba4ad2e77eaf3548f036281c5da53b01df40b8dad8f83313b9e04
MD5          f3106ba3e521579db3ec9bfda65803bb
BLAKE2b-256  7292fbe622f23c5a34838f09a733efdb6cfad97e2f68e15bcca830d8012db535
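
A downloaded archive can be checked against the published SHA256 digest with the Python standard library. The following is a small sketch: the helper name `matches_sha256` is hypothetical, and the file check only runs if the archive happens to be in the current directory.

```python
import hashlib
import hmac
from pathlib import Path


def matches_sha256(data, expected_hex):
    """Compare the SHA-256 of `data` with a published hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest does a constant-time comparison of the two strings.
    return hmac.compare_digest(actual, expected_hex.lower())


# SHA256 digest listed above for the 0.3.0 source distribution.
EXPECTED = "e7cb32391f4ba4ad2e77eaf3548f036281c5da53b01df40b8dad8f83313b9e04"

archive = Path("mcp_trove_crunchtools-0.3.0.tar.gz")
if archive.exists():
    print("digest matches:", matches_sha256(archive.read_bytes(), EXPECTED))
```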

Provenance

The following attestation bundles were made for mcp_trove_crunchtools-0.3.0.tar.gz:

Publisher: publish.yml on crunchtools/mcp-trove

File details

Details for the file mcp_trove_crunchtools-0.3.0-py3-none-any.whl.

File hashes

Hashes for mcp_trove_crunchtools-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       b8da5de4e8abb83dd2e88a74419bdf5ff75cd5143c6deb763a161c2f0d9b4788
MD5          806e27bab4eac343eeb98bb760a9c6c3
BLAKE2b-256  eec69646a502dc0409ecdddcfe6f4b98ab83ba70c2e0a990339f8b5673ba500a

Provenance

The following attestation bundles were made for mcp_trove_crunchtools-0.3.0-py3-none-any.whl:

Publisher: publish.yml on crunchtools/mcp-trove
