mcp-trove-crunchtools
Self-hosted local file indexing MCP server with semantic search. Index any local directory (a pCloud ~/AutoSync/ folder, an rclone mount, ~/Documents/, or anything else) and search its contents with hybrid vector + keyword search.
Features
- Hybrid search: combines semantic vector similarity with FTS5 keyword matching
- Multiple file formats: PDF, DOCX, Markdown, plain text, source code
- Local-first: no cloud services, no per-seat fees; your data stays on your machine
- Lightweight embeddings: uses fastembed (ONNX Runtime) instead of PyTorch (~22 MB vs ~2 GB)
- Incremental indexing: SHA-256 checksum-based change detection
- Background mode: --index CLI mode for systemd timer automation
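The README does not say how the vector and keyword results are merged. As a minimal sketch, assuming reciprocal rank fusion (RRF) with the commonly used constant k = 60 (both are assumptions, not mcp-trove's documented behavior), two ranked lists of chunk IDs could be combined like this:

```python
# Minimal sketch: merge a vector-similarity ranking with an FTS5 keyword
# ranking using reciprocal rank fusion (RRF).
# ASSUMPTION: mcp-trove's actual fusion method is not documented; RRF and
# k=60 are illustrative choices, and the chunk IDs below are hypothetical.
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); chunks ranked highly in
            # several lists accumulate the largest combined scores.
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# One list ordered by embedding similarity, the other by FTS5 keyword relevance.
vector_hits = ["a", "b", "c"]
keyword_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print([chunk_id for chunk_id, _ in fused])  # ['a', 'c', 'b', 'd']
```

Chunks that appear in both lists (here a and c) rise to the top, so exact keyword hits and semantically close passages reinforce each other.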
Install
uvx (recommended)
uvx mcp-trove-crunchtools
pip
pip install mcp-trove-crunchtools
Container
podman run -v trove-data:/data -v ~/Documents:/docs:ro quay.io/crunchtools/mcp-trove
Claude Code Integration
claude mcp add mcp-trove-crunchtools -- uvx mcp-trove-crunchtools
Tools (8)
Search (2)
| Tool | Description |
|---|---|
| trove_search | Hybrid semantic + FTS5 search. Returns ranked chunks with file paths, scores, and content. |
| trove_similar | Find files similar to a given indexed file using its average embedding. |
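trove_similar is described as using a file's average embedding. A minimal sketch of that idea, assuming mean pooling over a file's chunk vectors and cosine similarity between files (the toy 2-dimensional vectors, file names, and helper functions are hypothetical; real BAAI/bge-small-en-v1.5 vectors have 384 dimensions):

```python
# Minimal sketch: rank indexed files by cosine similarity between their
# average (mean) chunk embeddings.
# ASSUMPTION: cosine similarity is an illustrative choice; the vectors and
# file names are toy data, not real fastembed output.
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise average of a file's chunk embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_files(target: str, chunk_embeddings: dict[str, list[list[float]]]) -> list[tuple[str, float]]:
    """Return the other files ordered by similarity to the target file."""
    centroids = {path: mean_vector(vecs) for path, vecs in chunk_embeddings.items()}
    query = centroids[target]
    ranked = [(path, cosine(query, vec)) for path, vec in centroids.items() if path != target]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


index = {
    "notes/tax-2024.md": [[1.0, 0.0], [0.8, 0.2]],
    "notes/tax-2023.md": [[0.9, 0.1]],
    "code/parser.py": [[0.0, 1.0], [0.1, 0.9]],
}
print(similar_files("notes/tax-2024.md", index))
```

Averaging gives one vector per file, so comparing files costs a single similarity computation per pair regardless of how many chunks each file has.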
Index Management (3)
| Tool | Description |
|---|---|
| trove_index | Index a specific file or directory. Skips unchanged files (checksum-based). |
| trove_reindex | Force re-index ignoring checksums. If no path given, reindexes everything. |
| trove_remove | Remove a file or directory from the index. |
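trove_index skips unchanged files by checksum, while trove_reindex ignores checksums. A minimal sketch of that logic, assuming SHA-256 digests (per the Features list) kept in a plain dict that stands in for whatever mcp-trove actually stores in its SQLite database; files_to_index and its force flag are hypothetical names:

```python
# Minimal sketch of checksum-based change detection.
# ASSUMPTION: the in-memory `known` dict stands in for mcp-trove's SQLite
# bookkeeping, which the README does not describe.
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash the file in blocks so large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_index(paths: list[Path], known: dict[str, str], force: bool = False) -> list[Path]:
    """Return new or changed files; force=True mimics a reindex that ignores checksums."""
    changed = []
    for path in paths:
        checksum = sha256_of(path)
        if force or known.get(str(path)) != checksum:
            changed.append(path)
            known[str(path)] = checksum  # record the digest of the version being indexed
    return changed


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "doc.txt"
    doc.write_text("hello")
    known: dict[str, str] = {}
    first = files_to_index([doc], known)    # new file: needs indexing
    second = files_to_index([doc], known)   # unchanged: skipped
    doc.write_text("hello, world")
    third = files_to_index([doc], known)    # content changed: indexed again
    forced = files_to_index([doc], known, force=True)  # checksums ignored
    print(len(first), len(second), len(third), len(forced))  # 1 0 1 1
```

Hashing content rather than comparing modification times means a file that is touched but not edited (common with sync tools) is still skipped.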
Status (3)
| Tool | Description |
|---|---|
| trove_status | Index statistics: total files, chunks, disk usage, model info. |
| trove_list | List indexed files with metadata (size, type, chunk count). |
| trove_get_chunks | Show the text chunks for a specific indexed file. |
Environment Variables
| Variable | Default | Description |
|---|---|---|
| TROVE_DB | ~/.local/share/mcp-trove/trove.db | SQLite database path |
| TROVE_PATHS | (none) | Colon-separated directories to index in background mode |
| TROVE_INDEX_WORKERS | 2 | Concurrent embedding workers |
| TROVE_INDEX_BATCH | 50 | Files per indexing batch |
| TROVE_EMBEDDING_MODEL | BAAI/bge-small-en-v1.5 | fastembed model name |
| TROVE_EXCLUDE_PATTERNS | `*.iso,*.zip,...` | Comma-separated glob patterns to skip |
| TROVE_CHUNK_SIZE | 1000 | Characters per text chunk |
| TROVE_CHUNK_OVERLAP | 200 | Characters of overlap between consecutive chunks |
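To see how TROVE_CHUNK_SIZE and TROVE_CHUNK_OVERLAP interact, here is a minimal sliding-window sketch. It assumes plain character windows; mcp-trove may instead respect paragraph or sentence boundaries, and chunk_text is a hypothetical helper:

```python
# Minimal sketch of fixed-size character chunking with overlap.
# ASSUMPTION: a plain sliding window; the real chunker may split on
# paragraph or sentence boundaries instead.
import os


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters sharing `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # each chunk starts `step` characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Read the same settings the server uses, falling back to the documented defaults.
size = int(os.environ.get("TROVE_CHUNK_SIZE", "1000"))
overlap = int(os.environ.get("TROVE_CHUNK_OVERLAP", "200"))
print([len(c) for c in chunk_text("x" * 2500, size, overlap)])
```

With the defaults, each chunk starts 800 characters after the previous one, so a 2,500-character document yields three chunks of 1,000, 1,000 and 900 characters, and neighboring chunks share 200 characters of context.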
Background Indexing
Run the indexer in background mode to refresh the index; scheduling this command with a systemd timer keeps your index fresh:
TROVE_PATHS=~/Documents:~/AutoSync mcp-trove-crunchtools --index
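The --index run reads TROVE_PATHS. A minimal sketch of how such a run might expand that colon-separated list and apply TROVE_EXCLUDE_PATTERNS, assuming the patterns are comma-separated globs matched against file names; parse_paths, parse_patterns and candidate_files are hypothetical, and the fallback pattern list uses only the two patterns visible in the documented default:

```python
# Minimal sketch: turn TROVE_PATHS and TROVE_EXCLUDE_PATTERNS into a list
# of candidate files.
# ASSUMPTIONS: excludes are comma-separated globs matched against file
# names with fnmatch; mcp-trove's real directory walker may differ.
import fnmatch
import os
from pathlib import Path


def parse_paths(value: str) -> list[Path]:
    """Split a colon-separated TROVE_PATHS value, expanding a leading ~."""
    return [Path(p).expanduser() for p in value.split(":") if p]


def parse_patterns(value: str) -> list[str]:
    """Split a comma-separated pattern list, ignoring blanks."""
    return [p.strip() for p in value.split(",") if p.strip()]


def candidate_files(roots: list[Path], excludes: list[str]) -> list[Path]:
    """Walk each root and keep files whose names match no exclude pattern."""
    found = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if any(fnmatch.fnmatch(name, pattern) for pattern in excludes):
                    continue  # skip excluded types such as disk images and archives
                found.append(Path(dirpath) / name)
    return sorted(found)


if __name__ == "__main__":
    roots = parse_paths(os.environ.get("TROVE_PATHS", ""))
    # Hypothetical fallback: only the two patterns shown in the README's elided default.
    excludes = parse_patterns(os.environ.get("TROVE_EXCLUDE_PATTERNS", "*.iso,*.zip"))
    for path in candidate_files(roots, excludes):
        print(path)
```

Expanding ~ in code as well as in the shell keeps the behavior the same when the variable is set somewhere that performs no tilde expansion, such as an Environment= line in a systemd unit.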
License
AGPL-3.0-or-later
Download files
File details
Details for the file mcp_trove_crunchtools-0.3.0.tar.gz.
File metadata
- Download URL: mcp_trove_crunchtools-0.3.0.tar.gz
- Upload date:
- Size: 48.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7cb32391f4ba4ad2e77eaf3548f036281c5da53b01df40b8dad8f83313b9e04 |
| MD5 | f3106ba3e521579db3ec9bfda65803bb |
| BLAKE2b-256 | 7292fbe622f23c5a34838f09a733efdb6cfad97e2f68e15bcca830d8012db535 |
Provenance
The following attestation bundles were made for mcp_trove_crunchtools-0.3.0.tar.gz:
- Publisher: publish.yml on crunchtools/mcp-trove
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: mcp_trove_crunchtools-0.3.0.tar.gz
  - Subject digest: e7cb32391f4ba4ad2e77eaf3548f036281c5da53b01df40b8dad8f83313b9e04
  - Sigstore transparency entry: 1128020600
  - Sigstore integration time:
- Permalink: crunchtools/mcp-trove@def93f9f6cd68c2d45aba0aea085296853e29e05
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/crunchtools
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@def93f9f6cd68c2d45aba0aea085296853e29e05
- Trigger Event: release
File details
Details for the file mcp_trove_crunchtools-0.3.0-py3-none-any.whl.
File metadata
- Download URL: mcp_trove_crunchtools-0.3.0-py3-none-any.whl
- Upload date:
- Size: 36.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8da5de4e8abb83dd2e88a74419bdf5ff75cd5143c6deb763a161c2f0d9b4788 |
| MD5 | 806e27bab4eac343eeb98bb760a9c6c3 |
| BLAKE2b-256 | eec69646a502dc0409ecdddcfe6f4b98ab83ba70c2e0a990339f8b5673ba500a |
Provenance
The following attestation bundles were made for mcp_trove_crunchtools-0.3.0-py3-none-any.whl:
- Publisher: publish.yml on crunchtools/mcp-trove
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: mcp_trove_crunchtools-0.3.0-py3-none-any.whl
  - Subject digest: b8da5de4e8abb83dd2e88a74419bdf5ff75cd5143c6deb763a161c2f0d9b4788
  - Sigstore transparency entry: 1128020855
  - Sigstore integration time:
- Permalink: crunchtools/mcp-trove@def93f9f6cd68c2d45aba0aea085296853e29e05
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/crunchtools
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@def93f9f6cd68c2d45aba0aea085296853e29e05
- Trigger Event: release