Local RAG-based semantic document search with MCP server interface

Maintainers

Chetic

ChunkSilo MCP Server

ChunkSilo is like a local Google for your documents. It uses semantic search — matching by meaning rather than exact keywords — so your LLM can find relevant information across all your files even when the wording differs from your query. Point it at your PDFs, Word docs, Markdown, and text files, and it builds a fully searchable index locally on your machine.

Runs entirely on your machine — no servers, no infrastructure
Semantic search + keyword filename matching across PDF, DOCX, DOC, Markdown, and TXT
Incremental indexing — only reprocesses new or changed files
Heading-aware results with source links back to the original file
Date filtering and recency boosting
Optional Confluence and Jira integration
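
Incremental indexing is typically built on some form of change detection. The sketch below is one possible approach and is not taken from ChunkSilo's source: it assumes a content hash per file is stored between runs, and `file_digest`, `files_to_reindex`, and the `state` dictionary are hypothetical names used only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used here as the change marker."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def files_to_reindex(paths: list[Path], state: dict[str, str]) -> list[Path]:
    """Return files whose digest differs from the recorded one, updating state."""
    changed = []
    for path in paths:
        digest = file_digest(path)
        if state.get(str(path)) != digest:
            changed.append(path)
            state[str(path)] = digest
    return changed

with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "notes.md"
    doc.write_text("# Notes\n")
    state: dict[str, str] = {}
    first = files_to_reindex([doc], state)    # new file: needs indexing
    second = files_to_reindex([doc], state)   # unchanged: skipped
    doc.write_text("# Notes\nupdated\n")
    third = files_to_reindex([doc], state)    # edited: needs indexing again
    print(len(first), len(second), len(third))
```

A real indexer might compare modification times or sizes first and hash only when those differ, which avoids reading every file on every run.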

Example `search_docs` output

{
  "matched_files": [
    { "uri": "file:///docs/database-configuration.docx", "score": 0.8432 }
  ],
  "num_matched_files": 1,
  "chunks": [
    {
      "text": "To configure the database connection, set the DATABASE_URL environment variable...",
      "score": 0.912,
      "location": {
        "uri": "file:///docs/setup-guide.pdf",
        "page": 12,
        "line": null,
        "heading_path": ["Getting Started", "Configuration", "Database"]
      }
    }
  ],
  "num_chunks": 1,
  "query": "how to configure the database",
  "retrieval_time": "0.42s"
}
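
A client that consumes this result may want to flatten it for display. The sketch below is illustrative only: it assumes the result arrives as a JSON string shaped like the example above, and `summarize_chunks` is a hypothetical helper, not part of ChunkSilo.

```python
import json

# Trimmed copy of the example output above.
EXAMPLE = """
{
  "chunks": [
    {
      "text": "To configure the database connection, set the DATABASE_URL environment variable...",
      "score": 0.912,
      "location": {
        "uri": "file:///docs/setup-guide.pdf",
        "page": 12,
        "line": null,
        "heading_path": ["Getting Started", "Configuration", "Database"]
      }
    }
  ],
  "num_chunks": 1
}
"""

def summarize_chunks(raw: str) -> list[str]:
    """Return one 'score  uri (p. N)  heading path' line per chunk."""
    result = json.loads(raw)
    lines = []
    for chunk in result.get("chunks", []):
        loc = chunk["location"]
        where = loc["uri"]
        if loc.get("page") is not None:
            where += f" (p. {loc['page']})"
        headings = " > ".join(loc.get("heading_path") or [])
        lines.append(f"{chunk['score']:.3f}  {where}  {headings}".rstrip())
    return lines

for line in summarize_chunks(EXAMPLE):
    print(line)
```

The `page` and `line` fields can be `null`, so the helper checks for `None` before formatting a location.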

Installation

Option A: Install from PyPI (Recommended)

Requires Python 3.11 or later. Models are downloaded automatically on first run (~250MB). The first run may appear to pause while models download — this is normal.

pip install chunksilo

# Or with Confluence support:
pip install chunksilo[confluence]

# Or with Jira support:
pip install chunksilo[jira]

# Or with both Confluence and Jira:
pip install chunksilo[confluence,jira]

Then:

Create a config file at ~/.config/chunksilo/config.yaml (see Configuration)
Build the index: chunksilo --build-index
Configure your MCP client (see MCP Client Configuration)

Option B: Offline Bundle

A self-contained package with pre-downloaded models, ideal for air-gapped environments or systems without Python installed.

Download from the Releases page:

Download the chunksilo-vX.Y.Z-manylinux_2_34_x86_64.tar.gz file
Extract and install:

tar -xzf chunksilo-vX.Y.Z-manylinux_2_34_x86_64.tar.gz
cd chunksilo
./setup.sh

Edit config.yaml to set your document directories
Build the index: ./venv/bin/chunksilo --build-index
Configure your MCP client (see MCP Client Configuration)

Configuration

ChunkSilo uses a single configuration file: config.yaml

Configuration File

Edit config.yaml to configure your settings:

# Indexing settings - used by chunksilo --build-index
indexing:
  directories:
    - "./data"
    - "/mnt/nfs/shared-docs"
    - path: "/mnt/samba/engineering"
      include: ["**/*.pdf", "**/*.md"]
      exclude: ["**/archive/**"]
  chunk_size: 1600
  chunk_overlap: 200

# Retrieval settings - used when searching
retrieval:
  embed_top_k: 20
  rerank_top_k: 5
  score_threshold: 0.1

# Confluence integration (optional)
confluence:
  url: "https://confluence.example.com"
  username: "your-username"
  api_token: "your-api-token"

# Storage paths (usually don't need to change)
storage:
  storage_dir: "./storage"
  model_cache_dir: "./models"

All settings are optional and have sensible defaults.
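
To illustrate what "optional with defaults" can mean in practice, here is a minimal sketch of overlaying a partial user config on a set of defaults. It assumes a recursive merge, which may differ from how ChunkSilo actually resolves settings; `merge_config` is a hypothetical helper, and the default values are the ones listed in the Configuration Reference.

```python
from copy import deepcopy

# Defaults copied from the Configuration Reference tables (subset).
DEFAULTS = {
    "indexing": {"directories": ["./data"], "chunk_size": 1600, "chunk_overlap": 200},
    "retrieval": {"embed_top_k": 20, "rerank_top_k": 5, "score_threshold": 0.1},
}

def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user settings on defaults without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged

# A user config that overrides a single retrieval setting.
user_config = {"retrieval": {"rerank_top_k": 10}}
config = merge_config(DEFAULTS, user_config)
print(config["retrieval"])
```

With this approach a config file only needs to list the settings that differ from the defaults.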

Configuration Reference

Indexing Settings

Setting	Default	Description
`indexing.directories`	`["./data"]`	List of directories to index (strings or objects)
`indexing.chunk_size`	`1600`	Maximum size of text chunks
`indexing.chunk_overlap`	`200`	Overlap between adjacent chunks
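
The relationship between `chunk_size` and `chunk_overlap` can be pictured as a sliding window. The sketch below assumes both values are measured in characters and ignores heading boundaries; the README does not state the unit or the splitting strategy, so treat `chunk_text` as a hypothetical illustration rather than ChunkSilo's splitter.

```python
def chunk_text(text: str, chunk_size: int = 1600, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

chunks = chunk_text("x" * 4000)
print([len(c) for c in chunks])
```

Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.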

Per-directory options (when using object format):

Option	Default	Description
`path`	(required)	Directory path to index
`include`	`["**/*.pdf", "**/*.md", "**/*.txt", "**/*.docx", "**/*.doc"]`	Glob patterns for files to include
`exclude`	`[]`	Glob patterns for files to exclude
`recursive`	`true`	Whether to recurse into subdirectories
`enabled`	`true`	Whether to index this directory
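
As a rough illustration of how `include` and `exclude` patterns interact, the sketch below filters a list of relative paths. It assumes that `**/` matches zero or more directories and that `*` stays within one path segment; ChunkSilo's actual matching rules may differ, and `glob_to_regex` and `select_files` are hypothetical helpers.

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with '**' support into an anchored regular expression."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")

def select_files(paths: list[str], include: list[str], exclude: list[str] = ()) -> list[str]:
    """Keep paths that match any include pattern and no exclude pattern."""
    inc = [glob_to_regex(p) for p in include]
    exc = [glob_to_regex(p) for p in exclude]
    return [p for p in paths
            if any(r.match(p) for r in inc) and not any(r.match(p) for r in exc)]

paths = ["guide.pdf", "notes/todo.md", "archive/old.pdf", "img/logo.png"]
print(select_files(paths, ["**/*.pdf", "**/*.md"], ["**/archive/**"]))
```

In this sketch an exclude match always wins over an include match, which is the usual convention for include/exclude filters.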

Retrieval Settings

Setting	Default	Description
`retrieval.embed_model_name`	`BAAI/bge-small-en-v1.5`	Embedding model for vector search
`retrieval.embed_top_k`	`20`	Candidates from vector search before reranking
`retrieval.rerank_model_name`	`ms-marco-MiniLM-L-12-v2`	Reranker model
`retrieval.rerank_top_k`	`5`	Final results after reranking
`retrieval.rerank_candidates`	`100`	Maximum candidates sent to reranker
`retrieval.score_threshold`	`0.1`	Minimum score (0.0-1.0) for results
`retrieval.recency_boost`	`0.3`	Recency boost weight (0.0-1.0)
`retrieval.recency_half_life_days`	`365`	Days until recency boost halves
`retrieval.bm25_similarity_top_k`	`10`	Files returned by BM25 filename search
`retrieval.offline`	`false`	Prevent ML library network requests
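
The settings above suggest a multi-stage pipeline: vector search, reranking, recency boosting, and a score cutoff. The sketch below shows how such stages could fit together, using precomputed toy scores in place of real models. The half-life decay follows from the description of `recency_half_life_days`, but the way recency is blended into the final score is an assumption, as are the `Candidate` class and the `retrieve` function.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    embed_score: float   # similarity from the vector search stage
    rerank_score: float  # relevance from the reranker stage, 0.0-1.0
    age_days: float      # days since the source file was last modified

def recency_factor(age_days: float, half_life_days: float = 365) -> float:
    """Exponential decay: 1.0 for a brand-new file, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)

def retrieve(candidates, embed_top_k=20, rerank_top_k=5,
             score_threshold=0.1, recency_boost=0.3, half_life_days=365):
    # Stage 1: keep the best embed_top_k candidates by vector similarity.
    shortlist = sorted(candidates, key=lambda c: c.embed_score, reverse=True)[:embed_top_k]

    # Stage 2: blend the reranker score with recency (hypothetical formula).
    def final_score(c: Candidate) -> float:
        decay = recency_factor(c.age_days, half_life_days)
        return c.rerank_score * ((1 - recency_boost) + recency_boost * decay)

    ranked = sorted(shortlist, key=final_score, reverse=True)

    # Stage 3: drop results below the threshold, then keep rerank_top_k.
    kept = [(c.text, round(final_score(c), 3)) for c in ranked
            if final_score(c) >= score_threshold]
    return kept[:rerank_top_k]

candidates = [
    Candidate("fresh", embed_score=0.80, rerank_score=0.60, age_days=0),
    Candidate("old", embed_score=0.90, rerank_score=0.60, age_days=730),
    Candidate("weak", embed_score=0.70, rerank_score=0.05, age_days=0),
]
results = retrieve(candidates)
print(results)
```

In this toy run the two-year-old document ranks below the fresh one despite an equal reranker score, and the weak match is dropped by `score_threshold`.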

Confluence Settings (optional)

Note: Confluence integration requires the optional dependency. Install with: pip install chunksilo[confluence]

Setting	Default	Description
`confluence.url`	`""`	Confluence base URL (empty = disabled)
`confluence.username`	`""`	Confluence username
`confluence.api_token`	`""`	Confluence API token
`confluence.timeout`	`10.0`	Request timeout in seconds
`confluence.max_results`	`30`	Maximum results per search

Jira Settings (optional)

Note: Jira integration requires the optional dependency. Install with: pip install chunksilo[jira]

Setting	Default	Description
`jira.url`	`""`	Jira base URL (empty = disabled)
`jira.username`	`""`	Jira username/email
`jira.api_token`	`""`	Jira API token
`jira.timeout`	`10.0`	Request timeout in seconds
`jira.max_results`	`30`	Maximum results per search
`jira.projects`	`[]`	Project keys to search (empty = all)
`jira.include_comments`	`true`	Include issue comments in search
`jira.include_custom_fields`	`true`	Include custom fields in search

Creating a Jira API Token:

Log into Jira
Go to Account Settings > Security > API Tokens
Click "Create API Token"
Copy the token and add it to your config

SSL Settings (optional)

Setting	Default	Description
`ssl.ca_bundle_path`	`""`	Path to custom CA bundle file

Storage Settings

Setting	Default	Description
`storage.storage_dir`	`./storage`	Directory for vector index and state
`storage.model_cache_dir`	`./models`	Directory for model cache

CLI Usage

The chunksilo command provides indexing, searching, and model management:

# Build or update the search index
chunksilo --build-index

# Search for documents
chunksilo "your search query"

# Search with date filtering
chunksilo "quarterly report" --date-from 2024-01-01 --date-to 2024-03-31

# Output results as JSON
chunksilo "search query" --json

# Show verbose output (model loading, search stats)
chunksilo "search query" --verbose

# Pre-download ML models (useful before going offline)
chunksilo --download-models

# Use a custom config file
chunksilo --build-index --config /path/to/config.yaml

CLI Options

Option	Description
`query`	Search query text (positional argument)
`--build-index`	Build or update the search index, then exit
`--download-models`	Download required ML models, then exit
`--date-from`	Start date filter (YYYY-MM-DD format, inclusive)
`--date-to`	End date filter (YYYY-MM-DD format, inclusive)
`--json`	Output results as JSON instead of formatted text
`-v, --verbose`	Show diagnostic messages (model loading, search stats)
`--config`	Path to config.yaml (overrides auto-discovery)
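
For scripting, the CLI can be driven from Python. The sketch below builds a command line from the options listed above and parses the result. It assumes that `--json` writes a single JSON document to stdout and that `--config` can be combined with a search query; `build_search_command` and `run_search` are hypothetical wrappers, not part of ChunkSilo.

```python
import json
import subprocess
from datetime import date

def build_search_command(query, date_from=None, date_to=None, config=None):
    """Assemble a chunksilo search invocation using only the documented flags."""
    cmd = ["chunksilo", query, "--json"]
    for flag, value in (("--date-from", date_from), ("--date-to", date_to)):
        if value is not None:
            date.fromisoformat(value)  # raises ValueError for invalid ISO dates
            cmd += [flag, value]
    if config is not None:
        cmd += ["--config", config]
    return cmd

def run_search(query, **kwargs):
    """Run the search and parse stdout; needs chunksilo on PATH and a built index."""
    proc = subprocess.run(build_search_command(query, **kwargs),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)

print(build_search_command("quarterly report",
                           date_from="2024-01-01", date_to="2024-03-31"))
```

Passing the arguments as a list avoids shell quoting problems when the query contains spaces or special characters.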

MCP Client Configuration

Configure your MCP client to run ChunkSilo. Below are examples for common clients.

Note: For PyPI installs, use `chunksilo-mcp` directly. For offline bundles, use the full path `/path/to/chunksilo/venv/bin/chunksilo-mcp`. To locate the PyPI-installed binary, run `which chunksilo-mcp`.

Claude Code

Add chunksilo as an MCP server using the CLI:

PyPI install:

claude mcp add chunksilo --scope user -- chunksilo-mcp --config ~/.config/chunksilo/config.yaml

Offline bundle:

claude mcp add chunksilo --scope user -- /path/to/chunksilo/venv/bin/chunksilo-mcp --config /path/to/chunksilo/config.yaml

Verify it's connected:

claude mcp list

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

PyPI install:

{
  "mcpServers": {
    "chunksilo": {
      "command": "chunksilo-mcp",
      "args": ["--config", "/path/to/config.yaml"]
    }
  }
}

Offline bundle:

{
  "mcpServers": {
    "chunksilo": {
      "command": "/path/to/chunksilo/venv/bin/chunksilo-mcp",
      "args": ["--config", "/path/to/chunksilo/config.yaml"]
    }
  }
}
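
If the config file already lists other MCP servers, the chunksilo entry has to be merged in rather than pasted over them. The sketch below shows one way to do that with the standard library. `add_chunksilo_server` is a hypothetical helper, and the demonstration writes to a temporary file rather than the real Claude Desktop config.

```python
import json
import tempfile
from pathlib import Path

def add_chunksilo_server(config_path: Path, command: str, chunksilo_config: str) -> dict:
    """Insert or update the "chunksilo" entry, preserving other servers."""
    settings = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = settings.setdefault("mcpServers", {})
    servers["chunksilo"] = {"command": command, "args": ["--config", chunksilo_config]}
    config_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings

# Demonstrated on a temporary file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-mcp"}}}))
    updated = add_chunksilo_server(path, "chunksilo-mcp", "/path/to/config.yaml")
    print(sorted(updated["mcpServers"]))
```

The same structure applies to the Cline and Roo Code settings files, which use the same `mcpServers` key.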

Cline (VS Code Extension)

Add to cline_mcp_settings.json (typically in ~/.config/Code/User/globalStorage/saoudrizwan.claude-dev/settings/):

PyPI install:

{
  "mcpServers": {
    "chunksilo": {
      "command": "chunksilo-mcp",
      "args": ["--config", "/path/to/config.yaml"],
      "disabled": false,
      "autoApprove": []
    }
  }
}

Offline bundle:

{
  "mcpServers": {
    "chunksilo": {
      "command": "/path/to/chunksilo/venv/bin/chunksilo-mcp",
      "args": ["--config", "/path/to/chunksilo/config.yaml"],
      "disabled": false,
      "autoApprove": []
    }
  }
}

Roo Code (VS Code Extension)

Add to mcp_settings.json (typically in ~/.config/Code/User/globalStorage/rooveterinaryinc.roo-cline/settings/):

PyPI install:

{
  "mcpServers": {
    "chunksilo": {
      "command": "chunksilo-mcp",
      "args": ["--config", "/path/to/config.yaml"]
    }
  }
}

Offline bundle:

{
  "mcpServers": {
    "chunksilo": {
      "command": "/path/to/chunksilo/venv/bin/chunksilo-mcp",
      "args": ["--config", "/path/to/chunksilo/config.yaml"]
    }
  }
}

Troubleshooting

Index missing: Run chunksilo --build-index (PyPI install) or ./venv/bin/chunksilo --build-index (offline bundle).
Retrieval errors: Check that the command and --config paths in your MCP client configuration are correct.
Offline mode: PyPI installs default to retrieval.offline: false, so models download automatically on first run. The offline bundle ships with pre-downloaded models and sets retrieval.offline: true. After the initial model download, set retrieval.offline: true in config.yaml to prevent further network calls.
Confluence Integration: Install with pip install chunksilo[confluence], then set confluence.url, confluence.username, and confluence.api_token in config.yaml.
Jira Integration: Install with pip install chunksilo[jira], then set jira.url, jira.username, and jira.api_token in config.yaml. Optionally configure jira.projects to restrict search to specific project keys.
Custom CA Bundle: Set ssl.ca_bundle_path in config.yaml for custom certificates.
Network mounts: Unavailable directories are skipped with a warning; indexing continues with available directories.
Legacy .doc files: LibreOffice must be installed so that .doc files can be converted to .docx automatically. If LibreOffice is not found, .doc files are skipped with a warning. Converted files support full heading extraction.

License

Apache-2.0. See LICENSE for details.

Release history

2.3.3 (Feb 25, 2026)
2.3.2 (Feb 17, 2026)
2.3.1 (Feb 13, 2026)
2.3.0 (Feb 11, 2026)
2.2.0 (Feb 3, 2026)
2.1.3 (Feb 3, 2026)
2.1.2 (Feb 3, 2026)
2.1.1 (Feb 3, 2026)
2.1.0 (Feb 3, 2026), this version
2.0.0 (Jan 25, 2026)

Download files

Download the file for your platform.

Source Distribution

chunksilo-2.1.0.tar.gz (71.5 kB)

Uploaded Feb 3, 2026 Source

Built Distribution

chunksilo-2.1.0-py3-none-any.whl (45.0 kB)

Uploaded Feb 3, 2026 Python 3

File details

Details for the file chunksilo-2.1.0.tar.gz.

File metadata

Download URL: chunksilo-2.1.0.tar.gz
Upload date: Feb 3, 2026
Size: 71.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chunksilo-2.1.0.tar.gz
Algorithm	Hash digest
SHA256	`9bf13d437a061823a3ba162b5d0bb3316b14fad45afb49819ff23edb021e1d96`
MD5	`b875b1c6a30038510af85640210bf5ec`
BLAKE2b-256	`4f87bc34d8731f5065feec67a46a33703517f48807bdc1a40bc47f2b2481de7a`
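
A downloaded archive can be checked against the published SHA256 digest with the standard library. The sketch below assumes the source distribution has been saved in the current directory under its original name; `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for chunksilo-2.1.0.tar.gz.
EXPECTED_SHA256 = "9bf13d437a061823a3ba162b5d0bb3316b14fad45afb49819ff23edb021e1d96"

def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash the file in blocks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()

archive = Path("chunksilo-2.1.0.tar.gz")
if archive.exists():
    print("OK" if verify(archive) else "MISMATCH")
```

Alternatively, pip's hash-checking mode (`--require-hashes` with a requirements file) performs the same comparison at install time.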

Provenance

The following attestation bundles were made for chunksilo-2.1.0.tar.gz:

Publisher: manual-release.yml on Chetic/chunksilo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chunksilo-2.1.0.tar.gz
- Subject digest: 9bf13d437a061823a3ba162b5d0bb3316b14fad45afb49819ff23edb021e1d96
- Sigstore transparency entry: 908131319
- Sigstore integration time: Feb 3, 2026
Source repository:
- Permalink: Chetic/chunksilo@4588984cc7dfb000e774b7080f935d69f83b4f70
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Chetic
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: manual-release.yml@4588984cc7dfb000e774b7080f935d69f83b4f70
- Trigger Event: workflow_dispatch

File details

Details for the file chunksilo-2.1.0-py3-none-any.whl.

File metadata

Download URL: chunksilo-2.1.0-py3-none-any.whl
Upload date: Feb 3, 2026
Size: 45.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chunksilo-2.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d070d378e9d9cf13069c04ec1ed10384f0eac7df148056dc66d4309f3a764cc4`
MD5	`055aee6fa767d4ce2296423d53777665`
BLAKE2b-256	`438c3dd25b370318af39f8ea24cf75f36741b370fa4d18a70c5d1b253dd0774a`

Provenance

The following attestation bundles were made for chunksilo-2.1.0-py3-none-any.whl:

Publisher: manual-release.yml on Chetic/chunksilo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: chunksilo-2.1.0-py3-none-any.whl
- Subject digest: d070d378e9d9cf13069c04ec1ed10384f0eac7df148056dc66d4309f3a764cc4
- Sigstore transparency entry: 908131344
- Sigstore integration time: Feb 3, 2026
Source repository:
- Permalink: Chetic/chunksilo@4588984cc7dfb000e774b7080f935d69f83b4f70
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Chetic
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: manual-release.yml@4588984cc7dfb000e774b7080f935d69f83b4f70
- Trigger Event: workflow_dispatch
