Zero-config MCP server for searchable documentation (SQLite default, PostgreSQL optional)
Gnosis MCP
Give your AI agent a searchable knowledge base. Zero config.
Quick Start · Web Crawl · Backends · Editor Setup · Tools & Resources · Configuration · Full Reference
Ingest docs → Search with highlights → Stats overview → Serve to AI agents
Gnosis MCP turns a folder of docs into a searchable knowledge base for AI agents. Run ingest, run serve, and any MCP-compatible editor can query your documentation. SQLite by default (zero config). PostgreSQL + pgvector when you need scale.
Why use this
Less hallucination, lower token costs. A search returns ranked excerpts (~600 tokens) instead of feeding entire files into context (3,000-8,000+ tokens each). Architecture decisions, API contracts, billing rules — one tool call away instead of guesswork.
Docs that stay current. Add a new markdown file, run ingest, it's searchable immediately. Or use --watch to auto-re-ingest on file changes. No routing tables to maintain, no hardcoded paths to update.
Works with what you have. Ingests .md, .txt, .ipynb, .toml, .csv, .json (stdlib only), plus optional .rst and .pdf. Non-markdown formats are auto-converted for chunking.
Quick Start
pip install gnosis-mcp
gnosis-mcp ingest ./docs/ # loads docs, auto-creates SQLite database
gnosis-mcp serve # starts MCP server
That's it. Your AI agent can now search your docs.
Want semantic search? Add local ONNX embeddings (no API key needed, ~23MB model):
pip install gnosis-mcp[embeddings]
gnosis-mcp ingest ./docs/ --embed # ingest + embed in one step
gnosis-mcp serve # hybrid keyword+semantic search auto-activated
Test it before connecting to an editor:
gnosis-mcp search "getting started" # keyword search
gnosis-mcp search "how does auth work" --embed # hybrid semantic+keyword
gnosis-mcp stats # see what was indexed
Try without installing (uvx)
uvx gnosis-mcp ingest ./docs/
uvx gnosis-mcp serve
Crawl Documentation Sites
Dry-run discovery → Crawl & ingest → Search crawled docs → SSRF protection
Ingest docs from any website — no files needed:
pip install gnosis-mcp[web]
# Crawl via sitemap (best for large doc sites)
gnosis-mcp crawl https://docs.stripe.com/ --sitemap
# Depth-limited link crawl with URL filter
gnosis-mcp crawl https://fastapi.tiangolo.com/ --depth 2 --include "/tutorial/*"
# Preview what would be crawled
gnosis-mcp crawl https://docs.python.org/ --dry-run
# Force re-crawl + embed for semantic search
gnosis-mcp crawl https://docs.sveltekit.dev/ --sitemap --force --embed
Respects robots.txt, caches with ETag/Last-Modified for incremental re-crawl, and rate-limits requests (5 concurrent, 0.2s delay). Crawled pages are stored with the URL as the document path and hostname as the category — searchable like any other doc.
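As a rough illustration of the two politeness mechanisms mentioned above, the sketch below checks a URL against robots.txt rules and builds conditional request headers from cached validators. It is not the crawler's actual code: the function names and the shape of the `cached` dictionary are assumptions made for this example. Only `urllib.robotparser` and the standard `If-None-Match` / `If-Modified-Since` headers are real.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def conditional_headers(cached):
    """Build revalidation headers from a cache entry (hypothetical dict shape)."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


robots = "User-agent: *\nDisallow: /private/"
print(allowed_by_robots(robots, "https://example.com/docs/intro"))
print(conditional_headers({"etag": '"abc123"'}))
```

A server that answers such a request with 304 Not Modified signals that the cached page is still current, which is what makes an incremental re-crawl cheap.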
Editor Integrations
Add the server config to your editor and your AI agent gets search_docs, get_doc, and get_related tools automatically.
{
"mcpServers": {
"docs": {
"command": "gnosis-mcp",
"args": ["serve"]
}
}
}
| Editor | Config file |
|---|---|
| Claude Code | .claude/mcp.json (or install as plugin) |
| Cursor | .cursor/mcp.json |
| Windsurf | ~/.codeium/windsurf/mcp_config.json |
| JetBrains | Settings > Tools > AI Assistant > MCP Servers |
| Cline | Cline MCP settings panel |
VS Code (GitHub Copilot) — slightly different key
Add to .vscode/mcp.json (note: "servers" not "mcpServers"):
{
"servers": {
"docs": {
"command": "gnosis-mcp",
"args": ["serve"]
}
}
}
Also discoverable via the VS Code MCP gallery — search @mcp gnosis in the Extensions view.
For remote deployment, use Streamable HTTP:
gnosis-mcp serve --transport streamable-http --host 0.0.0.0 --port 8000
Choose Your Backend
| | SQLite (default) | SQLite + embeddings | PostgreSQL |
|---|---|---|---|
| Install | pip install gnosis-mcp | pip install gnosis-mcp[embeddings] | pip install gnosis-mcp[postgres] |
| Config | Nothing | Nothing | Set GNOSIS_MCP_DATABASE_URL |
| Search | FTS5 keyword (BM25) | Hybrid keyword + semantic (RRF) | tsvector + pgvector hybrid |
| Embeddings | None | Local ONNX (23MB, no API key) | Any provider + HNSW index |
| Multi-table | No | No | Yes (UNION ALL) |
| Best for | Quick start, keyword-only | Semantic search without a server | Production, large doc sets |
Auto-detection: Set GNOSIS_MCP_DATABASE_URL to postgresql://... and it uses PostgreSQL. Don't set it and it uses SQLite. Override with GNOSIS_MCP_BACKEND=sqlite|postgres.
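The selection rule can be pictured with a few lines of Python. This is only a sketch of the behavior described above, not the code in config.py. The function name `detect_backend` is hypothetical, and the environment is passed in as a plain dictionary so the example is easy to test.

```python
def detect_backend(env):
    """Pick a backend name from environment-style settings (illustrative only)."""
    forced = env.get("GNOSIS_MCP_BACKEND", "auto").lower()
    if forced in ("sqlite", "postgres"):
        # An explicit override wins over URL-based detection.
        return forced
    url = env.get("GNOSIS_MCP_DATABASE_URL", "")
    if url.startswith("postgresql://"):
        return "postgres"
    # No URL, or a SQLite file path: fall back to the default backend.
    return "sqlite"


print(detect_backend({}))
print(detect_backend({"GNOSIS_MCP_DATABASE_URL": "postgresql://user:pass@localhost:5432/mydb"}))
```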
PostgreSQL setup
pip install gnosis-mcp[postgres]
export GNOSIS_MCP_DATABASE_URL="postgresql://user:pass@localhost:5432/mydb"
gnosis-mcp init-db # create tables + indexes
gnosis-mcp ingest ./docs/ # load your markdown
gnosis-mcp serve
For hybrid semantic+keyword search, also enable pgvector:
CREATE EXTENSION IF NOT EXISTS vector;
Then backfill embeddings:
gnosis-mcp embed # via OpenAI (default)
gnosis-mcp embed --provider ollama # or use local Ollama
Claude Code Plugin
For Claude Code users, install as a plugin to get the MCP server plus slash commands:
claude plugin marketplace add nicholasglazer/gnosis-mcp
claude plugin install gnosis
This gives you:
| Component | What you get |
|---|---|
| MCP server | gnosis-mcp serve, auto-configured |
| /gnosis:search | Search docs with keyword or --semantic hybrid mode |
| /gnosis:status | Health check: connectivity, doc stats, troubleshooting |
| /gnosis:manage | CRUD: add, delete, update metadata, bulk embed |
The plugin works with both SQLite and PostgreSQL backends.
Manual setup (without plugin)
Add to .claude/mcp.json:
{
"mcpServers": {
"gnosis": {
"command": "gnosis-mcp",
"args": ["serve"]
}
}
}
For PostgreSQL, add "env": {"GNOSIS_MCP_DATABASE_URL": "postgresql://..."}.
What It Does
Gnosis MCP exposes 6 tools and 3 resources over MCP. Your AI agent calls these automatically when it needs information from your docs.
Tools
| Tool | What it does | Mode |
|---|---|---|
| search_docs | Search by keyword or hybrid semantic+keyword | Read |
| get_doc | Retrieve a full document by path | Read |
| get_related | Find linked/related documents | Read |
| upsert_doc | Create or replace a document | Write |
| delete_doc | Remove a document and its chunks | Write |
| update_metadata | Change title, category, tags | Write |
Read tools are always available. Write tools require GNOSIS_MCP_WRITABLE=true.
Resources
| URI | Returns |
|---|---|
| gnosis://docs | All documents: path, title, category, chunk count |
| gnosis://docs/{path} | Full document content |
| gnosis://categories | Categories with document counts |
How search works
# Keyword search — works on both SQLite and PostgreSQL
gnosis-mcp search "stripe webhook"
# Hybrid search — keyword + semantic similarity (requires [embeddings] or pgvector)
gnosis-mcp search "how does billing work" --embed
# Filtered — narrow results to a specific category
gnosis-mcp search "auth" -c guides
When called via MCP, the agent passes a query string, which is used for keyword search. With embeddings configured, search automatically combines keyword and semantic results (hybrid mode). Hybrid mode works on both SQLite (via sqlite-vec) and PostgreSQL (via pgvector).
Search results include a highlight field with matched terms wrapped in <mark> tags for context-aware snippets (FTS5 snippet() on SQLite, ts_headline() on PostgreSQL).
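To make the SQLite side of this concrete, here is a small, self-contained sketch of FTS5 ranking and highlighting using Python's built-in sqlite3 module. The table name `chunks`, its two columns, and the function name are invented for this example and are not Gnosis MCP's actual schema. `snippet()` and `bm25()` are standard FTS5 functions, and the example assumes the local SQLite build includes FTS5.

```python
import sqlite3


def search_with_highlights(rows, query, limit=5):
    """Index (path, content) rows in an in-memory FTS5 table and search them."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: path is stored but not indexed, content is searchable.
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(path UNINDEXED, content)")
    conn.executemany("INSERT INTO chunks (path, content) VALUES (?, ?)", rows)
    sql = """
        SELECT path,
               snippet(chunks, 1, '<mark>', '</mark>', '...', 8) AS highlight
        FROM chunks
        WHERE chunks MATCH ?
        ORDER BY bm25(chunks)  -- lower bm25() values are better matches
        LIMIT ?
    """
    results = conn.execute(sql, (query, limit)).fetchall()
    conn.close()
    return results


docs = [
    ("billing.md", "Stripe webhook events update the invoice status"),
    ("auth.md", "Tokens expire after one hour"),
]
for path, highlight in search_with_highlights(docs, "webhook"):
    print(path, highlight)
```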
Embeddings
Embeddings enable semantic search — finding docs by meaning, not just keywords.
1. Local ONNX (recommended for SQLite) — zero-config, no API key needed:
pip install gnosis-mcp[embeddings]
gnosis-mcp ingest ./docs/ --embed # ingest + embed in one step
gnosis-mcp embed # or embed existing chunks separately
Uses MongoDB/mdbr-leaf-ir (~23MB quantized, Apache 2.0). Auto-downloads on first run. Customize with GNOSIS_MCP_EMBED_MODEL.
2. Remote providers — OpenAI, Ollama, or any OpenAI-compatible endpoint:
gnosis-mcp embed --provider openai # requires GNOSIS_MCP_EMBED_API_KEY
gnosis-mcp embed --provider ollama # uses local Ollama server
3. Pre-computed vectors — pass embeddings to upsert_doc or query_embedding to search_docs from your own pipeline.
Hybrid search — when embeddings are available, search automatically combines keyword (BM25) and semantic (cosine) results using Reciprocal Rank Fusion (RRF). Works on both SQLite (via sqlite-vec) and PostgreSQL (via pgvector).
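For readers unfamiliar with Reciprocal Rank Fusion, the following sketch shows the general technique: each result list contributes 1 / (k + rank) to a document's score, and documents are ordered by the summed score. The constant k = 60 is a commonly used value and is an assumption here. The README does not state which constant or tie-breaking rule Gnosis MCP uses, and the function name is hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]    # e.g. ordered by BM25
semantic_hits = ["b", "c", "d"]   # e.g. ordered by cosine similarity
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

Because only ranks are used, the keyword and semantic scores never need to be put on a common scale, which is the main reason RRF is a popular choice for hybrid search.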
Configuration
All settings via environment variables. Nothing required for SQLite — it works with zero config.
| Variable | Default | Description |
|---|---|---|
| GNOSIS_MCP_DATABASE_URL | SQLite auto | PostgreSQL URL or SQLite file path |
| GNOSIS_MCP_BACKEND | auto | Force sqlite or postgres |
| GNOSIS_MCP_WRITABLE | false | Enable write tools (upsert_doc, delete_doc, update_metadata) |
| GNOSIS_MCP_TRANSPORT | stdio | Server transport: stdio, sse, or streamable-http |
| GNOSIS_MCP_SCHEMA | public | Database schema (PostgreSQL only) |
| GNOSIS_MCP_CHUNKS_TABLE | documentation_chunks | Table name for chunks |
| GNOSIS_MCP_SEARCH_FUNCTION | (none) | Custom search function (PostgreSQL only) |
| GNOSIS_MCP_EMBEDDING_DIM | 1536 | Vector dimension for init-db |
All variables
Search & chunking: GNOSIS_MCP_CONTENT_PREVIEW_CHARS (200), GNOSIS_MCP_CHUNK_SIZE (4000), GNOSIS_MCP_SEARCH_LIMIT_MAX (20).
Connection pool (PostgreSQL): GNOSIS_MCP_POOL_MIN (1), GNOSIS_MCP_POOL_MAX (3).
Webhooks: GNOSIS_MCP_WEBHOOK_URL, GNOSIS_MCP_WEBHOOK_TIMEOUT (5s). Set a URL to receive POST notifications when documents are created, updated, or deleted.
Embeddings: GNOSIS_MCP_EMBED_PROVIDER (openai/ollama/custom/local), GNOSIS_MCP_EMBED_MODEL (text-embedding-3-small for remote, MongoDB/mdbr-leaf-ir for local), GNOSIS_MCP_EMBED_DIM (384, Matryoshka truncation dimension for local provider), GNOSIS_MCP_EMBED_API_KEY, GNOSIS_MCP_EMBED_URL (custom endpoint), GNOSIS_MCP_EMBED_BATCH_SIZE (50).
Column overrides (for connecting to existing tables with non-standard column names): GNOSIS_MCP_COL_FILE_PATH, GNOSIS_MCP_COL_TITLE, GNOSIS_MCP_COL_CONTENT, GNOSIS_MCP_COL_CHUNK_INDEX, GNOSIS_MCP_COL_CATEGORY, GNOSIS_MCP_COL_AUDIENCE, GNOSIS_MCP_COL_TAGS, GNOSIS_MCP_COL_EMBEDDING, GNOSIS_MCP_COL_TSV, GNOSIS_MCP_COL_SOURCE_PATH, GNOSIS_MCP_COL_TARGET_PATH, GNOSIS_MCP_COL_RELATION_TYPE.
Links table: GNOSIS_MCP_LINKS_TABLE (documentation_links).
Logging: GNOSIS_MCP_LOG_LEVEL (INFO).
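The Matryoshka truncation dimension mentioned for GNOSIS_MCP_EMBED_DIM refers to a general technique: keep only the first N components of an embedding and rescale the result to unit length so cosine similarity still behaves sensibly. The sketch below illustrates that idea only. The function name is hypothetical, and whether the local provider renormalizes in exactly this way is an assumption.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first dim components and rescale them to unit length."""
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to normalize
    return [x / norm for x in head]


print(truncate_embedding([3.0, 4.0, 12.0], 2))
```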
Custom search function (PostgreSQL)
Delegate search to your own PostgreSQL function for custom ranking:
CREATE FUNCTION my_schema.my_search(
p_query_text text,
p_categories text[],
p_limit integer
) RETURNS TABLE (
file_path text, title text, content text,
category text, combined_score double precision
) ...
GNOSIS_MCP_SEARCH_FUNCTION=my_schema.my_search
Multi-table mode (PostgreSQL)
Query across multiple doc tables:
GNOSIS_MCP_CHUNKS_TABLE=documentation_chunks,api_docs,tutorial_chunks
All tables must share the same schema. Reads use UNION ALL. Writes target the first table.
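One way to picture multi-table mode is as a query builder that parses the comma-separated setting, treats the first table as the write target, and joins per-table SELECTs with UNION ALL. The sketch below is illustrative only. The helper names and the identifier check are assumptions, and the column list is borrowed from the custom search function signature shown earlier.

```python
import re

# Conservative identifier pattern so table names can be interpolated safely.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def parse_tables(value):
    """Split a comma-separated table list and validate each name."""
    tables = [name.strip() for name in value.split(",") if name.strip()]
    for name in tables:
        if not IDENTIFIER.match(name):
            raise ValueError(f"invalid table name: {name!r}")
    return tables


def build_read_query(tables, columns=("file_path", "title", "content")):
    """Build one SELECT per table and combine them with UNION ALL."""
    cols = ", ".join(columns)
    return " UNION ALL ".join(f"SELECT {cols} FROM {name}" for name in tables)


tables = parse_tables("documentation_chunks,api_docs,tutorial_chunks")
print("write target:", tables[0])
print(build_read_query(tables))
```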
CLI Reference
gnosis-mcp ingest <path> [--dry-run] [--force] [--embed]    Load files (--force to re-ingest unchanged)
gnosis-mcp crawl <url> [--sitemap] [--depth N] [--include] [--exclude] [--dry-run] [--force] [--embed]
gnosis-mcp serve [--transport stdio|sse|streamable-http] [--ingest PATH] [--watch PATH]
gnosis-mcp search <query> [-n LIMIT] [-c CAT] [--embed]     Search (--embed for hybrid semantic+keyword)
gnosis-mcp stats                                            Show document, chunk, and embedding counts
gnosis-mcp check                                            Verify database connection + sqlite-vec status
gnosis-mcp embed [--provider P] [--model M] [--dry-run]     Backfill embeddings (auto-detects local provider)
gnosis-mcp init-db [--dry-run]                              Create tables + indexes manually
gnosis-mcp export [-f json|markdown|csv] [-c CAT]           Export documents
gnosis-mcp diff <path>                                      Show what would change on re-ingest
How ingestion works
gnosis-mcp ingest scans a directory for supported files (.md, .txt, .ipynb, .toml, .csv, .json) and loads them into the database:
- Multi-format: Markdown native; .txt, .ipynb, .toml, .csv, .json auto-converted (stdlib only). Optional: .rst (pip install gnosis-mcp[rst]), .pdf (pip install gnosis-mcp[pdf])
- Smart chunking: splits by H2 headings (H3/H4 for oversized sections), never splits inside fenced code blocks or tables
- Frontmatter support: extracts title, category, audience, tags from YAML frontmatter
- Auto-linking: relates_to in frontmatter creates bidirectional links (queryable via get_related)
- Auto-categorization: infers category from the parent directory name
- Incremental updates: content hashing skips unchanged files on re-run (--force to override)
- Watch mode: gnosis-mcp serve --watch ./docs/ auto-re-ingests on file changes
- Dry run: preview what would be indexed with --dry-run
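Two of the steps above, heading-based chunking and hash-based skipping, can be sketched in a few lines. This is a simplified illustration rather than the code in ingest.py. It splits only on H2 headings (no H3/H4 fallback or table handling), the function names are hypothetical, and SHA-256 is an assumed choice since the README only says "content hashing".

```python
import hashlib

FENCE = "`" * 3  # Markdown code-fence marker


def split_by_h2(markdown):
    """Split text at H2 headings, never inside a fenced code block."""
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if line.startswith("## ") and not in_fence and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


def needs_ingest(path, text, seen_hashes):
    """Return True if the content changed since the last run, and record it."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if seen_hashes.get(path) == digest:
        return False
    seen_hashes[path] = digest
    return True


doc = "# Guide\nintro\n## Setup\nsteps\n## Usage\nexamples"
print(len(split_by_h2(doc)))
```

Tracking the fence state is what keeps a line such as "## not a heading" inside a code sample from being mistaken for a section boundary.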
Available on
Gnosis MCP is listed on the Official MCP Registry (which feeds the VS Code MCP gallery and GitHub Copilot), PyPI, and major MCP directories including mcp.so, Glama, and cursor.directory.
Architecture
src/gnosis_mcp/
├── backend.py DocBackend protocol + create_backend() factory
├── pg_backend.py PostgreSQL — asyncpg, tsvector, pgvector
├── sqlite_backend.py SQLite — aiosqlite, FTS5, sqlite-vec hybrid search (RRF)
├── sqlite_schema.py SQLite DDL — tables, FTS5, triggers, vec0 virtual table
├── config.py Config from env vars, backend auto-detection
├── db.py Backend lifecycle + FastMCP lifespan
├── server.py FastMCP server — 6 tools, 3 resources, auto-embed queries
├── ingest.py File scanner + converters — multi-format, smart chunking (H2/H3/H4)
├── crawl.py Web crawler — sitemap/BFS discovery, robots.txt, ETag caching, trafilatura
├── watch.py File watcher — mtime polling, auto-re-ingest on changes
├── schema.py PostgreSQL DDL — tables, indexes, search functions
├── embed.py Embedding providers — OpenAI, Ollama, custom, local ONNX
├── local_embed.py Local ONNX embedding engine — HuggingFace model download
└── cli.py CLI — serve, ingest, search, embed, stats, check
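The mtime polling attributed to watch.py above can be illustrated with a snapshot-and-compare loop: record each file's modification time, take a new snapshot later, and re-ingest whatever differs. The sketch below shows that general pattern only. The function names and the extension filter are assumptions, not the module's real interface.

```python
import os


def snapshot(root, extensions=(".md", ".txt")):
    """Map each matching file under root to its last-modified time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.stat(path).st_mtime
    return mtimes


def changed_paths(previous, current):
    """Return (new or modified paths, removed paths) between two snapshots."""
    modified = sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
    removed = sorted(p for p in previous if p not in current)
    return modified, removed


before = {"docs/a.md": 1.0, "docs/b.md": 2.0}
after = {"docs/a.md": 1.0, "docs/b.md": 3.0, "docs/c.md": 1.0}
print(changed_paths(before, after))
```

A polling loop would call snapshot() on an interval and pass consecutive results to changed_paths().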
AI-Friendly Docs
These files are optimized for AI agents to consume:
| File | Purpose |
|---|---|
| llms.txt | Quick overview: what it does, tools, config |
| llms-full.txt | Complete reference in one file |
| llms-install.md | Step-by-step installation guide |
Development
git clone https://github.com/nicholasglazer/gnosis-mcp.git
cd gnosis-mcp
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest # 470+ tests, no database needed
ruff check src/ tests/
All tests run without a database. Keep it that way.
Good first contributions: new embedding providers, export formats, ingestion for RST/HTML/PDF (via optional extras). Open an issue first for larger changes.
Sponsors
If Gnosis MCP saves you time, consider sponsoring the project.
License