Zero-config MCP server for searchable documentation (SQLite default, PostgreSQL optional)

Gnosis MCP

Give your AI agent a searchable knowledge base. Zero config.

Ingest docs → Search with highlights → Stats overview → Serve to AI agents


AI coding agents can read your source code but not your documentation. They guess at architecture, miss established patterns, and hallucinate details they could have looked up.

Gnosis MCP fixes this. Point it at a folder of docs and it creates a searchable knowledge base that any MCP-compatible AI agent can query — Claude Code, Cursor, Windsurf, Cline, and any tool that supports the Model Context Protocol.

No database server. SQLite works out of the box with keyword search, or add [embeddings] for local semantic search. Scale to PostgreSQL + pgvector when needed.

Why use this

Less hallucination. Agents search your docs before guessing. Architecture decisions, API contracts, billing rules — one tool call away instead of made up.

Lower token costs. A search returns ~600 tokens of ranked results. Reading the same docs as files costs 3,000-8,000+ tokens. On a 170-doc knowledge base (~840K tokens), that's the difference between a precise answer and a blown context window.

Docs that stay current. Add a new markdown file, run ingest, it's searchable immediately. Or use --watch to auto-re-ingest on file changes. No routing tables to maintain, no hardcoded paths to update.

Works with what you have. Gnosis MCP ingests .md, .txt, .ipynb, .toml, .csv, and .json files. Non-markdown formats are auto-converted for chunking — zero extra dependencies.

Crawl the web. Ingest documentation from any website — Docusaurus, MkDocs, ReadTheDocs, GitBook, VitePress. Sitemap discovery, robots.txt compliance, incremental caching. The self-hosted alternative to cloud-only doc ingestion.

Quick Start

pip install gnosis-mcp
gnosis-mcp ingest ./docs/       # loads docs, auto-creates SQLite database
gnosis-mcp serve                # starts MCP server

That's it. Your AI agent can now search your docs.

Want semantic search? Add local ONNX embeddings (no API key needed, ~23MB model):

pip install gnosis-mcp[embeddings]
gnosis-mcp ingest ./docs/ --embed   # ingest + embed in one step
gnosis-mcp serve                    # hybrid keyword+semantic search auto-activated

Test it before connecting to an editor:

gnosis-mcp search "getting started"           # keyword search
gnosis-mcp search "how does auth work" --embed # hybrid semantic+keyword
gnosis-mcp stats                               # see what was indexed

Try without installing (uvx):

uvx gnosis-mcp ingest ./docs/
uvx gnosis-mcp serve

Crawl Documentation Sites

Dry-run discovery → Crawl & ingest → Search crawled docs → SSRF protection

Ingest docs from any website — no files needed:

pip install gnosis-mcp[web]

# Crawl via sitemap (best for large doc sites)
gnosis-mcp crawl https://docs.stripe.com/ --sitemap

# Depth-limited link crawl with URL filter
gnosis-mcp crawl https://fastapi.tiangolo.com/ --depth 2 --include "/tutorial/*"

# Preview what would be crawled
gnosis-mcp crawl https://docs.python.org/ --dry-run

# Force re-crawl + embed for semantic search
gnosis-mcp crawl https://docs.sveltekit.dev/ --sitemap --force --embed

Respects robots.txt, caches with ETag/Last-Modified for incremental re-crawl, and rate-limits requests (5 concurrent, 0.2s delay). Crawled pages are stored with the URL as the document path and hostname as the category — searchable like any other doc.
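
As a rough illustration of the robots.txt check and the ETag/Last-Modified revalidation described above, here is a minimal standard-library sketch. The helper names `allowed_by_robots` and `conditional_headers` are hypothetical and are not taken from the Gnosis MCP source; the actual crawler may work differently.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the robots.txt rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def conditional_headers(etag: str | None, last_modified: str | None) -> dict[str, str]:
    """Build revalidation headers from validators saved on a previous crawl.

    A server that honors them can answer 304 Not Modified, so the page
    does not need to be downloaded and re-ingested.
    """
    headers: dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


# Hypothetical robots.txt content used only for this example.
robots = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(robots, "https://example.com/docs/intro"))
print(allowed_by_robots(robots, "https://example.com/private/x"))
print(conditional_headers('"abc123"', None))
```

The sketch leaves out the rate limiting (5 concurrent requests, 0.2s delay) mentioned above.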

Editor Integrations

Gnosis MCP works with any MCP-compatible editor. Add the server config, and your AI agent gets search_docs, get_doc, and get_related tools automatically.

Claude Code

Add to .claude/mcp.json:

{
  "mcpServers": {
    "docs": {
      "command": "gnosis-mcp",
      "args": ["serve"]
    }
  }
}

Or install as a Claude Code plugin for a richer experience with slash commands.

Cursor

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "docs": {
      "command": "gnosis-mcp",
      "args": ["serve"]
    }
  }
}

Windsurf

Add to ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "docs": {
      "command": "gnosis-mcp",
      "args": ["serve"]
    }
  }
}

VS Code (GitHub Copilot)

Add to .vscode/mcp.json in your workspace:

{
  "servers": {
    "docs": {
      "command": "gnosis-mcp",
      "args": ["serve"]
    }
  }
}

Also discoverable via the VS Code MCP gallery — search @mcp gnosis in the Extensions view.

Enterprise: Your org admin needs the "MCP servers in Copilot" policy enabled. Free/Pro/Pro+ plans work without this.
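
The editor configs above differ only in the top-level key ("mcpServers" for Claude Code, Cursor, and Windsurf; "servers" for VS Code). If you maintain several of them, a small script can generate the JSON. This is a sketch; the `server_config` helper and the editor labels are made up for illustration and are not part of Gnosis MCP.

```python
import json

# Top-level key per editor, copied from the config snippets above.
# The dictionary keys are hypothetical labels chosen for this example.
TOP_LEVEL_KEY = {
    "claude-code": "mcpServers",
    "cursor": "mcpServers",
    "windsurf": "mcpServers",
    "vscode": "servers",
}


def server_config(editor, name="docs", env=None):
    """Return the config dict for one editor, with an optional env mapping."""
    entry = {"command": "gnosis-mcp", "args": ["serve"]}
    if env:
        entry["env"] = dict(env)
    return {TOP_LEVEL_KEY[editor]: {name: entry}}


print(json.dumps(server_config("vscode"), indent=2))
```

Write the result to the path listed for each editor, for example .vscode/mcp.json for VS Code.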

JetBrains (IntelliJ, PyCharm, WebStorm)

Go to Settings > Tools > AI Assistant > MCP Servers, click +, and add:

  • Name: docs
  • Command: gnosis-mcp
  • Arguments: serve

Cline

Open the Cline MCP settings panel and add the same server config shown for Claude Code.


Other MCP clients

Any tool that supports the Model Context Protocol works — including Zed, Neovim (via plugins), and custom agents. The server communicates over stdio by default, or Streamable HTTP for remote deployment:

gnosis-mcp serve --transport streamable-http --host 0.0.0.0 --port 8000
# Remote clients connect to http://your-server:8000/mcp

Choose Your Backend

| | SQLite (default) | SQLite + embeddings | PostgreSQL |
|---|---|---|---|
| Install | pip install gnosis-mcp | pip install gnosis-mcp[embeddings] | pip install gnosis-mcp[postgres] |
| Config | Nothing | Nothing | Set DATABASE_URL |
| Search | FTS5 keyword (BM25) | Hybrid keyword + semantic (RRF) | tsvector + pgvector hybrid |
| Embeddings | None | Local ONNX (23MB, no API key) | Any provider + HNSW index |
| Multi-table | No | No | Yes (UNION ALL) |
| Best for | Quick start, keyword-only | Semantic search without a server | Production, large doc sets |

Auto-detection: Set DATABASE_URL to postgresql://... and it uses PostgreSQL. Don't set it and it uses SQLite. Override with GNOSIS_MCP_BACKEND=sqlite|postgres.
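
The auto-detection rule can be pictured roughly as follows. This is a sketch of the behavior described above, not the project's config.py. The function name `detect_backend` is hypothetical, and checking GNOSIS_MCP_DATABASE_URL before DATABASE_URL is an assumption.

```python
def detect_backend(env):
    """Pick 'sqlite' or 'postgres' from a mapping of environment variables."""
    # An explicit override wins over URL sniffing.
    forced = env.get("GNOSIS_MCP_BACKEND", "auto").lower()
    if forced in ("sqlite", "postgres"):
        return forced
    # Assumption: the prefixed variable takes precedence when both are set.
    url = env.get("GNOSIS_MCP_DATABASE_URL") or env.get("DATABASE_URL", "")
    return "postgres" if url.startswith("postgresql://") else "sqlite"


print(detect_backend({}))
print(detect_backend({"DATABASE_URL": "postgresql://user:pass@localhost:5432/mydb"}))
```

Pass `os.environ` to run it against the real environment.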

PostgreSQL setup

pip install gnosis-mcp[postgres]
export GNOSIS_MCP_DATABASE_URL="postgresql://user:pass@localhost:5432/mydb"
gnosis-mcp init-db              # create tables + indexes
gnosis-mcp ingest ./docs/       # load your markdown
gnosis-mcp serve

For hybrid semantic+keyword search, also enable pgvector:

CREATE EXTENSION IF NOT EXISTS vector;

Then backfill embeddings:

gnosis-mcp embed                        # via OpenAI (default)
gnosis-mcp embed --provider ollama      # or use local Ollama

Claude Code Plugin

For Claude Code users, install as a plugin to get the MCP server plus slash commands:

claude plugin marketplace add nicholasglazer/gnosis-mcp
claude plugin install gnosis

This gives you:

| Component | What you get |
|---|---|
| MCP server | gnosis-mcp serve, auto-configured |
| /gnosis:search | Search docs with keyword or --semantic hybrid mode |
| /gnosis:status | Health check: connectivity, doc stats, troubleshooting |
| /gnosis:manage | CRUD: add, delete, update metadata, bulk embed |

The plugin works with both SQLite and PostgreSQL backends.

Manual setup (without plugin)

Add to .claude/mcp.json:

{
  "mcpServers": {
    "gnosis": {
      "command": "gnosis-mcp",
      "args": ["serve"]
    }
  }
}

For PostgreSQL, add "env": {"GNOSIS_MCP_DATABASE_URL": "postgresql://..."}.

What It Does

Gnosis MCP exposes 6 tools and 3 resources over MCP. Your AI agent calls these automatically when it needs information from your docs.

Tools

| Tool | What it does | Mode |
|---|---|---|
| search_docs | Search by keyword or hybrid semantic+keyword | Read |
| get_doc | Retrieve a full document by path | Read |
| get_related | Find linked/related documents | Read |
| upsert_doc | Create or replace a document | Write |
| delete_doc | Remove a document and its chunks | Write |
| update_metadata | Change title, category, tags | Write |

Read tools are always available. Write tools require GNOSIS_MCP_WRITABLE=true.

Resources

| URI | Returns |
|---|---|
| gnosis://docs | All documents: path, title, category, chunk count |
| gnosis://docs/{path} | Full document content |
| gnosis://categories | Categories with document counts |

How search works

# Keyword search — works on both SQLite and PostgreSQL
gnosis-mcp search "stripe webhook"

# Hybrid search (keyword + semantic similarity), requires embeddings on SQLite or PostgreSQL
gnosis-mcp search "how does billing work" --embed

# Filtered — narrow results to a specific category
gnosis-mcp search "auth" -c guides

When called via MCP, the agent passes a query string for keyword search. When embeddings are available, it can also pass query_embedding for hybrid mode, which combines keyword matching with semantic similarity.

Search results include a highlight field with matched terms wrapped in <mark> tags for context-aware snippets (FTS5 snippet() on SQLite, ts_headline() on PostgreSQL).
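
To show what an FTS5 query with snippet() highlighting and BM25 ordering looks like, here is a self-contained sketch using Python's sqlite3 module. The table name `chunks`, its columns, and the sample rows are invented for the example and do not reflect the real Gnosis MCP schema. It assumes your Python's SQLite build includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column FTS5 table; the real schema is more involved.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(file_path, content)")
conn.executemany(
    "INSERT INTO chunks (file_path, content) VALUES (?, ?)",
    [
        ("guides/billing.md", "Stripe webhook events update the invoice status."),
        ("guides/auth.md", "Sessions are issued after login."),
    ],
)


def search(query, limit=5):
    """Return (file_path, highlight) rows, best BM25 match first."""
    return conn.execute(
        """
        SELECT file_path,
               snippet(chunks, 1, '<mark>', '</mark>', '...', 12) AS highlight
        FROM chunks
        WHERE chunks MATCH ?
        ORDER BY bm25(chunks)
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()


for path, highlight in search("webhook"):
    print(path, highlight)
```

The second argument to snippet() is the zero-based column index, so 1 selects `content`. bm25() returns lower values for better matches, which is why the query sorts ascending.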

Embeddings

Embeddings enable semantic search — finding docs by meaning, not just keywords.

1. Local ONNX (recommended for SQLite) — zero-config, no API key needed:

pip install gnosis-mcp[embeddings]
gnosis-mcp ingest ./docs/ --embed       # ingest + embed in one step
gnosis-mcp embed                        # or embed existing chunks separately

Uses MongoDB/mdbr-leaf-ir (~23MB quantized, Apache 2.0). Auto-downloads on first run. Customize with GNOSIS_MCP_EMBED_MODEL.

2. Remote providers — OpenAI, Ollama, or any OpenAI-compatible endpoint:

gnosis-mcp embed --provider openai      # requires GNOSIS_MCP_EMBED_API_KEY
gnosis-mcp embed --provider ollama      # uses local Ollama server

3. Pre-computed vectors — pass embeddings to upsert_doc or query_embedding to search_docs from your own pipeline.

Hybrid search — when embeddings are available, search automatically combines keyword (BM25) and semantic (cosine) results using Reciprocal Rank Fusion (RRF). Works on both SQLite (via sqlite-vec) and PostgreSQL (via pgvector).
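
Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in, then sorts by that score. The sketch below is a generic implementation, not the project's code. The constant k = 60 is the value commonly used in the literature and is an assumption here, as are the function name and sample file names.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each list contributes 1 / (k + rank) per document, with rank starting at 1.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a.md", "b.md", "c.md"]   # e.g. BM25 order
semantic_hits = ["b.md", "d.md", "a.md"]  # e.g. cosine-similarity order
print(rrf([keyword_hits, semantic_hits]))
# → ['b.md', 'a.md', 'd.md', 'c.md']
```

Because only ranks are used, the keyword and semantic scores never need to be put on a common scale.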

Configuration

All settings via environment variables. Nothing required for SQLite — it works with zero config.

| Variable | Default | Description |
|---|---|---|
| GNOSIS_MCP_DATABASE_URL | SQLite auto | PostgreSQL URL or SQLite file path |
| GNOSIS_MCP_BACKEND | auto | Force sqlite or postgres |
| GNOSIS_MCP_WRITABLE | false | Enable write tools (upsert_doc, delete_doc, update_metadata) |
| GNOSIS_MCP_TRANSPORT | stdio | Server transport: stdio, sse, or streamable-http |
| GNOSIS_MCP_SCHEMA | public | Database schema (PostgreSQL only) |
| GNOSIS_MCP_CHUNKS_TABLE | documentation_chunks | Table name for chunks |
| GNOSIS_MCP_SEARCH_FUNCTION | | Custom search function (PostgreSQL only) |
| GNOSIS_MCP_EMBEDDING_DIM | 1536 | Vector dimension for init-db |

All variables

Search & chunking: GNOSIS_MCP_CONTENT_PREVIEW_CHARS (200), GNOSIS_MCP_CHUNK_SIZE (4000), GNOSIS_MCP_SEARCH_LIMIT_MAX (20).

Connection pool (PostgreSQL): GNOSIS_MCP_POOL_MIN (1), GNOSIS_MCP_POOL_MAX (3).

Webhooks: GNOSIS_MCP_WEBHOOK_URL, GNOSIS_MCP_WEBHOOK_TIMEOUT (5s). Set a URL to receive POST notifications when documents are created, updated, or deleted.

Embeddings: GNOSIS_MCP_EMBED_PROVIDER (openai/ollama/custom/local), GNOSIS_MCP_EMBED_MODEL (text-embedding-3-small for remote, MongoDB/mdbr-leaf-ir for local), GNOSIS_MCP_EMBED_DIM (384, Matryoshka truncation dimension for local provider), GNOSIS_MCP_EMBED_API_KEY, GNOSIS_MCP_EMBED_URL (custom endpoint), GNOSIS_MCP_EMBED_BATCH_SIZE (50).
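
Matryoshka truncation keeps only the first N components of an embedding. A minimal sketch of the idea follows. Re-normalizing to unit length afterwards is an assumption (it is a common step so cosine similarity stays well-behaved), and `truncate_embedding` is a hypothetical name rather than a Gnosis MCP function.

```python
import math


def truncate_embedding(vec, dim=384):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


# Toy 3-dimensional vector truncated to 2 dimensions.
print(truncate_embedding([3.0, 4.0, 12.0], dim=2))
```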

Column overrides (for connecting to existing tables with non-standard column names): GNOSIS_MCP_COL_FILE_PATH, GNOSIS_MCP_COL_TITLE, GNOSIS_MCP_COL_CONTENT, GNOSIS_MCP_COL_CHUNK_INDEX, GNOSIS_MCP_COL_CATEGORY, GNOSIS_MCP_COL_AUDIENCE, GNOSIS_MCP_COL_TAGS, GNOSIS_MCP_COL_EMBEDDING, GNOSIS_MCP_COL_TSV, GNOSIS_MCP_COL_SOURCE_PATH, GNOSIS_MCP_COL_TARGET_PATH, GNOSIS_MCP_COL_RELATION_TYPE.

Links table: GNOSIS_MCP_LINKS_TABLE (documentation_links).

Logging: GNOSIS_MCP_LOG_LEVEL (INFO).

Custom search function (PostgreSQL)

Delegate search to your own PostgreSQL function for custom ranking:

CREATE FUNCTION my_schema.my_search(
    p_query_text text,
    p_categories text[],
    p_limit integer
) RETURNS TABLE (
    file_path text, title text, content text,
    category text, combined_score double precision
) ...

GNOSIS_MCP_SEARCH_FUNCTION=my_schema.my_search

Multi-table mode (PostgreSQL)

Query across multiple doc tables:

GNOSIS_MCP_CHUNKS_TABLE=documentation_chunks,api_docs,tutorial_chunks

All tables must share the same schema. Reads use UNION ALL. Writes target the first table.
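
Conceptually, a comma-separated GNOSIS_MCP_CHUNKS_TABLE value expands into one SELECT per table joined with UNION ALL, with the first table as the write target. The sketch below illustrates that expansion only. The helper name, the column list, and the identifier check are assumptions, not the project's actual query builder.

```python
import re

# Conservative identifier check so table names can be interpolated safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def union_all_query(tables_setting, schema="public"):
    """Return (read_query, write_table) for a comma-separated table list."""
    tables = [t.strip() for t in tables_setting.split(",") if t.strip()]
    if not tables:
        raise ValueError("no tables configured")
    for name in [schema, *tables]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    selects = [
        f"SELECT file_path, title, content, category FROM {schema}.{table}"
        for table in tables
    ]
    return "\nUNION ALL\n".join(selects), tables[0]


query, write_table = union_all_query("documentation_chunks,api_docs,tutorial_chunks")
print(query)
print("writes go to:", write_table)
```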

CLI Reference

gnosis-mcp ingest <path> [--dry-run] [--force] [--embed]    Load files (--force to re-ingest unchanged)
gnosis-mcp crawl <url> [--sitemap] [--depth N] [--include] [--exclude] [--dry-run] [--force] [--embed]
gnosis-mcp serve [--transport stdio|sse|streamable-http] [--ingest PATH] [--watch PATH]   Start MCP server (--watch for live reload)
gnosis-mcp search <query> [-n LIMIT] [-c CAT] [--embed]    Search (--embed for hybrid semantic+keyword)
gnosis-mcp stats                                           Show document, chunk, and embedding counts
gnosis-mcp check                                           Verify database connection + sqlite-vec status
gnosis-mcp embed [--provider P] [--model M] [--dry-run]    Backfill embeddings (auto-detects local provider)
gnosis-mcp init-db [--dry-run]                             Create tables + indexes manually
gnosis-mcp export [-f json|markdown|csv] [-c CAT]          Export documents
gnosis-mcp diff <path>                                     Show what would change on re-ingest
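
The idea behind skipping unchanged files on re-ingest (and previewing changes with diff) can be sketched with content hashes. SHA-256 and the function names below are assumptions made for illustration; this README does not say which hash Gnosis MCP uses.

```python
import hashlib


def content_hash(text):
    """Hex digest of a document's text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_ingest(files, stored_hashes, force=False):
    """Classify files as new, changed, or unchanged.

    `files` maps path -> current text, `stored_hashes` maps path -> digest
    recorded at the last ingest. With force=True every known file is
    treated as changed, mirroring the --force flag.
    """
    plan = {"new": [], "changed": [], "unchanged": []}
    for path, text in sorted(files.items()):
        if path not in stored_hashes:
            plan["new"].append(path)
        elif force or stored_hashes[path] != content_hash(text):
            plan["changed"].append(path)
        else:
            plan["unchanged"].append(path)
    return plan


stored = {"a.md": content_hash("alpha"), "b.md": content_hash("beta")}
current = {"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"}
print(plan_ingest(current, stored))
# → {'new': ['c.md'], 'changed': ['b.md'], 'unchanged': ['a.md']}
```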

How ingestion works

gnosis-mcp ingest scans a directory for supported files (.md, .txt, .ipynb, .toml, .csv, .json) and loads them into the database:

  • Multi-format: Markdown native; .txt, .ipynb, .toml, .csv, .json auto-converted (stdlib only). Optional: .rst (pip install gnosis-mcp[rst]), .pdf (pip install gnosis-mcp[pdf])
  • Smart chunking: splits by H2 headings (H3/H4 for oversized sections), never splits inside fenced code blocks or tables
  • Frontmatter support: extracts title, category, audience, tags from YAML frontmatter
  • Auto-linking: relates_to in frontmatter creates bidirectional links (queryable via get_related)
  • Auto-categorization: infers category from the parent directory name
  • Incremental updates: content hashing skips unchanged files on re-run (--force to override)
  • Watch mode: gnosis-mcp serve --watch ./docs/ auto-re-ingests on file changes
  • Web crawling: gnosis-mcp crawl <url> fetches and ingests documentation from any website
  • Dry run: preview what would be indexed with --dry-run
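
A simplified picture of heading-based chunking that never splits inside a fenced code block is sketched below. It is not the project's ingest.py: it handles only H2 boundaries and backtick fences, and it omits the H3/H4 fallback for oversized sections and the table handling. The function name is hypothetical.

```python
FENCE = "`" * 3  # a markdown code fence marker


def chunk_by_h2(markdown):
    """Split markdown into chunks at H2 headings, ignoring headings inside fences."""
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a fenced block
        if line.startswith("## ") and not in_fence and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


doc = "\n".join([
    "# Title",
    "intro",
    "## A",
    FENCE + "sh",
    "## not a heading",
    FENCE,
    "## B",
    "text",
])
for chunk in chunk_by_h2(doc):
    print(repr(chunk))
```

The line "## not a heading" stays inside the chunk for section A because it sits between two fence markers.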

Available on

Gnosis MCP is listed on the Official MCP Registry (which feeds the VS Code MCP gallery and GitHub Copilot), PyPI, and major MCP directories including mcp.so, Glama, and cursor.directory.

Architecture

src/gnosis_mcp/
├── backend.py         DocBackend protocol + create_backend() factory
├── pg_backend.py      PostgreSQL — asyncpg, tsvector, pgvector
├── sqlite_backend.py  SQLite — aiosqlite, FTS5, sqlite-vec hybrid search (RRF)
├── sqlite_schema.py   SQLite DDL — tables, FTS5, triggers, vec0 virtual table
├── config.py          Config from env vars, backend auto-detection
├── db.py              Backend lifecycle + FastMCP lifespan
├── server.py          FastMCP server — 6 tools, 3 resources, auto-embed queries
├── ingest.py          File scanner + converters — multi-format, smart chunking (H2/H3/H4)
├── crawl.py           Web crawler — sitemap/BFS discovery, robots.txt, ETag caching, trafilatura
├── watch.py           File watcher — mtime polling, auto-re-ingest on changes
├── schema.py          PostgreSQL DDL — tables, indexes, search functions
├── embed.py           Embedding providers — OpenAI, Ollama, custom, local ONNX
├── local_embed.py     Local ONNX embedding engine — HuggingFace model download
└── cli.py             CLI — serve, ingest, search, embed, stats, check

AI-Friendly Docs

These files are optimized for AI agents to consume:

| File | Purpose |
|---|---|
| llms.txt | Quick overview: what it does, tools, config |
| llms-full.txt | Complete reference in one file |
| llms-install.md | Step-by-step installation guide |

Development

git clone https://github.com/nicholasglazer/gnosis-mcp.git
cd gnosis-mcp
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest                    # 470+ tests, no database needed
ruff check src/ tests/

All tests run without a database. Keep it that way.

Good first contributions: new embedding providers, export formats, ingestion for RST/HTML/PDF (via optional extras). Open an issue first for larger changes.

Sponsors

If Gnosis MCP saves you time, consider sponsoring the project.

License

MIT
