Live documentation access layer for AI agents — no hallucination, no stale docs

  ██████╗  ██████╗  ██████╗██╗   ██╗██╗      █████╗ ██╗   ██╗███████╗██████╗
  ██╔══██╗██╔═══██╗██╔════╝██║   ██║██║     ██╔══██╗╚██╗ ██╔╝██╔════╝██╔══██╗
  ██║  ██║██║   ██║██║     ██║   ██║██║     ███████║ ╚████╔╝ █████╗  ██████╔╝
  ██║  ██║██║   ██║██║     ██║   ██║██║     ██╔══██║  ╚██╔╝  ██╔══╝  ██╔══██╗
  ██████╔╝╚██████╔╝╚██████╗╚██████╔╝███████╗██║  ██║   ██║   ███████╗██║  ██║
  ╚═════╝  ╚═════╝  ╚═════╝ ╚═════╝ ╚══════╝╚═╝  ╚═╝   ╚═╝   ╚══════╝╚═╝  ╚═╝

The live documentation layer for AI agents


AI agents hallucinate APIs. Not because they're broken — because their training data is stale.
When a function signature changed six months ago, a parameter was renamed, or a method did not yet exist at training time, the model confidently reproduces the outdated behavior or invents one.

DocuLayer fixes this. It sits between your agent and the real documentation, fetching live content on demand, returning verbatim text with full source attribution, and never generating a single word.

 Your agent  (Claude, Cursor, Codex, any MCP client…)
     │   "what parameters does httpx.AsyncClient.get() take?"
     ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  DocuLayer                                                   │
 │  ──────────────────────────────────────────────────────────  │
 │  resolve_identifier  →  shortcut table / PyPI / npm         │
 │                                                              │
 │  discover_llms_txt   →  targeted page index (when present)  │
 │       keyword score entries  →  fetch only relevant pages   │
 │                                                              │
 │  DocParser  (HTML → Markdown, heading split)                │
 │  DocSearcher  (BM25 — no ML, no embeddings, no network)     │
 │                                                              │
 │  TTLCache  (in-memory only — zero disk writes)              │
 └──────────────────────────────────────────────────────────────┘
     │   verbatim section + source URL + "fetched 3s ago"
     ▼
 LLM  —  reads real docs, answers correctly
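The "heading split" step in the diagram can be pictured with a small sketch. This is not DocuLayer's DocParser; the `Section` type and `split_by_heading` function are hypothetical names, and the sketch assumes the page has already been converted to Markdown with ATX-style (`#`) headings.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # code-fence marker, built indirectly to keep this example readable
_HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    """Hypothetical container for one heading and the text under it."""
    title: str
    level: int
    content: str


def split_by_heading(markdown: str) -> list[Section]:
    """Split Markdown into sections, one per heading (assumed behavior)."""
    sections: list[Section] = []
    title, level, body = "", 0, []
    in_fence = False

    def flush() -> None:
        # Keep a section only if it has a title or some non-blank text.
        if title or any(line.strip() for line in body):
            sections.append(Section(title, level, "\n".join(body).strip()))

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside fenced code are never treated as headings.
        match = None if in_fence else _HEADING.match(line)
        if match:
            flush()
            title, level, body = match.group(2), len(match.group(1)), []
        else:
            body.append(line)
    flush()
    return sections
```

Text that appears before the first heading is kept as a section with an empty title, so nothing from the page is silently dropped.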

⚡ Quick Install — 30 seconds

Step 1 — Install the package

pip install doculayer==0.1.1

Step 2 — Wire up your IDE (auto-detected, zero prompts)

doculayer setup

That's it. The wizard detects your IDE and writes the MCP config automatically — no editor, no JSON, no manual steps.

Restart your IDE after setup and the four DocuLayer tools appear immediately.


🖥️ IDE-Specific Setup

The doculayer setup command handles all of these automatically.
Manual one-liners are listed below for reference.

Claude Code

pip install doculayer==0.1.1 && claude mcp add doculayer -- doculayer mcp

Restart Claude Code. Tools are live.

Cursor

pip install doculayer==0.1.1 && doculayer setup --ide cursor

Or add manually to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "doculayer": {
      "command": "doculayer",
      "args": ["mcp"]
    }
  }
}
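If the config file already lists other MCP servers, the block above has to be merged in rather than pasted over the file. A minimal sketch of that merge, assuming the `mcpServers` layout shown above; `add_mcp_server` is a hypothetical helper, not part of the doculayer CLI, and it is not a description of how `doculayer setup` works internally.

```python
import json


def add_mcp_server(config_text: str, name: str, command: str, args: list[str]) -> str:
    """Return config_text with one server entry added, keeping existing entries."""
    # An empty or whitespace-only file is treated as an empty config.
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other": {"command": "other-tool", "args": []}}}'
print(add_mcp_server(existing, "doculayer", "doculayer", ["mcp"]))
```

The function works on text rather than on a path, so you can inspect the result before writing it back to the file yourself.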

Windsurf

pip install doculayer==0.1.1 && doculayer setup --ide windsurf

Or add manually to ~/.codeium/windsurf/mcp_config.json with the same JSON block above.

VS Code (with MCP support)

pip install doculayer==0.1.1 && doculayer setup --ide vscode

Or add to your VS Code settings.json:

{
  "mcpServers": {
    "doculayer": {
      "command": "doculayer",
      "args": ["mcp"]
    }
  }
}

Zed

pip install doculayer==0.1.1 && doculayer setup --ide zed

Or add to ~/.config/zed/settings.json (macOS/Linux) or %APPDATA%\Zed\settings.json (Windows):

{
  "context_servers": {
    "doculayer": {
      "command": { "path": "doculayer", "args": ["mcp"] }
    }
  }
}

All IDEs at once

doculayer setup --all-ides

🔧 What it does

  • MCP server: doculayer_search, doculayer_fetch, doculayer_symbol, doculayer_sources for any MCP client
  • Python library: await search("query", "fastapi") / await fetch("httpx", section="AsyncClient") inline in any app
  • CLI: doculayer search "streaming responses" --source nextjs
  • llms.txt-first — uses the llms.txt index when present to fetch only the 1–3 pages most relevant to the query
  • Zero hallucination — every byte returned is verbatim from the fetched URL; attribution header on every response; never generates text
  • Zero disk storage — content lives in a process-local TTL cache; no database, no files, no persistence across restarts

🛠️ MCP Tools

| Tool | What it does |
| --- | --- |
| doculayer_search(query, source, max_results=5) | BM25 search across live doc sections. Returns verbatim content ranked by relevance. |
| doculayer_fetch(source, section=None) | Fetch a whole page or a named heading. Use section= to target large pages. |
| doculayer_symbol(symbol, source=None) | Look up a function, class, or method. Source auto-inferred from dotted prefix. |
| doculayer_sources() | List known sources, identifier formats, and live cache stats. |

Every response includes a citation block:

> **Source**: https://docs.pydantic.dev/latest/concepts/validators/
> **Fetched**: 4s ago

No generated text ever appears in a response.
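To make the citation format concrete, here is a rough sketch of how such a header could be prepended to a section. The functions `format_age` and `cite` are hypothetical, and the minute and hour formats are assumptions; the examples in this README only show ages in seconds.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def format_age(seconds: float) -> str:
    """Render an age as '4s ago'; the 'm' and 'h' forms are assumed, not documented."""
    whole = int(seconds)
    if whole < 60:
        return f"{whole}s ago"
    if whole < 3600:
        return f"{whole // 60}m ago"
    return f"{whole // 3600}h ago"


def cite(content: str, source_url: str, fetched_at: datetime,
         now: Optional[datetime] = None) -> str:
    """Prepend a Source/Fetched header to content without altering the content."""
    now = now or datetime.now(timezone.utc)
    age = max(0.0, (now - fetched_at).total_seconds())
    header = f"> **Source**: {source_url}\n> **Fetched**: {format_age(age)}"
    return f"{header}\n\n{content}"


fetched = datetime.now(timezone.utc) - timedelta(seconds=4)
print(cite("## Validators\n...", "https://docs.pydantic.dev/latest/concepts/validators/", fetched))
```

The header is built separately from the content, which stays untouched below it.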


📦 Python Library

import asyncio
from doculayer import search, fetch

# Search live docs — returns verbatim sections with source attribution
results = asyncio.run(search("dependency injection", "fastapi"))
for r in results:
    print(r.score, r.section.title)
    print(r.section.source_url)     # "https://fastapi.tiangolo.com/tutorial/..."
    print(r.section.content)        # verbatim Markdown from the real docs
    print(r.section.cited_content)  # content with attribution header prepended

# Fetch a specific section
content = asyncio.run(fetch("httpx", section="AsyncClient"))
print(content)
# > **Source**: https://www.python-httpx.org/...
# > **Fetched**: 2s ago
#
# ## AsyncClient
# ...verbatim text...

🔍 Identifier Formats

| Format | Example | Resolves via |
| --- | --- | --- |
| bare name | fastapi | shortcut table → PyPI → npm |
| pypi: | pypi:httpx | PyPI JSON API |
| npm: | npm:react | npm registry |
| gh: | gh:owner/repo | GitHub URL |
| direct URL | https://docs.example.com | passthrough |
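The formats above can be told apart with a few string checks. The sketch below only classifies an identifier; it does not perform the lookups, and `Identifier` and `classify_identifier` are hypothetical names rather than DocuLayer's own `resolve_identifier`.

```python
from typing import NamedTuple


class Identifier(NamedTuple):
    kind: str   # "url", "pypi", "npm", "gh", or "bare"
    value: str  # the part to look up, with any prefix removed


_PREFIXES = ("pypi", "npm", "gh")


def classify_identifier(raw: str) -> Identifier:
    """Decide which resolution route an identifier string would take."""
    text = raw.strip()
    # URLs are checked first because they also contain a colon.
    if text.startswith(("http://", "https://")):
        return Identifier("url", text)
    prefix, sep, rest = text.partition(":")
    if sep and prefix in _PREFIXES and rest:
        return Identifier(prefix, rest)
    # Anything else is a bare name (shortcut table, then PyPI, then npm).
    return Identifier("bare", text)


for example in ("fastapi", "pypi:httpx", "npm:react", "gh:owner/repo", "https://docs.example.com"):
    print(example, "->", classify_identifier(example))
```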

📚 Packages with llms.txt (fastest access)

These packages publish an llms.txt index that DocuLayer uses to fetch only the pages relevant to your query — instead of the whole docs site.

anthropic · astro · fastapi · httpx · langchain · nextjs · openai · pydantic · react · shadcn · supabase · svelte · tailwindcss · vite · vue

Any package without llms.txt falls back to HTML parsing of the root docs page. Passing a direct URL also works — DocuLayer probes for llms.txt automatically.
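As an illustration of keyword-scoring an llms.txt index, the sketch below parses Markdown link entries of the form `- [Title](url): description` and keeps the entries that share the most words with the query. The scoring rule (count of distinct shared words) is an assumption chosen for simplicity, not DocuLayer's actual scorer, and the function names and example URLs are made up.

```python
import re

_LINK = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")
_WORD = re.compile(r"[a-z0-9]+")


def parse_llms_txt(text: str) -> list[tuple[str, str, str]]:
    """Return (title, url, description) for every link entry in the index."""
    entries = []
    for line in text.splitlines():
        match = _LINK.match(line)
        if match:
            entries.append((match.group(1), match.group(2), match.group(3) or ""))
    return entries


def top_urls(index_text: str, query: str, limit: int = 3) -> list[str]:
    """Pick up to `limit` URLs whose entries share the most words with the query."""
    terms = set(_WORD.findall(query.lower()))
    scored = []
    for title, url, description in parse_llms_txt(index_text):
        words = set(_WORD.findall(f"{title} {url} {description}".lower()))
        score = len(terms & words)
        if score:
            scored.append((score, url))
    # sorted() is stable, so ties keep their order from the index.
    scored.sort(key=lambda pair: -pair[0])
    return [url for _, url in scored[:limit]]


INDEX = """# Example docs

## Docs
- [Validators](https://docs.example.com/concepts/validators/): Field validators and model validators
- [Models](https://docs.example.com/concepts/models/): Model and field definitions
- [Install](https://docs.example.com/install/): Installation
"""
print(top_urls(INDEX, "field validators"))
```

Entries with no overlapping words are dropped entirely, so an unrelated page such as the install guide is never fetched.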


⚙️ How it works

1. resolve_identifier("pydantic")
        │
        ▼  shortcut table hit → https://docs.pydantic.dev

2. discover_llms_txt("https://docs.pydantic.dev")
        │
        ▼  fetches /llms.txt → 88 indexed entries

3. _candidate_urls("field validators")
        │
        ▼  keyword-score each entry → top 3 pages
           ["concepts/validators/", "api/validators/", "concepts/models/"]

4. DocFetcher.fetch(url) × 3   (parallel, TTLCache checked first)
        │
        ▼  raw HTML → markdownify → DocParser → list[DocSection]

5. DocSearcher(all_sections).search("field validators", top_k=5)
        │
        ▼  BM25Okapi scores → ranked SearchResult list

6. Return verbatim section content + source URL + fetch timestamp

No embeddings. No vector store. No ML inference. No generated text.
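For readers unfamiliar with BM25, the ranking in step 5 can be approximated in plain Python. This is a from-scratch sketch of the Okapi BM25 formula, not the library code DocuLayer calls; it uses the `log(... + 1)` IDF variant to keep scores non-negative, and `k1=1.5`, `b=0.75` are common defaults assumed here.

```python
import math
import re
from collections import Counter

_TOKEN = re.compile(r"[a-z0-9_]+")


def tokenize(text: str) -> list[str]:
    return _TOKEN.findall(text.lower())


def bm25_rank(query: str, docs: list[str], top_k: int = 5,
              k1: float = 1.5, b: float = 0.75) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs, best first, skipping zero-score docs."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = (sum(len(tokens) for tokens in tokenized) / n_docs) or 1.0

    # Document frequency: in how many docs does each term appear?
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = counts.get(term, 0)
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Length normalization: long sections need more hits to score as high.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append((index, score))

    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return [(i, s) for i, s in ranked[:top_k] if s > 0]


SECTIONS = [
    "Field validators run after parsing.",
    "Install with pip.",
    "Models define fields and validators.",
]
print(bm25_rank("field validators", SECTIONS))
```

Because matching is purely lexical, "fields" does not match "field" here; a real tokenizer might stem or normalize terms differently.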


🔒 Storage guarantee

DocuLayer never writes fetched documentation to disk.

  • All fetched pages go into a TTLCache[FetchResult] in process memory
  • Cache entries expire after DOCULAYER_CACHE_TTL seconds (default: 1 hour)
  • On process restart, the cache is empty — nothing persisted
  • Safe for privacy-sensitive environments; docs can never go stale past the TTL
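The behavior described above (expiry after a TTL, a size cap, nothing persisted) fits in a few lines. The class below is a hypothetical stand-in named `SimpleTTLCache`, not DocuLayer's own TTLCache, and it assumes "oldest evicted" means oldest by insertion time.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Optional, TypeVar

V = TypeVar("V")


class SimpleTTLCache(Generic[V]):
    """In-memory cache with per-entry expiry and a maximum size."""

    def __init__(self, ttl: float = 3600.0, max_entries: int = 256,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._max = max_entries
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: "OrderedDict[str, tuple[float, V]]" = OrderedDict()

    def get(self, key: str) -> Optional[V]:
        item = self._items.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at >= self._ttl:
            # Expired entries are dropped lazily, on access.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: V) -> None:
        # Re-inserting a key moves it to the "newest" end.
        self._items.pop(key, None)
        self._items[key] = (self._clock(), value)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the oldest entry


cache: SimpleTTLCache[str] = SimpleTTLCache(ttl=3600, max_entries=256)
cache.set("https://docs.example.com/", "<html>...</html>")
print(cache.get("https://docs.example.com/") is not None)
```

Everything lives in one dictionary owned by the process, so a restart starts from an empty cache.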

🔧 Configuration

All settings are environment variables — no config files, no disk reads:

| Variable | Default | Description |
| --- | --- | --- |
| DOCULAYER_CACHE_TTL | 3600 | Cache entry lifetime in seconds |
| DOCULAYER_MAX_CACHE | 256 | Max cached pages (oldest evicted on overflow) |
| DOCULAYER_FETCH_TIMEOUT | 12.0 | HTTP timeout per request |
| DOCULAYER_MAX_BYTES | 524288 | Max page size (512 KB) |
| DOCULAYER_MAX_WORDS | 400 | Max words per returned section |
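One way to picture environment-only configuration is a frozen dataclass filled from `os.environ`. The `Settings` class, its field names, and the int/float types below are assumptions made for illustration; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder; defaults mirror the table above."""
    cache_ttl: int = 3600
    max_cache: int = 256
    fetch_timeout: float = 12.0
    max_bytes: int = 524288
    max_words: int = 400


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read DOCULAYER_* variables, falling back to the documented defaults."""
    return Settings(
        cache_ttl=int(env.get("DOCULAYER_CACHE_TTL", 3600)),
        max_cache=int(env.get("DOCULAYER_MAX_CACHE", 256)),
        fetch_timeout=float(env.get("DOCULAYER_FETCH_TIMEOUT", 12.0)),
        max_bytes=int(env.get("DOCULAYER_MAX_BYTES", 524288)),
        max_words=int(env.get("DOCULAYER_MAX_WORDS", 400)),
    )


# Example: shorten the cache lifetime to five minutes for one process.
print(load_settings({"DOCULAYER_CACHE_TTL": "300"}))
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.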

📊 Compared to RAG

| | RAG | DocuLayer |
| --- | --- | --- |
| Storage | Vector DB required | None (in-memory TTL only) |
| Freshness | Depends on indexing schedule | Always live (TTL-bounded) |
| Accuracy | Semantic similarity | Verbatim text from source |
| Setup | Embedding model + DB + ingestion pipeline | pip install doculayer==0.1.1 |
| Hallucination risk | Embedding drift, chunking artifacts | Zero (no generated text) |

❓ Why not just give the agent a URL?

You could. But:

  • The agent still has to know which URL. For a library with 200 pages, it guesses.
  • The agent will fetch the whole page and summarize it — that's generation, which means drift.
  • DocuLayer uses llms.txt to fetch only the 1–3 pages most likely to answer the query, then returns the relevant section verbatim. The agent reads real documentation, not a paraphrase of it.

🤝 Contributing

git clone https://github.com/inamdarmihir/doculayer
cd doculayer
pip install -e ".[dev]"
pytest

📄 License

Apache 2.0 — see LICENSE.
