doculayer

Live documentation access layer for AI agents — no hallucination, no stale docs


  ██████╗  ██████╗  ██████╗██╗   ██╗██╗      █████╗ ██╗   ██╗███████╗██████╗
  ██╔══██╗██╔═══██╗██╔════╝██║   ██║██║     ██╔══██╗╚██╗ ██╔╝██╔════╝██╔══██╗
  ██║  ██║██║   ██║██║     ██║   ██║██║     ███████║ ╚████╔╝ █████╗  ██████╔╝
  ██║  ██║██║   ██║██║     ██║   ██║██║     ██╔══██║  ╚██╔╝  ██╔══╝  ██╔══██╗
  ██████╔╝╚██████╔╝╚██████╗╚██████╔╝███████╗██║  ██║   ██║   ███████╗██║  ██║
  ╚═════╝  ╚═════╝  ╚═════╝ ╚═════╝ ╚══════╝╚═╝  ╚═╝   ╚═╝   ╚══════╝╚═╝  ╚═╝

The live documentation layer for AI agents

AI agents hallucinate APIs. Not because they're broken — because their training data is stale.
When a function signature changed six months ago, a parameter was renamed, or a method did not exist at training time, the model confidently describes the old behavior or invents a plausible one.

DocuLayer fixes this. It sits between your agent and the real documentation, fetching live content on demand, returning verbatim text with full source attribution, and never generating a single word.

 Your agent  (Claude, Cursor, Codex, any MCP client…)
     │   "what parameters does httpx.AsyncClient.get() take?"
     ▼
 ┌──────────────────────────────────────────────────────────────┐
 │  DocuLayer                                                   │
 │  ──────────────────────────────────────────────────────────  │
 │  resolve_identifier  →  shortcut table / PyPI / npm         │
 │                                                              │
 │  discover_llms_txt   →  targeted page index (when present)  │
 │       keyword score entries  →  fetch only relevant pages   │
 │                                                              │
 │  DocParser  (HTML → Markdown, heading split)                │
 │  DocSearcher  (BM25 — no ML, no embeddings, no network)     │
 │                                                              │
 │  TTLCache  (in-memory only — zero disk writes)              │
 └──────────────────────────────────────────────────────────────┘
     │   verbatim section + source URL + "fetched 3s ago"
     ▼
 LLM  —  reads real docs, answers correctly

⚡ Quick Install — 30 seconds

Step 1 — Install the package

pip install doculayer==0.1.1

Step 2 — Wire up your IDE (auto-detected, zero prompts)

doculayer setup

That's it. The wizard detects your IDE and writes the MCP config automatically — no editor, no JSON, no manual steps.

Restart your IDE after setup and the four DocuLayer tools appear immediately.

🖥️ IDE-Specific Setup

The doculayer setup command handles all of these automatically.
Manual one-liners are listed below for reference.

Claude Code

pip install doculayer==0.1.1 && claude mcp add doculayer -- doculayer mcp

Restart Claude Code. Tools are live.

Cursor

pip install doculayer==0.1.1 && doculayer setup --ide cursor

Or add manually to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "doculayer": {
      "command": "doculayer",
      "args": ["mcp"]
    }
  }
}
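For anyone scripting the manual step, here is a minimal sketch of merging that entry into an existing config without dropping other servers. `merge_mcp_entry` and `update_config_file` are hypothetical helpers written for illustration, not part of DocuLayer; `doculayer setup` remains the supported route.

```python
import json
from pathlib import Path


def merge_mcp_entry(config: dict, name: str = "doculayer") -> dict:
    """Return a copy of `config` with the DocuLayer server added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the JSON block above.
    servers[name] = {"command": "doculayer", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Hypothetical helper: read, merge, and rewrite a JSON config file."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_mcp_entry(existing), indent=2) + "\n")
```

Calling `update_config_file(Path.home() / ".cursor" / "mcp.json")` would apply it to the Cursor path named above; other keys and servers in the file are left untouched.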

Windsurf

pip install doculayer==0.1.1 && doculayer setup --ide windsurf

Or add manually to ~/.codeium/windsurf/mcp_config.json, using the same JSON block shown for Cursor.

VS Code (with MCP support)

pip install doculayer==0.1.1 && doculayer setup --ide vscode

Or add to your VS Code settings.json:

{
  "mcpServers": {
    "doculayer": {
      "command": "doculayer",
      "args": ["mcp"]
    }
  }
}

Zed

pip install doculayer==0.1.1 && doculayer setup --ide zed

Or add to ~/.config/zed/settings.json (macOS/Linux) or %APPDATA%\Zed\settings.json (Windows):

{
  "context_servers": {
    "doculayer": {
      "command": { "path": "doculayer", "args": ["mcp"] }
    }
  }
}

All IDEs at once

doculayer setup --all-ides

🔧 What it does

MCP server: doculayer_search, doculayer_fetch, doculayer_symbol, doculayer_sources for any MCP client
Python library: await search("query", "fastapi") / await fetch("httpx", section="AsyncClient") inline in any app
CLI: doculayer search "streaming responses" --source nextjs
llms.txt-first: uses the llms.txt index when present to fetch only the 1–3 pages most relevant to the query
Zero hallucination: every response is text taken from the fetched URL (converted from HTML to Markdown, never paraphrased), with an attribution header; DocuLayer never generates text
Zero disk storage: content lives in a process-local TTL cache; no database, no files, no persistence across restarts

🛠️ MCP Tools

| Tool | What it does |
| --- | --- |
| `doculayer_search(query, source, max_results=5)` | BM25 search across live doc sections. Returns verbatim content ranked by relevance. |
| `doculayer_fetch(source, section=None)` | Fetch a whole page or a named heading. Use `section=` to target large pages. |
| `doculayer_symbol(symbol, source=None)` | Look up a function, class, or method. Source auto-inferred from dotted prefix. |
| `doculayer_sources()` | List known sources, identifier formats, and live cache stats. |

Every response includes a citation block:

> **Source**: https://docs.pydantic.dev/latest/concepts/validators/
> **Fetched**: 4s ago

No generated text ever appears in a response.
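To make the citation format concrete, here is a small sketch of how such a header could be built and prepended. The function names and the minute/hour age formats are assumptions for illustration; the source only shows ages in seconds, and DocuLayer's own formatter may differ.

```python
import time
from typing import Optional


def format_age(seconds: float) -> str:
    """Render an age such as '4s ago' (the m/h forms are assumed, not documented)."""
    seconds = max(0, int(seconds))
    if seconds < 60:
        return f"{seconds}s ago"
    if seconds < 3600:
        return f"{seconds // 60}m ago"
    return f"{seconds // 3600}h ago"


def cite(content: str, source_url: str, fetched_at: float,
         now: Optional[float] = None) -> str:
    """Prepend a Source/Fetched header to verbatim content."""
    now = time.time() if now is None else now
    header = (
        f"> **Source**: {source_url}\n"
        f"> **Fetched**: {format_age(now - fetched_at)}\n"
    )
    return f"{header}\n{content}"
```

The content is passed through unchanged; only the two header lines are added.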

📦 Python Library

import asyncio
from doculayer import search, fetch

# Search live docs — returns verbatim sections with source attribution
results = asyncio.run(search("dependency injection", "fastapi"))
for r in results:
    print(r.score, r.section.title)
    print(r.section.source_url)     # "https://fastapi.tiangolo.com/tutorial/..."
    print(r.section.content)        # verbatim Markdown from the real docs
    print(r.section.cited_content)  # content with attribution header prepended

# Fetch a specific section
content = asyncio.run(fetch("httpx", section="AsyncClient"))
print(content)
# > **Source**: https://www.python-httpx.org/...
# > **Fetched**: 2s ago
#
# ## AsyncClient
# ...verbatim text...

🔍 Identifier Formats

| Format | Example | Resolves via |
| --- | --- | --- |
| bare name | `fastapi` | shortcut table → PyPI → npm |
| `pypi:` | `pypi:httpx` | PyPI JSON API |
| `npm:` | `npm:react` | npm registry |
| `gh:` | `gh:owner/repo` | GitHub URL |
| direct URL | `https://docs.example.com` | passthrough |
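The formats above amount to a prefix dispatch. A minimal sketch of that classification step, assuming a hypothetical `classify_identifier` helper (DocuLayer's real `resolve_identifier` also performs the registry lookups, which are omitted here):

```python
from typing import Tuple


def classify_identifier(identifier: str) -> Tuple[str, str]:
    """Split an identifier into (kind, value) based on its prefix."""
    ident = identifier.strip()
    # Direct URLs pass through untouched.
    if ident.startswith(("http://", "https://")):
        return ("url", ident)
    # Explicit registry prefixes: pypi:, npm:, gh:
    for prefix in ("pypi", "npm", "gh"):
        if ident.startswith(prefix + ":"):
            return (prefix, ident[len(prefix) + 1:])
    # Anything else is a bare name (shortcut table, then PyPI, then npm).
    return ("bare", ident)
```

For example, `classify_identifier("gh:owner/repo")` yields `("gh", "owner/repo")`, and a bare `fastapi` is left for the fallback chain.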

📚 Packages with llms.txt (fastest access)

These packages publish an llms.txt index that DocuLayer uses to fetch only the pages relevant to your query — instead of the whole docs site.

anthropic · astro · fastapi · httpx · langchain · nextjs · openai · pydantic · react · shadcn · supabase · svelte · tailwindcss · vite · vue

Any package without llms.txt falls back to HTML parsing of the root docs page. Passing a direct URL also works — DocuLayer probes for llms.txt automatically.
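As an illustration of the llms.txt-first idea, the sketch below parses the link entries of an llms.txt file and keeps the few that share the most words with the query. It assumes the common `- [Title](url): description` entry layout; the names `parse_llms_txt` and `top_entries` and the simple word-overlap score are illustrative, not DocuLayer's actual scoring.

```python
import re
from typing import List, NamedTuple

# Matches list items of the form "- [Title](url): optional description".
LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?:\s*:\s*(.*))?$")


class Entry(NamedTuple):
    title: str
    url: str
    description: str


def parse_llms_txt(text: str) -> List[Entry]:
    """Collect every link entry; headings and prose lines are ignored."""
    entries = []
    for line in text.splitlines():
        m = LINK_RE.match(line)
        if m:
            entries.append(Entry(m.group(1), m.group(2), m.group(3) or ""))
    return entries


def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_entries(entries: List[Entry], query: str, limit: int = 3) -> List[Entry]:
    """Rank entries by how many query words appear in title, URL, or description."""
    q = tokenize(query)
    scored = []
    for e in entries:
        score = len(q & tokenize(f"{e.title} {e.url} {e.description}"))
        if score > 0:
            scored.append((score, e))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [e for _, e in scored[:limit]]
```

Only the URLs returned by `top_entries` would then need to be fetched, which is what keeps the request count low on large documentation sites.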

⚙️ How it works

1. resolve_identifier("pydantic")
        │
        ▼  shortcut table hit → https://docs.pydantic.dev

2. discover_llms_txt("https://docs.pydantic.dev")
        │
        ▼  fetches /llms.txt → 88 indexed entries

3. _candidate_urls("field validators")
        │
        ▼  keyword-score each entry → top 3 pages
           ["concepts/validators/", "api/validators/", "concepts/models/"]

4. DocFetcher.fetch(url) × 3   (parallel, TTLCache checked first)
        │
        ▼  raw HTML → markdownify → DocParser → list[DocSection]

5. DocSearcher(all_sections).search("field validators", top_k=5)
        │
        ▼  BM25Okapi scores → ranked SearchResult list

6. Return verbatim section content + source URL + fetch timestamp

No embeddings. No vector store. No ML inference. No generated text.
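Steps 4 and 5 can be sketched in plain Python: split Markdown into sections at headings, then rank the sections with BM25. This is a self-contained illustration, not DocuLayer's code; the flow above names BM25Okapi, whereas this sketch uses the always-positive `log(1 + ...)` IDF variant, and the function names and `k1`/`b` defaults are assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

HEADING_RE = re.compile(r"^#{1,6}\s+(.*)$")


def split_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs, one per Markdown heading.

    Limitation: '#' lines inside fenced code blocks are treated as headings.
    """
    sections, title, body = [], "", []
    for line in markdown.splitlines():
        m = HEADING_RE.match(line)
        if m:
            if title or any(l.strip() for l in body):
                sections.append((title, "\n".join(body).strip()))
            title, body = m.group(1).strip(), []
        else:
            body.append(line)
    if title or any(l.strip() for l in body):
        sections.append((title, "\n".join(body).strip()))
    return sections


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_rank(sections: List[Tuple[str, str]], query: str, top_k: int = 5,
              k1: float = 1.5, b: float = 0.75) -> List[Tuple[float, str]]:
    """Score each section against the query; return (score, title), best first."""
    docs = [tokenize(f"{t} {c}") for t, c in sections]
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scored = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(s, sections[i][0]) for s, i in scored[:top_k] if s > 0]
```

Because scoring is purely lexical, a query only matches sections that contain its exact tokens; that is the trade-off for needing no model and no network.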

🔒 Storage guarantee

DocuLayer never writes to disk.

All fetched pages go into a TTLCache[FetchResult] in process memory
Cache entries expire after DOCULAYER_CACHE_TTL seconds (default: 1 hour)
On process restart, the cache is empty — nothing persisted
Safe for privacy-sensitive environments; docs can never go stale past the TTL
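The behavior described above (expiry after a TTL, a bounded number of entries, nothing on disk) can be sketched with an ordered dictionary. `InMemoryTTLCache` is a hypothetical stand-in for illustration; the real `TTLCache` may be implemented differently.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class InMemoryTTLCache(Generic[T]):
    """Process-local cache: entries expire after `ttl` seconds, oldest evicted first."""

    def __init__(self, ttl: float = 3600.0, max_entries: int = 256,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._max = max_entries
        self._clock = clock
        self._items: "OrderedDict[str, Tuple[float, T]]" = OrderedDict()

    def get(self, key: str) -> Optional[T]:
        item = self._items.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at >= self._ttl:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: T) -> None:
        self._items.pop(key, None)  # re-inserting refreshes the entry's position
        self._items[key] = (self._clock(), value)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the oldest insertion
```

The injectable `clock` exists only to make expiry testable without sleeping; the defaults mirror the documented `DOCULAYER_CACHE_TTL` and `DOCULAYER_MAX_CACHE` values.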

🔧 Configuration

All settings are environment variables — no config files, no disk reads:

| Variable | Default | Description |
| --- | --- | --- |
| `DOCULAYER_CACHE_TTL` | `3600` | Cache entry lifetime in seconds |
| `DOCULAYER_MAX_CACHE` | `256` | Max cached pages (oldest evicted on overflow) |
| `DOCULAYER_FETCH_TIMEOUT` | `12.0` | HTTP timeout per request |
| `DOCULAYER_MAX_BYTES` | `524288` | Max page size (512 KB) |
| `DOCULAYER_MAX_WORDS` | `400` | Max words per returned section |
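A short sketch of how these variables might be read, with the documented defaults applied when a variable is unset. The `Settings` dataclass and `load_settings` function are hypothetical names used for illustration, not DocuLayer's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the table above.
    cache_ttl: int = 3600
    max_cache: int = 256
    fetch_timeout: float = 12.0
    max_bytes: int = 524288
    max_words: int = 400


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    d = Settings()
    return Settings(
        cache_ttl=int(env.get("DOCULAYER_CACHE_TTL", d.cache_ttl)),
        max_cache=int(env.get("DOCULAYER_MAX_CACHE", d.max_cache)),
        fetch_timeout=float(env.get("DOCULAYER_FETCH_TIMEOUT", d.fetch_timeout)),
        max_bytes=int(env.get("DOCULAYER_MAX_BYTES", d.max_bytes)),
        max_words=int(env.get("DOCULAYER_MAX_WORDS", d.max_words)),
    )
```

Setting `DOCULAYER_CACHE_TTL=60` in the environment before starting the process would, under this sketch, shorten the cache lifetime to one minute while leaving every other value at its default.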

📊 Compared to RAG

| | RAG | DocuLayer |
| --- | --- | --- |
| Storage | Vector DB required | None (in-memory TTL cache only) |
| Freshness | Depends on indexing schedule | Always live (TTL-bounded) |
| Accuracy | Semantic similarity | Verbatim text from source |
| Setup | Embedding model + DB + ingestion pipeline | `pip install doculayer==0.1.1` |
| Hallucination risk | Embedding drift, chunking artifacts | Zero: no generated text |

❓ Why not just give the agent a URL?

You could. But:

The agent still has to know which URL. For a library with 200 pages, it guesses.
The agent will fetch the whole page and summarize it — that's generation, which means drift.
DocuLayer uses llms.txt to fetch only the 1–3 pages most likely to answer the query, then returns the relevant section verbatim. The agent reads real documentation, not a paraphrase of it.

🤝 Contributing

git clone https://github.com/inamdarmihir/doculayer
cd doculayer
pip install -e ".[dev]"
pytest

📄 License

Apache 2.0 — see LICENSE.

Release history

0.1.1 (this version), Jun 29, 2026

0.1.0, Jun 29, 2026

Download files

Source Distribution

doculayer-0.1.1.tar.gz (26.9 MB), uploaded Jun 29, 2026, Source

Built Distribution

doculayer-0.1.1-py3-none-any.whl (27.7 kB), uploaded Jun 29, 2026, Python 3

File details

Details for the file doculayer-0.1.1.tar.gz.

File metadata

Download URL: doculayer-0.1.1.tar.gz
Upload date: Jun 29, 2026
Size: 26.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.2

File hashes

Hashes for doculayer-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`1d7655c87d9ab9abed827d47507764b9383791d2daed1dda0c521e31cd750133`
MD5	`93f7a158e7e47f1db44e6329ebc7a6cc`
BLAKE2b-256	`6367801ce13ff0d5e4fd2c501d2d647ca301f1541e7b40b879bfd658c862ccbd`


File details

Details for the file doculayer-0.1.1-py3-none-any.whl.

File metadata

Download URL: doculayer-0.1.1-py3-none-any.whl
Upload date: Jun 29, 2026
Size: 27.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.2

File hashes

Hashes for doculayer-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d887f0241d4487b47357adb789b686dbe90ea2ec8e5143fa748924465a0537fa`
MD5	`2ef0837e066b6bc2186a78365c4ec148`
BLAKE2b-256	`fc450fff89de9a27cd53a6539b1fc520c00a9014e65b245e58e8427d0d09f71f`

