Live documentation access layer for AI agents — no hallucination, no stale docs
██████╗ ██████╗ ██████╗██╗ ██╗██╗ █████╗ ██╗ ██╗███████╗██████╗
██╔══██╗██╔═══██╗██╔════╝██║ ██║██║ ██╔══██╗╚██╗ ██╔╝██╔════╝██╔══██╗
██║ ██║██║ ██║██║ ██║ ██║██║ ███████║ ╚████╔╝ █████╗ ██████╔╝
██║ ██║██║ ██║██║ ██║ ██║██║ ██╔══██║ ╚██╔╝ ██╔══╝ ██╔══██╗
██████╔╝╚██████╔╝╚██████╗╚██████╔╝███████╗██║ ██║ ██║ ███████╗██║ ██║
╚═════╝ ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝╚═╝ ╚═╝ ╚═╝ ╚══════╝╚═╝ ╚═╝
The live documentation layer for AI agents
AI agents hallucinate APIs. Not because they're broken — because their training data is stale.
A function signature that changed six months ago, a parameter that was renamed, a new method that didn't exist at training time — the model confidently fabricates the old behavior.
DocuLayer fixes this. It sits between your agent and the real documentation, fetching live content on demand, returning verbatim text with full source attribution, and never generating a single word.
Your agent (Claude, Cursor, Codex, any MCP client…)
│ "what parameters does httpx.AsyncClient.get() take?"
▼
┌──────────────────────────────────────────────────────────────┐
│ DocuLayer │
│ ────────────────────────────────────────────────────────── │
│ resolve_identifier → shortcut table / PyPI / npm │
│ │
│ discover_llms_txt → targeted page index (when present) │
│ keyword score entries → fetch only relevant pages │
│ │
│ DocParser (HTML → Markdown, heading split) │
│ DocSearcher (BM25 — no ML, no embeddings, no network) │
│ │
│ TTLCache (in-memory only — zero disk writes) │
└──────────────────────────────────────────────────────────────┘
│ verbatim section + source URL + "fetched 3s ago"
▼
LLM — reads real docs, answers correctly
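The DocParser step in the diagram (HTML → Markdown, heading split) can be pictured with a minimal sketch. It assumes ATX-style `#` headings and ignores edge cases such as `#` lines inside code fences. `Section` and `split_by_heading` are hypothetical names for illustration, not DocuLayer's actual classes.

```python
import re
from dataclasses import dataclass


# Hypothetical type for illustration; not DocuLayer's real DocSection.
@dataclass
class Section:
    title: str    # heading text, "" for content before the first heading
    level: int    # number of leading '#' characters, 0 for the preamble
    content: str  # body text under the heading, unchanged apart from trimming


HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_heading(markdown: str) -> list[Section]:
    """Split Markdown into one Section per heading (assumed behaviour)."""
    sections: list[Section] = []
    title, level, body = "", 0, []

    def flush() -> None:
        # Skip an empty preamble, keep everything else.
        if title or any(line.strip() for line in body):
            sections.append(Section(title, level, "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, level, body = match.group(2), len(match.group(1)), []
        else:
            body.append(line)
    flush()
    return sections


sample = "intro\n\n# A\ntext a\n\n## B\ntext b\n"
parsed = split_by_heading(sample)
```

Splitting at headings keeps each returned unit small enough to rank and cite on its own, which is what makes a `section=` lookup possible.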
⚡ Quick Install — 30 seconds
Step 1 — Install the package
pip install doculayer==0.1.1
Step 2 — Wire up your IDE (auto-detected, zero prompts)
doculayer setup
That's it. The wizard detects your IDE and writes the MCP config automatically — no editor, no JSON, no manual steps.
Restart your IDE after setup and the four DocuLayer tools appear immediately.
🖥️ IDE-Specific Setup
The doculayer setup command handles all of these automatically.
Manual one-liners are listed below for reference.
Claude Code
pip install doculayer==0.1.1 && claude mcp add doculayer -- doculayer mcp
Restart Claude Code. Tools are live.
Cursor
pip install doculayer==0.1.1 && doculayer setup --ide cursor
Or add manually to ~/.cursor/mcp.json:
{
"mcpServers": {
"doculayer": {
"command": "doculayer",
"args": ["mcp"]
}
}
}
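If the config file already lists other MCP servers, pasting the block by hand risks overwriting them. The sketch below merges the entry instead. It assumes the file is plain JSON with a top-level `mcpServers` object, as in the block above; `add_mcp_server` is a hypothetical helper, not part of the `doculayer` package.

```python
import json
import tempfile
from pathlib import Path

# The server entry from the JSON block above.
DOCULAYER_ENTRY = {"command": "doculayer", "args": ["mcp"]}


def add_mcp_server(config_path: Path) -> dict:
    """Merge the doculayer entry into an mcpServers-style JSON config."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    # Keep any servers already configured; only set the doculayer key.
    config.setdefault("mcpServers", {})["doculayer"] = dict(DOCULAYER_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file rather than a real IDE config.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "mcp.json"
    demo_path.write_text('{"mcpServers": {"other": {"command": "x"}}}', encoding="utf-8")
    merged = add_mcp_server(demo_path)
    on_disk = json.loads(demo_path.read_text(encoding="utf-8"))
```

For a real setup, the call would be `add_mcp_server(Path.home() / ".cursor" / "mcp.json")`.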
Windsurf
pip install doculayer==0.1.1 && doculayer setup --ide windsurf
Or add the same JSON block shown above for Cursor to ~/.codeium/windsurf/mcp_config.json.
VS Code (with MCP support)
pip install doculayer==0.1.1 && doculayer setup --ide vscode
Or add to your VS Code settings.json:
{
"mcpServers": {
"doculayer": {
"command": "doculayer",
"args": ["mcp"]
}
}
}
Zed
pip install doculayer==0.1.1 && doculayer setup --ide zed
Or add to ~/.config/zed/settings.json (macOS/Linux) or %APPDATA%\Zed\settings.json (Windows):
{
"context_servers": {
"doculayer": {
"command": { "path": "doculayer", "args": ["mcp"] }
}
}
}
All IDEs at once
doculayer setup --all-ides
🔧 What it does
- MCP server: `doculayer_search`, `doculayer_fetch`, `doculayer_symbol`, `doculayer_sources` for any MCP client
- Python library: `await search("query", "fastapi")` / `await fetch("httpx", section="AsyncClient")` inline in any app
- CLI: `doculayer search "streaming responses" --source nextjs`
- llms.txt-first: uses the llms.txt index when present to fetch only the 1–3 pages most relevant to the query
- Zero hallucination: every byte returned is verbatim from the fetched URL; attribution header on every response; never generates text
- Zero disk storage: content lives in a process-local TTL cache; no database, no files, no persistence across restarts
🛠️ MCP Tools
| Tool | What it does |
|---|---|
| `doculayer_search(query, source, max_results=5)` | BM25 search across live doc sections. Returns verbatim content ranked by relevance. |
| `doculayer_fetch(source, section=None)` | Fetch a whole page or a named heading. Use `section=` to target large pages. |
| `doculayer_symbol(symbol, source=None)` | Look up a function, class, or method. Source auto-inferred from dotted prefix. |
| `doculayer_sources()` | List known sources, identifier formats, and live cache stats. |
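One plausible reading of "source auto-inferred from dotted prefix" is that the first dotted component of the symbol names the package. The sketch below shows that reading only; `infer_source` is a hypothetical helper and the real tool may apply further rules.

```python
from typing import Optional


def infer_source(symbol: str, source: Optional[str] = None) -> str:
    """Pick a source for a symbol lookup (assumed behaviour, for illustration)."""
    if source:
        # An explicit source always wins.
        return source
    prefix = symbol.partition(".")[0].strip()
    if not prefix:
        raise ValueError(f"cannot infer a source from {symbol!r}")
    return prefix


inferred = infer_source("httpx.AsyncClient.get")
explicit = infer_source("BaseModel", source="pydantic")
```

Under this reading, a bare name such as `BaseModel` carries no package information, so passing `source=` explicitly is the safer call.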
Every response includes a citation block:
> **Source**: https://docs.pydantic.dev/latest/concepts/validators/
> **Fetched**: 4s ago
No generated text ever appears in a response.
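A citation header like the one above can be assembled from the source URL and the fetch time alone. The following is a sketch of that idea; `format_age` and `cite` are hypothetical names, and the minute and hour formats are assumptions, since the examples in this README only show seconds.

```python
import time
from typing import Optional


def format_age(seconds: float) -> str:
    """Render an age as '4s ago'; the m/h forms are assumed, not documented."""
    whole = max(0, int(seconds))
    if whole < 60:
        return f"{whole}s ago"
    if whole < 3600:
        return f"{whole // 60}m ago"
    return f"{whole // 3600}h ago"


def cite(content: str, source_url: str, fetched_at: float,
         now: Optional[float] = None) -> str:
    """Prepend an attribution header; the content itself is left untouched."""
    current = time.time() if now is None else now
    header = (
        f"> **Source**: {source_url}\n"
        f"> **Fetched**: {format_age(current - fetched_at)}"
    )
    return f"{header}\n\n{content}"


cited = cite(
    "## Validators\n...verbatim text...",
    "https://docs.pydantic.dev/latest/concepts/validators/",
    fetched_at=100.0,
    now=104.0,
)
```

The header is derived purely from fetch metadata, so adding it does not alter the quoted documentation.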
📦 Python Library
import asyncio
from doculayer import search, fetch
# Search live docs — returns verbatim sections with source attribution
results = asyncio.run(search("dependency injection", "fastapi"))
for r in results:
    print(r.score, r.section.title)
    print(r.section.source_url)     # "https://fastapi.tiangolo.com/tutorial/..."
    print(r.section.content)        # verbatim Markdown from the real docs
    print(r.section.cited_content)  # content with attribution header prepended
# Fetch a specific section
content = asyncio.run(fetch("httpx", section="AsyncClient"))
print(content)
# > **Source**: https://www.python-httpx.org/...
# > **Fetched**: 2s ago
#
# ## AsyncClient
# ...verbatim text...
🔍 Identifier Formats
| Format | Example | Resolves via |
|---|---|---|
| bare name | `fastapi` | shortcut table → PyPI → npm |
| `pypi:` | `pypi:httpx` | PyPI JSON API |
| `npm:` | `npm:react` | npm registry |
| `gh:` | `gh:owner/repo` | GitHub URL |
| direct URL | `https://docs.example.com` | passthrough |
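The table amounts to a small dispatch on the identifier's prefix. The sketch below classifies an identifier and maps it to a public metadata endpoint. `classify_identifier` and `metadata_url` are hypothetical names; the PyPI JSON and npm registry URL patterns are the public ones, but whether DocuLayer calls exactly these endpoints is an assumption.

```python
from typing import Optional


def classify_identifier(identifier: str) -> tuple[str, str]:
    """Return (kind, value) where kind is url, pypi, npm, gh, or bare."""
    if identifier.startswith(("http://", "https://")):
        return ("url", identifier)
    for prefix in ("pypi", "npm", "gh"):
        if identifier.startswith(prefix + ":"):
            return (prefix, identifier[len(prefix) + 1:])
    return ("bare", identifier)


def metadata_url(kind: str, value: str) -> Optional[str]:
    """Where each kind could be looked up (assumed mapping, for illustration)."""
    if kind == "pypi":
        return f"https://pypi.org/pypi/{value}/json"
    if kind == "npm":
        return f"https://registry.npmjs.org/{value}"
    if kind == "gh":
        return f"https://github.com/{value}"
    if kind == "url":
        return value
    # Bare names go through the shortcut table first, so there is no single URL.
    return None


kinds = [classify_identifier(x) for x in
         ("fastapi", "pypi:httpx", "npm:react", "gh:owner/repo",
          "https://docs.example.com")]
```

An explicit prefix skips the shortcut table → PyPI → npm chain, which helps when a name exists on both registries.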
📚 Packages with llms.txt (fastest access)
These packages publish an llms.txt index that DocuLayer uses to fetch only the pages relevant to your query — instead of the whole docs site.
anthropic · astro · fastapi · httpx · langchain · nextjs · openai · pydantic · react · shadcn · supabase · svelte · tailwindcss · vite · vue
Any package without llms.txt falls back to HTML parsing of the root docs page. Passing a direct URL also works — DocuLayer probes for llms.txt automatically.
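To make the llms.txt step concrete, here is a minimal sketch of parsing an index and keyword-scoring its entries. It assumes the common llms.txt layout of Markdown list items shaped like `- [Title](url): description`. `IndexEntry`, `parse_llms_txt`, and `top_entries` are hypothetical names, and the scoring (count of distinct query terms found in an entry) is a deliberate simplification that may differ from DocuLayer's.

```python
import re
from dataclasses import dataclass


@dataclass
class IndexEntry:
    title: str
    url: str
    description: str


# Matches "- [Title](https://...)" with an optional ": description" tail.
LINK = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def parse_llms_txt(text: str) -> list[IndexEntry]:
    entries = []
    for line in text.splitlines():
        match = LINK.match(line)
        if match:
            title, url, description = match.groups()
            entries.append(IndexEntry(title, url, description or ""))
    return entries


def top_entries(entries: list[IndexEntry], query: str, k: int = 3) -> list[IndexEntry]:
    """Keep the k entries sharing the most terms with the query."""
    wanted = tokens(query)
    scored = []
    for position, entry in enumerate(entries):
        haystack = tokens(f"{entry.title} {entry.description} {entry.url}")
        score = len(wanted & haystack)
        if score > 0:
            scored.append((-score, position, entry))
    scored.sort(key=lambda item: item[:2])  # best score first, then file order
    return [entry for _, _, entry in scored[:k]]


SAMPLE = """# Example Docs
> Illustrative index, not a real site.
## Docs
- [Validators](https://docs.example.com/concepts/validators/): Field and model validators
- [Models](https://docs.example.com/concepts/models/): Defining models and fields
- [Install](https://docs.example.com/install/)
"""
index = parse_llms_txt(SAMPLE)
picked = top_entries(index, "field validators")
```

Only the URLs in `picked` would then be fetched, which is how an index lets a tool avoid crawling the whole docs site.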
⚙️ How it works
1. resolve_identifier("pydantic")
│
▼ shortcut table hit → https://docs.pydantic.dev
2. discover_llms_txt("https://docs.pydantic.dev")
│
▼ fetches /llms.txt → 88 indexed entries
3. _candidate_urls("field validators")
│
▼ keyword-score each entry → top 3 pages
["concepts/validators/", "api/validators/", "concepts/models/"]
4. DocFetcher.fetch(url) × 3 (parallel, TTLCache checked first)
│
▼ raw HTML → markdownify → DocParser → list[DocSection]
5. DocSearcher(all_sections).search("field validators", top_k=5)
│
▼ BM25Okapi scores → ranked SearchResult list
6. Return verbatim section content + source URL + fetch timestamp
No embeddings. No vector store. No ML inference. No generated text.
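Step 5 needs nothing beyond term counts. As an illustration, the sketch below is a textbook BM25 scorer written with the standard library only. It uses a non-negative IDF variant and default `k1`/`b` values chosen here for the example, so its scores are not guaranteed to match the `BM25Okapi` scorer named in the diagram. `bm25_rank` is a hypothetical function name.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: list[str], top_k: int = 5,
              k1: float = 1.5, b: float = 0.75) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs for matching docs, best first."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0

    # Document frequency: in how many docs does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = tokenize(query)
    scored = []
    for index, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            scored.append((index, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


sections = [
    "Field validators run after parsing",
    "Installing the package with pip",
    "Models define fields",
]
ranked = bm25_rank("field validators", sections)
```

Because scoring is purely lexical, a query for "field" does not match "fields"; that is the usual trade-off for having no model and no network call in the ranking step.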
🔒 Storage guarantee
DocuLayer never writes fetched documentation to disk.
- All fetched pages go into a `TTLCache[FetchResult]` in process memory
- Cache entries expire after `DOCULAYER_CACHE_TTL` seconds (default: 1 hour)
- On process restart, the cache is empty; nothing is persisted
- Safe for privacy-sensitive environments; cached docs can never go stale past the TTL
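The guarantees above follow from keeping everything in an ordinary in-process mapping. A minimal sketch of such a cache, with expiry and oldest-first eviction, is shown below. `SketchTTLCache` is a hypothetical stand-in written for this example, not DocuLayer's actual `TTLCache`.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class SketchTTLCache:
    """In-memory cache with per-entry expiry and oldest-first eviction."""

    def __init__(self, ttl: float = 3600.0, max_entries: int = 256,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._max = max_entries
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data: "OrderedDict[str, tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries are dropped on access
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data.pop(key, None)
        # Evict the oldest entries until there is room for the new one.
        while self._data and len(self._data) >= self._max:
            self._data.popitem(last=False)
        self._data[key] = (self._clock() + self._ttl, value)

    def __len__(self) -> int:
        return len(self._data)


fake_now = [0.0]
cache = SketchTTLCache(ttl=10, max_entries=2, clock=lambda: fake_now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)  # evicts "a", the oldest entry
```

Nothing here touches the filesystem, so a process restart starts from an empty cache.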
🔧 Configuration
All settings are environment variables — no config files, no disk reads:
| Variable | Default | Description |
|---|---|---|
| `DOCULAYER_CACHE_TTL` | `3600` | Cache entry lifetime in seconds |
| `DOCULAYER_MAX_CACHE` | `256` | Max cached pages (oldest evicted on overflow) |
| `DOCULAYER_FETCH_TIMEOUT` | `12.0` | HTTP timeout per request |
| `DOCULAYER_MAX_BYTES` | `524288` | Max page size (512 KB) |
| `DOCULAYER_MAX_WORDS` | `400` | Max words per returned section |
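Environment-only configuration is straightforward to model. The sketch below reads the five variables with the defaults from the table. `Settings` and `load_settings` are hypothetical names, and the type chosen for each field is inferred from the default values rather than taken from DocuLayer's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    cache_ttl: int = 3600        # DOCULAYER_CACHE_TTL
    max_cache: int = 256         # DOCULAYER_MAX_CACHE
    fetch_timeout: float = 12.0  # DOCULAYER_FETCH_TIMEOUT
    max_bytes: int = 524288      # DOCULAYER_MAX_BYTES (512 * 1024)
    max_words: int = 400         # DOCULAYER_MAX_WORDS


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    source = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        cache_ttl=int(source.get("DOCULAYER_CACHE_TTL", defaults.cache_ttl)),
        max_cache=int(source.get("DOCULAYER_MAX_CACHE", defaults.max_cache)),
        fetch_timeout=float(source.get("DOCULAYER_FETCH_TIMEOUT", defaults.fetch_timeout)),
        max_bytes=int(source.get("DOCULAYER_MAX_BYTES", defaults.max_bytes)),
        max_words=int(source.get("DOCULAYER_MAX_WORDS", defaults.max_words)),
    )


short_lived = load_settings({"DOCULAYER_CACHE_TTL": "60"})
```

Setting `DOCULAYER_CACHE_TTL=60` in the environment before launching the server would, under these assumptions, cap the age of any cached page at one minute.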
📊 Compared to RAG
| | RAG | DocuLayer |
|---|---|---|
| Storage | Vector DB required | None — in-memory TTL only |
| Freshness | Depends on indexing schedule | Always live (TTL-bounded) |
| Accuracy | Semantic similarity | Verbatim text from source |
| Setup | Embedding model + DB + ingestion pipeline | pip install doculayer==0.1.1 |
| Hallucination risk | Embedding drift, chunking artifacts | Zero — no generated text |
❓ Why not just give the agent a URL?
You could. But:
- The agent still has to know which URL. For a library with 200 pages, it guesses.
- The agent will fetch the whole page and summarize it — that's generation, which means drift.
- DocuLayer uses llms.txt to fetch only the 1–3 pages most likely to answer the query, then returns the relevant section verbatim. The agent reads real documentation, not a paraphrase of it.
🤝 Contributing
git clone https://github.com/inamdarmihir/doculayer
cd doculayer
pip install -e ".[dev]"
pytest
📄 License
Apache 2.0 — see LICENSE.