
Fetch docs, embed locally, expose to AI agents via skills.


docmancer

A local knowledge base for AI agents. Ground your agents in version-specific docs and structured research vaults, locally, for free.



docmancer vault demo

✔ Up-to-date, version-specific documentation straight from the source
✔ Research vaults for mixed-source knowledge work with Obsidian compatibility
✔ Only the chunks your agent needs, not the whole doc site
✔ Built-in evals to measure and improve retrieval quality
✔ 100% local. Embeddings, storage, retrieval all on your machine.
✔ Completely free. No rate limits, no quotas, no API keys.
✔ Works offline once ingested. Private and internal docs supported.
✔ No MCP server. Installs as a skill, runs as a CLI.

pipx install docmancer --python python3.13

Quickstart · Two Workflows · The Problem · Agents · Why Local? · Commands · Install · Wiki


Quickstart

# 1. Install pipx
brew install pipx
pipx ensurepath

# 2. Open a new shell, then install docmancer
pipx install docmancer --python python3.13

# 3. Create a knowledge vault
docmancer init --template vault --name my-research

# 4. Add sources from the web or local files
docmancer vault add-url https://some-article.com/post
# or place markdown files directly in raw/

# 5. Sync filesystem, manifest, and vector index
docmancer vault scan

# 6. Install the skill into your agents
docmancer install claude-code
docmancer install cursor

# 7. Query, navigate, and maintain
docmancer query "How does authentication work?"
docmancer vault search "auth flow"
docmancer vault suggest

No server to start. Config and the default vector store are created under ~/.docmancer/ on first use. Vaults are plain markdown on the filesystem, so they work natively with Obsidian for graph view, canvas, backlinks, and the full plugin ecosystem.

You can also adopt an existing folder of Markdown (such as an Obsidian vault) without reorganizing anything:

docmancer vault open ./my-obsidian-vault --name research

Two Workflows

Docmancer supports two primary workflows built on the same local-first retrieval stack.

Research vaults

The recommended way to use docmancer. A vault is a structured local knowledge base with filesystem layout (raw/, wiki/, outputs/), a provenance manifest, maintenance intelligence, and retrieval evals. You add sources from the web, local files, or PDFs, and docmancer handles indexing, linting, and maintenance guidance so your agents can navigate and build on the knowledge over time.

docmancer vault scan                         # reconcile state
docmancer vault context "transformer arch"   # grouped research bundle
docmancer vault lint                         # check structural integrity
docmancer vault backlog                      # find coverage gaps
docmancer vault suggest                      # get next actions for agents
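The layout a vault scaffold uses can be pictured like this (an illustrative sketch based on the directories named above; the exact file names, such as the manifest's, may differ):

```text
my-research/
├── raw/        # ingested sources: web pages, local files, PDFs
├── wiki/       # curated notes built on top of raw sources
├── outputs/    # artifacts produced by agents
└── manifest    # provenance: where each entry came from, and when
```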

For full details, see the Vaults wiki page.

Quick docs retrieval

If you just need to ground your agents in a specific documentation site without setting up a full vault, the original ingest workflow still works. Point docmancer at a docs URL, ingest it, and query directly.

docmancer ingest https://docs.example.com
docmancer query "How do I authenticate?"

Both workflows coexist. They share the same embedding pipeline, vector store, and CLI skill system. Quick docs retrieval is a fast on-ramp, while vaults are the full experience for knowledge work that grows over time.


The Problem

AI agents hallucinate APIs. They invent CLI flags, fabricate method signatures, and confidently cite documentation from versions that no longer exist. The root cause is simple: their training data has a cutoff, and they fill gaps by guessing.

The obvious fix, dumping entire doc sites into context, makes it worse. You burn thousands of tokens on irrelevant text and bury the one paragraph that actually matters. The same problem applies to research and knowledge work: agents need structured, retrievable knowledge, not a raw pile of files.
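The retrieve-only-what-matches idea is easy to sketch. This is the general chunked-retrieval technique, not docmancer's actual implementation: a toy bag-of-words "embedding" stands in for a real model (docmancer uses FastEmbed), but the shape is the same — embed chunks once, score the query against them, return only the top hits.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_chunks(query, chunks, k=2):
    # Return only the k best-matching chunks, not the whole doc site.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Authentication uses bearer tokens in the Authorization header.",
    "Pagination is cursor-based; pass the next_cursor value.",
    "Webhooks retry with exponential backoff for 24 hours.",
]
print(top_chunks("how does authentication work", chunks, k=1))
```

The point is the token budget: the agent's context receives one matching paragraph instead of the whole corpus.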

Cloud-based documentation tools add rate limits and usage tiers, and they route your queries through third-party servers. Docmancer takes a different approach: you ingest docs once (or build a structured vault from mixed sources), they are chunked and indexed locally, and the agent retrieves only the matching sections when it needs them.


Works With Every Agent

Docmancer installs a skill file into each agent that teaches it to call the CLI directly. One local index, one ingest step, every agent covered.

  • Claude Code: docmancer install claude-code
  • Cline: docmancer install cline
  • Codex: docmancer install codex
  • Cursor: docmancer install cursor
  • Gemini CLI: docmancer install gemini
  • OpenCode: docmancer install opencode
  • Claude Desktop: docmancer install claude-desktop

Skills are plain markdown files. No background daemon, no MCP server, no ports. Use --project with claude-code, gemini, or cline to install into the current working directory instead of globally.
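Docmancer's actual skill files are its own; as a hypothetical sketch, a skill file for a CLI tool generally amounts to instructions the agent reads before answering (the frontmatter fields and wording here are illustrative, not docmancer's real contents):

```markdown
---
name: docmancer
description: Retrieve version-specific documentation from the local index.
---

When the user asks about a library, API, or internal docs, run:

    docmancer query "<question>"

Ground your answer in the returned chunks. Do not guess APIs from memory.
```

Because it is just a file the agent reads, there is nothing to keep running and nothing listening on a port.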


Why Local?

  • Cost: Free, always. No tiers, no quotas.
  • Rate limits: None. Query as much as you want.
  • Private docs: Supported free. No paid plan required.
  • Data privacy: Nothing leaves your machine.
  • Infrastructure: No server. CLI + local storage.
  • Offline use: Yes, after ingestion.
  • Embedding: Local FastEmbed. No API key needed.

Commands

Core

  • docmancer ingest <url-or-path>: Fetch, chunk, embed, and index docs locally
  • docmancer query <text>: Retrieve relevant chunks from the local index
  • docmancer install <agent>: Install skill file for a supported agent
  • docmancer list: List ingested sources with timestamps
  • docmancer fetch <url>: Download GitBook docs as markdown (no embedding)
  • docmancer remove <source>: Remove an ingested source from the index
  • docmancer inspect: Show collection stats and config
  • docmancer doctor: Health check: PATH, config, Qdrant, installed skills
  • docmancer init: Create a project-local docmancer.yaml
  • docmancer setup: Interactive wizard for API keys and integrations

Vault

  • docmancer init --template vault: Scaffold a structured knowledge base with raw/, wiki/, outputs/
  • docmancer vault open <path>: Adopt an existing folder of files as a vault
  • docmancer vault scan: Reconcile filesystem, manifest, and vector index
  • docmancer vault status: Show vault health summary with file counts and index states
  • docmancer vault add-url <url>: Fetch a web page into raw/ with provenance and index it
  • docmancer vault inspect <id-or-path>: Show manifest metadata for a specific vault entry
  • docmancer vault search <query>: Search vault metadata and content at file level
  • docmancer vault context <query>: Get grouped research context across raw, wiki, and output corpora
  • docmancer vault related <id-or-path>: Find entries related by tags, links, and semantic similarity
  • docmancer vault lint: Validate vault integrity; use --deep for LLM-assisted checks
  • docmancer vault backlog: Generate prioritized maintenance items from vault state
  • docmancer vault suggest: Produce a next-actions list for agents without writing content

Evals

  • docmancer query --trace: Print a structured execution trace for a single retrieval
  • docmancer dataset generate: Generate a golden eval dataset scaffold; use --llm for LLM-assisted Q&A
  • docmancer dataset generate-training: Generate fine-tuning training data in JSONL, Alpaca, or conversation format
  • docmancer eval: Run retrieval metrics (MRR, hit rate, chunk overlap, latency) against a dataset

Use --full with docmancer query to return the entire chunk body (default truncates at 1500 characters). Use --limit N to change how many chunks are returned.

For large ingests, tune ingestion.workers, ingestion.embed_queue_size, web_fetch.workers, embedding.batch_size, and embedding.parallel in docmancer.yaml.
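Using the keys named above, a tuned docmancer.yaml might look like the following. The values are illustrative placeholders, not the project's documented defaults; tune them for your machine:

```yaml
# Illustrative values only. Adjust per machine and corpus size.
ingestion:
  workers: 4            # parallel ingest workers
  embed_queue_size: 256 # chunks buffered ahead of the embedder
web_fetch:
  workers: 8            # concurrent page fetches
embedding:
  batch_size: 64        # chunks embedded per batch
  parallel: true        # embed batches concurrently
```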


Evals and Observability

Docmancer includes a local-first eval system so you can measure whether retrieval quality is actually improving as you add content and organize a vault.

  • Query tracing (--trace) shows a latency breakdown for each retrieval: embedding time, search time, and returned chunks with scores.
  • Dataset generation creates golden eval datasets from your content, either as a scaffold you fill in manually or with LLM-assisted Q&A generation (--llm).
  • Deterministic metrics (MRR, hit rate, chunk overlap, latency percentiles) run entirely locally with no API keys required.
  • LLM-as-judge (eval --judge) adds semantic relevance scoring on top of the deterministic metrics for deeper analysis.
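MRR and hit rate are standard retrieval metrics and simple to compute by hand. The following is a generic sketch of the two formulas, independent of docmancer's eval code:

```python
def mrr(results, relevant):
    # Mean reciprocal rank: average over queries of 1/rank of the
    # first relevant document in each ranked result list.
    total = 0.0
    for ranked, gold in zip(results, relevant):
        for rank, doc in enumerate(ranked, start=1):
            if doc in gold:
                total += 1.0 / rank
                break
    return total / len(results)

def hit_rate(results, relevant, k=5):
    # Fraction of queries with at least one relevant doc in the top k.
    hits = sum(
        1
        for ranked, gold in zip(results, relevant)
        if any(doc in gold for doc in ranked[:k])
    )
    return hits / len(results)

results = [["a", "b", "c"], ["x", "y", "z"]]   # ranked retrievals per query
relevant = [{"b"}, {"x"}]                      # golden answers per query
print(mrr(results, relevant))       # (1/2 + 1/1) / 2 = 0.75
print(hit_rate(results, relevant))  # 1.0
```

Because both metrics are deterministic functions of ranked lists, they run offline with no model calls, which is what makes a fully local eval loop possible.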

The eval system connects to the vault intelligence commands. For example, vault backlog can surface queries from the golden dataset that scored below threshold, pointing agents toward areas where the knowledge base needs better coverage.

For full details, see the Evals and Observability wiki page.


Cross-Vault Workflows

You can have separate vaults for different knowledge domains. Each vault has its own manifest and config, but they share the local Qdrant store by default. Tags let you organize vaults into logical groups and query across them.

# Create and tag vaults
docmancer init --template vault --name stripe-docs --dir ./vaults/stripe
docmancer vault tag stripe-docs work api

# List registered vaults, optionally filtered by tag
docmancer list --vaults --tag work

# Query across all vaults or a specific tag group
docmancer query --cross-vault "webhook retry behavior"
docmancer query --tag research "attention mechanisms"

Knowledge ingested in one agent context is queryable from any other agent on the same machine. Ingest in Claude Code, query from Cursor, and the results are the same because all agents hit the same local store.

For full details, see the Cross-Vault Workflows wiki page.


Install

brew install pipx
pipx ensurepath
# open a new shell, then:
pipx install docmancer --python python3.13

Supports Python 3.11-3.13. On Apple Silicon, prefer the native Homebrew Python:

pipx install docmancer --python /opt/homebrew/bin/python3.13

Upgrade with pipx upgrade docmancer.


Documentation

For configuration, troubleshooting, architecture details, and more, see the GitHub Wiki.


Contributing

See CONTRIBUTING.md.

License

MIT License. See LICENSE.


Your agents are guessing. Give them a knowledge base.



Download files

Download the file for your platform.

Source Distribution

docmancer-0.2.2.tar.gz (482.8 kB)


Built Distribution


docmancer-0.2.2-py3-none-any.whl (159.1 kB)


File details

Details for the file docmancer-0.2.2.tar.gz.

File metadata

  • Download URL: docmancer-0.2.2.tar.gz
  • Size: 482.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docmancer-0.2.2.tar.gz:

  • SHA256: 190bbfcd126c30763809d6e9ef09e8bf0e16951dda21fa0523cdcaf255968021
  • MD5: d3a23c658f08b21b86187b42c4ef05b7
  • BLAKE2b-256: 052a4c4d1380bb43297a8bbc39589b00b8b151e1d78e01a277b1cd25839e4106


Provenance

The following attestation bundles were made for docmancer-0.2.2.tar.gz:

Publisher: publish.yml on docmancer/docmancer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file docmancer-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: docmancer-0.2.2-py3-none-any.whl
  • Size: 159.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docmancer-0.2.2-py3-none-any.whl:

  • SHA256: 45925430cdc19204b40b3b8303e395ca380509409b2a0ca2b3e2a3994fcc2368
  • MD5: 6cce98c173dbd5832191e3624251170b
  • BLAKE2b-256: 4482579674cd31ef1aef8b46a9de7224e6b7ebad7c1110bf39e837960263c0a0


Provenance

The following attestation bundles were made for docmancer-0.2.2-py3-none-any.whl:

Publisher: publish.yml on docmancer/docmancer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
