Local documentation retrieval service for agent-assisted development. Indexes framework docs into typed Knowledge Units and exposes them via CLI and HTTP API.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

phlppbmm

These details have not been verified by PyPI

Project description

rtfm

Local documentation retrieval for agent-assisted development.

Indexes framework documentation into typed Knowledge Units and exposes them via CLI and HTTP API. Built for LLM-powered workflows where agents need precise, up-to-date API references instead of relying on training data.

Why

LLMs hallucinate APIs. They confidently generate outdated patterns, mix up v1/v2 syntax, and invent function signatures. rtfm gives your agents a local, searchable ground truth — the actual docs, chunked, typed, and semantically indexed.

Install

# Requires Python 3.11+ and uv
uv tool install -e .

# Or for development
git clone <repo-url> && cd rtfm
uv sync

Quick Start

rtfm init                          # Create ~/.rtfm/config.yaml
# Edit config to add your frameworks, then:
rtfm ingest                        # Download & index documentation
rtfm up                            # Start server in background

Usage

# Topic search — returns API refs, examples, concepts, pitfalls
rtfm dependency injection -f fastapi
rtfm "reactive state" -f svelte
rtfm WebSocket -f fastapi -k 3

# Symbol lookup — single word triggers exact match
rtfm Depends -f fastapi
rtfm '$state' -f svelte
rtfm Client -f reqwest

Output is JSON when piped (for agents), formatted text in terminal (for humans). Override with --json or --pretty.

Single-word queries try symbol lookup first, then fall back to topic search. Multi-word queries return a topic bundle grouped by type.

Admin

rtfm status                        # Health scores, unit counts, update check
rtfm update                        # Re-ingest outdated sources
rtfm remove tauri                  # Remove a framework's data
rtfm ingest -f svelte --rebuild    # Force re-ingest
rtfm up                            # Start server in background
rtfm down                          # Stop background server
rtfm serve                         # Foreground server (for systemd)

Configuration

~/.rtfm/config.yaml:

embedding_model: nomic-ai/nomic-embed-text-v1.5
min_health_score: 80               # Reject imports scoring below this

server:
  host: 127.0.0.1
  port: 8787

sources:
  # llms.txt — single pre-built markdown file
  svelte:
    type: llms_txt
    url: https://svelte.dev/llms-full.txt
    language: javascript

  # GitHub — sparse checkout of a docs directory
  fastapi:
    type: github
    repo: fastapi/fastapi
    docs_path: docs/en/docs
    glob: "**/*.md"
    language: python

  # Website + sitemap
  claude-agent-sdk:
    type: website
    sitemap: https://platform.claude.com/sitemap.xml
    url_filter: "/docs/en/managed-agents/"
    language: python

  # Website + wildcard crawl
  tokio:
    type: website
    url: https://tokio.rs/tokio/tutorial/*
    language: rust

  # docs.rs (auto-detected as rustdoc, parsed from raw HTML)
  reqwest:
    type: website
    url: https://docs.rs/reqwest/latest/reqwest/*
    language: rust

  # TypeDoc (explicit doc_system for github.io sites)
  jira-js:
    type: website
    url: https://mrrefactoring.github.io/jira.js/*
    language: javascript
    doc_system: typedoc

Source Types

Type	How it works	Version tracking
`llms_txt`	HTTP GET on a URL, single markdown file	HTTP ETag
`github`	Git sparse checkout of a docs directory	Commit SHA
`local`	Reads files from the local filesystem	File hash
`website` + `sitemap`	Sitemap → filter URLs → fetch HTML → convert to Markdown	Sitemap hash
`website` + `url: .../path/*`	Crawl start page, discover all links under prefix	URL set hash

Doc System Detection

rtfm auto-detects the documentation system and uses a specialized parser:

System	Detection	Sources
Sphinx/RST	`.. autoclass::` directives	sqlalchemy, numpy
MkDocs	`!!! admonition`, `::: module.path` inserts	fastapi, pydantic, polars
rustdoc	`docs.rs` URL or `class="rustdoc"` in HTML	reqwest
TypeDoc	`class="tsd-"` in HTML or `doc_system: typedoc`	jira-js
llms.txt	`type: llms_txt` in config	svelte, tauri, claude-code

Override with doc_system: in source config when auto-detection fails.

How It Works

Ingestion Pipeline

Source (GitHub / HTTP / local / website)
  → Downloader (sparse checkout, HTTP GET, file read, crawl + HTML→MD or raw HTML)
    → Parser (doc-system-aware: Sphinx, MkDocs, rustdoc, TypeDoc, generic MD)
      → Extractor (classifier + symbol extraction + definition site tracking)
        → Storage (SQLite + ChromaDB)

Parser splits documents at heading boundaries (or autodoc directive boundaries for Sphinx). Tracks a full heading hierarchy. Doc-system-specific parsers handle HTML directly (rustdoc, TypeDoc) instead of losing structure in HTML→MD conversion.

Extractor assigns each section one or more types and extracts symbols. Definition sites are tracked — sections where a symbol is canonically defined (via .. autoclass::, ::: module.Symbol, rustdoc struct definitions, etc.) are marked so lookup always returns the authoritative reference, not a random mention.

Relevance decay downweights release notes, changelogs, and migration guides so they don't dominate search results over actual documentation.

Dual-Index Storage

SQLite + FTS5	ChromaDB
Keyword search (BM25)	Semantic search (cosine similarity)
Exact symbol lookup	Conceptual similarity
Definition site ranking	"How do I do X" queries
Metadata, versions	Vector embeddings (nomic-embed-text-v1.5)

Hybrid Search (RRF)

Queries run against both indexes in parallel. Results are merged using Reciprocal Rank Fusion (k=60), then multiplied by per-unit relevance decay.

Import Health Score

Every ingested source gets a health score (0-100, grade A-F) based on:

Signal	What it measures
Type diversity	Are api/example/concept/pitfall all present?
Definition coverage	% of symbols with canonical definition sites
Content quality	Average content length, stub ratio
Changelog noise	% of units from release notes / changelogs
Doc system detection	Was a specialized parser used?

Sources scoring below min_health_score (default: 80) are auto-rejected during ingest.

Knowledge Unit Types

Type	What it contains
`api`	Function/class/component definition with signature, parameters, types
`example`	Complete, working code block with context
`concept`	Explanatory text about a pattern or architecture
`pitfall`	Gotchas, breaking changes, deprecations + workarounds

Architecture

src/rtfm/
  models.py              KnowledgeUnit, configs, Pydantic response models
  cli.py                 Agent-first CLI: rtfm <query> -f <framework>
  server.py              FastAPI HTTP API
  storage.py             Dual-index: ChromaDB (semantic) + SQLite/FTS5 (keyword)
  search.py              Hybrid search (RRF), symbol lookup, bundle
  health.py              Import health score computation
  reporter.py            CLI progress rendering
  ingest/
    pipeline.py          Orchestrates download → parse → extract → store
    detect.py            Auto-detection of doc systems
    downloaders.py       GitHub, HTTP, local, website (sitemap/crawl/HTML→MD)
    parsers/
      base.py            Parser protocol + Section dataclass
      generic_md.py      Markdown + RST (default fallback)
      sphinx_rst.py      Sphinx autodoc directive splitting
      mkdocs_md.py       MkDocs/Material
      rustdoc_html.py    docs.rs raw HTML parsing
      typedoc_html.py    TypeDoc raw HTML parsing
    extractors/
      base.py            Extractor protocol
      generic.py         Multi-label classifier + symbol extraction
      sphinx.py          Autodoc definition sites + changelog decay
      mkdocs.py          mkdocstrings definition sites
      rustdoc.py         All-API extractor for rustdoc
      typedoc.py         All-API extractor for TypeDoc

HTTP API

All endpoints are available via HTTP when the server is running.

Endpoint	Description
`GET /search?q=...&framework=...&type=...&top_k=10`	Hybrid search
`GET /lookup/{symbol}?framework=...`	Symbol lookup
`GET /browse?framework=...&module=...`	Structural navigation
`GET /bundle?q=...&framework=...&top_k=5`	Topic bundle

Swagger UI: http://127.0.0.1:8787/docs

Development

uv sync
uv run python -m pytest tests/ -v     # 200 tests
uv run ruff check src/rtfm/

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

phlppbmm

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.2.6

Apr 13, 2026

0.2.5

Apr 13, 2026

0.2.4

Apr 10, 2026

0.2.3

Apr 9, 2026

0.2.2

Apr 9, 2026

0.2.1

Apr 9, 2026

0.2.0

Apr 9, 2026

This version

0.1.1

Apr 9, 2026

0.1.0

Apr 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agent_rtfm-0.1.1.tar.gz (68.2 kB view details)

Uploaded Apr 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

agent_rtfm-0.1.1-py3-none-any.whl (68.8 kB view details)

Uploaded Apr 9, 2026 Python 3

File details

Details for the file agent_rtfm-0.1.1.tar.gz.

File metadata

Download URL: agent_rtfm-0.1.1.tar.gz
Upload date: Apr 9, 2026
Size: 68.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.5 {"installer":{"name":"uv","version":"0.11.5","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for agent_rtfm-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`05ba5db39d80fc42a8d835d6fab57f137a2366714f4b78d2342af58b38d3cb92`
MD5	`93def76a85bf2588d37d779bcd46afab`
BLAKE2b-256	`85bd6b0429f6f3f964d4f102d03c9614964124e39c02379a91f6fe5fffc9a097`

See more details on using hashes here.

File details

Details for the file agent_rtfm-0.1.1-py3-none-any.whl.

File metadata

Download URL: agent_rtfm-0.1.1-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 68.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.5 {"installer":{"name":"uv","version":"0.11.5","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for agent_rtfm-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ac1e7accdb8d01feece7297b0b914755043d33340928c76d73b2968694841d70`
MD5	`f098d6fca8ba20eeb426f75a316b6468`
BLAKE2b-256	`e31abd93226dcec5fe222e1ad11d2d9ba0cb716098b548cfaf18cd461806d1bb`

See more details on using hashes here.

agent-rtfm 0.1.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

rtfm

Why

Install

Quick Start

Usage

Admin

Configuration

Source Types

Doc System Detection

How It Works

Ingestion Pipeline

Dual-Index Storage

Hybrid Search (RRF)

Import Health Score

Knowledge Unit Types

Architecture

HTTP API

Development

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes