Structure-aware document retrieval. FTS5/BM25 keyword matching over document trees.
doclens
Structure-aware document retrieval — FTS5/BM25 keyword search over document trees, with an interactive TUI and a PWA Web UI.
doclens parses documents into tree structures (headings, classes, functions…) and searches them with FTS5/BM25 keyword matching — no embeddings, no chunking, no vector DB required. Works entirely offline.
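To make the idea concrete, here is a minimal sketch of heading-anchored retrieval using SQLite's FTS5 extension from Python's standard library. It is not doclens's actual implementation: `parse_sections`, `search`, and the `nodes` table are hypothetical names chosen for illustration, and the sketch assumes your Python's SQLite build includes FTS5.

```python
import re
import sqlite3

def parse_sections(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading path, body) pairs, one per heading."""
    sections, stack, path, body = [], [], "(preamble)", []
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)", line)
        if match:
            # Close the section that was open, then start a new one.
            sections.append((path, "\n".join(body).strip()))
            level, title = len(match.group(1)), match.group(2).strip()
            stack = stack[: level - 1] + [title]
            path, body = " > ".join(stack), []
        else:
            body.append(line)
    sections.append((path, "\n".join(body).strip()))
    return [(p, b) for p, b in sections if b]

def search(sections: list[tuple[str, str]], query: str, limit: int = 5) -> list[str]:
    """Index sections in an in-memory FTS5 table and return the best heading paths."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE nodes USING fts5(path, body)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", sections)
    # bm25() returns lower values for better matches, so ascending order ranks best first.
    rows = conn.execute(
        "SELECT path FROM nodes WHERE nodes MATCH ? ORDER BY bm25(nodes) LIMIT ?",
        (query, limit),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

DOC = """# Auth
Overview of login.
## Tokens
Tokens expire after one hour. Refresh the token before expiry.
## Passwords
Passwords are hashed.
"""

print(search(parse_sections(DOC), "token"))
```

Each hit is a heading path such as `Auth > Tokens` rather than a bare line, which is the property the description above refers to as structure-aware.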
Features
| Feature | Description |
|---|---|
| Structure-aware search | Returns results anchored to document headings, code classes, or function definitions rather than orphaned line fragments |
| Multi-format | Markdown, PDF, DOCX, PPTX, Excel, HTML, JSON, CSV, code (Python AST + tree-sitter) |
| Two UIs | Textual TUI (terminal) and Lit + Shoelace PWA (browser) |
| LLM-augmented QA | Send search results to Anthropic Claude for natural-language answers |
| Background watching | Auto-reindexes changed files via watchdog |
| Web search | Fetch + extract public web pages as markdown before searching |
Installation
pip install doclens
Requires Python ≥ 3.10.
Quick setup:
# Index your documents
doclens index --force
# Search from CLI
doclens search "authentication"
# Or launch the Web UI (opens browser automatically)
doclens gui
CLI Reference
doclens [--workdir DIR] <command>
| Command | Description |
|---|---|
| `doclens search <query…>` | Keyword search across indexed documents |
| `doclens search_v2 '<json>'` | Structured search: AND / OR / NOT / PHRASE operators |
| `doclens ai <message…>` | Send a message to the Claude agent |
| `doclens index [--force]` | Build or update the document index |
| `doclens status` | Show index statistics and system status |
| `doclens gui [--port PORT]` | Launch the Web UI (PWA) |
| `doclens read_document --path <path>` | Read a document with structure info |
| `doclens web <query…>` | Search the live web |
| `doclens webfetch <url>` | Extract a web page as markdown |
| `doclens grep <pattern>` | Ripgrep-style regex search |
Quick Start
1. Index your documents
# Index the current directory
doclens index --force
# Or specify a working directory
doclens --workdir /path/to/project index
doclens automatically discovers supported files (.md, .py, .pdf, .docx, .xlsx, …) and skips common ignore patterns (.git, node_modules, __pycache__, .venv).
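The discovery step can be pictured roughly as follows. This is a sketch under stated assumptions, not doclens's code: `discover`, `SUPPORTED_SUFFIXES`, and `IGNORED_DIRS` are hypothetical names, and the two sets contain only the entries listed above, whereas the real lists are longer.

```python
import os
from pathlib import Path

# Assumed values for illustration; doclens's real lists may differ.
SUPPORTED_SUFFIXES = {".md", ".py", ".pdf", ".docx", ".xlsx"}
IGNORED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}

def discover(root: str) -> list[Path]:
    """Return supported files under root, skipping ignored directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in SUPPORTED_SUFFIXES:
                found.append(path)
    return sorted(found)
```

Pruning `dirnames` in place matters for large trees: a directory such as `node_modules` is never traversed at all, instead of being walked and filtered afterwards.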
2. Search
doclens search "authentication flow"
doclens search "量子 计算" # Chinese supported via jieba
# Structured query
doclens search_v2 '{"type": "and", "terms": ["auth", "token"]}'
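One plausible way a structured query like this could map onto an FTS5 `MATCH` expression is sketched below. Only the `and` form with a `terms` list appears in this README; the lowercase `or` and `phrase` types are assumptions, `to_fts5` is a hypothetical helper, and NOT is left out because its JSON shape is not documented here.

```python
import json

def to_fts5(query: dict) -> str:
    """Translate a {"type": ..., "terms": [...]} query into an FTS5 MATCH string."""
    kind = query["type"].lower()
    # Double any embedded quotes so each term stays a valid FTS5 string.
    terms = [str(term).replace('"', '""') for term in query["terms"]]
    if kind == "phrase":
        # A phrase is a single quoted string of adjacent tokens.
        return '"' + " ".join(terms) + '"'
    if kind in ("and", "or"):
        return f" {kind.upper()} ".join(f'"{term}"' for term in terms)
    raise ValueError(f"unsupported query type: {kind}")

payload = json.loads('{"type": "and", "terms": ["auth", "token"]}')
print(to_fts5(payload))  # "auth" AND "token"
```

Quoting every term keeps user input from being parsed as FTS5 operators or column filters.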
3. Interactive TUI
doclens
Opens the full terminal UI with live preview, command history, and keyboard navigation.
4. Web UI
doclens gui
# INFO: Uvicorn running on http://127.0.0.1:7860
Browser opens automatically. Port may vary if 7860 is in use — check the startup log.
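How doclens picks the fallback port is not documented here. A common approach, shown as a sketch only, is to probe upward from the preferred port until a bind succeeds; `first_free_port` and its `attempts` parameter are hypothetical.

```python
import socket

def first_free_port(start: int = 7860, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port at or above start that can be bound on host."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # Port is taken; try the next one.
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```

The probe socket is closed before the port is returned, so another process could still claim it in between; this is why checking the startup log remains the reliable way to find the actual address.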
5. Ask the AI
doclens ai "How does the authentication system work?"
doclens first retrieves relevant document sections, then sends them to Anthropic Claude as context for a grounded answer.
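The retrieve-then-ask flow comes down to packing retrieved sections into the prompt before the model sees the question. The sketch below covers only that packing step and makes no API call; `build_prompt`, the prompt wording, and the `max_chars` budget are assumptions for illustration, not what doclens sends.

```python
def build_prompt(question: str, sections: list[tuple[str, str]], max_chars: int = 4000) -> str:
    """Pack (heading path, text) sections into a context block, best match first."""
    parts, used = [], 0
    for path, text in sections:
        block = f"[{path}]\n{text}"
        # Stop once the next section would exceed the context budget.
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Labeling each block with its heading path lets the answer cite where a statement came from.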
Configuration
doclens reads .env in the project root. Copy and customize:
cp doclens/.env.example .env
Key variables:
| Variable | Default | Description |
|---|---|---|
| `CORTEX_SEARCH_PATH` | `.` | Root directory to index and search |
| `CORTEX_DB_PATH` | `.cortex/sessions.db` | SQLite database path |
| `ANTHROPIC_API_KEY` | (none) | Required for ai and web commands |
| `ANTHROPIC_BASE_URL` | (none) | Custom API endpoint (optional) |
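As an illustration of how these variables and defaults could be resolved, here is a small standard-library sketch. `load_settings` and `parse_env_file` are hypothetical helpers, and the precedence shown (process environment over `.env` over defaults) is an assumption rather than documented doclens behavior.

```python
import os
from pathlib import Path

DEFAULTS = {"CORTEX_SEARCH_PATH": ".", "CORTEX_DB_PATH": ".cortex/sessions.db"}
KNOWN_KEYS = set(DEFAULTS) | {"ANTHROPIC_API_KEY", "ANTHROPIC_BASE_URL"}

def parse_env_file(path) -> dict:
    """Read KEY=VALUE lines from a .env file, ignoring blanks and comments."""
    values = {}
    file = Path(path)
    if not file.exists():
        return values
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() in KNOWN_KEYS:
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def load_settings(env_path=".env", environ=None) -> dict:
    """Merge defaults, the .env file, and the process environment."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    settings.update(parse_env_file(env_path))
    # Assumption: real environment variables override the .env file.
    settings.update({k: v for k, v in environ.items() if k in KNOWN_KEYS})
    return settings
```

With no `.env` file and none of these variables set, `load_settings()` returns just the two defaults from the table.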
Architecture
┌─────────────────────────────────────────────┐
│                TUI (Textual)                │
│  ┌───────────────────────────────────────┐  │
│  │  HeaderBar │ ContentArea │ InputBox   │  │
│  └───────────────────────────────────────┘  │
└────────────────────┬────────────────────────┘
                     │
┌────────────────────▼────────────────────────┐
│         Web UI (Lit + Shoelace PWA)         │
│           FastAPI + SSE streaming           │
└────────────────────┬────────────────────────┘
                     │
┌────────────────────▼────────────────────────┐
│           IndexManager + Scoring            │
│          TreeSearch (FTS5 + BM25)           │
└────────────────────┬────────────────────────┘
                     │
┌────────────────────▼────────────────────────┐
│  treesearch/ - parsers, indexer, FTS5       │
│  planify/    - AI agent runner              │
└─────────────────────────────────────────────┘
- treesearch: Powers the indexing and retrieval engine (FTS5/BM25 over document trees)
- planify: Drives the AI agent, session management, and tool execution
- doclens: Ties them together — CLI, TUI, Web UI, event bus, and file watcher
License
Apache License 2.0 — see LICENSE.
File details
Details for the file doclens-1.1.3.tar.gz.
File metadata
- Download URL: doclens-1.1.3.tar.gz
- Upload date:
- Size: 417.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a9980c00f1b16a1a089de3636855b68a2e2e12c4d9f2cf459a4d040700009b8 |
| MD5 | 9b44851590ed3ac8da6373981fe8993a |
| BLAKE2b-256 | c28ee41061eb43336ebd253ff709f20831553e62cc0ae4f9832a4d0705f0f020 |
File details
Details for the file doclens-1.1.3-py3-none-any.whl.
File metadata
- Download URL: doclens-1.1.3-py3-none-any.whl
- Upload date:
- Size: 479.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67aa7678ebaf31ddb25b03e8d4c0786f90e9893d086102fccd68de2290f2de10 |
| MD5 | 579d6bbd8672d9c31ccd0cb29bc55b37 |
| BLAKE2b-256 | a5f46b3bf6879b4f77c0e8b3dcbd973666a2270aff4adbfc6c66dc9ff09f3598 |