QMD-Py — Query Markup Documents

Chinese documentation

An on-device hybrid search engine for Markdown documents. Index your notes, docs, and knowledge bases, then search them with keywords or natural language. QMD-Py is a Python port that faithfully replicates the core algorithms of qmd.

QMD-Py combines BM25 full-text search, vector semantic search, and LLM re-ranking, all running locally. It supports llama-cpp-python (GGUF models), sentence-transformers, and FlagEmbedding backends.

Install

pip install qmd                # core
pip install "qmd[mvp]"         # + LLM backends (sentence-transformers, llama-cpp, etc.)
pip install "qmd[mcp]"         # + MCP server for Claude Desktop
pip install "qmd[mvp,mcp]"     # everything

Quick Start

# Add a collection
qmd add notes ~/notes
qmd add docs ~/Documents/docs --pattern "**/*.md"

# Add context (helps LLM understand what docs are about)
qmd context add notes "" "Personal notes and ideas"
qmd context add docs "api" "API documentation"

# Generate embeddings
qmd embed

# Search
qmd search "project progress"          # BM25 keyword search
qmd query "how to deploy the service"  # Hybrid search + reranking (best quality)

# Get a document
qmd get qmd://notes/meeting.md
qmd get "#abc123"                      # by docid

# List files
qmd ls
qmd ls notes

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    QMD-Py Hybrid Search Pipeline                │
└─────────────────────────────────────────────────────────────────┘

                           ┌──────────────┐
                           │  User Query  │
                           └──────┬───────┘
                                  │
                   ┌──────────────┴──────────────┐
                   ▼                              ▼
          ┌────────────────┐            ┌─────────────────┐
          │ Query Expansion│            │  Original Query  │
          │  (fine-tuned)  │            │   (×2 weight)    │
          └───────┬────────┘            └────────┬────────┘
                  │                              │
                  │  lex / vec / hyde variants    │
                  └──────────────┬───────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           ▼                     ▼                     ▼
     ┌───────────┐         ┌───────────┐         ┌───────────┐
     │ BM25+Vec  │         │ BM25+Vec  │         │ BM25+Vec  │
     │ (original)│         │(expanded 1)│        │(expanded 2)│
     └─────┬─────┘         └─────┬─────┘         └─────┬─────┘
           │                     │                     │
           └─────────────────────┼─────────────────────┘
                                 ▼
                    ┌─────────────────────────┐
                    │   RRF Fusion (k=60)     │
                    │   Original ×2 weight     │
                    │   Top-rank bonus: +0.05  │
                    │   Top 40 candidates      │
                    └────────────┬────────────┘
                                 ▼
                    ┌─────────────────────────┐
                    │     LLM Re-ranking      │
                    │   (qwen3-reranker)      │
                    └────────────┬────────────┘
                                 ▼
                    ┌─────────────────────────┐
                    │  Position-Aware Blend   │
                    │  Rank 1-3:  75% RRF     │
                    │  Rank 4-10: 60% RRF     │
                    │  Rank 11+:  40% RRF     │
                    └─────────────────────────┘

Retrieval Algorithm

Score Normalization

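| Backend     | Raw Score        | Transform          | Range     |
|-------------|------------------|--------------------|-----------|
| BM25 (FTS5) | SQLite FTS5 BM25 | abs(score)         | 0 ~ 25+   |
| Vector      | Cosine distance  | 1 / (1 + distance) | 0.0 ~ 1.0 |
| Reranker    | LLM 0-10 rating  | score / 10         | 0.0 ~ 1.0 |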
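The three transforms in the table can be sketched as plain functions. This is only an illustration: the function names are hypothetical and are not taken from the QMD-Py source. The one outside fact relied on is that SQLite's FTS5 `bm25()` returns negative values for better matches, which is why the table takes the absolute value.

```python
def normalize_bm25(raw_score: float) -> float:
    """FTS5 bm25() is negative for better matches, so abs() makes higher = better."""
    return abs(raw_score)


def normalize_vector(distance: float) -> float:
    """Map a cosine distance (0 = identical) into the range (0, 1]."""
    return 1.0 / (1.0 + distance)


def normalize_rerank(rating: float) -> float:
    """Scale an LLM rating on a 0-10 scale into 0.0-1.0."""
    return rating / 10.0
```

Note that only the vector and reranker scores end up in a fixed 0.0 to 1.0 range; the BM25 value stays unbounded, as the Range column indicates.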

Fusion Strategy

The query command uses Reciprocal Rank Fusion (RRF) with position-aware blending:

  1. Query Expansion: Original query (×2 weight) + LLM-generated variant queries
  2. Parallel Retrieval: Each query searches both FTS and vector indexes
  3. RRF Fusion: score = Σ(1/(k+rank+1)) summed over all result lists, with k=60 and rank counted from 0; lists from the original query count ×2
  4. Top-Rank Bonus: +0.05 for #1 in any list, +0.02 for #2-3
  5. Strong Signal Detection: Skip query expansion when the BM25 top-1 score is ≥ 0.85 and its gap to top-2 is ≥ 0.15
  6. Top-K Selection: Top 40 candidates enter re-ranking
  7. LLM Re-ranking: Score each chunk (not full document)
  8. Position-Aware Blending:
    • RRF rank 1-3: 75% retrieval / 25% reranker (protect exact matches)
    • RRF rank 4-10: 60% retrieval / 40% reranker
    • RRF rank 11+: 40% retrieval / 60% reranker (trust reranker)
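The fusion steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: all function names are hypothetical, the ×2 weight is assumed to multiply each RRF contribution, the top-rank bonus is assumed to be applied once per document based on its best rank in any list, and the scores passed to `has_strong_signal` and `blend` are assumed to be already scaled to 0..1.

```python
from collections import defaultdict

RRF_K = 60
TOP_CANDIDATES = 40


def has_strong_signal(bm25_scores):
    """Step 5: True when expansion can be skipped (scores sorted best-first, 0..1)."""
    if not bm25_scores:
        return False
    top = bm25_scores[0]
    runner_up = bm25_scores[1] if len(bm25_scores) > 1 else 0.0
    return top >= 0.85 and (top - runner_up) >= 0.15


def rrf_fuse(ranked_lists, weights, k=RRF_K, top_n=TOP_CANDIDATES):
    """Steps 3, 4 and 6: weighted RRF, top-rank bonus, then keep the top candidates.

    ranked_lists: one list of document ids per query/backend, best first.
    weights: one weight per list (for example 2.0 for the original query).
    """
    scores = defaultdict(float)
    best_rank = {}
    for results, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(results):  # rank is 0-based
            scores[doc_id] += weight / (k + rank + 1)
            best_rank[doc_id] = min(rank, best_rank.get(doc_id, rank))
    for doc_id, rank in best_rank.items():
        if rank == 0:
            scores[doc_id] += 0.05  # placed #1 in at least one list
        elif rank <= 2:
            scores[doc_id] += 0.02  # placed #2 or #3 in at least one list
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_n]


def blend(rrf_rank, retrieval_score, rerank_score):
    """Step 8: position-aware blend; rrf_rank is the 1-based position after fusion."""
    if rrf_rank <= 3:
        retrieval_weight = 0.75
    elif rrf_rank <= 10:
        retrieval_weight = 0.60
    else:
        retrieval_weight = 0.40
    return retrieval_weight * retrieval_score + (1 - retrieval_weight) * rerank_score
```

The tiered weights mean a document that fusion already placed near the top can only be moved a limited amount by the reranker, while lower-ranked candidates are decided mostly by the reranker score.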

Score Interpretation

| Score     | Meaning             |
|-----------|---------------------|
| 0.8 – 1.0 | Highly relevant     |
| 0.5 – 0.8 | Moderately relevant |
| 0.2 – 0.5 | Somewhat relevant   |
| 0.0 – 0.2 | Low relevance       |

Smart Chunking

Documents are split into ~900-token chunks with 15% overlap, using a breakpoint detection algorithm to find natural split points:

| Pattern          | Score | Description         |
|------------------|-------|---------------------|
| # Heading        | 100   | H1 heading          |
| ## Heading       | 90    | H2 heading          |
| ### Heading      | 80    | H3 heading          |
| #### ~ ######    | 70–50 | H4–H6               |
| ```              | 80    | Code fence boundary |
| --- / ***        | 60    | Horizontal rule     |
| Blank line       | 20    | Paragraph boundary  |
| - item / 1. item | 5     | List item           |
| Newline          | 1     | Minimum breakpoint  |

Algorithm: When a chunk approaches the 900-token target, the preceding 200-token window is searched for the best breakpoint. Each candidate's score decays with its distance from the target: finalScore = baseScore × (1 - (distance/window)² × 0.7). Breakpoints inside code fences are ignored, so code blocks stay intact.
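The breakpoint scoring can be illustrated with a short sketch. The function names, the regular expressions, and the representation of candidates as `(token_position, base_score)` pairs are assumptions made for this example; the constants and the decay formula come from the description above. Candidates are assumed to have been filtered already so that none lies inside a code fence.

```python
import re

CHUNK_TOKENS = 900
OVERLAP_TOKENS = CHUNK_TOKENS * 15 // 100  # 15% overlap = 135 tokens
WINDOW_TOKENS = 200
FENCE = "`" * 3  # code fence marker, built this way to keep the example fence-safe


def base_score(line: str) -> int:
    """Score a line as a potential split point, following the pattern table."""
    heading = re.match(r"(#{1,6}) ", line)
    if heading:
        return 110 - 10 * len(heading.group(1))  # H1 = 100 down to H6 = 50
    if line.startswith(FENCE):
        return 80
    if line.strip() in ("---", "***"):
        return 60
    if line.strip() == "":
        return 20
    if re.match(r"\s*([-*]|\d+\.) ", line):
        return 5
    return 1


def decayed_score(base: float, distance: float, window: int = WINDOW_TOKENS) -> float:
    """finalScore = baseScore * (1 - (distance/window)^2 * 0.7)."""
    return base * (1 - (distance / window) ** 2 * 0.7)


def best_breakpoint(candidates, target: int, window: int = WINDOW_TOKENS) -> int:
    """Pick the token position with the highest decayed score inside the window.

    Falls back to a hard split at the target when no candidate is in range.
    """
    in_window = [(pos, score) for pos, score in candidates
                 if target - window <= pos <= target]
    if not in_window:
        return target
    best = max(in_window, key=lambda c: decayed_score(c[1], target - c[0], window))
    return best[0]
```

With this decay, an H1 heading 150 tokens before the target scores about 60.6 and still beats a blank line 10 tokens before it, which scores just under 20, so the chunk ends at the heading.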

Context System

Context is a core QMD feature: attach descriptive metadata to paths so that LLMs understand what the documents are about.

# Collection level
qmd context add notes "" "Personal notes and ideas"

# Sub-path level
qmd context add notes "work" "Work-related notes"
qmd context add notes "work/meetings" "Meeting notes"

# Hierarchical inheritance: searching notes/work/meetings/2024.md
# returns all matching contexts concatenated:
# → "Personal notes and ideas\nWork-related notes\nMeeting notes"

# List all contexts
qmd context list

# Remove
qmd context remove notes "work/meetings"
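The hierarchical inheritance shown in the comments above can be sketched as a prefix lookup. The data layout (a dict keyed by `(collection, path_prefix)`) and the function name are hypothetical, chosen only for illustration, and the sketch assumes that only directory prefixes of the file are matched.

```python
def resolve_context(contexts, collection, rel_path):
    """Concatenate every context whose path prefix contains the file.

    contexts: dict mapping (collection, path_prefix) to a description;
    the empty prefix "" stands for the collection level.
    """
    directories = rel_path.split("/")[:-1]  # drop the file name
    prefixes = [""] + ["/".join(directories[:i])
                       for i in range(1, len(directories) + 1)]
    matches = [contexts[(collection, prefix)]
               for prefix in prefixes
               if (collection, prefix) in contexts]
    return "\n".join(matches)


contexts = {
    ("notes", ""): "Personal notes and ideas",
    ("notes", "work"): "Work-related notes",
    ("notes", "work/meetings"): "Meeting notes",
    ("docs", "api"): "API documentation",
}
print(resolve_context(contexts, "notes", "work/meetings/2024.md"))
```

Walking the prefixes from shortest to longest keeps the output ordered from the most general description to the most specific one.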

CLI Commands

Collection Management

qmd add <name> <path> [--pattern "**/*.md"]   # Add collection
qmd remove <name>                              # Remove collection
qmd collection rename <old> <new>              # Rename
qmd list                                       # List all collections
qmd ls [collection]                            # List files
qmd update [name]                              # Re-index
qmd status                                     # Index status

Search

qmd search <query> [-c collection] [-n 10]     # BM25 search
qmd query <query> [-c collection] [-n 10]      # Hybrid search + reranking

Output Formats

--format cli     # Default terminal format
--format json    # JSON (for agent consumption)
--format csv     # CSV
--format xml     # XML
--format md      # Markdown
--format files   # Simple file list: docid,score,filepath,context
--full           # Show full content
--line-numbers   # Show line numbers

Document Operations

qmd get <file> [-c collection]                 # Get document
qmd get qmd://notes/file.md                    # Virtual path
qmd get "#abc123"                              # By docid
qmd get file.md:42 --max-lines 20             # Line range
qmd embed [--force]                            # Generate embeddings
qmd cleanup                                    # Clean orphaned data + VACUUM
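The three reference styles accepted by `qmd get` (virtual path, docid, and path with a line number) could be told apart roughly as below. This is a hypothetical parser written for illustration; the `DocRef` type and `parse_ref` function are not part of QMD-Py, and the real CLI may resolve references differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocRef:
    kind: str                  # "docid", "virtual", or "path"
    collection: Optional[str]  # only known for qmd:// virtual paths
    target: str                # docid or file path
    line: Optional[int] = None


def parse_ref(ref: str) -> DocRef:
    """Classify a document reference string."""
    if ref.startswith("#"):
        return DocRef("docid", None, ref[1:])
    # A trailing ":<digits>" is treated as a starting line number.
    line = None
    head, sep, tail = ref.rpartition(":")
    if sep and tail.isdigit():
        ref, line = head, int(tail)
    if ref.startswith("qmd://"):
        collection, _, path = ref[len("qmd://"):].partition("/")
        return DocRef("virtual", collection, path, line)
    return DocRef("path", None, ref, line)
```

For example, `parse_ref("file.md:42")` yields a plain path reference with `line=42`, while `parse_ref("qmd://notes/file.md")` yields a virtual reference into the `notes` collection.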

MCP Server

QMD-Py provides an MCP (Model Context Protocol) server via stdio transport for use with Claude Desktop and other MCP clients.

Tools:

  • qmd_search — BM25 keyword search
  • qmd_deep_search — Hybrid search + query expansion + reranking
  • qmd_vector_search — Vector semantic search
  • qmd_get — Get document (path or docid, with fuzzy match suggestions)
  • qmd_index — Index/update collection
  • qmd_status — Index health status
  • qmd_collections — List collections

Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["serve"]
    }
  }
}

LLM Backends

QMD-Py supports three backends, auto-selected by priority:

llama-cpp-python (recommended)

Uses GGUF models, same as the original qmd:

| Model                           | Purpose          | Size   |
|---------------------------------|------------------|--------|
| embeddinggemma-300M-Q8_0        | Vector embedding | ~300MB |
| qwen3-reranker-0.6b-q8_0        | Re-ranking       | ~640MB |
| qmd-query-expansion-1.7B-Q4_K_M | Query expansion  | ~1.1GB |

Models are downloaded from HuggingFace and cached in ~/.cache/qmd/models/.

sentence-transformers (fallback)

Pure Python embedding with no llama-cpp compilation needed. Good for quick testing.

FlagEmbedding

A dedicated reranker backend (FlagReranker) that can be combined with other backends.

Data Storage

Database: ~/.config/qmd/qmd.db (SQLite)

collections     -- Collection directory config
path_contexts   -- Path context descriptions
documents       -- Document metadata (path, title, hash, active)
documents_fts   -- FTS5 full-text index
content         -- Document content (content-addressable, SHA256 dedup)
content_vectors -- Embedding chunks (hash, seq, pos)
vectors_vec     -- sqlite-vec vector index
llm_cache       -- LLM response cache (query expansion, rerank)

Config file: ~/.config/qmd/qmd.yaml
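The "content-addressable, SHA256 dedup" idea behind the content table can be sketched with the standard library. The two-table schema and column names below are simplified assumptions for illustration and do not reproduce the real QMD-Py schema.

```python
import hashlib
import sqlite3


def store_document(conn: sqlite3.Connection, path: str, text: str) -> str:
    """Store text under its SHA256 hash and point the document row at it."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Identical content hashes to the same key, so it is stored only once.
    conn.execute(
        "INSERT OR IGNORE INTO content (hash, body) VALUES (?, ?)",
        (digest, text),
    )
    conn.execute(
        "INSERT OR REPLACE INTO documents (path, hash) VALUES (?, ?)",
        (path, digest),
    )
    return digest


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (hash TEXT PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("CREATE TABLE documents (path TEXT PRIMARY KEY, hash TEXT NOT NULL)")

store_document(conn, "notes/a.md", "# Hello\n")
store_document(conn, "notes/copy-of-a.md", "# Hello\n")  # same content, new path

content_rows = conn.execute("SELECT COUNT(*) FROM content").fetchone()[0]
document_rows = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
print(content_rows, document_rows)
```

Keying content by hash also makes incremental updates cheap: if a file's hash is unchanged, nothing derived from that content needs to be recomputed.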

Environment Variables

| Variable        | Default       | Description          |
|-----------------|---------------|----------------------|
| QMD_CONFIG_DIR  | ~/.config/qmd | Config directory     |
| QMD_DATA_DIR    | ~/.cache/qmd  | Data/cache directory |
| XDG_CONFIG_HOME | ~/.config     | XDG config root      |
| XDG_CACHE_HOME  | ~/.cache      | XDG cache root       |
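One plausible way these variables interact is sketched below. The precedence order (the QMD-specific variable first, then the XDG root plus a `qmd` subdirectory, then the default) is an assumption based on the defaults in the table, and the function names are hypothetical.

```python
import os
from pathlib import Path


def _resolve(env, override_var, xdg_var, xdg_default):
    """Prefer the QMD-specific override, else <XDG root>/qmd."""
    override = env.get(override_var)
    if override:
        return Path(override).expanduser()
    root = env.get(xdg_var) or xdg_default
    return Path(root).expanduser() / "qmd"


def config_dir(env=None) -> Path:
    """Directory assumed to hold qmd.yaml and qmd.db."""
    env = os.environ if env is None else env
    return _resolve(env, "QMD_CONFIG_DIR", "XDG_CONFIG_HOME", "~/.config")


def data_dir(env=None) -> Path:
    """Directory assumed to hold cached models and other data."""
    env = os.environ if env is None else env
    return _resolve(env, "QMD_DATA_DIR", "XDG_CACHE_HOME", "~/.cache")
```

Passing an explicit mapping, as in `config_dir({"XDG_CONFIG_HOME": "/tmp/cfg"})`, makes the resolution easy to test without touching the real environment.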

Requirements

  • Python >= 3.11
  • SQLite >= 3.35 (FTS5 support)
  • GPU (optional): CUDA or Apple MPS for accelerated embedding/reranking

Development

git clone https://github.com/iomgaa-ycz/qmd-py.git
cd qmd-py
pip install -e ".[mvp,mcp,dev]"   # editable install from the cloned source
pytest tests/ -v

Project Structure

qmd/
├── core/
│   ├── db.py           # SQLite database layer (schema, CRUD, FTS5, sqlite-vec)
│   ├── config.py       # YAML config management, collection/context operations
│   ├── store.py        # Document indexing (content-addressable storage, incremental updates)
│   ├── retrieval.py    # Hybrid retrieval engine (BM25 + Vector + RRF + Rerank)
│   ├── chunking.py     # Smart chunking (breakpoint detection, code fence protection)
│   ├── document.py     # Document lookup helpers (docid, fuzzy match, glob, cleanup)
│   └── watcher.py      # File watcher (watchdog, auto-index on change)
├── cli/
│   ├── main.py         # CLI entry point (argparse, all commands)
│   └── formatter.py    # Output formatting (JSON/CSV/XML/MD/Files)
├── llm/
│   ├── base.py         # LLM abstract interface
│   ├── llama_cpp.py    # llama-cpp-python backend
│   ├── sentence_tf.py  # sentence-transformers backend
│   ├── flagembed.py    # FlagEmbedding reranker backend
│   └── models.py       # Model config, GPU detection
├── mcp/
│   └── server.py       # MCP Server (stdio transport)
├── utils/
│   ├── paths.py        # Path utilities, VirtualPath (qmd://)
│   ├── snippet.py      # Snippet extraction, title extraction
│   └── hashing.py      # SHA256 content hash
└── __init__.py         # create_store() / create_llm_backend() entry points

Acknowledgements

QMD-Py is a Python port of qmd by Tobias Lütke. Its core retrieval algorithms, chunking strategy, and fusion logic faithfully replicate the original design.

License

MIT
