# kbx — Local Knowledge Base with Hybrid Search
Give your AI agents persistent memory. Index your markdown notes, meeting transcripts, and documentation into a hybrid search engine. Search with keywords or natural language. Everything runs locally — your data never leaves your machine.
kbx combines SQLite FTS5 full-text search with LanceDB vector search using Qwen3 embeddings — all on-device, with Apple Silicon acceleration via MLX.
You can read more about kbx's progress in the CHANGELOG.
## Quick Start

```shell
# Install
pip install kbx                  # core CLI + FTS5 search
pip install "kbx[search]"        # + vector search (Qwen3 embeddings)
pip install "kbx[search,mlx]"    # + Apple Silicon acceleration

# Set up a knowledge base
kbx init                         # create kbx.toml in the current directory

# Index your markdown files
kbx index run                    # index everything under memory/
kbx index run --no-embed         # text-only index (fast, no model needed)

# Search
kbx search "quarterly planning"          # hybrid search (FTS5 + vector)
kbx search "quarterly planning" --fast   # keyword-only (~instant, no model needed)
kbx search "MFA rollout" --json          # structured output for scripts

# Browse
kbx view "memory/notes/decisions.md"     # read a document
kbx view "#a1b2c3"                       # by content-hash prefix
kbx list --type notes --from 2026-01-01
```
## Using with AI Agents

kbx is built for agentic workflows. The `--json` output format, structured error responses, and built-in agent playbook make it a natural fit for AI assistants.
```shell
# Orient: get a compressed overview of all entities (~2K tokens)
kbx context

# Search with structured output
kbx search "authentication" --fast --json --limit 5

# Look up a person
kbx person find "Alice" --json

# Timeline of everything mentioning a project
kbx person timeline "Cloud Migration" --from 2026-01-01 --json

# Take notes that persist across sessions
kbx memory add "Decision: use Postgres" --tags decision,infra --pin
kbx memory add "Promoted to Staff" --entity "Bob"

# Pin important docs to the context window
kbx pin "memory/notes/priorities.md"
```
When you run `kbx --help`, it prints an agent playbook alongside the standard CLI help — a complete reference for AI agents to self-orient and use the knowledge base effectively.
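For scripting, `--json` output pipes cleanly into `jq`. The field names below (`results`, `path`, `score`) are illustrative assumptions about the output shape, not the documented schema — the sample is simulated inline so the filter runs standalone:

```shell
# Hypothetical kbx search --json shape; field names are assumptions.
sample='{"results":[{"path":"memory/notes/decisions.md","score":0.91},{"path":"memory/meetings/2026/01/15/planning.md","score":0.64}]}'

# Keep only strong matches (score >= 0.8) and print score + path, tab-separated.
echo "$sample" | jq -r '.results[] | select(.score >= 0.8) | "\(.score)\t\(.path)"'
```

In a real pipeline the `sample=` line would be replaced by `kbx search "…" --json`.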
## MCP Server
kbx exposes an MCP server for tighter integration with Claude Desktop, Claude Code, Cursor, and other MCP-compatible tools.
Tools exposed:

- `kb_search` — Hybrid or FTS-only search with date/tag filters
- `kb_person_find` — Entity lookup by name, alias, or partial match
- `kb_person_timeline` — Chronological document list for an entity
- `kb_view` — Retrieve a document by path, glob, or `#hash`
- `kb_context` — Compressed entity index for session orientation
- `kb_memory_add` — Create notes or record facts about entities
- `kb_pin` / `kb_unpin` — Pin documents to the context window
- `kb_usage` — Index status and usage instructions
**Claude Desktop** (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "kbx": {
      "command": "/Users/YOU/.local/bin/kbx",
      "args": ["mcp"]
    }
  }
}
```

> **Note:** Claude Desktop does not inherit your shell `PATH`. Use the full path to `kbx` — find it with `which kbx` (typically `~/.local/bin/kbx` when installed via `uv tool install`).
**Claude Code** (`.claude/settings.local.json`):

```json
{
  "mcpServers": {
    "kbx": {
      "command": "kbx",
      "args": ["mcp"],
      "type": "stdio"
    }
  }
}
```
See MCP plugin docs for full tool parameter reference.
## Python API

Use kbx as a library in your own applications:

```python
from kb import KnowledgeBase

with KnowledgeBase(thread_safe=True) as kb:
    # Search
    results = kb.search("cloud migration")

    # Entities
    people = kb.list_entities(entity_type="person")
    alice = kb.get_entity("Alice")
    timeline = kb.get_entity_timeline("Alice")

    # Context
    ctx = kb.context()

    # Index
    kb.index()
```
The KnowledgeBase class manages the full lifecycle — DB connections, embedder, auto-reindexing of stale files. All methods return Pydantic models.
See architecture docs for the full API surface.
## Architecture

**Write-through principle:** Markdown files are the source of truth. All data writes go to flat files first; the database is a derived index rebuilt from those files. The DB is disposable — delete it and re-index.
```
Markdown files (source of truth)
                         │
                         ▼
┌─────────────────────────────────────────────────────┐
│                   Source Adapters                   │
│   meetings.py — walk memory/meetings/YYYY/MM/DD/    │
│   memory.py   — walk memory/people/, projects/, ... │
└────────────────────────┬────────────────────────────┘
                         │ ParsedDocument
                         ▼
┌─────────────────────────────────────────────────────┐
│                       Indexer                       │
│         chunk → embed → store → link entities       │
└──────────┬──────────────────────────┬───────────────┘
           │                          │
           ▼                          ▼
┌──────────────────┐     ┌─────────────────────────────┐
│      SQLite      │     │           LanceDB           │
│  docs, chunks,   │     │    Qwen3-Embedding-0.6B     │
│  FTS5, entities, │     │    1024-dim vectors         │
│  facts, mentions │     │    float32, instruction-aware│
└──────────────────┘     └─────────────────────────────┘
           │                          │
           └────────────┬─────────────┘
                        ▼
┌─────────────────────────────────────────────────────┐
│                    Hybrid Search                    │
│ FTS5 (BM25) + Vector → RRF Fusion → Recency Weight  │
└─────────────────────────────────────────────────────┘
```
## Search
kbx supports two search modes:
| Mode | Flag | Speed | Method |
|---|---|---|---|
| Fast | `--fast` | ~instant | FTS5 keyword search only |
| Hybrid | (default) | ~2s | FTS5 + vector search + RRF fusion |
Hybrid search uses Reciprocal Rank Fusion (RRF) to combine keyword and semantic results, with a 90-day half-life recency weight. A strong-signal fast path skips vector search entirely when FTS5 produces a high-confidence match.
Score interpretation: 0.8+ strong | 0.5–0.8 worth reading | <0.5 noise
See search docs for the full pipeline, score normalisation, and fusion strategy.
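The fusion step described above can be sketched as follows. Only the 90-day half-life is stated in this README; the RRF constant `k=60` (the common default) and the tie-breaking details are assumptions, not kbx's actual implementation:

```python
def rrf_fuse(fts_ranking, vec_ranking, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank) per document,
    so items ranked highly by both keyword and vector search rise to the top."""
    scores = {}
    for ranking in (fts_ranking, vec_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

def recency_weight(age_days, half_life_days=90):
    """Exponential decay with a 90-day half-life: a 90-day-old doc is weighted 0.5x."""
    return 0.5 ** (age_days / half_life_days)

fused = rrf_fuse(["a", "b", "c"], ["a", "c", "d"])
best = max(fused, key=fused.get)
print(best)                           # a — top of both lists wins
print(round(recency_weight(90), 2))   # 0.5
```

The decay multiplier is applied to the fused score, so older documents need a stronger match to outrank recent ones.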
## Entity System

kbx automatically links people, projects, teams, and glossary terms to your documents:

```shell
kbx person find "Alice" --json      # profile + linked documents
kbx person timeline "Alice"         # chronological mentions
kbx person create "Bob" --role "SRE Lead" --team "Platform"
kbx project find "Cloud Migration"  # project profile + linked docs
kbx entity stale --days 30          # entities not mentioned recently
```
Entities are seeded from `memory/people/*.md` and `memory/projects/*.md` files, then linked to documents via five-tier matching: YAML tags → title participants → title substrings → source IDs → content name matching.
See entity docs for the full linking pipeline.
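The five-tier cascade can be illustrated as a first-match-wins loop. The document fields and matching predicates below are hypothetical — they show the shape of the cascade, not kbx's actual linking code:

```python
def link_entity(entity_name, doc, aliases=()):
    """Return the first (strongest) tier that links entity_name to doc.
    `doc` is a hypothetical dict with tags/participants/title/source_ids/body."""
    names = {entity_name.lower(), *(a.lower() for a in aliases)}
    tiers = [
        ("yaml_tags", any(t.lower() in names for t in doc.get("tags", []))),
        ("title_participants", any(p.lower() in names for p in doc.get("participants", []))),
        ("title_substring", any(n in doc.get("title", "").lower() for n in names)),
        ("source_ids", any(s.lower() in names for s in doc.get("source_ids", []))),
        ("content_name", any(n in doc.get("body", "").lower() for n in names)),
    ]
    for tier_name, matched in tiers:
        if matched:
            return tier_name
    return None

doc = {"tags": [], "participants": ["Alice"], "title": "Planning", "body": "..."}
print(link_entity("Alice", doc))  # title_participants
```

Ordering matters: an explicit YAML tag is a stronger signal than a name merely appearing in the body, so the loop stops at the first hit.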
## Sync & Ingest

Pull meeting transcripts from external sources:

```shell
# Granola API sync
kbx sync granola --since 2026-01-01

# Notion AI Meeting Notes sync
kbx sync notion --since 2026-01-01

# Granola zip export ingest
kbx ingest export.zip

# View and edit synced meeting notes
kbx granola view <calendar-uid>
kbx granola edit <calendar-uid> --append "Action: follow up with Alice"
```
Sync is incremental — only new or updated meetings are fetched. Attendees are automatically matched to existing entities. See Granola plugin docs for configuration.
## Configuration

kbx looks for configuration in this order:

1. `$KBX_CONFIG` environment variable
2. `./kbx.toml` in the current directory (walking up from CWD)
3. `~/.config/kbx/config.toml`

Run `kbx init` to generate a starter config.
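The lookup order above can be sketched with `pathlib`. This is an illustrative reimplementation of the resolution rules as described, not kbx's actual code:

```python
import os
from pathlib import Path

def find_config(cwd=None):
    """Resolve config: env var first, then kbx.toml walking up from CWD,
    then the user-level config. Returns None if nothing is found."""
    env = os.environ.get("KBX_CONFIG")
    if env:
        return Path(env)
    cwd = Path(cwd or Path.cwd()).resolve()
    for directory in (cwd, *cwd.parents):
        candidate = directory / "kbx.toml"
        if candidate.is_file():
            return candidate
    fallback = Path.home() / ".config" / "kbx" / "config.toml"
    return fallback if fallback.is_file() else None
```

The walk-up behaviour means any subdirectory of a knowledge base resolves to the same project-level `kbx.toml`.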
## Optional Extras

| Extra | What it adds |
|---|---|
| `search` | LanceDB + sentence-transformers + NumPy for vector search |
| `mlx` | MLX backend for faster embeddings on Apple Silicon |
| `mcp` | MCP server for AI tool integration |
| `all` | Everything above plus test and dev dependencies |

Install with: `pip install "kbx[search,mlx,mcp]"`

Requires Python 3.10+.
## Data Storage

The index is stored in the data directory (configurable via `kbx.toml` or `$KB_DATA_DIR`):

```
kbx-data/
├── metadata.db   # SQLite — documents, chunks, FTS5, entities, facts
└── vectors/      # LanceDB — Qwen3 embedding vectors (1024-dim)
```

The database is a derived index. Delete it and run `kbx index run` to rebuild from your markdown files.
## Development

```shell
git clone https://github.com/tenfourty/kbx.git
cd kbx
uv sync --all-extras
uv run pre-commit install
uv run pytest -x -q --cov   # 1361 tests, 90%+ coverage
uv run mypy src/            # strict mode
```

Quick CI check locally:

```shell
make ci    # mirror exact GitHub CI pipeline
make fix   # auto-fix lint + format issues
```
See CONTRIBUTING.md for guidelines and testing docs for the test strategy.
## Documentation
| Doc | What it covers |
|---|---|
| Architecture | System design, data flow, module dependencies, Python API |
| Search | FTS5 + vector + RRF fusion pipeline, score normalisation |
| Entities | Entity seeding, five-tier linking, disambiguation |
| Indexing | Walk → chunk → embed → store pipeline |
| Chunking | Markdown-aware chunking strategy |
| CLI Reference | All commands and options |
| Output Formatting | JSON, table, CSV, JSONL, jq, field selection |
| Context Layer | Compressed entity index for AI agents |
| Testing | Test strategy, fixtures, markers |
| MCP Plugin | MCP server tools and resources |
| MLX Plugin | Apple Silicon embedding acceleration |
| Granola Plugin | Meeting transcript sync (view, edit, push) |
| Notion Plugin | Notion AI Meeting Notes sync |
| Integration | Ingest, migrations, search quality |
## License
Apache-2.0