
kbx — Local Knowledge Base with Hybrid Search

Give your AI agents persistent memory. Index your markdown notes, meeting transcripts, and documentation into a hybrid search engine. Search with keywords or natural language. Everything runs locally — your data never leaves your machine.

kbx combines SQLite FTS5 full-text search with LanceDB vector search using Qwen3 embeddings — all on-device, with Apple Silicon acceleration via MLX.

You can read more about kbx's progress in the CHANGELOG.

Quick Start

# Install
pip install kbx                        # core CLI + FTS5 search
pip install "kbx[search]"              # + vector search (Qwen3 embeddings)
pip install "kbx[search,mlx]"          # + Apple Silicon acceleration

# Set up a knowledge base
kbx init                               # create kbx.toml in the current directory

# Index your markdown files
kbx index run                          # index everything under memory/
kbx index run --no-embed               # text-only index (fast, no model needed)

# Search
kbx search "quarterly planning"        # hybrid search (FTS5 + vector)
kbx search "quarterly planning" --fast # keyword-only (~instant, no model needed)
kbx search "MFA rollout" --json        # structured output for scripts

# Browse
kbx view "memory/notes/decisions.md"   # read a document
kbx view "#a1b2c3"                     # by content-hash prefix
kbx list --type notes --from 2026-01-01
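
The #a1b2c3 form above is a content-hash prefix. Conceptually it works like the sketch below — note that the actual hash algorithm and prefix length kbx uses are assumptions here; SHA-256 and a 6-character prefix are used purely for illustration:

```python
import hashlib

def hash_prefix(content: str, length: int = 6) -> str:
    # Illustrative only: hash the document body and keep a short
    # hex prefix that is long enough to be unambiguous in practice.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()[:length]

doc = "# Decisions\n\nUse Postgres for the primary store.\n"
print("#" + hash_prefix(doc))  # a stable, content-derived short ID
```

Because the ID is derived from content, the same document always resolves to the same prefix, and any edit produces a new one.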

Using with AI Agents

kbx is built for agentic workflows. The --json output format, structured error responses, and built-in agent playbook make it a natural fit for AI assistants.

# Orient: get a compressed overview of all entities (~2K tokens)
kbx context

# Search with structured output
kbx search "authentication" --fast --json --limit 5

# Look up a person
kbx person find "Alice" --json

# Timeline of everything mentioning a project
kbx person timeline "Cloud Migration" --from 2026-01-01 --json

# Take notes that persist across sessions
kbx memory add "Decision: use Postgres" --tags decision,infra --pin
kbx memory add "Promoted to Staff" --entity "Bob"

# Pin important docs to the context window
kbx pin "memory/notes/priorities.md"

When you run kbx --help, it prints an agent playbook alongside the standard CLI help — a complete reference for AI agents to self-orient and use the knowledge base effectively.

MCP Server

kbx exposes an MCP server for tighter integration with Claude Desktop, Claude Code, Cursor, and other MCP-compatible tools.

Tools exposed:

  • kb_search — Hybrid or FTS-only search with date/tag filters
  • kb_person_find — Entity lookup by name, alias, or partial match
  • kb_person_timeline — Chronological document list for an entity
  • kb_view — Retrieve a document by path, glob, or #hash
  • kb_context — Compressed entity index for session orientation
  • kb_memory_add — Create notes or record facts about entities
  • kb_pin / kb_unpin — Pin documents to the context window
  • kb_usage — Index status and usage instructions

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "kbx": {
      "command": "/Users/YOU/.local/bin/kbx",
      "args": ["mcp"]
    }
  }
}

Note: Claude Desktop does not inherit your shell PATH. Use the full path to kbx — find it with which kbx (typically ~/.local/bin/kbx when installed via uv tool install).

Claude Code (.claude/settings.local.json):

{
  "mcpServers": {
    "kbx": {
      "command": "kbx",
      "args": ["mcp"],
      "type": "stdio"
    }
  }
}

See MCP plugin docs for full tool parameter reference.

Python API

Use kbx as a library in your own applications:

from kb import KnowledgeBase

with KnowledgeBase(thread_safe=True) as kb:
    # Search
    results = kb.search("cloud migration")

    # Entities
    people = kb.list_entities(entity_type="person")
    alice = kb.get_entity("Alice")
    timeline = kb.get_entity_timeline("Alice")

    # Context
    ctx = kb.context()

    # Index
    kb.index()

The KnowledgeBase class manages the full lifecycle — DB connections, embedder, auto-reindexing of stale files. All methods return Pydantic models.

See architecture docs for the full API surface.

Architecture

Write-through principle: Markdown files are the source of truth. All data writes go to flat files first; the database is a derived index rebuilt from those files. The DB is disposable — delete it and re-index.

Markdown files (source of truth)
        │
        ▼
┌─────────────────────────────────────────────────────┐
│                   Source Adapters                   │
│  meetings.py — walk memory/meetings/YYYY/MM/DD/     │
│  memory.py   — walk memory/people/, projects/, ...  │
└──────────────────────────┬──────────────────────────┘
                           │ ParsedDocument
                           ▼
┌─────────────────────────────────────────────────────┐
│                       Indexer                       │
│  chunk → embed → store → link entities              │
└────────┬─────────────────────────────┬──────────────┘
         │                             │
         ▼                             ▼
┌──────────────────┐    ┌─────────────────────────────┐
│      SQLite      │    │           LanceDB           │
│  docs, chunks,   │    │  Qwen3-Embedding-0.6B       │
│  FTS5, entities, │    │  1024-dim vectors           │
│  facts, mentions │    │  float32, instruction-aware │
└──────────────────┘    └─────────────────────────────┘
         │                             │
         └──────────────┬──────────────┘
                        ▼
┌─────────────────────────────────────────────────────┐
│                    Hybrid Search                    │
│  FTS5 (BM25) + Vector → RRF Fusion → Recency Weight │
└─────────────────────────────────────────────────────┘

Search

kbx supports two search modes:

Mode     Flag       Speed     Method
Fast     --fast     ~instant  FTS5 keyword search only
Hybrid   (default)  ~2s       FTS5 + vector search + RRF fusion

Hybrid search uses Reciprocal Rank Fusion (RRF) to combine keyword and semantic results, with a 90-day half-life recency weight. A strong-signal fast path skips vector search entirely when FTS5 produces a high-confidence match.
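The fusion step can be sketched in a few lines. This is a minimal illustration of RRF and an exponential recency decay, not kbx's actual code — the constant k=60 comes from the original RRF paper, and kbx's real constants, weights, and normalisation are implementation details:

```python
from collections import defaultdict

def rrf_fuse(keyword_ranking, vector_ranking, k=60):
    # Each ranked list contributes 1/(k + rank) per document;
    # documents found by both retrievers accumulate both scores.
    scores = defaultdict(float)
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def recency_weight(age_days, half_life_days=90):
    # Exponential decay with a 90-day half-life, as described above:
    # a 90-day-old document scores half as much as a fresh one.
    return 0.5 ** (age_days / half_life_days)

fused = rrf_fuse(["a", "b", "c"], ["b", "a", "d"])
# "a" and "b" appear in both rankings, so they outrank "c" and "d"
```

The appeal of RRF is that it fuses on ranks rather than raw scores, so BM25 and cosine-similarity scales never need to be reconciled directly.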

Score interpretation: 0.8+ strong | 0.5–0.8 worth reading | <0.5 noise

See search docs for the full pipeline, score normalisation, and fusion strategy.

Entity System

kbx automatically links people, projects, teams, and glossary terms to your documents:

kbx person find "Alice" --json        # profile + linked documents
kbx person timeline "Alice"           # chronological mentions
kbx person create "Bob" --role "SRE Lead" --team "Platform"
kbx project find "Cloud Migration"    # project profile + linked docs
kbx entity stale --days 30            # entities not mentioned recently

Entities are seeded from memory/people/*.md and memory/projects/*.md files, then linked to documents via five-tier matching: YAML tags → title participants → title substrings → source IDs → content name matching.
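The cascade above can be sketched as a first-match-wins sequence of predicates. Everything below is illustrative — the function name, document fields, and per-tier checks are assumptions, not kbx's actual linking code; only the tier order mirrors the description above:

```python
def link_entity(entity, doc):
    # Try each tier in confidence order; return the first that matches.
    name = entity["name"]
    tiers = [
        ("yaml_tags", name.lower() in [t.lower() for t in doc.get("tags", [])]),
        ("title_participants", name in doc.get("participants", [])),
        ("title_substring", name.lower() in doc.get("title", "").lower()),
        ("source_ids", entity.get("source_id") in doc.get("source_ids", [])),
        ("content_name", name in doc.get("content", "")),
    ]
    for tier_name, matched in tiers:
        if matched:
            return tier_name
    return None

alice = {"name": "Alice", "source_id": "cal-123"}
doc = {"title": "1:1 with Alice", "tags": [], "content": "..."}
# matches at the title-substring tier
```

Ordering the tiers from most to least precise means a strong signal (an explicit YAML tag) always wins over a weak one (a bare name in body text).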

See entity docs for the full linking pipeline.

Sync & Ingest

Pull meeting transcripts from external sources:

# Granola API sync
kbx sync granola --since 2026-01-01

# Notion AI Meeting Notes sync
kbx sync notion --since 2026-01-01

# Granola zip export ingest
kbx ingest export.zip

# View and edit synced meeting notes
kbx granola view <calendar-uid>
kbx granola edit <calendar-uid> --append "Action: follow up with Alice"

Sync is incremental — only new or updated meetings are fetched. Attendees are automatically matched to existing entities. See Granola plugin docs for configuration.

Configuration

kbx looks for configuration in this order:

  1. $KBX_CONFIG environment variable
  2. ./kbx.toml in the current directory (walk up from CWD)
  3. ~/.config/kbx/config.toml

Run kbx init to generate a starter config.

Optional Extras

Extra What it adds
search LanceDB + sentence-transformers + NumPy for vector search
mlx MLX backend for faster embeddings on Apple Silicon
mcp MCP server for AI tool integration
all Everything above plus test and dev dependencies

Install with: pip install "kbx[search,mlx,mcp]"

Requires Python 3.10+.

Data Storage

Index stored in the data directory (configurable via kbx.toml or $KB_DATA_DIR):

kbx-data/
├── metadata.db        # SQLite — documents, chunks, FTS5, entities, facts
└── vectors/           # LanceDB — Qwen3 embedding vectors (1024-dim)

The database is a derived index. Delete it and run kbx index run to rebuild from your markdown files.

Development

git clone https://github.com/tenfourty/kbx.git
cd kbx
uv sync --all-extras
uv run pre-commit install
uv run pytest -x -q --cov           # 1361 tests, 90%+ coverage
uv run mypy src/                     # strict mode

Quick CI check locally:

make ci                              # mirrors the exact GitHub CI pipeline
make fix                             # auto-fix lint + format issues

See CONTRIBUTING.md for guidelines and testing docs for the test strategy.

Documentation

Doc What it covers
Architecture System design, data flow, module dependencies, Python API
Search FTS5 + vector + RRF fusion pipeline, score normalisation
Entities Entity seeding, five-tier linking, disambiguation
Indexing Walk → chunk → embed → store pipeline
Chunking Markdown-aware chunking strategy
CLI Reference All commands and options
Output Formatting JSON, table, CSV, JSONL, jq, field selection
Context Layer Compressed entity index for AI agents
Testing Test strategy, fixtures, markers
MCP Plugin MCP server tools and resources
MLX Plugin Apple Silicon embedding acceleration
Granola Plugin Meeting transcript sync (view, edit, push)
Notion Plugin Notion AI Meeting Notes sync
Integration Ingest, migrations, search quality

License

Apache-2.0


File details

Details for the file kbx-0.1.56.tar.gz.

File metadata

  • Download URL: kbx-0.1.56.tar.gz
  • Upload date:
  • Size: 607.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kbx-0.1.56.tar.gz
Algorithm Hash digest
SHA256 dda243fb9699d46bdb45dbbcf37758c5a0575d833802fc14fab9c5d5dd6a3127
MD5 83640d3118617d7c3cb89289efa858b1
BLAKE2b-256 699da37934ff09070eefb6c881c40177036f1c015d31de35a4105535505b493b

See more details on using hashes here.

Provenance

The following attestation bundles were made for kbx-0.1.56.tar.gz:

Publisher: release.yml on tenfourty/kbx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file kbx-0.1.56-py3-none-any.whl.

File metadata

  • Download URL: kbx-0.1.56-py3-none-any.whl
  • Upload date:
  • Size: 155.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for kbx-0.1.56-py3-none-any.whl
Algorithm Hash digest
SHA256 927094939b7d49ec32680e1c44744b528c576174ea8a8b4119ead2b56e7a796d
MD5 694ee6bca7a9eaf5a491d33723528f5c
BLAKE2b-256 8f5ebd38839a53fd618170fa1a609451669ac6ff3a6520c0e82d8eb3a040bbb2

See more details on using hashes here.

Provenance

The following attestation bundles were made for kbx-0.1.56-py3-none-any.whl:

Publisher: release.yml on tenfourty/kbx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
