
Local CLI search engine for personal knowledge bases with hybrid BM25 + vector search

mykb

A local knowledge-base search engine CLI with hybrid retrieval that combines BM25 full-text search and vector semantic search.

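Features

  • Hybrid search: combined BM25 full-text and vector semantic search
  • Local embedding: generates embeddings locally with embeddinggemma-300m; no API required
  • Multiple data sources: currently supports Obsidian vaults; extensible to Twitter, Telegram, and others
  • Incremental indexing: based on content hashes, so only changed documents are reindexed
  • Pluggable backends: supports Meilisearch and SeekDB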
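
The incremental-indexing idea can be sketched as follows. This is a minimal illustration, not mykb's actual implementation: the function names, the choice of SHA-256, and the in-memory `seen` mapping are all assumptions made for the example.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable SHA-256 hex digest of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_documents(docs: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return the IDs whose content differs from the previous run, sorted.

    `docs` maps document ID -> current text (hypothetical structure).
    `seen` maps document ID -> hash recorded last time; it is updated in place.
    """
    changed = []
    for doc_id in sorted(docs):
        digest = content_hash(docs[doc_id])
        if seen.get(doc_id) != digest:
            changed.append(doc_id)
            seen[doc_id] = digest
    return changed
```

On a second run with identical content, `changed_documents` returns an empty list, so nothing would need to be reindexed or re-embedded.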

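Installation

# Basic install (Meilisearch backend)
pip install mykb

# With SeekDB support (quoted so shells such as zsh do not expand the brackets)
pip install "mykb[seekdb]"

Dependencies

  • Meilisearch backend: requires a running Meilisearch service
  • SeekDB backend: embedded mode needs no external service (macOS 15+ / Linux)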

Quick start

# 1. Configure Meilisearch
mykb config set meilisearch.url http://localhost:7700

# 2. Add a collection (Obsidian vault)
mykb collection add my-notes --path ~/Documents/Obsidian --source obsidian

# 3. Index (with embeddings)
mykb index my-notes --embed

# 4. Search
mykb search "machine learning"                             # BM25 full-text
mykb vsearch "AI technology trends"                        # vector semantic
mykb query "introduction to deep learning" --ratio 0.5     # hybrid search
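
One way to picture what a `--ratio` weight does in hybrid search is a weighted sum of normalized scores. The sketch below is an assumption-laden illustration, not how mykb or its backends actually fuse results: it assumes `ratio` is the weight of the vector score (0 = pure BM25, 1 = pure vector), uses min-max normalization, and all function names are hypothetical.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize scores into [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float], vector: dict[str, float], ratio: float = 0.5
) -> list[tuple[str, float]]:
    """Blend two score sets; `ratio` is the (assumed) weight of the vector side."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    b, v = normalize(bm25), normalize(vector)
    combined = {
        doc: (1.0 - ratio) * b.get(doc, 0.0) + ratio * v.get(doc, 0.0)
        for doc in set(b) | set(v)
    }
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

Under these assumptions, a document missing from one result set simply contributes 0 from that side, so a ratio of 0.5 favors documents that both searches agree on.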

Configuration

Config file: ~/.mykb/config.toml

[backend]
type = "meilisearch"  # or "seekdb"

[backend.meilisearch]
url = "http://localhost:7700"
api_key = ""

[backend.seekdb]
path = "~/.mykb/seekdb.db"  # embedded mode
fulltext_analyzer = "ik"     # ik | space | ngram

[embedding]
model = "google/embeddinggemma-300m"
chunk_size = 800
chunk_overlap = 0.15

[collections.my-notes]
source = "obsidian"
path = "/path/to/vault"
mask = "**/*.md"
exclude = [".obsidian/**", ".trash/**"]
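
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a minimal chunking sketch. It rests on two assumptions that the configuration above does not confirm: that `chunk_size` counts characters (it may well be tokens), and that `chunk_overlap` is a fraction of the chunk size. The function name is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: float = 0.15) -> list[str]:
    """Split text into windows of `chunk_size` characters that overlap by a fraction.

    Assumption: sizes are in characters and overlap is a fraction of chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= chunk_overlap < 1.0:
        raise ValueError("chunk_overlap must be in [0, 1)")
    overlap = round(chunk_size * chunk_overlap)
    step = max(1, chunk_size - overlap)  # distance between window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With the defaults shown in the config, consecutive windows would start 680 characters apart and share 120 characters, which keeps sentences that straddle a boundary visible in both chunks.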

Commands

Command                          Description
mykb collection add/ls/rm        Manage collections
mykb index [--embed] [--full]    Index documents
mykb embed [--full]              Add missing embeddings
mykb search <query>              BM25 full-text search
mykb vsearch <query>             Vector semantic search
mykb query <query> [--ratio]     Hybrid search
mykb status                      Show status

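Backend comparison

Feature           Meilisearch           SeekDB
Deployment        Standalone service    Embedded / service
BM25 speed        ⚡ Fast (~2 ms)        Slow (~30 ms)
Vector speed      Fast (~3 ms)          ⚡ Faster (~1 ms)
Hybrid speed      ⚡ Fast (~4 ms)        Slow (~33 ms)
Resource usage

Recommendations:

  • Use Meilisearch for general-purpose workloads
  • Consider SeekDB for pure vector-search workloads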

Development

# Install development dependencies
uv sync --all-extras

# Run tests
uv run pytest

# Benchmark
uv run python scripts/benchmark_seekdb.py --backend meilisearch -n 1000
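
For readers who want to check per-query latency on their own data, a timing helper might look like the sketch below. It is not the project's benchmark script; `median_latency_ms` and the `search` callable are hypothetical names, and the callable stands in for whatever function issues one query against a backend.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(
    search: Callable[[str], object], queries: list[str], repeats: int = 5
) -> float:
    """Run every query `repeats` times and return the median latency in milliseconds."""
    if not queries or repeats <= 0:
        raise ValueError("need at least one query and one repeat")
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search(query)  # hypothetical: one search call against a backend
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

The median is used rather than the mean so that a single slow call, such as a cold start, does not dominate the result.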

License

MIT
