
Local CLI search engine for personal knowledge bases with hybrid BM25 + vector search


mykb

A local knowledge-base search engine CLI that combines BM25 full-text search with vector semantic search (hybrid retrieval).

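Features

  • Hybrid search: BM25 full-text search combined with vector semantic search
  • Local embeddings: embeddings are generated locally with embeddinggemma-300m, so no API is required
  • Multiple data sources: Obsidian vaults are supported today, and the design can be extended to Twitter, Telegram, and others
  • Incremental indexing: content hashes are used so that only changed documents are re-indexed
  • Pluggable backends: Meilisearch and SeekDB are supported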
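The content-hash approach to incremental indexing can be sketched as follows. This is a minimal illustration under stated assumptions, not mykb's actual code: the use of SHA-256, the in-memory `seen` mapping, and the names `content_hash` and `plan_index` are hypothetical, and the `full` flag only mimics what the `--full` option listed under Commands presumably does.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash a document's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_index(
    docs: dict[str, str], seen: dict[str, str], full: bool = False
) -> tuple[list[str], list[str]]:
    """Decide which documents to (re)index and which to delete.

    docs: current document id -> content
    seen: document id -> content hash recorded at the previous run
    full: reindex everything, ignoring stored hashes (hypothetical --full analogue)
    """
    if full:
        to_index = list(docs)
    else:
        # New documents have no stored hash; edited ones have a different hash.
        to_index = [
            doc_id for doc_id, text in docs.items()
            if seen.get(doc_id) != content_hash(text)
        ]
    # Documents that disappeared from the source should be removed from the index.
    to_delete = [doc_id for doc_id in seen if doc_id not in docs]
    return to_index, to_delete


if __name__ == "__main__":
    seen = {"a.md": content_hash("hello"), "old.md": content_hash("bye")}
    docs = {"a.md": "hello", "b.md": "new note"}
    print(plan_index(docs, seen))  # (['b.md'], ['old.md'])
```

Only `b.md` is new and `old.md` has vanished, so the unchanged `a.md` is skipped; this is what keeps re-indexing (and re-embedding) cheap on a large vault.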

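Installation

```bash
# Basic install (Meilisearch backend)
pip install mykb

# With SeekDB support (quoted so shells such as zsh don't treat [] as a glob)
pip install "mykb[seekdb]"
```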

Requirements

  • Meilisearch backend: requires a running Meilisearch server
  • SeekDB backend: embedded mode needs no external service (macOS 15+ / Linux)

Quick start

```bash
# 1. Configure Meilisearch
mykb config set meilisearch.url http://localhost:7700

# 2. Add a collection (an Obsidian vault)
mykb collection add my-notes --path ~/Documents/Obsidian --source obsidian

# 3. Index (including embeddings)
mykb index my-notes --embed

# 4. Search
mykb search "machine learning"                    # BM25 full-text
mykb vsearch "AI technology trends"               # vector semantic
mykb query "getting started with deep learning" --ratio 0.5  # hybrid
```
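`mykb query` blends keyword and vector results according to `--ratio`, but the README does not say how. The sketch below assumes `--ratio` is the weight given to the vector score (0 = pure BM25, 1 = pure vector), similar in spirit to Meilisearch's `semanticRatio`, and uses min-max normalization followed by a weighted sum. The function names are hypothetical and mykb's real fusion may differ.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and cosine scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float],
    vector: dict[str, float],
    ratio: float = 0.5,
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Weighted fusion: (1 - ratio) * keyword + ratio * semantic (assumed meaning of --ratio)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    kw, sem = min_max(bm25), min_max(vector)
    # A document missing from one result list contributes 0 for that signal.
    fused = {
        doc_id: (1.0 - ratio) * kw.get(doc_id, 0.0) + ratio * sem.get(doc_id, 0.0)
        for doc_id in kw.keys() | sem.keys()
    }
    # Highest score first; ties broken by id for deterministic output.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


if __name__ == "__main__":
    bm25 = {"a.md": 12.0, "b.md": 4.0}
    vector = {"b.md": 0.92, "c.md": 0.40}
    print(hybrid_rank(bm25, vector, ratio=0.5))
    # [('a.md', 0.5), ('b.md', 0.5), ('c.md', 0.0)]
```

Normalizing first matters because raw BM25 scores are unbounded while cosine similarities are not; without it, one signal would dominate regardless of the ratio.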

Configuration

Config file: ~/.mykb/config.toml

```toml
[backend]
type = "meilisearch"  # or "seekdb"

[backend.meilisearch]
url = "http://localhost:7700"
api_key = ""

[backend.seekdb]
path = "~/.mykb/seekdb.db"  # embedded mode
fulltext_analyzer = "ik"     # ik | space | ngram

[embedding]
model = "google/embeddinggemma-300m"
chunk_size = 800
chunk_overlap = 0.15

[collections.my-notes]
source = "obsidian"
path = "/path/to/vault"
mask = "**/*.md"
exclude = [".obsidian/**", ".trash/**"]
```
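The `[embedding]` settings suggest that documents are split into overlapping chunks before embedding. A minimal sliding-window sketch, assuming `chunk_size` counts characters (it could equally be tokens) and `chunk_overlap` is a fraction of `chunk_size`; `chunk_text` is a hypothetical name, not part of mykb.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: float = 0.15) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share a fraction of their length.

    Assumes chunk_size counts characters and chunk_overlap is a fraction of chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= chunk_overlap < 1.0:
        raise ValueError("chunk_overlap must be in [0, 1)")
    overlap_chars = round(chunk_size * chunk_overlap)  # 800 * 0.15 -> 120
    step = max(1, chunk_size - overlap_chars)          # advance 680 characters per chunk
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    text = "".join(chr(ord("a") + i % 26) for i in range(2000))
    chunks = chunk_text(text)
    print(len(chunks), [len(c) for c in chunks])  # 3 [800, 800, 640]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk (as long as it is shorter than the overlap), which helps vector search match it.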

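Commands

| Command | Description |
|---|---|
| `mykb collection add/ls/rm` | Manage collections |
| `mykb index [--embed] [--full]` | Index documents |
| `mykb embed [--full]` | Generate missing embeddings |
| `mykb search <query>` | BM25 full-text search |
| `mykb vsearch <query>` | Vector semantic search |
| `mykb query <query> [--ratio]` | Hybrid search |
| `mykb status` | Show status |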
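`mykb search` ranks results by BM25-style full-text relevance. For reference, a textbook Okapi BM25 scorer is sketched below. The parameters k1 = 1.2 and b = 0.75 are common defaults, the documents are assumed to be already tokenized (for Chinese text that is the job of an analyzer such as the `ik` option of `fulltext_analyzer`), and each backend's actual ranking may differ from this formula. `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter


def bm25_scores(
    query: list[str], corpus: list[list[str]], k1: float = 1.2, b: float = 0.75
) -> list[float]:
    """Score every tokenized document in `corpus` against the query terms."""
    n_docs = len(corpus)
    if n_docs == 0:
        return []
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(doc))
    scores: list[float] = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b normalizes for document length.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [["machine", "learning", "intro"], ["deep", "learning"], ["cooking", "recipes"]]
    print(bm25_scores(["machine", "learning"], corpus))
```

The first document matches both query terms and ranks highest; the third matches none and scores 0.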

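Backend comparison

| Feature | Meilisearch | SeekDB |
|---|---|---|
| Deployment | Standalone service | Embedded or service |
| BM25 speed | ⚡ Fast (~2 ms) | Slow (~30 ms) |
| Vector speed | Fast (~3 ms) | ⚡ Faster (~1 ms) |
| Hybrid speed | ⚡ Fast (~4 ms) | Slow (~33 ms) |
| Resource usage | | |

Recommendations:

  • For general-purpose use, choose Meilisearch
  • For pure vector-search workloads, consider SeekDB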

Development

```bash
# Install development dependencies
uv sync --all-extras

# Run tests
uv run pytest

# Benchmark
uv run python scripts/benchmark_seekdb.py --backend meilisearch -n 1000
```
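The benchmark script above is the project's own tool, and its methodology is not described here. To time a backend yourself, a generic sketch like the following reports median and 95th-percentile latency; `search` is a hypothetical stand-in for whatever function issues one query against a backend, and `measure_latency_ms` is not part of mykb.

```python
import math
import statistics
import time
from typing import Callable


def measure_latency_ms(
    search: Callable[[str], object],
    queries: list[str],
    repeat: int = 10,
    warmup: int = 1,
) -> tuple[float, float]:
    """Return (median, p95) wall-clock latency in milliseconds for `search`."""
    if not queries or repeat < 1:
        raise ValueError("need at least one query and repeat >= 1")
    # Untimed warm-up passes so caches, connections, and lazy loading don't skew results.
    for _ in range(warmup):
        for q in queries:
            search(q)
    samples: list[float] = []
    for _ in range(repeat):
        for q in queries:
            start = time.perf_counter()
            search(q)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile.
    p95 = samples[math.ceil(0.95 * len(samples)) - 1]
    return statistics.median(samples), p95


if __name__ == "__main__":
    def fake_search(q: str) -> list[str]:
        return sorted(q * 100)

    median, p95 = measure_latency_ms(fake_search, ["machine learning", "deep learning"])
    print(f"median={median:.3f} ms  p95={p95:.3f} ms")
```

Reporting a tail percentile alongside the median shows whether figures like "~2 ms" hold for most queries or only typical ones.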

License

MIT

Download files (mykb 0.2.0)

  • Source distribution: mykb-0.2.0.tar.gz (23.7 kB)
      SHA256: 1639db8bbe739ccbc4daf036f4a627aecd54ea084d5b9b1d71974a923da80cde
      MD5: f9ccb9e8c5c1f829dec5f0979d1fef55
      BLAKE2b-256: a8e3f8625e5ccd2f6dd922d3bb8b178b1311aa3b2d7bbee0483216dedd877649
  • Built distribution (Python 3 wheel): mykb-0.2.0-py3-none-any.whl (33.0 kB)
      SHA256: d852c2cbc8ba13109526950a9dbd10e6de2483ab382cdb37fd60e60a9c9f88e0
      MD5: 7d5a2c0c94f984336f64b16c83e6bd64
      BLAKE2b-256: 6212d2bebb3511f9d2051f65cd82dc912e42bdaf50b709a46e53003a6a1e9c02

Both files were uploaded with uv 0.9.17 (`uv publish`, macOS), without Trusted Publishing.
