Local knowledge base for documents and code. Search, ask questions, or chat — standalone or as an AI agent backend via MCP. Fully offline with Ollama.
Project description
lilbee
Beta — feedback and bug reports welcome. Open an issue.
Chat with your documents locally using Ollama — or plug into AI agents as a retrieval backend via MCP. Indexes PDFs (including scanned via vision OCR), Office docs, spreadsheets, images, and code with a git-like per-project model. Powered by Kreuzberg for text extraction, Ollama for embeddings and chat, and LanceDB for vector storage.
- Why lilbee
- Demos
- Install
- Quick start · Full usage guide
- Agent integration
- HTTP Server · API reference
- Interactive chat
- Supported formats
Why lilbee
lilbee indexes documents and code into a searchable local knowledge base. Use it standalone — search, ask questions, chat — or plug it into AI coding agents as a retrieval backend via MCP.
Most tools like this only handle code. lilbee handles PDFs, Word docs, spreadsheets, images (OCR) — and code too, with AST-aware chunking.
- Standalone knowledge base — add documents, search, ask questions, or chat interactively with model switching and slash commands
- AI agent backend — MCP server and JSON CLI so coding agents can search your indexed docs as context
- Per-project databases — `lilbee init` creates a `.lilbee/` directory (like `.git/`) so each project gets its own isolated index
- Documents and code alike — PDFs, Office docs, spreadsheets, images, ebooks, and 150+ code languages via tree-sitter
- Open-source — runs with Ollama and LanceDB, no cloud APIs or Docker required
Add files (lilbee add), then search or ask questions. Once indexed, search works without Ollama — agents use their own LLM to reason over the retrieved chunks.
Demos
AI agent — lilbee search vs web search (detailed analysis)
opencode + minimax-m2.5-free, single prompt, no follow-ups. The Godot 4.4 XML class reference (917 files) is indexed in lilbee. The baseline uses Exa AI code search instead.
⚠️ Caution: minimax-m2.5-free is a cloud model — retrieved chunks are sent to an external API. Use a local model if your documents are private.
| Condition | API hallucinations | Lines |
|---|---|---|
| With lilbee (code · config) | 0 | 261 |
| Without lilbee (code · config) | 4 (~22% error rate) | 213 |
With lilbee — all Godot API calls match the class reference
Without lilbee — 4 hallucinated APIs (details)
If you spot issues with these benchmarks, please open an issue.
Vision OCR
Scanned PDF → searchable knowledge base
A scanned 1998 Star Wars: X-Wing Collector's Edition manual indexed with vision OCR (LightOnOCR-2), then queried in lilbee's interactive chat (qwen3-coder:30b, fully local). Three questions about dev team credits, energy management, and starfighter speeds — all answered from the OCR'd content.
See benchmarks, test documents, and sample output for model comparisons.
One-shot question from OCR'd content
The scanned Star Wars: X-Wing Collector's Edition guide, queried with a single lilbee ask command — no interactive chat needed.
Standalone
Interactive local offline chat
[!NOTE] Entirely local on a 2021 M1 Pro with 32 GB RAM.
Model switching via tab completion, then a Q&A grounded in an indexed PDF.
Code index and search
Add a codebase and search with natural language. Tree-sitter provides AST-aware chunking.
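To illustrate what "AST-aware chunking" means in practice (splitting a file at function and class boundaries rather than fixed character windows), here is a minimal sketch using Python's built-in `ast` module. This is illustrative only: lilbee's actual chunker is built on tree-sitter, which applies the same idea across 150+ languages, and the function name below is invented for the example.

```python
import ast

def chunk_python_source(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function/class.

    Sketch of the AST-aware chunking idea, not lilbee's implementation.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast line numbers are 1-based; end_lineno is inclusive
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks
```

Chunking on syntactic boundaries keeps each embedded chunk a self-contained unit (a whole function or class), which tends to retrieve better than arbitrary windows that cut definitions in half.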
JSON output
Structured JSON output for agents and scripts.
Hardware requirements
When used standalone, lilbee runs entirely on your machine — chat with your documents privately, no cloud required.
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16–32 GB |
| GPU / Accelerator | — | Apple Metal (M-series), NVIDIA GPU (6+ GB VRAM) |
| Disk | 2 GB (models + data) | 10+ GB if using multiple models |
| CPU | Any modern x86_64 / ARM64 | — |
Ollama handles inference and uses Metal on macOS or CUDA on Linux/Windows. Without a GPU, models fall back to CPU — usable for embedding but slow for chat.
Install
Prerequisites
- Python 3.11+
- Ollama — the embedding model (`nomic-embed-text`) is auto-pulled on first sync. If no chat model is installed, lilbee prompts you to pick and download one.
- Optional (for scanned PDF/image OCR): Tesseract (`brew install tesseract` / `apt install tesseract-ocr`) or an Ollama vision model (recommended for better quality — see vision OCR)
First-time download: If you're new to Ollama, expect the first run to take a while — models are large files that need to be downloaded once. For example, `qwen3:8b` is ~5 GB and the embedding model `nomic-embed-text` is ~274 MB. After the initial download, models are cached locally and load in seconds. You can check what you have installed with `ollama list`.
Install
pip install lilbee # or: uv tool install lilbee
Development (run from source)
git clone https://github.com/tobocop2/lilbee && cd lilbee
uv sync
uv run lilbee
Quick start
See the usage guide.
Agent integration
lilbee can serve as a local retrieval backend for AI coding agents via MCP or JSON CLI. See docs/agent-integration.md for setup and usage.
HTTP Server
lilbee includes a REST API server for programmatic access:
lilbee serve # start on localhost:7433
lilbee serve --host 0.0.0.0 --port 8080
Endpoints include /api/search, /api/ask, /api/chat (with streaming SSE variants), /api/sync, /api/add, and /api/models. When the server is running, interactive API docs are available at /schema/redoc. See the API reference for the full OpenAPI schema.
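As a sketch of calling the server from Python, the helper below builds a POST to `/api/search` using only the standard library. The `query` field name and JSON request shape are assumptions for illustration; the authoritative request schema is the server's OpenAPI docs at `/schema/redoc`.

```python
import json
from urllib import request

def build_search_request(query: str,
                         base: str = "http://localhost:7433") -> request.Request:
    """Build a POST to /api/search.

    The {"query": ...} body is an assumption; check the OpenAPI
    schema served at /schema/redoc for the real field names.
    """
    body = json.dumps({"query": query}).encode()
    return request.Request(
        f"{base}/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def search(query: str) -> dict:
    """Send the request and decode the JSON response.

    Requires a running `lilbee serve` on the default port.
    """
    with request.urlopen(build_search_request(query)) as resp:
        return json.load(resp)
```

The same pattern applies to `/api/ask` and `/api/chat`; the streaming SSE variants need an SSE-aware client rather than a single `urlopen` call.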
Interactive chat
Running lilbee or lilbee chat enters an interactive REPL with conversation history, streaming responses, and slash commands:
| Command | Description |
|---|---|
| `/status` | Show indexed documents and config |
| `/add [path]` | Add a file or directory (tab-completes paths) |
| `/model [name]` | Switch chat model — no args opens a curated picker; with a name, switches directly or prompts to download if not installed (tab-completes installed models) |
| `/vision [name\|off]` | Switch vision OCR model — no args opens a curated picker; with a name, prompts to download if not installed; `off` disables (tab-completes catalog models) |
| `/settings` | Show all current configuration values |
| `/set <key> <value>` | Change a setting (e.g. `/set temperature 0.7`) |
| `/version` | Show lilbee version |
| `/reset` | Delete all documents and data (asks for confirmation) |
| `/help` | Show available commands |
| `/quit` | Exit chat |
Slash commands and paths tab-complete. A spinner shows while waiting for the first token from the LLM. Background sync progress appears in the toolbar without interrupting the conversation.
Supported formats
Text extraction powered by Kreuzberg, code chunking by tree-sitter. Structured formats (XML, JSON, CSV) get embedding-friendly preprocessing. This list is not exhaustive — Kreuzberg supports additional formats beyond what's listed here.
| Format | Extensions | Requires |
|---|---|---|
| PDF | `.pdf` | — |
| Scanned PDF | `.pdf` (no extractable text) | Tesseract (auto, plain text) or Ollama vision model (recommended — preserves tables, headings, and layout as markdown) |
| Office | `.docx`, `.xlsx`, `.pptx` | — |
| eBook | `.epub` | — |
| Images (OCR) | `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.webp` | Tesseract |
| Data | `.csv`, `.tsv` | — |
| Structured | `.xml`, `.json`, `.jsonl`, `.yaml`, `.yml` | — |
| Text | `.md`, `.txt`, `.html`, `.rst` | — |
| Code | `.py`, `.js`, `.ts`, `.go`, `.rs`, `.java` and 150+ more via tree-sitter (AST-aware chunking) | — |
See the usage guide for OCR setup and model benchmarks.
License
MIT
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file lilbee-0.5.0.tar.gz.
File metadata
- Download URL: lilbee-0.5.0.tar.gz
- Upload date:
- Size: 14.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fc3e2984f884752897fa9804ab7e11a048382301f4d8502004869b9563c1b1d5` |
| MD5 | `a2d398c76024fec37e7bd093c67d492b` |
| BLAKE2b-256 | `646b45cb824bf9dbc4f269a5b4596b6262d0a3b4ace79901469f7e311fbb07a5` |
Provenance
The following attestation bundles were made for lilbee-0.5.0.tar.gz:
Publisher: publish.yml on tobocop2/lilbee

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lilbee-0.5.0.tar.gz
- Subject digest: fc3e2984f884752897fa9804ab7e11a048382301f4d8502004869b9563c1b1d5
- Sigstore transparency entry: 1114586569
- Sigstore integration time:
- Permalink: tobocop2/lilbee@4aa306f03877cba9d422c8a6abf9150159577e30
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/tobocop2
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4aa306f03877cba9d422c8a6abf9150159577e30
- Trigger Event: release
File details
Details for the file lilbee-0.5.0-py3-none-any.whl.
File metadata
- Download URL: lilbee-0.5.0-py3-none-any.whl
- Upload date:
- Size: 67.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `ab89e8ff6113df402e61f7f8682239d3eda32a7243cb61d8005b4c54515d6536` |
| MD5 | `abaf6d358f24ab64e48d84ae154ec814` |
| BLAKE2b-256 | `5ac75cde179e7b6a0ec956ef94e22949bbf43b2c64304a1631229e5d1addd0d7` |
|
Provenance
The following attestation bundles were made for lilbee-0.5.0-py3-none-any.whl:
Publisher: publish.yml on tobocop2/lilbee

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lilbee-0.5.0-py3-none-any.whl
- Subject digest: ab89e8ff6113df402e61f7f8682239d3eda32a7243cb61d8005b4c54515d6536
- Sigstore transparency entry: 1114586584
- Sigstore integration time:
- Permalink: tobocop2/lilbee@4aa306f03877cba9d422c8a6abf9150159577e30
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/tobocop2
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4aa306f03877cba9d422c8a6abf9150159577e30
- Trigger Event: release