Fast local-only semantic search CLI for markdown, text, chat and code — CPU-only, no cloud.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

williamliu

These details have not been verified by PyPI

Project description

fidx

Fast, local-only semantic search for markdown, text, chat exports and code.

CPU-only. No cloud, no GPU, no API keys. One SQLite file holds the full index. Built to be the retrieval layer for agentic workflows: millisecond warm queries, JSON output, and collection scoping.

Why

Local semantic search tools tend to buy recall with latency: LLM query expansion and LLM reranking push a single query to ~10 seconds on CPU. fidx makes a different trade-off: hybrid BM25 + 768-dim vector search combined with reciprocal-rank fusion (RRF), and no LLM calls in the query path. The only model work per query is one ONNX embedding pass.

- Hybrid recall: FTS5 BM25 catches exact names and identifiers; 768-dim embeddings catch "that doc that discussed the indexing project"; RRF fuses both.
- Millisecond queries: a warm daemon answers hybrid searches in ~5 ms (p50) on a ~20k-chunk index; cold CLI calls stay well under a second.
- One file: documents, BM25 index and vectors live in a single SQLite database (FTS5 + sqlite-vec). Copy it, back it up, delete it.
- Scoped search: group sources into named collections (-c emails) and search only what you mean.
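
As an illustration of the fusion step, here is a minimal sketch of reciprocal-rank fusion over two ranked lists. It is not fidx's actual code: the function name and the document ids are made up for the example, and k=60 is the value quoted under "How it works".

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids with reciprocal-rank fusion.

    Each list contributes 1 / (k + rank) for every document it contains,
    where rank starts at 1. Documents found by several rankers accumulate
    a higher score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings: one from BM25, one from vector KNN.
bm25_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = rrf_fuse([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])
```

Documents that appear in both lists ("a" and "c" here) outrank documents that only one ranker found, which is the behaviour the hybrid mode relies on.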

Requirements

- Python 3.11 or 3.12 whose sqlite3 module supports loadable extensions and FTS5 (fidx loads the sqlite-vec extension). Run fidx doctor to verify.
- Prebuilt wheels exist for the verified platforms below, so no compiler is needed.
- The first fidx index run downloads the embedding model once; after that, fidx works fully offline.

| Platform (triple) | Status | Notes |
| --- | --- | --- |
| Linux x86_64 | ✅ verified (CI + Docker) | any Python 3.11/3.12 with extensions |
| macOS arm64 (Apple Silicon) | ✅ verified (CI) | use Homebrew Python (see install) |
| Windows x86_64 | ✅ verified (CI) | python.org / uv Python |
| macOS Intel, Linux/Windows arm64 | best-effort | depends on upstream wheel availability |
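
To see what the sqlite3 requirement means in practice, the following sketch probes the current interpreter for loadable-extension support and FTS5 using only the standard library. It is an approximation of the kind of check involved, not the implementation of fidx doctor; the function name and the report keys are assumptions.

```python
import sqlite3


def check_sqlite_capabilities() -> dict[str, bool]:
    """Report whether this Python's sqlite3 can load extensions and use FTS5."""
    report = {"loadable_extensions": False, "fts5": False}
    conn = sqlite3.connect(":memory:")
    try:
        # enable_load_extension is missing entirely when Python was built
        # without loadable-extension support.
        if hasattr(conn, "enable_load_extension"):
            try:
                conn.enable_load_extension(True)
                conn.enable_load_extension(False)
                report["loadable_extensions"] = True
            except (sqlite3.Error, AttributeError):
                pass
        # FTS5 is a compile-time option of SQLite; creating a virtual table
        # fails with OperationalError when it is absent.
        try:
            conn.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
            report["fts5"] = True
        except sqlite3.OperationalError:
            pass
    finally:
        conn.close()
    return report


print(check_sqlite_capabilities())
```

If either value is False, the interpreter does not meet the requirement above, and fidx doctor remains the authoritative check.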

Install

Why is the package named fmdidx? The project and its command are fidx, but the PyPI name fidx was already taken by an unrelated package — so fidx is distributed as fmdidx ("fast markdown index"). That is the only place the name differs: uv tool install fmdidx installs the fidx command. Until the first release, install from a built wheel or from source (below).

The recommended installer is uv, because on Linux and Windows a uv-managed Python ships a sqlite3 that supports loadable extensions.

Linux / Windows:

uv tool install fmdidx          # or: pipx install fmdidx
fidx doctor                          # verify your host

macOS: uv's bundled Python (and the python.org build) ship a sqlite3 without loadable-extension support, so use Homebrew Python:

brew install python
uv tool install --python "$(brew --prefix python)/libexec/bin/python" fmdidx
# or: pipx install --python "$(brew --prefix python)/libexec/bin/python3" fmdidx
fidx doctor

From a built wheel (works today, name-independent):

uv build
pip install --only-binary=:all: dist/*.whl    # use Homebrew Python on macOS
fidx doctor

From a checkout (development):

uv sync && uv run fidx doctor

If fidx doctor reports a failure, it prints exactly what is missing and how to fix it — see Troubleshooting.

Quick start

# Register directories as named collections
fidx collection add ~/notes --name notes
fidx collection add ~/mail/export --name emails --glob "**/*.txt"

# Scan + chunk + embed (incremental; first run downloads the ONNX model)
fidx index

# Search (hybrid BM25 + vector by default)
fidx search "the doc that discussed the most recent indexing project"
fidx search "Grace Hopper" --mode lexical      # exact-name lookup, no model load
fidx search "deployment checklist" -c notes    # scope to one collection

# Agent-friendly output
fidx search "error handling" --json -n 10
fidx search "auth" --files --min-score 0.02

# Fetch a document by path or docid
fidx get "notes/meeting.md"
fidx get "#a1b2c3"

Warm daemon (recommended for agents)

fidx serve &          # keeps the model + index hot on a unix socket
fidx search "..."     # all searches now take milliseconds

The CLI uses the daemon automatically when it is running; --no-daemon opts out.

Verifying your install

fidx doctor                          # host capability report (exit 0 = ready)

# Full end-to-end benchmark on a ~1,000-doc corpus against the installed CLI:
python scripts/e2e_smoke.py          # builds corpus, indexes, searches, gates recall

# Clean-machine proof in pristine Docker containers (Linux):
scripts/verify-install.sh            # builds the wheel, installs + runs e2e on 3.11 & 3.12

The same e2e runs in CI on Linux, macOS (arm64) and Windows × Python 3.11/3.12 (the install-matrix workflow) — installing the built wheel from scratch and asserting recall@10.

How it works

files ──> documents (SQLite) ──> FTS5 (BM25, porter)        ─┐
                │                                            ├─> RRF fusion ─> results
                └──> chunks ──> ONNX embeddings ─> sqlite-vec ┘

- Chunking splits at the best structural break (headings, code-fence boundaries, blank lines) near a ~1800-char target with 15% overlap, and never inside a code fence. Chunks store offsets, not copies.
- Embeddings are computed on CPU via fastembed/ONNX. The default profile is 768-dim; smaller profiles exist for small corpora.
- Search runs BM25 and vector KNN in parallel and fuses the two rankings with RRF (k=60); results are document-level with best-chunk snippets.
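
The chunking rule can be sketched roughly as follows. This is a simplified illustration, not fidx's chunker: the function names are hypothetical, the "best" break is taken to be the one nearest the target, and the overlap is snapped to a structural break, so it is at most 15% of the target and can be zero when no break falls in that window. Like the description above, it returns offsets rather than text copies.

```python
TARGET_CHARS = 1800   # target chunk size quoted in the text
OVERLAP = 0.15        # overlap fraction quoted in the text


def candidate_breaks(text: str) -> list[int]:
    """Offsets where a chunk may start or end, never inside a code fence."""
    breaks: set[int] = set()
    in_fence = False
    prev_blank = False
    pos = 0
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("```"):
            if in_fence:
                breaks.add(pos + len(line))   # just after a closing fence
            else:
                breaks.add(pos)               # just before an opening fence
            in_fence = not in_fence
            prev_blank = False
        elif not in_fence:
            if stripped.startswith("#") or prev_blank:
                breaks.add(pos)               # heading, or first line after a blank line
            prev_blank = stripped == ""
        pos += len(line)
    return sorted(b for b in breaks if 0 < b < len(text))


def chunk_offsets(text: str, target: int = TARGET_CHARS,
                  overlap: float = OVERLAP) -> list[tuple[int, int]]:
    """Return (start, end) character offsets of chunks covering the text."""
    breaks = candidate_breaks(text)
    chunks: list[tuple[int, int]] = []
    start, n = 0, len(text)
    while n - start > target:
        later = [b for b in breaks if b > start]
        if not later:
            break                             # no safe break left: emit the rest whole
        ideal = start + target
        end = min(later, key=lambda b: abs(b - ideal))
        chunks.append((start, end))
        # Step back by up to overlap * target characters, but only onto a safe break.
        earliest = end - int(overlap * target)
        start = min(b for b in later if earliest <= b <= end)
    chunks.append((start, n))
    return chunks
```

Because a fenced block offers no break inside it, a long code block becomes a single chunk larger than the target instead of being split.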

Troubleshooting

- enable_load_extension / "sqlite3 was built without loadable-extension support": your Python's sqlite cannot load sqlite-vec. This is the default on macOS system Python and uv/python.org macOS builds. Fix: install fidx with Homebrew Python (see macOS install above). fidx doctor confirms the fix.
- sqlite-vec failed to load / wrong architecture: ensure a sqlite-vec wheel exists for your platform: pip install --only-binary=:all: sqlite-vec.
- First search is slow / offline use: the embedding model downloads once on first index. Pre-seed FASTEMBED_CACHE_PATH to use fidx air-gapped.

Benchmarks

bench/ is a reproducible harness comparing fidx against QMD on four corpora with known-item queries (CPU-only, warm engines, idle box). Result quality has two axes that trade off: recall (is the right document in the top 10?) and purity, measured as noise@10 (share of returned results an LLM judge rated irrelevant; lower is better) and clean@10 (share of queries whose results contain zero noise; higher is better). The headline, going by the table below: against QMD's hybrid query mode, fidx has lower noise and higher clean on every corpus, matches or beats it on recall on three of four corpora (it trails by 0.004 on chat), and answers at roughly 70× to 1,650× lower p50 latency.

| Corpus (size) | Engine | R@10 ↑ | noise@10 ↓ | clean@10 ↑ | p50 latency |
| --- | --- | --- | --- | --- | --- |
| docs-small (2k) | fidx | 0.933 | 0.219 | 0.560 | 20 ms |
| | QMD `query` (LLM hybrid) | 0.933 | 0.564 | 0.053 | 33 s |
| | QMD `search` (FTS) | 0.920 | n/m | n/m | 78 ms |
| docs (18.8k) | fidx | 0.962 | 0.250 | 0.482 | 49 ms |
| | QMD `query` | 0.914 | 0.677 | 0.060 | 36 s |
| | QMD `search` | 0.896 | 0.353 | 0.818 | 87 ms |
| chat (8k) | fidx | 0.908 | 0.133 | 0.710 | 18 ms |
| | QMD `query` | 0.912 | 0.472 | 0.186 | 18 s |
| | QMD `search` | 0.916 | 0.086 | 0.964 | 81 ms |
| code (92.3k) | fidx | 0.864 | 0.127 | 0.704 | 452 ms |
| | QMD `query` | 0.782 | 0.713 | 0.056 | 33 s |
| | QMD `search` | 0.784 | 0.256 | 0.868 | 121 ms |

fidx rows are measured with its built-in deterministic result truncation enabled (--truncate; ships off by default — without it fidx trades purity for recall, e.g. code R@10 0.900 at noise 0.297). "n/m" = not measured.

- Hybrid vs hybrid (fidx vs QMD query): fidx wins on noise and clean on every corpus and on recall on docs and code (e.g. code recall +8 pts with 5.6× less noise). Recall ties on docs-small (0.933), and QMD leads by 0.004 on chat. fidx does this with no LLM anywhere in its query path.
- Where QMD wins: its FTS search mode is the purity champion on chat (clean 0.964) and the latency champion on the big code corpus (121 ms vs fidx's 452 ms brute-force KNN over 92k vectors), but trails on recall where it matters (docs, code). QMD's pure-vector mode collapses to 0.048 R@10 on code.
- fidx stays sub-second even on its weakest corpus and is ~73× faster than QMD's query mode there (452 ms vs 33 s).
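
For readers who want to reproduce the three quality metrics on their own data, here is a minimal sketch based on the definitions given above. It is not the code in bench/: the function names and input shapes are assumptions, and noise@10 is pooled over all returned results rather than averaged per query, which is one possible reading of "share of returned results".

```python
def recall_at_k(results: list[list[str]], targets: list[str], k: int = 10) -> float:
    """Share of known-item queries whose target document is in the top k."""
    hits = sum(1 for ranked, target in zip(results, targets) if target in ranked[:k])
    return hits / len(targets)


def noise_at_k(irrelevant: list[list[bool]], k: int = 10) -> float:
    """Share of returned results (pooled over queries) judged irrelevant.

    irrelevant[q][i] is True when the judge rated result i of query q as noise.
    """
    flags = [flag for per_query in irrelevant for flag in per_query[:k]]
    return sum(flags) / len(flags) if flags else 0.0


def clean_at_k(irrelevant: list[list[bool]], k: int = 10) -> float:
    """Share of queries whose top-k results contain no noise at all."""
    clean = sum(1 for per_query in irrelevant if not any(per_query[:k]))
    return clean / len(irrelevant)


# Two hypothetical queries with two results each.
results = [["d1", "d2"], ["d9", "d3"]]
targets = ["d2", "d4"]
judged_irrelevant = [[False, True], [False, False]]
print(recall_at_k(results, targets), noise_at_k(judged_irrelevant), clean_at_k(judged_irrelevant))
```

The example shows why the two axes can diverge: the first query finds its target but returns one noisy result, while the second is clean but misses its target.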

Full tables (R@1/R@3, untruncated numbers, per-language code results), the SymDex comparison, conditions, and the honest threats-to-validity: docs/BENCHMARKS.md; harness usage and methodology: bench/README.md.

Development

uv sync --extra dev
uv run pytest
scripts/verify-install.sh    # clean-machine install + e2e (Docker)

Architecture notes: docs/DESIGN.md. Contributing guide: CONTRIBUTING.md.

License

fidx is licensed under MIT AND LicenseRef-AI-Idea-Attribution-1.0: MIT plus the AI Idea Attribution Addendum v1.0. See LICENSE, LICENSES/MIT.txt, LICENSES/AI-Idea-Attribution-Addendum-1.0.txt, and AI_ATTRIBUTION.md.

Project details

This version

0.1.0

Jul 4, 2026

Download files

Source Distribution

fmdidx-0.1.0.tar.gz (163.0 kB)

Uploaded Jul 4, 2026 Source

Built Distribution

fmdidx-0.1.0-py3-none-any.whl (43.5 kB)

Uploaded Jul 4, 2026 Python 3

File details

Details for the file fmdidx-0.1.0.tar.gz.

File metadata

Download URL: fmdidx-0.1.0.tar.gz
Upload date: Jul 4, 2026
Size: 163.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for fmdidx-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`87ace399ef92c898f4ac40873d36366c22b9d6a2ee8f87d4ff38ed08b323024a`
MD5	`b3230837cf0d732e015dc9bc9cccfbc9`
BLAKE2b-256	`04cfe6e243bc76587d7221232a74fa8d73f88fbf18bdbec491877039da140d8c`

Provenance

The following attestation bundles were made for fmdidx-0.1.0.tar.gz:

Publisher: release.yml on williamliu-ai/fidx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fmdidx-0.1.0.tar.gz
- Subject digest: 87ace399ef92c898f4ac40873d36366c22b9d6a2ee8f87d4ff38ed08b323024a
- Sigstore transparency entry: 2064634986
- Sigstore integration time: Jul 4, 2026
Source repository:
- Permalink: williamliu-ai/fidx@78c4094c8c2113d0a2eda48d7f8986a237789263
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/williamliu-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@78c4094c8c2113d0a2eda48d7f8986a237789263
- Trigger Event: push

File details

Details for the file fmdidx-0.1.0-py3-none-any.whl.

File metadata

Download URL: fmdidx-0.1.0-py3-none-any.whl
Upload date: Jul 4, 2026
Size: 43.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for fmdidx-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1c9e758e3685d442c349c651a6770d20ba1dd6394eef01e9a8a7146fdf38167e`
MD5	`8665b6884953bbde635c95c7033319a6`
BLAKE2b-256	`1627cb81b2be56478f9632c38ecc6452037d9371ce5fb0d821beb57657e09406`

Provenance

The following attestation bundles were made for fmdidx-0.1.0-py3-none-any.whl:

Publisher: release.yml on williamliu-ai/fidx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fmdidx-0.1.0-py3-none-any.whl
- Subject digest: 1c9e758e3685d442c349c651a6770d20ba1dd6394eef01e9a8a7146fdf38167e
- Sigstore transparency entry: 2064635130
- Sigstore integration time: Jul 4, 2026
Source repository:
- Permalink: williamliu-ai/fidx@78c4094c8c2113d0a2eda48d7f8986a237789263
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/williamliu-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@78c4094c8c2113d0a2eda48d7f8986a237789263
- Trigger Event: push
