Semantic (hybrid dense + sparse) search backend plugin for fmql.
fmql-semantic
Hybrid semantic search backend plugin for fmql.
- Dense retrieval via LiteLLM embeddings + sqlite-vec.
- Sparse retrieval via SQLite FTS5 (BM25).
- Fusion via reciprocal rank fusion (RRF).
- Optional reranking via LiteLLM rerank providers (Cohere, Voyage, etc.).
- Single-file SQLite index. No server.
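The fusion step in the list above can be illustrated with a minimal sketch of reciprocal rank fusion. The function name, the tie-breaking rule, and the constant k=60 (the value commonly used with RRF) are assumptions for illustration; the constant fmql-semantic actually uses is not stated here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores 1 / (k + rank) in every list that contains it,
    and the per-list scores are summed. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["a", "b", "c"]   # hypothetical dense (embedding) ranking
sparse = ["b", "d", "a"]  # hypothetical sparse (BM25) ranking
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

An id that ranks well in both lists beats one that ranks first in only one of them, which is why "b" comes out ahead of "a" here.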
Install
pip install fmql-semantic
fmql-semantic requires a Python build with sqlite3 loadable-extension support.
Most Python installs qualify: Linux distro Python, Windows Python, uv's bundled
Python, Homebrew's python, the python.org macOS installer, conda, and official
Docker images. If the extension loader is unavailable, the backend fails fast
with a clear error.
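To check a given interpreter before installing, a small probe along these lines may help. It is a sketch using only the standard sqlite3 module; the function name is hypothetical, and catching AttributeError is a precaution for builds that omit the method entirely.

```python
import sqlite3


def supports_loadable_extensions() -> bool:
    """Return True if this Python's sqlite3 can load extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)
    except (AttributeError, sqlite3.NotSupportedError):
        # The method is missing or refuses to run on this build.
        return False
    else:
        # Switch loading back off; the probe only needed to see it work.
        conn.enable_load_extension(False)
        return True
    finally:
        conn.close()


print("loadable extensions:", supports_loadable_extensions())
```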
With pipx
fmql-semantic is a plugin library (no CLI of its own), so pipx install fmql-semantic does not work. Inject it into fmql's pipx env instead:
pipx inject fmql fmql-semantic
On macOS specifically, pin pipx to Homebrew's Python to sidestep the sqlite loadable-extension problem described below:
brew install python@3.12
pipx install --python /opt/homebrew/bin/python3.12 fmql
pipx inject fmql fmql-semantic
Or set PIPX_DEFAULT_PYTHON=/opt/homebrew/bin/python3.12 in ~/.zshrc so all
future pipx install calls use Homebrew Python automatically.
macOS + pyenv: extra setup required
A default pyenv install on macOS links Python against Apple's system sqlite
(/usr/lib/libsqlite3.dylib), which is compiled without loadable-extension
support for sandboxing reasons. The same is true of the macOS system Python at
/usr/bin/python3. In both cases connection.enable_load_extension(True)
raises NotSupportedError and fmql-semantic fails fast.
To use fmql-semantic on pyenv-installed Python on macOS, point pyenv at Homebrew's sqlite (which has loadable extensions enabled) and reinstall:
brew install sqlite
export LDFLAGS="-L$(brew --prefix sqlite)/lib"
export CPPFLAGS="-I$(brew --prefix sqlite)/include"
export PKG_CONFIG_PATH="$(brew --prefix sqlite)/lib/pkgconfig"
pyenv uninstall <version>
pyenv install <version>
python -c "import sqlite3; sqlite3.connect(':memory:').enable_load_extension(True); print('OK')"
The LDFLAGS/CPPFLAGS exports must be set while pyenv install runs; they
tell Python's build to prefer Homebrew's sqlite over Apple's. Once the OK
check passes, recreate your venv and reinstall fmql-semantic. This is a
one-time setup per pyenv Python version.
Configure
fmql-semantic reads configuration from three channels, in increasing
precedence:
- Process environment.
- A dotenv file pointed to by --option env=path/to/.env.
- --option KEY=VALUE flags on the command line.
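The precedence order above can be sketched as a lookup that tries the highest-precedence channel first. The function name and the mapping from --option keys to FMQL_* variables are assumptions inferred from the names in this README, not the plugin's actual code.

```python
# Assumed mapping from --option keys to environment variable names.
OPTION_TO_ENV = {
    "model": "FMQL_EMBEDDING_MODEL",
    "batch_size": "FMQL_EMBEDDING_BATCH_SIZE",
    "reranker_model": "FMQL_RERANKER_MODEL",
}


def resolve_option(key, environ, dotenv_values, cli_options, default=None):
    """Resolve one setting across the three channels."""
    env_name = OPTION_TO_ENV[key]
    # Highest precedence first: CLI flag, then dotenv file, then process env.
    if key in cli_options:
        return cli_options[key]
    if env_name in dotenv_values:
        return dotenv_values[env_name]
    return environ.get(env_name, default)


environ = {"FMQL_EMBEDDING_MODEL": "openai/text-embedding-3-small"}
dotenv_values = {
    "FMQL_EMBEDDING_MODEL": "voyage/voyage-3",
    "FMQL_EMBEDDING_BATCH_SIZE": "128",
}
cli_options = {"batch_size": "64"}

print(resolve_option("model", environ, dotenv_values, cli_options))
print(resolve_option("batch_size", environ, dotenv_values, cli_options))
```

In this example the dotenv file overrides the exported model, and the command-line flag overrides the dotenv batch size.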
Environment variables
| Variable | Purpose |
|---|---|
| FMQL_EMBEDDING_MODEL | LiteLLM embedding model string (required). |
| FMQL_EMBEDDING_API_BASE | Override provider API base URL. |
| FMQL_EMBEDDING_API_KEY | Override provider API key. |
| FMQL_EMBEDDING_BATCH_SIZE | Packets per embedding call (default 100). |
| FMQL_EMBEDDING_CONCURRENCY | Max concurrent embedding requests (default 4). |
| FMQL_EMBEDDING_MAX_TOKENS | Per-packet token budget before truncation (default 8000). |
| FMQL_RERANKER_MODEL | LiteLLM rerank model. Enables reranking when set. |
| FMQL_RERANKER_TOP_N | Candidates sent to reranker (default 50). |
Standard LiteLLM provider env vars (OPENAI_API_KEY, VOYAGE_API_KEY,
OLLAMA_API_BASE, …) are read by LiteLLM directly from the process
environment. A dotenv file passed via --option env=path/to/.env also
publishes its non-FMQL_* keys into os.environ (without overriding
values already exported by the shell), so the same file can carry both
FMQL_* settings and provider credentials.
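The publishing rule described above (skip FMQL_* keys, never override values the shell already exported) can be sketched in a few lines. The function name and the dict-shaped input are assumptions; the plugin's own dotenv handling may differ in detail.

```python
import os


def publish_provider_keys(dotenv_values, environ=os.environ):
    """Copy non-FMQL_* dotenv keys into the environment, without overriding."""
    for key, value in dotenv_values.items():
        if key.startswith("FMQL_"):
            continue  # FMQL_* settings are read by the backend, not published
        # setdefault keeps any value that is already present.
        environ.setdefault(key, value)


# Demonstrated on a plain dict so the real process environment is untouched.
env = {"OPENAI_API_KEY": "from-shell"}
publish_provider_keys(
    {
        "OPENAI_API_KEY": "from-file",
        "VOYAGE_API_KEY": "voyage-secret",
        "FMQL_EMBEDDING_MODEL": "voyage/voyage-3",
    },
    env,
)
print(env)
```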
--option keys
Build: model, api_base, api_key, batch_size, concurrency,
max_tokens, fields, force, env.
Query: model, api_base, api_key, reranker_model, reranker_top_n,
rerank_required, no_rerank, dense_only, sparse_only, fetch_k, env.
Use
export FMQL_EMBEDDING_MODEL=openai/text-embedding-3-small
export OPENAI_API_KEY=...
# Build once:
fmql index ./my-notes --backend semantic
# Query:
fmql search "quarterly planning" --backend semantic --workspace ./my-notes -k 10
# Dense-only / sparse-only / disable rerank for this query:
fmql search q --backend semantic --workspace ./my-notes --option dense_only=true
fmql search q --backend semantic --workspace ./my-notes --option sparse_only=true
fmql search q --backend semantic --workspace ./my-notes --option no_rerank=true
The default index location is <workspace>/.fmql/semantic.db. Override with
--out (for fmql index) or --index (for fmql search).
Indexing
For each packet, the backend indexes:
<first present frontmatter field from --option fields=title,summary,name>
<body>
Frontmatter field values are otherwise not indexed — they're already queryable via fmql's structured layer.
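As a rough illustration of the indexed text, the sketch below picks the first field from the configured list that is present and puts it ahead of the body. The function name, the newline separator, and treating empty values as absent are assumptions.

```python
def indexed_text(frontmatter, body, fields=("title", "summary", "name")):
    """Build the text to index: first present field, then the body."""
    for field in fields:
        value = frontmatter.get(field)
        if value:  # assumption: empty or missing values count as absent
            return f"{value}\n{body}"
    return body


print(indexed_text({"summary": "Q3 plan", "name": "plan.md"}, "Body text"))
```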
Builds are incremental: packets whose content hash hasn't changed since the last build are skipped. Packets removed from the workspace are removed from the index. The index is committed per batch via SQLite WAL, so a crashed build leaves a queryable index that the next run picks up.
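The incremental behaviour can be pictured as a planning step that compares content hashes. This is a sketch only: the hash algorithm (SHA-256), the function names, and the dict shapes are assumptions, not the plugin's actual schema.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash packet content; SHA-256 is an assumed choice."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_build(workspace, indexed):
    """Decide what a build has to do.

    workspace: packet id -> current text
    indexed:   packet id -> hash stored at the last build
    Returns (ids to embed again, ids to delete from the index).
    """
    to_embed = sorted(
        pid for pid, text in workspace.items()
        if indexed.get(pid) != content_hash(text)
    )
    to_delete = sorted(pid for pid in indexed if pid not in workspace)
    return to_embed, to_delete


indexed = {
    "a.md": content_hash("alpha"),
    "b.md": content_hash("old text"),
    "gone.md": content_hash("removed"),
}
workspace = {"a.md": "alpha", "b.md": "new text", "c.md": "fresh"}
print(plan_build(workspace, indexed))
```

Unchanged packets such as a.md never reach the embedding provider, which is where most of the build cost lies.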
Model pinning
An index is pinned to the embedding model that built it. A rebuild with a
different model is refused unless you pass --force (which drops the existing
tables). Dimension mismatches are caught the same way.
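The pinning rule amounts to a guard that runs before any embedding work. The sketch below is hypothetical: the exception type, function name, and parameters are illustrative, and how the plugin actually stores the pinned model is not described here.

```python
class ModelPinError(Exception):
    """Raised when a build would mix embedding models in one index."""


def check_pin(pinned_model, pinned_dim, model, dim, force=False):
    """Return True if existing tables must be dropped before building."""
    if pinned_model is None:
        return False  # fresh index, nothing to compare against
    if pinned_model == model and pinned_dim == dim:
        return False  # same model and dimension, incremental build is safe
    if not force:
        raise ModelPinError(
            f"index built with {pinned_model} ({pinned_dim} dims), "
            f"got {model} ({dim} dims); pass --force to rebuild"
        )
    return True  # caller drops the tables, then rebuilds from scratch


print(check_pin("openai/text-embedding-3-small", 1536,
                "voyage/voyage-3", 1024, force=True))
```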
Provider notes
- OpenAI (openai/text-embedding-3-small, openai/text-embedding-3-large): batch caps at 2048; default 100 is fine.
- Voyage (voyage/voyage-3): batch caps at 128. Set --option batch_size=128 (or lower) for large indexes.
- Cohere rerank (cohere/rerank-v3.5): works as a reranker model out of the box once COHERE_API_KEY is set.
- Ollama (ollama/nomic-embed-text): set OLLAMA_API_BASE or --option api_base=http://localhost:11434.
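To see why the batch caps above matter, here is a minimal sketch of splitting packets into provider-sized batches. The helper name is hypothetical and the plugin's own batching code may differ.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


packets = list(range(300))  # stand-in for 300 packets
sizes = [len(batch) for batch in batched(packets, 128)]
print(sizes)
```

With a batch size of 128, 300 packets become three embedding calls, none of which exceeds the Voyage cap.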
Licensing
MIT. See LICENSE.