fvdb - thin porcelain around FAISS

fvdb is a simple, minimal wrapper around the FAISS vector database.
It uses an L2 index with normalised vectors. It uses the faiss-cpu package, and sentence-transformers for embeddings.
If you need the GPU version of FAISS (very probably not), you can just manually install faiss-gpu and use GPUIndexFlatL2 instead of IndexFlatL2 in fvdb/db.hy.
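A quick sketch of why an L2 index over normalised vectors behaves like cosine similarity (illustration only, using plain numpy rather than fvdb's API): for unit vectors, ||a - b||² = 2 - 2·(a·b), so ranking by ascending L2 distance is the same as ranking by descending cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
a /= np.linalg.norm(a)  # normalise to unit length
b /= np.linalg.norm(b)

l2_sq = np.sum((a - b) ** 2)  # squared L2 distance
cosine = np.dot(a, b)         # cosine similarity of unit vectors

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
assert np.isclose(l2_sq, 2 - 2 * cosine)
```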
Features
- similarity search with score
- choice of sentence-transformer embeddings
- useful formatting of results (json, tabulated...)
- cli access
Any input other than plain text (markdown, asciidoc, rst etc.) is out of scope. You should use one of the many available packages for that (unstructured, trafilatura, docling, etc.).
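For real documents you should use one of the packages above; as a rough stdlib-only illustration of the kind of preprocessing meant here, a crude markdown-to-plain-text pass (not part of fvdb) might look like:

```python
import re

def markdown_to_text(md: str) -> str:
    """Very rough markdown stripper -- illustration only, not robust."""
    text = re.sub(r"```.*?```", "", md, flags=re.DOTALL)  # drop code fences
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"[#*_`>]+", " ", text)                 # strip markup characters
    return re.sub(r"[ \t]+", " ", text).strip()
```

The plain text output could then be passed to fvdb's ingestion.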
Usage

```python
import hy  # fvdb is written in Hy, but you can use it from Python too
from fvdb import faiss, ingest, similar, marginal, sources, info, nuke, write

# data ingestion
v = faiss()
ingest(v, "docs.md")
ingest(v, "docs-dir")
write(v, "/tmp/test.fvdb")  # defaults to $XDG_DATA_HOME/fvdb (~/.local/share/fvdb/ on Linux)

# search
similar(v, "some query text")
marginal(v, "some query text")  # not yet implemented

# information, management
sources(v)
info(v)
nuke(v)
```
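The idea behind similarity search with score can be sketched in pure numpy (illustration only, not fvdb's implementation): embed entries as unit vectors, then return the nearest rows by L2 distance together with their distances.

```python
import numpy as np

def search(index: np.ndarray, query: np.ndarray, k: int = 2):
    """Brute-force nearest neighbours by squared L2 distance
    (conceptually what a flat L2 index does)."""
    dists = np.sum((index - query) ** 2, axis=1)  # squared L2 to every row
    order = np.argsort(dists)[:k]
    return [(int(i), float(dists[i])) for i in order]

rng = np.random.default_rng(42)
vecs = rng.normal(size=(5, 8))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalised, as fvdb's vectors are

hits = search(vecs, vecs[3])
# the query itself is its own nearest neighbour, with distance ~0
```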
These are also available from the command line.

```shell
$ # defaults to $XDG_DATA_HOME/fvdb (~/.local/share/fvdb/ on Linux)

$ # data ingestion (saves on exit)
$ fvdb ingest doc.md
$ fvdb ingest docs-dir

$ # search
$ fvdb similarity "some query text"     # defaults to json output
$ fvdb similarity -t "some query text"  # --table / -t gives tabulated output
$ fvdb marginal "some query text"       # not yet implemented

$ # information, management
$ fvdb sources
$ fvdb info
$ fvdb nuke
```
Configuration

fvdb looks for $XDG_CONFIG_HOME/fvdb/conf.toml, otherwise it uses defaults. Here is an example.

```toml
path = "/tmp/test.fvdb"

# You cannot mix embeddings models in a single fvdb
embeddings.model = "all-mpnet-base-v2"  # conservative default

# some models need extra options
#embeddings.model = "Alibaba-NLP/gte-large-en-v1.5"
#embeddings.trust_remote_code = true
```
Installation

First install pytorch, which is used by sentence-transformers. You must decide whether you want the CPU or CUDA (nvidia GPU) version of pytorch. For just text embeddings with fvdb, CPU is sufficient. Then,

```shell
pip install fvdb
```

and that's it.
File details
Details for the file fvdb-0.0.2.tar.gz.
File metadata
- Download URL: fvdb-0.0.2.tar.gz
- Upload date:
- Size: 47.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7d7aa22fbe9b1c9dde7e450ea9e017c584c2e6158105714aa72e7c147c4fc2a8
MD5 | e32ec3bc6c0275d7258de87e4115dc8b
BLAKE2b-256 | 07bebb09b21c26431c5b41defbcbf858c0fe475cee9112c779f147c26f3a2a9f
File details
Details for the file fvdb-0.0.2-py3-none-any.whl.
File metadata
- Download URL: fvdb-0.0.2-py3-none-any.whl
- Upload date:
- Size: 36.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | af1c93e3860d73b2539ed2b143279761482d446b92cc678faee2d2e879d95a2e
MD5 | ea657e7f330122fe1acc7f129e20b8e2
BLAKE2b-256 | 708162ac3ec15b556ac75f3f99d639b6ba73232adfa316f1e26aea4275ec3aa0