
fvdb - thin porcelain around FAISS

fvdb is a simple, minimal wrapper around the FAISS vector database. It uses an L2 index with normalised vectors.

It uses the faiss-cpu package, and sentence-transformers for embeddings. If you need the GPU version of FAISS (you very probably don't), you can manually install faiss-gpu and use GpuIndexFlatL2 instead of IndexFlatL2 in fvdb/db.hy.
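As an aside, L2 search over normalised vectors gives the same ranking as cosine similarity: for unit vectors a and b, the squared L2 distance is 2 − 2·cos(a, b). A quick numpy sketch of that identity (illustration only, not fvdb code):

```python
import numpy as np

# Two random vectors, normalised to unit length.
rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b),
# so smaller L2 distance means higher cosine similarity.
l2_sq = np.sum((a - b) ** 2)
cosine = np.dot(a, b)
assert np.isclose(l2_sq, 2 - 2 * cosine)
```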

Features

  • similarity search with score
  • choice of sentence-transformer embeddings
  • useful formatting of results (JSON, tabulated, ...)
  • CLI access

Any input other than plain text (markdown, asciidoc, rst etc.) is out of scope. You should use one of the many available packages for that (unstructured, trafilatura, docling, etc.).

Usage

import hy # fvdb is written in Hy, but you can use it from python too
from fvdb import faiss, info, ingest, nuke, similar, sources, write

# data ingestion
v = faiss()
ingest(v, "docs.md")
ingest(v, "docs-dir")
write(v, "/tmp/test.fvdb") # defaults to $XDG_DATA_HOME/fvdb (~/.local/share/fvdb/ on Linux)

# search
similar(v, "some query text")
# marginal(v, "some query text") # marginal-relevance search, not yet implemented

# information, management
sources(v)
info(v)
nuke(v)

These are also available from the command line.

$ # defaults to $XDG_DATA_HOME/fvdb (~/.local/share/fvdb/ on Linux)
$ # data ingestion (saves on exit)
$ fvdb ingest doc.md
$ fvdb ingest docs-dir

$ # search
$ fvdb similarity "some query text"    # defaults to JSON output
$ fvdb similarity -t "some query text" # --table / -t gives tabulated output
$ fvdb marginal "some query text"      # not yet implemented

$ # information, management
$ fvdb sources
$ fvdb info
$ fvdb nuke

Configuration

fvdb looks for $XDG_CONFIG_HOME/fvdb/conf.toml, and otherwise uses defaults.

Here is an example.

path = "/tmp/test.fvdb"

# You cannot mix embedding models in a single fvdb
embeddings.model = "all-mpnet-base-v2" # conservative default

# some models need extra options
#embeddings.model = "Alibaba-NLP/gte-large-en-v1.5"
#embeddings.trust_remote_code = true

Installation

First install PyTorch, which is used by sentence-transformers. You must decide whether you want the CPU or CUDA (NVIDIA GPU) build of PyTorch. For just text embeddings with fvdb, the CPU build is sufficient.
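If you want the CPU-only build (and to avoid pulling in large CUDA wheels), PyTorch publishes a dedicated wheel index for it — this is the command given by the selector on pytorch.org:

```shell
# CPU-only PyTorch wheels, per the pytorch.org install selector
pip install torch --index-url https://download.pytorch.org/whl/cpu
```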

Then,

pip install fvdb

and that's it.
