Thin porcelain around the FAISS vector database.

fvdb - thin porcelain around FAISS

fvdb is a simple, minimal wrapper around the FAISS vector database. It uses an L2 index with normalised vectors.
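
Why normalisation matters for an L2 index can be shown with a small worked example. For unit-length vectors, squared L2 distance equals 2 - 2·(cosine similarity), so ranking by L2 distance gives the same order as ranking by cosine similarity. The sketch below uses plain Python lists rather than FAISS; the helper names and vectors are illustrative and are not part of fvdb, and it does not show how fvdb derives its reported score.

```python
import math


def normalise(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; for unit vectors this is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


# Illustrative vectors only.
query = normalise([1.0, 2.0, 3.0])
near = normalise([1.0, 2.0, 2.5])
far = normalise([-3.0, 0.5, 1.0])

for name, vec in (("near", near), ("far", far)):
    d2 = squared_l2(query, vec)
    cos = cosine(query, vec)
    # For unit vectors the two columns always agree: d2 == 2 - 2 * cos
    print(f"{name}: squared L2 = {d2:.6f}, 2 - 2*cos = {2 - 2 * cos:.6f}")
```

The vector with the smaller L2 distance is also the one with the larger cosine similarity, which is why a flat L2 index over normalised embeddings behaves like a cosine-similarity search.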

It uses the faiss-cpu package and sentence-transformers for embeddings. If you need the GPU version of FAISS (very probably not), you can manually install faiss-gpu and use GpuIndexFlatL2 instead of IndexFlatL2 in fvdb/db.hy. You can still run the text embedding model on a GPU while using faiss-cpu.

If summaries are enabled (not the default; see the Configuration section below), a summary of each extract is stored alongside it.

It matches well with trag.

This project has no relationship with NVIDIA's deep learning project fVDB, with which it unfortunately shares a name.

Features

similarity search with score
choice of sentence-transformer embeddings
useful formatting of results (json, tabulated...)
cli access
extract summaries

Only plain-text input (markdown, asciidoc, rst, source code, etc.) is in scope. For anything else, use one of the many available packages (unstructured, trafilatura, docling, etc.) to convert to plain text in a separate step.
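
As a rough illustration of such a separate conversion step, the sketch below strips an HTML document down to plain text using only the Python standard library. It is a stand-in for the dedicated packages named above, not a replacement for them, and the names TextExtractor and html_to_text are made up for this example. The resulting text would be saved to a file and then passed to ingest.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
print(html_to_text(page))
```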

Usage

import hy # fvdb is written in Hy, but you can use it from python too
from fvdb import faiss, info, ingest, nuke, similar, sources, write

# data ingestion
v = faiss()
ingest(v, "doc.md")
ingest(v, "docs-dir")
write(v, "/tmp/test.fvdb") # defaults to $XDG_DATA_HOME/fvdb (~/.local/share/fvdb/ on Linux)

# search
results = similar(v, "some query text")
# results = marginal(v, "some query text") # not yet implemented

# information, management
sources(v)
    { ...
      'docs-dir/Once More to the Lake.txt',
      'docs-dir/Politics and the English Language.txt',
      'docs-dir/Reflections on Gandhi.txt',
      'docs-dir/Shooting an elephant.txt',
      'docs-dir/The death of the moth.txt',
      ... }

info(v)
    {   'records': 42,
        'embeddings': 42,
        'embedding_dimension': 1024,
        'is_trained': True,
        'path': '/tmp/test-vdb',
        'sources': 24,
        'embedding_model': 'Alibaba-NLP/gte-large-en-v1.5'}

nuke(v)

These are also available from the command line.

$ # defaults to $XDG_DATA_HOME/fvdb (~/.local/share/fvdb/ on Linux)
$ # data ingestion (saves on exit)
$ fvdb ingest doc.md
    Adding 2 records

$ fvdb ingest docs-dir
    Adding 42 records

$ # search
$ fvdb similar -j "some query text" > results.json   # --json / -j gives json output

$ fvdb similar -r 2 "George Orwell's formative experience as a policeman in colonial Burma"
    # defaults to tabulated output (not all fields will be shown)
       score  source                             added                               page    length
    --------  ---------------------------------- --------------------------------  ------  --------
    0.579925  docs-dir/A hanging.txt             2024-11-05T11:37:26.232773+00:00       0      2582
    0.526988  docs-dir/Shooting an elephant.txt  2024-11-05T11:37:43.891659+00:00       0      3889

$ fvdb marginal "some query text"                       # not yet implemented

$ # information, management
$ fvdb sources
    ...
    docs-dir/Once More to the Lake.txt
    docs-dir/Politics and the English Language.txt
    docs-dir/Reflections on Gandhi.txt
    docs-dir/Shooting an elephant.txt
    docs-dir/The death of the moth.txt
    ...

$ fvdb info
    -------------------  -----------------------------
    records              44
    embeddings           44
    embedding_dimension  1024
    is_trained           True
    path                 /tmp/test
    sources              24
    embedding_model      Alibaba-NLP/gte-large-en-v1.5
    -------------------  -----------------------------

$ fvdb nuke
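
The JSON written by `fvdb similar -j` can be post-processed from Python. The sketch below assumes the output is a JSON list of objects carrying at least the `score` and `source` fields seen in the tabulated output above; that structure is an assumption, so inspect your own results.json before relying on it. The helper names are illustrative and the sample data simply repeats values from the table.

```python
import json


def load_results(path):
    """Read a results file produced by `fvdb similar -j ... > results.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def format_hits(results):
    """One line per hit: the score to three decimals, then the source path."""
    return [f'{r["score"]:.3f}  {r["source"]}' for r in results]


# Sample with an ASSUMED structure (a list of objects with these keys).
sample_json = """
[
  {"score": 0.579925, "source": "docs-dir/A hanging.txt", "page": 0, "length": 2582},
  {"score": 0.526988, "source": "docs-dir/Shooting an elephant.txt", "page": 0, "length": 3889}
]
"""

for line in format_hits(json.loads(sample_json)):
    print(line)
```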

Configuration

fvdb looks for $XDG_CONFIG_HOME/fvdb/conf.toml. If that file is not found, defaults are used.

You cannot mix embedding models in a single fvdb.

Here is an example.

# Sets the default database path to something other than $XDG_DATA_HOME/fvdb
path = "/tmp/test.fvdb"

# Summaries are useful if you use an embedding model with large maximum sequence length,
# for example, gte-large-en-v1.5 has maximum sequence length of 8192.
summary = true

# A conservative default model, maximum sequence length of 512,
# so no point using summaries.
embeddings.model = "all-mpnet-base-v2"

## Some models need extra options
#embeddings.model = "Alibaba-NLP/gte-large-en-v1.5"
#embeddings.trust_remote_code = true
## You can put some smaller models on a cpu, but larger models will be slow
#embeddings.device = "cpu"

Installation

First install PyTorch, which is used by sentence-transformers. You must decide whether you want the CPU or CUDA (NVIDIA GPU) version of PyTorch. With the default model, the CPU version is sufficient for fvdb's text embeddings.

Then,

pip install fvdb

and that's it.

Planned

transition to sqlite from the pickled dict
optional progress bars for long jobs

Release history

0.1.11 (this version): Nov 18, 2025
0.1.10: Nov 18, 2025
0.1.9: Nov 17, 2025
0.1.8: Nov 2, 2025
0.1.7: Aug 15, 2025
0.1.6: Nov 27, 2024
0.1.5: Nov 16, 2024
0.1.4: Nov 8, 2024
0.1.3: Nov 5, 2024
0.1.2: Nov 5, 2024
0.1.1: Nov 5, 2024
0.0.2: Nov 4, 2024
