A tiny and fast local vector database in Python.
picovdb
An extremely fast, ultra-lightweight local vector database in Python.
"extremely fast": sub-millisecond queries when FAISS is used (see Benchmark)
"ultra-lightweight": a single file that depends only on NumPy, plus one optional dependency, faiss-cpu.
Install
pip install picovdb
Usage
Create a db:
(Using SentenceTransformer embeddings as an example)
from sentence_transformers import SentenceTransformer
from picovdb import PicoVectorDB

CHUNK_SIZE = 256

# Load the embedding model and read its output dimension
model = SentenceTransformer('all-MiniLM-L6-v2')
dim = model.get_sentence_embedding_dimension()

# Split the text into fixed-size character chunks
with open('A_Christmas_Carol.txt', encoding='UTF8') as f:
    content = f.read()
num_chunks = len(content) // CHUNK_SIZE + 1
chunks = [content[i * CHUNK_SIZE: (i + 1) * CHUNK_SIZE] for i in range(num_chunks)]

# Embed every chunk and build one record per chunk
embeddings = model.encode(chunks)
data = [
    {
        "_vector_": embeddings[i],
        "_id_": i,
        "content": chunks[i],
    }
    for i in range(num_chunks)
]

# Insert the records and persist the db to disk
db = PicoVectorDB(embedding_dim=dim, storage_file='_acc')
db.upsert(data)
db.save()
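The chunking step above can be factored into a small helper. This is only a sketch: `chunk_text` is a hypothetical name, not part of picovdb. It steps through the text by `size`, which also avoids the empty trailing chunk that `len(content) // CHUNK_SIZE + 1` produces when the text length is an exact multiple of the chunk size.

```python
def chunk_text(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters.

    Hypothetical helper for illustration; not part of picovdb.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # Stepping by `size` never yields an empty final chunk.
    return [text[i:i + size] for i in range(0, len(text), size)]
```

With this helper, `chunks = chunk_text(content, CHUNK_SIZE)` and `num_chunks = len(chunks)` would replace the two corresponding lines in the example.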
Query:
db = PicoVectorDB(embedding_dim=dim, storage_file='_acc')
txt = "Are there no prisons? Are there no workhouses?"
emb = model.encode(txt)
q = db.query(emb, top_k=3)
print('query results:', q)
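To give a feel for what a top-k vector query involves, here is a minimal brute-force cosine-similarity search in NumPy. It is an illustrative sketch only: `top_k_cosine` is a hypothetical function, and nothing here is taken from picovdb's source, whose similarity metric and internals may differ.

```python
import numpy as np


def top_k_cosine(matrix: np.ndarray, query: np.ndarray, k: int = 3):
    """Return (row_index, score) pairs for the k rows most similar to query.

    Illustrative only; not picovdb's implementation.
    """
    k = min(k, matrix.shape[0])
    if k <= 0:
        return []
    # Normalise rows and query so a dot product equals cosine similarity.
    row_norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit_rows = matrix / np.clip(row_norms, 1e-12, None)
    unit_query = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = unit_rows @ unit_query
    # argpartition selects the k best without a full sort; sort only those k.
    best = np.argpartition(-scores, k - 1)[:k]
    best = best[np.argsort(-scores[best])]
    return [(int(i), float(scores[i])) for i in best]
```

For example, `top_k_cosine(np.asarray(embeddings), emb, k=3)` would rank the chunk embeddings from the earlier snippet against the query embedding.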
Benchmark
Embedding Dim: 1024.
Hardware: M3 MacBook Air
- Pure Python:
  - Inserting 100,000 vectors took about 0.5s.
  - Doing 100 queries from 100,000 vectors took roughly 0.6s (0.006s per query).
- With FAISS (cpu):
  - Inserting 100,000 vectors took 110s.
  - Doing 100 queries from 100,000 vectors took 0.05s (0.0005s, or 0.5 milliseconds, per query).
  - Doing 1000 queries from 100,000 vectors in batch mode took 0.2s (0.0002s, or 0.2 milliseconds, per query).
Hardware: PC with CPU Core i7-12700k and old-gen M2 Nvme SSD
- Pure Python:
  - Inserting 100,000 vectors took about 0.7s.
  - Doing 100 queries from 100,000 vectors took roughly 1.3s (0.013s per query).
- With FAISS (cpu):
  - Inserting 100,000 vectors took 50s.
  - Doing 100 queries from 100,000 vectors took 0.05s (0.0005s, or 0.5 milliseconds, per query).
  - Doing 1000 queries from 100,000 vectors in batch mode took 0.3s (0.0003s, or 0.3 milliseconds, per query).
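The per-query figures above are the total time divided by the number of queries (for example, 0.6s / 100 = 0.006s). A small timing harness along these lines could be used to take similar measurements. It is a sketch under assumptions: `time_calls` is a hypothetical helper, and it is not the script that produced the numbers above.

```python
import time


def time_calls(fn, inputs):
    """Call fn once per input; return (total_seconds, seconds_per_call).

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    for item in inputs:
        fn(item)
    total = time.perf_counter() - start
    # Avoid dividing by zero when there is nothing to time.
    per_call = total / len(inputs) if inputs else 0.0
    return total, per_call
```

For instance, `time_calls(lambda e: db.query(e, top_k=3), query_embeddings)` would time one `db.query` call per embedding, where `query_embeddings` is an assumed list of precomputed query vectors.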