Stupid Vector Store (SVS): a vector database for the rest of us

Stupid Vector Store (SVS)

🤔 What is SVS?
- Semantic search via deep-learning vector embeddings.
- A stupid-simple library for storing and retrieving your documents.
- Currently supports OpenAI and Ollama.
💩 Why is it stupid?
- Because it just uses SQLite and NumPy. Nothing fancy.
- That is our core design choice. We want something stupid simple, yet reasonably fast.
🧠 Is it possibly... smart in any way though?
- Maybe.
- It will squeeze the most juice from your machine: 🍊
  - Optimized SQL
  - Cache-friendly memory access
  - Fast in the places that matter 🚀
  - All with a simple Python interface
- Supports storing arbitrary metadata with each document. 🗃️
- Supports storing and querying (optional) parent-child relationships between documents. 👪
  - Fully hierarchical - parents can have parents, children can have children, whatever you need...
- Supports storing an (optional) graph structure over your documents.
  - So you can do GraphRAG!
  - Batteries not included:
    - This library only handles graph storage.
    - You have to implement your own graph algorithms.
- Supports generic key/value storage, for those random things you don't know where else to put. 🤷
- Both sync and asyncio implementations:
  - use the synchronous impl (svs.KB) for scripts, notebooks, etc
  - use the asyncio impl (svs.AsyncKB) for web-services, etc
- 100% Python type hints!
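
To make the parent-child and graph bullets concrete, here is a minimal sketch of how such relationships can be kept in plain SQLite. The table and column names (`docs`, `edges`, `parent_id`) and both helper functions are hypothetical illustrations, not SVS's actual schema or API. The breadth-first search stands in for the kind of graph algorithm you would write yourself.

```python
import sqlite3
from collections import deque

# Hypothetical schema for illustration only; this is NOT SVS's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES docs(id),  -- NULL for top-level documents
        text      TEXT NOT NULL
    );
    CREATE TABLE edges (
        src INTEGER NOT NULL REFERENCES docs(id),
        dst INTEGER NOT NULL REFERENCES docs(id)
    );
""")
conn.executemany(
    "INSERT INTO docs (id, parent_id, text) VALUES (?, ?, ?)",
    [(1, None, "book"), (2, 1, "chapter"), (3, 2, "paragraph"), (4, None, "note")],
)
conn.executemany("INSERT INTO edges (src, dst) VALUES (?, ?)", [(1, 4), (4, 3)])


def descendants(conn, doc_id):
    """Return ids of all children, grandchildren, ... via a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id) AS (
            SELECT id FROM docs WHERE parent_id = ?
            UNION ALL
            SELECT docs.id FROM docs JOIN tree ON docs.parent_id = tree.id
        )
        SELECT id FROM tree ORDER BY id
        """,
        (doc_id,),
    ).fetchall()
    return [row[0] for row in rows]


def reachable(conn, start):
    """Breadth-first search over the stored edges, returning ids in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        query = "SELECT dst FROM edges WHERE src = ? ORDER BY dst"
        for (nxt,) in conn.execute(query, (node,)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


print(descendants(conn, 1))
print(reachable(conn, 1))
```

The hierarchy query stays inside SQLite, while the graph traversal runs in Python on top of a simple edge lookup, which mirrors the "storage only" split described above.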

Overview

SVS is stupid yet can handle a million documents on commodity hardware, so it's probably perfect for you.

Should you use SVS? SVS is designed for the use case where:

- you have fewer than a million documents, and
- you don't add/remove documents very often.

If that's you, then SVS will probably be the simplest (and stupidest) way to manage your document vectors!
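
A quick back-of-the-envelope calculation shows why a million documents is a plausible ceiling for commodity hardware. The sketch assumes 1,536-dimensional embeddings (the dimensionality quoted in the benchmark footnotes) stored as 4-byte floats. The float width is an assumption, not a documented detail of SVS, and the function name is made up for this example.

```python
def embedding_matrix_bytes(n_docs: int, dims: int = 1536, bytes_per_value: int = 4) -> int:
    """Size of a dense n_docs x dims matrix; 4 bytes per value assumes float32."""
    return n_docs * dims * bytes_per_value


for n_docs in (10_548, 1_000_000):
    gib = embedding_matrix_bytes(n_docs) / 2**30
    print(f"{n_docs:>9,} docs -> {gib:.2f} GiB")
```

Under those assumptions a million vectors need about 5.7 GiB of RAM, which fits on a typical desktop but leaves little headroom for much larger collections.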

Installation
Used By
Quickstart
Speed & Benchmarks
Debug Logging
License

Installation

pip install -U svs

Used By

SVS is used in production by:

Quickstart

Here is the simplest use case: querying a pre-built knowledge base. This particular example queries a knowledge base of "Dad Jokes" 🤩.

(taken from ./examples/quickstart.py)

import svs   # <-- pip install -U svs

import os
from dotenv import load_dotenv; load_dotenv()
assert os.environ.get('OPENAI_API_KEY'), "You must set your OPENAI_API_KEY environment variable!"

#
# The database remembers which embeddings provider (e.g. OpenAI) was used.
#
# The "Dad Jokes" database below uses OpenAI embeddings, so that's why you had
# to set your OPENAI_API_KEY above!
#
# NOTE: The first time you run this script it will download this database,
#       so expect that to take a few seconds...
#
DB_URL = 'https://github.com/Rhobota/svs/raw/main/examples/dad_jokes/dad_jokes.sqlite.gz'


def demo() -> None:
    kb = svs.KB(DB_URL)

    records = kb.retrieve('chicken', n = 10)

    for record in records:
        score = record['score']
        text = record['doc']['text']
        print(f" 😆 score={score:.4f}: {text}\n")

    kb.close()


if __name__ == '__main__':
    demo()

⚠️ Want to see how that Dad Jokes knowledge base was created? See: ./examples/dad_jokes/Build Dad Jokes KB.ipynb
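
If an exception is raised between opening and closing the knowledge base, the `kb.close()` call in the quickstart is never reached. One hedged variation is to wrap the object in `contextlib.closing` from the standard library, which relies only on the `close()` method shown above. The helper name `top_matches` is hypothetical, and this sketch does not assume that `svs.KB` offers context-manager support of its own.

```python
from contextlib import closing


def top_matches(kb, query: str, n: int = 10) -> list[str]:
    """Format the top-n results using only the record fields shown in the quickstart."""
    records = kb.retrieve(query, n=n)
    return [f"score={r['score']:.4f}: {r['doc']['text']}" for r in records]


def demo(db_url: str) -> None:
    import svs  # imported lazily so top_matches() can be used without svs installed

    # closing() calls kb.close() on exit, even if retrieve() raises.
    with closing(svs.KB(db_url)) as kb:
        for line in top_matches(kb, "chicken"):
            print(line)
```

Because `top_matches` only needs an object with a `retrieve` method, it can also be exercised in tests with a small stand-in object instead of a real knowledge base.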

Speed & Benchmarks

SQLite and NumPy are fast, thus SVS is fast 🏎️. Our goal is to minimize the amount of work done in the Python layer.
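
As an illustration of keeping the heavy lifting out of the Python layer, the sketch below scores a query against every document vector with a single matrix-vector product and picks the best results with `np.argpartition`. It is a generic NumPy pattern written for this README, not SVS's internal code. The function name `top_k_cosine` and the toy data are made up, and the sketch assumes no vector has zero length.

```python
import numpy as np


def top_k_cosine(doc_vectors: np.ndarray, query: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Return (row_index, cosine_similarity) for the k most similar rows, best first."""
    if k <= 0 or len(doc_vectors) == 0:
        return []

    # Normalize rows and the query so a dot product equals cosine similarity.
    docs = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = docs @ q  # one vectorized pass over all documents

    k = min(k, len(scores))
    # argpartition selects the k largest scores without sorting everything;
    # only those k entries are then fully sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return [(int(i), float(scores[i])) for i in top]


docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
print(top_k_cosine(docs, np.array([1.0, 0.0]), k=2))
```

For the toy data, rows 0 and 2 come back, in that order. No Python-level loop touches individual documents, which is the property that matters at a million rows.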

Also, your bottleneck will almost certainly be the remote API calls that fetch document embeddings (e.g. calling out to OpenAI's embeddings API will be the slowest step), so further optimizing the Python-layer bits is unlikely to matter much.

The following benchmarks were performed on 2018-era commodity hardware (Intel i3-8100):

Number of Documents	Load into SQLite	Get Embeddings for All Documents (remote API call)	Cosine Similarity + Sort + Retrieve Top-100 Documents [^3]
10,548 jokes [^1]	0.07 seconds	80 seconds	0.5 seconds (first query) + 0.011 seconds (subsequent queries)
1,000,000 synthetic documents [^2]	8 seconds	2 hours [^4]	2 minutes (first query) + 0.24 seconds (subsequent queries)

[^1]: This benchmark is from the Dad Jokes KB from this notebook.

[^2]: This benchmark is over one million synthetic documents, where those documents have an average length of 1,200 characters. Specifically, this notebook.

[^3]: This time does not include the time it takes to obtain the query string's embedding from the external service (i.e. from OpenAI's API); rather, it captures the time it takes to (1) compute the cosine similarity of the query string with all the documents' vectors (where embedding dimensionality is 1,536), then (2) sort the results, and then (3) retrieve the top-100 documents from the database. Note: The first query is slow because it must load the vectors from disk into RAM, while subsequent queries are fast since those vectors stay cached in RAM.

[^4]: This is an estimate based on the observed typical response times from OpenAI's embeddings API. For this test, we generate synthetic embeddings with dimensionality 1,536 to simulate the correct datasize and computation requirements as if we used "real" embeddings.
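
The table's own numbers make the bottleneck claim easy to check. The short calculation below compares the time spent fetching embeddings with everything done locally (loading into SQLite plus the slow first query), using only the figures reported above. The 2-hour value is the estimate from footnote 4, taken here as 7,200 seconds, and the function name is made up for this example.

```python
# Benchmark figures copied from the table above, all in seconds.
benchmarks = {
    "10,548 jokes": {"load": 0.07, "embed": 80.0, "first_query": 0.5},
    "1,000,000 synthetic documents": {
        "load": 8.0,
        "embed": 2 * 60 * 60,    # "2 hours" (an estimate, see footnote 4)
        "first_query": 2 * 60,   # "2 minutes"
    },
}


def embedding_share(row: dict) -> float:
    """Fraction of total wall-clock time spent waiting on the embeddings API."""
    total = row["load"] + row["embed"] + row["first_query"]
    return row["embed"] / total


for name, row in benchmarks.items():
    print(f"{name}: {embedding_share(row):.1%} of the time is spent fetching embeddings")
```

In both rows roughly 98 to 99 percent of the end-to-end time goes to the remote API, so local SQLite and NumPy work is a small slice even when the first query has to load every vector from disk.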

Debug Logging

This library logs using Python's built-in logging module. It logs mostly at the INFO level, so here's a snippet of code you can put in your app to see those traces:

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
)

# ... now use SVS as you normally would, but you'll see extra log traces!
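
If INFO output from every library is too noisy, a common variation is to keep the root logger at WARNING and raise verbosity for one named logger only. The sketch assumes SVS logs under a logger named `svs` (the usual result when a package calls `logging.getLogger(__name__)`). That name is an assumption and has not been checked against the library's source.

```python
import logging

logging.basicConfig(
    level=logging.WARNING,  # keep other libraries quiet by default
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
)

# ASSUMPTION: the logger name 'svs' is a guess based on common practice.
svs_logger = logging.getLogger('svs')
svs_logger.setLevel(logging.INFO)
```

Child loggers such as a hypothetical `svs.kb` inherit the INFO level, so this one setting covers the whole package if the naming assumption holds.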

License

svs is distributed under the terms of the MIT license.

Release history

Version	Release date
0.8.0	Jul 18, 2025
0.7.4	Jun 8, 2025
0.7.3	Feb 21, 2025
0.7.2	Feb 14, 2025
0.7.1 (this version)	Feb 10, 2025
0.7.0	Jan 27, 2025
0.6.2	Jan 22, 2025
0.6.1	Nov 9, 2024
0.6.0	Oct 16, 2024
0.5.0	Oct 14, 2024
0.4.0	Jul 28, 2024
0.3.2	Jul 18, 2024
0.3.1	Jul 17, 2024
0.3.0	Jul 16, 2024
0.2.0	Jul 13, 2024
0.1.0	Jun 30, 2024

Download files

Source Distribution

svs-0.7.1.tar.gz (24.5 MB)

Uploaded Feb 10, 2025 Source

Built Distribution

svs-0.7.1-py3-none-any.whl (23.7 kB)

Uploaded Feb 10, 2025 Python 3

File details

Details for the file svs-0.7.1.tar.gz.

File metadata

Download URL: svs-0.7.1.tar.gz
Upload date: Feb 10, 2025
Size: 24.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.28.1

File hashes

Hashes for svs-0.7.1.tar.gz
Algorithm	Hash digest
SHA256	`23f551e2cd8f5c4c852adbddacf08e32281436a02d7c5b0792680979b766a323`
MD5	`d0e663db26cded50330a445d251f7a12`
BLAKE2b-256	`b844b58fe64d04b111e6c4a0d0492522a7e5333a540df59f35c4db2c309ab1f6`

File details

Details for the file svs-0.7.1-py3-none-any.whl.

File metadata

Download URL: svs-0.7.1-py3-none-any.whl
Upload date: Feb 10, 2025
Size: 23.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.28.1

File hashes

Hashes for svs-0.7.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c26cd658eef6d125ae3b0bc01986423c14d2cef07765c793d5e5419b2fc49135`
MD5	`fd2cea1016da2f4848914e34894ec214`
BLAKE2b-256	`07c1c519f325fe44a775e002026e857bde91dad81a38cf0801d5d357a59b8c9c`
