Super simple vector data storage based on vectorlite

VLite Storage

VLite Storage is a Python library that provides vector similarity search on top of SQLite and the vectorlite package. It stores documents together with their vector embeddings and retrieves them with efficient similarity searches. Text embeddings are generated through an Ollama server.

Features

  • Document storage with vector embeddings
  • Efficient similarity search using HNSW algorithm
  • Integration with Ollama for text embeddings
  • Support for document metadata
  • Simple and intuitive API
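To make "similarity search" concrete, here is a minimal brute-force sketch of the result an HNSW index approximates: the k stored vectors closest to a query vector. This is not how vectorlite or VLite Storage is implemented; the function names are hypothetical, and Euclidean distance is assumed purely for illustration.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query, vectors, k):
    """Return the k (index, distance) pairs closest to query, nearest first."""
    scored = ((i, l2_distance(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Four toy 2-dimensional "embeddings" (illustrative values only).
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
nearest = brute_force_search([0.9, 0.1], vectors, k=2)
```

Brute force compares the query against every stored vector, so its cost grows linearly with the collection. An HNSW index trades a small amount of accuracy for much faster lookups on large collections.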

Installation

pip install vlite-storage

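Prerequisites

The Quick Start below uses OllamaEmbedder, so it needs a reachable Ollama server (default http://localhost:11434) with the embedding model available (default bge-m3:latest).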

Quick Start

Here's a simple example of how to use VLite Storage:

from vlite_storage.embedders import OllamaEmbedder
from vlite_storage.storages import Storage

# Initialize embedder and storage
embedder = OllamaEmbedder()
dim = embedder.dimensions()
storage = Storage(db_name="my_database.db", dim=dim, embedding_fn=embedder)

# Add documents
storage.add(
    content="This is a sample document",
    metadata={"source": "example", "category": "sample"}
)

# Search for similar documents
results = storage.search("sample document", k=5)
for doc, distance in results:
    print(f"Content: {doc.content}")
    print(f"Metadata: {doc.metadata}")
    print(f"Distance: {distance}")

# Close the connection
storage.close()

API Reference

Storage Class

The main class for document storage and retrieval.

Storage(db_name: str, dim: int, embedding_fn: Optional[Callable[[str], np.ndarray]] = None)

Methods:

  • add(content: str, metadata: dict): Add a document with content and metadata
  • remove(rowid: int): Remove a document by ID
  • update(rowid: int, content: Optional[str], metadata: Optional[dict]): Update document content and/or metadata
  • get(rowid: int) -> Document: Retrieve a document by ID
  • search(text: str, k: int) -> List[Tuple[Document, float]]: Find k most similar documents
  • close(): Close the database connection
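The method list above describes a small create/read/update/delete interface keyed by row ID. The following sketch mirrors that shape with only the standard library's sqlite3 and json modules, leaving out embeddings entirely. TinyDocStore, its table layout, and the row ID returned by add are assumptions made for illustration, not the actual Storage implementation.

```python
import json
import sqlite3


class TinyDocStore:
    """Hypothetical stand-in: content plus JSON metadata keyed by SQLite rowid."""

    def __init__(self, db_name=":memory:"):
        self.conn = sqlite3.connect(db_name)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs "
            "(content TEXT NOT NULL, metadata TEXT NOT NULL)"
        )

    def add(self, content, metadata):
        # Metadata is serialized to JSON so arbitrary dicts fit in one column.
        cur = self.conn.execute(
            "INSERT INTO docs (content, metadata) VALUES (?, ?)",
            (content, json.dumps(metadata)),
        )
        self.conn.commit()
        return cur.lastrowid

    def get(self, rowid):
        row = self.conn.execute(
            "SELECT content, metadata FROM docs WHERE rowid = ?", (rowid,)
        ).fetchone()
        if row is None:
            return None
        return row[0], json.loads(row[1])

    def update(self, rowid, content=None, metadata=None):
        # Only overwrite the fields that were actually supplied.
        if content is not None:
            self.conn.execute(
                "UPDATE docs SET content = ? WHERE rowid = ?", (content, rowid)
            )
        if metadata is not None:
            self.conn.execute(
                "UPDATE docs SET metadata = ? WHERE rowid = ?",
                (json.dumps(metadata), rowid),
            )
        self.conn.commit()

    def remove(self, rowid):
        self.conn.execute("DELETE FROM docs WHERE rowid = ?", (rowid,))
        self.conn.commit()

    def close(self):
        self.conn.close()


store = TinyDocStore()
doc_id = store.add("This is a sample document", {"source": "example"})
store.update(doc_id, metadata={"source": "example", "category": "sample"})
first_read = store.get(doc_id)   # content unchanged, metadata replaced
store.remove(doc_id)
second_read = store.get(doc_id)  # None once the row is gone
store.close()
```

Making both update arguments optional lets a caller change the metadata without resending the content, which matches the Optional parameters in the update signature above.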

OllamaEmbedder Class

Class for generating text embeddings using Ollama models.

OllamaEmbedder(base_url: str = "http://localhost:11434", model_name: str = "bge-m3:latest")

Methods:

  • dimensions() -> int: Get embedding dimensions
  • __call__(texts: List[str]) -> np.ndarray: Generate embeddings for texts
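The source does not show how OllamaEmbedder talks to the server. As a rough sketch, assuming it goes through Ollama's HTTP /api/embed endpoint, a request for a batch of texts could be built as follows. The helper names build_embed_request and parse_embed_response are hypothetical; the default URL and model name are taken from the constructor signature above.

```python
import json
import urllib.request


def build_embed_request(texts, base_url="http://localhost:11434",
                        model_name="bge-m3:latest"):
    """Build (but do not send) a POST request for Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model_name, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(body):
    """Extract the list of embedding vectors from an /api/embed JSON body."""
    return json.loads(body)["embeddings"]


request = build_embed_request(["sample document"])
# Sending the request needs a running Ollama server, for example:
# with urllib.request.urlopen(request) as response:
#     vectors = parse_embed_response(response.read())
```

The length of each returned vector is the value that dimensions() would need to report, since the storage is created with a fixed dim.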

Example Usage

Check the examples/ directory for more detailed examples, including:

  • Text chunk processing and storage
  • Similarity search in large documents
  • Metadata handling
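The examples/ directory itself is not reproduced here. As an illustration of what text chunk processing might involve, the sketch below splits a long text into overlapping character windows before storage. The chunk_text helper, its parameters, and the character-based splitting are assumptions, not code from the project.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

Each chunk could then be passed to storage.add with metadata such as the chunk index or source file, so that a search hit can be traced back to its position in the original document.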

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

vlite_storage-0.1.0.tar.gz (4.6 kB view details)

Uploaded Source

Built Distribution

vlite_storage-0.1.0-py3-none-any.whl (5.2 kB view details)

Uploaded Python 3

File details

Details for the file vlite_storage-0.1.0.tar.gz.

File metadata

  • Download URL: vlite_storage-0.1.0.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.18.0 CPython/3.10.12 Linux/5.15.0-130-generic

File hashes

Hashes for vlite_storage-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2e489bf2f836eda38c9c2c3a673b1bc5979a45ea1fa7369004861e2040ab8960
MD5 4b3878d537279654759c5f80aa532938
BLAKE2b-256 c10e4d9d3d35c218731c4d57fa9ae1d8d60ee9997d695ebb73c5c182059504d4

File details

Details for the file vlite_storage-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: vlite_storage-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.18.0 CPython/3.10.12 Linux/5.15.0-130-generic

File hashes

Hashes for vlite_storage-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0f86300dc7835a652804b08573c03a457faa6e6958f0f769867493c183b2eeea
MD5 9676a1290cd8d43c1dd71dadd4ac459a
BLAKE2b-256 3d044ca0609f69f69b60883d0b19406e7d283bf56577741889eb56777ba578f9
