VecStream

A lightweight, efficient vector database with similarity search capabilities, designed for machine learning and AI applications.

Features

  • Fast similarity search using optimized indexing
  • Efficient binary storage format for vectors and metadata
  • Automatic text embedding with sentence-transformers
  • Command-line interface with colored output, built on Rich
  • Cross-platform support (Windows, macOS, Linux)
  • Customizable storage locations
  • Metadata support for enhanced document management
  • Built-in text similarity search

Installation

pip install vecstream

Quick Start

Using the CLI

# Add a document
vecstream add "Machine learning is transforming technology" doc1

# Search for similar documents
vecstream search "AI and machine learning" --k 3

# Get document by ID
vecstream get doc1

# View database information
vecstream info

# Use custom storage location
vecstream add "Custom storage test" doc2 --db-path "./my_vectors"

# Remove a document
vecstream remove doc1

Using the Python API

from vecstream.binary_store import BinaryVectorStore

# Create a binary vector store
store = BinaryVectorStore("./vector_db")

# Add vectors with metadata
store.add_vector(
    id="doc1",
    vector=[1.0, 0.0, 0.0],
    metadata={"text": "Example document", "tags": ["test"]}
)

# Search similar vectors
results = store.search_similar([1.0, 0.0, 0.0], k=5)

# Get vector with metadata
vector, metadata = store.get_vector_with_metadata("doc1")
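
The README does not state which similarity measure search_similar uses or what shape its results take. As an illustration only, here is a minimal sketch of brute-force cosine-similarity ranking with NumPy. The function name top_k_cosine and the (id, score) result format are assumptions made for this example, not part of the VecStream API.

```python
import numpy as np


def top_k_cosine(query, vectors, ids, k=5):
    """Return up to k (id, score) pairs ranked by cosine similarity to query.

    Hypothetical helper for illustration; not VecStream's implementation.
    """
    q = np.asarray(query, dtype=np.float64)
    m = np.asarray(vectors, dtype=np.float64)
    # Cosine similarity is the dot product divided by the product of norms.
    # The small floor on the denominator avoids division by zero.
    denom = np.linalg.norm(m, axis=1) * np.linalg.norm(q)
    scores = (m @ q) / np.maximum(denom, 1e-12)
    # Sorting the negated scores yields indices in descending score order.
    order = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in order]


ids = ["doc1", "doc2", "doc3"]
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
ranked = top_k_cosine([1.0, 0.0, 0.0], vectors, ids, k=2)
```

Here doc1 ranks first because it points in the same direction as the query, and doc3 ranks second because it shares one of its two non-zero components. A brute-force scan like this is linear in the number of stored vectors; an index exists to avoid exactly that cost on large collections.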

Storage Locations

By default, VecStream stores its data in:

  • Windows: %APPDATA%/VecStream/store/
  • macOS/Linux: ~/.vecstream/store/

To use a custom storage location, pass --db-path to a CLI command or pass the path as the first argument to BinaryVectorStore.
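
The lookup described above can be sketched as follows. This is a hedged illustration of the documented defaults, not code taken from VecStream: the helper names default_store_dir and resolve_store_dir are hypothetical, and the fallback used when APPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def default_store_dir(platform=None, env=None, home=None):
    """Hypothetical helper mirroring the documented default locations."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # Assumption: fall back to the conventional Roaming directory
        # when the APPDATA environment variable is not set.
        appdata = env.get("APPDATA") or str(home / "AppData" / "Roaming")
        return Path(appdata) / "VecStream" / "store"
    # macOS and Linux share the same dot-directory under the home folder.
    return home / ".vecstream" / "store"


def resolve_store_dir(db_path=None):
    """Prefer an explicit path, as --db-path does; otherwise use the default."""
    return Path(db_path) if db_path else default_store_dir()
```

The platform, env, and home parameters exist only so the function can be exercised without touching the real environment; calling default_store_dir() with no arguments reads the current platform and home directory.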

Storage Format

VecStream stores vectors in a binary format and keeps their metadata alongside them:

  • Vectors: NumPy .npy format for fast access
  • Metadata: JSON format for flexibility
  • Automatic compression and optimization
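
To make the split between .npy vectors and JSON metadata concrete, here is a minimal round-trip sketch using NumPy and the standard json module. The file names vectors.npy and metadata.json, the JSON layout, and the save_store and load_store helpers are all assumptions for illustration; the README does not document VecStream's actual on-disk layout, and this sketch applies no compression.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_store(directory, ids, vectors, metadata):
    """Write vectors as one .npy array and ids plus metadata as JSON."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    np.save(directory / "vectors.npy", np.asarray(vectors, dtype=np.float32))
    with open(directory / "metadata.json", "w", encoding="utf-8") as fh:
        # Row i of the array belongs to ids[i], so the id order is stored too.
        json.dump({"ids": list(ids), "metadata": metadata}, fh)


def load_store(directory):
    """Read back what save_store wrote."""
    directory = Path(directory)
    vectors = np.load(directory / "vectors.npy")
    with open(directory / "metadata.json", encoding="utf-8") as fh:
        payload = json.load(fh)
    return payload["ids"], vectors, payload["metadata"]


with tempfile.TemporaryDirectory() as tmp:
    save_store(
        tmp,
        ["doc1"],
        [[1.0, 0.0, 0.0]],
        {"doc1": {"text": "Example document", "tags": ["test"]}},
    )
    ids, vectors, metadata = load_store(tmp)
```

Keeping the vectors in a single array makes a full load one call to np.load, while JSON keeps the metadata readable and editable with ordinary tools.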

CLI Features

The command-line interface provides:

  • Beautiful, colored output using Rich
  • Progress indicators for long operations
  • Detailed database information
  • Similarity scores in search results
  • Customizable search parameters
  • Error handling and user feedback

Python API

The Python API offers:

  • Direct access to vector operations
  • Metadata management
  • Custom storage locations
  • Efficient binary serialization
  • Rich search capabilities
  • Error handling and type safety

Requirements

  • Python 3.8 or higher
  • NumPy
  • SciPy
  • sentence-transformers
  • Rich (for CLI)
  • Click (for CLI)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Version History

  • 0.1.1 (2024-03-XX)

    • Fixed index initialization in IndexManager
    • Added specific version requirements for torch and torchvision
    • Improved dependency compatibility
    • Fixed CLI import issues

  • 0.1.0 (2024-03-XX)

    • Initial release
    • Basic vector storage and search functionality
    • Command-line interface
    • Client-server architecture
