VecStream

A lightweight, efficient vector database with similarity search capabilities, designed for machine learning and AI applications.

Features

Fast similarity search using optimized indexing
HNSW (Hierarchical Navigable Small World) indexing for faster approximate nearest-neighbor search
Vector collections/namespaces for organizing different types of embeddings
Metadata filtering for fine-grained search control
Efficient binary storage format for vectors and metadata
Automatic text embedding with sentence-transformers
Rich command-line interface with beautiful output
Cross-platform support (Windows, macOS, Linux)
Customizable storage locations
Metadata support for enhanced document management
Built-in text similarity search

Installation

pip install vecstream

Quick Start

Using the CLI

# Add a document
vecstream add "Machine learning is transforming technology" doc1

# Search for similar documents
vecstream search "AI and machine learning" --k 3

# Search with metadata filtering
vecstream search "cloud computing" --filter '{"category": "ai", "year": 2023}'

# Get document by ID
vecstream get doc1

# View database information
vecstream info

# Create and use a collection
vecstream create_collection research
vecstream add "Neural networks research" doc2 --collection research

# Use custom storage location
vecstream add "Custom storage test" doc3 --db-path "./my_vectors"

# Remove a document
vecstream remove doc1

Using the Python API

from vecstream.collections import CollectionManager
from vecstream.binary_store import BinaryVectorStore

# Using collections for different vector types
manager = CollectionManager("./vector_db")
research_collection = manager.create_collection("research")
products_collection = manager.create_collection("products")

# Add vectors with metadata to collections
research_collection.add_vector(
    id="paper1",
    vector=[1.0, 0.0, 0.0],
    metadata={"topic": "AI", "year": 2023, "author": "Smith"}
)

# Search with metadata filtering
results = research_collection.search_similar(
    query=[1.0, 0.0, 0.0],
    k=5,
    filter_metadata={"year": 2023, "topic": "AI"}
)

# Basic binary store usage (compatible with earlier versions)
store = BinaryVectorStore("./vector_db")

# Add vectors with metadata
store.add_vector(
    id="doc1",
    vector=[1.0, 0.0, 0.0],
    metadata={"text": "Example document", "tags": ["test"]}
)

# Search similar vectors
results = store.search_similar([1.0, 0.0, 0.0], k=5)

# Get vector with metadata
vector, metadata = store.get_vector_with_metadata("doc1")
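
The search calls above return the k stored vectors most similar to a query. As a rough illustration of that operation, and not VecStream's actual implementation, here is a brute-force sketch using NumPy. It assumes cosine similarity as the metric (this README does not state which metric VecStream uses); `top_k_cosine` and the sample data are hypothetical names introduced for this example.

```python
import numpy as np

def top_k_cosine(query, vectors, k=5):
    """Return (id, score) pairs for the k stored vectors most similar to query.

    Hypothetical helper: cosine similarity is an assumed metric.
    """
    ids = list(vectors)
    matrix = np.asarray([vectors[i] for i in ids], dtype=float)
    q = np.asarray(query, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(q)
    # Guard against zero-length vectors: their score becomes 0.
    scores = matrix @ q / np.where(norms == 0, 1.0, norms)
    # Sort by descending score and keep the first k entries.
    order = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in order]

vectors = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.0, 1.0, 0.0],
    "doc3": [0.9, 0.1, 0.0],
}
results = top_k_cosine([1.0, 0.0, 0.0], vectors, k=2)
```

Brute force scans every stored vector, so its cost grows linearly with the size of the collection. Approximate indexes such as HNSW trade a little accuracy to avoid that full scan.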

Storage Locations

By default, VecStream stores its data in:

Windows: %APPDATA%/VecStream/store/
macOS/Linux: ~/.vecstream/store/

You can specify a custom storage location using the --db-path option in CLI commands or by passing the path to CollectionManager or BinaryVectorStore.
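
The lookup described above can be sketched as follows. This is a hypothetical helper, not VecStream's code: `default_store_dir` and `resolve_store_dir` are illustrative names, and the fallback used when `APPDATA` is unset is an assumption.

```python
import os
from pathlib import Path

def default_store_dir(platform_name=os.name, env=None):
    """Return the default store directory listed above for a platform."""
    env = os.environ if env is None else env
    if platform_name == "nt":  # Windows
        # Assumption: fall back to the home directory if APPDATA is unset.
        base = Path(env.get("APPDATA", str(Path.home())))
        return base / "VecStream" / "store"
    # macOS and Linux
    return Path.home() / ".vecstream" / "store"

def resolve_store_dir(db_path=None):
    """Prefer an explicit path (as --db-path does) over the default."""
    return Path(db_path) if db_path else default_store_dir()
```

Calling `resolve_store_dir("./my_vectors")` returns the custom path, while `resolve_store_dir()` falls back to the platform default.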

Storage Format

VecStream uses an efficient binary storage format:

Vectors: NumPy .npy format for fast access
Metadata: JSON format for flexibility
Automatic compression and optimization
Collections organized in subdirectories
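
To make the layout concrete, here is a minimal sketch of one collection stored as a subdirectory holding a `.npy` file and a JSON file. The file names `vectors.npy` and `metadata.json`, the JSON structure, and the helper functions are assumptions made for illustration; VecStream's actual on-disk layout may differ.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

def save_collection(root, name, ids, vectors, metadata):
    """Write one collection as <root>/<name>/vectors.npy plus metadata.json.

    File names are hypothetical, chosen only for this sketch.
    """
    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    # Vectors go into NumPy's binary .npy format.
    np.save(folder / "vectors.npy", np.asarray(vectors, dtype=np.float32))
    # IDs and metadata go into a human-readable JSON file.
    payload = {"ids": ids, "metadata": metadata}
    (folder / "metadata.json").write_text(json.dumps(payload), encoding="utf-8")
    return folder

def load_collection(root, name):
    """Read the IDs, vectors, and metadata back from disk."""
    folder = Path(root) / name
    vectors = np.load(folder / "vectors.npy")
    payload = json.loads((folder / "metadata.json").read_text(encoding="utf-8"))
    return payload["ids"], vectors, payload["metadata"]

# Round-trip one vector through a temporary directory.
with tempfile.TemporaryDirectory() as root:
    save_collection(root, "research", ["paper1"], [[1.0, 0.0, 0.0]],
                    {"paper1": {"topic": "AI", "year": 2023}})
    ids, vectors, metadata = load_collection(root, "research")
```

Keeping vectors in a single array file makes them quick to load in bulk, while JSON keeps the metadata easy to inspect and edit by hand.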

CLI Features

The command-line interface provides:

Vector Management: Add, get, update, and remove vectors with the add, get, and remove commands
Similarity Search: Search with the search command and set the number of nearest neighbors returned with --k
HNSW Indexing: Up to 100x faster search on large datasets
Collections: Organize vectors by type with collection create, collection list, and other commands
Metadata Filtering: Filter search results with --filter '{"key": "value"}' syntax
Nested Filters: Use dot notation to filter on nested fields, as in --filter '{"details.color": "red"}'
Beautiful UI: Rich, colored output and progress indicators for long operations
Database Stats: View detailed database information with the info command
Custom Storage: Specify storage locations with the --db-path option
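
As an illustration of how a JSON filter with dot notation could be evaluated, here is a small sketch in plain Python. It is not VecStream's implementation: `lookup` and `matches` are hypothetical helpers, and the assumption that every key must match by exact equality (a logical AND) is ours, since this README does not spell out the matching rules.

```python
import json

_MISSING = object()  # sentinel for "key not present"

def lookup(metadata, dotted_key):
    """Follow a dotted key such as "details.color" through nested dicts."""
    value = metadata
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value

def matches(metadata, filter_spec):
    """True when every filter key equals the corresponding metadata value.

    Assumption: all keys must match (AND) and values compare by equality.
    """
    return all(lookup(metadata, key) == expected
               for key, expected in filter_spec.items())

# The CLI receives the filter as a JSON string, so parse it first.
filter_spec = json.loads('{"details.color": "red", "year": 2023}')
docs = {
    "a": {"year": 2023, "details": {"color": "red"}},
    "b": {"year": 2023, "details": {"color": "blue"}},
    "c": {"year": 2022, "details": {"color": "red"}},
}
kept = [doc_id for doc_id, meta in docs.items() if matches(meta, filter_spec)]
```

Here only document "a" satisfies both conditions, so `kept` contains just that ID.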

Python API

The Python API offers:

HNSW Indexing: Fast approximate nearest-neighbor search with customizable parameters:

from vecstream.hnsw_index import HNSWIndex
index = HNSWIndex(dim=128, M=16, ef_construction=200)

Collections: Organize vectors with the CollectionManager:

from vecstream.collections import CollectionManager
manager = CollectionManager("./vector_db", use_hnsw=True)
collection = manager.create_collection("images")

Metadata Filtering: Fine-grained search control:

results = collection.search_similar(query, filter_metadata={"category": "electronics"})

Nested Filtering: Access nested properties with dot notation:

results = collection.search_similar(query, filter_metadata={"details.color": "black"})

Binary Storage: Efficient serialization for large datasets:

from vecstream.binary_store import BinaryVectorStore
store = BinaryVectorStore("./vector_db")

Vector Operations: Direct access to similarity calculations, normalization, and more
Type Safety: Strong typing and error handling with descriptive exceptions

Requirements

Python 3.8 or higher
NumPy
SciPy
sentence-transformers
Rich (for CLI)
Click (for CLI)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Version History

0.3.0 (2024-03-XX)
- Added HNSW indexing for faster similarity search
- Added collections/namespaces for organizing vectors
- Added metadata filtering for search results
- Improved CLI with collection management commands
- Performance optimizations

0.2.0 (2024-03-XX)
- Added binary vector store
- Improved persistent storage
- Enhanced CLI functionality
- Added metadata support

0.1.0 (2024-03-XX)
- Initial release
- Basic vector storage and search functionality
- CLI interface
- Client-server architecture

Documentation

Document	Description	Link
API Reference	Complete reference of VecStream's classes, methods, and CLI commands	API Reference
Advanced Usage	Detailed examples and best practices for using VecStream	Advanced Usage

Key Features

Feature	Description	Documentation
HNSW Indexing	Fast approximate nearest neighbor search for large datasets	API Reference, Usage Examples
Collections	Organize vectors with metadata for better organization	API Reference, Usage Examples
Metadata Filtering	Filter search results using metadata properties	API Reference, Usage Examples
Binary Storage	Efficient storage format for large vector datasets	API Reference, Usage Examples

Release history

Version	Release date
0.3.4	Mar 30, 2025
0.3.3	Mar 18, 2025 (this version)
0.3.2	Mar 18, 2025
0.3.1	Mar 18, 2025
0.2.0	Mar 16, 2025
0.1.1	Mar 16, 2025
0.1.0	Mar 16, 2025

Download files

Source Distribution

vecstream-0.3.3.tar.gz (27.2 kB)

Uploaded Mar 18, 2025 Source

Built Distribution

vecstream-0.3.3-py3-none-any.whl (26.2 kB)

Uploaded Mar 18, 2025 Python 3

File details

Details for the file vecstream-0.3.3.tar.gz.

File metadata

Download URL: vecstream-0.3.3.tar.gz
Upload date: Mar 18, 2025
Size: 27.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for vecstream-0.3.3.tar.gz
Algorithm	Hash digest
SHA256	`634831e4709f0355a63fb7d53f6e929556bfd13f8c4684ab88916af06fcbd07b`
MD5	`02997c635659c2d49efaf65a1f31950d`
BLAKE2b-256	`109d15fc4a84c3d66d0fd4c5b8eb402147b81448f22de10ec38e2c91927b2695`
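
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module; `sha256_of` and `verify` are hypothetical helper names, and the local file path is an assumption about where the download was saved.

```python
import hashlib
import hmac

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex):
    """Return True when the file's SHA256 digest matches the published one."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())

# SHA256 digest published for vecstream-0.3.3.tar.gz
EXPECTED = "634831e4709f0355a63fb7d53f6e929556bfd13f8c4684ab88916af06fcbd07b"
# Assumed local path; run this after downloading the file:
# verify("vecstream-0.3.3.tar.gz", EXPECTED)
```

The same check works for the wheel by substituting its file name and its own SHA256 digest.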

File details

Details for the file vecstream-0.3.3-py3-none-any.whl.

File metadata

Download URL: vecstream-0.3.3-py3-none-any.whl
Upload date: Mar 18, 2025
Size: 26.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for vecstream-0.3.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`202662d8a89987cb83c225531472c394fdab12076bda6b2439940951edd6e91e`
MD5	`8783ee9a9ec4d257df611c5b27eaa45c`
BLAKE2b-256	`912dbf3f76b73b3cd66898b1eec6a87d99f6e4c05270ca50ddb96e03e61b0d64`
