VecStream
A lightweight, efficient vector database with similarity search capabilities, designed for machine learning and AI applications.
Features
- Fast similarity search using optimized indexing
- HNSW indexing for fast approximate nearest-neighbor search on large datasets
- Vector collections/namespaces for organizing different types of embeddings
- Metadata filtering for fine-grained search control
- Efficient binary storage format for vectors and metadata
- Automatic text embedding with sentence-transformers
- Command-line interface with colored output and progress indicators
- Cross-platform support (Windows, macOS, Linux)
- Customizable storage locations
- Metadata support for enhanced document management
- Built-in text similarity search
Installation
pip install vecstream
Quick Start
Using the CLI
# Add a document
vecstream add "Machine learning is transforming technology" doc1
# Search for similar documents
vecstream search "AI and machine learning" --k 3
# Search with metadata filtering
vecstream search "cloud computing" --filter '{"category": "ai", "year": 2023}'
# Get document by ID
vecstream get doc1
# View database information
vecstream info
# Create and use a collection
vecstream create_collection research
vecstream add "Neural networks research" doc2 --collection research
# Use custom storage location
vecstream add "Custom storage test" doc3 --db-path "./my_vectors"
# Remove a document
vecstream remove doc1
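The --filter value has to be valid JSON and has to survive shell quoting, which is easy to get wrong by hand. The sketch below builds the search command from a Python dict instead. The helper name build_search_command is hypothetical and not part of VecStream; only the search subcommand and the --filter flag come from the examples above, and the quoting produced by shlex.join targets POSIX shells.

```python
import json
import shlex


def build_search_command(query, metadata_filter):
    """Return a POSIX-shell-safe `vecstream search` command line (hypothetical helper)."""
    # json.dumps guarantees valid JSON: double-quoted keys, no trailing commas.
    filter_json = json.dumps(metadata_filter)
    argv = ["vecstream", "search", query, "--filter", filter_json]
    # shlex.join quotes every argument that contains spaces or special characters.
    return shlex.join(argv)


command = build_search_command("cloud computing", {"category": "ai", "year": 2023})
print(command)
```

When launching the CLI from Python rather than a shell, passing the argument list directly to subprocess.run avoids the quoting step altogether.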
Using the Python API
from vecstream.collections import CollectionManager
from vecstream.binary_store import BinaryVectorStore
# Using collections for different vector types
manager = CollectionManager("./vector_db")
research_collection = manager.create_collection("research")
products_collection = manager.create_collection("products")
# Add vectors with metadata to collections
research_collection.add_vector(
    id="paper1",
    vector=[1.0, 0.0, 0.0],
    metadata={"topic": "AI", "year": 2023, "author": "Smith"}
)
# Search with metadata filtering
results = research_collection.search_similar(
    query=[1.0, 0.0, 0.0],
    k=5,
    filter_metadata={"year": 2023, "topic": "AI"}
)
# Basic binary store usage (compatible with earlier versions)
store = BinaryVectorStore("./vector_db")
# Add vectors with metadata
store.add_vector(
    id="doc1",
    vector=[1.0, 0.0, 0.0],
    metadata={"text": "Example document", "tags": ["test"]}
)
# Search similar vectors
results = store.search_similar([1.0, 0.0, 0.0], k=5)
# Get vector with metadata
vector, metadata = store.get_vector_with_metadata("doc1")
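To make the idea behind a filtered similarity search concrete, here is a minimal brute-force sketch in plain NumPy. It is not VecStream's implementation: the function name brute_force_search, the choice of cosine similarity, the exact-match filter semantics, and the (id, score) result shape are all assumptions made for illustration.

```python
import numpy as np


def brute_force_search(vectors, metadata, query, k=5, filter_metadata=None):
    """Rank stored vectors by cosine similarity to `query` (illustrative only).

    `vectors` maps id -> vector, `metadata` maps id -> dict.
    Entries must match every key in `filter_metadata` exactly to be scored.
    """
    query = np.asarray(query, dtype=float)
    scored = []
    for vec_id, vec in vectors.items():
        meta = metadata.get(vec_id, {})
        if filter_metadata and any(
            meta.get(key) != value for key, value in filter_metadata.items()
        ):
            continue  # filtered out before any similarity is computed
        vec = np.asarray(vec, dtype=float)
        denom = np.linalg.norm(vec) * np.linalg.norm(query)
        score = float(vec @ query / denom) if denom else 0.0
        scored.append((vec_id, score))
    # Highest similarity first, then keep the top k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = {"paper1": [1.0, 0.0, 0.0], "paper2": [0.6, 0.8, 0.0], "paper3": [1.0, 0.0, 0.0]}
metadata = {
    "paper1": {"topic": "AI", "year": 2023},
    "paper2": {"topic": "AI", "year": 2023},
    "paper3": {"topic": "AI", "year": 2021},
}
results = brute_force_search(
    vectors, metadata, [1.0, 0.0, 0.0], k=5, filter_metadata={"year": 2023, "topic": "AI"}
)
print(results)
```

A linear scan like this costs one distance computation per stored vector, which is the cost an approximate index such as HNSW is meant to avoid on large datasets.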
Storage Locations
By default, VecStream stores its data in:
- Windows: %APPDATA%/VecStream/store/
- macOS/Linux: ~/.vecstream/store/
You can specify a custom storage location using the --db-path option in CLI commands or by passing the path to CollectionManager or BinaryVectorStore.
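As a rough illustration of how the platform defaults listed above could be resolved, the sketch below computes the path with the standard library. The function default_store_path is hypothetical and does not come from VecStream, and the fallback used when APPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def default_store_path(platform=sys.platform, environ=os.environ, home=None):
    """Return the default store directory for a platform (hypothetical helper)."""
    home = Path(home) if home is not None else Path.home()
    if platform.startswith("win"):
        appdata = environ.get("APPDATA")
        # Assumption: fall back to the usual roaming profile if APPDATA is unset.
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return base / "VecStream" / "store"
    # macOS ("darwin") and Linux share the dot-directory in the home folder.
    return home / ".vecstream" / "store"


print(default_store_path())
```

The platform, environment, and home directory are parameters so that each branch can be exercised on any machine.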
Storage Format
VecStream uses an efficient binary storage format:
- Vectors: NumPy .npy format for fast access
- Metadata: JSON format for flexibility
- Automatic compression and optimization
- Collections organized in subdirectories
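The layout described above (vectors in .npy, metadata in JSON, one subdirectory per collection) can be sketched with NumPy and the standard library. This is only an illustration of the general approach: the file names vectors.npy and metadata.json, the JSON structure, and both helper functions are assumptions, not VecStream's actual on-disk format.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_collection(root, name, ids, vectors, metadata):
    """Write one collection to <root>/<name>/ (hypothetical file names)."""
    col_dir = Path(root) / name
    col_dir.mkdir(parents=True, exist_ok=True)
    # Vectors go into a single dense array; row i belongs to ids[i].
    np.save(col_dir / "vectors.npy", np.asarray(vectors, dtype=np.float32))
    with open(col_dir / "metadata.json", "w", encoding="utf-8") as fh:
        json.dump({"ids": ids, "metadata": metadata}, fh)
    return col_dir


def load_collection(root, name):
    """Read back the ids, vector array, and metadata written by save_collection."""
    col_dir = Path(root) / name
    vectors = np.load(col_dir / "vectors.npy")
    with open(col_dir / "metadata.json", encoding="utf-8") as fh:
        payload = json.load(fh)
    return payload["ids"], vectors, payload["metadata"]


with tempfile.TemporaryDirectory() as root:
    save_collection(
        root, "research", ["paper1"], [[1.0, 0.0, 0.0]],
        {"paper1": {"topic": "AI", "year": 2023}},
    )
    ids, vectors, metadata = load_collection(root, "research")

print(ids, vectors.shape, metadata["paper1"]["year"])
```

Keeping the vectors in one contiguous array makes loading fast, while JSON keeps the metadata human-readable and schema-free.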
CLI Features
The command-line interface provides:
- Vector Management: Add, get, update, and remove vectors with the add, get, and remove commands
- Similarity Search: Fast vector search with the search command and an adjustable number of nearest neighbors (--k)
- HNSW Indexing: Significantly faster search performance for large datasets (up to 100x faster)
- Collections: Organize vectors by type with collection create, collection list, and other commands
- Metadata Filtering: Filter search results with --filter '{"key": "value"}' syntax
- Nested Filters: Support for dot notation in filters like --filter '{"details.color": "red"}'
- Beautiful UI: Rich, colored output and progress indicators for long operations
- Database Stats: View detailed database information with the info command
- Custom Storage: Specify storage locations with the --db-path option
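Dot-notation filters such as {"details.color": "red"} imply walking into nested metadata one key at a time. The sketch below shows one plausible way to evaluate such a filter. The functions get_nested and matches are hypothetical names, and the exact-equality semantics are an assumption rather than documented VecStream behavior.

```python
_MISSING = object()  # sentinel so a stored None is distinguishable from "key absent"


def get_nested(metadata, dotted_key):
    """Follow a dotted key such as "details.color" through nested dicts."""
    current = metadata
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(metadata, filter_metadata):
    """True when every filter key resolves to exactly the requested value."""
    return all(
        get_nested(metadata, key) == value for key, value in filter_metadata.items()
    )


document = {"category": "electronics", "details": {"color": "red", "size": "M"}}
print(matches(document, {"details.color": "red"}))
print(matches(document, {"details.color": "black"}))
```

One consequence of this approach is that metadata keys which themselves contain a dot cannot be addressed, since the dot is always treated as a path separator.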
Python API
The Python API offers:
- HNSW Indexing: Fast approximate nearest-neighbor search with customizable parameters:
    from vecstream.hnsw_index import HNSWIndex
    index = HNSWIndex(dim=128, M=16, ef_construction=200)
- Collections: Organize vectors with the CollectionManager:
    from vecstream.collections import CollectionManager
    manager = CollectionManager("./vector_db", use_hnsw=True)
    collection = manager.create_collection("images")
- Metadata Filtering: Fine-grained search control:
    results = collection.search_similar(query, filter_metadata={"category": "electronics"})
- Nested Filtering: Access nested properties with dot notation:
    results = collection.search_similar(query, filter_metadata={"details.color": "black"})
- Binary Storage: Efficient serialization for large datasets:
    from vecstream.binary_store import BinaryVectorStore
    store = BinaryVectorStore("./vector_db")
- Vector Operations: Direct access to similarity calculations, normalization, and more
- Type Safety: Strong typing and error handling with descriptive exceptions
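The vector operations mentioned above are not spelled out here, so the following is a generic NumPy sketch of normalization rather than VecStream's own functions. The helper normalize is a hypothetical name. It shows why normalizing once at insert time is useful: for unit-length vectors, cosine similarity reduces to a single matrix-vector product.

```python
import numpy as np


def normalize(vectors):
    """Scale each row to unit L2 norm; all-zero rows are left as zeros."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # `where` skips the division for zero-norm rows, which keep the zeros from `out`.
    return np.divide(vectors, norms, out=np.zeros_like(vectors), where=norms > 0)


stored = normalize([[3.0, 4.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]])
query = normalize([[0.0, 1.0, 0.0]])[0]

# For unit vectors, the dot product is the cosine similarity.
scores = stored @ query
print(scores)
```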
Requirements
- Python 3.8 or higher
- NumPy
- SciPy
- sentence-transformers
- Rich (for CLI)
- Click (for CLI)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Version History
- 0.3.0 (2024-03-XX)
  - Added HNSW indexing for faster similarity search
  - Added collections/namespaces for organizing vectors
  - Added metadata filtering for search results
  - Improved CLI with collection management commands
  - Performance optimizations
- 0.2.0 (2024-03-XX)
  - Added binary vector store
  - Improved persistent storage
  - Enhanced CLI functionality
  - Added metadata support
- 0.1.0 (2024-03-XX)
  - Initial release
  - Basic vector storage and search functionality
  - CLI interface
  - Client-server architecture
Documentation
| Document | Description | Link |
|---|---|---|
| API Reference | Complete reference of VecStream's classes, methods, and CLI commands | API Reference |
| Advanced Usage | Detailed examples and best practices for using VecStream | Advanced Usage |
Key Features
| Feature | Description | Documentation |
|---|---|---|
| HNSW Indexing | Fast approximate nearest neighbor search for large datasets | API Reference, Usage Examples |
| Collections | Organize vectors with metadata for better organization | API Reference, Usage Examples |
| Metadata Filtering | Filter search results using metadata properties | API Reference, Usage Examples |
| Binary Storage | Efficient storage format for large vector datasets | API Reference, Usage Examples |