VecStream
A lightweight, efficient vector database with similarity search capabilities, designed for machine learning and AI applications.
Features
- Fast similarity search using optimized indexing
- Efficient binary storage format for vectors and metadata
- Automatic text embedding with sentence-transformers
- Command-line interface with colored output rendered by Rich
- Cross-platform support (Windows, macOS, Linux)
- Customizable storage locations
- Metadata support for enhanced document management
- Built-in text similarity search
Installation
pip install vecstream
Quick Start
Using the CLI
# Add a document
vecstream add "Machine learning is transforming technology" doc1
# Search for similar documents
vecstream search "AI and machine learning" --k 3
# Get document by ID
vecstream get doc1
# View database information
vecstream info
# Use custom storage location
vecstream add "Custom storage test" doc2 --db-path "./my_vectors"
# Remove a document
vecstream remove doc1
Using the Python API
from vecstream.binary_store import BinaryVectorStore
# Create a binary vector store
store = BinaryVectorStore("./vector_db")
# Add vectors with metadata
store.add_vector(
    id="doc1",
    vector=[1.0, 0.0, 0.0],
    metadata={"text": "Example document", "tags": ["test"]},
)
# Search similar vectors
results = store.search_similar([1.0, 0.0, 0.0], k=5)
# Get vector with metadata
vector, metadata = store.get_vector_with_metadata("doc1")
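To make the idea behind a call like `search_similar` concrete, here is a minimal sketch of brute-force top-k search by cosine similarity in NumPy. It is not VecStream's implementation: this README does not say which similarity metric or index the library uses, so cosine similarity and the helper name `top_k_cosine` are assumptions made purely for illustration.

```python
import numpy as np


def top_k_cosine(query, ids, vectors, k=5):
    """Return up to k (id, score) pairs ranked by cosine similarity.

    Hypothetical helper for illustration; not part of the VecStream API.
    """
    q = np.asarray(query, dtype=np.float64)
    m = np.asarray(vectors, dtype=np.float64)
    # Product of norms for each stored vector and the query.
    norms = np.linalg.norm(m, axis=1) * np.linalg.norm(q)
    # Avoid division by zero: a zero-length vector gets a score of 0.
    denom = np.where(norms == 0.0, 1.0, norms)
    scores = (m @ q) / denom
    # Sort by descending score and keep the first k entries.
    order = np.argsort(-scores, kind="stable")[:k]
    return [(ids[i], float(scores[i])) for i in order]


ids = ["doc1", "doc2", "doc3"]
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
results = top_k_cosine([1.0, 0.0, 0.0], ids, vectors, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

A linear scan like this is fine for small collections; an index only starts to matter once the number of stored vectors grows large.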
Storage Locations
By default, VecStream stores its data in:
- Windows: %APPDATA%/VecStream/store/
- macOS/Linux: ~/.vecstream/store/
You can specify a custom storage location using the --db-path option in CLI commands or by passing the path to BinaryVectorStore.
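The per-platform defaults listed above could be resolved along the following lines. This is a sketch under stated assumptions, not the library's own logic: the function name `default_store_path` is hypothetical, and the fallback to the home directory when `APPDATA` is unset is a guess rather than documented behavior.

```python
import os
import sys
from pathlib import Path


def default_store_path(platform=sys.platform, environ=os.environ, home=None):
    """Return the default store directory for the given platform.

    Hypothetical helper; VecStream's own resolution logic may differ.
    """
    home_dir = Path(home) if home else Path.home()
    if platform.startswith("win"):
        # Assumption: fall back to the home directory if APPDATA is unset.
        appdata = environ.get("APPDATA")
        root = Path(appdata) if appdata else home_dir
        return root / "VecStream" / "store"
    # macOS and Linux share the same dot-directory layout.
    return home_dir / ".vecstream" / "store"


linux_path = default_store_path(platform="linux", home="/home/alice")
print(linux_path)
```

Passing the platform, environment, and home directory as parameters keeps the helper easy to exercise on any operating system.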
Storage Format
VecStream uses an efficient binary storage format:
- Vectors: NumPy .npy format for fast access
- Metadata: JSON format for flexibility
- Automatic compression and optimization
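As a rough illustration of the split described above, the sketch below writes vectors to a `.npy` file and metadata to a JSON file, then reads both back. The file names `vectors.npy` and `metadata.json`, the JSON layout, and the helpers `save_store` and `load_store` are all assumptions; VecStream's actual on-disk layout is not documented here, and no compression step is shown.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_store(directory, ids, vectors, metadata):
    """Write vectors to a .npy file and ids plus metadata to a JSON file."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # One row per vector; row order matches the order of ids.
    np.save(directory / "vectors.npy", np.asarray(vectors, dtype=np.float32))
    with open(directory / "metadata.json", "w", encoding="utf-8") as fh:
        json.dump({"ids": list(ids), "metadata": metadata}, fh)


def load_store(directory):
    """Read back the array and metadata written by save_store."""
    directory = Path(directory)
    vectors = np.load(directory / "vectors.npy")
    with open(directory / "metadata.json", encoding="utf-8") as fh:
        payload = json.load(fh)
    return payload["ids"], vectors, payload["metadata"]


with tempfile.TemporaryDirectory() as tmp:
    save_store(
        tmp,
        ["doc1"],
        [[1.0, 0.0, 0.0]],
        {"doc1": {"text": "Example document", "tags": ["test"]}},
    )
    loaded_ids, loaded_vectors, loaded_meta = load_store(tmp)

print(loaded_ids, loaded_vectors.shape, loaded_meta["doc1"]["tags"])
```

Keeping numeric data in a binary array and descriptive fields in JSON means the vectors load quickly while the metadata stays human-readable.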
CLI Features
The command-line interface provides:
- Colored, formatted output rendered with Rich
- Progress indicators for long operations
- Detailed database information
- Similarity scores in search results
- Customizable search parameters
- Error handling and user feedback
Python API
The Python API offers:
- Direct access to vector operations
- Metadata management
- Custom storage locations
- Efficient binary serialization
- Rich search capabilities
- Error handling and type safety
Requirements
- Python 3.8 or higher
- NumPy
- SciPy
- sentence-transformers
- Rich (for CLI)
- Click (for CLI)
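If an install misbehaves, a quick way to see which of the requirements above are missing is a check like the one below. It is only a convenience sketch: the helper `missing_requirements` is hypothetical and not shipped with VecStream, and the module names are the usual import names for the listed packages.

```python
import importlib.util
import sys

# Import names for the packages listed above.
REQUIRED_MODULES = ["numpy", "scipy", "sentence_transformers", "rich", "click"]


def missing_requirements(modules=REQUIRED_MODULES, min_python=(3, 8)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or higher is required" % min_python)
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append("missing module: " + name)
    return problems


for problem in missing_requirements():
    print(problem)
```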
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Version History
- 0.1.1 (2024-03-XX)
  - Fixed index initialization in IndexManager
  - Added specific version requirements for torch and torchvision
  - Improved dependency compatibility
  - Fixed CLI import issues
- 0.1.0 (2024-03-XX)
  - Initial release
  - Basic vector storage and search functionality
  - CLI interface
  - Client-server architecture
File details
Details for the file vecstream-0.2.0.tar.gz.
File metadata
- Download URL: vecstream-0.2.0.tar.gz
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb9ce91e381ea063f19e63c1faf4fa0d41761dfacb4ef1129e21cdfe25b2e447 |
| MD5 | 290855ed2c025238a091d83887bd6006 |
| BLAKE2b-256 | a84538390aaa90ff3c07cd6db18a4bf6d1c6ded26324b82cfd2f4a1a3abd7c4f |
File details
Details for the file vecstream-0.2.0-py3-none-any.whl.
File metadata
- Download URL: vecstream-0.2.0-py3-none-any.whl
- Size: 14.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e1001a2bfa028120b2ac8c1435a6d3dfed592ffd5abe53d80d0f19a55b174eb |
| MD5 | ac420b2addf8fc73791ca78596e73ad2 |
| BLAKE2b-256 | ddffc262834805ff815012917e5846965aa8ca0edfadbb158745023c65d8377c |