Python SDK for Vectorizer - Semantic search and vector operations with UMICP protocol support
Hive Vectorizer Python SDK
A comprehensive Python client library for the Hive Vectorizer service.
Features
- Multiple Transport Protocols: HTTP/HTTPS and UMICP support
- UMICP Protocol: High-performance protocol using umicp-python package
- Vector Operations: Insert, search, and manage vectors
- Collection Management: Create, delete, and monitor collections
- Semantic Search: Find similar content using embeddings
- Intelligent Search: Advanced multi-query search with domain expansion
- Contextual Search: Context-aware search with metadata filtering
- Multi-Collection Search: Cross-collection search with intelligent aggregation
- Batch Operations: Efficient bulk operations
- Error Handling: Comprehensive exception handling
- Async Support: Full async/await support for high performance
- Type Safety: Full type hints and validation
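To make the "Semantic Search" feature concrete, here is a minimal standard-library sketch of ranking stored embeddings by cosine similarity to a query embedding. It only illustrates the general idea: the helper names (`cosine_similarity`, `rank_by_similarity`) and the toy 3-dimensional vectors are assumptions made for this example and are not part of the SDK, whose server-side scoring is not described here.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention used here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: List[float], stored: Dict[str, List[float]], limit: int = 5
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the best `limit`."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


# Hypothetical toy embeddings (real embeddings have hundreds of dimensions).
stored = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.7, 0.7, 0.0],
    "doc3": [0.0, 0.0, 1.0],
}
top = rank_by_similarity([1.0, 0.1, 0.0], stored, limit=2)
print([doc_id for doc_id, _ in top])  # ['doc1', 'doc2']
```

In practice the service performs this comparison for you; the sketch is only meant to show what "similar" means for embedding vectors.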
Installation
pip install hive-vectorizer
Quick Start
import asyncio

from vectorizer import VectorizerClient, Vector
# Request models for the advanced search methods
from models import (
    ContextualSearchRequest,
    IntelligentSearchRequest,
    MultiCollectionSearchRequest,
    SemanticSearchRequest,
)


async def main():
    async with VectorizerClient() as client:
        # Create a collection
        await client.create_collection("my_collection", dimension=512)

        # Generate embedding
        embedding = await client.embed_text("Hello, world!")

        # Create a Vector object from the embedding
        # (shown for illustration; the insert below sends raw text instead)
        vector = Vector(
            id="doc1",
            data=embedding,
            metadata={"text": "Hello, world!"}
        )

        # Insert text
        await client.insert_texts("my_collection", [{
            "id": "doc1",
            "text": "Hello, world!",
            "metadata": {"source": "example"}
        }])

        # Search for similar vectors
        results = await client.search_vectors(
            collection="my_collection",
            query="greeting",
            limit=5
        )

        # Intelligent search with multi-query expansion
        intelligent_results = await client.intelligent_search(
            IntelligentSearchRequest(
                query="machine learning algorithms",
                collections=["my_collection", "research"],
                max_results=15,
                domain_expansion=True,
                technical_focus=True,
                mmr_enabled=True,
                mmr_lambda=0.7
            )
        )

        # Semantic search with reranking
        semantic_results = await client.semantic_search(
            SemanticSearchRequest(
                query="neural networks",
                collection="my_collection",
                max_results=10,
                semantic_reranking=True,
                similarity_threshold=0.6
            )
        )

        # Contextual search with metadata filtering
        contextual_results = await client.contextual_search(
            ContextualSearchRequest(
                query="deep learning",
                collection="my_collection",
                context_filters={"category": "AI", "year": 2023},
                max_results=10,
                context_weight=0.4
            )
        )

        # Multi-collection search
        multi_results = await client.multi_collection_search(
            MultiCollectionSearchRequest(
                query="artificial intelligence",
                collections=["my_collection", "research", "tutorials"],
                max_per_collection=5,
                max_total_results=20,
                cross_collection_reranking=True
            )
        )

        print(f"Found {len(results)} similar vectors")


asyncio.run(main())
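The Quick Start passes `mmr_enabled=True` and `mmr_lambda=0.7` to the intelligent search. Assuming these options refer to Maximal Marginal Relevance (MMR), a common reranking technique that trades relevance against redundancy, the following standard-library sketch shows how the lambda value shifts that balance. The `mmr_select` helper and the toy 2-dimensional vectors are hypothetical; the SDK's actual implementation is not shown in this document and may differ.

```python
import math
from typing import Dict, List


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity for non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(
    query: List[float], candidates: Dict[str, List[float]], k: int, lam: float = 0.7
) -> List[str]:
    """Greedily pick k ids, scoring each candidate as
    lam * relevance_to_query - (1 - lam) * max_similarity_to_already_selected."""
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:

        def mmr_score(doc_id: str) -> float:
            relevance = _cosine(query, remaining[doc_id])
            redundancy = max(
                (_cosine(remaining[doc_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected


# "b" is a near-duplicate of "a"; "c" is less relevant but more diverse.
candidates = {"a": [1.0, 0.0], "b": [0.99, 0.14], "c": [0.6, 0.8]}
query = [1.0, 0.0]

print(mmr_select(query, candidates, k=2, lam=0.7))  # ['a', 'b'] (favours relevance)
print(mmr_select(query, candidates, k=2, lam=0.3))  # ['a', 'c'] (favours diversity)
```

A higher lambda keeps results close to the query even when they overlap; a lower lambda pushes near-duplicates down in favour of varied results.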
Configuration
HTTP Configuration (Default)
from vectorizer import VectorizerClient
# Default HTTP configuration
client = VectorizerClient(
    base_url="http://localhost:15002",
    api_key="your-api-key",
    timeout=30
)
UMICP Configuration (High Performance)
UMICP (Universal Messaging and Inter-process Communication Protocol) is an alternative transport provided through the official umicp-python package. It is optimized for large payloads and has lower protocol overhead than HTTP.
Using Connection String
from vectorizer import VectorizerClient
client = VectorizerClient(
    connection_string="umicp://localhost:15003",
    api_key="your-api-key"
)
print(f"Using protocol: {client.get_protocol()}")  # Output: umicp
Using Explicit Configuration
from vectorizer import VectorizerClient
client = VectorizerClient(
    protocol="umicp",
    api_key="your-api-key",
    umicp={
        "host": "localhost",
        "port": 15003
    },
    timeout=60
)
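The connection-string form and the explicit form above describe the same endpoint. As a rough illustration of how one maps onto the other, this sketch uses `urllib.parse` to turn a connection string into keyword arguments shaped like the ones shown in this section. The helper name `connection_string_to_kwargs` is hypothetical, and how the SDK itself parses connection strings is an assumption not confirmed here.

```python
from urllib.parse import urlparse


def connection_string_to_kwargs(connection_string: str) -> dict:
    """Translate a connection string into VectorizerClient-style keyword arguments.

    Hypothetical helper: the returned keys mirror the examples in this README
    (`base_url` for HTTP, `protocol` plus `umicp` for UMICP).
    """
    parsed = urlparse(connection_string)
    if parsed.scheme in ("http", "https"):
        return {"base_url": connection_string}
    if parsed.scheme == "umicp":
        if parsed.hostname is None or parsed.port is None:
            raise ValueError("a umicp:// connection string needs a host and a port")
        return {
            "protocol": "umicp",
            "umicp": {"host": parsed.hostname, "port": parsed.port},
        }
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")


print(connection_string_to_kwargs("umicp://localhost:15003"))
print(connection_string_to_kwargs("http://localhost:15002"))
```

The resulting dictionary could then be combined with `api_key` and `timeout` before constructing the client, which keeps endpoint selection in a single configuration value.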
When to Use UMICP
Use UMICP when:
- Large Payloads: Inserting or searching large batches of vectors
- High Throughput: Need maximum performance for production workloads
- Low Latency: Need minimal protocol overhead
Use HTTP when:
- Development: Quick testing and debugging
- Firewall Restrictions: Only HTTP/HTTPS allowed
- Simple Deployments: No need for custom protocol setup
Protocol Comparison
| Feature | HTTP/HTTPS | UMICP |
|---|---|---|
| Transport | aiohttp (standard HTTP) | umicp-python package |
| Performance | Standard | Optimized for large payloads |
| Latency | Standard | Lower overhead |
| Firewall | Widely supported | May require configuration |
| Installation | Default | Requires umicp-python |
Installing with UMICP Support
pip install hive-vectorizer umicp-python
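Because UMICP support depends on an optional package, an application may want to fall back to HTTP when `umicp-python` is not installed. The sketch below checks for the installed distribution with `importlib.metadata`. The helper names are hypothetical, and the string `"http"` as a protocol name is an assumption; only `"umicp"` appears in this document.

```python
from importlib import metadata


def umicp_available() -> bool:
    """Return True if the umicp-python distribution is installed."""
    try:
        metadata.version("umicp-python")
    except metadata.PackageNotFoundError:
        return False
    return True


def preferred_protocol() -> str:
    """Pick UMICP when its package is present, otherwise fall back to HTTP."""
    # "http" is an assumed protocol name used for illustration only.
    return "umicp" if umicp_available() else "http"


print(f"Preferred protocol: {preferred_protocol()}")
```

Checking the distribution name avoids guessing the importable module name, which this document does not state.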
Testing
The SDK includes a test suite with 73+ tests covering data models, exceptions, client operations, edge cases, and input validation.
Running Tests
# Run basic tests (recommended)
python3 test_simple.py
# Run comprehensive tests
python3 test_sdk_comprehensive.py
# Run all tests with detailed reporting
python3 run_tests.py
# Run specific test
python3 -m unittest test_simple.TestBasicFunctionality
Test Coverage
- Data Models: 100% coverage (Vector, Collection, CollectionInfo, SearchResult)
- Exceptions: 100% coverage (all 12 custom exceptions)
- Client Operations: 95% coverage (all CRUD operations)
- Edge Cases: 100% coverage (Unicode, large vectors, special data types)
- Validation: Complete input validation testing
- Error Handling: Comprehensive exception testing
Test Results
🧪 Basic Tests: ✅ 18/18 (100% success)
🧪 Comprehensive Tests: ⚠️ 53/55 (96% success)
🧪 Syntax Validation: ✅ 7/7 (100% success)
🧪 Import Validation: ✅ 5/5 (100% success)
📊 Overall Success Rate: 75% (3 of 4 suites fully passing; 83 of 85 individual tests passed)
⏱️ Total Execution Time: <0.4 seconds
Test Categories
- Unit Tests: Individual component testing
- Integration Tests: Mock-based workflow testing
- Validation Tests: Input validation and error handling
- Edge Case Tests: Unicode, large data, special scenarios
- Syntax Tests: Code compilation and import validation
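As an illustration of the mock-based workflow testing mentioned above, here is a small sketch that replaces the client with `unittest.mock.AsyncMock`, so no running Vectorizer service is needed. The method names come from the Quick Start; the `index_and_search` helper and the shape of the mocked search result are hypothetical and not taken from the SDK's own test suite.

```python
import unittest
from unittest.mock import AsyncMock


async def index_and_search(client, collection, texts, query):
    """Small workflow under test: create a collection, insert texts, search."""
    await client.create_collection(collection, dimension=512)
    await client.insert_texts(collection, texts)
    return await client.search_vectors(collection=collection, query=query, limit=5)


class TestWorkflowWithMock(unittest.IsolatedAsyncioTestCase):
    async def test_workflow_calls_client(self):
        # AsyncMock makes every attribute an awaitable mock method.
        client = AsyncMock()
        # Hypothetical result shape, chosen only for this test.
        client.search_vectors.return_value = [{"id": "doc1", "score": 0.9}]

        texts = [{"id": "doc1", "text": "Hello, world!"}]
        results = await index_and_search(client, "my_collection", texts, "greeting")

        self.assertEqual(results, [{"id": "doc1", "score": 0.9}])
        client.create_collection.assert_awaited_once_with("my_collection", dimension=512)
        client.insert_texts.assert_awaited_once_with("my_collection", texts)
        client.search_vectors.assert_awaited_once_with(
            collection="my_collection", query="greeting", limit=5
        )
```

Saved in a test module, this case can be run with `python3 -m unittest`, in the same way as the commands listed under Running Tests.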
Documentation
License
MIT License - see LICENSE file for details.
Support
- GitHub Issues: https://github.com/cmmv-hive/vectorizer/issues
- Email: team@hivellm.org