Python SDK for Vectorizer - Semantic search and vector operations with UMICP protocol support
# Vectorizer Python SDK

A comprehensive Python SDK for the Vectorizer semantic search service.

- Package: `vectorizer_sdk` (PEP 625 compliant)
- Version: 1.5.0
- PyPI: https://pypi.org/project/vectorizer-sdk/
## Features
- Multiple Transport Protocols: HTTP/HTTPS and UMICP support
- UMICP Protocol: High-performance protocol using umicp-sdk package (v0.3.2+)
- Vector Operations: Insert, search, and manage vectors
- Collection Management: Create, delete, and monitor collections
- Semantic Search: Find similar content using embeddings
- Intelligent Search: Advanced multi-query search with domain expansion
- Contextual Search: Context-aware search with metadata filtering
- Multi-Collection Search: Cross-collection search with intelligent aggregation
- Hybrid Search: Combine dense and sparse vectors for improved search quality
- Qdrant Compatibility: Full Qdrant REST API compatibility for easy migration
- Batch Operations: Efficient bulk operations
- Error Handling: Comprehensive exception handling
- Async Support: Full async/await support for high performance
- Type Safety: Full type hints and validation
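Several of the features above (Semantic Search, Hybrid Search, Multi-Collection Search) rank results by embedding similarity. As background, here is a minimal cosine-similarity sketch in plain Python; this is illustrative only, not the SDK's internal scoring code:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Rank two toy "documents" against a query embedding
query = [1.0, 0.0, 1.0]
docs = {"doc1": [1.0, 0.0, 1.0], "doc2": [0.0, 1.0, 0.0]}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['doc1', 'doc2'] — doc1 points in the same direction as the query
```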
## Installation

```bash
# Install from PyPI
pip install vectorizer-sdk

# Or a specific version
pip install vectorizer-sdk==1.5.0
```
## Quick Start

```python
import asyncio

from vectorizer import VectorizerClient, Vector
from models import (
    ContextualSearchRequest,
    CreateEdgeRequest,
    DiscoverEdgesRequest,
    FindPathRequest,
    FindRelatedRequest,
    HybridSearchRequest,
    IntelligentSearchRequest,
    MultiCollectionSearchRequest,
    SemanticSearchRequest,
    SparseVector,
)

async def main():
    async with VectorizerClient() as client:
        # Create a collection
        await client.create_collection("my_collection", dimension=512)

        # Generate an embedding
        embedding = await client.embed_text("Hello, world!")

        # Create a vector
        vector = Vector(
            id="doc1",
            data=embedding,
            metadata={"text": "Hello, world!"}
        )

        # Insert text
        await client.insert_texts("my_collection", [{
            "id": "doc1",
            "text": "Hello, world!",
            "metadata": {"source": "example"}
        }])

        # Search for similar vectors
        results = await client.search_vectors(
            collection="my_collection",
            query="greeting",
            limit=5
        )

        # Intelligent search with multi-query expansion
        intelligent_results = await client.intelligent_search(
            IntelligentSearchRequest(
                query="machine learning algorithms",
                collections=["my_collection", "research"],
                max_results=15,
                domain_expansion=True,
                technical_focus=True,
                mmr_enabled=True,
                mmr_lambda=0.7
            )
        )

        # Semantic search with reranking
        semantic_results = await client.semantic_search(
            SemanticSearchRequest(
                query="neural networks",
                collection="my_collection",
                max_results=10,
                semantic_reranking=True,
                similarity_threshold=0.6
            )
        )

        # Graph operations (require graph support enabled in the collection config)

        # List all graph nodes
        nodes = await client.list_graph_nodes("my_collection")
        print(f"Graph has {nodes.count} nodes")

        # Get the neighbors of a node
        neighbors = await client.get_graph_neighbors("my_collection", "document1")
        print(f"Node has {len(neighbors.neighbors)} neighbors")

        # Find related nodes within 2 hops
        related = await client.find_related_nodes(
            "my_collection",
            "document1",
            FindRelatedRequest(max_hops=2, relationship_type="SIMILAR_TO")
        )
        print(f"Found {len(related.related)} related nodes")

        # Find the shortest path between two nodes
        path = await client.find_graph_path(
            FindPathRequest(
                collection="my_collection",
                source="document1",
                target="document2"
            )
        )
        if path.found:
            print(f"Path found: {' -> '.join([n.id for n in path.path])}")

        # Create an explicit relationship
        edge = await client.create_graph_edge(
            CreateEdgeRequest(
                collection="my_collection",
                source="document1",
                target="document2",
                relationship_type="REFERENCES",
                weight=0.9
            )
        )
        print(f"Created edge: {edge.edge_id}")

        # Discover SIMILAR_TO edges for the entire collection
        discovery_result = await client.discover_graph_edges(
            "my_collection",
            DiscoverEdgesRequest(
                similarity_threshold=0.7,
                max_per_node=10
            )
        )
        print(f"Discovered {discovery_result.edges_created} edges")

        # Discover edges for a specific node
        node_discovery = await client.discover_graph_edges_for_node(
            "my_collection",
            "document1",
            DiscoverEdgesRequest(
                similarity_threshold=0.7,
                max_per_node=10
            )
        )
        print(f"Discovered {node_discovery.edges_created} edges for node")

        # Get discovery status
        status = await client.get_graph_discovery_status("my_collection")
        print(
            f"Discovery status: {status.total_nodes} nodes, "
            f"{status.total_edges} edges, "
            f"{status.progress_percentage:.1f}% complete"
        )

        # Contextual search with metadata filtering
        contextual_results = await client.contextual_search(
            ContextualSearchRequest(
                query="deep learning",
                collection="my_collection",
                context_filters={"category": "AI", "year": 2023},
                max_results=10,
                context_weight=0.4
            )
        )

        # Multi-collection search
        multi_results = await client.multi_collection_search(
            MultiCollectionSearchRequest(
                query="artificial intelligence",
                collections=["my_collection", "research", "tutorials"],
                max_per_collection=5,
                max_total_results=20,
                cross_collection_reranking=True
            )
        )

        # Hybrid search (dense + sparse vectors)
        sparse_query = SparseVector(
            indices=[0, 5, 10, 15],
            values=[0.8, 0.6, 0.9, 0.7]
        )
        hybrid_results = await client.hybrid_search(
            HybridSearchRequest(
                collection="my_collection",
                query="search query",
                query_sparse=sparse_query,
                alpha=0.7,
                algorithm="rrf",  # "rrf", "weighted", or "alpha"
                dense_k=20,
                sparse_k=20,
                final_k=10
            )
        )
        print(f"Found {len(hybrid_results.results)} similar vectors")

        # Qdrant-compatible API usage

        # List collections
        qdrant_collections = await client.qdrant_list_collections()
        print(f"Qdrant collections: {qdrant_collections}")

        # Search points (Qdrant format)
        qdrant_results = await client.qdrant_search_points(
            collection="my_collection",
            vector=embedding,
            limit=10,
            with_payload=True
        )
        print(f"Qdrant search results: {qdrant_results}")

asyncio.run(main())
```
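The hybrid search example above accepts `algorithm="rrf"`. Reciprocal rank fusion merges the dense and sparse rankings by summing reciprocal ranks across the two lists; documents that appear high in both win. Here is a minimal sketch of the idea — illustrative only, not the SDK's implementation, with `k=60` as the conventional smoothing constant:

```python
def rrf_fuse(dense_ids: list[str], sparse_ids: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked ID lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]   # ranking from the dense (embedding) index
sparse = ["b", "c", "d"]  # ranking from the sparse (keyword-style) index
print(rrf_fuse(dense, sparse))  # ['b', 'c', 'a', 'd'] — 'b' ranks well in both lists
```

Because RRF only uses ranks, it needs no score normalization between the dense and sparse retrievers, which is why it is a common default for hybrid fusion.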
## Configuration

### HTTP Configuration (Default)

```python
from vectorizer import VectorizerClient

# Default HTTP configuration
client = VectorizerClient(
    base_url="http://localhost:15002",
    api_key="your-api-key",
    timeout=30
)
```
### UMICP Configuration (High Performance)

UMICP (Universal Messaging and Inter-process Communication Protocol) offers lower protocol overhead than HTTP, built on the official umicp-python package.

#### Using a Connection String

```python
from vectorizer import VectorizerClient

client = VectorizerClient(
    connection_string="umicp://localhost:15003",
    api_key="your-api-key"
)
print(f"Using protocol: {client.get_protocol()}")  # Output: umicp
```
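The connection string bundles protocol, host, and port into one URL. For illustration, it can be decomposed with the standard library; this is a hypothetical parser, not the SDK's actual parsing code:

```python
from urllib.parse import urlparse

def parse_connection_string(conn: str) -> dict:
    """Split a 'umicp://host:port'-style string into its parts (illustrative)."""
    parsed = urlparse(conn)
    if parsed.scheme not in ("umicp", "http", "https"):
        raise ValueError(f"unsupported protocol: {parsed.scheme!r}")
    return {"protocol": parsed.scheme, "host": parsed.hostname, "port": parsed.port}

print(parse_connection_string("umicp://localhost:15003"))
# {'protocol': 'umicp', 'host': 'localhost', 'port': 15003}
```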
#### Using Explicit Configuration

```python
from vectorizer import VectorizerClient

client = VectorizerClient(
    protocol="umicp",
    api_key="your-api-key",
    umicp={
        "host": "localhost",
        "port": 15003
    },
    timeout=60
)
```
### When to Use UMICP
Use UMICP when:
- Large Payloads: Inserting or searching large batches of vectors
- High Throughput: Need maximum performance for production workloads
- Low Latency: Need minimal protocol overhead
Use HTTP when:
- Development: Quick testing and debugging
- Firewall Restrictions: Only HTTP/HTTPS allowed
- Simple Deployments: No need for custom protocol setup
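The guidance above can be condensed into a hypothetical helper that picks a protocol from workload characteristics. The thresholds below are made up for illustration, and the SDK leaves this choice to you rather than auto-selecting:

```python
def choose_protocol(batch_size: int, behind_strict_firewall: bool) -> str:
    """Pick a transport per the rules of thumb above (illustrative thresholds)."""
    if behind_strict_firewall:
        return "http"   # only HTTP/HTTPS traffic is allowed through
    if batch_size >= 1000:
        return "umicp"  # large payloads benefit from lower protocol overhead
    return "http"       # simple default for development and small workloads

print(choose_protocol(5000, behind_strict_firewall=False))  # umicp
print(choose_protocol(10, behind_strict_firewall=False))    # http
```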
### Protocol Comparison
| Feature | HTTP/HTTPS | UMICP |
|---|---|---|
| Transport | aiohttp (standard HTTP) | umicp-python package |
| Performance | Standard | Optimized for large payloads |
| Latency | Standard | Lower overhead |
| Firewall | Widely supported | May require configuration |
| Installation | Default | Requires umicp-python |
### Installing with UMICP Support

```bash
pip install vectorizer-sdk umicp-python
```
## Testing

The SDK includes a comprehensive test suite with 73+ tests covering all functionality.

### Running Tests

```bash
# Run basic tests (recommended)
python3 test_simple.py

# Run comprehensive tests
python3 test_sdk_comprehensive.py

# Run all tests with detailed reporting
python3 run_tests.py

# Run a specific test
python3 -m unittest test_simple.TestBasicFunctionality
```
### Test Coverage
- Data Models: 100% coverage (Vector, Collection, CollectionInfo, SearchResult)
- Exceptions: 100% coverage (all 12 custom exceptions)
- Client Operations: 95% coverage (all CRUD operations)
- Edge Cases: 100% coverage (Unicode, large vectors, special data types)
- Validation: Complete input validation testing
- Error Handling: Comprehensive exception testing
### Test Results

```text
🧪 Basic Tests: ✅ 18/18 (100% success)
🧪 Comprehensive Tests: ⚠️ 53/55 (96% success)
🧪 Syntax Validation: ✅ 7/7 (100% success)
🧪 Import Validation: ✅ 5/5 (100% success)
📊 Overall Success Rate: 83/85 (98% success)
⏱️ Total Execution Time: <0.4 seconds
```
### Test Categories
- Unit Tests: Individual component testing
- Integration Tests: Mock-based workflow testing
- Validation Tests: Input validation and error handling
- Edge Case Tests: Unicode, large data, special scenarios
- Syntax Tests: Code compilation and import validation
## License
MIT License - see LICENSE file for details.
## Support
- GitHub Issues: https://github.com/cmmv-hive/vectorizer/issues
- Email: team@hivellm.org
## File: vectorizer_sdk-1.5.1.tar.gz (source distribution)

- Size: 41.9 kB
- Upload date:
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3de67e386a4f2f0e582ed203ad4c60b92922ff49b0ec1cf31f891c163f932763` |
| MD5 | `b6973ca630f2cd1ffa3ccc375590ef87` |
| BLAKE2b-256 | `d6bce4c954f4ba10c75771ecf2aa4e739859e02dd37b0d62bd3e878dd0992c53` |
## File: vectorizer_sdk-1.5.1-py3-none-any.whl (built distribution)

- Size: 15.6 kB
- Upload date:
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | `8cfd326d9024d9467efc7d8786de47b6e0aaf940944dbbb07e46b2a43bc6f88c` |
| MD5 | `ec3bbaca893e39b6fdf4094339da098c` |
| BLAKE2b-256 | `0511d83bc8c05e774e3f3eba26009e5ae35174480b85cff889698aece12619cb` |