Vectorizer Python SDK

A comprehensive Python SDK for the Vectorizer semantic search service.

Package: vectorizer_sdk (PEP 625 compliant)
Version: 1.5.0
PyPI: https://pypi.org/project/vectorizer-sdk/

Features

  • Multiple Transport Protocols: HTTP/HTTPS and UMICP support
  • UMICP Protocol: High-performance transport via the umicp-python package (v0.3.2+)
  • Vector Operations: Insert, search, and manage vectors
  • Collection Management: Create, delete, and monitor collections
  • Semantic Search: Find similar content using embeddings
  • Intelligent Search: Advanced multi-query search with domain expansion
  • Contextual Search: Context-aware search with metadata filtering
  • Multi-Collection Search: Cross-collection search with intelligent aggregation
  • Hybrid Search: Combine dense and sparse vectors for improved search quality
  • Qdrant Compatibility: Full Qdrant REST API compatibility for easy migration
  • Batch Operations: Efficient bulk operations
  • Error Handling: Comprehensive exception handling
  • Async Support: Full async/await support for high performance
  • Type Safety: Full type hints and validation
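Several of the search features above (intelligent search in particular) expose Maximal Marginal Relevance reranking through an `mmr_lambda` parameter. As an SDK-independent illustration of the idea, not the service's implementation, greedy MMR trades query relevance against redundancy with already-selected results:

```python
def mmr_rerank(relevance, similarity, lam=0.7, k=3):
    """Greedy Maximal Marginal Relevance (illustrative sketch).

    relevance:  {doc_id: relevance score vs. the query}
    similarity: {(doc_a, doc_b): pairwise similarity}
    lam:        1.0 = pure relevance, 0.0 = pure diversity
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def mmr_score(doc):
            # Redundancy = highest similarity to anything already picked
            redundancy = max(
                (similarity.get((doc, s), similarity.get((s, doc), 0.0))
                 for s in selected),
                default=0.0,
            )
            return lam * relevance[doc] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low lambda, a result that is near-duplicate of an already-selected one ("b" here, similar to "a") is pushed below a less relevant but novel result:

```python
relevance = {"a": 0.9, "b": 0.85, "c": 0.3}
similarity = {("a", "b"): 0.95}
mmr_rerank(relevance, similarity, lam=0.5)  # "c" outranks the redundant "b"
```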

Installation

# Install from PyPI
pip install vectorizer-sdk

# Or specific version
pip install vectorizer-sdk==1.5.0

Quick Start

import asyncio
from vectorizer import VectorizerClient, Vector

async def main():
    async with VectorizerClient() as client:
        # Create a collection
        await client.create_collection("my_collection", dimension=512)

        # Generate embedding
        embedding = await client.embed_text("Hello, world!")

        # Build a Vector object (precomputed embedding plus metadata)
        vector = Vector(
            id="doc1",
            data=embedding,
            metadata={"text": "Hello, world!"}
        )

        # Insert text
        await client.insert_texts("my_collection", [{
            "id": "doc1",
            "text": "Hello, world!",
            "metadata": {"source": "example"}
        }])

        # Search for similar vectors
        results = await client.search_vectors(
            collection="my_collection",
            query="greeting",
            limit=5
        )

        # Intelligent search with multi-query expansion
        from vectorizer.models import IntelligentSearchRequest
        intelligent_results = await client.intelligent_search(
            IntelligentSearchRequest(
                query="machine learning algorithms",
                collections=["my_collection", "research"],
                max_results=15,
                domain_expansion=True,
                technical_focus=True,
                mmr_enabled=True,
                mmr_lambda=0.7
            )
        )

        # Semantic search with reranking
        from vectorizer.models import SemanticSearchRequest
        semantic_results = await client.semantic_search(
            SemanticSearchRequest(
                query="neural networks",
                collection="my_collection",
                max_results=10,
                semantic_reranking=True,
                similarity_threshold=0.6
            )
        )

        # Graph Operations (requires graph enabled in collection config)
        # List all graph nodes
        nodes = await client.list_graph_nodes("my_collection")
        print(f"Graph has {nodes.count} nodes")

        # Get neighbors of a node
        neighbors = await client.get_graph_neighbors("my_collection", "document1")
        print(f"Node has {len(neighbors.neighbors)} neighbors")

        # Find related nodes within 2 hops
        from vectorizer.models import FindRelatedRequest
        related = await client.find_related_nodes(
            "my_collection",
            "document1",
            FindRelatedRequest(max_hops=2, relationship_type="SIMILAR_TO")
        )
        print(f"Found {len(related.related)} related nodes")

        # Find shortest path between two nodes
        from vectorizer.models import FindPathRequest
        path = await client.find_graph_path(
            FindPathRequest(
                collection="my_collection",
                source="document1",
                target="document2"
            )
        )
        if path.found:
            print(f"Path found: {' -> '.join([n.id for n in path.path])}")

        # Create explicit relationship
        from vectorizer.models import CreateEdgeRequest
        edge = await client.create_graph_edge(
            CreateEdgeRequest(
                collection="my_collection",
                source="document1",
                target="document2",
                relationship_type="REFERENCES",
                weight=0.9
            )
        )
        print(f"Created edge: {edge.edge_id}")

        # Discover SIMILAR_TO edges for entire collection
        from vectorizer.models import DiscoverEdgesRequest
        discovery_result = await client.discover_graph_edges(
            "my_collection",
            DiscoverEdgesRequest(
                similarity_threshold=0.7,
                max_per_node=10
            )
        )
        print(f"Discovered {discovery_result.edges_created} edges")

        # Discover edges for a specific node
        node_discovery = await client.discover_graph_edges_for_node(
            "my_collection",
            "document1",
            DiscoverEdgesRequest(
                similarity_threshold=0.7,
                max_per_node=10
            )
        )
        print(f"Discovered {node_discovery.edges_created} edges for node")

        # Get discovery status
        status = await client.get_graph_discovery_status("my_collection")
        print(
            f"Discovery status: {status.total_nodes} nodes, "
            f"{status.total_edges} edges, "
            f"{status.progress_percentage:.1f}% complete"
        )

        # Contextual search with metadata filtering
        from vectorizer.models import ContextualSearchRequest
        contextual_results = await client.contextual_search(
            ContextualSearchRequest(
                query="deep learning",
                collection="my_collection",
                context_filters={"category": "AI", "year": 2023},
                max_results=10,
                context_weight=0.4
            )
        )

        # Multi-collection search
        from vectorizer.models import MultiCollectionSearchRequest
        multi_results = await client.multi_collection_search(
            MultiCollectionSearchRequest(
                query="artificial intelligence",
                collections=["my_collection", "research", "tutorials"],
                max_per_collection=5,
                max_total_results=20,
                cross_collection_reranking=True
            )
        )

        # Hybrid search (dense + sparse vectors)
        from vectorizer.models import HybridSearchRequest, SparseVector

        sparse_query = SparseVector(
            indices=[0, 5, 10, 15],
            values=[0.8, 0.6, 0.9, 0.7]
        )

        hybrid_results = await client.hybrid_search(
            HybridSearchRequest(
                collection="my_collection",
                query="search query",
                query_sparse=sparse_query,
                alpha=0.7,
                algorithm="rrf",  # "rrf", "weighted", or "alpha"
                dense_k=20,
                sparse_k=20,
                final_k=10
            )
        )

        print(f"Found {len(hybrid_results.results)} similar vectors")

        # Qdrant-compatible API usage
        # List collections
        qdrant_collections = await client.qdrant_list_collections()
        print(f"Qdrant collections: {qdrant_collections}")

        # Search points (Qdrant format)
        qdrant_results = await client.qdrant_search_points(
            collection="my_collection",
            vector=embedding,
            limit=10,
            with_payload=True
        )
        print(f"Qdrant search results: {qdrant_results}")

asyncio.run(main())
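The hybrid search call above offers three fusion algorithms ("rrf", "weighted", "alpha"). As an illustration of the concept only, not the server's code, reciprocal rank fusion merges the dense and sparse rankings by scoring each document with the sum of 1 / (k + rank) across the rankings it appears in:

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60, final_k=10):
    """Reciprocal Rank Fusion (illustrative sketch).

    dense_ranking / sparse_ranking: doc ids ordered best-first.
    k: damping constant (60 is the value from the original RRF paper).
    """
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first, truncated to final_k results
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered[:final_k]
```

A document ranked by both retrievers ("b" below) floats to the top even though neither ranking put it first:

```python
rrf_fuse(["a", "b", "c"], ["b", "d"])  # "b" wins the fused ranking
```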

Configuration

HTTP Configuration (Default)

from vectorizer import VectorizerClient

# Default HTTP configuration
client = VectorizerClient(
    base_url="http://localhost:15002",
    api_key="your-api-key",
    timeout=30
)

UMICP Configuration (High Performance)

UMICP (Universal Messaging and Inter-process Communication Protocol) reduces per-request protocol overhead, which pays off for large payloads and high-throughput workloads. Support is provided by the official umicp-python package.

Using Connection String

from vectorizer import VectorizerClient

client = VectorizerClient(
    connection_string="umicp://localhost:15003",
    api_key="your-api-key"
)

print(f"Using protocol: {client.get_protocol()}")  # Output: umicp
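One way a client can dispatch on the connection string's scheme (a sketch using urllib.parse; the SDK's actual parsing code may differ):

```python
from urllib.parse import urlparse

def parse_connection_string(conn_str):
    """Split e.g. 'umicp://localhost:15003' into (protocol, host, port)."""
    parsed = urlparse(conn_str)
    if parsed.scheme not in ("http", "https", "umicp"):
        raise ValueError(f"unsupported protocol: {parsed.scheme}")
    return parsed.scheme, parsed.hostname, parsed.port
```

The scheme then selects the transport (aiohttp for http/https, the umicp-python channel for umicp) while host and port feed its constructor.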

Using Explicit Configuration

from vectorizer import VectorizerClient

client = VectorizerClient(
    protocol="umicp",
    api_key="your-api-key",
    umicp={
        "host": "localhost",
        "port": 15003
    },
    timeout=60
)

When to Use UMICP

Use UMICP when:

  • Large Payloads: Inserting or searching large batches of vectors
  • High Throughput: Need maximum performance for production workloads
  • Low Latency: Need minimal protocol overhead

Use HTTP when:

  • Development: Quick testing and debugging
  • Firewall Restrictions: Only HTTP/HTTPS allowed
  • Simple Deployments: No need for custom protocol setup

Protocol Comparison

Feature        HTTP/HTTPS                 UMICP
Transport      aiohttp (standard HTTP)    umicp-python package
Performance    Standard                   Optimized for large payloads
Latency        Standard                   Lower protocol overhead
Firewall       Widely supported           May require configuration
Installation   Installed by default       Requires umicp-python

Installing with UMICP Support

pip install vectorizer-sdk umicp-python

Testing

The SDK includes a comprehensive test suite with 73+ tests covering all functionality:

Running Tests

# Run basic tests (recommended)
python3 test_simple.py

# Run comprehensive tests
python3 test_sdk_comprehensive.py

# Run all tests with detailed reporting
python3 run_tests.py

# Run specific test
python3 -m unittest test_simple.TestBasicFunctionality

Test Coverage

  • Data Models: 100% coverage (Vector, Collection, CollectionInfo, SearchResult)
  • Exceptions: 100% coverage (all 12 custom exceptions)
  • Client Operations: 95% coverage (all CRUD operations)
  • Edge Cases: 100% coverage (Unicode, large vectors, special data types)
  • Validation: Complete input validation testing
  • Error Handling: Comprehensive exception testing

Test Results

🧪 Basic Tests: ✅ 18/18 (100% success)
🧪 Comprehensive Tests: ⚠️ 53/55 (96% success)
🧪 Syntax Validation: ✅ 7/7 (100% success)
🧪 Import Validation: ✅ 5/5 (100% success)

📊 Overall Success Rate: 98% (83/85 tests passing)
⏱️ Total Execution Time: <0.4 seconds

Test Categories

  1. Unit Tests: Individual component testing
  2. Integration Tests: Mock-based workflow testing
  3. Validation Tests: Input validation and error handling
  4. Edge Case Tests: Unicode, large data, special scenarios
  5. Syntax Tests: Code compilation and import validation
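For reference, the mock-based style of category 2 can run without a live server. The client shape below (an awaitable `search_vectors` returning a result list) is assumed for illustration, not copied from the suite:

```python
import asyncio
import unittest
from unittest.mock import AsyncMock

class TestSearchWorkflow(unittest.TestCase):
    """Illustrative mock-based test: no running Vectorizer needed."""

    def test_search_returns_results(self):
        client = AsyncMock()
        # Stub the async search call with a canned result
        client.search_vectors.return_value = [{"id": "doc1", "score": 0.92}]

        results = asyncio.run(
            client.search_vectors(collection="my_collection",
                                  query="greeting", limit=5)
        )

        client.search_vectors.assert_awaited_once()
        self.assertEqual(results[0]["id"], "doc1")

if __name__ == "__main__":
    unittest.main()
```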

License

MIT License - see LICENSE file for details.
