
Universal vector search wrapper for Postgres, MySQL, MariaDB, SQLite, DuckDB, ClickHouse, Redis, RethinkDB (pgvector, HeatWave, sqlite-vss, DuckDB VSS, ClickHouse ANN, RediSearch, Real-time changefeeds)


vectorwrap 0.9.0


Universal vector search wrapper for Postgres, MySQL, SQLite, DuckDB, ClickHouse (pgvector, HeatWave, sqlite-vss, DuckDB VSS, ClickHouse ANN).

Switch between PostgreSQL, MySQL, SQLite, DuckDB, and ClickHouse vector backends by changing a single line of code: the connection string. This suits prototyping, testing, and production deployments.

Stable API - Core methods follow semantic versioning with backward compatibility guarantees.

Quick Start

# Core install (PostgreSQL + MySQL support)
pip install vectorwrap

# Add SQLite support (requires system SQLite with extension support)
pip install "vectorwrap[sqlite]"

# Add DuckDB support (includes VSS extension)
pip install "vectorwrap[duckdb]"

# Add ClickHouse support (includes clickhouse-connect)
pip install "vectorwrap[clickhouse]"

# Install all backends for development
pip install "vectorwrap[sqlite,duckdb,clickhouse]"

from vectorwrap import VectorDB

# Your embedding function (use OpenAI, Hugging Face, etc.)
def embed(text: str) -> list[float]:
    # Stub: replace the body with a real model call that returns a 1536-dim vector
    return [0.1, 0.2, ...]

# Connect to any supported database
db = VectorDB("postgresql://user:pass@host/db")  # or mysql://... or sqlite:///path.db or duckdb:///path.db or clickhouse://...
db.create_collection("products", dim=1536)

# Insert vectors with metadata
db.upsert("products", 1, embed("Apple iPhone 15 Pro"), {"category": "phone", "price": 999})
db.upsert("products", 2, embed("Samsung Galaxy S24"), {"category": "phone", "price": 899})

# Semantic search with filtering
results = db.query(
    collection="products",
    query_vector=embed("latest smartphone"),
    top_k=5,
    filter={"category": "phone"}
)
print(results)  # → [(1, 0.023), (2, 0.087)]

Supported Backends

| Database | Vector Type | Indexing | Installation | Notes |
|---|---|---|---|---|
| PostgreSQL 16+ + pgvector | VECTOR(n) | HNSW | CREATE EXTENSION vector; | Production ready |
| MySQL 8.2+ HeatWave | VECTOR(n) | Automatic | Built-in | Native vector support |
| MySQL ≤8.0 (legacy) | JSON arrays | None | Built-in | Slower, Python distance |
| MariaDB 11.8+ GA LTS | VECTOR(n) | HNSW | Built-in | Native vectors, 10M+ users |
| MariaDB <11.8 (legacy) | JSON arrays | None | Built-in | Auto-fallback, Python distance |
| SQLite + sqlite-vss | Virtual table | HNSW | pip install "vectorwrap[sqlite]" | Great for prototyping |
| DuckDB + VSS | FLOAT[] arrays | HNSW | pip install "vectorwrap[duckdb]" | Analytics + vectors |
| ClickHouse | Array(Float32) | HNSW | pip install "vectorwrap[clickhouse]" | High-performance analytics |
| Redis + RediSearch | Binary vectors | HNSW/FLAT | pip install "vectorwrap[redis]" | Ultra-fast in-memory search |
| RethinkDB | JSON arrays | In-memory HNSW | pip install "vectorwrap[rethinkdb]" | WORLD'S FIRST: Real-time changefeeds |

Examples

Complete Example with OpenAI Embeddings

from openai import OpenAI
from vectorwrap import VectorDB

client = OpenAI()

def embed(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Use any database - just change the connection string!
db = VectorDB("postgresql://user:pass@localhost/vectors")
db.create_collection("documents", dim=1536)

# Add some documents
documents = [
    ("Python is a programming language", {"topic": "programming"}),
    ("Machine learning uses neural networks", {"topic": "ai"}),
    ("Databases store structured data", {"topic": "data"}),
]

for i, (text, metadata) in enumerate(documents):
    db.upsert("documents", i, embed(text), metadata)

# Search for similar content
query = "What is artificial intelligence?"
results = db.query("documents", embed(query), top_k=2)

for doc_id, distance in results:
    print(f"Document {doc_id}: distance={distance:.3f}")

Database-Specific Connection Strings

# PostgreSQL with pgvector
db = VectorDB("postgresql://user:password@localhost:5432/mydb")

# MySQL (8.2+ with native vectors or legacy JSON mode)  
db = VectorDB("mysql://user:password@localhost:3306/mydb")

# SQLite (local file or in-memory)
db = VectorDB("sqlite:///./vectors.db")
db = VectorDB("sqlite:///:memory:")

# DuckDB (local file or in-memory)
db = VectorDB("duckdb:///./vectors.db")
db = VectorDB("duckdb:///:memory:")

# ClickHouse (local or remote)
db = VectorDB("clickhouse://default@localhost:8123/default")
db = VectorDB("clickhouse://user:password@host:port/database")
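
The scheme at the front of each connection string is what tells one backend from another. As a rough illustration, the standard library can extract that scheme so an application can validate a URL before handing it to VectorDB. This is not vectorwrap's own dispatch code; the `backend_for` helper and the `KNOWN_SCHEMES` set are hypothetical names introduced here.

```python
from urllib.parse import urlsplit

# Schemes that appear in this README's connection strings (an assumption
# for this sketch, not a list exported by vectorwrap).
KNOWN_SCHEMES = {"postgresql", "mysql", "mariadb", "sqlite", "duckdb", "clickhouse"}


def backend_for(url: str) -> str:
    """Return the URL scheme, or raise ValueError if it is not a listed one."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unsupported connection string scheme: {scheme!r}")
    return scheme


print(backend_for("postgresql://user:password@localhost:5432/mydb"))
print(backend_for("sqlite:///:memory:"))
```

Checking the scheme up front gives a clear error for a mistyped URL instead of a failure deeper in a database driver.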

API Reference

VectorDB(connection_string: str) - Stable

Create a vector database connection.

create_collection(name: str, dim: int) - Stable

Create a new collection for vectors of dimension dim.

upsert(collection: str, id: int, vector: list[float], metadata: dict = None) - Stable

Insert or update a vector with optional metadata.

query(collection: str, query_vector: list[float], top_k: int = 5, filter: dict = None) - Stable

Find the top_k most similar vectors. Returns a list of (id, distance) tuples.

Filtering Support:

  • PostgreSQL & MySQL: Native SQL filtering
  • SQLite: Adaptive oversampling (fetches more results, then filters)
  • DuckDB: Native JSON filtering with SQL predicates
  • ClickHouse: Native JSON filtering with JSONExtract functions
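
The oversampling idea in the SQLite bullet can be sketched in plain Python. This is a minimal illustration under stated assumptions, not vectorwrap's implementation: `filtered_top_k`, the `search` callable, the doubling strategy, and the `factor` and `max_fetch` parameters are all hypothetical.

```python
from typing import Callable

Hit = tuple[int, float]  # (id, distance)


def filtered_top_k(
    search: Callable[[int], list[Hit]],
    metadata: dict[int, dict],
    flt: dict,
    top_k: int,
    factor: int = 4,
    max_fetch: int = 10_000,
) -> list[Hit]:
    """Oversample an unfiltered search until top_k hits pass the filter."""
    fetch = top_k * factor
    while True:
        hits = search(fetch)  # nearest `fetch` rows, ignoring the filter
        matches = [
            (doc_id, dist)
            for doc_id, dist in hits
            if all(metadata[doc_id].get(k) == v for k, v in flt.items())
        ]
        # Stop when enough matches were found, the index ran out of rows,
        # or the fetch budget is spent.
        if len(matches) >= top_k or len(hits) < fetch or fetch >= max_fetch:
            return matches[:top_k]
        fetch = min(fetch * 2, max_fetch)


# Toy data: ten rows already sorted by distance; only even ids are phones.
rows = [(i, i / 10) for i in range(10)]
meta = {i: {"category": "phone" if i % 2 == 0 else "laptop"} for i in range(10)}
result = filtered_top_k(lambda k: rows[:k], meta, {"category": "phone"}, top_k=3, factor=1)
print(result)
```

The trade-off is that a very selective filter forces several rounds of ever larger fetches, which is why backends with native SQL filtering avoid this path.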

API Stability

vectorwrap follows semantic versioning and maintains API stability:

Stable APIs (No breaking changes in minor versions)

  • Core Interface: VectorDB() constructor and connection string format
  • Collection Management: create_collection(name, dim)
  • Data Operations: upsert(collection, id, vector, metadata) and query(collection, query_vector, top_k, filter)
  • Return Formats: Query results as [(id, distance), ...] tuples

Evolving APIs (May change in minor versions with deprecation warnings)

  • Backend-specific optimizations: Index configuration, distance metrics
  • Advanced filtering: Complex filter syntax beyond simple key-value pairs
  • Batch operations: Bulk insert/update methods (planned)
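
Until bulk methods exist, a small loop over the stable upsert call covers the same ground. The sketch below is an assumption-laden illustration: `upsert_many` is a hypothetical helper, not part of vectorwrap, and `RecordingDB` is a stand-in used only so the example runs without a database.

```python
from typing import Iterable, Optional

Item = tuple[int, list[float], Optional[dict]]  # (id, vector, metadata)


def upsert_many(db, collection: str, items: Iterable[Item]) -> int:
    """Call db.upsert once per item and return how many items were written."""
    count = 0
    for item_id, vector, metadata in items:
        db.upsert(collection, item_id, vector, metadata)
        count += 1
    return count


class RecordingDB:
    """Hypothetical stand-in that records upsert calls for the demo."""

    def __init__(self):
        self.calls = []

    def upsert(self, collection, id, vector, metadata=None):
        self.calls.append((collection, id, vector, metadata))


fake = RecordingDB()
written = upsert_many(
    fake,
    "products",
    [(1, [0.1, 0.2], {"category": "phone"}), (2, [0.3, 0.4], None)],
)
print(written)
```

Because the helper relies only on the documented upsert signature, the same call should work with a real VectorDB instance in place of `fake`.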

Experimental (May change without notice)

  • New backends: Recently added database support may have API refinements
  • Extension methods: Database-specific functionality not in core API

Version Compatibility Promise

  • Patch versions (0.3.1 → 0.3.2): Only bug fixes, no API changes
  • Minor versions (0.3.x → 0.4.0): New features, deprecated APIs get warnings
  • Major versions (0.x → 1.0): Breaking changes allowed, migration guide provided
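
To make the three tiers concrete, the sketch below classifies a version change by which component moved. `bump_kind` is a hypothetical helper written for this README, and it assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release suffixes.

```python
def bump_kind(old: str, new: str) -> str:
    """Classify a version change as 'major', 'minor', 'patch' or 'none'."""
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions of the form MAJOR.MINOR.PATCH")
    if new_parts[0] != old_parts[0]:
        return "major"  # breaking changes allowed
    if new_parts[1] != old_parts[1]:
        return "minor"  # new features, deprecation warnings
    if new_parts[2] != old_parts[2]:
        return "patch"  # bug fixes only
    return "none"


print(bump_kind("0.3.1", "0.3.2"))
print(bump_kind("0.3.2", "0.4.0"))
print(bump_kind("0.9.0", "1.0.0"))
```

A deployment script could use a check like this to apply patch upgrades automatically while holding minor and major upgrades for review.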

Current Status: v0.9.0 - Stable release with API backward compatibility guarantees

Installation Notes

SQLite Setup

SQLite support requires loadable extensions. On some systems you may need:

# macOS with Homebrew
brew install sqlite
export LDFLAGS="-L$(brew --prefix sqlite)/lib"
export CPPFLAGS="-I$(brew --prefix sqlite)/include"
pip install "vectorwrap[sqlite]"

# Or use system package manager
# Ubuntu: apt install libsqlite3-dev
# CentOS: yum install sqlite-devel

PostgreSQL Setup

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

MySQL Setup

MySQL 8.2+ has native VECTOR type support. For older versions, vectorwrap automatically falls back to JSON storage with Python-based distance calculations.
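
The fallback pattern, JSON storage plus distance computed in Python, looks roughly like the sketch below. It is an illustration only: it uses the standard library's sqlite3 as a stand-in for a MySQL table, the table and column names are made up, and Euclidean distance is an assumed metric rather than one this README specifies.

```python
import json
import math
import sqlite3

# In-memory SQLite stands in for a MySQL table with a JSON vector column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, emb TEXT)")
conn.executemany(
    "INSERT INTO products (id, emb) VALUES (?, ?)",
    [
        (1, json.dumps([1.0, 0.0])),
        (2, json.dumps([0.0, 1.0])),
        (3, json.dumps([0.9, 0.1])),
    ],
)


def top_k(query: list[float], k: int) -> list[tuple[int, float]]:
    """Scan every row, decode its JSON vector, and rank by Euclidean distance."""
    scored = [
        (row_id, math.dist(query, json.loads(emb)))
        for row_id, emb in conn.execute("SELECT id, emb FROM products")
    ]
    return sorted(scored, key=lambda pair: pair[1])[:k]


nearest = top_k([1.0, 0.0], k=2)
print(nearest)
```

Every query reads and decodes the whole table, so cost grows linearly with row count. That is the reason the legacy mode is slower than a native VECTOR column.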

MariaDB Setup

MariaDB 11.8 GA LTS introduced a native VECTOR data type with HNSW indexing, similar to pgvector. For older versions, vectorwrap automatically falls back to JSON storage.

# MariaDB 11.8+ (native VECTOR support)
db = VectorDB("mariadb://user:pass@localhost:3306/vectordb")
db.create_collection("embeddings", dim=1536)  # Uses VECTOR(1536) with HNSW

# Older versions automatically use JSON fallback
# No code changes needed - version detection is automatic

DuckDB Setup

DuckDB has offered the VSS extension since v0.10.2. The extension provides HNSW indexing for fast vector similarity search:

# Works out of the box with vectorwrap[duckdb]
db = VectorDB("duckdb:///analytics.db")
db.create_collection("embeddings", dim=1536)  # Auto-creates HNSW index

ClickHouse Setup

ClickHouse provides native support for vector similarity search using ANN indexes:

# Works with vectorwrap[clickhouse]
db = VectorDB("clickhouse://default@localhost:8123/default")
db.create_collection("embeddings", dim=1536)  # Auto-creates HNSW index

Note: ClickHouse vector similarity indexes require ClickHouse version 25.8+ with the experimental feature enabled. The backend automatically handles this configuration.

Use Cases

  • Prototyping: Start with SQLite or DuckDB, scale to PostgreSQL or ClickHouse
  • Testing: Use in-memory databases (SQLite/DuckDB) for fast tests
  • Analytics: DuckDB or ClickHouse for combining vector search with analytical queries
  • Multi-tenant: Different customers on different database backends
  • Migration: Move vector data between database systems seamlessly
  • Hybrid deployments: PostgreSQL for production, DuckDB/ClickHouse for analytics
  • High-performance: ClickHouse for large-scale vector search workloads
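
Several of these use cases come down to choosing the connection string per environment. One way to do that, sketched here under assumptions, is to read it from an environment variable and default to an in-memory database for tests. The variable name `VECTOR_DB_URL` and the `vector_db_url` helper are hypothetical, not something vectorwrap defines.

```python
import os
from typing import Mapping, Optional

# In-memory SQLite URL taken from the connection string examples in this README.
DEFAULT_URL = "sqlite:///:memory:"


def vector_db_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return VECTOR_DB_URL from the environment, or the in-memory default."""
    env = os.environ if env is None else env
    return env.get("VECTOR_DB_URL", DEFAULT_URL)


print(vector_db_url({}))
print(vector_db_url({"VECTOR_DB_URL": "postgresql://user:pass@host/db"}))
```

Passing the returned string to `VectorDB(...)` keeps the rest of the application code identical across test, staging, and production.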

Integrations

vectorwrap integrates with popular AI frameworks and platforms:

  • Appwrite: Add AI/vector capabilities to Appwrite apps (uses MariaDB backend) - Testing Guide
  • LangChain: Drop-in VectorStore adapter for RAG pipelines
  • LlamaIndex: VectorStore wrapper for data frameworks
  • Supabase: Managed PostgreSQL + pgvector helper
  • Milvus: Enterprise vector database adapter
  • Qdrant: Cloud-native vector search integration
  • Weaviate: Production-scale vector database integration

# Install with integrations
pip install "vectorwrap[langchain]"
pip install "vectorwrap[llamaindex]"
pip install "vectorwrap[milvus]"
pip install "vectorwrap[qdrant]"
pip install "vectorwrap[weaviate]"

Example with Appwrite (No External Vector DB Needed):

from vectorwrap.integrations.appwrite import AppwriteVectorStore

# Add vector search to your Appwrite app
# (embed_fn below is your own embedding function: text in, list of floats out)
vector_store = AppwriteVectorStore.from_connection_string(
    connection_url="mariadb://appwrite:password@localhost:3306/appwrite",
    collection_name="embeddings",
    dimension=1536
)

# Store and search vectors in Appwrite's MariaDB
vector_store.add_documents([
    {"text": "Hello world", "metadata": {"source": "doc1"}}
], embedding_function=embed_fn)

results = vector_store.search("greeting", embed_fn, top_k=5)

Example with LangChain:

# Current LangChain releases ship OpenAIEmbeddings in the langchain-openai package
from langchain_openai import OpenAIEmbeddings
from vectorwrap.integrations.langchain import VectorwrapStore

embeddings = OpenAIEmbeddings()
vectorstore = VectorwrapStore(
    connection_url="postgresql://user:pass@localhost/db",
    collection_name="documents",
    embedding_function=embeddings
)

vectorstore.add_texts(["Hello world", "LangChain + vectorwrap"])
results = vectorstore.similarity_search("greeting", k=5)

Example with Weaviate:

from vectorwrap.integrations.weaviate import WeaviateBackend

# Connect to Weaviate (local or cloud)
db = WeaviateBackend(url="http://localhost:8080")
# Or cloud: WeaviateBackend(url="https://xxx.weaviate.network", api_key="your-key")

# Create collection and insert vectors
# (embedding_vector and query_vector are your own 1536-dim lists of floats)
db.create_collection("documents", dim=1536)
db.upsert("documents", 1, embedding_vector, {"source": "doc1"})

# Query with metadata filters
results = db.query("documents", query_vector, top_k=10, filter={"source": "doc1"})

See docs/INTEGRATIONS.md for complete integration guide.

Benchmarks

Comprehensive performance benchmarks are available in the bench/ directory.

Quick benchmark:

pip install "vectorwrap[all]" matplotlib
python bench/benchmark.py
python bench/visualize.py benchmark_results.json

See bench/README.md for detailed benchmarking guide.

Roadmap

v1.0 Stable Release

  • API Freeze: Lock stable APIs with full backward compatibility
  • Production Testing: Comprehensive benchmarks across all backends [DONE]
  • Documentation: Complete API docs and migration guides

Future Features

  • Elasticsearch with dense vector fields
  • Batch operations for bulk inserts
  • Index configuration options
  • Distance metrics: Cosine, dot product, custom functions

License

MIT © 2025 Mihir Ahuja


If vectorwrap saved you time, please star the repo – it helps others discover it!

