Cross-platform vector database engine with pluggable adapters
CrossVector
Cross-platform Vector Database Engine
A flexible, production-ready vector database engine with pluggable adapters for multiple vector databases (AstraDB, ChromaDB, Milvus, PGVector) and embedding providers (OpenAI, Gemini, and more).
Simplify your vector search infrastructure with a single, unified API across the supported vector databases.
Features
- Pluggable Architecture: Easy adapter pattern for both databases and embeddings
- Multiple Vector Databases: AstraDB, ChromaDB, Milvus, PGVector
- Multiple Embedding Providers: OpenAI (Gemini coming soon)
- Install Only What You Need: Optional dependencies per adapter
- Type-Safe: Full Pydantic validation
- Consistent API: Same interface across all adapters
Supported Vector Databases
| Database | Status | Features |
|---|---|---|
| AstraDB | ✅ Production | Cloud-native Cassandra, lazy initialization |
| ChromaDB | ✅ Production | Cloud/HTTP/Local modes, auto-fallback |
| Milvus | ✅ Production | Auto-indexing, schema validation |
| PGVector | ✅ Production | PostgreSQL extension, JSONB metadata |
Supported Embedding Providers
| Provider | Status | Models |
|---|---|---|
| OpenAI | ✅ Production | text-embedding-3-small, 3-large, ada-002 |
| Gemini | ✅ Production | text-embedding-004, gemini-embedding-001 |
Installation
Minimal (core only)
pip install crossvector
With specific adapters
# AstraDB + OpenAI (quotes keep shells such as zsh from expanding the brackets)
pip install "crossvector[astradb,openai]"
# ChromaDB + OpenAI
pip install "crossvector[chromadb,openai]"
# All databases + OpenAI
pip install "crossvector[all-dbs,openai]"
# Everything
pip install "crossvector[all]"
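Because each adapter's dependencies are optional, it can help to check which client libraries are importable before constructing an adapter. The sketch below uses only the standard library. The module names in `ASSUMED_CLIENT_MODULES` are assumptions about what each extra installs, not something this README states, so adjust them to match your environment.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def available_modules(module_names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be imported."""
    return {name: find_spec(name) is not None for name in module_names}


# Assumed import names for the client libraries behind each extra.
ASSUMED_CLIENT_MODULES = ["astrapy", "chromadb", "pymilvus", "pgvector", "openai"]

if __name__ == "__main__":
    for name, present in available_modules(ASSUMED_CLIENT_MODULES).items():
        print(f"{name}: {'installed' if present else 'missing'}")
```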
Quick Start
from crossvector import VectorEngine, Document, UpsertRequest, SearchRequest
from crossvector.embeddings.openai import OpenAIEmbeddingAdapter
from crossvector.dbs.astradb import AstraDBAdapter
# Initialize engine
engine = VectorEngine(
    embedding_adapter=OpenAIEmbeddingAdapter(model_name="text-embedding-3-small"),
    db_adapter=AstraDBAdapter(),
    collection_name="my_documents",
)
# Upsert documents
docs = [
    Document(id="doc1", text="The quick brown fox", metadata={"category": "animals"}),
    Document(id="doc2", text="Artificial intelligence", metadata={"category": "tech"}),
]
result = engine.upsert(UpsertRequest(documents=docs))
print(f"Inserted {result['count']} documents")
# Search
results = engine.search(SearchRequest(query="AI and ML", limit=5))
for doc in results:
    print(f"Score: {doc.get('$similarity', 'N/A')}, Text: {doc.get('text')}")
# Get document by ID
doc = engine.get("doc1")
# Count documents
count = engine.count()
# Delete documents
engine.delete_one("doc1")
engine.delete_many(["doc2", "doc3"])
Configuration
Environment Variables
Create a .env file:
# OpenAI (for embeddings)
OPENAI_API_KEY=sk-...
# AstraDB
ASTRA_DB_APPLICATION_TOKEN=AstraCS:...
ASTRA_DB_API_ENDPOINT=https://...
ASTRA_DB_COLLECTION_NAME=my_collection
# ChromaDB Cloud
CHROMA_API_KEY=...
CHROMA_CLOUD_TENANT=...
CHROMA_CLOUD_DATABASE=...
# Milvus
MILVUS_API_ENDPOINT=https://...
MILVUS_USER=...
MILVUS_PASSWORD=...
# PGVector
PGVECTOR_HOST=localhost
PGVECTOR_PORT=5432
PGVECTOR_DBNAME=vectordb
PGVECTOR_USER=postgres
PGVECTOR_PASSWORD=...
# Vector metric (cosine, dot_product, euclidean)
VECTOR_METRIC=cosine
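The three values accepted by `VECTOR_METRIC` correspond to standard vector comparisons. As a reference for what each name means, here is a minimal pure-Python sketch of the textbook definitions. It is illustrative only: how each backend computes or normalizes its scores is not described here and may differ.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors after scaling each to unit length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Note that cosine and dot product rank higher values as closer, while Euclidean distance ranks lower values as closer.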
Database-Specific Examples
AstraDB
from crossvector.dbs.astradb import AstraDBAdapter
adapter = AstraDBAdapter()
adapter.initialize(
    collection_name="my_collection",
    embedding_dimension=1536,
    metric="cosine",
)
ChromaDB
from crossvector.dbs.chroma import ChromaDBAdapter
# Local mode
adapter = ChromaDBAdapter()
# Cloud mode (auto-detected from env vars)
# CHROMA_API_KEY, CHROMA_CLOUD_TENANT, CHROMA_CLOUD_DATABASE
adapter = ChromaDBAdapter()
adapter.initialize(
    collection_name="my_collection",
    embedding_dimension=1536,
)
Milvus
from crossvector.dbs.milvus import MilvusDBAdapter
adapter = MilvusDBAdapter()
adapter.initialize(
    collection_name="my_collection",
    embedding_dimension=1536,
    metric="cosine",
)
PGVector
from crossvector.dbs.pgvector import PGVectorAdapter
adapter = PGVectorAdapter()
adapter.initialize(
    table_name="my_vectors",
    embedding_dimension=1536,
    metric="cosine",
)
Custom Adapters
Create Custom Database Adapter
from crossvector.abc import VectorDBAdapter
from typing import Any, Dict, List, Set
class MyCustomDBAdapter(VectorDBAdapter):
    def initialize(self, collection_name: str, embedding_dimension: int, metric: str = "cosine"):
        # Your implementation
        pass

    def get_collection(self, collection_name: str, embedding_dimension: int, metric: str = "cosine"):
        # Your implementation
        pass

    def upsert(self, documents: List[Dict[str, Any]]):
        # Your implementation
        pass

    def search(self, vector: List[float], limit: int, fields: Set[str]) -> List[Dict[str, Any]]:
        # Your implementation
        pass

    def get(self, id: str) -> Dict[str, Any] | None:
        # Your implementation
        pass

    def count(self) -> int:
        # Your implementation
        pass

    def delete_one(self, id: str) -> int:
        # Your implementation
        pass

    def delete_many(self, ids: List[str]) -> int:
        # Your implementation
        pass
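To make the method contract above more concrete, here is a minimal in-memory sketch with the same method names. It is a hypothetical stand-in, not part of CrossVector: it deliberately does not subclass `VectorDBAdapter` so that it runs without the package installed, it omits `get_collection`, it always scores with cosine similarity, and the choice to always return `_id` in search hits is an assumption. In a real adapter you would inherit from the base class and delegate to your database client.

```python
import math
from typing import Any, Dict, List, Optional, Set


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryAdapter:
    """Hypothetical brute-force adapter that keeps documents in a dict."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def initialize(self, collection_name: str, embedding_dimension: int, metric: str = "cosine") -> None:
        self.collection_name = collection_name
        self.embedding_dimension = embedding_dimension
        self.metric = metric  # stored only; this sketch always uses cosine

    def upsert(self, documents: List[Dict[str, Any]]) -> None:
        # Insert new documents and overwrite existing ones with the same _id.
        for doc in documents:
            self._docs[doc["_id"]] = dict(doc)

    def search(self, vector: List[float], limit: int, fields: Set[str]) -> List[Dict[str, Any]]:
        hits = []
        for doc in self._docs.values():
            # Return only the requested fields, plus the ID and the score.
            hit = {k: v for k, v in doc.items() if k in fields or k == "_id"}
            hit["$similarity"] = _cosine(vector, doc["$vector"])
            hits.append(hit)
        hits.sort(key=lambda h: h["$similarity"], reverse=True)
        return hits[:limit]

    def get(self, id: str) -> Optional[Dict[str, Any]]:
        return self._docs.get(id)

    def count(self) -> int:
        return len(self._docs)

    def delete_one(self, id: str) -> int:
        # Number of documents actually removed (0 or 1).
        return 1 if self._docs.pop(id, None) is not None else 0

    def delete_many(self, ids: List[str]) -> int:
        return sum(self.delete_one(i) for i in ids)
```

A stand-in like this can be handy for unit tests that should not depend on a running database.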
Create Custom Embedding Adapter
from crossvector.abc import EmbeddingAdapter
from typing import List
class MyCustomEmbeddingAdapter(EmbeddingAdapter):
    def __init__(self, model_name: str):
        super().__init__(model_name)
        # Initialize your client

    @property
    def embedding_dimension(self) -> int:
        return 768  # Your model's dimension

    def get_embeddings(self, texts: List[str]) -> List[List[float]]:
        # Your implementation
        pass
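As an illustration of what `get_embeddings` and `embedding_dimension` need to provide, the sketch below implements a toy, deterministic embedder using token hashing. The class name `HashingEmbedding` is hypothetical, and it does not subclass `EmbeddingAdapter` so that it runs standalone. The vectors it produces carry no semantic meaning; it is only meant for offline tests of adapter wiring, not for real search quality.

```python
import hashlib
from typing import List


class HashingEmbedding:
    """Hypothetical toy embedder: counts tokens into hashed buckets."""

    def __init__(self, dimension: int = 8) -> None:
        if dimension <= 0:
            raise ValueError("dimension must be positive")
        self._dimension = dimension

    @property
    def embedding_dimension(self) -> int:
        return self._dimension

    def get_embeddings(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, in the same order.
        return [self._embed(text) for text in texts]

    def _embed(self, text: str) -> List[float]:
        vector = [0.0] * self._dimension
        for token in text.lower().split():
            # SHA-256 is stable across runs, unlike Python's built-in hash().
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self._dimension
            vector[index] += 1.0
        return vector
```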
Document Format
All adapters expect documents in this standard format:
{
    "_id": "unique-doc-id",           # Document ID (string)
    "$vector": [0.1, 0.2, ...],       # Embedding vector (List[float])
    "text": "original text content",  # Original text
    "any_field": "value",             # Additional metadata fields
    "another_field": 123,
}
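When writing a custom adapter or test fixture, a small helper can assemble dictionaries in this shape. The function below is a hypothetical sketch, not a CrossVector API. It assumes metadata fields are flattened to the top level, as the example above suggests, and it rejects metadata keys that would overwrite the three standard fields.

```python
from typing import Any, Dict, List, Optional

# Keys with a fixed meaning in the standard document format.
RESERVED_KEYS = {"_id", "$vector", "text"}


def build_document(
    doc_id: str,
    vector: List[float],
    text: str,
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a dict in the standard format, with metadata at the top level."""
    metadata = metadata or {}
    clash = RESERVED_KEYS & metadata.keys()
    if clash:
        raise ValueError(f"metadata uses reserved keys: {sorted(clash)}")
    if not vector:
        raise ValueError("vector must not be empty")
    doc: Dict[str, Any] = {
        "_id": doc_id,
        "$vector": [float(x) for x in vector],
        "text": text,
    }
    doc.update(metadata)
    return doc
```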
Development
# Clone repository
git clone https://github.com/thewebscraping/crossvector.git
cd crossvector
# Install with dev dependencies
pip install -e ".[all,dev]"
# Run tests
pytest
# Run linting
ruff check .
# Format code
ruff format .
# Setup pre-commit hooks
pre-commit install
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=. --cov-report=html
# Run specific adapter tests
pytest tests/test_gemini_embeddings.py
pytest tests/test_openai_embeddings.py
License
MIT License - see LICENSE file for details
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Roadmap
- Gemini embedding adapter
- Qdrant adapter (not supported yet)
- Pinecone adapter (not supported yet)
- Weaviate adapter (not supported yet)
- Async support
- Batch operations optimization
- Advanced filtering
- Hybrid search (vector + keyword)
- Rerank support (planned)
- Additional embedding providers (e.g., Cohere, Mistral, Ollama)
Support
For issues and questions:
- GitHub Issues: https://github.com/thewebscraping/crossvector/issues
- Email: thetwofarm@gmail.com