Where vectors come alive - A lightweight, visual-first vector database with embedded ML models


VectrixDB

Where vectors come alive.

A lightweight vector database with embedded ML models, a built-in web dashboard, and GraphRAG support. The bundled models need no API keys.

Features

4 Search Modes - Dense, Hybrid, Ultimate, and Graph (GraphRAG)
8 Storage Backends - Memory, SQLite, Lakebase, DeltaLake, CosmosDB, PostgreSQL, OpenSearch, Aurora PostgreSQL
Embedded Models - Works offline with bundled ONNX models
Model Selection - Choose from bundled, HuggingFace, or GitHub release models
Document Index - Hierarchical document storage with chunking
Visual Dashboard - Built-in web UI for managing collections
Zero Config - Just pip install and start using it

Installation

From PyPI (Recommended)

pip install vectrixdb

From GitHub (Latest)

pip install git+https://github.com/knowusuboaky/VectrixDB.git

Specific Version from GitHub

pip install git+https://github.com/knowusuboaky/VectrixDB.git@v2.0.0

From Source

git clone https://github.com/knowusuboaky/VectrixDB.git
cd VectrixDB
pip install -e .

Optional Dependencies

# HuggingFace sentence-transformers
pip install vectrixdb[hf]

# FastEmbed (lightweight ONNX embeddings)
pip install vectrixdb[fastembed]

# All embedding providers
pip install vectrixdb[embeddings]

# Visualization (UMAP)
pip install vectrixdb[viz]

# Everything
pip install vectrixdb[all]

Quick Start

from vectrixdb import Vectrix

db = Vectrix("my_docs")
db.add(["Python is great", "JavaScript powers the web", "Rust is fast"])

results = db.search("programming")
print(results.top.text)

Search Modes

VectrixDB offers 4 search modes, each building on the previous:

Mode	Components	Best For
`dense`	Vector similarity	Fast semantic search
`hybrid`	Dense + Sparse + Reranker	Keyword + semantic matching
`ultimate`	Hybrid + ColBERT	Maximum accuracy
`graph`	Ultimate + Knowledge Graph	Complex reasoning (GraphRAG)

# Choose your mode
db = Vectrix("docs", mode="dense")     # Fastest
db = Vectrix("docs", mode="hybrid")    # Balanced
db = Vectrix("docs", mode="ultimate")  # Best quality
db = Vectrix("docs", mode="graph")     # GraphRAG
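To make the difference between `dense` and `hybrid` concrete, here is a minimal pure-Python sketch of the two ideas behind them: ranking by cosine similarity, and merging a dense ranking with a keyword ranking. The README does not say how VectrixDB combines scores, so the reciprocal rank fusion below is a common stand-in technique, not the library's method. The helper names, the toy 2-dimensional vectors, and the keyword ranking are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dense_rank(query_vec, doc_vecs):
    """Return document ids ordered by descending cosine similarity."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy embeddings (hypothetical values, real models produce hundreds of dimensions)
doc_vecs = {"python": [0.9, 0.1], "javascript": [0.6, 0.4], "rust": [0.1, 0.9]}
dense = dense_rank([1.0, 0.0], doc_vecs)
sparse = ["rust", "python", "javascript"]  # hypothetical keyword-based ranking
fused = rrf_fuse([dense, sparse])
print(dense, fused)
```

Fusing by rank rather than by raw score avoids having to normalize dense and sparse scores onto the same scale, which is why it is a popular default for hybrid search.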

Model Selection

Customize models for each component. Models load from 3 sources:

1. Bundled Models (Offline, No Downloads)

Pre-packaged ONNX models that work without internet (~100MB total):

db = Vectrix(
    "docs",
    mode="ultimate",
    dense_model="e5-small",            # 384 dim, 33MB
    sparse_model="bm25",               # 1MB
    reranker_model="L12",              # 33MB
    late_interaction_model="colbert",  # 33MB
)

Component	Alias	Model	Dimension	Size
Dense	`e5-small`	intfloat/e5-small-v2	384	33MB
Sparse	`bm25`	BM25 vocabulary	-	1MB
Reranker	`L12`	ms-marco-MiniLM-L12-v2	-	33MB
ColBERT	`colbert`	answerai-colbert-small-v1	128	33MB

2. GitHub Release Models (Auto-Downloaded)

Larger models hosted on GitHub releases (downloaded on first use):

db = Vectrix(
    "docs",
    mode="ultimate",
    dense_model="bge-base",            # 768 dim, higher quality
    sparse_model="bm25",
    reranker_model="bge-reranker",     # Higher quality
    late_interaction_model="colbert-v2",
)

Alias	Model	Dimension	Size
`bge-base`	BAAI/bge-base-en-v1.5	768	110MB
`bge-small`	BAAI/bge-small-en-v1.5	384	127MB
`bge-reranker`	BAAI/bge-reranker-base	-	212MB
`colbert-v2`	colbert-ir/colbertv2.0	128	67MB
`splade`	SPLADE++	-	508MB

3. HuggingFace Models

Use any compatible model from HuggingFace (requires pip install vectrixdb[hf]):

db = Vectrix(
    "docs",
    mode="hybrid",
    dense_model="BAAI/bge-large-en-v1.5",
    sparse_model="naver/splade-cocondenser-ensembledistil",
    reranker_model="cross-encoder/ms-marco-MiniLM-L-12-v2",
)

Compatible models:

Dense: BAAI/bge-large-en-v1.5, intfloat/e5-large-v2, sentence-transformers/all-mpnet-base-v2
Sparse: naver/splade-cocondenser-ensembledistil
Reranker: cross-encoder/ms-marco-MiniLM-L-12-v2, BAAI/bge-reranker-base
ColBERT: jinaai/jina-colbert-v2, colbert-ir/colbertv2.0

Storage Backends

VectrixDB supports 8 storage backends:

Backend	Type	Persistence	Modes	Best For
`memory`	In-Memory	No	All	Testing, small datasets
`sqlite`	File-based	Yes	All	Local development
`lakebase`	PostgreSQL + pgvector	Yes	All	Databricks Lakebase
`delta_lake`	Delta Lake	Yes	All	Databricks Unity Catalog
`cosmosdb`	Azure CosmosDB	Yes	All	Azure cloud
`postgresql`	PostgreSQL + pgvector	Yes	All	Self-hosted PostgreSQL
`opensearch`	AWS OpenSearch	Yes	Dense, Hybrid	AWS managed search
`aurora_postgresql`	AWS Aurora + pgvector	Yes	All	AWS managed PostgreSQL
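The Modes column can be checked before a deployment is configured. The sketch below copies the table into a plain dictionary and validates a backend and mode pair. It is independent of VectrixDB, and the names `SUPPORTED_MODES` and `check_mode` are hypothetical.

```python
ALL_MODES = frozenset({"dense", "hybrid", "ultimate", "graph"})

# Supported modes per backend, copied from the table above
SUPPORTED_MODES = {
    "memory": ALL_MODES,
    "sqlite": ALL_MODES,
    "lakebase": ALL_MODES,
    "delta_lake": ALL_MODES,
    "cosmosdb": ALL_MODES,
    "postgresql": ALL_MODES,
    "opensearch": frozenset({"dense", "hybrid"}),
    "aurora_postgresql": ALL_MODES,
}

def check_mode(backend, mode):
    """Raise ValueError when the backend is unknown or does not list the mode."""
    if backend not in SUPPORTED_MODES:
        raise ValueError(f"unknown backend: {backend!r}")
    if mode not in SUPPORTED_MODES[backend]:
        allowed = ", ".join(sorted(SUPPORTED_MODES[backend]))
        raise ValueError(f"{backend} supports only: {allowed}")
    return True

check_mode("sqlite", "graph")       # passes
check_mode("opensearch", "hybrid")  # passes
```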

Memory Storage (Default)

from vectrixdb import VectrixDB, StorageConfig, StorageBackend

# In-memory (default, no persistence)
db = VectrixDB()

# Or explicitly
config = StorageConfig(backend=StorageBackend.MEMORY)
db = VectrixDB(storage_config=config)

SQLite Storage (Local Persistence)

from vectrixdb import VectrixDB

# SQLite with file path
db = VectrixDB(path="./my_vectors")

# Creates: ./my_vectors/vectrix.db

Lakebase Storage (Databricks)

from vectrixdb import Vectrix, VectrixDB

# Connect to Lakebase (PostgreSQL + pgvector)
lakebase = VectrixDB.with_lakebase(
    host="your-lakebase-host.cloud.databricks.com",
    database="databricks_postgres",
    user="your-user",
    password="your-oauth-token",  # OAuth JWT from Lakebase Connect
    port=5432,
    schema="public",  # Optional, defaults to "public"
)

# Use with Vectrix
db = Vectrix(
    "products",
    mode="ultimate",
    storage_backend=lakebase,
)

db.add(texts=["Product A", "Product B"])
results = db.search("query")

Delta Lake Storage (Databricks Unity Catalog)

from vectrixdb import Vectrix, VectrixDB

# Connect to Delta Lake via Databricks SQL
delta = VectrixDB.with_delta_lake(
    workspace_url="https://your-workspace.cloud.databricks.com",
    token="dapi_your_token",
    catalog="main",
    schema="vectrixdb",
    warehouse_id="your_warehouse_id",
)

# Use with Vectrix
db = Vectrix("products", mode="hybrid", storage_backend=delta)

CosmosDB Storage (Azure)

from vectrixdb import VectrixDB, StorageConfig, StorageBackend

config = StorageConfig(
    backend=StorageBackend.COSMOSDB,
    cosmos_endpoint="https://your-account.documents.azure.com:443/",
    cosmos_key="your-primary-key",
    cosmos_database="vectrixdb",
)

db = VectrixDB(storage_config=config)

OpenSearch Storage (AWS)

AWS OpenSearch Serverless with native k-NN vector search.

Note: OpenSearch supports dense and hybrid modes only.

from vectrixdb import VectrixDB

opensearch = VectrixDB.with_opensearch(
    endpoint="https://xxx.us-east-1.aoss.amazonaws.com",
    region="us-east-1",
)

Aurora PostgreSQL Storage (AWS)

AWS Aurora PostgreSQL with pgvector. Supports all modes including ultimate.

from vectrixdb import VectrixDB

aurora = VectrixDB.with_aurora_postgresql(
    host="cluster.xxx.us-east-1.rds.amazonaws.com",
    database="vectrixdb",
    user="admin",
    password="password",
)

Adaptive Schema

Schema adapts based on selected mode:

Mode	Columns Created
`dense`	`id`, `dense_embedding`, `metadata`, `text_content`, `created_at`, `updated_at`
`hybrid`	+ `sparse_embedding`
`ultimate`	+ `late_interaction_embedding`
`graph`	Same as ultimate + graph tables
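Reading the "+" rows as cumulative (each mode keeps the columns of the mode before it), the table can be written as a small function. This is an interpretation of the table rather than VectrixDB's schema code, and `columns_for` is a hypothetical name. The graph tables are not listed in the README, so they are left out.

```python
BASE_COLUMNS = [
    "id", "dense_embedding", "metadata",
    "text_content", "created_at", "updated_at",
]
MODES = ("dense", "hybrid", "ultimate", "graph")

def columns_for(mode):
    """Return the main-table columns for a mode, per the table above."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    columns = list(BASE_COLUMNS)
    if mode in ("hybrid", "ultimate", "graph"):
        columns.append("sparse_embedding")
    if mode in ("ultimate", "graph"):
        columns.append("late_interaction_embedding")
    # "graph" uses the same columns as "ultimate"; its extra graph tables are not modeled here
    return columns

for mode in MODES:
    print(mode, columns_for(mode))
```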

Document Index

Hierarchical document storage with automatic chunking:

from vectrixdb import (
    DocumentIndex,
    build_tree_from_markdown,
    build_tree_from_pdf,
    chunk_text,
    chunk_with_context,
)

# Create document index
doc_index = DocumentIndex("./docs_index")

# Chunk text (simple)
chunks = chunk_text(
    "Your long document text here...",
    chunk_size=1000,
    chunk_overlap=200,
)

# Placeholder inputs for the calls below; substitute your own content
markdown_text = "## Section Title\n\nBody text..."
pdf_path = "./document.pdf"  # placeholder; must point to an existing PDF

# Chunk markdown with context (preserves headings)
chunks = chunk_with_context(
    markdown_text,
    chunk_size=1200,
    chunk_overlap=200,
)
# Returns: [{"content": "...", "heading": "Section Title", "level": 2}, ...]

# Build a tree from markdown or from a PDF
md_tree = build_tree_from_markdown(markdown_text)
pdf_tree = build_tree_from_pdf(pdf_path)
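To show how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window chunker in plain Python. It is an illustration only: the README does not say whether VectrixDB measures chunks in characters or tokens, so this sketch assumes characters, and `sliding_chunks` is a hypothetical name rather than a library function.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last `chunk_overlap` characters of a chunk reappear at the start of the next. That repetition keeps a sentence that straddles a boundary searchable from either side.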

Document Index with Storage Backend

from vectrixdb import DocumentIndex, VectrixDB

# Connect to storage
lakebase = VectrixDB.with_lakebase(...)

# Document index uses storage backend
doc_index = DocumentIndex(storage=lakebase)

# Save a document record
doc_index.save_document({
    "doc_id": "doc_001",
    "title": "My Document",
    "doc_type": "markdown",
    "page_count": 5,
})

# Query documents
docs = doc_index.list_documents()
nodes = doc_index.get_document_nodes("doc_001")

Metadata & Filtering

from vectrixdb import Vectrix

db = Vectrix("products")

db.add(
    texts=["iPhone 15", "Galaxy S24", "Pixel 8"],
    metadata=[
        {"brand": "Apple", "price": 999},
        {"brand": "Samsung", "price": 899},
        {"brand": "Google", "price": 699}
    ]
)

# Filter by metadata
results = db.search("smartphone", filter={"brand": "Apple"})

# Complex filters
results = db.search("phone", filter={
    "brand": {"$in": ["Apple", "Samsung"]},
    "price": {"$lt": 1000}
})
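The filter syntax resembles MongoDB-style query operators. Assuming that conditions on different fields are combined with AND and that a bare value means equality (the README does not spell this out), the matching logic can be sketched in plain Python. Only the two operators shown above are handled, and `matches` is a hypothetical helper, not part of VectrixDB.

```python
# Operators that appear in the README examples; others are deliberately left out
OPERATORS = {
    "$in": lambda value, allowed: value in allowed,
    "$lt": lambda value, bound: value < bound,
}

def matches(metadata, filter_spec):
    """Return True when metadata satisfies every condition in filter_spec."""
    for field, condition in filter_spec.items():
        if field not in metadata:
            return False
        value = metadata[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$lt": 1000}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain form, e.g. {"brand": "Apple"}, treated as equality
            return False
    return True

phones = [
    ("iPhone 15", {"brand": "Apple", "price": 999}),
    ("Galaxy S24", {"brand": "Samsung", "price": 899}),
    ("Pixel 8", {"brand": "Google", "price": 699}),
]
complex_filter = {"brand": {"$in": ["Apple", "Samsung"]}, "price": {"$lt": 1000}}
selected = [name for name, meta in phones if matches(meta, complex_filter)]
print(selected)
```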

Advanced API

For full control, use the VectrixDB class directly:

from vectrixdb import VectrixDB, Collection

# Create database
db = VectrixDB(path="./my_db")

# Create collection with specific dimension
coll = db.create_collection("products", dimension=384)

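# Add vectors directly ("..." elides the remaining values; each vector needs 384 floats)
coll.add(
    ids=["p1", "p2"],
    vectors=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    metadata=[{"name": "Product A"}, {"name": "Product B"}],
)

# Search with vectors (the query vector must match the collection dimension)
results = coll.search(query=[0.1, 0.2, ...], limit=10)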

# List collections
collections = db.list_collections()

# Delete collection
db.delete_collection("products")

Embedded Models API

Use embedding models directly:

from vectrixdb import (
    DenseEmbedder,
    SparseEmbedder,
    RerankerEmbedder,
    LateInteractionEmbedder,
)

# Dense embeddings
dense = DenseEmbedder(model="e5-small")
vectors = dense.embed(["Hello world", "How are you?"])

# Sparse embeddings (BM25)
sparse = SparseEmbedder()
sparse_vectors = sparse.embed(["Hello world"])

# Reranker
reranker = RerankerEmbedder(model="L12")
scores = reranker.rerank("query", ["doc1", "doc2", "doc3"])

# Late interaction (ColBERT)
colbert = LateInteractionEmbedder(model="colbert")
token_embeddings = colbert.embed(["Hello world"])
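Late-interaction models return one vector per token instead of one vector per text. Such embeddings are conventionally scored with ColBERT's MaxSim operator: for each query token, take its best match among the document tokens, then sum those maxima. The sketch below shows that scoring in plain Python with toy 2-dimensional token vectors. It is independent of VectrixDB, and the README does not state how `LateInteractionEmbedder` output is scored internally.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy token embeddings (hypothetical values); both lists must be non-empty
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[1.0, 0.0], [0.7, 0.1]]

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

Here `doc_a` scores higher because it has a close match for both query tokens, while `doc_b` only matches the first one well.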

REST API

Start the server:

VECTRIXDB_API_KEY=your_secret vectrixdb serve --port 7337

Open the dashboard at http://localhost:7337/dashboard

API Examples

# Create collection
curl -X POST http://localhost:7337/api/v1/collections \
  -H "Content-Type: application/json" \
  -H "api-key: your_secret" \
  -d '{"name": "docs", "dimension": 384}'

# Add documents (auto-embedding)
curl -X POST http://localhost:7337/api/v1/collections/docs/text-upsert \
  -H "Content-Type: application/json" \
  -H "api-key: your_secret" \
  -d '{"points": [{"id": "1", "text": "Hello world"}]}'

# Search
curl -X POST http://localhost:7337/api/v1/collections/docs/text-search \
  -H "Content-Type: application/json" \
  -H "api-key: your_secret" \
  -d '{"query_text": "greeting", "limit": 10}'
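The same requests can be issued from Python with only the standard library. The sketch below builds a request that mirrors the search call above without sending it. The endpoint path, header names, and payload fields are taken from the curl examples; `build_request` is a hypothetical helper, and the response format is not documented here, so the commented-out send step assumes a JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7337/api/v1"

def build_request(path, payload, api_key):
    """Build (but do not send) a POST request with a JSON body and the api-key header."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )

search = build_request(
    "/collections/docs/text-search",
    {"query_text": "greeting", "limit": 10},
    api_key="your_secret",
)
print(search.get_method(), search.full_url)

# To send it against a running server (assumes the server replies with JSON):
# with urllib.request.urlopen(search) as response:
#     print(json.load(response))
```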

GraphRAG

Build knowledge graphs from documents:

from vectrixdb import Vectrix, create_openai_config

# Create with graph mode
db = Vectrix("docs", mode="graph")

# Or with custom LLM config
config = create_openai_config(
    api_key="your-openai-key",
    model="gpt-4o-mini",
)

db = Vectrix(
    "docs",
    mode="graph",
    graphrag_config=config,
)

# Add documents (extracts entities & relationships)
db.add(["Apple announced the iPhone 15 in September 2023."])

# Search with graph reasoning
results = db.search("What products did Apple release?")
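As a rough picture of what "entities & relationships" means, the toy example below stores hand-written triples in an adjacency map and walks them breadth-first. The triples are hypothetical output for the sentence above. VectrixDB's actual extraction, storage format, and graph search are not described in this README, so none of this reflects its internals.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples an extractor might produce
triples = [
    ("Apple", "announced", "iPhone 15"),
    ("iPhone 15", "announced_in", "September 2023"),
]

def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

def facts_near(graph, entity, max_hops=2):
    """Breadth-first walk returning (hops, subject, relation, object) tuples."""
    facts, frontier, seen = [], [entity], {entity}
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for relation, obj in graph.get(node, []):
                facts.append((hop, node, relation, obj))
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return facts

graph = build_graph(triples)
facts = facts_near(graph, "Apple")
print(facts)
```

The second hop is what lets a question about Apple also surface the release date, a fact attached to the product rather than to the company.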

Project Structure

VectrixDB/
├── vectrixdb/
│   ├── core/           # Vector index, storage, search
│   │   ├── storage.py  # All storage backends
│   │   ├── collection.py
│   │   ├── database.py
│   │   ├── document_index.py
│   │   ├── graphrag/   # Knowledge graph
│   │   └── search/     # Search algorithms
│   ├── api/            # FastAPI server
│   ├── models/         # Embedded ONNX models
│   │   └── data/       # Bundled model files
│   ├── dashboard/      # Web UI
│   ├── easy.py         # Vectrix simple API
│   └── cli.py          # Command line
├── tests/
└── pyproject.toml

Requirements

Python 3.9+
No API keys needed (for bundled models)
Models are bundled or auto-downloaded

License

Apache 2.0

Author

Kwadwo Daddy Nyame Owusu - Boakye

GitHub: @knowusuboaky

Release history

2.1.7

May 26, 2026

2.1.5

May 26, 2026

2.1.3

May 26, 2026

2.1.2

May 26, 2026

2.1.1

May 26, 2026

This version

2.1.0

May 22, 2026

2.0.2

May 17, 2026

2.0.1

May 17, 2026

2.0.0

May 17, 2026

1.9.9

May 16, 2026

1.9.8

May 16, 2026

1.9.7

May 16, 2026

1.9.6

May 16, 2026

1.9.5

May 16, 2026

1.9.4

May 16, 2026

1.9.3

May 16, 2026

1.9.2

May 16, 2026

1.9.1

May 16, 2026

1.9.0

May 16, 2026

1.8.7

May 16, 2026

1.8.6

May 16, 2026

1.8.5

May 16, 2026

1.8.4

May 16, 2026

1.8.3

May 16, 2026

1.8.2

May 16, 2026

1.4.2

May 16, 2026

1.4.1

May 16, 2026

1.4.0

May 16, 2026

1.3.0

May 16, 2026

1.2.0

May 15, 2026

1.1.2

Feb 5, 2026

1.1.1

Feb 5, 2026

1.1.0

Feb 1, 2026

1.0.9

Feb 1, 2026

1.0.8

Feb 1, 2026

1.0.7

Feb 1, 2026

1.0.6

Feb 1, 2026

1.0.5

Feb 1, 2026

1.0.4

Feb 1, 2026

1.0.3

Feb 1, 2026

1.0.0

Feb 1, 2026

Download files


Source Distribution

vectrixdb-2.1.0.tar.gz (68.5 MB)

Uploaded May 22, 2026 Source

Built Distribution


vectrixdb-2.1.0-py3-none-any.whl (68.7 MB)

Uploaded May 22, 2026 Python 3

File details

Details for the file vectrixdb-2.1.0.tar.gz.

File metadata

Download URL: vectrixdb-2.1.0.tar.gz
Upload date: May 22, 2026
Size: 68.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vectrixdb-2.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2941d518ea745162f6209af7aa409c29436c34df99747868f252f00c9bbd31a9`
MD5	`cc1281f8c957a83446ac2f3ed5eb953e`
BLAKE2b-256	`b236f3fd0c4f91c820a9eb92583472c0373551a246de7462959cc8356435c25e`


File details

Details for the file vectrixdb-2.1.0-py3-none-any.whl.

File metadata

Download URL: vectrixdb-2.1.0-py3-none-any.whl
Upload date: May 22, 2026
Size: 68.7 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vectrixdb-2.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b46076ffdc49fe793a65ea810d2eac9d127669e264591f6724622b6a8ccb5db8`
MD5	`7d723b7ed8a4c95f5396537d7cb905f5`
BLAKE2b-256	`bffa4a5efe1a113bbd06701e9889b366e96bbc488c2abc29c9f60f31e0b5f3db`
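A downloaded file can be checked against the SHA256 digests listed above using Python's standard `hashlib` module. The sketch below hashes a stream in blocks so that a file of about 68 MB is never loaded into memory at once. The function names are hypothetical, and the in-memory stream in the demo stands in for a real download.

```python
import hashlib
import io

def sha256_of_stream(stream, block_size=1 << 20):
    """Return the hex SHA256 digest of a binary stream, read in blocks."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()

def sha256_of_file(path):
    """Return the hex SHA256 digest of the file at path."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)

def verify(actual_hex, expected_hex):
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return actual_hex.strip().lower() == expected_hex.strip().lower()

# Demo on an in-memory stream; for a real check, pass the downloaded file's
# path to sha256_of_file and compare with the SHA256 row in the table above
demo_digest = sha256_of_stream(io.BytesIO(b"hello"))
print(demo_digest)
```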

