A lightweight vector database with incremental inserts, automatic exact-to-ANN switching, and explicit storage management. Add vectors anytime without rebuilding the index.

๐Ÿฆ Vinkra

Vector Incremental Nano Kit, Reconfigurated Automatically

"A vector DB that self-organizes. Auto-switch, auto-tune, auto-scale."

[!WARNING] This project is currently in pre-alpha.



🤔 So What's Vinkra Anyway? (And Why Should You Care?)

Most vector databases force a trade-off: you either over-engineer for small datasets or hit a performance cliff as you scale. You're left babysitting indices, manually tuning parameters, and praying your hardware can keep up.

Vinkra eliminates the guesswork. It automatically switches from exact search (100% precision) to ANN search (IVF-PQ, for large datasets) based on dataset size and measured query latency. Whether you are running on a mobile device or a high-end server, Vinkra adapts its optimization strategy to your hardware and data distribution.

| Feature | Why it's awesome |
|---|---|
| ➕ Incremental Inserts | Add vectors anytime. Your index grows with your data, not against it. |
| 📟 Hardware-Aware Auto-Switch | It figures out when to ditch exact search and switch to ANN based on latency prediction. |
| ⚙️ Self-Tuning Engine | Background reconfiguration keeps clusters fresh as your data evolves. |
| 🎯 Production-Ready Search | Filtered searches, soft deletes, compact, dual-metric (Euclidean + cosine). |
| 💾 Explicit Storage | Disk or memory: you control where your data lives. |

Unlike enterprise solutions (Milvus, Pinecone) that require complex Docker or cloud setup, Vinkra runs entirely locally: nothing to set up beyond pip install.

And that's just the start - there's plenty more to explore!


📦 Installation

First ensure that you have the necessary system dependencies installed.

  • Linux only: Required for building rii

    # Debian/Ubuntu
    sudo apt-get install python3-dev
    
    # RedHat/Fedora/CentOS
    sudo dnf install python3-devel -y
    
    # CentOS 7 and older
    sudo yum install python3-devel
    
  • Android/Termux:

    pkg install -y tur-repo
    pkg install python-scipy
    

The Quick & Easy Way

The simplest way to get started is with pip:

pip install vinkra

The From-Source Way

Prefer building from source? You can clone and install manually for full control:

git clone https://github.com/speedyk-005/vinkra.git
cd vinkra
pip install -e .

(But honestly, the pip way is usually way easier!)


✅ Proof It Works

Run the demo to see auto-switch in action:

# Install and run anywhere
curl -O https://raw.githubusercontent.com/speedyk-005/vinkra/main/demo_poc.py
python demo_poc.py

The demo uses:

  • switch_latency_ms=120 (vs 300 default): triggers the switch sooner
  • dim=128
  • Batches of 10,000 vectors

The switch happens when latency exceeds switch_latency_ms. A Power Law model (y = a * x^b) continuously tunes itself from actual search latencies to predict future performance. New vectors are buffered during the switch with zero downtime.

Results vary by hardware and system load: faster machines switch later, and running other programs will affect timing.
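To make the power-law idea concrete, here is a minimal sketch of fitting y = a * x^b by least squares in log-log space and using the fit to decide when a threshold is crossed. This is only an illustration: the README does not say how Vinkra fits or updates its model, and the function names (fit_power_law, predict_latency_ms, should_switch) are hypothetical. The sample points are the exact-search rows from the demo's example output.

```python
import math


def fit_power_law(sizes, latencies_ms):
    """Least-squares fit of y = a * x^b, done as a straight line in log-log space."""
    xs = [math.log(x) for x in sizes]
    ys = [math.log(y) for y in latencies_ms]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope of the log-log line
    a = math.exp(mean_y - b * mean_x)    # intercept, mapped back from log space
    return a, b


def predict_latency_ms(a, b, n_vectors):
    """Predicted query latency for an index holding n_vectors."""
    return a * n_vectors ** b


def should_switch(a, b, n_vectors, switch_latency_ms=300):
    """True when the predicted latency exceeds the configured threshold."""
    return predict_latency_ms(a, b, n_vectors) > switch_latency_ms


# Exact-search measurements taken from the demo's example output
sizes = [10_000, 20_000, 30_000, 40_000]
latencies = [32.486, 79.690, 107.419, 188.063]
a, b = fit_power_law(sizes, latencies)
print(f"a={a:.6f}, b={b:.3f}")
print("switch at 40,000 vectors (120 ms)?", should_switch(a, b, 40_000, 120))
```

Refitting after each batch of measured latencies is one simple way to get the "continuously tunes itself" behavior described above.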

Example output:

โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Vectors โ”ƒ      Strategy      โ”ƒ Avg Query (ms) โ”ƒ Insert Time (s) โ”ƒ     Status     โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ 10,000  โ”‚    exact_search    โ”‚     32.486     โ”‚      0.806      โ”‚  Exact Search  โ”‚
โ”‚ 20,000  โ”‚    exact_search    โ”‚     79.690     โ”‚      0.729      โ”‚  Exact Search  โ”‚
โ”‚ 30,000  โ”‚    exact_search    โ”‚    107.419     โ”‚      0.720      โ”‚  Exact Search  โ”‚
โ”‚ 40,000  โ”‚    exact_search    โ”‚    188.063     โ”‚      0.771      โ”‚  โš™ Building ANN โ”‚
โ”‚ 50,000  โ”‚ approximate_search โ”‚     0.000      โ”‚     10.051      โ”‚  โœ“ ANN Active  โ”‚
โ”‚ 60,000  โ”‚ approximate_search โ”‚    155.239     โ”‚      1.323      โ”‚  โœ“ ANN Active  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

โœ“ ANN switch successfully triggered!

🚀 Usage

Initialization (VinkraDB API)

from vinkra import VinkraDB

# Create a database with 128-dimensional vectors
db = VinkraDB("./data", dim=128)

# Or use in-memory mode (no persistence)
db = VinkraDB(":memory:", dim=128)

# With custom settings
db = VinkraDB(
    dir_path="./data",
    dim=384,
    metric="euclidean",       # or "cosine" (default: euclidean)
    force_exact=False,        # or True to disable ANN (default: False)
    ann_config=None,          # AnnConfig for PQ/OPQ (default: auto-generated)
    switch_latency_ms=300,    # ms threshold for ANN switch (default: 300)
    embedding_callback=None,  # fn to generate embeddings from content
    overwrite=False,          # overwrite existing index (default: False)
    verbose=False,            # enable verbose output (default: False)
)

AnnConfig (API)

Want custom ANN settings?

from vinkra import AnnConfig

config = AnnConfig(
    num_subspaces=16,        # number of sub-vectors (default: 32)
    quantizer="pq",           # "pq" or "opq" (default: pq)
    codebook_size=128,        # centroids per subspace (default: 256)
)
db = VinkraDB("./data", dim=384, ann_config=config)

# print all available options:
AnnConfig.help()

Add (API)

Records need:

  • content (required): text to store
  • embedding (required if no callback): list of floats or numpy array, shape (d,) or (1, d)
  • id (optional): valid UUIDv7
  • metadata (optional): dict of key-value pairs

Provide embeddings directly or use a callback to generate them on the fly.
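If you want to supply your own IDs, they need the UUIDv7 layout: a 48-bit Unix timestamp in milliseconds, followed by version and variant bits and random data (RFC 9562). The sketch below builds one with the standard library only. The helper name uuid7_sketch is hypothetical, and whether Vinkra expects the ID as a string (as in the soft-delete example) is an assumption. Python 3.14 and later also ship uuid.uuid7().

```python
import os
import time
import uuid


def uuid7_sketch() -> str:
    """Build a UUIDv7-shaped ID following the RFC 9562 bit layout."""
    ts_ms = time.time_ns() // 1_000_000            # Unix time in milliseconds
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 random bits

    value = (ts_ms & ((1 << 48) - 1)) << 80        # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: rand_a
    value |= 0b10 << 62                            # bits 63..62: RFC variant
    value |= rand_b                                # bits 61..0: rand_b
    return str(uuid.UUID(int=value))


record = {"id": uuid7_sketch(), "content": "Hello world", "embedding": [0.1] * 384}
print(record["id"])
```

Because the timestamp sits in the most significant bits, IDs generated this way sort roughly by creation time.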

With embedding callback

db = VinkraDB("./data", dim=384, embedding_callback=my_embedding_fn)

# Just provide content; embeddings are generated automatically
db.add([
    {"content": "Hello world", "metadata": {"source": "doc1"}},
    {"content": "Another text"},
])
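The snippet above leaves my_embedding_fn undefined. As a placeholder, here is a toy callback built from the standard library. It assumes the callback receives one content string and returns a list of dim floats; the README does not spell out the callback contract, so check it before relying on this shape. The hashed bag-of-words scheme is only a stand-in for a real embedding model.

```python
import hashlib
import math

DIM = 384  # must match the dim passed to VinkraDB


def my_embedding_fn(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding (assumed contract: str -> list of DIM floats)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % DIM   # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0        # signed hashing
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalize to unit length; an empty or all-cancelled text stays all zeros.
    return [v / norm for v in vec] if norm else vec


embedding = my_embedding_fn("Hello world")
print(len(embedding))
```

The function is deterministic, so the same content always maps to the same vector, which makes it handy for tests even though it carries no semantic meaning.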

Without callback

Provide embeddings directly:

db.add([
    {"content": "Hello world", "embedding": [0.1] * 384, "metadata": {"source": "doc1"}},
    {"content": "Another text", "embedding": [0.2] * 384},
])

Search (API)

Results include:

  • id: vector ID
  • content: text content
  • distance: similarity score (lower is closer for euclidean)
  • metadata: key-value pairs
  • embedding: (only if include_vectors=True)
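To show what the distance field means, here is a brute-force sketch of exact search in plain Python: score every record against the query, then keep the top_k smallest distances. It is not Vinkra's implementation (the dependency list suggests scipy does the real distance work), and defining cosine distance as 1 minus cosine similarity is an assumption based on the common convention.

```python
import heapq
import math


def euclidean(a, b):
    """Straight-line distance; 0.0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed convention); 0.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def exact_search(records, query_vec, top_k=5, metric="euclidean"):
    """Score every record and return the top_k closest, nearest first."""
    dist = euclidean if metric == "euclidean" else cosine_distance
    scored = (
        {"id": r["id"], "content": r["content"],
         "distance": dist(r["embedding"], query_vec)}
        for r in records
    )
    return heapq.nsmallest(top_k, scored, key=lambda hit: hit["distance"])


records = [
    {"id": "a", "content": "origin", "embedding": [0.0, 0.0]},
    {"id": "b", "content": "unit x", "embedding": [1.0, 0.0]},
    {"id": "c", "content": "far", "embedding": [3.0, 4.0]},
]
for hit in exact_search(records, [0.0, 0.0], top_k=2):
    print(hit["id"], hit["distance"])
```

The cost grows linearly with the number of stored vectors, which is why exact search gets slower as batches are added in the demo.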

Without filters

# Basic search
results = db.search(query_vec=[0.1] * 384, top_k=5)

# Include embeddings in results
results = db.search(query_vec=[0.1] * 384, include_vectors=True)

With filters

Filter syntax supports ==, !=, >, <, >=, <= with strings, numbers, and booleans. More operators coming in future updates.

results = db.search(
    query_vec=[0.1] * 384,
    top_k=10,
    filters=["source == 'doc1'", "score >= 50", "new == True"]
)
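For intuition about the filter syntax, the sketch below parses "field op value" strings and checks them against a metadata dict. It is a hypothetical stand-in, not Vinkra's parser (the dependency list says filtering runs as SQLite queries). It assumes that multiple filters are combined with AND and that a record missing the field does not match.

```python
import ast
import operator
import re

OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
# Two-character operators come first so ">=" is not read as ">" followed by "=".
PATTERN = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$")


def parse_filter(expr):
    """Split "field op value" into (field, operator function, Python value)."""
    match = PATTERN.match(expr)
    if match is None:
        raise ValueError(f"Bad filter syntax: {expr!r}")
    field, op, raw = match.groups()
    try:
        value = ast.literal_eval(raw)  # safely reads 'doc1', 50, 3.5, True
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"Bad filter value: {raw!r}") from exc
    if not isinstance(value, (str, int, float, bool)):
        raise ValueError(f"Unsupported filter value: {raw!r}")
    return field, OPS[op], value


def matches(metadata, filters):
    """True when the metadata satisfies every filter (AND semantics assumed)."""
    for expr in filters:
        field, op, value = parse_filter(expr)
        if field not in metadata:
            return False
        try:
            if not op(metadata[field], value):
                return False
        except TypeError:  # e.g. comparing a string with a number
            return False
    return True


meta = {"source": "doc1", "score": 72, "new": True}
print(matches(meta, ["source == 'doc1'", "score >= 50", "new == True"]))
```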

Delete

Soft deletion (API)

Soft-delete vectors by ID without rebuilding the index, so deletion stays fast.

# IDs come from search results or when adding
db.soft_delete(["0192a5b4-7f3c-7d6e-9a1b-2c3d4e5f6a7b", "0192a5b4-7f3c-7d6e-9a1b-2c3d4e5f6a7c"])

Compaction (API)

Actually remove soft-deleted records and reclaim storage:

db.compact()

[!WARNING] Compaction can take 20 to 200+ seconds with the approximate strategy, depending on data size. Run it during maintenance windows or off-peak hours. If too few vectors remain to retrain the codec, the rebuild is skipped.
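The split between soft deletion and compaction can be pictured with a small tombstone sketch: deleting only marks an ID, and compaction is the step that removes the data. The TombstoneStore class below is hypothetical and far simpler than a real index. It only shows why soft deletes are cheap while compaction does the heavy lifting, and how counters like active_count and deleted_count could be derived.

```python
class TombstoneStore:
    """Toy record store with soft deletes (tombstones) and compaction."""

    def __init__(self):
        self._records = {}     # id -> record
        self._deleted = set()  # ids marked as deleted but still stored

    def add(self, record_id, record):
        self._records[record_id] = record

    def soft_delete(self, ids):
        # Only mark known ids; nothing is moved or rebuilt here.
        self._deleted.update(i for i in ids if i in self._records)

    def active(self):
        # Reads skip tombstoned records.
        return [r for i, r in self._records.items() if i not in self._deleted]

    def compact(self):
        # Physically drop tombstoned records and report how many were removed.
        removed = len(self._deleted)
        for record_id in self._deleted:
            del self._records[record_id]
        self._deleted.clear()
        return removed

    def stats(self):
        return {
            "active_count": len(self._records) - len(self._deleted),
            "deleted_count": len(self._deleted),
        }


store = TombstoneStore()
store.add("a", {"content": "Hello world"})
store.add("b", {"content": "Another text"})
store.soft_delete(["a"])
print(store.stats())
print(store.compact(), store.stats())
```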

Stats (API)

Get database statistics:

stats = db.stats()
# {
#     "version": "...",
#     "dimension": 128,
#     "metric": "euclidean",
#     "strategy": "exact_search",
#     "last_saved_at": "...",
#     "last_deleted_at": "...",
#     "active_count": 1000,
#     "deleted_count": 5
# }

🚨 Exceptions (API)

Something went wrong?

| Exception | When it hits |
|---|---|
| InvalidInputError | Bad data or invalid params |
| VectorDimensionError | Embedding dim mismatch |
| InvalidIdError | Malformed UUIDv7 |
| FilterError | Bad filter syntax |

🗺 Features & Roadmap

  • Incremental Inserts
  • Hardware-Aware Auto-Switch
  • Soft deletes + compact
  • Save/Load
  • Filter DSL
    • Basic filters: quick comparison
    • Complex filters: content matching, null checks, date/time literals, ...
  • Recovery: recover soft-deleted vectors
  • Collections: Multi-collection support for managing multiple indices
  • CLI: command-line interface
  • REST API: HTTP API for remote vector operations
  • Integrations: LangChain, LlamaIndex, and other integrations

🔧 Core Dependencies

  • rii: C++ ANN library with pybind11 bindings (IVF-PQ index storage)
  • nanopq: Pure Python PQ encoding/decoding
  • scipy: Scientific computing (distance calculations)
  • numpy: Numerical computing
  • SQLite: Metadata storage (content, embeddings, metadata), filtering queries

๐Ÿค Contributing

Bug fixes, features, docs: all welcome. Check out CONTRIBUTING.md for the full details.


📜 License

Check out the LICENSE file for all the details.

MIT License. Use freely, modify boldly, and credit appropriately! (We're not that legendary... yet 😉)
