Skip to main content

MongoDB-compatible Python client for KeraDB - a lightweight embedded NoSQL database with vector search

Project description

KeraDB Python SDK

A MongoDB-compatible Python client for KeraDB - a lightweight, embedded NoSQL document database with advanced vector search capabilities.

License: MIT Python Version

Features

  • MongoDB-Compatible API: Familiar API for easy migration from MongoDB
  • Embedded Database: No server required, runs directly in your application
  • Vector Search: Built-in HNSW-based vector similarity search
  • Multiple Distance Metrics: Cosine, Euclidean, Dot Product, Manhattan
  • Vector Compression: Delta and quantized compression for efficient storage
  • Zero Dependencies: Pure Python with no external dependencies for document operations
  • High Performance: Written in Rust with Python bindings via FFI
  • ACID Transactions: Full transaction support for data integrity

Installation

From PyPI (when published)

pip install keradb

From Source

  1. First, build the native KeraDB library:
cd ../../../  # Navigate to project root
cargo build --release
  1. Install the Python package:
cd sdks/python
pip install -e .

For Development

pip install -e ".[dev]"

Quick Start

Basic Document Operations

import keradb

# Connect to database (creates if doesn't exist)
client = keradb.connect("mydb.ndb")
db = client.database()
users = db.collection("users")

# Insert documents
result = users.insert_one({"name": "Alice", "age": 30, "email": "alice@example.com"})
print(f"Inserted ID: {result.inserted_id}")

# Find documents
user = users.find_one({"_id": result.inserted_id})
all_users = users.find().all()

# Update documents
users.update_one(
    {"_id": result.inserted_id},
    {"$set": {"age": 31}}
)

# Delete documents
users.delete_one({"_id": result.inserted_id})

# Close connection
client.close()

Using Context Manager

import keradb

with keradb.connect("mydb.ndb") as client:
    db = client.database()
    users = db.collection("users")
    
    users.insert_one({"name": "Bob", "age": 25})
    count = users.count_documents({})
    print(f"Total users: {count}")

Vector Search

import keradb
import random
import math

# Generate a random normalized embedding
def generate_embedding(dimensions):
    vec = [random.random() * 2 - 1 for _ in range(dimensions)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Connect and create vector collection
client = keradb.connect("vectors.ndb")

config = keradb.VectorConfig(
    dimensions=128,
    distance=keradb.Distance.COSINE,
    m=16,
    ef_construction=200,
    ef_search=50,
).with_delta_compression()

client.create_vector_collection("articles", config)

# Insert vectors with metadata
embedding = generate_embedding(128)
vector_id = client.insert_vector(
    "articles",
    embedding,
    {"title": "Machine Learning Basics", "category": "tech"}
)

# Search for similar vectors
query = generate_embedding(128)
results = client.vector_search("articles", query, k=10)

for result in results:
    print(f"[{result.rank}] {result.document.metadata['title']}")
    print(f"    Score: {result.score:.4f}")

# Get statistics
stats = client.vector_stats("articles")
print(f"Vectors: {stats.vector_count}, Memory: {stats.memory_usage:,} bytes")

client.close()

API Reference

Client

keradb.connect(path: str) -> Client

Create or open a KeraDB database.

Parameters:

  • path: Path to the database file

Returns: Client instance

Database

client.database(name: Optional[str] = None) -> Database

Get a database instance. The name parameter is optional and kept for MongoDB compatibility.

database.collection(name: str) -> Collection

Get a collection by name.

database.list_collection_names() -> List[str]

Get a list of all collection names.

Collection

Document Operations

  • insert_one(document: Dict) -> InsertOneResult
  • insert_many(documents: List[Dict]) -> InsertManyResult
  • find_one(filter: Optional[Dict] = None) -> Optional[Dict]
  • find(filter: Optional[Dict] = None) -> Cursor
  • update_one(filter: Dict, update: Dict) -> UpdateResult
  • update_many(filter: Dict, update: Dict) -> UpdateResult
  • delete_one(filter: Dict) -> DeleteResult
  • delete_many(filter: Dict) -> DeleteResult
  • count_documents(filter: Optional[Dict] = None) -> int

Supported MongoDB Operators

Update Operators:

  • $set: Set field values
  • $unset: Remove fields
  • $inc: Increment numeric values
  • $push: Append to arrays

Query Operators:

  • $eq, $ne: Equality, inequality
  • $gt, $gte, $lt, $lte: Comparison
  • $in, $nin: Array membership
  • $and, $or: Logical operators

Cursor

  • limit(n: int) -> Cursor: Limit number of results
  • skip(n: int) -> Cursor: Skip n results
  • all() -> List[Dict]: Return all documents as a list

Vector Operations

Creating Vector Collections

config = keradb.VectorConfig(
    dimensions=128,
    distance=keradb.Distance.COSINE,
    m=16,  # HNSW connections per node
    ef_construction=200,  # Build quality
    ef_search=50,  # Query quality
)

client.create_vector_collection("my_vectors", config)

Vector Configuration Options

Distance Metrics:

  • Distance.COSINE: Cosine similarity (default)
  • Distance.EUCLIDEAN: L2 distance
  • Distance.DOT_PRODUCT: Dot product
  • Distance.MANHATTAN: L1 distance

Compression:

  • with_delta_compression(): Store sparse differences
  • with_quantized_compression(): Aggressive quantization
  • No compression (default): Store full vectors

Vector CRUD

# Insert
vector_id = client.insert_vector(collection, embedding, metadata)

# Search
results = client.vector_search(collection, query_vector, k=10)

# Get by ID
doc = client.get_vector(collection, vector_id)

# Delete
client.delete_vector(collection, vector_id)

# Statistics
stats = client.vector_stats(collection)

Examples

See the examples directory for complete examples:

Run examples:

python examples/basic.py
python examples/vector_search.py

Benchmarks

Run benchmarks to compare performance:

# Install benchmark dependencies
pip install -e ".[benchmark]"

# Run all benchmarks
pytest benchmarks/ -v

# Run specific benchmarks
pytest benchmarks/benchmark_documents.py -v
pytest benchmarks/benchmark_vectors.py -v

See benchmarks/README.md for more details.

Testing

Run the test suite:

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=keradb --cov-report=html

Requirements

  • Python 3.8 or higher
  • KeraDB native library (libkeradb.so / libkeradb.dylib / keradb.dll)

Optional Dependencies

For benchmarks and development:

  • pytest >= 7.0.0
  • pytest-benchmark >= 4.0.0
  • numpy >= 1.20.0 (for faster vector generation in benchmarks)

Platform Support

  • Linux (x86_64, ARM64)
  • macOS (Intel, Apple Silicon)
  • Windows (x86_64)

Performance

KeraDB is designed for high performance:

  • Document Operations: 10,000+ inserts/sec, sub-millisecond reads
  • Vector Search: Sub-millisecond similarity search on millions of vectors
  • Memory Efficient: Delta and quantized compression reduce memory usage by 60-80%
  • Zero-Copy: Efficient FFI layer with minimal overhead

Architecture

┌─────────────────────────────┐
│    Python Application       │
└──────────┬──────────────────┘
           │
           ├─ keradb.connect()
           ├─ Collection API (MongoDB-compatible)
           └─ Vector Search API
           │
┌──────────▼──────────────────┐
│    Python FFI Layer         │
│  (ctypes bindings)          │
└──────────┬──────────────────┘
           │
┌──────────▼──────────────────┐
│   Rust Core Library         │
│  - Document storage (LSM)   │
│  - Vector search (HNSW)     │
│  - Compression              │
└─────────────────────────────┘

MongoDB Compatibility

The SDK aims to be compatible with MongoDB's API where practical:

✅ Supported:

  • Basic CRUD operations
  • Query operators ($eq, $gt, $in, etc.)
  • Update operators ($set, $inc, $push, etc.)
  • Cursor operations (limit, skip)
  • find_one, find, insert_one, insert_many
  • update_one, update_many, delete_one, delete_many

⚠️ Partial Support:

  • Aggregation pipeline (limited)
  • Indexes (automatic for performance)

❌ Not Supported:

  • GridFS
  • Transactions (planned)
  • Replication
  • Sharding

Performance

KeraDB delivers exceptional performance for embedded database operations. Here are benchmark results comparing KeraDB vs SQLite on Windows with Python 3.13:

KeraDB vs SQLite Performance Comparison

Operation KeraDB (μs) SQLite (μs) Speedup KeraDB OPS SQLite OPS
Count 1.7 116.0 68x faster 579,980 8,622
Find by ID 10.7 131.7 12x faster 93,510 7,595
Update 82.9 159.4 2x faster 12,059 6,272
Insert 99.3 5,234 53x faster 10,066 191
Find All 461.3 390.5 1.2x slower 2,168 2,561
Delete 161.1 4,801 30x faster 6,207 208
Batch Insert (100) 11,165 - - 90 -

Key Performance Insights

🚀 KeraDB Advantages:

  • 68x faster document counting
  • 53x faster single document inserts
  • 30x faster document deletion
  • 12x faster lookups by ID
  • 2x faster document updates
  • Batch operations: 90 ops/second for 100 documents

Why KeraDB is Faster:

  • Direct memory-mapped B-tree access
  • Rust-based native implementation with zero-copy operations
  • Optimized for document-oriented workloads
  • No SQL parsing overhead

📊 Use Cases:

  • Embedded applications requiring high-speed document operations
  • Real-time data processing
  • High-throughput logging and event storage
  • Applications needing MongoDB-like API with SQLite-level simplicity

Run Benchmarks Yourself

pip install -e ".[dev]"
python -m pytest benchmarks/ -v --benchmark-only --benchmark-sort=name

Full benchmark details: See dev-docs/BENCHMARK_RESULTS.md

Contributing

Contributions are welcome! Please see the main project repository for guidelines.

License

MIT License - See LICENSE file for details

Links

Changelog

Version 0.1.0

  • Initial release
  • MongoDB-compatible document operations
  • Vector search with HNSW
  • Multiple distance metrics
  • Vector compression (delta, quantized)
  • Python 3.8+ support
  • Comprehensive test suite and benchmarks

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

keradb-0.1.0.tar.gz (21.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

keradb-0.1.0-py3-none-any.whl (16.2 kB view details)

Uploaded Python 3

File details

Details for the file keradb-0.1.0.tar.gz.

File metadata

  • Download URL: keradb-0.1.0.tar.gz
  • Upload date:
  • Size: 21.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for keradb-0.1.0.tar.gz
Algorithm Hash digest
SHA256 558ad05fef1d258c127288649be2e9b8a99df268fa1a1a97fa821edee329bb35
MD5 8e626e858efcbe0392cb014cbce8b4c0
BLAKE2b-256 ff9db9460561b87991885c9fa090513de98edd831aee3c29cc124a49d03a160c

See more details on using hashes here.

File details

Details for the file keradb-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: keradb-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for keradb-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5037a181a148a2fe3dd588b82180486f9addb75c458e0019b16c2eeb421215d8
MD5 29e2ea4b88ebab0f46272f679869f230
BLAKE2b-256 69a4dc889bf5eed1a5b4df219e6f28087116a9ffcf32dbc8c661c49fd222ee88

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page