MongoDB-compatible Python client for KeraDB - a lightweight embedded NoSQL database with vector search
KeraDB Python SDK
A MongoDB-compatible Python client for KeraDB - a lightweight, embedded NoSQL document database with advanced vector search capabilities.
Features
- MongoDB-Compatible API: Familiar API for easy migration from MongoDB
- Embedded Database: No server required, runs directly in your application
- Vector Search: Built-in HNSW-based vector similarity search
- Multiple Distance Metrics: Cosine, Euclidean, Dot Product, Manhattan
- Vector Compression: Delta and quantized compression for efficient storage
- Zero Dependencies: No third-party Python packages required for document operations (the native KeraDB library is still needed)
- High Performance: Written in Rust with Python bindings via FFI
- ACID Transactions: Transaction support for data integrity (listed as planned under MongoDB Compatibility)
Installation
From PyPI
pip install keradb
From Source
- First, build the native KeraDB library:
cd ../../../ # Navigate to project root
cargo build --release
- Install the Python package:
cd sdks/python
pip install -e .
For Development
pip install -e ".[dev]"
Quick Start
Basic Document Operations
import keradb
# Connect to the database (created if it doesn't exist)
client = keradb.connect("mydb.ndb")
db = client.database()
users = db.collection("users")
# Insert documents
result = users.insert_one({"name": "Alice", "age": 30, "email": "alice@example.com"})
print(f"Inserted ID: {result.inserted_id}")
# Find documents
user = users.find_one({"_id": result.inserted_id})
all_users = users.find().all()
# Update documents
users.update_one(
    {"_id": result.inserted_id},
    {"$set": {"age": 31}}
)
# Delete documents
users.delete_one({"_id": result.inserted_id})
# Close connection
client.close()
Using Context Manager
import keradb
with keradb.connect("mydb.ndb") as client:
    db = client.database()
    users = db.collection("users")
    users.insert_one({"name": "Bob", "age": 25})
    count = users.count_documents({})
    print(f"Total users: {count}")
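The `with` form is shorthand for calling `close()` in a `finally` block. The sketch below uses a stand-in class (`StandInClient`, hypothetical and not part of keradb) so it runs without the native library. It assumes the real client closes itself when the block exits, which is what the example above implies.

```python
class StandInClient:
    """Hypothetical stand-in for a database client; not part of keradb."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the resource, even if the block raised.
        self.close()
        return False  # do not swallow the exception


# Manual form: try/finally guarantees close() runs.
manual = StandInClient()
try:
    pass  # work with the client here
finally:
    manual.close()

# Context-manager form: close() still runs when the body raises.
managed = StandInClient()
try:
    with managed:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

print(manual.closed, managed.closed)
```

Either form works; the `with` form is harder to get wrong because the cleanup cannot be forgotten.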
Vector Search
import keradb
import random
import math
# Generate a random normalized embedding
def generate_embedding(dimensions):
    vec = [random.random() * 2 - 1 for _ in range(dimensions)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]
# Connect and create vector collection
client = keradb.connect("vectors.ndb")
config = keradb.VectorConfig(
    dimensions=128,
    distance=keradb.Distance.COSINE,
    m=16,
    ef_construction=200,
    ef_search=50,
).with_delta_compression()
client.create_vector_collection("articles", config)
# Insert vectors with metadata
embedding = generate_embedding(128)
vector_id = client.insert_vector(
    "articles",
    embedding,
    {"title": "Machine Learning Basics", "category": "tech"}
)
# Search for similar vectors
query = generate_embedding(128)
results = client.vector_search("articles", query, k=10)
for result in results:
    print(f"[{result.rank}] {result.document.metadata['title']}")
    print(f" Score: {result.score:.4f}")
# Get statistics
stats = client.vector_stats("articles")
print(f"Vectors: {stats.vector_count}, Memory: {stats.memory_usage:,} bytes")
client.close()
API Reference
Client
keradb.connect(path: str) -> Client
Create or open a KeraDB database.
Parameters:
path: Path to the database file
Returns: Client instance
Database
client.database(name: Optional[str] = None) -> Database
Get a database instance. The name parameter is optional and kept for MongoDB compatibility.
database.collection(name: str) -> Collection
Get a collection by name.
database.list_collection_names() -> List[str]
Get a list of all collection names.
Collection
Document Operations
- insert_one(document: Dict) -> InsertOneResult
- insert_many(documents: List[Dict]) -> InsertManyResult
- find_one(filter: Optional[Dict] = None) -> Optional[Dict]
- find(filter: Optional[Dict] = None) -> Cursor
- update_one(filter: Dict, update: Dict) -> UpdateResult
- update_many(filter: Dict, update: Dict) -> UpdateResult
- delete_one(filter: Dict) -> DeleteResult
- delete_many(filter: Dict) -> DeleteResult
- count_documents(filter: Optional[Dict] = None) -> int
Supported MongoDB Operators
Update Operators:
- $set: Set field values
- $unset: Remove fields
- $inc: Increment numeric values
- $push: Append to arrays
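To make the semantics of these four operators concrete, here is a minimal pure-Python sketch that applies them to a plain dict. The `apply_update` helper is hypothetical and only mirrors the usual MongoDB behavior for top-level fields; it is not KeraDB's implementation, and KeraDB's handling of edge cases (dotted paths, type mismatches) is not covered by this README.

```python
import copy


def apply_update(document, update):
    """Return a copy of `document` with MongoDB-style update operators applied."""
    doc = copy.deepcopy(document)
    for field, value in update.get("$set", {}).items():
        doc[field] = value
    for field in update.get("$unset", {}):
        doc.pop(field, None)  # removing a missing field is a no-op
    for field, amount in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + amount  # a missing field starts at 0
    for field, item in update.get("$push", {}).items():
        doc.setdefault(field, []).append(item)  # a missing field becomes a new list
    return doc


before = {"name": "Alice", "age": 30, "tmp": 1, "tags": ["a"]}
after = apply_update(before, {
    "$set": {"email": "alice@example.com"},
    "$unset": {"tmp": ""},
    "$inc": {"age": 1},
    "$push": {"tags": "b"},
})
print(after)
```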
Query Operators:
- $eq, $ne: Equality, inequality
- $gt, $gte, $lt, $lte: Comparison
- $in, $nin: Array membership
- $and, $or: Logical operators
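The following sketch shows how a filter built from these operators can be evaluated against a document. The `matches` function is a hypothetical illustration of MongoDB-style matching on top-level fields, not KeraDB's query engine. As a simplification, it treats a missing field and a `None` value the same way.

```python
import operator

_COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def _field_matches(value, condition):
    # A bare value is shorthand for {"$eq": value}.
    if not (isinstance(condition, dict) and any(k.startswith("$") for k in condition)):
        return value == condition
    for op, operand in condition.items():
        if op == "$in":
            ok = value in operand
        elif op == "$nin":
            ok = value not in operand
        elif op in _COMPARATORS:
            # Ordered comparisons against a missing field never match.
            if value is None and op not in ("$eq", "$ne"):
                ok = False
            else:
                ok = _COMPARATORS[op](value, operand)
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


def matches(document, query):
    """Return True if `document` satisfies every clause of `query`."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(document, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(document, sub) for sub in condition):
                return False
        elif not _field_matches(document.get(key), condition):
            return False
    return True


users = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 41},
]
query = {"$or": [{"age": {"$gte": 40}}, {"name": {"$in": ["Bob"]}}]}
names = [u["name"] for u in users if matches(u, query)]
print(names)
```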
Cursor
- limit(n: int) -> Cursor: Limit number of results
- skip(n: int) -> Cursor: Skip n results
- all() -> List[Dict]: Return all documents as a list
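Because `limit` and `skip` each return a Cursor, they can be chained to page through results. The sketch below uses an in-memory stand-in (`ListCursor`, hypothetical) to show the pattern. It assumes skip is applied before limit regardless of call order, as in MongoDB; whether KeraDB behaves identically is not stated here.

```python
class ListCursor:
    """Hypothetical in-memory stand-in for the Cursor described above."""

    def __init__(self, documents):
        self._documents = list(documents)
        self._skip = 0
        self._limit = None

    def skip(self, n):
        self._skip = n
        return self  # returning self is what makes chaining work

    def limit(self, n):
        self._limit = n
        return self

    def all(self):
        end = None if self._limit is None else self._skip + self._limit
        return self._documents[self._skip:end]


def page(cursor, page_number, page_size):
    """Return one 1-based page of results using skip and limit."""
    return cursor.skip((page_number - 1) * page_size).limit(page_size).all()


docs = [{"n": i} for i in range(1, 8)]
second_page = page(ListCursor(docs), 2, 3)
last_page = page(ListCursor(docs), 3, 3)
print(second_page, last_page)
```

With a real collection, the same `page` helper would presumably be called with `users.find()` in place of `ListCursor(docs)`.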
Vector Operations
Creating Vector Collections
config = keradb.VectorConfig(
    dimensions=128,
    distance=keradb.Distance.COSINE,
    m=16,                 # HNSW connections per node
    ef_construction=200,  # Build quality
    ef_search=50,         # Query quality
)
client.create_vector_collection("my_vectors", config)
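HNSW search is approximate, so `m`, `ef_construction`, and `ef_search` trade accuracy against speed and memory. One common way to choose values is to compare the approximate results with an exact brute-force scan on a sample and measure recall. The sketch below shows that measurement in pure Python; `exact_top_k` and `recall_at_k` are hypothetical helper names, and the approximate result is simulated rather than taken from KeraDB.

```python
import heapq
import math
import random


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(vectors, query, k):
    """Exact k nearest neighbors by scanning every vector in {id: vector}."""
    return heapq.nsmallest(k, vectors, key=lambda vid: cosine_distance(vectors[vid], query))


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
vectors = {i: [random.uniform(-1, 1) for _ in range(16)] for i in range(200)}
query = [random.uniform(-1, 1) for _ in range(16)]
truth = exact_top_k(vectors, query, k=10)

# Simulated approximate result that missed two of the true neighbors.
approx = truth[:8] + [vid for vid in vectors if vid not in truth][:2]
print(recall_at_k(approx, truth))
```

In practice the IDs returned by `client.vector_search` would take the place of `approx`. If recall is too low, raising `ef_search` is usually the first thing to try, at the cost of slower queries.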
Vector Configuration Options
Distance Metrics:
- Distance.COSINE: Cosine similarity (default)
- Distance.EUCLIDEAN: L2 distance
- Distance.DOT_PRODUCT: Dot product
- Distance.MANHATTAN: L1 distance
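For reference, the four metrics correspond to the standard formulas below. This is a plain-Python illustration of the math only; how KeraDB turns each metric into the `score` it returns (for example, whether a cosine score is a similarity or a distance) is not specified in this README.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product of the two vectors after scaling each to unit length.
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a, b):
    # L2: straight-line distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    # L1: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 0.0]
b = [0.6, 0.8]  # both vectors have unit length
print(dot_product(a, b), cosine_similarity(a, b))
print(euclidean_distance(a, b), manhattan_distance(a, b))
```

For unit-length vectors such as the normalized embeddings in the Quick Start, cosine similarity and dot product give the same value.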
Compression:
- with_delta_compression(): Store sparse differences
- with_quantized_compression(): Aggressive quantization
- No compression (default): Store full vectors
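As a rough intuition for why quantization saves space, the sketch below maps each 32-bit float to an 8-bit code using the vector's own value range. This is a generic scalar quantization scheme chosen for illustration; the README does not describe KeraDB's actual encoding, so the `quantize` and `dequantize` helpers and the 8-bit width are assumptions.

```python
import array


def quantize(vector):
    """Map floats to unsigned 8-bit codes using the vector's own min/max range."""
    low, high = min(vector), max(vector)
    scale = (high - low) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = array.array("B", (round((x - low) / scale) for x in vector))
    return codes, low, scale


def dequantize(codes, low, scale):
    """Approximate reconstruction of the original floats."""
    return [low + c * scale for c in codes]


vector = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, low, scale = quantize(vector)
restored = dequantize(codes, low, scale)

full_bytes = len(vector) * array.array("f").itemsize  # stored as 32-bit floats
small_bytes = len(codes) * codes.itemsize             # stored as 8-bit codes
max_error = max(abs(x - y) for x, y in zip(vector, restored))
print(full_bytes, small_bytes, max_error)
```

The codes take a quarter of the space of 32-bit floats (plus two floats of per-vector overhead for `low` and `scale`), and each value is reconstructed to within half a quantization step.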
Vector CRUD
# Insert
vector_id = client.insert_vector(collection, embedding, metadata)
# Search
results = client.vector_search(collection, query_vector, k=10)
# Get by ID
doc = client.get_vector(collection, vector_id)
# Delete
client.delete_vector(collection, vector_id)
# Statistics
stats = client.vector_stats(collection)
Examples
See the examples directory for complete examples:
- basic.py - Basic document operations
- vector_search.py - Vector search demo
Run examples:
python examples/basic.py
python examples/vector_search.py
Benchmarks
Run benchmarks to compare performance:
# Install benchmark dependencies
pip install -e ".[benchmark]"
# Run all benchmarks
pytest benchmarks/ -v
# Run specific benchmarks
pytest benchmarks/benchmark_documents.py -v
pytest benchmarks/benchmark_vectors.py -v
See benchmarks/README.md for more details.
Testing
Run the test suite:
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=keradb --cov-report=html
Requirements
- Python 3.8 or higher
- KeraDB native library (libkeradb.so / libkeradb.dylib / keradb.dll)
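The native library file name depends on the operating system. The sketch below picks the expected name for the current platform. `native_library_name` is a hypothetical helper, and how the SDK itself locates the library (search paths, environment variables) is not described in this README.

```python
import platform


def native_library_name(system=None):
    """Return the expected KeraDB library file name for an operating system."""
    system = system or platform.system()
    names = {
        "Linux": "libkeradb.so",
        "Darwin": "libkeradb.dylib",  # macOS
        "Windows": "keradb.dll",
    }
    try:
        return names[system]
    except KeyError:
        raise OSError(f"unsupported platform: {system}") from None


print(native_library_name("Linux"))
```

Since the bindings are built on ctypes, passing the full path of that file to `ctypes.CDLL` is a quick way to check that the library can be loaded on your machine.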
Optional Dependencies
For benchmarks and development:
- pytest >= 7.0.0
- pytest-benchmark >= 4.0.0
- numpy >= 1.20.0 (for faster vector generation in benchmarks)
Platform Support
- Linux (x86_64, ARM64)
- macOS (Intel, Apple Silicon)
- Windows (x86_64)
Performance
KeraDB is designed for high performance:
- Document Operations: 10,000+ inserts/sec, sub-millisecond reads
- Vector Search: Sub-millisecond similarity search on millions of vectors
- Memory Efficient: Delta and quantized compression reduce memory usage by 60-80%
- Zero-Copy: Efficient FFI layer with minimal overhead
Architecture
┌─────────────────────────────┐
│     Python Application      │
└──────────┬──────────────────┘
           │
           ├─ keradb.connect()
           ├─ Collection API (MongoDB-compatible)
           └─ Vector Search API
           │
┌──────────▼──────────────────┐
│      Python FFI Layer       │
│      (ctypes bindings)      │
└──────────┬──────────────────┘
           │
┌──────────▼──────────────────┐
│      Rust Core Library      │
│  - Document storage (LSM)   │
│  - Vector search (HNSW)     │
│  - Compression              │
└─────────────────────────────┘
MongoDB Compatibility
The SDK aims to be compatible with MongoDB's API where practical:
✅ Supported:
- Basic CRUD operations
- Query operators ($eq, $gt, $in, etc.)
- Update operators ($set, $inc, $push, etc.)
- Cursor operations (limit, skip)
- find_one, find, insert_one, insert_many
- update_one, update_many, delete_one, delete_many
⚠️ Partial Support:
- Aggregation pipeline (limited)
- Indexes (automatic for performance)
❌ Not Supported:
- GridFS
- Transactions (planned)
- Replication
- Sharding
Performance
KeraDB delivers exceptional performance for embedded database operations. Here are benchmark results comparing KeraDB vs SQLite on Windows with Python 3.13:
KeraDB vs SQLite Performance Comparison
| Operation | KeraDB (μs) | SQLite (μs) | Speedup | KeraDB OPS | SQLite OPS |
|---|---|---|---|---|---|
| Count | 1.7 | 116.0 | 68x faster | 579,980 | 8,622 |
| Find by ID | 10.7 | 131.7 | 12x faster | 93,510 | 7,595 |
| Update | 82.9 | 159.4 | 2x faster | 12,059 | 6,272 |
| Insert | 99.3 | 5,234 | 53x faster | 10,066 | 191 |
| Find All | 461.3 | 390.5 | 1.2x slower | 2,168 | 2,561 |
| Delete | 161.1 | 4,801 | 30x faster | 6,207 | 208 |
| Batch Insert (100) | 11,165 | - | - | 90 | - |
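The Speedup and OPS columns follow directly from the mean latencies. The short script below reproduces the arithmetic from the table's rounded values. Operations per second computed this way differ slightly from the OPS columns (for example, about 588,000 instead of 579,980 for Count), presumably because the table was generated from unrounded measurements.

```python
# Mean latencies in microseconds, copied from the table: (KeraDB, SQLite).
latency_us = {
    "Count": (1.7, 116.0),
    "Find by ID": (10.7, 131.7),
    "Update": (82.9, 159.4),
    "Insert": (99.3, 5234.0),
    "Find All": (461.3, 390.5),
    "Delete": (161.1, 4801.0),
}


def speedup(keradb_us, sqlite_us):
    """How many times faster KeraDB is; values below 1 mean it is slower."""
    return sqlite_us / keradb_us


def ops_per_second(mean_us):
    return 1_000_000 / mean_us


for name, (kera, sqlite) in latency_us.items():
    print(f"{name:<12} {speedup(kera, sqlite):6.1f}x  {ops_per_second(kera):>10,.0f} ops/s")

# A batch insert of 100 documents takes 11,165 microseconds on average.
batch_us, batch_size = 11_165, 100
docs_per_second = ops_per_second(batch_us) * batch_size
print(f"Batch insert: {docs_per_second:,.0f} documents/s")
```

Note that the batch figure of 90 ops/second counts whole batches, which works out to roughly 9,000 documents per second.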
Key Performance Insights
🚀 KeraDB Advantages:
- 68x faster document counting
- 53x faster single document inserts
- 30x faster document deletion
- 12x faster lookups by ID
- 2x faster document updates
- Batch inserts: about 90 batches of 100 documents per second (roughly 9,000 documents/sec)
⚡ Why KeraDB is Faster:
- Direct memory-mapped B-tree access
- Rust-based native implementation with zero-copy operations
- Optimized for document-oriented workloads
- No SQL parsing overhead
📊 Use Cases:
- Embedded applications requiring high-speed document operations
- Real-time data processing
- High-throughput logging and event storage
- Applications needing MongoDB-like API with SQLite-level simplicity
Run Benchmarks Yourself
pip install -e ".[dev]"
python -m pytest benchmarks/ -v --benchmark-only --benchmark-sort=name
Full benchmark details: See dev-docs/BENCHMARK_RESULTS.md
Contributing
Contributions are welcome! Please see the main project repository for guidelines.
License
MIT License - See LICENSE file for details
Links
- Documentation: https://keradb.github.io
- Repository: https://github.com/keradb/keradb
- Issues: https://github.com/keradb/keradb/issues
Changelog
Version 0.1.0
- Initial release
- MongoDB-compatible document operations
- Vector search with HNSW
- Multiple distance metrics
- Vector compression (delta, quantized)
- Python 3.8+ support
- Comprehensive test suite and benchmarks