A multi-user vector database for efficient document storage and retrieval
RecallDB: Multi-User Vector Database
RecallDB is a multi-user vector database that gives each user an isolated document space. It uses vector embeddings for semantic search and efficient document retrieval.
Features
- Multi-user support: Each user has their own isolated document space
- Collection-based organization: Group related documents into collections
- Vector search: Use semantic search to find relevant documents
- Metadata filtering: Search and filter using document metadata
- PyArrow storage: Efficient data storage using PyArrow's Arrow IPC format
- Pluggable embeddings: Use your own embedding model by supplying an embedding function
- Parallel processing: Asynchronous and batched embedding for faster ingestion
- Chunking support: Break large documents into manageable chunks
- Flexible API: Simple interface for document management and retrieval
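The chunking feature above does not say how documents are split. As an illustration only, here is a minimal sketch of one common approach, fixed-size word windows with overlap; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names, not part of RecallDB's documented API.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Hypothetical helper: split text into chunks of at most `chunk_size`
    words, where consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window reached the end
            break
    return chunks


print(chunk_text("a b c d e f g", chunk_size=3, overlap=1))
# → ['a b c', 'c d e', 'e f g']
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some words twice.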
Installation
```bash
pip install recalldb
```
Quick Start
```python
import zlib

import numpy as np

from recall_db import RecallDB

# Create an embedding function.
# It should return a normalized (unit-length) numpy array. The body below is
# only a toy hashed bag-of-words placeholder so the example runs end to end;
# replace it with a real embedding model for meaningful semantic search.
def embedding_function(text, dim=384):
    vec = np.zeros(dim, dtype=np.float32)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Initialize RecallDB
db = RecallDB(
    embedding_function=embedding_function,
    storage_path="./recalldb_data",
)

# Add documents for a user
user_id = "user123"
db.add_document(
    user_id=user_id,
    text="This is a sample document about artificial intelligence.",
    collection="ai_docs",
    metadata={"topic": "AI", "type": "introduction"},
)

# Search for documents
results = db.search(
    user_id=user_id,
    query="artificial intelligence concepts",
    collections=["ai_docs"],
    top_k=5,
)

# Print results
for result in results:
    print(f"Document ID: {result['id']}")
    print(f"Content: {result['content'][:100]}...")
    print(f"Score: {result['score']}")
    print(f"Metadata: {result['metadata']}")
    print("---")
```
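Because the embedding function is expected to return normalized vectors, the dot product of two embeddings equals their cosine similarity. The sketch below shows brute-force top-k ranking under that assumption. It is a conceptual illustration, not RecallDB's actual search implementation; `normalize`, `dot`, and `top_k` are hypothetical helpers.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, docs, k=5):
    """docs: list of (doc_id, unit_vector) pairs.
    Returns up to k (doc_id, score) pairs, highest cosine similarity first."""
    q = normalize(query_vec)
    scored = [(doc_id, dot(q, vec)) for doc_id, vec in docs]
    return heapq.nlargest(k, scored, key=lambda item: item[1])


docs = [
    ("d1", normalize([1.0, 0.0])),
    ("d2", normalize([1.0, 1.0])),
    ("d3", normalize([0.0, 1.0])),
]
print([doc_id for doc_id, _ in top_k([1.0, 0.2], docs, k=2)])  # → ['d1', 'd2']
```

Brute force scores every document, so its cost grows linearly with the collection size; production vector databases typically add an index to avoid a full scan.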
Core Concepts
Users and Data Isolation
RecallDB maintains strict data isolation between users. Each user has their own separate database instance, ensuring that:
- User A cannot access documents from User B
- Search queries are scoped to the requesting user's documents only
- Each user's data can be independently managed and persisted
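The isolation guarantees above can be pictured as one separate store per user, where every operation must name a user. The following is a toy in-memory sketch of that idea; `UserScopedStore` and its methods are hypothetical and only illustrate the scoping rule, not how RecallDB persists data.

```python
from collections import defaultdict


class UserScopedStore:
    """Toy store: each user's documents live in their own dict, and every
    method takes a user_id, so queries only ever see that user's data."""

    def __init__(self):
        self._spaces = defaultdict(dict)  # user_id -> {doc_id: text}

    def add(self, user_id, doc_id, text):
        self._spaces[user_id][doc_id] = text

    def search(self, user_id, term):
        # Only the requesting user's space is scanned; .get avoids
        # creating an empty space for unknown users.
        space = self._spaces.get(user_id, {})
        return sorted(d for d, text in space.items() if term.lower() in text.lower())

    def delete_user(self, user_id):
        # A user's data can be dropped without touching anyone else's.
        self._spaces.pop(user_id, None)
```

For example, after `store.add("alice", "d1", "AI notes")` and `store.add("bob", "d2", "AI paper")`, `store.search("alice", "ai")` returns only `["d1"]`.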
Collections
Collections in RecallDB are similar to tables in traditional databases:
- Documents are organized into named collections (e.g., "Articles", "Products", "Notes")
- Collections help organize documents logically
- Search can be restricted to specific collections
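Restricting a search to certain collections amounts to narrowing the candidate set before any scoring happens. A minimal sketch for a single user, assuming a hypothetical layout of collection name to `{doc_id: text}` (the `add` and `candidate_ids` helpers are illustrative, not RecallDB's API):

```python
from collections import defaultdict

# Hypothetical layout for one user: collection name -> {doc_id: text}
collections = defaultdict(dict)


def add(collection, doc_id, text):
    collections[collection][doc_id] = text


def candidate_ids(restrict_to=None):
    """Return the ids eligible for a search.
    restrict_to=None searches every collection; unknown names are ignored."""
    names = collections.keys() if restrict_to is None else restrict_to
    return sorted(doc_id for name in names for doc_id in collections.get(name, {}))


add("Articles", "a1", "Intro to vector search")
add("Notes", "n1", "Meeting notes")
add("Products", "p1", "Widget spec")
print(candidate_ids(["Articles", "Notes"]))  # → ['a1', 'n1']
```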
Documents and Metadata
Each document in RecallDB consists of:
- Text content that is semantically embedded
- Optional metadata for filtering and organization
- A unique document ID (auto-generated or user-provided)
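A document as described above can be modelled as text plus a metadata dict plus an ID that is generated when the caller does not supply one. The sketch below assumes UUID4 hex strings for generated IDs and exact-match metadata filtering; both choices, and the `Document` and `matches` names, are hypothetical rather than RecallDB's documented behaviour.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)
    # Auto-generated unless the caller passes doc_id explicitly.
    doc_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def matches(doc: Document, filters: dict[str, Any]) -> bool:
    """Exact-match filter: every key in `filters` must be present in the
    document's metadata with an equal value."""
    return all(key in doc.metadata and doc.metadata[key] == value
               for key, value in filters.items())


doc = Document("A sample document about AI.", {"topic": "AI", "type": "introduction"})
print(matches(doc, {"topic": "AI"}))  # → True
```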
Performance
In benchmarks on the GIST-1M dataset, RecallDB achieved:
- Recall@1: 0.99 with ~6ms latency
- Consistent sub-7ms query times
- See benchmark results for details
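The benchmark's exact definition of Recall@1 is not given here. A common convention in approximate nearest-neighbour benchmarks is Recall@k = |retrieved top-k ∩ true top-k| / k, averaged over queries; for k = 1 this is simply the fraction of queries whose first result is the true nearest neighbour. The sketch below computes that metric on made-up toy data and does not reproduce the reported numbers.

```python
def recall_at_k(retrieved, ground_truth, k):
    """retrieved: one ranked list of result ids per query.
    ground_truth: one ranked list of true nearest-neighbour ids per query.
    Returns the mean of |top-k retrieved ∩ true top-k| / k over all queries."""
    if len(retrieved) != len(ground_truth):
        raise ValueError("need one result list per query")
    if not retrieved:
        raise ValueError("no queries")
    total = 0.0
    for found, truth in zip(retrieved, ground_truth):
        total += len(set(found[:k]) & set(truth[:k])) / k
    return total / len(retrieved)


retrieved = [[3, 7], [5, 1], [2, 9]]
truth = [[3, 8], [1, 5], [2, 4]]
print(recall_at_k(retrieved, truth, k=1))  # 2 of 3 queries hit → about 0.667
```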
License
MIT
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file recalldb-0.1.0.tar.gz.
File metadata
- Download URL: recalldb-0.1.0.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90d1f17805d235d952832b86d08a2a7d6536c2c77b06b6667f3aa3d984552a31 |
| MD5 | 20dd1238f71efe16e440b927d29bc240 |
| BLAKE2b-256 | 3a5befdb272b85829e34d0a34cb7167d29e2533a94da1eeca7dca951b0e68375 |
File details
Details for the file recalldb-0.1.0-py3-none-any.whl.
File metadata
- Download URL: recalldb-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ed0d293fa77cda2b6e425982ec7a406cca1516589bd3cc100da46c1c5d64296 |
| MD5 | 953c23c83283b6d56a55282b38e005aa |
| BLAKE2b-256 | aefaba390e3994ae489efd05c68f3273770749e0e27b21cafd8a91919cafde9c |