A multi-user vector database for efficient document storage and retrieval

RecallDB: Multi-User Vector Database

RecallDB is a multi-user vector database that gives each user an independent, isolated document space. It stores vector embeddings of documents to support semantic search and efficient retrieval.

Features

  • Multi-user support: Each user has their own isolated document space
  • Collection-based organization: Group related documents into collections
  • Vector search: Use semantic search to find relevant documents
  • Metadata filtering: Search and filter using document metadata
  • PyArrow storage: Efficient data storage using PyArrow's Arrow IPC format
  • Embedding models: Support for various embedding models
  • Parallel processing: Asynchronous/batched embeddings for faster processing
  • Chunking support: Break large documents into manageable chunks
  • Flexible API: Simple interface for document management and retrieval
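
To make the chunking feature concrete, here is a minimal sketch of overlapping fixed-size chunking. The source does not describe how RecallDB chunks documents, so the function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions for illustration only.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("a" * 450, chunk_size=200, overlap=50)
print([len(c) for c in chunks])
```

Overlap keeps sentences that straddle a boundary visible in both neighboring chunks, at the cost of storing some text twice.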

Installation

pip install recalldb

Quick Start

from recall_db import RecallDB

# Create embedding function
def embedding_function(text):
    # Your embedding logic here.
    # Should return a normalized (unit-length) numpy array.
    raise NotImplementedError("Replace this with a call to your embedding model")

# Initialize RecallDB
db = RecallDB(
    embedding_function=embedding_function,
    storage_path="./recalldb_data"
)

# Add documents for a user
user_id = "user123"
db.add_document(
    user_id=user_id,
    text="This is a sample document about artificial intelligence.",
    collection="ai_docs",
    metadata={"topic": "AI", "type": "introduction"}
)

# Search for documents
results = db.search(
    user_id=user_id,
    query="artificial intelligence concepts",
    collections=["ai_docs"],
    top_k=5
)

# Print results
for result in results:
    print(f"Document ID: {result['id']}")
    print(f"Content: {result['content'][:100]}...")
    print(f"Score: {result['score']}")
    print(f"Metadata: {result['metadata']}")
    print("---")
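
The Quick Start leaves the embedding logic to you. As a stand-in for trying out the flow, the sketch below hashes tokens into a fixed number of buckets and L2-normalizes the result. It is a toy, not a semantic model, and `toy_embedding` and `dim` are hypothetical names. It returns a plain list so that it runs with the standard library alone; since the Quick Start asks for a numpy array, you would likely wrap the result with `numpy.asarray(...)` before passing it to RecallDB.

```python
import hashlib
import math


def toy_embedding(text, dim=64):
    """Hash each token into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return vec  # empty input: return the zero vector unchanged
    return [v / norm for v in vec]


vector = toy_embedding("artificial intelligence concepts")
print(len(vector))
```

Because the vectors have unit length, the dot product of two of them equals their cosine similarity.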

Core Concepts

Users and Data Isolation

RecallDB maintains strict data isolation between users. Each user has their own separate database instance, ensuring that:

  • User A cannot access documents from User B
  • Search queries are scoped to the requesting user's documents only
  • Each user's data can be independently managed and persisted
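
The isolation guarantees above can be pictured with a small conceptual sketch that keeps one private document dictionary per user. This is not RecallDB's implementation; `IsolatedStore` and its methods are hypothetical, and keyword matching stands in for vector search.

```python
from collections import defaultdict


class IsolatedStore:
    """Toy store that keeps one private document dict per user."""

    def __init__(self):
        self._by_user = defaultdict(dict)  # user_id -> {doc_id: text}

    def add(self, user_id, doc_id, text):
        self._by_user[user_id][doc_id] = text

    def search(self, user_id, term):
        # Only the requesting user's documents are ever scanned.
        docs = self._by_user.get(user_id, {})
        return [doc_id for doc_id, text in docs.items()
                if term.lower() in text.lower()]


store = IsolatedStore()
store.add("alice", "a1", "Notes on vector search")
store.add("bob", "b1", "Vector databases overview")
print(store.search("alice", "vector"))
```

Both users have a document matching "vector", yet each search only ever sees the caller's own entries.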

Collections

Collections in RecallDB are similar to tables in traditional databases:

  • Documents are grouped into named collections (e.g., "Articles", "Products", "Notes"), which keeps related documents together
  • Search can be restricted to specific collections
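
As an illustration of restricting a search to specific collections, the sketch below narrows a candidate set by collection name before any ranking would happen. The record layout and the `filter_by_collections` helper are assumptions, not part of RecallDB's documented API.

```python
def filter_by_collections(documents, collections=None):
    """Return documents whose 'collection' is in `collections` (all if None)."""
    if collections is None:
        return list(documents)
    allowed = set(collections)
    return [doc for doc in documents if doc["collection"] in allowed]


docs = [
    {"id": "d1", "collection": "Articles", "content": "Intro to embeddings"},
    {"id": "d2", "collection": "Notes", "content": "Meeting notes"},
    {"id": "d3", "collection": "Products", "content": "Widget spec"},
]
selected = filter_by_collections(docs, collections=["Articles", "Notes"])
print([doc["id"] for doc in selected])
```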

Documents and Metadata

Each document in RecallDB consists of:

  • Text content that is semantically embedded
  • Optional metadata for filtering and organization
  • A unique document ID (auto-generated or user-provided)
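
The three parts of a document listed above can be modeled as a small record type. The sketch below is an assumption about shape only: the `Document` class, the UUID-based ID, and the exact-match `matches` helper are hypothetical and may differ from what RecallDB does internally.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Document:
    content: str
    metadata: dict = field(default_factory=dict)
    # Auto-generated unless the caller supplies an ID.
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


def matches(doc, filters):
    """True when every key in `filters` equals the document's metadata value."""
    return all(doc.metadata.get(k) == v for k, v in filters.items())


auto = Document("A sample document about AI.", {"topic": "AI"})
named = Document("Another document.", id="doc-1")
print(named.id, matches(auto, {"topic": "AI"}))
```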

Performance

In benchmarks on the GIST-1M dataset, RecallDB reported:

  • Recall@1: 0.99 with ~6ms latency
  • Consistent sub-7ms query times
  • See benchmark results for details
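
For readers unfamiliar with the metric, Recall@k is commonly computed as the fraction of queries whose true nearest neighbor appears among the top k returned results. The source does not state exactly how its benchmark defined it, so the sketch below shows that common definition under that assumption, with hypothetical names and made-up IDs.

```python
def recall_at_k(retrieved, ground_truth, k=1):
    """Fraction of queries whose true nearest neighbor is in the top-k results."""
    if len(retrieved) != len(ground_truth):
        raise ValueError("retrieved and ground_truth must have the same length")
    if not retrieved:
        return 0.0
    hits = sum(1 for ids, truth in zip(retrieved, ground_truth)
               if truth in ids[:k])
    return hits / len(retrieved)


# Illustrative IDs only: four queries, two results returned per query.
retrieved = [[3, 7], [5, 1], [9, 2], [4, 8]]
ground_truth = [3, 1, 9, 4]
print(recall_at_k(retrieved, ground_truth, k=1))
```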

License

MIT
