
DuckDB vector adapter for Cognee with planned graph support


🧠 Cognee DuckDB Vector Adapter

Lightning-fast embedded vector search for Cognee using DuckDB, with graph support planned

License: Apache 2.0


Features

  • Zero-configuration embedded vector database - no external server required
  • Full support for vector embeddings storage and retrieval
  • High-performance vector similarity search using DuckDB's native array operations
  • Persistent or in-memory database options
  • Vector-first design with planned graph support in future releases
  • Comprehensive error handling and logging
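
To make "vector similarity search" concrete, here is a minimal pure-Python sketch of the underlying idea: score every stored vector against the query with cosine similarity and keep the best matches. It is only an illustration. The adapter delegates this work to DuckDB, and the function names and toy embeddings below are hypothetical, not part of the adapter's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, items, k=3):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical); real embeddings are much larger.
EMBEDDINGS = {
    "nlp": [0.9, 0.1, 0.0],
    "databases": [0.1, 0.9, 0.1],
    "cooking": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for item_id, score in top_k([1.0, 0.0, 0.0], EMBEDDINGS, k=2):
        print(f"{item_id}: {score:.3f}")
```

A brute-force scan like this is linear in the number of stored vectors, which is why an index such as HNSW becomes useful for larger collections.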

Installation

pip install cognee-community-hybrid-adapter-duckdb

Prerequisites

None beyond the Python packages listed under Requirements. DuckDB is an embedded database, so there is no external server to install or configure.

Examples

Check out the examples/ folder.

Basic vector search example:

uv run examples/example.py

Document processing example with generated story:

uv run examples/simple_document_example/cognee_simple_document_demo.py

This example demonstrates processing a generated story text file (generated_story.txt) along with other documents like Alice in Wonderland.

You will need an OpenAI API key to run the example scripts.
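
A missing key tends to surface only once the example reaches the LLM step, so a small pre-flight check can save time. The sketch below assumes the key is supplied through an environment variable named LLM_API_KEY or OPENAI_API_KEY. Those names are assumptions, not something this README specifies, so check the Cognee documentation for the variable your version reads. The helper name find_api_key is hypothetical.

```python
import os


def find_api_key(env=os.environ, names=("LLM_API_KEY", "OPENAI_API_KEY")):
    """Return the name of the first non-empty variable in `names`, or None.

    The variable names are assumptions; adjust them to your Cognee setup.
    """
    for name in names:
        if env.get(name, "").strip():
            return name
    return None


if __name__ == "__main__":
    found = find_api_key()
    if found is None:
        print("No API key found; set one before running the examples.")
    else:
        print(f"Using API key from {found}")
```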

Usage

import asyncio
from cognee import config, prune, add, cognify, search, SearchType

# Importing the register module registers the DuckDB adapter with cognee (side effect)
from cognee_community_hybrid_adapter_duckdb import register  # noqa: F401

async def main():
    # Configure DuckDB as vector database
    config.set_vector_db_config({
        "vector_db_provider": "duckdb",
        "vector_db_url": "my_database.db",  # File path or None for in-memory
    })
    
    # Optional: Clean previous data
    await prune.prune_data()
    await prune.prune_system()
    
    # Add your content
    await add("""
    Natural language processing (NLP) is an interdisciplinary
    subfield of computer science and information retrieval.
    """)
    
    # Process with cognee
    await cognify()
    
    # Search (use vector-based search types)
    search_results = await search(
        query_type=SearchType.CHUNKS, 
        query_text="Tell me about NLP"
    )
    
    for result in search_results:
        print("Search result:", result)

if __name__ == "__main__":
    asyncio.run(main())

Configuration

Configure DuckDB as your vector database in cognee:

  • vector_db_provider: Set to "duckdb"
  • vector_db_url: Database file path (e.g., "my_db.db"), None for an in-memory database, or a MotherDuck URL (e.g., "md:my_database") for a cloud-hosted database

Database Options

# Persistent file-based database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "cognee_vectors.db"
})

# In-memory database (fastest, but data is lost on restart)
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": None  # or ":memory:"
})

# Absolute path to database file
config.set_vector_db_config({
    "vector_db_provider": "duckdb", 
    "vector_db_url": "/path/to/my/database.db"
})

# MotherDuck cloud database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "md:my_database"  # Replace with your MotherDuck database
})
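
The four variants above differ only in the vector_db_url value, so the choice can be factored into a small helper. This is a sketch under the rules stated in this README (None or ":memory:" means in-memory, an "md:" prefix means MotherDuck, anything else is a file path). The function names are hypothetical and not part of the adapter.

```python
def classify_vector_db_url(url):
    """Return 'in-memory', 'motherduck', or 'file' for a vector_db_url value."""
    if url is None or url == ":memory:":
        return "in-memory"
    if url.startswith("md:"):
        return "motherduck"
    return "file"


def build_vector_db_config(url):
    """Build the dict that would be passed to config.set_vector_db_config()."""
    return {"vector_db_provider": "duckdb", "vector_db_url": url}


if __name__ == "__main__":
    for candidate in (None, "cognee_vectors.db", "md:my_database"):
        print(classify_vector_db_url(candidate), build_vector_db_config(candidate))
```

Keeping the classification separate from the config dict makes it easy to log or validate the chosen mode before handing the dict to cognee.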

Requirements

  • Python >= 3.12, <= 3.13
  • duckdb >= 1.3.2
  • cognee >= 0.2.3

Roadmap: Graph Support

This adapter is currently vector-focused with plans to add full graph database capabilities in future releases. The foundation is already in place with DuckDB's property graph extensions.

Current Status:

  • ✅ Full vector similarity search
  • ✅ Embedding storage and retrieval
  • ✅ Collection management
  • 🚧 Graph operations (coming soon)

Error Handling

The adapter includes comprehensive error handling:

  • CollectionNotFoundError: Raised when attempting operations on non-existent collections
  • InvalidValueError: Raised for invalid query parameters
  • NotImplementedError: Currently raised for graph operations (graph support coming soon)
  • Graceful handling of database connection issues and embedding errors

Performance

DuckDB provides excellent performance characteristics:

  • Embedded: No network overhead - everything runs in-process
  • Columnar: Optimized storage format for analytical workloads
  • Vectorized: SIMD operations for fast vector similarity calculations
  • ACID: Full transactional support with data consistency
  • Memory efficient: Minimal memory footprint compared to traditional databases

Troubleshooting

Common Issues

  1. File Permission Errors: Ensure write permissions to the directory containing your database file
  2. Embedding Dimension Mismatch: Verify embedding dimensions match collection configuration
  3. Collection Not Found: Always create collections before adding data points
  4. Graph Operations: Graph support is planned for a future release; for now, use vector-based search types
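
The first two issues can be checked before any data reaches the database. The helpers below are a minimal sketch using only the standard library. Their names are hypothetical, and they do not inspect the adapter or an existing collection, so expected_dim must come from your own embedding configuration.

```python
import os
import tempfile


def db_directory_is_writable(db_path):
    """True if the directory that would hold db_path exists and is writable."""
    directory = os.path.dirname(os.path.abspath(db_path))
    return os.path.isdir(directory) and os.access(directory, os.W_OK)


def find_dimension_mismatches(vectors, expected_dim):
    """Return the indexes of vectors whose length differs from expected_dim."""
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_dim]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(db_directory_is_writable(os.path.join(tmp, "cognee_vectors.db")))
    print(find_dimension_mismatches([[0.1, 0.2], [0.1, 0.2, 0.3]], expected_dim=2))
```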

Debug Logging

The adapter uses Cognee's logging system. Enable debug logging to see detailed operation logs:

import logging
logging.getLogger("DuckDBAdapter").setLevel(logging.DEBUG)

Database Option Comparison

File-based ("my_db.db")

  • Pros: persistent storage; survives restarts; can handle large datasets
  • Cons: slower I/O; disk space usage

In-memory (None)

  • Pros: maximum performance; no disk usage; perfect for testing
  • Cons: data lost on restart; limited by RAM

MotherDuck ("md:database")

  • Pros: cloud-hosted; shared access; managed service; scalable
  • Cons: requires internet; potential latency; MotherDuck account needed

Development

To contribute or modify the adapter:

  1. Clone the repository and cd into the packages/hybrid/duckdb folder
  2. Install dependencies: uv sync --all-extras
  3. Run the example script as a smoke test: uv run examples/example.py
  4. Make your changes, test, and submit a PR

Extensions Used

This adapter automatically loads these DuckDB extensions:

  • duckpgq: Property graph queries (foundation for upcoming graph support)
  • vss: Vector similarity search with HNSW indexing support
