
DuckDB vector adapter for Cognee with planned graph support


🧠 Cognee DuckDB Vector Adapter

Lightning-fast embedded vector search for Cognee using DuckDB, with planned graph support

License: Apache 2.0


Features

  • Zero-configuration embedded vector database - no external server required
  • Full support for vector embeddings storage and retrieval
  • High-performance vector similarity search using DuckDB's native array operations
  • Persistent or in-memory database options
  • Vector-first design with planned graph support in future releases
  • Comprehensive error handling and logging

Installation

pip install cognee-community-hybrid-adapter-duckdb

Prerequisites

None! DuckDB is an embedded database that requires no external dependencies or server setup. Just install and use.

Examples

Check out the examples/ folder!

Basic vector search example:

uv run examples/example.py

Document processing example with generated story:

uv run examples/simple_document_example/cognee_simple_document_demo.py

This example demonstrates processing a generated story text file (generated_story.txt) along with other documents like Alice in Wonderland.

You will need an OpenAI API key to run the example scripts.

Usage

import asyncio
from cognee import config, prune, add, cognify, search, SearchType

# Import the register module to enable DuckDB support
from cognee_community_hybrid_adapter_duckdb import register

async def main():
    # Configure DuckDB as vector database
    config.set_vector_db_config({
        "vector_db_provider": "duckdb",
        "vector_db_url": "my_database.db",  # File path or None for in-memory
    })
    
    # Optional: Clean previous data
    await prune.prune_data()
    await prune.prune_system()
    
    # Add your content
    await add("""
    Natural language processing (NLP) is an interdisciplinary
    subfield of computer science and information retrieval.
    """)
    
    # Process with cognee
    await cognify()
    
    # Search (use vector-based search types)
    search_results = await search(
        query_type=SearchType.CHUNKS, 
        query_text="Tell me about NLP"
    )
    
    for result in search_results:
        print("Search result:", result)

if __name__ == "__main__":
    asyncio.run(main())

Configuration

Configure DuckDB as your vector database in cognee:

  • vector_db_provider: Set to "duckdb"
  • vector_db_url: Database file path (e.g., "my_db.db"), None for in-memory, or a MotherDuck URL (e.g., "md:my_database") for cloud

Database Options

# Persistent file-based database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "cognee_vectors.db"
})

# In-memory database (fastest, but data is lost on restart)
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": None  # or ":memory:"
})

# Absolute path to database file
config.set_vector_db_config({
    "vector_db_provider": "duckdb", 
    "vector_db_url": "/path/to/my/database.db"
})

# MotherDuck cloud database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "md:my_database"  # Replace with your MotherDuck database
})
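
The vector_db_url forms shown above can be told apart with a small helper. This is a minimal sketch, not part of the adapter: the name classify_vector_db_url is hypothetical, and the rules (None or ":memory:" means in-memory, an "md:" prefix means MotherDuck, anything else is treated as a file path) are inferred from the examples above rather than taken from the adapter's code.

```python
from typing import Optional


def classify_vector_db_url(url: Optional[str]) -> str:
    """Return 'in-memory', 'motherduck', or 'file' for a vector_db_url value.

    Hypothetical helper: the rules mirror the examples above and are an
    assumption about how the adapter interprets the value.
    """
    if url is None or url == ":memory:":
        return "in-memory"
    if url.startswith("md:"):
        return "motherduck"
    return "file"


for candidate in (None, ":memory:", "cognee_vectors.db", "md:my_database"):
    print(repr(candidate), "->", classify_vector_db_url(candidate))
```

A check like this can be handy in tests, for example to refuse to run destructive prune calls against anything other than an in-memory database.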

Requirements

  • Python >= 3.12, <= 3.13
  • duckdb >= 1.3.2
  • cognee >= 0.2.3

Roadmap: Graph Support

This adapter is currently vector-focused, with full graph database capabilities planned for future releases. The foundation is already in place: the adapter loads DuckDB's duckpgq property graph extension.

Current Status:

  • ✅ Full vector similarity search
  • ✅ Embedding storage and retrieval
  • ✅ Collection management
  • 🚧 Graph operations (coming soon)

Error Handling

The adapter includes comprehensive error handling:

  • CollectionNotFoundError: Raised when attempting operations on non-existent collections
  • InvalidValueError: Raised for invalid query parameters
  • NotImplementedError: Currently raised for graph operations (graph support coming soon)
  • Graceful handling of database connection issues and embedding errors
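
One way to handle the errors listed above is sketched below. It is an illustration under stated assumptions: this README does not show the import path for CollectionNotFoundError and InvalidValueError, so the sketch defines local stand-ins with the same names, and safe_search and missing_collection are hypothetical helpers. In real code you would import the actual exception classes and pass a wrapper around cognee's search.

```python
import asyncio


# Stand-ins for the adapter's exceptions. The real classes are defined by
# cognee or the adapter package; their import path is not shown in this README.
class CollectionNotFoundError(Exception):
    pass


class InvalidValueError(Exception):
    pass


async def safe_search(search_fn, query_text):
    """Await search_fn(query_text); on a known error, report it and return []."""
    try:
        return await search_fn(query_text)
    except CollectionNotFoundError:
        print("Collection not found: add content and run cognify() first.")
    except InvalidValueError as error:
        print(f"Invalid query parameters: {error}")
    except NotImplementedError:
        print("Graph operations are not supported yet; use a vector search type.")
    return []


async def missing_collection(query_text):
    # Hypothetical search function that simulates a missing collection.
    raise CollectionNotFoundError("collection does not exist")


print(asyncio.run(safe_search(missing_collection, "Tell me about NLP")))
```

Returning an empty list keeps calling code simple; re-raising after logging is an equally reasonable choice when a missing collection should stop the program.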

Performance

DuckDB provides excellent performance characteristics:

  • Embedded: No network overhead - everything runs in-process
  • Columnar: Optimized storage format for analytical workloads
  • Vectorized: SIMD operations for fast vector similarity calculations
  • ACID: Full transactional support with data consistency
  • Memory efficient: Minimal memory footprint compared to traditional databases

Troubleshooting

Common Issues

  1. File Permission Errors: Ensure write permissions to the directory containing your database file
  2. Embedding Dimension Mismatch: Verify embedding dimensions match collection configuration
  3. Collection Not Found: Always create collections before adding data points
  4. Graph Operations: Graph support is planned for a future release; use vector-based search types for now
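
The first two issues above can be caught before any data is written. The following is a minimal sketch using only the standard library; database_dir_is_writable and check_dimensions are hypothetical helper names, not functions provided by the adapter or by cognee.

```python
import os
from pathlib import Path


def database_dir_is_writable(db_path: str) -> bool:
    """Check that the directory that will hold the database file is writable."""
    directory = Path(db_path).expanduser().resolve().parent
    return directory.is_dir() and os.access(directory, os.W_OK)


def check_dimensions(vector, expected_size: int) -> None:
    """Raise ValueError if a vector's length differs from the expected size."""
    if len(vector) != expected_size:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, expected {expected_size}"
        )


# A relative path resolves against the current working directory.
print(database_dir_is_writable("my_database.db"))
check_dimensions([0.1, 0.2, 0.3], expected_size=3)  # passes silently
```

Note that os.access reports permissions for the current user only, so the result can differ when the application runs under another account, for example inside a container.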

Debug Logging

The adapter uses Cognee's logging system. Enable debug logging to see detailed operation logs:

import logging
logging.getLogger("DuckDBAdapter").setLevel(logging.DEBUG)

Database Option Comparison

| Option | Pros | Cons |
|---|---|---|
| File-based ("my_db.db") | ✅ Persistent storage; ✅ Survives restarts; ✅ Can handle large datasets | ❌ Slower I/O; ❌ Disk space usage |
| In-memory (None) | ✅ Maximum performance; ✅ No disk usage; ✅ Perfect for testing | ❌ Data lost on restart; ❌ Limited by RAM |
| MotherDuck ("md:database") | ✅ Cloud-hosted; ✅ Shared access; ✅ Managed service; ✅ Scalable | ❌ Requires internet; ❌ Potential latency; ❌ MotherDuck account needed |

Development

To contribute or modify the adapter:

  1. Clone the repository and cd into the packages/hybrid/duckdb folder
  2. Install dependencies: uv sync --all-extras
  3. Run the example script as a smoke test: uv run examples/example.py
  4. Make your changes, test, and submit a PR

Extensions Used

This adapter automatically loads these DuckDB extensions:

  • duckpgq: Property graph queries (foundation for upcoming graph support)
  • vss: Vector similarity search with HNSW indexing support
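
For intuition about what a vector similarity search returns, here is a pure-Python sketch of an exact, brute-force ranking. An HNSW index answers the same kind of nearest-neighbour query approximately, without scanning every vector. This is not the adapter's implementation: cosine similarity is used as one common metric (the metric the adapter uses is not stated in this README), and the function names and toy vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (name, score) pairs most similar to query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
chunks = {
    "nlp": [0.9, 0.1, 0.0],
    "databases": [0.1, 0.9, 0.1],
    "cooking": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.2, 0.0], chunks))
```

The brute-force scan costs time proportional to the number of stored vectors per query, which is the cost an index is meant to avoid on large collections.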
