DuckDB vector adapter for Cognee with planned graph support

🧠 Cognee DuckDB Vector Adapter

Lightning-fast embedded vector search for Cognee using DuckDB, with graph support planned

License: Apache 2.0

Features

  • Zero-configuration embedded vector database - no external server required
  • Full support for vector embeddings storage and retrieval
  • High-performance vector similarity search using DuckDB's native array operations
  • Persistent or in-memory database options
  • Vector-first design with planned graph support in future releases
  • Comprehensive error handling and logging
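
To make the similarity search feature concrete, here is a minimal pure-Python sketch of what a brute-force cosine similarity search computes. It is an illustration of the concept only, not the adapter's implementation (the adapter runs the search inside DuckDB); the function names and sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=3):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of dimensions.
stored = {
    "nlp": [1.0, 0.0, 0.0],
    "vision": [0.0, 1.0, 0.0],
    "nlp_retrieval": [0.9, 0.1, 0.0],
}
results = top_k([1.0, 0.0, 0.0], stored, k=2)
```

Here `results` holds the two closest entries, with the exact match ranked first.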

Installation

pip install cognee-community-hybrid-adapter-duckdb

Prerequisites

None! DuckDB is an embedded database that requires no external dependencies or server setup. Just install and use.

Examples

Check out the examples/ folder.

Basic vector search example:

uv run examples/example.py

Document processing example with generated story:

uv run examples/simple_document_example/cognee_simple_document_demo.py

This example demonstrates processing a generated story text file (generated_story.txt) along with other documents like Alice in Wonderland.

You will need an OpenAI API key to run the example scripts.
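
Since the examples need an API key, a small fail-fast check before running can save a confusing error later. The sketch below is one way to do that; the environment variable names are assumptions chosen for illustration, so check cognee's documentation for the variable your version actually reads.

```python
import os

# Assumption: these variable names are illustrative, not confirmed by this README.
CANDIDATE_KEY_VARS = ("LLM_API_KEY", "OPENAI_API_KEY")


def find_api_key(env=None, names=CANDIDATE_KEY_VARS):
    """Return the first non-empty API key found in env, or raise RuntimeError."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("No API key found; set one of: " + ", ".join(names))
```

Calling `find_api_key()` at the top of a script stops it immediately with a readable message when no key is set.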

Usage

import asyncio
from cognee import config, prune, add, cognify, search, SearchType

# Import the register module to enable DuckDB support.
# The import is needed for its side effect even though the name is not used below.
from cognee_community_hybrid_adapter_duckdb import register  # noqa: F401

async def main():
    # Configure DuckDB as vector database
    config.set_vector_db_config({
        "vector_db_provider": "duckdb",
        "vector_db_url": "my_database.db",  # File path or None for in-memory
    })
    
    # Optional: Clean previous data
    await prune.prune_data()
    await prune.prune_system()
    
    # Add your content
    await add("""
    Natural language processing (NLP) is an interdisciplinary
    subfield of computer science and information retrieval.
    """)
    
    # Process with cognee
    await cognify()
    
    # Search (use vector-based search types)
    search_results = await search(
        query_type=SearchType.CHUNKS, 
        query_text="Tell me about NLP"
    )
    
    for result in search_results:
        print("Search result:", result)

if __name__ == "__main__":
    asyncio.run(main())

Configuration

Configure DuckDB as your vector database in cognee:

  • vector_db_provider: Set to "duckdb"
  • vector_db_url: Database file path (e.g., "my_db.db"), None for in-memory, or MotherDuck URL for cloud

Database Options

# Persistent file-based database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "cognee_vectors.db"
})

# In-memory database (fastest, but data is lost on restart)
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": None  # or ":memory:"
})

# Absolute path to database file
config.set_vector_db_config({
    "vector_db_provider": "duckdb", 
    "vector_db_url": "/path/to/my/database.db"
})

# MotherDuck cloud database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "md:my_database"  # Replace with your MotherDuck database
})
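
The four options above differ only in the value of vector_db_url, so a small helper can build the dictionary and report which storage mode a URL selects. This is a sketch under that assumption; `duckdb_vector_config` and `storage_kind` are hypothetical helper names, not part of the adapter.

```python
def duckdb_vector_config(url=None):
    """Build the dict passed to config.set_vector_db_config().

    url=None        -> in-memory database
    "file.db"       -> persistent file (relative or absolute path)
    "md:<database>" -> MotherDuck
    """
    if url is not None and not isinstance(url, str):
        raise TypeError("url must be a string or None")
    if isinstance(url, str) and not url.strip():
        raise ValueError("url must not be empty; use None for in-memory")
    return {"vector_db_provider": "duckdb", "vector_db_url": url}


def storage_kind(url):
    """Classify a vector_db_url as 'in-memory', 'motherduck', or 'file'."""
    if url is None or url == ":memory:":
        return "in-memory"
    if url.startswith("md:"):
        return "motherduck"
    return "file"
```

For example, `config.set_vector_db_config(duckdb_vector_config("cognee_vectors.db"))` would select the persistent file-based option.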

Requirements

  • Python >= 3.12, <= 3.13
  • duckdb >= 1.3.2
  • cognee >= 0.2.3

Roadmap: Graph Support

This adapter is currently vector-focused; full graph database capabilities are planned for future releases. The groundwork is in place: the adapter already loads DuckDB's duckpgq property graph extension.

Current Status:

  • ✅ Full vector similarity search
  • ✅ Embedding storage and retrieval
  • ✅ Collection management
  • 🚧 Graph operations (coming soon)

Error Handling

The adapter includes comprehensive error handling:

  • CollectionNotFoundError: Raised when attempting operations on non-existent collections
  • InvalidValueError: Raised for invalid query parameters
  • NotImplementedError: Currently raised for graph operations (graph support coming soon)
  • Graceful handling of database connection issues and embedding errors
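
Because graph operations currently raise NotImplementedError, calling code can fall back to vector search. The sketch below shows the pattern with stand-in coroutines; the two `fake_*` functions are hypothetical placeholders, and whether the error surfaces unchanged through cognee's own search() is an assumption to verify.

```python
import asyncio


async def search_with_fallback(graph_search, vector_search, query_text):
    """Try a graph-based search first; fall back to vector search if unsupported."""
    try:
        return await graph_search(query_text)
    except NotImplementedError:
        return await vector_search(query_text)


# Stand-ins for illustration only; real code would wrap cognee's search() here.
async def fake_graph_search(query_text):
    raise NotImplementedError("graph support coming soon")


async def fake_vector_search(query_text):
    return [f"chunk matching: {query_text}"]


results = asyncio.run(
    search_with_fallback(fake_graph_search, fake_vector_search, "Tell me about NLP")
)
```

Only NotImplementedError is caught, so other failures (for example a missing collection) still propagate to the caller.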

Performance

DuckDB provides excellent performance characteristics:

  • Embedded: No network overhead - everything runs in-process
  • Columnar: Optimized storage format for analytical workloads
  • Vectorized: Batch-at-a-time query execution that compilers can map to SIMD instructions, which speeds up vector similarity calculations
  • ACID: Full transactional support with data consistency
  • Memory efficient: Minimal memory footprint compared to traditional databases

Troubleshooting

Common Issues

  1. File Permission Errors: Ensure write permissions to the directory containing your database file
  2. Embedding Dimension Mismatch: Verify embedding dimensions match collection configuration
  3. Collection Not Found: Always create collections before adding data points
  4. Graph Operations: Graph support is planned for future releases - currently use vector search
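
The first two issues can be checked before any data reaches the database. The following is a minimal sketch with hypothetical helper names; it checks only the parent directory's permissions and the vector lengths, not anything specific to the adapter.

```python
import os
from pathlib import Path


def can_write_database(db_path):
    """True if the directory that would hold db_path exists and is writable."""
    parent = Path(db_path).expanduser().resolve().parent
    return parent.is_dir() and os.access(parent, os.W_OK)


def mismatched_dimensions(vectors, expected_dim):
    """Return the indexes of vectors whose length differs from expected_dim."""
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_dim]
```

A non-empty result from `mismatched_dimensions` points at the exact inputs to inspect.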

Debug Logging

The adapter uses Cognee's logging system. Enable debug logging to see detailed operation logs:

import logging
logging.getLogger("DuckDBAdapter").setLevel(logging.DEBUG)
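
If no handler is configured, setting the level alone may not show any output. The sketch below attaches a timestamped stream handler using only the standard library; it assumes the adapter's logger is reachable through the standard logging module under the name shown above, and `enable_adapter_debug_logging` is a hypothetical helper. If Cognee's logging system already installs its own handlers, this may produce duplicate lines.

```python
import logging
import sys


def enable_adapter_debug_logging(logger_name="DuckDBAdapter", stream=sys.stderr):
    """Set the named logger to DEBUG and attach one stream handler with timestamps."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    already_attached = any(getattr(h, "_adapter_debug", False) for h in logger.handlers)
    if not already_attached:
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        handler._adapter_debug = True  # marker so repeated calls do not add duplicates
        logger.addHandler(handler)
    return logger
```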

Database Option Comparison

File-based ("my_db.db")

  • ✅ Persistent storage
  • ✅ Survives restarts
  • ✅ Can handle large datasets
  • ❌ Slower I/O
  • ❌ Disk space usage

In-memory (None)

  • ✅ Maximum performance
  • ✅ No disk usage
  • ✅ Perfect for testing
  • ❌ Data lost on restart
  • ❌ Limited by RAM

MotherDuck ("md:database")

  • ✅ Cloud-hosted
  • ✅ Shared access
  • ✅ Managed service
  • ✅ Scalable
  • ❌ Requires internet
  • ❌ Potential latency
  • ❌ MotherDuck account needed

Development

To contribute or modify the adapter:

  1. Clone the repository and cd into the packages/hybrid/duckdb folder
  2. Install dependencies: uv sync --all-extras
  3. Run the example script as a smoke test: uv run examples/example.py
  4. Make your changes, test, and submit a PR

Extensions Used

This adapter automatically loads these DuckDB extensions:

  • duckpgq: Property graph queries (foundation for upcoming graph support)
  • vss: Vector similarity search with HNSW indexing support
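
For readers who want to reproduce the extension setup by hand, the sketch below lists the SQL that installs and loads both extensions and builds an HNSW index statement. How the adapter itself issues these statements is not shown in this README, so treat this as a manual equivalent; the helper names are hypothetical, and identifiers are interpolated without escaping, so pass only trusted table and column names.

```python
# duckpgq is distributed as a community extension; vss is a core extension.
EXTENSION_STATEMENTS = [
    "INSTALL vss",
    "LOAD vss",
    "INSTALL duckpgq FROM community",
    "LOAD duckpgq",
]


def load_extensions(connection, statements=EXTENSION_STATEMENTS):
    """Run each statement on an object exposing execute(sql), e.g. a DuckDB connection."""
    for sql in statements:
        connection.execute(sql)
    return list(statements)


def hnsw_index_sql(table, column, metric="cosine"):
    """Build a CREATE INDEX statement for the vss extension's HNSW index."""
    if metric not in {"l2sq", "cosine", "ip"}:
        raise ValueError(f"unsupported metric: {metric}")
    return (
        f"CREATE INDEX {table}_{column}_hnsw ON {table} "
        f"USING HNSW ({column}) WITH (metric = '{metric}')"
    )
```

With the duckdb package installed, `load_extensions(duckdb.connect())` would run the statements against an in-memory database; installing extensions needs network access the first time. The vss documentation describes HNSW index persistence for on-disk databases as experimental and gated behind a setting, so check it before indexing a file-based database.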
