DuckDB vector adapter for Cognee with planned graph support
🧠 Cognee DuckDB Vector Adapter
Features
- Zero-configuration embedded vector database - no external server required
- Full support for vector embeddings storage and retrieval
- High-performance vector similarity search using DuckDB's native array operations
- Persistent or in-memory database options
- Vector-first design with planned graph support in future releases
- Comprehensive error handling and logging
Installation
pip install cognee-community-hybrid-adapter-duckdb
Prerequisites
None! DuckDB is an embedded database that requires no external dependencies or server setup. Just install and use.
Examples
Check out the examples/ folder!
Basic vector search example:
uv run examples/example.py
Document processing example with generated story:
uv run examples/simple_document_example/cognee_simple_document_demo.py
This example demonstrates processing a generated story text file (generated_story.txt) along with other documents like Alice in Wonderland.
You will need an OpenAI API key to run the example scripts.
Usage
import os
import asyncio
from cognee import config, prune, add, cognify, search, SearchType
# Import the register module to enable DuckDB support
from cognee_community_hybrid_adapter_duckdb import register

async def main():
    # Configure DuckDB as vector database
    config.set_vector_db_config({
        "vector_db_provider": "duckdb",
        "vector_db_url": "my_database.db",  # File path or None for in-memory
    })

    # Optional: Clean previous data
    await prune.prune_data()
    await prune.prune_system()

    # Add your content
    await add("""
    Natural language processing (NLP) is an interdisciplinary
    subfield of computer science and information retrieval.
    """)

    # Process with cognee
    await cognify()

    # Search (use vector-based search types)
    search_results = await search(
        query_type=SearchType.CHUNKS,
        query_text="Tell me about NLP"
    )
    for result in search_results:
        print("Search result:", result)

if __name__ == "__main__":
    asyncio.run(main())
Configuration
Configure DuckDB as your vector database in cognee:
- vector_db_provider: Set to "duckdb"
- vector_db_url: Database file path (e.g., "my_db.db"), None for in-memory, or a MotherDuck URL for cloud
Database Options
# Persistent file-based database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "cognee_vectors.db"
})

# In-memory database (fastest, but data is lost on restart)
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": None  # or ":memory:"
})

# Absolute path to database file
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "/path/to/my/database.db"
})

# MotherDuck cloud database
config.set_vector_db_config({
    "vector_db_provider": "duckdb",
    "vector_db_url": "md:my_database"  # Replace with your MotherDuck database
})
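One way to switch between these options without editing code is to read the target from an environment variable. The sketch below is only an illustration: the variable name COGNEE_DUCKDB_URL and the helper duckdb_config_from_env are hypothetical and are not provided by cognee or this adapter. The helper just builds the dictionary that would then be passed to config.set_vector_db_config.

```python
import os


def duckdb_config_from_env(env=None, var_name="COGNEE_DUCKDB_URL"):
    """Build a vector DB config dict from an environment mapping.

    Hypothetical helper: the variable name is an assumption made for this
    example; the adapter does not read it on its own.
    """
    env = os.environ if env is None else env
    url = env.get(var_name)
    # An unset or empty variable falls back to None (in-memory database).
    return {
        "vector_db_provider": "duckdb",
        "vector_db_url": url or None,
    }


# Example with an explicit mapping instead of the real environment.
print(duckdb_config_from_env({"COGNEE_DUCKDB_URL": "md:my_database"}))
```

Passing an explicit mapping, as in the last line, keeps the helper easy to exercise in tests without touching the real environment.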
Requirements
- Python >= 3.12, <= 3.13
- duckdb >= 1.3.2
- cognee >= 0.2.3
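If you want a script to fail early on an unsupported interpreter, a small guard like the following may help. It is a sketch that reads the Python bounds above at minor-version granularity (3.12.x and 3.13.x), which is an assumption about their intent; pip evaluates the package's declared constraint itself. The helper name python_is_supported is hypothetical.

```python
import sys

# (major, minor) pairs taken from the Requirements list above.
SUPPORTED_MINORS = {(3, 12), (3, 13)}


def python_is_supported(version_info=None):
    """Return True when the (major, minor) version is 3.12 or 3.13."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in SUPPORTED_MINORS


if not python_is_supported():
    print("Warning: this interpreter is outside the supported Python range")
```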
Roadmap: Graph Support
This adapter is currently vector-focused with plans to add full graph database capabilities in future releases. The foundation is already in place with DuckDB's property graph extensions.
Current Status:
- ✅ Full vector similarity search
- ✅ Embedding storage and retrieval
- ✅ Collection management
- 🚧 Graph operations (coming soon)
Error Handling
The adapter includes comprehensive error handling:
- CollectionNotFoundError: Raised when attempting operations on non-existent collections
- InvalidValueError: Raised for invalid query parameters
- NotImplementedError: Currently raised for graph operations (graph support coming soon)
- Graceful handling of database connection issues and embedding errors
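The errors listed above can be handled at the call site. The following is a minimal sketch of one possible pattern, not the adapter's own code: the two exception classes are local stand-ins (the real import paths are not shown in this README), and search_with_fallback and failing are hypothetical helpers that wrap any async search callable.

```python
import asyncio


class CollectionNotFoundError(Exception):
    """Local stand-in for the adapter error of the same name."""


class InvalidValueError(Exception):
    """Local stand-in for the adapter error of the same name."""


async def search_with_fallback(search_fn, **kwargs):
    """Await search_fn and map known failures to a (results, reason) pair."""
    try:
        return await search_fn(**kwargs), None
    except CollectionNotFoundError:
        return [], "collection not found"
    except InvalidValueError as exc:
        return [], f"invalid query: {exc}"
    except NotImplementedError:
        return [], "graph operation not implemented"


def failing(exc):
    """Return a fake async search function that always raises exc."""
    async def _fn(**_kwargs):
        raise exc
    return _fn


async def echo_search(**kwargs):
    """Fake async search function that returns its query text."""
    return [kwargs["query_text"]]


print(asyncio.run(search_with_fallback(echo_search, query_text="Tell me about NLP")))
print(asyncio.run(search_with_fallback(failing(CollectionNotFoundError("chunks")))))
```

In real code, search_fn would be cognee's search function and the exception classes would be imported rather than redefined.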
Performance
DuckDB provides excellent performance characteristics:
- Embedded: No network overhead - everything runs in-process
- Columnar: Optimized storage format for analytical workloads
- Vectorized: SIMD operations for fast vector similarity calculations
- ACID: Full transactional support with data consistency
- Memory efficient: Minimal memory footprint compared to traditional databases
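To make "vector similarity calculation" concrete, here is a pure-Python sketch of a brute-force top-k search. It assumes cosine similarity as the metric, which this README does not specify, and it illustrates only the arithmetic; it is not the adapter's query and says nothing about how DuckDB executes it. The names cosine_similarity and top_k are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as dissimilar to everything.
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the keys of the k stored vectors most similar to query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


vectors = {
    "nlp": [1.0, 0.0, 0.0],
    "vision": [0.0, 1.0, 0.0],
    "speech": [0.6, 0.4, 0.0],
}
print(top_k([0.9, 0.1, 0.0], vectors))  # ['nlp', 'speech']
```

A database engine performs the same comparison over whole columns of vectors at once, which is where the vectorized execution mentioned above comes in.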
Troubleshooting
Common Issues
- File Permission Errors: Ensure write permissions to the directory containing your database file
- Embedding Dimension Mismatch: Verify embedding dimensions match collection configuration
- Collection Not Found: Always create collections before adding data points
- Graph Operations: Graph support is planned for future releases - currently use vector search
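The first two issues can be checked before any data reaches the database. The sketch below uses only the standard library; directory_is_writable and mismatched_dimensions are hypothetical helper names, and the expected dimension is whatever your embedding model produces.

```python
import os
import tempfile


def directory_is_writable(db_path):
    """True if the directory that would hold db_path exists and is writable."""
    directory = os.path.dirname(os.path.abspath(db_path))
    return os.path.isdir(directory) and os.access(directory, os.W_OK)


def mismatched_dimensions(embeddings, expected_dim):
    """Return the indexes of embeddings whose length differs from expected_dim."""
    return [i for i, vec in enumerate(embeddings) if len(vec) != expected_dim]


with tempfile.TemporaryDirectory() as tmp:
    print(directory_is_writable(os.path.join(tmp, "cognee_vectors.db")))

# The second vector has 2 values instead of 3, so index 1 is reported.
print(mismatched_dimensions([[0.1, 0.2, 0.3], [0.4, 0.5]], expected_dim=3))
```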
Debug Logging
The adapter uses Cognee's logging system. Enable debug logging to see detailed operation logs:
import logging
logging.getLogger("DuckDBAdapter").setLevel(logging.DEBUG)
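Setting the level on a logger has no visible effect unless some handler also accepts DEBUG records. Whether Cognee's logging setup already provides such a handler is not stated here, so the following is a sketch for the case where debug messages do not appear. The logger name "DuckDBAdapter" comes from the snippet above; enable_adapter_debug is a hypothetical helper, and the log message is simulated.

```python
import io
import logging


def enable_adapter_debug(stream=None, logger_name="DuckDBAdapter"):
    """Attach a DEBUG-level stream handler to the named logger and return it."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)  # stream=None writes to sys.stderr
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return handler


# Capture output in memory for the demonstration.
buffer = io.StringIO()
handler = enable_adapter_debug(buffer)
logging.getLogger("DuckDBAdapter").debug("connection opened")  # simulated message
print(buffer.getvalue().strip())
logging.getLogger("DuckDBAdapter").removeHandler(handler)
```

If another handler is already attached higher up the logger hierarchy, messages may appear twice; in that case remove the extra handler or rely on the existing one.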
Database Option Comparison
| Option | Pros | Cons |
|---|---|---|
| File-based ("my_db.db") | ✅ Persistent storage ✅ Survives restarts ✅ Can handle large datasets | ❌ Slower I/O ❌ Disk space usage |
| In-memory (None) | ✅ Maximum performance ✅ No disk usage ✅ Perfect for testing | ❌ Data lost on restart ❌ Limited by RAM |
| MotherDuck ("md:database") | ✅ Cloud-hosted ✅ Shared access ✅ Managed service ✅ Scalable | ❌ Requires internet ❌ Potential latency ❌ MotherDuck account needed |
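The three options differ only in the value of vector_db_url. As a rough sketch, a small function can report which row of the comparison a given value falls into, for example to log a warning when a production job is about to run in memory. The helper classify_vector_db_url is hypothetical and relies only on the conventions shown in this README (None or ":memory:" for in-memory, an "md:" prefix for MotherDuck).

```python
def classify_vector_db_url(url):
    """Map a vector_db_url value to 'in-memory', 'motherduck', or 'file'."""
    if url is None or url == ":memory:":
        return "in-memory"
    if url.startswith("md:"):
        return "motherduck"
    # Anything else is treated as a relative or absolute file path.
    return "file"


for value in (None, ":memory:", "md:my_database", "cognee_vectors.db"):
    print(repr(value), "->", classify_vector_db_url(value))
```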
Development
To contribute or modify the adapter:
- Clone the repository and cd into the packages/hybrid/duckdb folder
- Install dependencies: uv sync --all-extras
- Run tests: uv run examples/example.py
- Make your changes, test, and submit a PR
Extensions Used
This adapter automatically loads these DuckDB extensions:
- duckpgq: Property graph queries (foundation for upcoming graph support)
- vss: Vector similarity search with HNSW indexing support