Deploy a simple vector similarity search service with DuckDB.
DVS - DuckDB Vector Similarity Search
A Python library for vector similarity search powered by DuckDB and OpenAI embeddings.
Features
- Fast Vector Search: Efficient similarity search using DuckDB's vector capabilities
- OpenAI Integration: Automatic embedding generation with OpenAI models
- Caching: Built-in embedding cache for improved performance
- Simple API: Easy-to-use Python interface
- Flexible Storage: Store documents with metadata
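Vector similarity search ranks stored embeddings by how close they are to the query embedding, commonly measured with cosine similarity. DVS hands this work to DuckDB. The pure-Python sketch below only illustrates the general technique. The names `cosine_similarity` and `top_k` are hypothetical and are not part of the `dvs` API.

```python
import heapq
import math
from typing import List, Mapping, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    corpus: Mapping[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 3-dimensional "embeddings"; real ones have e.g. 1536 dimensions.
corpus = {
    "iphone": [0.9, 0.1, 0.0],
    "azure": [0.1, 0.9, 0.2],
    "recipe": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], corpus, k=2))
```

A heap-based `nlargest` avoids sorting the whole corpus when only the top few matches are needed. A database engine can go further by keeping the vectors next to the documents and scoring them in one query.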
Installation
pip install dvs-py
Quick Start
Basic Usage
import asyncio
import pathlib
import tempfile
import openai_embeddings_model as oai_emb_model
from dvs import DVS
# Initialize DVS with a database file and model.
# The path points into a fresh temporary directory, so DuckDB creates the file itself.
db_path = pathlib.Path(tempfile.mkdtemp()) / "dvs.duckdb"
dvs = DVS(
    str(db_path),
    model="text-embedding-3-small",
    model_settings=oai_emb_model.ModelSettings(dimensions=1536),
)
# Add documents
dvs.add("Apple announced new iPhone features with upgraded camera and A16 chip.")
dvs.add("Microsoft updated Azure with enhanced AI tools and security features.")
# Search is a coroutine, so run it with asyncio
results = asyncio.run(dvs.search("What are the new iPhone features?"))
print(f"Found {len(results)} results")
# Each result is a (point, document, score) tuple
for point, document, score in results:
    print(f"Score: {score:.3f} - {document.content[:100]}...")
Advanced Configuration
import asyncio
import diskcache
import openai
import openai_embeddings_model as oai_emb_model
from dvs import DVS
from dvs.types.document import Document
# Configure with a custom on-disk embedding cache and model settings
dvs = DVS(
    "./my_database.duckdb",
    model=oai_emb_model.OpenAIEmbeddingsModel(
        model="text-embedding-3-small",
        openai_client=openai.OpenAI(),  # reads OPENAI_API_KEY from the environment
        cache=diskcache.Cache("./cache/embeddings.cache"),
    ),
    model_settings=oai_emb_model.ModelSettings(dimensions=1536),
    verbose=True,
)
# Add a document with a name and metadata
doc = Document.from_content(
    "Latest developments in artificial intelligence...",
    name="AI Research Paper",
    metadata={"author": "John Doe", "year": 2024},
)
dvs.add(doc)
# Search with more results
results = asyncio.run(dvs.search("artificial intelligence", top_k=10))
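The `cache=diskcache.Cache(...)` argument above keeps embeddings on disk, so repeated texts do not go back to the API. The sketch below shows the underlying memoization idea with a plain dict. `CachedEmbedder` and the `embed_fn` callable are hypothetical stand-ins and do not describe how `openai_embeddings_model` is actually implemented. The cache key includes the model name and dimension count because the same text embeds differently under different settings.

```python
import hashlib
from typing import Callable, Dict, List


class CachedEmbedder:
    """Memoize an embedding function, keyed by model, dimensions, and text."""

    def __init__(self, embed_fn: Callable[[str], List[float]], model: str, dimensions: int):
        self.embed_fn = embed_fn  # hypothetical: any callable that returns a vector
        self.model = model
        self.dimensions = dimensions
        # A diskcache.Cache offers the same mapping interface and persists to disk.
        self.store: Dict[str, List[float]] = {}

    def _key(self, text: str) -> str:
        raw = f"{self.model}:{self.dimensions}:{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def embed(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self.store:
            self.store[key] = self.embed_fn(text)  # call the backend only on a miss
        return self.store[key]


calls: List[str] = []


def fake_embed(text: str) -> List[float]:
    """Hypothetical backend that records each call instead of hitting an API."""
    calls.append(text)
    return [float(len(text)), 0.0]


embedder = CachedEmbedder(fake_embed, model="text-embedding-3-small", dimensions=2)
embedder.embed("hello")
embedder.embed("hello")  # served from the cache
print(len(calls))  # 1
```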
Configuration
Set your OpenAI API key:
export OPENAI_API_KEY="your-api-key"
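`openai.OpenAI()` reads `OPENAI_API_KEY` from the environment. A small pre-flight check like the one below can fail early with a clear message before any embedding work starts. `require_api_key` is a hypothetical helper and is not provided by `dvs`.

```python
import os


def require_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f'{var} is not set; run: export {var}="your-api-key"')
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found")
    except RuntimeError as err:
        print(err)
```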
Document Management
Adding Documents
# Add a single document
dvs.add("Your document content here")
# Add multiple documents
documents = [
    "First document content",
    "Second document content",
    "Third document content",
]
dvs.add(documents)
# Add documents with metadata
from dvs.types.document import Document
docs = [
    Document.from_content("Content 1", name="Doc 1", metadata={"category": "tech"}),
    Document.from_content("Content 2", name="Doc 2", metadata={"category": "science"}),
]
dvs.add(docs)
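For a large corpus, it may help to pass documents to `dvs.add` in fixed-size batches. That way a failure partway through loses only one batch, and each round of embedding requests stays small. The README does not say whether `dvs.add` already batches internally, so treat this as an assumption. `chunked` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


# Usage with DVS (not executed here):
# for batch in chunked(documents, 100):
#     dvs.add(batch)
print(list(chunked(["a", "b", "c", "d", "e"], 2)))  # [['a', 'b'], ['c', 'd'], ['e']]
```

On Python 3.12 and later, `itertools.batched` does the same job but yields tuples instead of lists.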
Searching Documents
# Basic search
results = asyncio.run(dvs.search("your query"))
# Search with more results
results = asyncio.run(dvs.search("your query", top_k=10))
# Search with embeddings included
results = asyncio.run(dvs.search("your query", with_embedding=True))
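The README does not document metadata filters or a score threshold for `dvs.search`, so one option is to filter the returned `(point, document, score)` tuples in Python. The sketch below uses hypothetical stand-in classes `FakePoint` and `FakeDocument`. It assumes two things this README does not confirm: that the real `Document` exposes its `metadata` dict as an attribute, and that a higher score means a closer match.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class FakeDocument:
    """Hypothetical stand-in for dvs.types.document.Document."""
    document_id: str
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakePoint:
    """Hypothetical stand-in for the point object in each search result."""
    point_id: str


Result = Tuple[FakePoint, FakeDocument, float]


def filter_results(
    results: List[Result],
    min_score: float = 0.0,
    category: Optional[str] = None,
) -> List[Result]:
    """Keep results scoring at least min_score whose category matches, if one is given."""
    kept = []
    for point, document, score in results:
        if score < min_score:
            continue  # assumes a higher score means a closer match
        if category is not None and document.metadata.get("category") != category:
            continue
        kept.append((point, document, score))
    return kept


results = [
    (FakePoint("p1"), FakeDocument("d1", "Content 1", {"category": "tech"}), 0.82),
    (FakePoint("p2"), FakeDocument("d2", "Content 2", {"category": "science"}), 0.75),
    (FakePoint("p3"), FakeDocument("d3", "Content 3", {"category": "tech"}), 0.31),
]
for _, document, score in filter_results(results, min_score=0.5, category="tech"):
    print(f"{score:.2f} {document.content}")
```

Filtering after retrieval can return fewer than `top_k` hits. Requesting a larger `top_k` leaves room for that.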
Removing Documents
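# Get a document ID from search results; each result is a (point, document, score) tuple
results = asyncio.run(dvs.search("some query"))
doc_id = results[0][1].document_id
# Remove a document
dvs.remove(doc_id)
# Remove multiple documents by passing a list of IDs
other_ids = [document.document_id for _, document, _ in results[1:]]
dvs.remove(other_ids)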
Development
Install development dependencies:
make install-all
Run tests:
make pytest
Format code:
make format-all
License
This project is licensed under the MIT License. See the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
If you encounter any issues or have questions, please open an issue on GitHub.
File details
Details for the file dvs_py-1.1.0.tar.gz.
File metadata
- Download URL: dvs_py-1.1.0.tar.gz
- Upload date:
- Size: 28.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.13 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee55bc9517913368d79e62addfc54049378c626ba8c27c17ec0e81a118706111 |
| MD5 | b2141bef72482b8c051bc6b06d4f94ed |
| BLAKE2b-256 | 9dd2c07581fb9ba5295ad954523a1068f0f0d406f7cfd82dfaf7aafeebc2b12c |
File details
Details for the file dvs_py-1.1.0-py3-none-any.whl.
File metadata
- Download URL: dvs_py-1.1.0-py3-none-any.whl
- Upload date:
- Size: 40.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.11.13 Darwin/24.5.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a5832b2df88fe2519d4009733bf630a7d9786e63885cd28d502c1578b8d67854 |
| MD5 | 14fb8cb7b37dcd8313ed8be60e19c0cc |
| BLAKE2b-256 | fbdb2bbe80325ece6ecd90dc101e1d33e96db7d85aac72911846a6641d4f2d80 |
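To check a downloaded distribution against the SHA256 digests listed above, hash the file locally and compare the results. The sketch below uses only the standard library. `sha256_of` and `verify` are hypothetical helper names, and the wheel path assumes the file was saved to the current directory.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    wheel = Path("dvs_py-1.1.0-py3-none-any.whl")
    expected = "a5832b2df88fe2519d4009733bf630a7d9786e63885cd28d502c1578b8d67854"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
```

pip can also enforce this automatically. Add `--hash=sha256:...` entries to a requirements file and install with `--require-hashes`.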