Deploy a simple vector similarity search service with DuckDB.

DVS - DuckDB Vector Similarity Search

Requires Python 3.11+. License: MIT.

A Python library for vector similarity search powered by DuckDB and OpenAI embeddings.

Features

  • Fast Vector Search: Efficient similarity search using DuckDB's vector capabilities
  • OpenAI Integration: Automatic embedding generation with OpenAI models
  • Caching: Built-in embedding cache for improved performance
  • Simple API: Easy-to-use Python interface
  • Flexible Storage: Store documents with metadata
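
To make the idea of vector similarity search concrete, here is a minimal pure-Python sketch of cosine-similarity ranking. It is an illustration only and not how DVS is implemented (this README says the search itself runs on DuckDB). The function names `cosine_similarity` and `top_k`, and the toy 3-dimensional vectors, are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Return up to k (index, score) pairs, highest score first."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real model output (hypothetical values).
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
best = top_k([1.0, 0.0, 0.0], vectors, k=2)
print(best)
```

Real embeddings have hundreds or thousands of dimensions (1536 in the examples below), which is why the ranking is pushed down into the database rather than done in a Python loop.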

Installation

pip install dvs-py

Quick Start

Basic Usage

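import asyncio
import os
import tempfile
import openai_embeddings_model as oai_emb_model
from dvs import DVS

# Initialize DVS with a database file and model.
# Use a path that does not exist yet, inside a fresh temporary directory,
# instead of relying on NamedTemporaryFile's cleanup behavior.
db_path = os.path.join(tempfile.mkdtemp(), "quickstart.duckdb")
dvs = DVS(
    db_path,
    model="text-embedding-3-small",
    model_settings=oai_emb_model.ModelSettings(dimensions=1536)
)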

# Add documents
dvs.add("Apple announced new iPhone features with upgraded camera and A16 chip.")
dvs.add("Microsoft updated Azure with enhanced AI tools and security features.")

# Search
results = asyncio.run(dvs.search("What are the new iPhone features?"))
print(f"Found {len(results)} results")
for point, document, score in results:
    print(f"Score: {score:.3f} - {document.content[:100]}...")
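
Each search result above unpacks as a `(point, document, score)` tuple, so results can be post-filtered with ordinary Python. The sketch below keeps only results at or above a threshold. It assumes a higher score means a closer match, which this README does not state, so check that against the library before relying on it. `filter_by_score` and the stand-in results are hypothetical and are not part of DVS.

```python
from types import SimpleNamespace


def filter_by_score(results, min_score):
    """Keep only (point, document, score) tuples with score >= min_score."""
    return [(p, d, s) for p, d, s in results if s >= min_score]


# Stand-in results shaped like the tuples in the Quick Start loop.
# Real results come from dvs.search(...); these values are made up.
fake_results = [
    (None, SimpleNamespace(content="iPhone camera upgrade"), 0.82),
    (None, SimpleNamespace(content="Azure AI tools"), 0.31),
]

kept = filter_by_score(fake_results, min_score=0.5)
for _, document, score in kept:
    print(f"Score: {score:.3f} - {document.content}")
```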

Advanced Configuration

import asyncio
import diskcache
import openai
import openai_embeddings_model as oai_emb_model
from dvs import DVS

# Configure with custom cache and model settings
dvs = DVS(
    "./my_database.duckdb",
    model=oai_emb_model.OpenAIEmbeddingsModel(
        model="text-embedding-3-small",
        openai_client=openai.OpenAI(),
        cache=diskcache.Cache("./cache/embeddings.cache"),
    ),
    model_settings=oai_emb_model.ModelSettings(dimensions=1536),
    verbose=True
)

# Add documents with metadata
from dvs.types.document import Document

doc = Document.from_content(
    "Latest developments in artificial intelligence...",
    name="AI Research Paper",
    metadata={"author": "John Doe", "year": 2024}
)
dvs.add(doc)

# Search with more results
results = asyncio.run(dvs.search("artificial intelligence", top_k=10))
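
The `cache` argument above lets repeated texts skip the embedding request. As a rough picture of what an embedding cache does, here is a self-contained sketch that keys an in-memory dictionary on the model name plus a hash of the text. It does not reproduce how `openai_embeddings_model` or `diskcache` work internally; `CachedEmbedder` and `fake_embed` are hypothetical names used only for illustration.

```python
import hashlib


class CachedEmbedder:
    """Wrap an embedding function with an in-memory cache (hypothetical helper)."""

    def __init__(self, embed_fn, model_name):
        self._embed_fn = embed_fn
        self._model_name = model_name
        self._cache = {}

    def _key(self, text):
        # Include the model name so different models never share entries.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{self._model_name}:{digest}"

    def embed(self, text):
        key = self._key(text)
        if key not in self._cache:
            self._cache[key] = self._embed_fn(text)  # only on a cache miss
        return self._cache[key]


calls = []


def fake_embed(text):
    """Stand-in for a paid API call; records each invocation."""
    calls.append(text)
    return [float(len(text))]


embedder = CachedEmbedder(fake_embed, "text-embedding-3-small")
first = embedder.embed("hello")
second = embedder.embed("hello")  # served from the cache
print(f"embedding calls made: {len(calls)}")
```

A disk-backed cache such as the one configured above follows the same lookup-then-store pattern but survives process restarts.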

Configuration

Set your OpenAI API key:

export OPENAI_API_KEY="your-api-key"
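
A missing key usually surfaces only when the first embedding request is made, so it can help to fail early. The sketch below checks the environment before any DVS object is created. `require_api_key` is a hypothetical helper, not part of DVS or the OpenAI client.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating DVS")
    return key


# Demonstrate the failure path with an empty stand-in environment.
try:
    require_api_key(env={})
except RuntimeError as exc:
    print(exc)
```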

Document Management

Adding Documents

# Add single document
dvs.add("Your document content here")

# Add multiple documents
documents = [
    "First document content",
    "Second document content",
    "Third document content"
]
dvs.add(documents)

# Add documents with metadata
from dvs.types.document import Document

docs = [
    Document.from_content("Content 1", name="Doc 1", metadata={"category": "tech"}),
    Document.from_content("Content 2", name="Doc 2", metadata={"category": "science"})
]
dvs.add(docs)
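
For large collections it may be convenient to add documents in fixed-size chunks, for example to report progress between chunks. This README does not say whether `dvs.add` already batches internally, so treat the sketch below as optional. `batched` and `add_in_batches` are hypothetical helpers, and `FakeStore` stands in for a DVS instance so the example runs without an API key.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def add_in_batches(store, documents, batch_size=100):
    """Call store.add once per batch; return the number of documents added."""
    total = 0
    for batch in batched(documents, batch_size):
        store.add(batch)
        total += len(batch)
    return total


class FakeStore:
    """Stand-in exposing only an add() method, like dvs.add above."""

    def __init__(self):
        self.calls = []

    def add(self, docs):
        self.calls.append(list(docs))


store = FakeStore()
added = add_in_batches(store, [f"Document {i}" for i in range(5)], batch_size=2)
print(added, [len(call) for call in store.calls])
```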

Searching Documents

# Basic search
results = asyncio.run(dvs.search("your query"))

# Search with more results
results = asyncio.run(dvs.search("your query", top_k=10))

# Search with embeddings included
results = asyncio.run(dvs.search("your query", with_embedding=True))
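
Because `search` is awaited through `asyncio.run`, several queries can in principle be issued from one event loop with `asyncio.gather`. Whether a single DVS instance handles overlapping searches safely is not stated in this README, so the sketch below is an assumption to verify. `search_many` is a hypothetical helper and `FakeStore` stands in for a DVS instance.

```python
import asyncio


async def search_many(store, queries, top_k=3):
    """Run one search per query concurrently; results keep query order."""
    tasks = [store.search(query, top_k=top_k) for query in queries]
    results = await asyncio.gather(*tasks)
    return dict(zip(queries, results))


class FakeStore:
    """Stand-in with an async search() shaped like dvs.search above."""

    async def search(self, query, top_k=3):
        await asyncio.sleep(0)  # yield control, as real I/O would
        return [(None, f"doc for {query}", 1.0)][:top_k]


found = asyncio.run(search_many(FakeStore(), ["cats", "dogs"]))
for query, results in found.items():
    print(query, len(results))
```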

Removing Documents

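# Get document IDs from search results; each result is (point, document, score)
results = asyncio.run(dvs.search("some query"))
doc_ids = [document.document_id for _, document, _ in results]

# Remove a single document (guard against an empty result list)
if doc_ids:
    dvs.remove(doc_ids[0])

# Remove multiple documents by passing a list of IDs
if doc_ids[1:]:
    dvs.remove(doc_ids[1:])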
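
The search-then-remove pattern above can be wrapped in one coroutine. The sketch below assumes, as in the examples in this README, that `search` is awaitable, that `remove` accepts a list of IDs, and that each result is a `(point, document, score)` tuple whose document has a `document_id`. `remove_matching` is a hypothetical helper and `FakeStore` stands in for a DVS instance.

```python
import asyncio
from types import SimpleNamespace


async def remove_matching(store, query, top_k=5):
    """Search for query, remove every returned document, return their IDs."""
    results = await store.search(query, top_k=top_k)
    doc_ids = [document.document_id for _, document, _ in results]
    if doc_ids:  # skip the call entirely when nothing matched
        store.remove(doc_ids)
    return doc_ids


class FakeStore:
    """Stand-in mimicking the search/remove calls used in this README."""

    def __init__(self, ids):
        self.ids = list(ids)

    async def search(self, query, top_k=5):
        docs = [SimpleNamespace(document_id=i) for i in self.ids[:top_k]]
        return [(None, doc, 1.0) for doc in docs]

    def remove(self, doc_ids):
        self.ids = [i for i in self.ids if i not in doc_ids]


store = FakeStore(["doc-a", "doc-b", "doc-c"])
removed = asyncio.run(remove_matching(store, "some query", top_k=2))
print(removed, store.ids)
```

Removing everything a query returns is destructive, so in practice it is worth inspecting the results before calling `remove`.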

Development

Install development dependencies:

make install-all

Run tests:

make pytest

Format code:

make format-all

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

If you encounter any issues or have questions, please open an issue on GitHub.

Download files

Source Distribution

dvs_py-1.1.0.tar.gz (28.3 kB)

Built Distribution

dvs_py-1.1.0-py3-none-any.whl (40.6 kB, Python 3)
