VectorChord Python SDK

Turn PostgreSQL into your search engine in a Pythonic way.

Installation

pip install vechord

The related Docker images can be found in VectorChord Suite.

  • DockerHub: tensorchord/vchord-suite:pg17-20250620
  • GitHub Packages: ghcr.io/tensorchord/vchord-suite:pg17-20250620

Features

  • vector search with RaBitQ (powered by VectorChord)
  • multivec search with WARP (powered by VectorChord)
  • keyword search with BM25 score (powered by VectorChord-bm25)
  • reduce boilerplate code by taking full advantage of Python type hints
  • provide decorators to inject data from/to the database
  • guarantee data consistency with PostgreSQL transactions
  • auto-generate the web service
  • provide common tools (any other library can be used instead):
    • Augmenter for contextual retrieval
    • Chunker to segment the text into chunks
    • Embedding to generate the embedding from the text
    • Evaluator to evaluate the search results with NDCG, MAP, Recall, etc.
    • Extractor to extract the content from PDF, HTML, etc.
    • EntityRecognizer to extract the entities and relations from the text
    • Reranker for hybrid search
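
The metrics named for the Evaluator can be illustrated without the library. The following is a minimal plain-Python sketch of Recall@k and NDCG@k with binary relevance. The function names `recall_at_k` and `ndcg_at_k` are hypothetical and are not part of vechord's API.

```python
import math


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """NDCG@k with binary relevance (1 if relevant, else 0)."""
    # discounted gain of the actual ranking (rank is 0-based, hence +2)
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(retrieved[:k])
        if item in relevant
    )
    # best achievable gain: all relevant items placed first
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


results = ["c3", "c1", "c7", "c2"]  # toy ranked chunk ids
truth = {"c1", "c2"}                # toy ground-truth relevant ids
print(recall_at_k(results, truth, k=3))
print(ndcg_at_k(results, truth, k=3))
```

With graded relevance the gain term would change, so treat this only as the binary case.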

Examples

  • simple.py: for people who are familiar with specialized vector database APIs
  • beir.py: the most flexible way to use the library (loading, indexing, querying and evaluation)
  • web.py: build a web application from the defined tables and pipeline
  • essay.py: extract the content from Paul Graham's essays and evaluate the search results from LLM-generated queries
  • contextual.py: contextual retrieval example with a local PDF
  • anthropic.py: contextual retrieval with Anthropic's tutorial example
  • hybrid.py: hybrid search that reranks the results from vector search with keyword search
  • graph.py: graph-like entity-relation retrieval
  • dynamic.py: run arbitrary pipelines with dynamic steps
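
As a library-independent illustration of the hybrid search idea, one common way to merge a vector-search ranking with a keyword-search ranking is reciprocal rank fusion (RRF). This sketch is not taken from hybrid.py and may differ from what that example does. The function name `rrf_fuse` and the constant `k=60` are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of ids into one list with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            # items near the top of any list contribute a larger share
            scores[item] += 1.0 / (k + rank)
    # sort by descending score; ties are broken by id for determinism
    return sorted(scores, key=lambda item: (-scores[item], item))


vector_hits = ["a", "b", "c"]   # toy ids ranked by vector search
keyword_hits = ["c", "a", "d"]  # toy ids ranked by keyword search
print(rrf_fuse([vector_hits, keyword_hits]))
```

RRF needs only the ranks, not the raw scores, which is convenient when the two searches produce scores on different scales.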

User Guide

For more details, check our API reference and User Guide.

Define the table

from typing import Annotated, Optional
from vechord.spec import Table, Vector, PrimaryKeyAutoIncrease, ForeignKey

# use 3072 dimension vector
DenseVector = Vector[3072]

class Document(Table, kw_only=True):
    uid: Optional[PrimaryKeyAutoIncrease] = None  # auto-increase id, no need to set
    link: str = ""
    text: str

class Chunk(Table, kw_only=True):
    uid: Optional[PrimaryKeyAutoIncrease] = None
    doc_id: Annotated[int, ForeignKey[Document.uid]]  # reference to `Document.uid` on DELETE CASCADE
    vector: DenseVector  # this comes with a default vector index
    text: str
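
The table definitions above rely on type hints to carry schema information. How vechord reads them internally is not shown here. As a rough illustration of the mechanism, the standard library can recover `Annotated` metadata at runtime. The `ForeignKeyTo` marker, the `Row` class and the `describe` helper below are hypothetical stand-ins, not vechord types.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class ForeignKeyTo:
    """Hypothetical marker: names the referenced table and column."""
    table: str
    column: str


class Row:
    doc_id: Annotated[int, ForeignKeyTo("document", "uid")]
    text: str


def describe(cls: type) -> dict[str, tuple[type, tuple]]:
    """Map each field name to (base type, metadata tuple)."""
    described = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            base, *meta = get_args(hint)
            described[name] = (base, tuple(meta))
        else:
            described[name] = (hint, ())
    return described


print(describe(Row))
```

A schema generator could walk such a mapping to emit column types and constraints, which is the kind of boilerplate the type-hint approach removes.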

Inject with decorator

import httpx
from vechord.registry import VechordRegistry
from vechord.extract import SimpleExtractor
from vechord.embedding import GeminiDenseEmbedding

vr = VechordRegistry(namespace="test", url="postgresql://postgres:postgres@127.0.0.1:5432/", tables=[Document, Chunk])
extractor = SimpleExtractor()
emb = GeminiDenseEmbedding()

@vr.inject(output=Document)  # dump to the `Document` table
# function parameters are free to define since `inject(input=...)` is not set
async def add_document(url: str) -> Document:  # the return type is `Document`
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        text = extractor.extract_html(resp.text)
        return Document(link=url, text=text)

@vr.inject(input=Document, output=Chunk)  # load from the `Document` table and dump to the `Chunk` table
# function parameters are the attributes of the `Document` table, only defined attributes
# will be loaded from the `Document` table
async def add_chunk(uid: int, text: str) -> list[Chunk]:  # the return type is `list[Chunk]`
    chunks = text.split("\n")
    return [Chunk(doc_id=uid, vector=await emb.vectorize_chunk(t), text=t) for t in chunks]

async def main():
    async with vr, emb:  # handle the connection with context manager
        await add_document("https://paulgraham.com/best.html")  # add arguments as usual
        await add_chunk()  # omit the arguments since the `input` will be loaded from the `Document` table
        await vr.insert(Document(text="hello world"))  # insert manually
        print(await vr.select_by(Document.partial_init()))  # select all the columns from table `Document`

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
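
The parameter-name convention used by the decorator can be sketched without a database. The toy `inject` below is a hypothetical, synchronous, in-memory illustration and not vechord's implementation. It inspects the function signature, reads only the named fields from each stored input row, and appends the returned rows to the output list. `STORE`, `Doc` and `Piece` are assumed names.

```python
import inspect
from dataclasses import asdict, dataclass

STORE: dict[type, list] = {}  # hypothetical in-memory stand-in for database tables


@dataclass
class Doc:
    uid: int
    link: str
    text: str


@dataclass
class Piece:
    doc_id: int
    text: str


def inject(input=None, output=None):
    """Toy decorator: feed rows of `input` to the function, save results to `output`."""
    def decorator(func):
        fields = list(inspect.signature(func).parameters)

        def wrapper(*args, **kwargs):
            if input is None:
                # no input table: call with the caller's own arguments
                results = [func(*args, **kwargs)]
            else:
                # load only the columns named by the function parameters
                results = [
                    func(**{f: asdict(row)[f] for f in fields})
                    for row in STORE.get(input, [])
                ]
            if output is not None:
                for res in results:
                    rows = res if isinstance(res, list) else [res]
                    STORE.setdefault(output, []).extend(rows)
            return results

        return wrapper
    return decorator


@inject(output=Doc)
def add_doc(link: str) -> Doc:
    return Doc(uid=1, link=link, text="line one\nline two")


@inject(input=Doc, output=Piece)
def add_piece(uid: int, text: str) -> list[Piece]:
    return [Piece(doc_id=uid, text=t) for t in text.split("\n")]


add_doc("https://example.com")
add_piece()  # no arguments: rows come from the stored `Doc` entries
print(STORE[Piece])
```

Note that `add_piece` never sees the `link` field, because it is not one of its parameters.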

Transaction

To guarantee data consistency, users can build a pipeline with VechordRegistry.create_pipeline and call its run method to execute multiple functions in a single transaction.

Within this transaction, each function loads only the data that was inserted during the same transaction. Users can therefore focus on the data processing without tracking which part of the data has not been processed yet.

pipeline = vr.create_pipeline([add_document, add_chunk])
await pipeline.run("https://paulgraham.com/best.html")  # only accept the arguments for the first function
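
The atomicity that such a pipeline relies on can be seen with any transactional database. The sketch below uses the standard-library `sqlite3` module purely as a stand-in for PostgreSQL, and the table and function names are hypothetical. When the second step fails, the row written by the first step is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (uid INTEGER PRIMARY KEY, text TEXT)")
conn.execute("CREATE TABLE chunk (doc_id INTEGER, text TEXT)")
conn.commit()


def run_steps(text: str, fail: bool) -> None:
    """Insert a document and its chunks as one unit of work."""
    # `with conn` commits on success and rolls back if an exception escapes
    with conn:
        cur = conn.execute("INSERT INTO document (text) VALUES (?)", (text,))
        if fail:
            raise RuntimeError("chunking failed")  # simulated failure in step two
        conn.executemany(
            "INSERT INTO chunk (doc_id, text) VALUES (?, ?)",
            [(cur.lastrowid, part) for part in text.split("\n")],
        )


try:
    run_steps("a\nb", fail=True)  # the document insert is undone
except RuntimeError:
    pass
run_steps("c\nd", fail=False)     # both steps are committed together

doc_count = conn.execute("SELECT COUNT(*) FROM document").fetchone()[0]
chunk_count = conn.execute("SELECT COUNT(*) FROM chunk").fetchone()[0]
print(doc_count, chunk_count)
```

Only the successful run leaves data behind, so no document is ever stored without its chunks.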

Search

print(await vr.search_by_vector(Chunk, await emb.vectorize_query("startup")))
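
Conceptually, a vector search returns the stored rows whose vectors are closest to the query vector. The database index approximates this efficiently. The sketch below is only the exact, brute-force version on toy 3-dimensional vectors, with hypothetical names, and cosine distance is chosen here purely for illustration.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 2):
    """Return the k rows closest to the query by scanning every row."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


rows = [
    ("startup advice", [0.9, 0.1, 0.0]),
    ("cooking tips", [0.0, 0.2, 0.9]),
    ("founder stories", [0.7, 0.3, 0.1]),
]
print([text for text, _ in top_k([1.0, 0.0, 0.0], rows)])
```

A full scan like this costs time linear in the number of rows, which is why an index is used for real collections.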

Customized Index Configuration

from vechord.spec import VectorIndex

class Chunk(Table, kw_only=True):
    uid: Optional[PrimaryKeyAutoIncrease] = None
    vector: Annotated[DenseVector, VectorIndex(distance="cos", lists=128)]
    text: str

Access the underlying database cursor directly

await vr.client.get_cursor().execute("SET vchordrq.probes = 100;")

HTTP Service

This creates an ASGI application that can be served by any ASGI server, such as uvicorn.

Open the OpenAPI Endpoint to check the API documentation.

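import uvicorn
# import path assumed to be `vechord.service`; adjust if your vechord version differs
from vechord.service import create_web_app

uvicorn.run(create_web_app(vr))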

Development

docker run --rm -d --name vdb -e POSTGRES_PASSWORD=postgres -p 5432:5432 ghcr.io/tensorchord/vchord-suite:pg17-20250620
envd up
# inside the envd env, sync all the dependencies
make sync
# format the code
make format
