Skip to main content

pgvector support for Python

Project description

pgvector-python

pgvector support for Python

Supports Django, SQLAlchemy, SQLModel, Psycopg 3, Psycopg 2, asyncpg, and Peewee

Build Status

Installation

Run:

pip install pgvector

And follow the instructions for your database library:

Or check out some examples:

Django

Create a migration to enable the extension

from pgvector.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]

Add a vector field to your model

from pgvector.django import VectorField

class Item(models.Model):
    embedding = VectorField(dimensions=3)

Insert a vector

item = Item(embedding=[1, 2, 3])
item.save()

Get the nearest neighbors to a vector

from pgvector.django import L2Distance

Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]

Also supports MaxInnerProduct and CosineDistance

Get the distance

Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))

Get items within a certain distance

Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)

Average vectors

from django.db.models import Avg

Item.objects.aggregate(Avg('embedding'))

Also supports Sum

Add an approximate index

from pgvector.django import IvfflatIndex, HnswIndex

class Item(models.Model):
    class Meta:
        indexes = [
            IvfflatIndex(
                name='my_index',
                fields=['embedding'],
                lists=100,
                opclasses=['vector_l2_ops']
            ),
            # or
            HnswIndex(
                name='my_index',
                fields=['embedding'],
                m=16,
                ef_construction=64,
                opclasses=['vector_l2_ops']
            )
        ]

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

SQLAlchemy

Enable the extension

session.execute(text('CREATE EXTENSION IF NOT EXISTS vector'))

Add a vector column

from pgvector.sqlalchemy import Vector

class Item(Base):
    embedding = mapped_column(Vector(3))

Insert a vector

item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()

Get the nearest neighbors to a vector

session.scalars(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))

Also supports max_inner_product and cosine_distance

Get the distance

session.scalars(select(Item.embedding.l2_distance([3, 1, 2])))

Get items within a certain distance

session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))

Average vectors

from sqlalchemy.sql import func

session.scalars(select(func.avg(Item.embedding))).first()

Also supports sum

Add an approximate index

index = Index('my_index', Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index('my_index', Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)

index.create(engine)

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

SQLModel

Enable the extension

session.exec(text('CREATE EXTENSION IF NOT EXISTS vector'))

Add a vector column

from pgvector.sqlalchemy import Vector
from sqlalchemy import Column

class Item(SQLModel, table=True):
    embedding: List[float] = Field(sa_column=Column(Vector(3)))

Insert a vector

item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()

Get the nearest neighbors to a vector

session.exec(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))

Also supports max_inner_product and cosine_distance

Psycopg 3

Enable the extension

conn.execute('CREATE EXTENSION IF NOT EXISTS vector')

Register the vector type with your connection

from pgvector.psycopg import register_vector

register_vector(conn)

For async connections, use

from pgvector.psycopg import register_vector_async

await register_vector_async(conn)

Create a table

conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')

Insert a vector

embedding = np.array([1, 2, 3])
conn.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))

Get the nearest neighbors to a vector

conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()

Psycopg 2

Enable the extension

cur = conn.cursor()
cur.execute('CREATE EXTENSION IF NOT EXISTS vector')

Register the vector type with your connection or cursor

from pgvector.psycopg2 import register_vector

register_vector(conn)

Create a table

cur.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')

Insert a vector

embedding = np.array([1, 2, 3])
cur.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))

Get the nearest neighbors to a vector

cur.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,))
cur.fetchall()

asyncpg

Enable the extension

await conn.execute('CREATE EXTENSION IF NOT EXISTS vector')

Register the vector type with your connection

from pgvector.asyncpg import register_vector

await register_vector(conn)

or your pool

async def init(conn):
    await register_vector(conn)

pool = await asyncpg.create_pool(..., init=init)

Create a table

await conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')

Insert a vector

embedding = np.array([1, 2, 3])
await conn.execute('INSERT INTO items (embedding) VALUES ($1)', embedding)

Get the nearest neighbors to a vector

await conn.fetch('SELECT * FROM items ORDER BY embedding <-> $1 LIMIT 5', embedding)

Peewee

Add a vector column

from pgvector.peewee import VectorField

class Item(BaseModel):
    embedding = VectorField(dimensions=3)

Insert a vector

item = Item.create(embedding=[1, 2, 3])

Get the nearest neighbors to a vector

Item.select().order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5)

Also supports max_inner_product and cosine_distance

Get the distance

Item.select(Item.embedding.l2_distance([3, 1, 2]).alias('distance'))

Get items within a certain distance

Item.select().where(Item.embedding.l2_distance([3, 1, 2]) < 5)

Average vectors

from peewee import fn

Item.select(fn.avg(Item.embedding)).scalar()

Also supports sum

Add an approximate index

Item.add_index('embedding vector_l2_ops', using='hnsw')

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/pgvector/pgvector-python.git
cd pgvector-python
pip install -r requirements.txt
createdb pgvector_python_test
pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pgvector-0.2.5-py2.py3-none-any.whl (9.6 kB view details)

Uploaded Python 2Python 3

File details

Details for the file pgvector-0.2.5-py2.py3-none-any.whl.

File metadata

  • Download URL: pgvector-0.2.5-py2.py3-none-any.whl
  • Upload date:
  • Size: 9.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for pgvector-0.2.5-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 5e5e93ec4d3c45ab1fa388729d56c602f6966296e19deee8878928c6d567e41b
MD5 4470d877c918f49e86ead6b71ffbf591
BLAKE2b-256 29bb4686b1090a7c68fa367e981130a074dc6c1236571d914ffa6e05c882b59d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page