Skip to main content

pgvector support for Python

Project description

pgvector-python

pgvector support for Python

Supports Django, SQLAlchemy, SQLModel, Psycopg 3, Psycopg 2, and asyncpg

Build Status

Installation

Run:

pip install pgvector

And follow the instructions for your database library:

Or check out some examples:

Django

Create the extension

from pgvector.django import VectorExtension

class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]

Add a vector field

from pgvector.django import VectorField

class Item(models.Model):
    embedding = VectorField(dimensions=3)

Insert a vector

item = Item(embedding=[1, 2, 3])
item.save()

Get the nearest neighbors to a vector

from pgvector.django import L2Distance

Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]

Also supports MaxInnerProduct and CosineDistance

Get the distance

Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))

Get items within a certain distance

Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)

Add an approximate index

from pgvector.django import IvfflatIndex

class Item(models.Model):
    class Meta:
        indexes = [
            IvfflatIndex(
                name='my_index',
                fields=['embedding'],
                lists=100,
                opclasses=['vector_l2_ops']
            )
        ]

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

SQLAlchemy

Add a vector column

from pgvector.sqlalchemy import Vector

class Item(Base):
    embedding = mapped_column(Vector(3))

Insert a vector

item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()

Get the nearest neighbors to a vector

session.scalars(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))

Also supports max_inner_product and cosine_distance

Get the distance

session.scalars(select(Item.embedding.l2_distance([3, 1, 2])))

Get items within a certain distance

session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))

Add an approximate index

index = Index('my_index', Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
index.create(engine)

Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance

SQLModel

Add a vector column

from pgvector.sqlalchemy import Vector
from sqlalchemy import Column

class Item(SQLModel, table=True):
    embedding: List[float] = Field(sa_column=Column(Vector(3)))

Insert a vector

item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()

Get the nearest neighbors to a vector

session.exec(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))

Also supports max_inner_product and cosine_distance

Psycopg 3

Register the vector type with your connection

from pgvector.psycopg import register_vector

register_vector(conn)

For async connections, use

from pgvector.psycopg import register_vector_async

await register_vector_async(conn)

Insert a vector

embedding = np.array([1, 2, 3])
conn.execute('INSERT INTO item (embedding) VALUES (%s)', (embedding,))

Get the nearest neighbors to a vector

conn.execute('SELECT * FROM item ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()

Psycopg 2

Register the vector type with your connection or cursor

from pgvector.psycopg2 import register_vector

register_vector(conn)

Insert a vector

embedding = np.array([1, 2, 3])
cur.execute('INSERT INTO item (embedding) VALUES (%s)', (embedding,))

Get the nearest neighbors to a vector

cur.execute('SELECT * FROM item ORDER BY embedding <-> %s LIMIT 5', (embedding,))
cur.fetchall()

asyncpg

Register the vector type with your connection

from pgvector.asyncpg import register_vector

await register_vector(conn)

Insert a vector

embedding = np.array([1, 2, 3])
await conn.execute('INSERT INTO item (embedding) VALUES ($1)', embedding)

Get the nearest neighbors to a vector

await conn.fetch('SELECT * FROM item ORDER BY embedding <-> $1 LIMIT 5', embedding)

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/pgvector/pgvector-python.git
cd pgvector-python
pip install -r requirements.txt
createdb pgvector_python_test
pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

pgvector-0.1.7-py2.py3-none-any.whl (7.5 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page