pgvector-python
pgvector support for Python
Supports Django, SQLAlchemy, SQLModel, Psycopg 3, Psycopg 2, asyncpg, and Peewee
Installation
Run:
pip install pgvector
And follow the instructions for your database library:
- Django
- SQLAlchemy
- SQLModel
- Psycopg 3
- Psycopg 2
- asyncpg
- Peewee
Or check out some examples:
- Embeddings with OpenAI
- Sentence embeddings with SentenceTransformers
- Hybrid search with SentenceTransformers (Reciprocal Rank Fusion)
- Hybrid search with SentenceTransformers (cross-encoder)
- Image search with PyTorch
- Implicit feedback recommendations with Implicit
- Explicit feedback recommendations with Surprise
- Recommendations with LightFM
- Horizontal scaling with Citus
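The hybrid search example combines a keyword ranking and a vector ranking with Reciprocal Rank Fusion. As a rough illustration (not the example's actual code), the sketch below fuses several ranked lists of ids by giving each id a score of 1 / (k + rank) summed over every list it appears in. The constant k = 60 is an assumption based on the commonly cited default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids into a single list, best first.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the output is deterministic
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result ids from a keyword search and a vector search
keyword_results = [3, 1, 2]
semantic_results = [1, 4, 3]
print(reciprocal_rank_fusion([keyword_results, semantic_results]))  # [1, 3, 4, 2]
```

Because RRF uses only rank positions, it avoids having to normalize keyword relevance scores and vector distances onto a common scale.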
Django
Create a migration to enable the extension
from django.db import migrations
from pgvector.django import VectorExtension
class Migration(migrations.Migration):
    operations = [
        VectorExtension()
    ]
Add a vector field to your model
from django.db import models
from pgvector.django import VectorField
class Item(models.Model):
    embedding = VectorField(dimensions=3)
Insert a vector
item = Item(embedding=[1, 2, 3])
item.save()
Get the nearest neighbors to a vector
from pgvector.django import L2Distance
Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]
Also supports MaxInnerProduct and CosineDistance
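For reference, the sketch below computes in plain Python what each distance function returns for a pair of vectors. It assumes pgvector's operator semantics: L2 distance is `<->` (Euclidean distance), max inner product is `<#>` (the *negative* inner product, so ascending order puts the largest inner product first), and cosine distance is `<=>` (1 minus cosine similarity).

```python
import math


def l2_distance(a, b):
    """Euclidean distance, like pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_inner_product(a, b):
    """Negative inner product, like pgvector's <#> operator.

    Negated so that sorting ascending ranks the largest inner product first.
    """
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, like pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


embedding, query = [1, 2, 3], [3, 1, 2]
print(l2_distance(embedding, query))        # sqrt(6), about 2.449
print(max_inner_product(embedding, query))  # -(3 + 2 + 6) = -11
print(cosine_distance(embedding, query))    # 1 - 11/14, about 0.214
```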
Get the distance
Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))
Get items within a certain distance
Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)
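Without an approximate index, these queries amount to an exact scan: compute the distance from the query vector to every row, then either sort and keep the first five or keep only the rows under the threshold. The sketch below mimics that over an in-memory dict of hypothetical rows. It is meant to show the semantics of the queries, not how PostgreSQL executes them.

```python
import heapq
import math


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(rows, query, k=5):
    """Ids of the k rows closest to query (like ORDER BY distance LIMIT k)."""
    return heapq.nsmallest(k, rows, key=lambda row_id: l2_distance(rows[row_id], query))


def within_distance(rows, query, max_distance):
    """Ids of rows strictly closer than max_distance (like WHERE distance < max)."""
    return [row_id for row_id, vec in rows.items() if l2_distance(vec, query) < max_distance]


# Hypothetical table contents: id -> embedding
rows = {1: [1, 2, 3], 2: [4, 5, 6], 3: [3, 1, 2], 4: [0, 0, 0]}
query = [3, 1, 2]
print(nearest_neighbors(rows, query, k=2))  # [3, 1]
print(within_distance(rows, query, 5))      # [1, 3, 4]
```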
Average vectors
from django.db.models import Avg
Item.objects.aggregate(Avg('embedding'))
Also supports Sum
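Averaging and summing a vector column works element-wise and produces a vector with the same number of dimensions. Here is a plain-Python sketch of that computation. It assumes every vector has the same length, which a `vector(3)` column guarantees.

```python
def vector_sum(vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def vector_avg(vectors):
    """Element-wise mean of equal-length vectors."""
    vectors = list(vectors)
    if not vectors:
        return None  # like SQL AVG over zero rows, which returns NULL
    return [total / len(vectors) for total in vector_sum(vectors)]


embeddings = [[1, 2, 3], [4, 5, 6]]
print(vector_sum(embeddings))  # [5, 7, 9]
print(vector_avg(embeddings))  # [2.5, 3.5, 4.5]
```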
Add an approximate index
from pgvector.django import IvfflatIndex, HnswIndex
class Item(models.Model):
    class Meta:
        indexes = [
            IvfflatIndex(
                name='my_index',
                fields=['embedding'],
                lists=100,
                opclasses=['vector_l2_ops']
            ),
            # or
            HnswIndex(
                name='my_index',
                fields=['embedding'],
                m=16,
                ef_construction=64,
                opclasses=['vector_l2_ops']
            )
        ]
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
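The operator class has to match the distance function your queries use, because PostgreSQL only uses the index for queries whose operator belongs to that class. One way to keep the pairing in one place is a small lookup table. The dictionary and helper below are hypothetical, not part of pgvector-python, and they assume pgvector's operators `<->`, `<#>`, and `<=>`.

```python
# Hypothetical mapping: distance function -> (SQL operator, index operator class)
VECTOR_OPS = {
    "l2": ("<->", "vector_l2_ops"),
    "inner_product": ("<#>", "vector_ip_ops"),
    "cosine": ("<=>", "vector_cosine_ops"),
}


def opclass_for(distance):
    """Return the operator class to index for the given distance function."""
    try:
        return VECTOR_OPS[distance][1]
    except KeyError:
        raise ValueError(f"unknown distance: {distance!r}") from None


print(opclass_for("cosine"))  # vector_cosine_ops
```

In a model you could then write `opclasses=[opclass_for('cosine')]` and query with `CosineDistance`.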
SQLAlchemy
Enable the extension
from sqlalchemy import text
session.execute(text('CREATE EXTENSION IF NOT EXISTS vector'))
Add a vector column
from pgvector.sqlalchemy import Vector
from sqlalchemy.orm import mapped_column
class Item(Base):
    embedding = mapped_column(Vector(3))
Insert a vector
item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()
Get the nearest neighbors to a vector
from sqlalchemy import select
session.scalars(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))
Also supports max_inner_product and cosine_distance
Get the distance
session.scalars(select(Item.embedding.l2_distance([3, 1, 2])))
Get items within a certain distance
session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))
Average vectors
from sqlalchemy.sql import func
session.scalars(select(func.avg(Item.embedding))).first()
Also supports sum
Add an approximate index
from sqlalchemy import Index
index = Index('my_index', Item.embedding,
    postgresql_using='ivfflat',
    postgresql_with={'lists': 100},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index('my_index', Item.embedding,
    postgresql_using='hnsw',
    postgresql_with={'m': 16, 'ef_construction': 64},
    postgresql_ops={'embedding': 'vector_l2_ops'}
)
index.create(engine)
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
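The `lists` value for IVFFlat is a tuning knob. The pgvector extension's own documentation suggests starting with rows / 1000 lists for up to 1M rows and sqrt(rows) beyond that, then querying with roughly sqrt(lists) probes (set through `ivfflat.probes`). The helper below encodes that rule of thumb. Its name and the clamping to at least 1 are assumptions.

```python
import math


def suggest_ivfflat_params(rows):
    """Return (lists, probes) starting points for an IVFFlat index.

    Rule of thumb (assumed from the pgvector extension docs):
    lists = rows / 1000 up to 1M rows, sqrt(rows) above that;
    probes = sqrt(lists). Both are clamped to at least 1.
    """
    if rows <= 1_000_000:
        lists = rows // 1000
    else:
        lists = math.isqrt(rows)
    lists = max(lists, 1)
    probes = max(math.isqrt(lists), 1)
    return lists, probes


print(suggest_ivfflat_params(100_000))    # (100, 10)
print(suggest_ivfflat_params(4_000_000))  # (2000, 44)
```

These are only starting points. Higher probes improve recall at the cost of speed.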
SQLModel
Enable the extension
session.exec(text('CREATE EXTENSION IF NOT EXISTS vector'))
Add a vector column
from typing import List
from pgvector.sqlalchemy import Vector
from sqlalchemy import Column
from sqlmodel import Field, SQLModel
class Item(SQLModel, table=True):
    embedding: List[float] = Field(sa_column=Column(Vector(3)))
Insert a vector
item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()
Get the nearest neighbors to a vector
from sqlmodel import select
session.exec(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))
Also supports max_inner_product and cosine_distance
Psycopg 3
Enable the extension
conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the vector type with your connection
from pgvector.psycopg import register_vector
register_vector(conn)
For async connections, use
from pgvector.psycopg import register_vector_async
await register_vector_async(conn)
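Registering the type installs adapters that convert between Python values and pgvector's representation. In text form, a vector is a bracketed, comma-separated list such as `[1,2,3]`. The sketch below shows that text conversion in plain Python. `to_vector_literal` and `from_vector_literal` are hypothetical names; the real adapters in pgvector-python work with NumPy arrays and may use the binary format instead.

```python
def to_vector_literal(values):
    """Serialize a sequence of numbers to pgvector's text form, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def from_vector_literal(text):
    """Parse pgvector's text form back into a list of floats."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a vector literal: {text!r}")
    body = text[1:-1]
    return [float(part) for part in body.split(",")] if body else []


literal = to_vector_literal([1, 2, 3])
print(literal)                       # [1.0,2.0,3.0]
print(from_vector_literal(literal))  # [1.0, 2.0, 3.0]
```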
Create a table
conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
import numpy as np
embedding = np.array([1, 2, 3])
conn.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))
Get the nearest neighbors to a vector
conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()
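The query vector is passed as a bound parameter (`%s`), but SQL operators cannot be parameterized. If the distance function is chosen at runtime, one option is to whitelist the operator in code, as in this sketch. The helper name is hypothetical, and the `items` table is the one created above. The limit is also left as a placeholder so it can be bound alongside the vector.

```python
# Whitelisted pgvector distance operators; never interpolate user input into SQL
OPERATORS = {"l2": "<->", "inner_product": "<#>", "cosine": "<=>"}


def nearest_neighbors_query(distance="l2"):
    """Build a nearest-neighbor query for items with placeholders for the vector and limit."""
    if distance not in OPERATORS:
        raise ValueError(f"unknown distance: {distance!r}")
    return f"SELECT * FROM items ORDER BY embedding {OPERATORS[distance]} %s LIMIT %s"


print(nearest_neighbors_query("cosine"))
# SELECT * FROM items ORDER BY embedding <=> %s LIMIT %s
```

With Psycopg 3 this would be used as `conn.execute(nearest_neighbors_query('cosine'), (embedding, 5)).fetchall()`.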
Psycopg 2
Enable the extension
cur = conn.cursor()
cur.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the vector type with your connection or cursor
from pgvector.psycopg2 import register_vector
register_vector(conn)
Create a table
cur.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
import numpy as np
embedding = np.array([1, 2, 3])
cur.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))
Get the nearest neighbors to a vector
cur.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,))
cur.fetchall()
asyncpg
Enable the extension
await conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the vector type with your connection
from pgvector.asyncpg import register_vector
await register_vector(conn)
or your pool
import asyncpg
async def init(conn):
    await register_vector(conn)
pool = await asyncpg.create_pool(..., init=init)
Create a table
await conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
import numpy as np
embedding = np.array([1, 2, 3])
await conn.execute('INSERT INTO items (embedding) VALUES ($1)', embedding)
Get the nearest neighbors to a vector
await conn.fetch('SELECT * FROM items ORDER BY embedding <-> $1 LIMIT 5', embedding)
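asyncpg type codecs can work with PostgreSQL's binary wire format. The code below sketches what a binary codec for `vector` has to produce and parse. It assumes the layout used by pgvector's send/receive functions: a 16-bit dimension count, a 16-bit unused field, then one 32-bit float per element, all big-endian. The function names are hypothetical.

```python
import struct


def encode_vector_binary(values):
    """Pack values as: 16-bit dim, 16-bit unused (0), dim x float32, big-endian."""
    dim = len(values)
    return struct.pack(f">HH{dim}f", dim, 0, *values)


def decode_vector_binary(data):
    """Inverse of encode_vector_binary."""
    dim, _unused = struct.unpack_from(">HH", data)
    return list(struct.unpack_from(f">{dim}f", data, 4))


payload = encode_vector_binary([1, 2, 3])
print(len(payload))                   # 4-byte header + 3 * 4 bytes = 16
print(decode_vector_binary(payload))  # [1.0, 2.0, 3.0]
```

Values are stored as 32-bit floats, so a Python float may not survive the round trip bit-for-bit.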
Peewee
Add a vector column
from pgvector.peewee import VectorField
class Item(BaseModel):
    embedding = VectorField(dimensions=3)
Insert a vector
item = Item.create(embedding=[1, 2, 3])
Get the nearest neighbors to a vector
Item.select().order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5)
Also supports max_inner_product and cosine_distance
Get the distance
Item.select(Item.embedding.l2_distance([3, 1, 2]).alias('distance'))
Get items within a certain distance
Item.select().where(Item.embedding.l2_distance([3, 1, 2]) < 5)
Average vectors
from peewee import fn
Item.select(fn.avg(Item.embedding)).scalar()
Also supports sum
Add an approximate index
Item.add_index('embedding vector_l2_ops', using='hnsw')
Use vector_ip_ops for inner product and vector_cosine_ops for cosine distance
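An approximate index like this corresponds to a `CREATE INDEX ... USING hnsw` statement. The hypothetical helper below builds such a statement as a string to show the shape of the SQL, including the optional `WITH` storage parameters (`m` and `ef_construction` for HNSW, `lists` for IVFFlat). It does no identifier quoting, so pass only trusted table and column names. The table name `item` in the usage lines is an assumption.

```python
def create_vector_index_sql(table, column, method="hnsw", opclass="vector_l2_ops", **params):
    """Build a CREATE INDEX statement for a pgvector column (trusted names only)."""
    if method not in ("hnsw", "ivfflat"):
        raise ValueError(f"unsupported index method: {method!r}")
    sql = f"CREATE INDEX ON {table} USING {method} ({column} {opclass})"
    if params:
        # Storage parameters, e.g. m = 16, ef_construction = 64
        options = ", ".join(f"{key} = {value}" for key, value in params.items())
        sql += f" WITH ({options})"
    return sql


print(create_vector_index_sql("item", "embedding"))
# CREATE INDEX ON item USING hnsw (embedding vector_l2_ops)
print(create_vector_index_sql("item", "embedding", m=16, ef_construction=64))
# CREATE INDEX ON item USING hnsw (embedding vector_l2_ops) WITH (m = 16, ef_construction = 64)
```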
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/pgvector/pgvector-python.git
cd pgvector-python
pip install -r requirements.txt
createdb pgvector_python_test
pytest