pgvector support for Python
Project description
pgvector-python
pgvector support for Python
Supports Django, SQLAlchemy, SQLModel, Psycopg 3, Psycopg 2, asyncpg, and Peewee
Installation
Run:
pip install pgvector
And follow the instructions for your database library:
Or check out some examples:
- Embeddings with OpenAI
- Binary embeddings with Cohere
- Sentence embeddings with SentenceTransformers
- Hybrid search with SentenceTransformers (Reciprocal Rank Fusion)
- Hybrid search with SentenceTransformers (cross-encoder)
- Sparse search with Transformers
- Late interaction search with ColBERT
- Image search with PyTorch
- Image search with perceptual hashing
- Morgan fingerprints with RDKit
- Topic modeling with Gensim
- Implicit feedback recommendations with Implicit
- Explicit feedback recommendations with Surprise
- Recommendations with LightFM
- Horizontal scaling with Citus
- Bulk loading with
COPY
Django
Create a migration to enable the extension
from pgvector.django import VectorExtension
class Migration(migrations.Migration):
operations = [
VectorExtension()
]
Add a vector field to your model
from pgvector.django import VectorField
class Item(models.Model):
embedding = VectorField(dimensions=3)
Also supports HalfVectorField
, BitField
, and SparseVectorField
Insert a vector
item = Item(embedding=[1, 2, 3])
item.save()
Get the nearest neighbors to a vector
from pgvector.django import L2Distance
Item.objects.order_by(L2Distance('embedding', [3, 1, 2]))[:5]
Also supports MaxInnerProduct
, CosineDistance
, L1Distance
, HammingDistance
, and JaccardDistance
Get the distance
Item.objects.annotate(distance=L2Distance('embedding', [3, 1, 2]))
Get items within a certain distance
Item.objects.alias(distance=L2Distance('embedding', [3, 1, 2])).filter(distance__lt=5)
Average vectors
from django.db.models import Avg
Item.objects.aggregate(Avg('embedding'))
Also supports Sum
Add an approximate index
from pgvector.django import HnswIndex, IvfflatIndex
class Item(models.Model):
class Meta:
indexes = [
HnswIndex(
name='my_index',
fields=['embedding'],
m=16,
ef_construction=64,
opclasses=['vector_l2_ops']
),
# or
IvfflatIndex(
name='my_index',
fields=['embedding'],
lists=100,
opclasses=['vector_l2_ops']
)
]
Use vector_ip_ops
for inner product and vector_cosine_ops
for cosine distance
SQLAlchemy
Enable the extension
session.execute(text('CREATE EXTENSION IF NOT EXISTS vector'))
Add a vector column
from pgvector.sqlalchemy import Vector
class Item(Base):
embedding = mapped_column(Vector(3))
Also supports HALFVEC
, BIT
, and SPARSEVEC
Insert a vector
item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()
Get the nearest neighbors to a vector
session.scalars(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))
Also supports max_inner_product
, cosine_distance
, l1_distance
, hamming_distance
, and jaccard_distance
Get the distance
session.scalars(select(Item.embedding.l2_distance([3, 1, 2])))
Get items within a certain distance
session.scalars(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))
Average vectors
from pgvector.sqlalchemy import avg
session.scalars(select(avg(Item.embedding))).first()
Also supports sum
Add an approximate index
index = Index(
'my_index',
Item.embedding,
postgresql_using='hnsw',
postgresql_with={'m': 16, 'ef_construction': 64},
postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index(
'my_index',
Item.embedding,
postgresql_using='ivfflat',
postgresql_with={'lists': 100},
postgresql_ops={'embedding': 'vector_l2_ops'}
)
index.create(engine)
Use vector_ip_ops
for inner product and vector_cosine_ops
for cosine distance
SQLModel
Enable the extension
session.exec(text('CREATE EXTENSION IF NOT EXISTS vector'))
Add a vector column
from pgvector.sqlalchemy import Vector
from sqlalchemy import Column
class Item(SQLModel, table=True):
embedding: Any = Field(sa_column=Column(Vector(3)))
Also supports HALFVEC
, BIT
, and SPARSEVEC
Insert a vector
item = Item(embedding=[1, 2, 3])
session.add(item)
session.commit()
Get the nearest neighbors to a vector
session.exec(select(Item).order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5))
Also supports max_inner_product
, cosine_distance
, l1_distance
, hamming_distance
, and jaccard_distance
Get the distance
session.exec(select(Item.embedding.l2_distance([3, 1, 2])))
Get items within a certain distance
session.exec(select(Item).filter(Item.embedding.l2_distance([3, 1, 2]) < 5))
Average vectors
from pgvector.sqlalchemy import avg
session.exec(select(avg(Item.embedding))).first()
Also supports sum
Add an approximate index
from sqlalchemy import Index
index = Index(
'my_index',
Item.embedding,
postgresql_using='hnsw',
postgresql_with={'m': 16, 'ef_construction': 64},
postgresql_ops={'embedding': 'vector_l2_ops'}
)
# or
index = Index(
'my_index',
Item.embedding,
postgresql_using='ivfflat',
postgresql_with={'lists': 100},
postgresql_ops={'embedding': 'vector_l2_ops'}
)
index.create(engine)
Use vector_ip_ops
for inner product and vector_cosine_ops
for cosine distance
Psycopg 3
Enable the extension
conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the vector type with your connection
from pgvector.psycopg import register_vector
register_vector(conn)
For async connections, use
from pgvector.psycopg import register_vector_async
await register_vector_async(conn)
Create a table
conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
embedding = np.array([1, 2, 3])
conn.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))
Get the nearest neighbors to a vector
conn.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,)).fetchall()
Add an approximate index
conn.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
conn.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops
for inner product and vector_cosine_ops
for cosine distance
Psycopg 2
Enable the extension
cur = conn.cursor()
cur.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the vector type with your connection or cursor
from pgvector.psycopg2 import register_vector
register_vector(conn)
Create a table
cur.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
embedding = np.array([1, 2, 3])
cur.execute('INSERT INTO items (embedding) VALUES (%s)', (embedding,))
Get the nearest neighbors to a vector
cur.execute('SELECT * FROM items ORDER BY embedding <-> %s LIMIT 5', (embedding,))
cur.fetchall()
Add an approximate index
cur.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
cur.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops
for inner product and vector_cosine_ops
for cosine distance
asyncpg
Enable the extension
await conn.execute('CREATE EXTENSION IF NOT EXISTS vector')
Register the vector type with your connection
from pgvector.asyncpg import register_vector
await register_vector(conn)
or your pool
async def init(conn):
await register_vector(conn)
pool = await asyncpg.create_pool(..., init=init)
Create a table
await conn.execute('CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3))')
Insert a vector
embedding = np.array([1, 2, 3])
await conn.execute('INSERT INTO items (embedding) VALUES ($1)', embedding)
Get the nearest neighbors to a vector
await conn.fetch('SELECT * FROM items ORDER BY embedding <-> $1 LIMIT 5', embedding)
Add an approximate index
await conn.execute('CREATE INDEX ON items USING hnsw (embedding vector_l2_ops)')
# or
await conn.execute('CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 100)')
Use vector_ip_ops
for inner product and vector_cosine_ops
for cosine distance
Peewee
Add a vector column
from pgvector.peewee import VectorField
class Item(BaseModel):
embedding = VectorField(dimensions=3)
Also supports HalfVectorField
, FixedBitField
, and SparseVectorField
Insert a vector
item = Item.create(embedding=[1, 2, 3])
Get the nearest neighbors to a vector
Item.select().order_by(Item.embedding.l2_distance([3, 1, 2])).limit(5)
Also supports max_inner_product
, cosine_distance
, l1_distance
, hamming_distance
, and jaccard_distance
Get the distance
Item.select(Item.embedding.l2_distance([3, 1, 2]).alias('distance'))
Get items within a certain distance
Item.select().where(Item.embedding.l2_distance([3, 1, 2]) < 5)
Average vectors
from peewee import fn
Item.select(fn.avg(Item.embedding).coerce(True)).scalar()
Also supports sum
Add an approximate index
Item.add_index('embedding vector_l2_ops', using='hnsw')
Use vector_ip_ops
for inner product and vector_cosine_ops
for cosine distance
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/pgvector/pgvector-python.git
cd pgvector-python
pip install -r requirements.txt
createdb pgvector_python_test
pytest
To run an example:
cd examples/loading
pip install -r requirements.txt
createdb pgvector_example
python3 example.py
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file pgvector-0.3.6.tar.gz
.
File metadata
- Download URL: pgvector-0.3.6.tar.gz
- Upload date:
- Size: 25.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 31d01690e6ea26cea8a633cde5f0f55f5b246d9c8292d68efdef8c22ec994ade |
|
MD5 | 1ebc119e877e9de54cb4c50e0ebc0bf0 |
|
BLAKE2b-256 | 7dd8fd6009cee3e03214667df488cdcf9609461d729968da94e4f95d6359d304 |
File details
Details for the file pgvector-0.3.6-py3-none-any.whl
.
File metadata
- Download URL: pgvector-0.3.6-py3-none-any.whl
- Upload date:
- Size: 24.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f6c269b3c110ccb7496bac87202148ed18f34b390a0189c783e351062400a75a |
|
MD5 | 11cc5c106b944a189628475315d22986 |
|
BLAKE2b-256 | fb81f457d6d361e04d061bef413749a6e1ab04d98cfeec6d8abcfe40184750f3 |