datasette-embeddings
Store and query embedding vectors in Datasette tables
Installation
Install this plugin in the same environment as Datasette.
datasette install datasette-embeddings
Usage
Adds an enrichment for calculating and storing OpenAI embedding vectors for content in a table.
Users select the embedding model and the template (e.g. {{ title }} {{ body }}) for the columns they would like to embed.
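To illustrate what a template such as {{ title }} {{ body }} produces for one row, here is a minimal sketch. The render_template helper and its regular expression are hypothetical and written for this example only; the plugin's own template handling is not shown in this README and may differ.

```python
import re


def render_template(template: str, row: dict) -> str:
    """Replace each {{ column }} placeholder with that column's value from the row.

    Hypothetical helper for illustration; not the plugin's implementation.
    """

    def substitute(match: re.Match) -> str:
        value = row.get(match.group(1))
        # Missing or NULL columns become an empty string in this sketch
        return "" if value is None else str(value)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


row = {"title": "Hello", "body": "World"}
text_to_embed = render_template("{{ title }} {{ body }}", row)
print(text_to_embed)
```

The rendered string is the text that would be sent to the embedding model for that row.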
Embeddings are stored as binary values in columns in a new table called _embeddings_NAME, where NAME is the name of the original source table.
The vectors are stored in columns that match the name of the embedding model, for example emb_text_embedding_3_large_256 for the text-embedding-3-large-256 model.
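The naming scheme and binary storage described above can be sketched as follows. The helper names are hypothetical. The column rule (an emb_ prefix with hyphens replaced by underscores) is inferred from the single example given here. The little-endian 32-bit float layout is an assumption, since this README does not state the exact byte format the plugin uses.

```python
import struct


def embeddings_table_name(source_table: str) -> str:
    # _embeddings_NAME, where NAME is the source table
    return f"_embeddings_{source_table}"


def embedding_column_name(model: str) -> str:
    # Inferred from the README example: hyphens become underscores, "emb_" prefix
    return "emb_" + model.replace("-", "_")


def encode(vector: list) -> bytes:
    # ASSUMPTION: little-endian float32 values packed back to back
    return struct.pack("<" + "f" * len(vector), *vector)


def decode(blob: bytes) -> list:
    # Each float32 occupies 4 bytes
    return list(struct.unpack("<" + "f" * (len(blob) // 4), blob))


print(embeddings_table_name("articles"))
print(embedding_column_name("text-embedding-3-large-256"))
print(len(encode([0.5, -1.0, 2.0])))
```

Under this assumed layout a 256-dimension vector would occupy 1,024 bytes per row.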
If you do not configure an OpenAI API key, users will be asked for one each time they run the enrichment.
You can set an API key with plugin configuration like this:
plugins:
  datasette-embeddings:
    api_key:
      $env: OPENAI_API_KEY
Then set the OPENAI_API_KEY environment variable before you start Datasette.
This plugin adds a "Semantic search against this table" table action item for tables with embeddings. The item only appears if the API key environment variable has been configured, because the key is needed to calculate embeddings for the user's search query.
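Semantic search of this kind is commonly implemented by embedding the search query and ranking stored vectors by cosine similarity. The sketch below shows that general technique under that assumption; it is not taken from the plugin's source, and the function names and sample vectors are hypothetical.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Treat a zero vector as unrelated to everything
        return 0.0
    return dot / (norm_a * norm_b)


def rank_rows(query_vector: list, rows: list, limit: int = 10) -> list:
    """Return (score, row_id) pairs, most similar first.

    rows is a list of (row_id, vector) pairs.
    """
    scored = [(cosine_similarity(query_vector, vec), row_id) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Tiny made-up vectors; real embeddings have hundreds of dimensions
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
for score, row_id in rank_rows([1.0, 0.0], rows):
    print(row_id, round(score, 3))
```

The query vector has to come from the same embedding model as the stored vectors, which is why the search action depends on an API key being available.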
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd datasette-embeddings
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest
The tests use captured examples of embedding APIs. The easiest way to re-generate these is to do the following:
- rm -rf tests/cassettes to remove the previous recordings
- export OPENAI_API_KEY='...' to set an OpenAI API key
- pytest --record-mode once to recreate the cassettes
File details
Details for the file datasette-embeddings-0.1a3.tar.gz.
File metadata
- Download URL: datasette-embeddings-0.1a3.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 342fb1ffc3136e009794337f5c0fa4095a1d9201bf65e26a083428b91860c4f6
MD5 | 101a0fbbe8c8b2d4923d0cf62fd8057d
BLAKE2b-256 | 961ae54cffc0906915ca3a587c0d4db1860e2a08311753093259563dd6012095
File details
Details for the file datasette_embeddings-0.1a3-py3-none-any.whl.
File metadata
- Download URL: datasette_embeddings-0.1a3-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 016feffe56ffeb8908f8897e049436e80631a93a29acefa062e0d4029d7ec7ee
MD5 | 258d577270d2ad27e2bddaf1b60418a3
BLAKE2b-256 | 1813ce89669790c0ce923001645fb0e7a3f6ca94413df427ed40ea5b6cd671fc