Store and query embedding vectors in Datasette tables

Project description

datasette-embeddings

Installation

Install this plugin in the same environment as Datasette.

datasette install datasette-embeddings

Usage

This plugin adds an enrichment that calculates OpenAI embedding vectors for the content of a table and stores them in the database.

When running the enrichment, users select the embedding model and a template, such as {{ title }} {{ body }}, that combines the columns they want to embed into a single piece of text.
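The template is filled in once per row to build the text that gets embedded. Here is a minimal sketch of that step, assuming a simple {{ column }} placeholder syntax; render_template is a hypothetical helper, not the plugin's own code.

```python
import re

# Matches placeholders such as {{ title }} or {{body}} and captures the column name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, row: dict) -> str:
    """Hypothetical helper: fill {{ column }} placeholders from a row dict."""
    def substitute(match: re.Match) -> str:
        value = row.get(match.group(1))
        # Missing or NULL columns become empty strings in this sketch.
        return "" if value is None else str(value)

    return PLACEHOLDER.sub(substitute, template)


row = {"id": 1, "title": "Datasette", "body": "An open source multi-tool for exploring data"}
print(render_template("{{ title }} {{ body }}", row))
# Datasette An open source multi-tool for exploring data
```

Treating missing or NULL columns as empty strings is a choice made for this sketch; the plugin may handle them differently.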

Embeddings are stored as binary values in columns of a new table called _embeddings_NAME, where NAME is the name of the original source table.
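The README does not specify the binary encoding. This sketch assumes one common choice, packing each vector as consecutive little-endian 32-bit floats; encode and decode are hypothetical helper names.

```python
import struct


# Assumption: each vector is packed as little-endian float32 values, 4 bytes each.
def encode(vector: list[float]) -> bytes:
    """Pack a list of floats into a compact binary blob."""
    return struct.pack(f"<{len(vector)}f", *vector)


def decode(blob: bytes) -> list[float]:
    """Unpack a blob produced by encode() back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


blob = encode([0.5, -1.0, 0.25])
print(len(blob))     # 12 (three 4-byte floats)
print(decode(blob))  # [0.5, -1.0, 0.25]
```

Under this assumption a 256-dimensional vector takes 256 * 4 = 1,024 bytes. Values that are not exactly representable in 32 bits come back slightly rounded.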

Each embedding model gets its own column, named after the model: for example, emb_text_embedding_3_large_256 holds vectors from the text-embedding-3-large-256 model.
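The table and column names can be derived from the source table name and the model ID. This sketch assumes the column rule is simply to prefix emb_ and replace hyphens with underscores. That reproduces the documented example, but the plugin may normalize other characters differently. Both helpers and the articles table name are hypothetical.

```python
def embeddings_table_name(source_table: str) -> str:
    """Name of the table holding embeddings for source_table."""
    return f"_embeddings_{source_table}"


def embedding_column_name(model_id: str) -> str:
    """Assumed rule: prefix with emb_ and turn hyphens into underscores."""
    return "emb_" + model_id.replace("-", "_")


print(embeddings_table_name("articles"))                     # _embeddings_articles
print(embedding_column_name("text-embedding-3-large-256"))  # emb_text_embedding_3_large_256
```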

If you do not configure an OpenAI API key, users will be asked for one each time they run the enrichment.

You can set an API key with plugin configuration like this:

plugins:
  datasette-embeddings:
    api_key:
      $env: OPENAI_API_KEY

Then set the OPENAI_API_KEY environment variable before you start Datasette.
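Datasette replaces a $env block in plugin configuration with the value of the named environment variable, which is why the variable must be set before Datasette starts. The sketch below imitates that lookup for illustration only. resolve_config_value is a hypothetical helper, not Datasette's own code, and "sk-example" is a placeholder rather than a real key.

```python
import os


def resolve_config_value(value):
    """Return the env var named by a {"$env": NAME} dict, or the value unchanged."""
    if isinstance(value, dict) and set(value) == {"$env"}:
        # In this sketch an unset variable resolves to None.
        return os.environ.get(value["$env"])
    return value


config = {"api_key": {"$env": "OPENAI_API_KEY"}}
os.environ["OPENAI_API_KEY"] = "sk-example"  # placeholder, not a real key
print(resolve_config_value(config["api_key"]))  # sk-example
```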

For tables that have embeddings, this plugin also adds a "Semantic search against this table" item to the table actions menu. The item only appears if the API key environment variable has been configured, because the key is needed to calculate an embedding for the user's search query.
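A semantic search embeds the query with the same model and then ranks the stored vectors by similarity to it. The README does not say which similarity measure or search strategy the plugin uses. This sketch assumes cosine similarity and a brute-force scan, stores vectors as little-endian float32 blobs, and uses a hypothetical _embeddings_articles table with an id column. Toy three-dimensional vectors stand in for real model output, so no API key is needed to run it.

```python
import math
import sqlite3
import struct

# Column name taken from the README's example; the table layout is hypothetical.
COLUMN = "emb_text_embedding_3_large_256"


def encode(vector: list[float]) -> bytes:
    # Assumption: vectors are stored as little-endian float32 blobs.
    return struct.pack(f"<{len(vector)}f", *vector)


def decode(blob: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_search(conn, table, query_vector, limit=3):
    """Return (id, score) pairs for the stored vectors most similar to query_vector."""
    # Table and column names are interpolated, so only pass trusted identifiers.
    sql = f'select id, "{COLUMN}" from "_embeddings_{table}"'
    scored = [
        (row_id, cosine_similarity(query_vector, decode(blob)))
        for row_id, blob in conn.execute(sql)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 3-dimensional vectors stand in for real model output.
conn = sqlite3.connect(":memory:")
conn.execute(f'create table "_embeddings_articles" (id integer primary key, "{COLUMN}" blob)')
conn.executemany(
    f'insert into "_embeddings_articles" (id, "{COLUMN}") values (?, ?)',
    [
        (1, encode([1.0, 0.0, 0.0])),
        (2, encode([0.0, 1.0, 0.0])),
        (3, encode([0.6, 0.8, 0.0])),
    ],
)
for row_id, score in semantic_search(conn, "articles", [1.0, 0.0, 0.0]):
    print(row_id, round(score, 3))
# 1 1.0
# 3 0.6
# 2 0.0
```

A brute-force scan reads every stored vector for each query, which is simple but grows linearly with the table. A zero-length vector would raise ZeroDivisionError in cosine_similarity.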

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd datasette-embeddings
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

The tests use recorded responses from the OpenAI embeddings API, stored as cassettes in tests/cassettes. The easiest way to regenerate these recordings is:

  • rm -rf tests/cassettes to remove the previous recordings
  • export OPENAI_API_KEY='...' to set an OpenAI API key
  • pytest --record-mode once to recreate the cassettes

Download files

Source Distribution

datasette-embeddings-0.1a3.tar.gz (12.5 kB)

Built Distribution

datasette_embeddings-0.1a3-py3-none-any.whl (11.3 kB)

File details

Details for the file datasette-embeddings-0.1a3.tar.gz.

File metadata

  • Download URL: datasette-embeddings-0.1a3.tar.gz
  • Upload date:
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for datasette-embeddings-0.1a3.tar.gz
Algorithm Hash digest
SHA256 342fb1ffc3136e009794337f5c0fa4095a1d9201bf65e26a083428b91860c4f6
MD5 101a0fbbe8c8b2d4923d0cf62fd8057d
BLAKE2b-256 961ae54cffc0906915ca3a587c0d4db1860e2a08311753093259563dd6012095

File details

Details for the file datasette_embeddings-0.1a3-py3-none-any.whl.

File hashes

Hashes for datasette_embeddings-0.1a3-py3-none-any.whl
Algorithm Hash digest
SHA256 016feffe56ffeb8908f8897e049436e80631a93a29acefa062e0d4029d7ec7ee
MD5 258d577270d2ad27e2bddaf1b60418a3
BLAKE2b-256 1813ce89669790c0ce923001645fb0e7a3f6ca94413df427ed40ea5b6cd671fc
