
llm_embed(model_id, text) SQL function for Datasette


datasette-llm-embed


Datasette plugin adding a llm_embed(model_id, text) SQL function.

Installation

datasette install datasette-llm-embed

Usage

Adds a SQL function that can be called like this:

select llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')

This embeds the provided text using the specified embedding model and returns a binary blob, suitable for use with plugins such as datasette-faiss.

The models need to be installed using LLM plugins such as llm-sentence-transformers.

Use llm_embed_cosine(a, b) to calculate cosine similarity between two vector blobs:

select llm_embed_cosine(
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text'),
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some other text')
)
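As a rough illustration of what a cosine similarity over two vector blobs involves, here is a minimal Python sketch. It assumes each blob is a packed sequence of little-endian 32-bit floats, which is an assumption about the storage format rather than something stated above. The helper names `decode`, `encode` and `cosine_similarity` are hypothetical and are not the plugin's own code.

```python
import math
import struct


def decode(blob: bytes) -> list[float]:
    # Assumption: 4 bytes per value, little-endian float32.
    return list(struct.unpack("<" + "f" * (len(blob) // 4), blob))


def encode(values: list[float]) -> bytes:
    # Inverse of decode(), used here only to build example blobs.
    return struct.pack("<" + "f" * len(values), *values)


def cosine_similarity(a: bytes, b: bytes) -> float:
    # Dot product divided by the product of the two vector lengths.
    # Raises ZeroDivisionError if either vector is all zeros.
    va, vb = decode(a), decode(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    return dot / (norm_a * norm_b)


same = cosine_similarity(encode([1.0, 0.0]), encode([1.0, 0.0]))
orthogonal = cosine_similarity(encode([1.0, 0.0]), encode([0.0, 1.0]))
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, so higher values indicate more similar text.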

The llm_embed_decode() function can be used to decode a binary BLOB into a JSON array of floats:

select llm_embed_decode(
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
)
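If you want to inspect a blob outside of SQL, the decoding step can be sketched in Python as follows. This assumes the blob is a packed sequence of little-endian 32-bit floats (an assumption about the format, not something documented above), and `blob_to_json` is a hypothetical helper name.

```python
import json
import struct


def blob_to_json(blob: bytes) -> str:
    # Assumption: 4 bytes per value, little-endian float32.
    count = len(blob) // 4
    floats = struct.unpack("<" + "f" * count, blob)
    return json.dumps(list(floats))


# Build a small example blob by hand rather than calling an embedding model.
example_blob = struct.pack("<3f", 0.5, -1.0, 2.0)
print(blob_to_json(example_blob))  # prints [0.5, -1.0, 2.0]
```

A real embedding blob would contain hundreds or thousands of values, depending on the model.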

Models that require API keys

If your embedding model needs an API key, such as the ada-002 model from OpenAI, you can configure that key in metadata.yml (or metadata.json) like this:

plugins:
  datasette-llm-embed:
    keys:
      ada-002:
        $env: OPENAI_API_KEY

The key under keys should be the full model ID, not an alias.

You can then set the OPENAI_API_KEY environment variable to the key you want to use before starting Datasette:

export OPENAI_API_KEY=sk-1234567890

Once configured, calls like this will use the API key that has been provided:

select llm_embed('ada-002', 'This is some text')
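To make the `$env` indirection in the configuration above concrete, here is a small Python sketch of how a lookup of this shape could be resolved. It is an illustration only: `resolve_key` is a hypothetical helper, not the plugin's or Datasette's actual code, and the `environ` parameter exists just so the example can run without touching real environment variables.

```python
import os


def resolve_key(keys_config: dict, model_id: str, environ=os.environ):
    # Look up the entry for this exact model ID (aliases are not matched).
    value = keys_config.get(model_id)
    # An entry of the form {"$env": "NAME"} means "read the NAME variable".
    if isinstance(value, dict) and "$env" in value:
        return environ.get(value["$env"])
    # Otherwise return the value as-is (None if the model has no entry).
    return value


keys = {"ada-002": {"$env": "OPENAI_API_KEY"}}
fake_environ = {"OPENAI_API_KEY": "sk-1234567890"}
resolved = resolve_key(keys, "ada-002", fake_environ)
```

Keeping the key in an environment variable means the secret itself never has to appear in the metadata file.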

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd datasette-llm-embed
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest
