llm_embed(model_id, text) SQL function for Datasette
datasette-llm-embed
Datasette plugin adding a llm_embed(model_id, text) SQL function.
Installation
datasette install datasette-llm-embed
Usage
Adds a SQL function that can be called like this:
select llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
This embeds the provided text using the specified embedding model and returns a binary blob, suitable for use with plugins such as datasette-faiss.
The models need to be installed using LLM plugins such as llm-sentence-transformers.
Use llm_embed_cosine(a, b) to calculate the cosine similarity between two vector blobs:
select llm_embed_cosine(
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text'),
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some other text')
)
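To make the comparison concrete, here is a minimal Python sketch of a cosine similarity calculation over two blobs. It assumes each blob is a packed sequence of little-endian 32-bit floats; that layout, and the helper names `decode` and `cosine_similarity`, are assumptions for illustration and not necessarily how the plugin implements llm_embed_cosine().

```python
import math
import struct


def decode(blob: bytes) -> tuple:
    # Assumption: the blob holds little-endian 32-bit floats, 4 bytes each.
    return struct.unpack("<" + "f" * (len(blob) // 4), blob)


def cosine_similarity(a: bytes, b: bytes) -> float:
    # Dot product of the two vectors divided by the product of their lengths.
    # Note: this sketch does not guard against zero-length vectors.
    va, vb = decode(a), decode(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    return dot / (norm_a * norm_b)


# Hypothetical three-dimensional vectors standing in for real embeddings.
a = struct.pack("<3f", 1.0, 0.0, 0.0)
b = struct.pack("<3f", 1.0, 1.0, 0.0)

print(cosine_similarity(a, a))  # identical vectors score 1.0
print(cosine_similarity(a, b))  # vectors 45 degrees apart score about 0.707
```

Scores closer to 1 indicate vectors pointing in more similar directions, which for embeddings usually corresponds to more closely related text.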
The llm_embed_decode() function can be used to decode a binary BLOB into a JSON array of floats:
select llm_embed_decode(
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
)
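The decoding step can be sketched in Python as follows. This assumes the blob is a packed sequence of little-endian 32-bit floats; the function name `decode_to_json` is hypothetical and the plugin's own implementation may differ.

```python
import json
import struct


def decode_to_json(blob: bytes) -> str:
    # Assumption: each float occupies 4 bytes, stored little-endian.
    count = len(blob) // 4
    floats = struct.unpack("<" + "f" * count, blob)
    return json.dumps(list(floats))


# Hypothetical three-value blob standing in for a real embedding.
blob = struct.pack("<3f", 0.5, -1.0, 2.0)
print(decode_to_json(blob))  # [0.5, -1.0, 2.0]
```

Under this assumption, the number of dimensions in an embedding is the blob length in bytes divided by four.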
Models that require API keys
If your embedding model needs an API key, for example the ada-002 model from OpenAI, you can configure that key in metadata.yml (or JSON) like this:
plugins:
  datasette-llm-embed:
    keys:
      ada-002:
        $env: OPENAI_API_KEY
The entry name under keys should be the full model ID of the model, not an alias.
You can then set the OPENAI_API_KEY environment variable to the key you want to use before starting Datasette:
export OPENAI_API_KEY=sk-1234567890
Once configured, calls like this will use the API key that has been provided:
select llm_embed('ada-002', 'This is some text')
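The `$env` indirection in the configuration above keeps the secret out of the metadata file. The following Python sketch illustrates the general idea of resolving such a reference against the environment. The helper `resolve_secret` is hypothetical and written only for illustration; it is not the actual code used by Datasette or this plugin.

```python
import os


def resolve_secret(value, environ=os.environ):
    # Hypothetical helper: a mapping of the form {"$env": "NAME"} is
    # replaced by the value of the NAME environment variable (or None
    # if it is not set). Any other value is returned unchanged.
    if isinstance(value, dict) and set(value) == {"$env"}:
        return environ.get(value["$env"])
    return value


# Plugin configuration as it would look after parsing metadata.yml.
config = {"keys": {"ada-002": {"$env": "OPENAI_API_KEY"}}}

# A stand-in environment, so the sketch runs without a real key.
fake_environ = {"OPENAI_API_KEY": "sk-1234567890"}

key = resolve_secret(config["keys"]["ada-002"], fake_environ)
print(key)  # sk-1234567890
```

Because the file only names the variable, the same metadata.yml can be shared or committed without exposing the key itself.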
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd datasette-llm-embed
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
```bash
pytest
```