llm_embed(model_id, text) SQL function for Datasette
datasette-llm-embed
Datasette plugin adding an llm_embed(model_id, text) SQL function.
Installation
datasette install datasette-llm-embed
Usage
Adds a SQL function that can be called like this:
select llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text')
This embeds the provided text using the specified embedding model and returns a binary blob, suitable for use with plugins such as datasette-faiss.
The models need to be installed using LLM plugins such as llm-sentence-transformers.
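The layout of the returned blob is not spelled out above. As a minimal sketch, assuming the blob is a packed sequence of little-endian 32-bit floats (an assumption, not something this README confirms), it could be converted to and from Python floats like this. The `encode` and `decode` helpers are illustrative names, not part of the plugin:

```python
import struct


def encode(values):
    """Pack a sequence of floats into a little-endian float32 blob (assumed layout)."""
    return struct.pack("<" + "f" * len(values), *values)


def decode(blob):
    """Unpack a little-endian float32 blob back into a tuple of floats."""
    return struct.unpack("<" + "f" * (len(blob) // 4), blob)


# Round-trip a small example vector; each float32 occupies 4 bytes.
blob = encode([1.0, 0.5, -2.0])
vector = decode(blob)
```

If the assumption holds, a blob returned by llm_embed() could be passed to `decode` to inspect the embedding values in Python.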
Use llm_embed_cosine(a, b) to calculate cosine similarity between two vector blobs:
select llm_embed_cosine(
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some text'),
    llm_embed('sentence-transformers/all-mpnet-base-v2', 'This is some other text')
)
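To make the calculation concrete, here is a rough sketch of a cosine similarity function over two blobs, registered as a SQL function on an in-memory SQLite connection. It is not the plugin's implementation: the `demo_cosine` name is hypothetical, and the blobs are assumed to be packed little-endian 32-bit floats.

```python
import math
import sqlite3
import struct


def decode(blob):
    """Unpack a blob of little-endian float32 values (assumed layout)."""
    return struct.unpack("<" + "f" * (len(blob) // 4), blob)


def cosine_blobs(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    va, vb = decode(a), decode(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    return dot / (norm_a * norm_b)


# Register the function under a hypothetical name so it can be called from SQL.
conn = sqlite3.connect(":memory:")
conn.create_function("demo_cosine", 2, cosine_blobs)

x_axis = struct.pack("<2f", 1.0, 0.0)
y_axis = struct.pack("<2f", 0.0, 1.0)
row = conn.execute(
    "select demo_cosine(?, ?), demo_cosine(?, ?)",
    (x_axis, x_axis, x_axis, y_axis),
).fetchone()
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so higher values indicate texts whose embeddings point in more similar directions.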
Models that require API keys
If your embedding model needs an API key - for example the ada-002 model from OpenAI - you can configure that key in metadata.yml (or JSON) like this:
plugins:
  datasette-llm-embed:
    keys:
      ada-002:
        $env: OPENAI_API_KEY
The name under keys (ada-002 in this example) must be the full model ID of the model, not an alias.
You can then set the OPENAI_API_KEY environment variable to the key you want to use before starting Datasette:
export OPENAI_API_KEY=sk-1234567890
Once configured, calls like this will use the API key that has been provided:
select llm_embed('ada-002', 'This is some text')
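The following sketch illustrates why the entry under keys has to match the full model ID. It is a simplified stand-in, not the plugin's code: `resolve_key` is a hypothetical helper, the `$env` handling only mimics the idea of reading from the environment (Datasette performs that expansion itself), and "ada" is used as an assumed alias purely for illustration.

```python
import os


def resolve_key(keys_config, model_id, environ=os.environ):
    """Look up an API key by exact model ID; returns None if there is no entry."""
    entry = keys_config.get(model_id)
    if isinstance(entry, dict) and "$env" in entry:
        # Read the key from the named environment variable.
        return environ.get(entry["$env"])
    return entry


# Mirrors the keys section of the configuration shown above.
keys_config = {"ada-002": {"$env": "OPENAI_API_KEY"}}
fake_env = {"OPENAI_API_KEY": "sk-1234567890"}  # stand-in for the real environment

found = resolve_key(keys_config, "ada-002", fake_env)
missing = resolve_key(keys_config, "ada", fake_env)  # alias has no entry
```

Under this exact-match lookup, the full model ID finds the key while the alias finds nothing, which is the behavior the configuration note describes.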
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd datasette-llm-embed
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
```bash
pytest
```