
Save OpenAI API results to a SQLite database


openai-to-sqlite



This tool is under active development. It is not yet ready for production use.

Installation

Install this tool using pip:

pip install openai-to-sqlite

Usage

For help, run:

openai-to-sqlite --help

You can also use:

python -m openai_to_sqlite --help

Configuration

You will need an OpenAI API key to use this tool.

You can create one at https://beta.openai.com/account/api-keys

You can then either set the API key as an environment variable:

export OPENAI_API_KEY=sk-...

Or pass it to each command using the --token sk-... option.
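The two ways of supplying the key can be pictured with a small sketch. The helper name `resolve_api_key` is hypothetical, and the precedence shown (an explicit token wins over the environment variable) is an assumption rather than documented behaviour of this tool.

```python
import os


def resolve_api_key(token=None, environ=None):
    """Return the API key from an explicit token or OPENAI_API_KEY.

    Hypothetical helper: the precedence (token first) is an assumption.
    """
    environ = os.environ if environ is None else environ
    key = token or environ.get("OPENAI_API_KEY")
    if not key:
        raise ValueError("No API key: set OPENAI_API_KEY or pass --token")
    return key


# Explicit token supplied alongside an environment value
chosen = resolve_api_key("sk-explicit", {"OPENAI_API_KEY": "sk-from-env"})
# Only the environment variable is available
fallback = resolve_api_key(None, {"OPENAI_API_KEY": "sk-from-env"})
```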

Embeddings

The first command supported by this tool is embeddings:

openai-to-sqlite embeddings --help

This command can be fed a CSV (or JSON or TSV) file full of content, and it will use the OpenAI API to generate embeddings for each row.

The first column of the CSV file will be treated as the content ID. Any other columns will be concatenated together and used as the text to be embedded.

These embeddings will then be saved as binary blobs in the embeddings table of a SQLite database.
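The column handling described above (first column as the ID, remaining columns joined into the text to embed) can be sketched roughly as follows. The function name `rows_to_id_and_text` and the single-space separator are assumptions for illustration; the tool may join columns differently.

```python
import csv
import io


def rows_to_id_and_text(csv_text, separator=" "):
    """Yield (id, text) pairs from CSV text with a header row.

    The first column becomes the ID; all other columns are joined
    with `separator` (an assumed choice) to form the text to embed.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        yield row[0], separator.join(row[1:])


sample = "id,title,body\n1,Hello,This is a test\n2,Again,This is another test\n"
pairs = list(rows_to_id_and_text(sample))
```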

Given a CSV file like this:

id,content
1,This is a test
2,This is another test

Embeddings can be stored like so:

openai-to-sqlite embeddings embeddings.db data.csv --csv

The --csv flag tells the tool that the input file is a CSV file. Without it, the tool will attempt to guess the format.

The resulting schema looks like this:

CREATE TABLE [embeddings] (
   [id] TEXT PRIMARY KEY,
   [embedding] BLOB
);

The binary data can be unpacked into a Python tuple of floating point numbers like this, where 1536 is the number of floats in each stored vector:

import struct

vector = struct.unpack(
    "f" * 1536, binary_embedding
)
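To read vectors back out of the database, the same struct approach can be applied to each row of the embeddings table. This is a minimal sketch: the helper `decode_embedding` is hypothetical, it derives the float count from the blob size instead of hard-coding 1536, and it uses an in-memory database with a made-up three-float vector so it can run without calling the API.

```python
import sqlite3
import struct


def decode_embedding(blob):
    """Unpack a BLOB of packed floats into a list of Python floats."""
    count = len(blob) // struct.calcsize("f")  # number of floats in the blob
    return list(struct.unpack("f" * count, blob))


# Stand-in database using the schema shown above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE [embeddings] ([id] TEXT PRIMARY KEY, [embedding] BLOB)")
conn.execute(
    "INSERT INTO embeddings VALUES (?, ?)",
    ("1", struct.pack("f" * 3, 0.5, 0.25, -1.0)),  # illustrative data only
)

# Map each content ID to its decoded vector
vectors = {
    row_id: decode_embedding(blob)
    for row_id, blob in conn.execute("SELECT id, embedding FROM embeddings")
}
```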

Search

Having saved the embeddings for content, you can run searches using the search command:

openai-to-sqlite search embeddings.db 'this is my search term'

The output will be a list of cosine similarity scores and content IDs:

% openai-to-sqlite search blog.db 'cool datasette demo'
0.843 7849
0.830 8036
0.828 8195
0.826 8098
0.818 8086
0.817 8171
0.816 8121
0.815 7860
0.815 7872
0.814 8169
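For readers unfamiliar with the scores, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below ranks stored vectors against a query vector in plain Python. The names `cosine_similarity` and `rank` and the tiny two-dimensional vectors are illustrative assumptions, not the tool's own implementation.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vector, stored, limit=10):
    """Return up to `limit` (score, content_id) pairs, highest score first."""
    scored = [
        (cosine_similarity(query_vector, vector), content_id)
        for content_id, vector in stored.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Illustrative vectors; real embeddings have many more dimensions
stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = rank([1.0, 0.0], stored)
for score, content_id in results:
    print(f"{score:.3f} {content_id}")
```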

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd openai-to-sqlite
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest
