Read-through cache for OpenAI's API.
Project description
llmem
A read-through cache for OpenAI chat completions.
Why:
- Avoid running up bills when developing/testing by repeatedly sending the same request or forgetting to mock things.
- Fine-tune an open source model on data I've paid for (adhering to all terms of use).
- Evaluate future models.
- Track model changes over time.
use
It currently only supports the chat completion API.

There is one additional (optional) field in the request, `age`. You can give it a number plus a unit character, like `10w`, and it will only return from cache if the entry is younger than that (in this case, 10 weeks). Acceptable characters are s(econds), m(inutes), h(ours), d(ays), w(eeks), y(ears).
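For example, here is a payload sketch that only accepts cache entries younger than ten weeks (this assumes `age` is sent as a top-level field of the JSON request body, alongside the usual chat completion parameters):

```python
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hi."}],
    "max_tokens": 10,
    "age": "10w",  # only serve from cache if the entry is less than 10 weeks old
}
```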
requests
NOTE: These examples run against my public server, llmem.com. You might want to run your own server instead.
```python
import os

import requests

api_key = os.environ["OPENAI_API_KEY"]  # your OpenAI API key

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Hi.",
                },
            ],
        }
    ],
    "max_tokens": 10,
}

# Send the request to llmem rather than api.openai.com; on a cache miss it
# forwards the request to OpenAI, on a hit it returns the stored response.
response = requests.post("https://llmem.com/chat/completions", headers=headers, json=payload)
print(response.json())
```
If you run that twice, you'll see that it first gives an ID starting with `chatcmpl`, then IDs starting with `llmem`.
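One way to see this from a script (a sketch that reuses the `headers` and `payload` from the example above) is to compare the `id` prefixes of two identical requests:

```python
first = requests.post("https://llmem.com/chat/completions", headers=headers, json=payload).json()
second = requests.post("https://llmem.com/chat/completions", headers=headers, json=payload).json()

print(first["id"])   # "chatcmpl..." if it had to go to OpenAI
print(second["id"])  # "llmem..." once it is served from the cache
```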
You can also use the OpenAI client, although it will currently only work for `/chat` requests:
```python
from openai import OpenAI

client = OpenAI(
    api_key=api_key,               # your OpenAI API key
    base_url="https://llmem.com",  # point the client at llmem instead of api.openai.com
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="gpt-3.5-turbo",
)
```
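The result is the usual client response object, so (as a sketch) you can check where it came from and read the content in the normal way:

```python
print(chat_completion.id)                          # "chatcmpl..." fresh, "llmem..." cached
print(chat_completion.choices[0].message.content)
```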
private server
docker
Makes it available on port 8000 and keeps its cache in `~/.llmem`.
```
docker run --rm -p 8000:8000 -v ~/.llmem:/workspace ghcr.io/c0g/llmem:latest
```
python
```
pip install llmem
uvicorn llmem:app
```
By default it will create a `llmem.db` file in the working directory; you can override this location by setting the `LLMEM_DB_FILE` environment variable.
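Once a private server is running, you can point the same OpenAI client at it instead of the public server. This is a sketch assuming the defaults above (uvicorn's default port 8000, plain HTTP on localhost) and `api_key` defined as before:

```python
from openai import OpenAI

client = OpenAI(
    api_key=api_key,  # still your OpenAI key; llmem forwards it to OpenAI on cache misses
    base_url="http://localhost:8000",
)

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}],
    model="gpt-3.5-turbo",
)
```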
You can get the code on GitHub.
public server
I host a public server at llmem.com. If you point your API client to https://llmem.com/chat, you'll hit my cache and share the wealth (and get to share other people's wealth).
- NOTE: I could look at your OpenAI key if I wanted to. I don't, but I could. You can probably trust me[^criminal].
- NOTE: I can look at your queries. I probably will (see 'Why' bullets above). Please don't submit anything you wouldn't want your mother seeing, since I might well be your mother.
- NOTE: If any of the above NOTEs worry you, it's probably best not to point your API client to https://llmem.com.
[^criminal]: That's exactly what a criminal would say.
File details
Details for the file `llmem-20240215.post2.tar.gz`.
File metadata
- Download URL: llmem-20240215.post2.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5da701fcf9421011d98dba9fc7d4753a80363a7a35de230df972d009fd517df |
| MD5 | 2bcd19b3c6dcc2a72b5701f5d5a01351 |
| BLAKE2b-256 | 8c9f4ecf9ee8676c76788029b00976fcb37353b7539507bf7bc32a26aee038d3 |
File details
Details for the file `llmem-20240215.post2-py3-none-any.whl`.
File metadata
- Download URL: llmem-20240215.post2-py3-none-any.whl
- Upload date:
- Size: 8.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 13d6637c9731d66da45182bfaa43bf657bc794fb92ba9f6dfe8be57496fd6412 |
| MD5 | fc628248473a46c5704ce3cc74266a0a |
| BLAKE2b-256 | acab94345aa37d98eb72eb94235fe555157fb988872f55fab5c92fdf065ecbcb |