# LLaMasterKey: One master key for all LLM/GenAI endpoints
A big pain in the era of LLMs is that you need a separate API token for each provider: OpenAI, Cohere, Google Vertex AI, Anthropic, AnyScale, HuggingFace, etc.

If an intern at your startup accidentally pushes code containing API keys to GitHub, you have to revoke every token that was assigned to them. Even worse, you may have forgotten which tokens they were given. So what do you do? Revoke all keys and suffer a service interruption?

This is where LlaMasterKey (pronounced "La Master Key", i.e. "Llama" + "Master" + "Key", with "La" as the French "the") comes into play. It serves as a proxy that dispatches requests to the real cloud LLM/GenAI endpoints and returns the responses to your team or customers. Only one master key is needed to authenticate between a team member or customer and your LlaMasterKey server. If any of them makes you unhappy, you only need to revoke one key to cut off their access to all cloud LLM/GenAI endpoints. The actual keys stay hidden from your team members and customers.
## Roadmap

- Currently no master key is enabled. We will add authentication.
- More cloud LLM/GenAI endpoints will be supported. Current status:
  - OpenAI/chat/completion
  - Cohere/chat
  - AnyScale
  - HuggingFace Inference API
  - Anthropic
  - Google Vertex AI
  - Vectara AI
## Installation

```bash
pip install LLaMasterKey
```

If you want to install from source, run this from the repo root:

```bash
pip install -e .
```
## Usage

1. On your server, set the key for each cloud LLM/GenAI endpoint you want to use. For example, if you want to use OpenAI, set the `OPENAI_API_KEY` environment variable:

   ```bash
   export OPENAI_API_KEY=sk-xxx     # OpenAI
   export CO_API_KEY=co-xxx         # Cohere
   export HF_TOKEN=hf-xxx           # HuggingFace
   export ANYSCALE_API_KEY=as-xxx   # AnyScale
   export ANTHROPIC_API_KEY=an-xxx  # Anthropic
   export VECTOR_AI_API_KEY=va-xxx  # Vectara
   ```

2. Start your LlaMasterKey server:

   ```bash
   lmk
   ```

   The server reads the keys from the OS environment variables and starts at `http://localhost:8000` (8000 because it is the default port of FastAPI).

3. On each computer that needs to connect to a cloud LLM, e.g., your intern's laptop, source the `generated-keys.env` file generated by LlaMasterKey:

   ```bash
   source generated-keys.env
   ```

4. Make requests to the cloud LLM/GenAI endpoints as usual. For example, `test_chatgpt.py` in `tests` is a client request.
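As a sketch of what sourcing `generated-keys.env` does, the snippet below parses `export NAME=value` lines into the process environment. The file contents here are hypothetical stand-ins; the real values are produced by the LlaMasterKey server.

```python
import os

# Hypothetical contents of generated-keys.env; the real file is
# generated by the LlaMasterKey server with its own values.
env_text = """\
export OPENAI_BASE_URL=http://localhost:8000/openai
export OPENAI_API_KEY=lmk-master-key
"""

def load_env(text):
    """Parse `export NAME=value` lines into a dict, like `source` would."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("export "):
            name, _, value = line[len("export "):].partition("=")
            env[name] = value
    return env

env = load_env(env_text)
os.environ.update(env)  # now the OpenAI client libraries would pick these up
```

After this, any client that reads `OPENAI_BASE_URL` and `OPENAI_API_KEY` from the environment talks to the proxy instead of OpenAI directly.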
## How it works under the hood

LlaMasterKey generates an env file that overrides each provider's token and endpoint URL; e.g., for OpenAI it overrides `OPENAI_BASE_URL` and `OPENAI_API_KEY`. Client requests are therefore sent to the LlaMasterKey server, which processes each request and forwards it to the corresponding real endpoint based on the token.
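The dispatch idea can be sketched as follows. The routes, master key, and function names here are illustrative assumptions, not the actual LlaMasterKey implementation: the proxy checks the presented master key, then rewrites the URL and swaps in the real provider key before forwarding.

```python
import os

# Hypothetical dispatch table: path prefix -> (real endpoint, env var with real key).
ROUTES = {
    "/openai": ("https://api.openai.com", "OPENAI_API_KEY"),
    "/cohere": ("https://api.cohere.ai", "CO_API_KEY"),
}

MASTER_KEY = "lmk-master-key"  # assumed master key, for illustration only

def dispatch(path, presented_key):
    """Validate the master key, then return the real URL and key to forward with."""
    if presented_key != MASTER_KEY:
        raise PermissionError("invalid master key")
    for prefix, (base_url, key_env) in ROUTES.items():
        if path.startswith(prefix):
            return base_url + path[len(prefix):], os.environ.get(key_env, "")
    raise KeyError(f"no route for {path}")

os.environ["OPENAI_API_KEY"] = "sk-real"  # stand-in for the server's configured key
url, key = dispatch("/openai/v1/chat/completions", MASTER_KEY)
```

With this shape, revoking one master key cuts off a client's access to every provider at once, while the real keys never leave the server.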
### For HuggingFace

If you go through `huggingface_hub.InferenceClient()`, everything works out of the box. But if you call the API via `requests`, like:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/t5-small"
headers = {"Authorization": "Bearer **********"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Меня зовут Вольфганг и я живу в Берлине",  # "My name is Wolfgang and I live in Berlin"
})
```

then you need to change `API_URL` to `os.environ["HF_INFERENCE_ENDPOINT"] + "/models/t5-small"` and change the `Authorization` header to use `os.environ["HF_TOKEN"]`.
For example, if you want to use the `t5-small` model, you can do:

```python
import os
import requests

API_URL = f"{os.environ['HF_INFERENCE_ENDPOINT']}/models/t5-small"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Меня зовут Вольфганг и я живу в Берлине",  # "My name is Wolfgang and I live in Berlin"
})
```
## License
Ah, this is important. Let's say MIT for now?