
LLaMasterKey: One master key for all LLM/GenAI endpoints

A big pain in the era of LLMs is that you need a separate API token for each service: OpenAI, Cohere, Google Vertex AI, Anthropic, AnyScale, HuggingFace, etc.

If an intern in your startup accidentally pushes code containing the API keys to GitHub, you would have to revoke each of the API tokens assigned to him. Even worse, you may have already forgotten which tokens were given to him. So what do you do? Revoke all keys and suffer a service interruption?

This is where LlaMasterKey (pronounced "La Master Key", i.e., "Llama" + "Master" + "Key", where "La" is French for "the") comes into play. It serves as a proxy that dispatches requests to the real cloud LLM/GenAI endpoints and returns their responses to your team/customers. To authenticate, only one master key is needed between a team member or customer and your LlaMasterKey server. If any of them makes you unhappy, you only need to revoke that one key to cut off his/her access to all cloud LLM/GenAI endpoints. The actual keys stay hidden from your team members and customers.
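The dispatch idea can be sketched as follows. This is a minimal illustration only, not the actual LlaMasterKey code: the route prefixes, upstream URLs, and the master-key value are assumptions.

```python
# Hypothetical sketch of the proxy's dispatch logic: check the one master
# key, then pick the real upstream URL and real key based on the route.
import os

MASTER_KEY = "lmk-master-key"  # the single key you hand to team members

# Map a route prefix to (upstream base URL, env var holding the real key).
UPSTREAMS = {
    "/openai": ("https://api.openai.com/v1", "OPENAI_API_KEY"),
    "/cohere": ("https://api.cohere.ai/v1", "CO_API_KEY"),
    "/huggingface": ("https://api-inference.huggingface.co", "HF_TOKEN"),
}

def dispatch(path, auth_header):
    """Return (upstream_url, real_key) for a proxied request."""
    if auth_header != f"Bearer {MASTER_KEY}":
        raise PermissionError("bad master key")  # one revocation cuts all access
    for prefix, (base, key_var) in UPSTREAMS.items():
        if path.startswith(prefix):
            return base + path[len(prefix):], os.environ.get(key_var, "")
    raise LookupError(f"no upstream for {path}")
```

Revoking someone's access to every cloud endpoint then amounts to rotating the single master key.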

Roadmap

  1. Currently no master key is enforced. We will add authentication.
  2. More cloud LLM/GenAI endpoints will be supported. This is the status:
    • OpenAI/chat/completion
    • Cohere/chat
    • AnyScale
    • HuggingFace Inference API
    • Anthropic
    • Google Vertex AI
    • Vectara AI

Installation

pip install LLaMasterKey

If you want to install from a local checkout of the source, you can do:

pip install -e .

Usage

  1. On your server, set up the keys for each cloud LLM/GenAI endpoint you want to use. For example, if you want to use OpenAI, set the OS environment variable OPENAI_API_KEY.

    export OPENAI_API_KEY=sk-xxx #openai
    export CO_API_KEY=co-xxx # cohere
    export HF_TOKEN=hf-xxx # huggingface
    export ANYSCALE_API_KEY=as-xxx # anyscale
    export ANTHROPIC_API_KEY=an-xxx # anthropic
    export VECTOR_AI_API_KEY=va-xxx # vectara
    
  2. Start your LlaMasterKey server

    lmk
    

    The server will read the keys set in the OS environment variables and start a server at http://localhost:8000 (8000 is the default port of FastAPI's development server).

  3. On each computer that needs to connect to a cloud LLM, e.g., the laptop of your intern, source the generated-keys.env file produced by the LlaMasterKey server.

    source generated-keys.env
    
  4. Make requests to the cloud LLM/GenAI endpoint as usual.

    For example, test_chatgpt.py under tests is a sample client request.

How it works under the hood

We generate an env file that overrides the token and the endpoint URL; for OpenAI, for example, we override OPENAI_BASE_URL and OPENAI_API_KEY. Requests are then sent to the LlaMasterKey server, which processes them and forwards them to the corresponding cloud endpoint based on the token.
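For illustration, a generated-keys.env file might look like this. The values and route prefixes are made up; only the variable names (OPENAI_BASE_URL, OPENAI_API_KEY, HF_INFERENCE_ENDPOINT, HF_TOKEN) appear elsewhere in this document.

```shell
# Hypothetical generated-keys.env: every provider's endpoint is redirected
# to the LlaMasterKey server, and every real key is replaced by the master key.
export OPENAI_BASE_URL="http://localhost:8000/openai"
export OPENAI_API_KEY="lmk-master-key"
export HF_INFERENCE_ENDPOINT="http://localhost:8000/huggingface"
export HF_TOKEN="lmk-master-key"
```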

For HuggingFace

If you work through huggingface_hub.InferenceClient(), things work out of the box. But if you make raw HTTP requests like:

import requests

API_URL = "https://api-inference.huggingface.co/models/t5-small"
headers = {"Authorization": "Bearer **********"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
   "inputs": "Меня зовут Вольфганг и я живу в Берлине",
})

You need to change API_URL to os.environ["HF_INFERENCE_ENDPOINT"] + "/models/t5-small", and build the Authorization header from os.environ["HF_TOKEN"].

For example, if you want to use the t5-small model, you can do:

import os
import requests

API_URL = f"{os.environ['HF_INFERENCE_ENDPOINT']}/models/t5-small"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
   "inputs": "Меня зовут Вольфганг и я живу в Берлине",
})

License

Ah, this is important. Let's say MIT for now?
