One master key for all LLM/GenAI endpoints

Project description

LlaMaKey: one master key for accessing all cloud LLM/GenAI APIs

Introducing LlaMa(ster)Key, the simplified and secure way to manage API keys and control access to various cloud LLM/GenAI APIs for multiple users. LlaMaKey lets a user access multiple cloud AI APIs through a single, user-unique master key instead of a separate API key for each platform. As a proxy, it eases key management for both users and administrators by consolidating the keys distributed to each user into just one, while protecting the actual API keys by hiding them from users. Major cloud AI APIs (OpenAI, Cohere, AnyScale, etc.) can be called seamlessly through their official Python SDKs without any code changes. In addition, administrators can control individual users in detail through rate throttling, API/endpoint whitelisting, budget capping, etc. LlaMaKey is open source under the MIT license and is ready for private, on-premises deployment.

graph TD
   subgraph Your team
     A[User A] -- Local pass A --> L[LlaMasterKey server]
     B[User B] -- Local pass B --> L["LlaMasterKey server<br> (rate throttling, API/endpoint whitelisting, <br> logging, budgeting, etc.)"]
   end 
    L -- Actual OPENAI_API_KEY--> O[OpenAI API server]
    L -- Actual COHERE_API_KEY--> C[Cohere API server]
    L -- Actual VECTARA_API_KEY--> V[Vectara API server]

The pain and the solution

How do you manage API keys in a team that needs to access an array of cloud LLM/GenAI APIs? If you issue one key per user per API, you have too many keys to manage. But if you share one key per API across the team, it is too risky. What if a careless intern accidentally pushes it to a public GitHub repo?

This is where LlaMaKey comes into play. It is a proxy between your users and the actual cloud AI APIs. To authenticate, only one key is needed between a team member's code and your LlaMaKey server. If any user becomes a problem, just revoke that one key to cut off their access without interrupting anyone else.

A user does not need to change a single line of code to use LlaMaKey. LlaMaKey takes advantage of a feature of the official Python SDKs of most cloud LLM/GenAI APIs: each of them reads its base URL from an environment variable:

  • OPENAI_BASE_URL for OpenAI
  • CO_API_URL for Cohere
  • ANYSCALE_BASE_URL for AnyScale

So the user only needs to point the respective BASE_URL variable to the LlaMaKey server. The request is then first made to the LlaMaKey server, which forwards it to the real cloud LLM/GenAI endpoint.
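As a rough sketch of what this redirection looks like from the client side (assuming a LlaMaKey server at the default http://localhost:8000, the /openai route shown in the Usage section below, and the openai Python SDK v1.x, which reads these variables when the client is created):

import os

# Point the official OpenAI SDK at the LlaMaKey server instead of api.openai.com.
# These are the same two variables the LlaMaKey server generates for you (see Usage).
os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/openai"
os.environ["OPENAI_API_KEY"] = "LlaMaKey"  # the placeholder master key, not a real OpenAI key

from openai import OpenAI

# OpenAI() picks up OPENAI_BASE_URL and OPENAI_API_KEY from the environment,
# so this request goes to the LlaMaKey server, which forwards it to OpenAI
# using the actual OPENAI_API_KEY kept only on the server.
client = OpenAI()
print(client.models.list())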

Roadmap

  1. Currently, authentication with the LlaMaKey server is not enabled. If you want us to support it, please open an issue on GitHub. We will take that as a signal of demand and prioritize it accordingly.

  2. Supported APIs:

    • OpenAI (all endpoints)
    • Cohere (all endpoints)
    • AnyScale
    • HuggingFace Inference API (free tier)
    • HuggingFace EndPoint API
    • Anthropic
    • Google Vertex AI
    • Vectara AI

Installation

Stable version:

pip install LlaMasterKey

Nightly version:

You can manually install the nightly version at:

https://github.com/TexteaInc/LlaMasterKey/releases/tag/nightly

Build from source

Requirements: Git and the Rust toolchain (Cargo).

git clone git@github.com:TexteaInc/LlaMasterKey.git
# you can switch to a different branch:
# git switch dev
cargo build --release
# available at ./target/release/lmk

Usage

On the server end, set the actual API keys as environment variables per their respective APIs, and start your LlaMaKey server, for example:

export OPENAI_API_KEY=sk-xxx # an actual openai key

lmk # start the server

The server will read the keys of supported LLM/GenAI APIs from the OS environment variables and start a server at http://localhost:8000 (the default port). It will generate the shell commands that set the corresponding environment variables on your client end, like this:

export OPENAI_BASE_URL="http://127.0.0.1:8000/openai" # direct OpenAI calls to the LlaMaKey server
export OPENAI_API_KEY="LlaMaKey" # a placeholder master key

For your convenience, the commands are also dumped to the file ./llamakey_local.env.

On the client end, activate the environment variables generated above before running your code. You can copy and paste the commands above or simply source the llamakey_local.env file generated in the previous step, for example:

# Step 1: activate the environment variables that direct the API calls to the LlaMaKey server
source llamakey_local.env # this is only one of many ways to do it

# Step 2: call OpenAI as usual using its official Python SDK
python3 -c '
from openai import OpenAI

client = OpenAI()
print(
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What is FastAPI?"}],
    )
)'
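
If you want to double-check that the redirection took effect before making any billable calls, you can inspect the client object (an optional sanity check, assuming the openai Python SDK v1.x; this is not a LlaMaKey feature):

python3 -c '
from openai import OpenAI

client = OpenAI()
# Expect the LlaMaKey address (http://127.0.0.1:8000/openai) if llamakey_local.env was sourced
print(client.base_url)
# Expect the placeholder master key "LlaMaKey", not a real OpenAI key
print(client.api_key)
'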

License

LlaMaKey is open source under the MIT license.

Contact

For usage questions, bug reports, or feature requests, please open an issue on GitHub. For private inquiries, please email hello@funix.io.

Download files

Download the file for your platform.

Source Distribution

LlaMasterKey-0.1.1.tar.gz (22.6 kB)

Uploaded: Source

Built Distributions

LlaMasterKey-0.1.1-py3-none-win_amd64.whl (2.8 MB)

Uploaded: Python 3, Windows x86-64

LlaMasterKey-0.1.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB)

Uploaded: Python 3, manylinux: glibc 2.17+ x86-64

LlaMasterKey-0.1.1-py3-none-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (6.6 MB)

Uploaded: Python 3, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64
