
LlaMaKey: one master key for accessing all cloud LLM/GenAI APIs

Introducing LlaMa(ster)Key, a simple and secure way to manage API keys and control access to various cloud LLM/GenAI APIs for multiple users. LlaMaKey lets a user access multiple cloud AI APIs through a single master key unique to that user, instead of a separate API key for each platform. As a proxy, it eases key management for both users and administrators by consolidating the keys distributed to a user into just one, while protecting the actual API keys by hiding them from users. Major cloud AI APIs (OpenAI, Cohere, AnyScale, etc.) can be called seamlessly through their official Python SDKs without any code changes. In addition, administrators can control individual users in detail through rate throttling, API/endpoint whitelisting, budget capping, etc. LlaMaKey is open source under the MIT license and is ready for private, on-premises deployment.

graph TD
   subgraph Your team
   A[User 1] -- Master Key 1 --> L["LlaMasterKey server<br> (rate throttling, API/endpoint whitelisting, <br> logging, budgeting, etc.)"]
     B[User 2] -- Master Key 2 --> L
     C[User 100] -- Master Key 100 --> L
   end 
    L -- Actual OPENAI_API_KEY --> O[OpenAI API server]
    L -- Actual CO_API_KEY --> P[Cohere API server]
    L -- Actual VECTARA_API_KEY --> Q[Vectara API server]
  • The pain point: How do you manage API keys for a team that needs to access an array of cloud LLM/GenAI APIs? If you issue one key per user per API, you have M×N keys to manage, where M is the number of APIs and N is the number of users. If you instead share a key among users, it is prone to risk and hassle: what if a careless intern accidentally pushes it to a public GitHub repo? Revoking the key will interrupt the work of every other user.
  • The solution: This is where LlaMaKey comes into play. It is a proxy between your users and the actual cloud AI APIs. Each user gets one master key, unique to them, to authenticate with your LlaMaKey server, which connects them to many cloud AI APIs. Cutting one user loose will not interrupt the others. You can even control which APIs each user or user group is allowed to access, their rate limits, budget caps, etc.

Supported APIs:

  • OpenAI (all endpoints)
  • Cohere (all endpoints)
  • AnyScale
  • HuggingFace Inference API (free tier)
  • HuggingFace EndPoint API
  • Anthropic
  • Perplexity
  • Google Vertex AI
  • Vectara AI

Currently, authentication with the LlaMaKey server is not enabled; all users share the master key LlaMaKey. If you want to see this feature, please upvote here.

How LlaMaKey works

As a proxy, LlaMaKey takes advantage of a feature in the Python SDKs of most cloud LLM/GenAI APIs: they allow setting the base URL to which a request is sent and the API key/token with which it is authenticated (e.g., OpenAI's and Cohere's). Both can be set easily via environment variables, so a client just needs to set those variables (or configure them manually in code) and then call the APIs as usual. LlaMaKey receives the request, authenticates the user (if authentication is enabled), and forwards the request to the corresponding actual cloud API with an actual API key (set by the administrator when starting the LlaMaKey server). Once the LlaMaKey server hears back from the cloud API, the response is passed back to the client.
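
For the in-code configuration mentioned above, here is a minimal sketch with the official OpenAI Python SDK (v1+), assuming a LlaMaKey server running locally at the default address used later on this page:

# A minimal sketch: point the official OpenAI Python SDK (v1+) at a local
# LlaMaKey server in code instead of via environment variables.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/openai",  # the LlaMaKey server, not api.openai.com
    api_key="LlaMaKey",                       # the shared master key (see the note above)
)
# From here on, requests go through LlaMaKey, which swaps in the actual OPENAI_API_KEY.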

Installation

Build from source

Requirements: git and the Rust toolchain.

git clone git@github.com:TexteaInc/LlaMasterKey.git 
# you can switch to a different branch:
# git switch dev
cargo build --release
# binary at ./target/release/lmk

# run it without installation
cargo run
# you can also install it system-wide
cargo install --path .

# run it
lmk

Usage

The server end

Set up the actual API keys as environment variables per their respective APIs, and then start the server, for example:

# Set the actual API keys as environment variables 
export OPENAI_API_KEY=sk-xxx # openai
export CO_API_KEY=co-xxx # cohere
export HF_TOKEN=hf-xxx # huggingface
export ANYSCALE_API_KEY=credential-xxx # anyscale

lmk # start the server

By default, the server is started at http://localhost:8000.

It will generate the shell commands that set the needed environment variables on your client end, like this:

export OPENAI_BASE_URL="http://127.0.0.1:8000/openai" # direct OpenAI calls to the LlaMaKey server
export CO_API_URL="http://127.0.0.1:8000/cohere"
export ANYSCALE_BASE_URL="http://127.0.0.1:8000/anyscale"
export HF_INFERENCE_ENDPOINT="http://127.0.0.1:8000/huggingface"

export OPENAI_API_KEY="LlaMaKey" # One master key for all APIs
export CO_API_KEY="LlaMaKey"
export ANYSCALE_API_KEY="LlaMaKey"
export HF_TOKEN="LlaMaKey"

These environment variables direct the API calls to the LlaMaKey server. For your convenience, the commands are also dumped to the file ./llamakey_local.env.

The client end

Just activate the environment variables generated above and then run your code as usual! You may copy and paste the commands, or simply source the generated llamakey_local.env file, for example:

# Step 1: activate the environment variables that tell official SDKs to make requests to the LlaMaKey server
source llamakey_local.env 

# Step 2: call official Python SDKs as usual, for example, for OpenAI: 
python3 -c '
from openai import OpenAI
client = OpenAI()
print(
  client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is FastAPI?"}]
  )
)'
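
Under the hood, the SDK simply sends an ordinary HTTP request to the LlaMaKey server instead of api.openai.com. Here is a rough sketch of the equivalent raw request using the third-party requests library; the exact path /openai/v1/chat/completions is an assumption about how LlaMaKey forwards the standard OpenAI route under its /openai prefix:

# A sketch of what the SDK does under the hood: an ordinary HTTP POST to the
# LlaMaKey server rather than to api.openai.com.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/openai/v1/chat/completions",  # assumed forwarding path
    headers={"Authorization": "Bearer LlaMaKey"},        # the master key, not a real OpenAI key
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "What is FastAPI?"}],
    },
)
print(resp.json())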

License

MIT, as stated in the project description above.

Contact

For usage questions, bugs, or feature requests, please open an issue on GitHub. For private inquiries, please email bao@textea.co.
