Library to easily interface with LLM API providers
Project description
🚅 LiteLLM
Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]
OpenAI-Compatible Server
LiteLLM manages:
- Translating inputs to the provider's completion and embedding endpoints
- Consistent output - text responses will always be available at ['choices'][0]['message']['content']
- Exception mapping - common exceptions across providers are mapped to the OpenAI exception types (see the sketch after this list)
- Load balancing across multiple deployments (e.g. Azure/OpenAI) - the Router can handle 1k+ requests/second
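Because provider errors are mapped to OpenAI exception types, a single except clause can cover every provider. A minimal sketch - the invalid Cohere key is only there to trigger an error:
import os
from openai import OpenAIError  # litellm re-raises provider errors as OpenAI exception types
from litellm import completion

os.environ["COHERE_API_KEY"] = "bad-key"  # illustrative invalid key, used to force an auth failure

try:
    completion(model="command-nightly", messages=[{"role": "user", "content": "Hello"}])
except OpenAIError as e:
    # the Cohere auth failure arrives as an OpenAI-style exception
    print(type(e).__name__, e)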
Usage (Docs)
[!IMPORTANT] LiteLLM v1.0.0 now requires
openai>=1.0.0. Migration guide here
pip install litellm
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
Streaming (Docs)
LiteLLM supports streaming the model response back; pass stream=True to get a streaming iterator in the response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for part in response:
print(part.choices[0].delta.content or "")
# claude 2
response = completion('claude-2', messages, stream=True)
for part in response:
print(part.choices[0].delta.content or "")
Router - load balancing (Docs)
LiteLLM allows you to load balance between multiple deployments (Azure, OpenAI). It picks the deployment that is below its rate limit and has used the fewest tokens.
import os
from litellm import Router
model_list = [{ # list of model deployments
"model_name": "gpt-3.5-turbo", # model alias
"litellm_params": { # params for litellm completion/embedding call
"model": "azure/chatgpt-v-2", # actual model name
"api_key": os.getenv("AZURE_API_KEY"),
"api_version": os.getenv("AZURE_API_VERSION"),
"api_base": os.getenv("AZURE_API_BASE")
}
}, {
"model_name": "gpt-3.5-turbo",
"litellm_params": { # params for litellm completion/embedding call
"model": "azure/chatgpt-functioncalling",
"api_key": os.getenv("AZURE_API_KEY"),
"api_version": os.getenv("AZURE_API_VERSION"),
"api_base": os.getenv("AZURE_API_BASE")
}
}, {
"model_name": "gpt-3.5-turbo",
"litellm_params": { # params for litellm completion/embedding call
"model": "gpt-3.5-turbo",
"api_key": os.getenv("OPENAI_API_KEY"),
}
}]
router = Router(model_list=model_list)
# openai.ChatCompletion.create replacement
response = router.completion(model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response)
OpenAI Proxy (Docs)
LiteLLM Proxy manages:
- Calling 100+ LLMs (Huggingface, Bedrock, TogetherAI, etc.) in the OpenAI ChatCompletions & Completions format
- Load balancing - between multiple models and deployments of the same model; the LiteLLM proxy can handle 1k+ requests/second during load tests
- Authentication & spend tracking via virtual keys
Step 1: Start litellm proxy
$ litellm --model huggingface/bigcode/starcoder
#INFO: Proxy running on http://0.0.0.0:8000
Step 2: Replace openai base
import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")  # point base_url at the proxy
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
{
"role": "user",
"content": "this is a test request, write a short poem"
}
])
print(response)
Logging & Observability (Docs)
LiteLLM exposes pre-defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, and Slack.
import os
import litellm
from litellm import completion
## set env variables for logging tools
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"
os.environ["OPENAI_API_KEY"]
# set callbacks
litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase
#openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
Supported Providers (Docs)
| Provider | Completion | Streaming | Async Completion | Async Streaming |
|---|---|---|---|---|
| openai | ✅ | ✅ | ✅ | ✅ |
| azure | ✅ | ✅ | ✅ | ✅ |
| aws - sagemaker | ✅ | ✅ | ✅ | ✅ |
| aws - bedrock | ✅ | ✅ | ✅ | ✅ |
| cohere | ✅ | ✅ | ✅ | ✅ |
| anthropic | ✅ | ✅ | ✅ | ✅ |
| huggingface | ✅ | ✅ | ✅ | ✅ |
| replicate | ✅ | ✅ | ✅ | ✅ |
| together_ai | ✅ | ✅ | ✅ | ✅ |
| openrouter | ✅ | ✅ | ✅ | ✅ |
| google - vertex_ai | ✅ | ✅ | ✅ | ✅ |
| google - palm | ✅ | ✅ | ✅ | ✅ |
| ai21 | ✅ | ✅ | ✅ | ✅ |
| baseten | ✅ | ✅ | ✅ | ✅ |
| vllm | ✅ | ✅ | ✅ | ✅ |
| nlp_cloud | ✅ | ✅ | ✅ | ✅ |
| aleph alpha | ✅ | ✅ | ✅ | ✅ |
| petals | ✅ | ✅ | ✅ | ✅ |
| ollama | ✅ | ✅ | ✅ | ✅ |
| deepinfra | ✅ | ✅ | ✅ | ✅ |
| perplexity-ai | ✅ | ✅ | ✅ | ✅ |
| anyscale | ✅ | ✅ | ✅ | ✅ |
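The Async Completion and Async Streaming columns refer to litellm's async API. A minimal sketch using acompletion - the model choice and prompt are illustrative:
import asyncio
from litellm import acompletion

async def main():
    # async, non-blocking equivalent of completion()
    response = await acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
    )
    # text is available at the same path as in the sync API
    print(response['choices'][0]['message']['content'])

asyncio.run(main())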
Contributing
To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.
Here's how to modify the repo locally:
Step 1: Clone the repo
git clone https://github.com/BerriAI/litellm.git
Step 2: Navigate into the project, and install dependencies:
cd litellm
poetry install
Step 3: Test your change:
cd litellm/tests # pwd: Documents/litellm/litellm/tests
pytest .
Step 4: Submit a PR with your changes! 🎉
- push your fork to your GitHub repo
- submit a PR from there
Support / talk with founders
- Schedule Demo 👋
- Community Discord 💭
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
Why did we build this
- Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.
Contributors
Project details
Release history
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file litellm-1.9.dev0.tar.gz.
File metadata
- Download URL: litellm-1.9.dev0.tar.gz
- Upload date:
- Size: 1.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ee17c28aba41023c727cd4165c16fd892e1a483a7b080262010cd5c5e65e037 |
| MD5 | da0fc1ea691867261ca6701604bca597 |
| BLAKE2b-256 | ad6c162b83e9cf68f5a94c30153e835414c3ee11c8602ffaacc9a032851dfa26 |
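To verify a downloaded archive against the SHA256 digest above, a minimal sketch (assumes the file is in the current directory):
import hashlib

expected = "7ee17c28aba41023c727cd4165c16fd892e1a483a7b080262010cd5c5e65e037"  # SHA256 from the table above
with open("litellm-1.9.dev0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("OK" if digest == expected else "hash mismatch")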
File details
Details for the file litellm-1.9.dev0-py3-none-any.whl.
File metadata
- Download URL: litellm-1.9.dev0-py3-none-any.whl
- Upload date:
- Size: 1.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8fee9ea108f77d9d1a9bc5a6656b09dbe98647b50546240042f487eaed7e9b40 |
| MD5 | 7306d5546be12bce58e640ed9055d3f0 |
| BLAKE2b-256 | 222948ca636eccb49d9375584df6b648b9c3642cfda963071335a10546b5e9ea |