LlamaIndex Llms Integration: Baseten
This integration allows you to use Baseten's hosted models with LlamaIndex.
Installation
Install the required packages:
pip install llama-index-llms-baseten
pip install llama-index
Model APIs vs. Dedicated Deployments
Baseten offers two main ways to run inference:

- Model APIs are public endpoints for popular open-source models (GPT-OSS, Kimi K2, DeepSeek, etc.). You call a frontier model directly via its slug, e.g. deepseek-ai/DeepSeek-V3-0324, and are billed per token. The list of supported models is here: https://docs.baseten.co/development/model-apis/overview#supported-models
- Dedicated deployments are useful for serving custom models where you want to autoscale production workloads and need fine-grained configuration. You deploy a model from your Baseten dashboard and provide its 8-character model ID, e.g. abcd1234.

By default, the model_apis parameter is set to True. To use a dedicated deployment, set model_apis=False when instantiating the Baseten object.
Usage
Basic Usage
To use Baseten models with LlamaIndex, first initialize the LLM:
from llama_index.llms.baseten import Baseten

# Model APIs: find the model slug here:
# https://docs.baseten.co/development/model-apis/overview#supported-models
llm = Baseten(
    model_id="MODEL_SLUG",
    api_key="YOUR_API_KEY",
    model_apis=True,  # Default, so not strictly necessary
)
# Dedicated deployments: find the model ID in the Baseten dashboard:
# https://app.baseten.co/overview
llm = Baseten(
    model_id="MODEL_ID",
    api_key="YOUR_API_KEY",
    model_apis=False,
)
Basic Completion
Generate a simple completion:
response = llm.complete("Paul Graham is")
print(response.text)
Chat Messages
Use chat-style interactions:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
response = llm.chat(messages)
print(response)
Streaming
Stream completions in real-time:
# Streaming completion
response = llm.stream_complete("Paul Graham is")
for r in response:
    print(r.delta, end="")

# Streaming chat
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
response = llm.stream_chat(messages)
for r in response:
    print(r.delta, end="")
Async Operations
Baseten supports async operations for long-running inference tasks. This is useful for:
- Tasks that may hit request timeouts
- Batch inference jobs
- Prioritizing certain requests
The async implementation uses webhooks to deliver results.
Note: Async is only available for dedicated deployments, not for Model APIs. achat is not supported, because multi-turn chat does not map naturally onto webhook-delivered async results.
async_llm = Baseten(
    model_id="your_model_id",
    api_key="your_api_key",
    webhook_endpoint="your_webhook_endpoint",
)
response = await async_llm.acomplete("Paul Graham is")
print(response)
To check the status of an async request:
import requests

model_id = "your_model_id"
request_id = "your_request_id"
api_key = "your_api_key"

resp = requests.get(
    f"https://model-{model_id}.api.baseten.co/async_request/{request_id}",
    headers={"Authorization": f"Api-Key {api_key}"},
)
print(resp.json())
For async operations, results are posted to your provided webhook endpoint. Your endpoint should validate the webhook signature and handle the results appropriately. The results are NOT stored by Baseten.
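As a minimal sketch of that validation step: the code below assumes an HMAC-SHA256 hex signature computed over the raw request body with a shared secret. The secret name, signature format, and signing scheme here are illustrative assumptions, not Baseten's documented format; consult Baseten's webhook documentation for the actual verification procedure.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical shared secret -- in practice, load this from your
# environment or secrets manager, never hard-code it.
WEBHOOK_SECRET = b"your_webhook_secret"


def verify_signature(raw_body: bytes, signature: str) -> bool:
    """Constant-time comparison of an HMAC-SHA256 hex digest over the raw body."""
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def handle_webhook(raw_body: bytes, signature: str) -> Optional[dict]:
    """Reject unverified payloads; otherwise parse the async result.

    Since Baseten does not store async results, your handler is
    responsible for persisting them (e.g. to a database or queue).
    """
    if not verify_signature(raw_body, signature):
        return None  # drop payloads that fail verification
    result = json.loads(raw_body)
    # persist `result` here before returning
    return result
```

You would wire `handle_webhook` into whatever web framework serves your webhook_endpoint, passing it the raw request body and the signature header from the incoming request.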
Additional Resources
For more examples and detailed usage, check out the Baseten Cookbook.
File details
Details for the file llama_index_llms_baseten-0.1.1.tar.gz.
File metadata
- Download URL: llama_index_llms_baseten-0.1.1.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 026874acd36041da31e95dab2cd27ad21385d4919a1384b8833cc2faa771b9db |
| MD5 | f785eb61a82db1ce2add5fd89f4b72e7 |
| BLAKE2b-256 | 29d2fa484f3845d726f0e6d03396f361282d41270cd8ea518d581ce3d3f93f48 |
File details
Details for the file llama_index_llms_baseten-0.1.1-py3-none-any.whl.
File metadata
- Download URL: llama_index_llms_baseten-0.1.1-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd02bcb4469a2ca22ad0d8adc3389629bc749592b5d630c20148c7127c86e498 |
| MD5 | 2d20b60cfac529478b060dc248eba4f6 |
| BLAKE2b-256 | f498b242965a1717a1e7e44d33798add6bb32b28f8372dbb4ca25978e605d05a |