LlamaIndex Llms Integration: Baseten

This integration allows you to use Baseten's hosted models with LlamaIndex.

Installation

Install the required packages:

pip install llama-index-llms-baseten
pip install llama-index

Model APIs vs. Dedicated Deployments

Baseten offers two main ways to run inference:

  1. Model APIs are public endpoints for popular open-source models (GPT-OSS, Kimi K2, DeepSeek, etc.). You call a frontier model directly by its slug, e.g. deepseek-ai/DeepSeek-V3-0324, and are charged per token. Supported models are listed at https://docs.baseten.co/development/model-apis/overview#supported-models.

  2. Dedicated deployments serve custom models when you want to autoscale production workloads and need fine-grained configuration. Deploy the model from your Baseten dashboard, then provide its 8-character model ID, e.g. abcd1234.

The model_apis parameter defaults to True. To use a dedicated deployment, you must set model_apis to False when instantiating the Baseten object.
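The choice between the two modes can be captured in a small helper. This is only a sketch: `baseten_kwargs` is a hypothetical function, not part of the integration, and the rule that a dedicated model ID is exactly 8 alphanumeric characters is an assumption drawn from the example ID above.

```python
import re


def baseten_kwargs(identifier: str, api_key: str) -> dict:
    """Build keyword arguments for Baseten(...) from a slug or a model ID.

    Hypothetical helper. Assumes a dedicated model ID is exactly 8
    alphanumeric characters (like "abcd1234"); anything else is treated
    as a Model API slug.
    """
    is_dedicated = re.fullmatch(r"[A-Za-z0-9]{8}", identifier) is not None
    return {
        "model_id": identifier,
        "api_key": api_key,
        # Model APIs are the default; dedicated deployments need False.
        "model_apis": not is_dedicated,
    }
```

The result can be unpacked into the constructor, e.g. `Baseten(**baseten_kwargs("abcd1234", "YOUR_API_KEY"))`. If a Model API slug ever happens to be 8 alphanumeric characters, this heuristic would misclassify it, so passing model_apis explicitly remains the safer option.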

Usage

Basic Usage

To use Baseten models with LlamaIndex, first initialize the LLM:

from llama_index.llms.baseten import Baseten

# Model APIs: find the model slug at https://docs.baseten.co/development/model-apis/overview#supported-models
llm = Baseten(
    model_id="MODEL_SLUG",
    api_key="YOUR_API_KEY",
    model_apis=True,  # Default, so not strictly necessary
)

# Dedicated deployments: find the model_id in the Baseten dashboard at https://app.baseten.co/overview
llm = Baseten(
    model_id="MODEL_ID",
    api_key="YOUR_API_KEY",
    model_apis=False,
)

Basic Completion

Generate a simple completion:

response = llm.complete("Paul Graham is")
print(response.text)

Chat Messages

Use chat-style interactions:

from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
response = llm.chat(messages)
print(response)

Streaming

Stream completions in real-time:

# Streaming completion
response = llm.stream_complete("Paul Graham is")
for r in response:
    print(r.delta, end="")

# Streaming chat
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
response = llm.stream_chat(messages)
for r in response:
    print(r.delta, end="")
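When the full text is needed after streaming finishes, the deltas can be accumulated while they are printed. The sketch below assumes only that each streamed chunk exposes a `delta` attribute, as in the loops above; `collect_stream` is a hypothetical helper name, and it guards against a chunk whose delta is None.

```python
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Consume a stream of chunks and return the concatenated deltas.

    Hypothetical helper: works with any iterable whose items have a
    `delta` attribute. A missing or None delta is treated as "".
    """
    parts = []
    for chunk in chunks:
        delta = getattr(chunk, "delta", None) or ""
        if echo:
            # Print each piece as it arrives, without a trailing newline.
            print(delta, end="", flush=True)
        parts.append(delta)
    return "".join(parts)
```

For example, `full_text = collect_stream(llm.stream_complete("Paul Graham is"))` would print the tokens as they arrive and keep the complete string for later use.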

Async Operations

Baseten supports async operations for long-running inference tasks. This is useful for:

  • Tasks that may hit request timeouts
  • Batch inference jobs
  • Prioritizing certain requests

The async implementation uses webhooks to deliver results.

Note: Async is available only for dedicated deployments, not for Model APIs. achat is not supported because chat does not make sense for async operations.

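async_llm = Baseten(
    model_id="your_model_id",
    api_key="your_api_key",
    webhook_endpoint="your_webhook_endpoint",
    model_apis=False,  # async requires a dedicated deployment
)
# Run this inside an async function, or in a notebook that supports top-level await
response = await async_llm.acomplete("Paul Graham is")
print(response)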

To check the status of an async request:

import requests

model_id = "your_model_id"
request_id = "your_request_id"
api_key = "your_api_key"

resp = requests.get(
    f"https://model-{model_id}.api.baseten.co/async_request/{request_id}",
    headers={"Authorization": f"Api-Key {api_key}"},
)
print(resp.json())
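If you would rather wait for a result than check once, the status call can be wrapped in a polling loop. This is a minimal sketch under stated assumptions: `poll_until_done` is a hypothetical helper, the response JSON is assumed to carry a `status` field, and the set of terminal status values is left to the caller because it is not specified here.

```python
import time
from typing import Callable, Collection


def poll_until_done(
    fetch_status: Callable[[], dict],
    terminal_statuses: Collection[str],
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Call fetch_status() until its "status" is terminal or attempts run out.

    Hypothetical helper. Assumes the payload is a dict with a "status" key;
    the caller supplies which status values count as finished.
    """
    for attempt in range(max_attempts):
        payload = fetch_status()
        if payload.get("status") in terminal_statuses:
            return payload
        # Sleep between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")
```

Here `fetch_status` could be a small function that performs the `requests.get(...)` call shown above and returns `resp.json()`. Keeping the HTTP call outside the loop makes the helper easy to test with a fake.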

For async operations, results are posted to your provided webhook endpoint. Your endpoint should validate the webhook signature and handle the results appropriately. The results are NOT stored by Baseten.
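Signature validation on the receiving side might look like the sketch below. It assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body keyed with a shared webhook secret; that scheme, and the name `is_valid_signature`, are assumptions for illustration, so check Baseten's webhook documentation for the actual header name and signature format.

```python
import hashlib
import hmac


def is_valid_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex matches HMAC-SHA256(secret, body).

    Hypothetical helper. Assumes a hex-encoded HMAC-SHA256 over the raw
    request body; the real signing scheme may differ.
    """
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```

The check should run against the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the signature.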

Additional Resources

For more examples and detailed usage, check out the Baseten Cookbook.
