Interact with the Databricks Foundation Model API from Python
Databricks Generative AI Inference SDK (Beta)
The Databricks Generative AI Inference Python library provides a user-friendly Python interface to the Databricks Foundation Model API.
[!NOTE] This SDK was primarily designed for pay-per-token endpoints (databricks-*). It has a list of known model names (e.g. dbrx-instruct) and automatically maps them to the corresponding shared endpoint (databricks-dbrx-instruct). You can use this with provisioned throughput endpoints, as long as they do not match known model names. If there is an overlap, you can use the DATABRICKS_MODEL_URL_ENV URL to provide an endpoint URL directly.
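The name mapping described in the note can be pictured with a small sketch. This is not the SDK's implementation: `KNOWN_MODELS` and `resolve_endpoint` are hypothetical names, the model list is only a sample taken from the examples in this document, and the sketch assumes that `DATABRICKS_MODEL_URL_ENV` is read as an environment variable, which the note does not state explicitly.

```python
import os

# Hypothetical sample of known model names; the SDK's real list is not shown here.
KNOWN_MODELS = {"dbrx-instruct", "bge-large-en", "llama-2-70b-chat"}


def resolve_endpoint(model, env=None):
    """Return an endpoint name or URL for `model` (illustrative only)."""
    env = os.environ if env is None else env
    # Assumption: an explicit URL override takes precedence over any mapping.
    override = env.get("DATABRICKS_MODEL_URL_ENV")
    if override:
        return override
    # Known names map to the shared pay-per-token endpoint.
    if model in KNOWN_MODELS:
        return f"databricks-{model}"
    # Anything else is treated as the name of a custom endpoint.
    return model
```

Under these assumptions, `resolve_endpoint("dbrx-instruct", env={})` yields `databricks-dbrx-instruct`, while an unknown name passes through unchanged.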
This library includes a pre-defined set of API classes (Embedding, Completion, ChatCompletion) with convenience functions to make API requests and to parse the contents of the raw JSON response.
We also offer a high-level ChatSession object for easy management of multi-round chat completions, which is especially useful when building a chatbot.
You can find more usage details in our SDK onboarding doc.
[!IMPORTANT]
We're preparing to release version 1.0 of the Databricks Generative AI Inference Python library.
Installation
pip install databricks-genai-inference
Usage
Embedding
from databricks_genai_inference import Embedding
Text embedding
response = Embedding.create(
    model="bge-large-en",
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')
[!TIP]
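For large-scale workloads, you may want to reuse the HTTP connection to reduce request latency. Code example:
import requests

# Illustrative inputs; replace with your own list of strings.
texts = ["first document", "second document"]

# One Session is shared across requests so the connection is reused.
with requests.Session() as client:
    for i, text in enumerate(texts):
        response = Embedding.create(
            client=client,
            model="bge-large-en",
            input=text
        )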
Text embedding (async)
import httpx

# Top-level `async with` works in notebooks; in a script, wrap this in an async function.
async with httpx.AsyncClient() as client:
    response = await Embedding.acreate(
        client=client,
        model="bge-large-en",
        input="3D ActionSLAM: wearable person tracking in multi-floor environments")
    print(f'embeddings: {response.embeddings[0]}')
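The async variant pays off mainly when several requests are in flight at once. The sketch below shows one way to bound that concurrency with `asyncio.gather` and a semaphore. To stay runnable without an endpoint it uses `fake_embed`, a hypothetical stand-in for an async embedding call; `embed_all` and the `max_in_flight` limit are also illustrative choices, not part of the SDK.

```python
import asyncio


async def fake_embed(text):
    """Stand-in for an async embedding call; returns a dummy one-element vector."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [float(len(text))]


async def embed_all(texts, max_in_flight=4):
    """Embed all texts concurrently, with at most `max_in_flight` calls at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def worker(text):
        async with semaphore:
            return await fake_embed(text)

    # gather returns results in the same order as its arguments.
    return await asyncio.gather(*(worker(t) for t in texts))


results = asyncio.run(embed_all(["a", "bb", "ccc"]))
```

In real use, the body of `worker` would await the library's async call with a shared `httpx.AsyncClient` instead of `fake_embed`.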
Text embedding with instruction
response = Embedding.create(
    model="bge-large-en",
    instruction="Represent this sentence for searching relevant passages:",
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')
Text embedding (batching)
[!IMPORTANT]
The maximum supported batch size is 150.
response = Embedding.create(
    model="bge-large-en",
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
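When there are more inputs than the batch limit allows, they have to be split before being sent. A minimal sketch of that splitting step follows; `chunked` and `MAX_EMBEDDING_BATCH` are hypothetical helper names, and the 320 sample strings are made up for illustration.

```python
MAX_EMBEDDING_BATCH = 150  # batch limit stated in this section


def chunked(items, size=MAX_EMBEDDING_BATCH):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"document {i}" for i in range(320)]
batches = list(chunked(texts))
# Each element of `batches` could then be passed as `input=` to Embedding.create.
```

Slicing keeps the original order, so the embeddings from successive batches can be concatenated to line up with `texts`.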
Text embedding with instruction (batching)
[!IMPORTANT]
Only one instruction is supported per batch.
response = Embedding.create(
    model="bge-large-en",
    instruction="Represent this sentence for searching relevant passages:",
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
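Because a batch carries a single instruction, inputs that need different instructions have to be grouped first. The sketch below shows one way to do that; `group_by_instruction` is a hypothetical helper, and the second instruction string is invented purely for illustration.

```python
from collections import defaultdict


def group_by_instruction(pairs):
    """Group (instruction, text) pairs into {instruction: [texts, ...]}."""
    groups = defaultdict(list)
    for instruction, text in pairs:
        groups[instruction].append(text)
    return dict(groups)


QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages:"
OTHER_INSTRUCTION = "Represent this document for retrieval:"  # made-up example

pairs = [
    (QUERY_INSTRUCTION, "query one"),
    (OTHER_INSTRUCTION, "passage one"),
    (QUERY_INSTRUCTION, "query two"),
]
grouped = group_by_instruction(pairs)
# Each (instruction, texts) item maps onto one batched Embedding.create call.
```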
Text completion
from databricks_genai_inference import Completion
Text completion
response = Completion.create(
    model="mpt-7b-instruct",
    prompt="Represent the Science title:")
print(f'response.text: {response.text}')
Text completion (async)
import httpx

async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct",
        prompt="Represent the Science title:")
    print(f'response.text: {response.text}')
Text completion (streaming)
[!IMPORTANT]
Streaming mode supports only a batch size of 1.
response = Completion.create(
    model="mpt-7b-instruct",
    prompt="Count from 1 to 100:",
    stream=True)
print('response.text:')
for chunk in response:
    print(f'{chunk.text}', end="")
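If the full text is needed after streaming finishes, the chunks can be accumulated as they arrive. The sketch below assumes only what the example shows, namely that each chunk exposes a `.text` attribute; `collect_stream` is a hypothetical helper and the chunks are stand-ins built with `SimpleNamespace`, so the code runs without an endpoint.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the `.text` of each streamed chunk into one string."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.text)
    return "".join(parts)


# Stand-in chunks; a real stream would come from Completion.create(..., stream=True).
fake_stream = (SimpleNamespace(text=t) for t in ["1, ", "2, ", "3"])
full_text = collect_stream(fake_stream)
```

Collecting the pieces in a list and joining once avoids repeated string concatenation on long outputs.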
Text completion (streaming + async)
import httpx

async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct",
        prompt="Count from 1 to 10:",
        stream=True)
    print('response.text:')
    # Consume the stream while the client is still open.
    async for chunk in response:
        print(f'{chunk.text}', end="")
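The same accumulation pattern carries over to the async stream with `async for`. In this sketch `fake_async_stream` is a hypothetical async generator standing in for the streamed response, and `collect_async_stream` is an illustrative helper rather than an SDK function.

```python
import asyncio
from types import SimpleNamespace


async def fake_async_stream(pieces):
    """Stand-in async stream that yields chunk-like objects with a `.text` field."""
    for piece in pieces:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield SimpleNamespace(text=piece)


async def collect_async_stream(stream):
    """Concatenate the `.text` of each chunk from an async iterator."""
    parts = []
    async for chunk in stream:
        parts.append(chunk.text)
    return "".join(parts)


async_text = asyncio.run(collect_async_stream(fake_async_stream(["1, ", "2, ", "3"])))
```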
Text completion (batching)
[!IMPORTANT]
The maximum supported batch size is 16.
response = Completion.create(
    model="mpt-7b-instruct",
    prompt=[
        "Represent the Science title:",
        "Represent the Science title:"])
print(f'response.text[0]: {response.text[0]}')
print(f'response.text[1]: {response.text[1]}')
Chat completion
from databricks_genai_inference import ChatCompletion
[!IMPORTANT]
Batching is not supported for ChatCompletion
Chat completion
response = ChatCompletion.create(model="llama-2-70b-chat", messages=[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Knock knock."}])
print(f'response.message: {response.message}')
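The `messages` argument is a list of role/content dictionaries, which is easy to get wrong when assembled by hand. The sketch below builds such a list; `build_messages` and `ALLOWED_ROLES` are hypothetical names, and the `assistant` role is an assumption based on common chat formats, since the examples here only show `system` and `user`.

```python
# Assumption: "assistant" is accepted alongside the roles shown in this README.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_messages(user_prompt, system_prompt=None, history=()):
    """Return a messages list: optional system prompt, prior turns, new user turn."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for message in history:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {message.get('role')!r}")
        messages.append(dict(message))  # copy so the caller's data is untouched
    messages.append({"role": "user", "content": user_prompt})
    return messages


messages = build_messages("Knock knock.", system_prompt="You are a helpful assistant.")
```

The resulting list has the same shape as the one passed to `ChatCompletion.create` in the example.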
Chat completion (async)
import httpx

async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Knock knock."}],
    )
    print(f'response.message: {response.message}')
Chat completion (streaming)
response = ChatCompletion.create(model="llama-2-70b-chat", messages=[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}], stream=True)
for chunk in response:
    print(f'{chunk.message}', end="")
Chat completion (streaming + async)
import httpx

async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}],
        stream=True,
    )
    # Consume the stream while the client is still open.
    async for chunk in response:
        print(f'{chunk.message}', end="")
Chat session
from databricks_genai_inference import ChatSession
[!IMPORTANT]
Streaming mode is not supported for ChatSession
chat = ChatSession(model="llama-2-70b-chat")
chat.reply("Knock, knock!")
print(f'chat.last: {chat.last}')
chat.reply("Take a guess!")
print(f'chat.last: {chat.last}')
print(f'chat.history: {chat.history}')
print(f'chat.count: {chat.count}')
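To make the idea of multi-round state concrete, here is a rough sketch of what a session object of this kind has to track. It is not the SDK's `ChatSession`: `MiniChatSession` and `echo_model` are hypothetical, the meaning given to `last` and `count` is an assumption, and the model call is replaced by a local function so the code runs offline.

```python
class MiniChatSession:
    """Illustrative multi-round chat state; not the SDK's ChatSession."""

    def __init__(self, complete, system_message=None):
        self._complete = complete  # callable: list of messages -> reply text
        self.history = []
        if system_message:
            self.history.append({"role": "system", "content": system_message})

    def reply(self, user_text):
        """Record the user turn, ask the model, and record its answer."""
        self.history.append({"role": "user", "content": user_text})
        answer = self._complete(self.history)
        self.history.append({"role": "assistant", "content": answer})
        return answer

    @property
    def last(self):
        """Most recent assistant reply, or None before the first round."""
        for message in reversed(self.history):
            if message["role"] == "assistant":
                return message["content"]
        return None

    @property
    def count(self):
        """Number of completed rounds (assumed meaning for this sketch)."""
        return sum(1 for m in self.history if m["role"] == "assistant")


def echo_model(messages):
    """Stand-in for a chat model: echoes the latest user message."""
    return f"You said: {messages[-1]['content']}"


session = MiniChatSession(echo_model)
session.reply("Knock, knock!")
session.reply("Take a guess!")
```

The key design point is that every call sends the whole history, which is what lets the model see earlier rounds.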
Hashes for databricks-genai-inference-0.2.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | caf7861cc5be557bf1a580d155847556477405088e423fb5582d28e962f98167
MD5 | d7286c9e406d0e28c22766e2433de3e0
BLAKE2b-256 | c8b4dc45abcd404144f257daae8c858ba5aac90958e17e8051867cdc7009c802
Hashes for databricks_genai_inference-0.2.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8fe8eceafa1e5f16cb04b6a683095bdc6d7ed86776ff03b9afe46594c83fae1a
MD5 | 7110f7be748189768ef5caf65b3eca91
BLAKE2b-256 | 4205df2f827acfbe433be519d3b4ff9d98a8241a606139b6353cdcb2cd511952