Interact with the Databricks Foundation Model API from Python
Project description
Databricks Generative AI Inference SDK (Beta)
The Databricks Generative AI Inference Python library provides a user-friendly Python interface to the Databricks Foundation Model API.
[!NOTE]
This SDK was primarily designed for pay-per-token endpoints (`databricks-*`). It has a list of known model names (e.g. `dbrx-instruct`) and automatically maps them to the corresponding shared endpoint (`databricks-dbrx-instruct`). You can use it with provisioned throughput endpoints, as long as their names do not match known model names. If there is an overlap, you can use the `DATABRICKS_MODEL_URL_ENV` environment variable to provide an endpoint URL directly.
This library includes a predefined set of API classes, `Embedding`, `Completion`, and `ChatCompletion`, with convenient functions to make API requests and to parse contents from the raw JSON responses.
We also offer a high-level `ChatSession` object for easy management of multi-turn chat completions, which is especially useful for chatbot development.
You can find more usage details in our SDK onboarding doc.
[!IMPORTANT]
We're preparing to release version 1.0 of the Databricks Generative AI Inference Python library.
Installation
```shell
pip install databricks-genai-inference
```
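Before calling the API you also need workspace credentials. Like other Databricks SDKs, this library is typically configured through environment variables; the exact variable names below (`DATABRICKS_HOST`, `DATABRICKS_TOKEN`) are the standard Databricks ones and are an assumption here — check the SDK onboarding doc for the authoritative configuration options.

```shell
# Assumed standard Databricks auth variables; verify against the onboarding doc
export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<your-personal-access-token>"
```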
Usage
Embedding
```python
from databricks_genai_inference import Embedding
```
Text embedding
```python
response = Embedding.create(
    model="bge-large-en",
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')
```
[!TIP]
You may want to reuse the HTTP connection to improve request latency for large-scale workloads, for example:

```python
import requests

with requests.Session() as client:
    for i, text in enumerate(texts):
        response = Embedding.create(
            client=client,
            model="bge-large-en",
            input=text
        )
```
Text embedding (async)
```python
import httpx

async with httpx.AsyncClient() as client:
    response = await Embedding.acreate(
        client=client,
        model="bge-large-en",
        input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')
```
Text embedding with instruction
```python
response = Embedding.create(
    model="bge-large-en",
    instruction="Represent this sentence for searching relevant passages:",
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')
```
Text embedding (batching)
[!IMPORTANT]
Supports a maximum batch size of 150.
```python
response = Embedding.create(
    model="bge-large-en",
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
```
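The 150-input limit means larger workloads have to be split client-side. A minimal sketch of such chunking follows; the `Embedding.create` call is commented out because it needs a live endpoint, and the placeholder line stands in for collecting `response.embeddings`:

```python
def chunked(items, size=150):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

texts = [f"document {i}" for i in range(400)]
all_embeddings = []
for batch in chunked(texts):
    # response = Embedding.create(model="bge-large-en", input=batch)
    # all_embeddings.extend(response.embeddings)
    all_embeddings.append(len(batch))  # placeholder: record batch sizes

print(all_embeddings)  # [150, 150, 100]
```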
Text embedding with instruction (batching)
[!IMPORTANT]
Supports one instruction per batch (max batch size of 150).
```python
response = Embedding.create(
    model="bge-large-en",
    instruction="Represent this sentence for searching relevant passages:",
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
```
Text completion
```python
from databricks_genai_inference import Completion
```
Text completion
```python
response = Completion.create(
    model="mpt-7b-instruct",
    prompt="Represent the Science title:")
print(f'response.text: {response.text}')
```
Text completion (async)
```python
import httpx

async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct",
        prompt="Represent the Science title:")
print(f'response.text: {response.text}')
```
Text completion (streaming)
[!IMPORTANT]
Only a batch size of 1 is supported in streaming mode.
```python
response = Completion.create(
    model="mpt-7b-instruct",
    prompt="Count from 1 to 100:",
    stream=True)
print('response.text:')
for chunk in response:
    print(f'{chunk.text}', end="")
```
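Each streamed chunk carries only a fragment of the completion; to recover the full text, accumulate the fragments as they arrive. A minimal sketch, using a plain list of strings in place of the real chunk stream:

```python
def collect_stream(chunks):
    """Join streamed text fragments into the full completion."""
    parts = []
    for text in chunks:  # with the real API: for chunk in response -> chunk.text
        parts.append(text)
    return "".join(parts)

full_text = collect_stream(["1, ", "2, ", "3, ", "4"])
print(full_text)  # 1, 2, 3, 4
```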
Text completion (streaming + async)
```python
import httpx

async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct",
        prompt="Count from 1 to 10:",
        stream=True)
print('response.text:')
async for chunk in response:
    print(f'{chunk.text}', end="")
```
Text completion (batching)
[!IMPORTANT]
Supports a maximum batch size of 16.
```python
response = Completion.create(
    model="mpt-7b-instruct",
    prompt=[
        "Represent the Science title:",
        "Represent the Science title:"])
print(f'response.text[0]: {response.text[0]}')
print(f'response.text[1]: {response.text[1]}')
```
Chat completion
```python
from databricks_genai_inference import ChatCompletion
```
[!IMPORTANT]
Batching is not supported for `ChatCompletion`.
Chat completion
```python
response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."}])
print(f'response.message: {response.message}')
```
Chat completion (async)
```python
import httpx

async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Knock knock."}],
    )
print(f'response.message: {response.message}')
```
Chat completion (streaming)
```python
response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}],
    stream=True)
for chunk in response:
    print(f'{chunk.message}', end="")
```
Chat completion (streaming + async)
```python
import httpx

async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}],
        stream=True,
    )
async for chunk in response:
    print(f'{chunk.message}', end="")
```
Chat session
```python
from databricks_genai_inference import ChatSession
```
[!IMPORTANT]
Streaming mode is not supported for `ChatSession`.
```python
chat = ChatSession(model="llama-2-70b-chat")
chat.reply("Knock, knock!")
print(f'chat.last: {chat.last}')
chat.reply("Take a guess!")
print(f'chat.last: {chat.last}')
print(f'chat.history: {chat.history}')
print(f'chat.count: {chat.count}')
```
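`ChatSession` keeps the running message list for you across turns. The bookkeeping it automates can be sketched roughly as below; `fake_reply` is a stand-in for a real `ChatCompletion` call, and the internal details of the actual class may differ:

```python
def fake_reply(messages):
    """Stand-in for a real ChatCompletion call against a live endpoint."""
    return "Who's there?"

# The session starts with a system message and appends each turn.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def reply(user_content):
    history.append({"role": "user", "content": user_content})
    assistant_content = fake_reply(history)
    history.append({"role": "assistant", "content": assistant_content})
    return assistant_content

last = reply("Knock, knock!")
print(last)          # Who's there?
print(len(history))  # 3: system + user + assistant
```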
File details
Details for the file `databricks-genai-inference-0.2.3.tar.gz`.
File metadata
- Download URL: databricks-genai-inference-0.2.3.tar.gz
- Upload date:
- Size: 26.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | caf7861cc5be557bf1a580d155847556477405088e423fb5582d28e962f98167
MD5 | d7286c9e406d0e28c22766e2433de3e0
BLAKE2b-256 | c8b4dc45abcd404144f257daae8c858ba5aac90958e17e8051867cdc7009c802
File details
Details for the file `databricks_genai_inference-0.2.3-py3-none-any.whl`.
File metadata
- Download URL: databricks_genai_inference-0.2.3-py3-none-any.whl
- Upload date:
- Size: 18.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8fe8eceafa1e5f16cb04b6a683095bdc6d7ed86776ff03b9afe46594c83fae1a
MD5 | 7110f7be748189768ef5caf65b3eca91
BLAKE2b-256 | 4205df2f827acfbe433be519d3b4ff9d98a8241a606139b6353cdcb2cd511952