Interact with the Databricks Foundation Model API from Python

Databricks Generative AI Inference SDK (Beta)

The Databricks Generative AI Inference Python library provides a user-friendly Python interface to the Databricks Foundation Model API.

[!NOTE] This SDK was primarily designed for pay-per-token endpoints (databricks-*). It has a list of known model names (e.g., dbrx-instruct) and automatically maps each one to the corresponding shared endpoint (databricks-dbrx-instruct). You can also use it with provisioned throughput endpoints, as long as their names do not match a known model name. If there is an overlap, you can use DATABRICKS_MODEL_URL_ENV to provide the endpoint URL directly.
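The mapping described in the note can be pictured with a small sketch. This is only an illustration of the idea, not the SDK's actual code: `KNOWN_MODELS` and `resolve_endpoint` are hypothetical names, and the set of known models is assumed from the model names used in this README.

```python
# Hypothetical illustration of the name-to-endpoint mapping described above.
# KNOWN_MODELS and resolve_endpoint are made-up names, not part of the SDK.
KNOWN_MODELS = {
    "dbrx-instruct",
    "bge-large-en",
    "mpt-7b-instruct",
    "llama-2-70b-chat",
}


def resolve_endpoint(model: str) -> str:
    """Return the serving endpoint name for a model or endpoint name."""
    if model in KNOWN_MODELS:
        # Known model names map to the shared pay-per-token endpoint.
        return f"databricks-{model}"
    # Anything else is treated as an endpoint name and passed through.
    return model


print(resolve_endpoint("dbrx-instruct"))
print(resolve_endpoint("my-provisioned-endpoint"))
```

Under this reading, a provisioned throughput endpoint that happens to be named like a known model would be redirected to the shared endpoint, which is why the note recommends supplying the URL directly in that case.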

This library includes a predefined set of API classes (Embedding, Completion, and ChatCompletion) with convenience functions for making API requests and parsing content from the raw JSON responses.

We also offer a high-level ChatSession object for easy management of multi-round chat completions, which is especially useful when building a chatbot.

You can find more usage details in our SDK onboarding doc.

[!IMPORTANT]
We're preparing to release version 1.0 of the Databricks GenerativeAI Inference Python library.

Installation

pip install databricks-genai-inference

Usage

Embedding

from databricks_genai_inference import Embedding

Text embedding

response = Embedding.create(
    model="bge-large-en", 
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')

[!TIP]
For large-scale workloads, you may want to reuse the HTTP connection to reduce request latency. Code example:

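import requests

texts = ["first document", "second document"]  # example inputs

# Reuse one HTTP session so that all requests share the same connection pool
with requests.Session() as client:
    for text in texts:
        response = Embedding.create(
            client=client,
            model="bge-large-en",
            input=text
        )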

Text embedding (async)

import httpx

# Run inside an async function, or in a notebook cell that supports top-level await
async with httpx.AsyncClient() as client:
    response = await Embedding.acreate(
        client=client,
        model="bge-large-en", 
        input="3D ActionSLAM: wearable person tracking in multi-floor environments")
    print(f'embeddings: {response.embeddings[0]}')
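The async API pays off mainly when several requests are in flight at once. The sketch below shows one way to do that with `asyncio.gather` and a concurrency cap. It is an assumption-laden illustration: `embed_concurrently` and `fake_embed` are hypothetical helpers, and `fake_embed` stands in for a real call such as `Embedding.acreate(client=client, model="bge-large-en", input=text)`.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def embed_concurrently(
    texts: Sequence[str],
    embed_one: Callable[[str], Awaitable[List[float]]],
    max_in_flight: int = 8,
) -> List[List[float]]:
    """Embed all texts concurrently, with at most max_in_flight requests at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(text: str) -> List[float]:
        async with semaphore:
            return await embed_one(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(t) for t in texts))


async def fake_embed(text: str) -> List[float]:
    """Stand-in for a real async embedding request (hypothetical)."""
    await asyncio.sleep(0)
    return [float(len(text))]


results = asyncio.run(embed_concurrently(["a", "bb", "ccc"], fake_embed))
print(results)  # [[1.0], [2.0], [3.0]]
```

The semaphore keeps the number of simultaneous requests bounded; the value 8 is arbitrary and would need tuning against the endpoint's rate limits.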

Text embedding with instruction

response = Embedding.create(
    model="bge-large-en", 
    instruction="Represent this sentence for searching relevant passages:", 
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')

Text embedding (batching)

[!IMPORTANT]
The maximum supported batch size is 150.

response = Embedding.create(
    model="bge-large-en", 
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
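Inputs longer than the 150-item limit have to be split before they are sent. The following sketch shows one way to chunk a list and collect the vectors in order. `batched`, `embed_all`, and `fake_embed` are hypothetical helpers introduced for illustration; in real use the stand-in could be replaced by something like `lambda batch: Embedding.create(model="bge-large-en", input=batch).embeddings`.

```python
from typing import Callable, List, Sequence

MAX_EMBEDDING_BATCH = 150  # batch limit stated in this README


def batched(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    size: int = MAX_EMBEDDING_BATCH,
) -> List[List[float]]:
    """Embed texts batch by batch and return one vector per input, in order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in for a real batched embedding request (hypothetical)."""
    return [[float(len(text))] for text in batch]


texts = [f"doc {i}" for i in range(320)]
batch_sizes = [len(b) for b in batched(texts, MAX_EMBEDDING_BATCH)]
vectors = embed_all(texts, fake_embed)
print(batch_sizes)   # [150, 150, 20]
print(len(vectors))  # 320
```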

Text embedding with instruction (batching)

[!IMPORTANT]
Only one instruction is supported per batch.

response = Embedding.create(
    model="bge-large-en", 
    instruction="Represent this sentence for searching relevant passages:",
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')

Text completion

from databricks_genai_inference import Completion

Text completion

response = Completion.create(
    model="mpt-7b-instruct",
    prompt="Represent the Science title:")
print(f'response.text: {response.text}')

Text completion (async)

import httpx

# Run inside an async function, or in a notebook cell that supports top-level await
async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct",
        prompt="Represent the Science title:")
    print(f'response.text: {response.text}')

Text completion (streaming)

[!IMPORTANT]
Streaming mode supports only a batch size of 1.

response = Completion.create(
    model="mpt-7b-instruct", 
    prompt="Count from 1 to 100:",
    stream=True)
print('response.text:')
for chunk in response:
    print(f'{chunk.text}', end="")
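Printing chunks as they arrive discards the text once the loop ends. If the full completion is needed afterwards, the chunks can be accumulated while streaming, as in this sketch. It assumes each chunk exposes its piece of text as a string in `chunk.text`, as the example above suggests; `collect_text` and `fake_stream` are hypothetical names, and `fake_stream` stands in for the iterator returned by `Completion.create(..., stream=True)`.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(chunks: Iterable) -> str:
    """Print each streamed chunk as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        print(chunk.text, end="", flush=True)
        parts.append(chunk.text)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


# Stand-in for a streaming response (hypothetical data)
fake_stream = (SimpleNamespace(text=t) for t in ["1, ", "2, ", "3"])
full_text = collect_text(fake_stream)
```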

Text completion (streaming + async)

import httpx

# Run inside an async function, or in a notebook cell that supports top-level await
async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct", 
        prompt="Count from 1 to 10:",
        stream=True)
    print('response.text:')
    async for chunk in response:
        print(f'{chunk.text}', end="")

Text completion (batching)

[!IMPORTANT]
The maximum supported batch size is 16.

response = Completion.create(
    model="mpt-7b-instruct", 
    prompt=[
        "Represent the Science title:", 
        "Represent the Science title:"])
print(f'response.text[0]:{response.text[0]}')
print(f'response.text[1]:{response.text[1]}')

Chat completion

from databricks_genai_inference import ChatCompletion

[!IMPORTANT]
Batching is not supported for ChatCompletion

Chat completion

messages = [{"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Knock knock."}]
response = ChatCompletion.create(model="llama-2-70b-chat", messages=messages)
print(f'response.message: {response.message}')

Chat completion (async)

import httpx

# Run inside an async function, or in a notebook cell that supports top-level await
async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[{"role": "system", "content": "You are a helpful assistant."},
                  {"role": "user", "content": "Knock knock."}],
    )
    print(f'response.message: {response.message}')

Chat completion (streaming)

messages = [{"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}]
response = ChatCompletion.create(model="llama-2-70b-chat", messages=messages, stream=True)
for chunk in response:
    print(f'{chunk.message}', end="")

Chat completion (streaming + async)

import httpx

# Run inside an async function, or in a notebook cell that supports top-level await
async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[{"role": "system", "content": "You are a helpful assistant."},
                  {"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}],
        stream=True,
    )
    async for chunk in response:
        print(f'{chunk.message}', end="")

Chat session

from databricks_genai_inference import ChatSession

[!IMPORTANT]
Streaming mode is not supported for ChatSession

chat = ChatSession(model="llama-2-70b-chat")
chat.reply("Knock, knock!")
print(f'chat.last: {chat.last}')
chat.reply("Take a guess!")
print(f'chat.last: {chat.last}')

print(f'chat.history: {chat.history}')
print(f'chat.count: {chat.count}')
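To make the idea of a multi-round session concrete, here is a minimal sketch of how such an object could accumulate history between turns. It is not the library's implementation: `SimpleChatSession` and `echo_model` are hypothetical, the meaning of `count` (completed rounds) is an assumption, and `echo_model` stands in for a real call such as `lambda msgs: ChatCompletion.create(model="llama-2-70b-chat", messages=msgs).message`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class SimpleChatSession:
    """Hypothetical stand-in showing how multi-round history can accumulate."""

    def __init__(
        self,
        complete: Callable[[List[Message]], str],
        system_message: str = "You are a helpful assistant.",
    ) -> None:
        self._complete = complete
        self.history: List[Message] = [
            {"role": "system", "content": system_message}
        ]

    def reply(self, user_text: str) -> str:
        """Send the whole history plus the new user turn, then store the answer."""
        self.history.append({"role": "user", "content": user_text})
        answer = self._complete(self.history)
        self.history.append({"role": "assistant", "content": answer})
        return answer

    @property
    def last(self) -> str:
        """Content of the most recent message in the history."""
        return self.history[-1]["content"]

    @property
    def count(self) -> int:
        """Number of completed user/assistant rounds (assumed meaning)."""
        return sum(1 for m in self.history if m["role"] == "assistant")


def echo_model(messages: List[Message]) -> str:
    """Stand-in for a chat completion request (hypothetical)."""
    return f"You said: {messages[-1]['content']}"


chat = SimpleChatSession(echo_model)
chat.reply("Knock, knock!")
chat.reply("Take a guess!")
print(chat.last)
print(chat.count)
```

The key design point is that every call resends the full history, so the model sees earlier turns without the caller managing the message list by hand.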

Release history

2.0.2 (Apr 18, 2025)

2.0.1 (Apr 18, 2025)

2.0.0 (Apr 18, 2025), this version

0.2.3 (Apr 18, 2025)

Download files

Source Distribution

wfork_databricks_genai_inference-2.0.0.tar.gz (27.6 kB)

Uploaded Apr 18, 2025 Source

Built Distribution

wfork_databricks_genai_inference-2.0.0-py3-none-any.whl (18.1 kB)

Uploaded Apr 18, 2025 Python 3

File details

Details for the file wfork_databricks_genai_inference-2.0.0.tar.gz.

File metadata

Download URL: wfork_databricks_genai_inference-2.0.0.tar.gz
Upload date: Apr 18, 2025
Size: 27.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for wfork_databricks_genai_inference-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`0eb7f5f51cb7fb98771faeea3b737a982dbb3ee45db7c990ff4e12857cf8d252`
MD5	`450926904e6dacac5eb5b332e7799239`
BLAKE2b-256	`c0bf95e532254a1ad06670183c9d9f0576912ac6a0db5734f5698b2e44dfad98`

See more details on using hashes here.
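A downloaded file can be checked against the SHA256 digest listed above before installing it. The sketch below uses only the standard library; `sha256_of_file` and `matches_sha256` are hypothetical helper names, and the demo hashes a small temporary file rather than the real archive.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_sha256(path, expected_hex: str) -> bool:
    """Compare a file's digest with the expected hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Demo on a temporary file; for the real archive, pass the downloaded
# path and the SHA256 value from the table above.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    expected = hashlib.sha256(b"hello").hexdigest()
    ok = matches_sha256(sample, expected)
    bad = matches_sha256(sample, "0" * 64)

print(ok, bad)  # True False
```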

File details

Details for the file wfork_databricks_genai_inference-2.0.0-py3-none-any.whl.

File metadata

Download URL: wfork_databricks_genai_inference-2.0.0-py3-none-any.whl
Upload date: Apr 18, 2025
Size: 18.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for wfork_databricks_genai_inference-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e21a249b0fdf5967fb4e0bc1457f43a7e0c9d8cffaf3ebf6b7fc924cae936dc8`
MD5	`7551348f220d80be073ac0e03c10ce41`
BLAKE2b-256	`2d64f610fe007fd4e0a1db4fa6df0854e6285429195898ee0c19e04895585118`

See more details on using hashes here.

wfork-databricks-genai-inference 2.0.0
