
Interact with the Databricks Foundation Model API from Python


Databricks Generative AI Inference SDK (Beta)


The Databricks Generative AI Inference Python library provides a user-friendly Python interface to the Databricks Foundation Model API.

It includes a pre-defined set of API classes (Embedding, Completion, and ChatCompletion) with convenient functions for making API requests and parsing content from the raw JSON response.

We also offer a high-level ChatSession object for easy management of multi-turn chat completions, which is especially useful for chatbot development.

You can find more usage details in our SDK onboarding doc.

[!IMPORTANT]
We're preparing to release version 1.0 of the Databricks Generative AI Inference Python library.

Installation

pip install databricks-genai-inference
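
Before making requests, the client needs to know which workspace to call and how to authenticate. A minimal sketch, assuming the library follows the standard Databricks convention of reading DATABRICKS_HOST and DATABRICKS_TOKEN from the environment (the placeholder values are hypothetical):

import os

# Assumption: standard Databricks environment variables; substitute the
# hypothetical placeholders with your workspace URL and access token.
os.environ["DATABRICKS_HOST"] = "https://<your-workspace>.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "<your-personal-access-token>"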

Usage

Embedding

from databricks_genai_inference import Embedding

Text embedding

response = Embedding.create(
    model="bge-large-en", 
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')

[!TIP]
You may want to reuse the HTTP connection to reduce request latency for large-scale workloads, since it avoids re-establishing a TCP/TLS connection for every request. Code example:

import requests

with requests.Session() as client:
    for i, text in enumerate(texts):  # texts: your list of input strings
        response = Embedding.create(
            client=client,
            model="bge-large-en",
            input=text
        )

Text embedding (async)

import httpx

async with httpx.AsyncClient() as client:
    response = await Embedding.acreate(
        client=client,
        model="bge-large-en",
        input="3D ActionSLAM: wearable person tracking in multi-floor environments")
    print(f'embeddings: {response.embeddings[0]}')
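
Note that async with and await are only valid inside a coroutine, so in a script you would wrap the snippet above in an async function and drive it with asyncio.run. The same pattern applies to the other async examples below:

import asyncio
import httpx

from databricks_genai_inference import Embedding

async def main():
    async with httpx.AsyncClient() as client:
        response = await Embedding.acreate(
            client=client,
            model="bge-large-en",
            input="3D ActionSLAM: wearable person tracking in multi-floor environments")
        print(f'embeddings: {response.embeddings[0]}')

asyncio.run(main())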

Text embedding with instruction

response = Embedding.create(
    model="bge-large-en", 
    instruction="Represent this sentence for searching relevant passages:", 
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')

Text embedding (batching)

[!IMPORTANT]
Supports a maximum batch size of 150

response = Embedding.create(
    model="bge-large-en", 
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
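
Because of the 150-item limit, embedding a larger corpus means splitting the input into chunks and collecting the results. A minimal sketch (texts, batch_size, and all_embeddings are illustrative names, not part of the SDK):

from databricks_genai_inference import Embedding

texts = ["first document", "second document"]  # your corpus
batch_size = 150  # API maximum per the note above

all_embeddings = []
for start in range(0, len(texts), batch_size):
    batch = texts[start:start + batch_size]
    response = Embedding.create(model="bge-large-en", input=batch)
    all_embeddings.extend(response.embeddings)  # one vector per input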

Text embedding with instruction (batching)

[!IMPORTANT]
Supports one instruction per batch; the maximum batch size of 150 still applies

response = Embedding.create(
    model="bge-large-en", 
    instruction="Represent this sentence for searching relevant passages:",
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')

Text completion

from databricks_genai_inference import Completion

Text completion

response = Completion.create(
    model="mpt-7b-instruct",
    prompt="Represent the Science title:")
print(f'response.text: {response.text}')

Text completion (async)

async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct",
        prompt="Represent the Science title:")
    print(f'response.text: {response.text}')

Text completion (streaming)

[!IMPORTANT]
Only a batch size of 1 is supported in streaming mode

response = Completion.create(
    model="mpt-7b-instruct", 
    prompt="Count from 1 to 100:",
    stream=True)
print('response.text:')
for chunk in response:
    print(f'{chunk.text}', end="")

Text completion (streaming + async)

async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct", 
        prompt="Count from 1 to 10:",
        stream=True)
    print('response.text:')
    async for chunk in response:
        print(f'{chunk.text}', end="")

Text completion (batching)

[!IMPORTANT]
Supports a maximum batch size of 16

response = Completion.create(
    model="mpt-7b-instruct", 
    prompt=[
        "Represent the Science title:", 
        "Represent the Science title:"])
print(f'response.text[0]:{response.text[0]}')
print(f'response.text[1]:{response.text[1]}')

Chat completion

from databricks_genai_inference import ChatCompletion

[!IMPORTANT]
Batching is not supported for ChatCompletion

Chat completion

response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."}])
print(f'response.message: {response.message}')

Chat completion (async)

async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Knock knock."}],
    )
    print(f'response.message: {response.message}')

Chat completion (streaming)

response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}],
    stream=True)
for chunk in response:
    print(f'{chunk.message}', end="")

Chat completion (streaming + async)

async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Count from 1 to 30, add one emoji after each number"}],
        stream=True,
    )
    async for chunk in response:
        print(f'{chunk.message}', end="")

Chat session

from databricks_genai_inference import ChatSession

[!IMPORTANT]
Streaming mode is not supported for ChatSession

chat = ChatSession(model="llama-2-70b-chat")
chat.reply("Kock, kock!")
print(f'chat.last: {chat.last}')
chat.reply("Take a guess!")
print(f'chat.last: {chat.last}')

print(f'chat.history: {chat.history}')
print(f'chat.count: {chat.count}')
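
Building on this, a minimal interactive chatbot loop takes only a few lines. A sketch (the prompt strings and exit command are illustrative, not part of the SDK):

from databricks_genai_inference import ChatSession

chat = ChatSession(model="llama-2-70b-chat")
while True:
    user_input = input("you> ")
    if user_input.strip().lower() == "quit":  # illustrative exit command
        break
    chat.reply(user_input)
    print(f'bot> {chat.last}')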

