Interact with the Databricks Foundation Model API from Python

Databricks Generative AI Inference SDK (Beta)

The Databricks Generative AI Inference Python library provides a user-friendly Python interface to the Databricks Foundation Model API.

It includes a predefined set of API classes (Embedding, Completion, and ChatCompletion) with convenience functions for making API requests and parsing content from the raw JSON responses.

It also offers a high-level ChatSession object for managing multi-round chat completions, which is especially useful when building a chatbot.

You can find more usage details in our SDK onboarding doc.

[!IMPORTANT]
We're preparing to release version 1.0 of the Databricks GenerativeAI Inference Python library.

Installation

pip install databricks-genai-inference

Usage

Embedding

from databricks_genai_inference import Embedding

Text embedding

response = Embedding.create(
    model="bge-large-en", 
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')

[!TIP]
For large-scale workloads, you can reduce request latency by reusing a single HTTP connection. For example:

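import requests

# Reuse one HTTP session across requests; texts can be any list of strings.
texts = ["first document", "second document"]
with requests.Session() as client:
    for text in texts:
        response = Embedding.create(
            client=client,
            model="bge-large-en",
            input=text
        )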

Text embedding (async)

import httpx

# Run inside an async function, or in a notebook that supports top-level await.
async with httpx.AsyncClient() as client:
    response = await Embedding.acreate(
        client=client,
        model="bge-large-en",
        input="3D ActionSLAM: wearable person tracking in multi-floor environments")
    print(f'embeddings: {response.embeddings[0]}')
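
When many texts need embedding, the async variant can be combined with asyncio.gather so that several requests are in flight at once. The sketch below shows only the concurrency pattern. embed_one is a hypothetical stand-in for a coroutine that would call Embedding.acreate, and the limit of 4 concurrent requests is an arbitrary assumption, not a documented quota.

```python
import asyncio


async def embed_one(text: str) -> list[float]:
    """Hypothetical stand-in for a call to Embedding.acreate.

    It fakes a network round trip and returns a dummy one-element vector.
    """
    await asyncio.sleep(0.01)
    return [float(len(text))]


async def embed_many(texts: list[str], max_concurrency: int = 4) -> list[list[float]]:
    """Embed all texts concurrently, keeping at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(text: str) -> list[float]:
        async with semaphore:
            return await embed_one(text)

    # gather preserves input order, so results[i] belongs to texts[i].
    return await asyncio.gather(*(guarded(t) for t in texts))


if __name__ == "__main__":
    vectors = asyncio.run(embed_many(["a", "bb", "ccc"]))
    print(vectors)
```

In real use, the body of embed_one would be replaced with the awaited library call, sharing one httpx.AsyncClient across all tasks.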

Text embedding with instruction

response = Embedding.create(
    model="bge-large-en", 
    instruction="Represent this sentence for searching relevant passages:", 
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings[0]}')

Text embedding (batching)

[!IMPORTANT]
The maximum supported batch size is 150.

response = Embedding.create(
    model="bge-large-en", 
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
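
Because a batch may hold at most 150 inputs, a longer list has to be split before it is sent. The following is a minimal sketch of that splitting step. chunked, embed_all, and fake_embed_batch are hypothetical helper names, and the stub returns dummy vectors so the example runs without network access.

```python
from typing import Callable, Iterator, Sequence

MAX_EMBEDDING_BATCH = 150  # batch limit stated in this README


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list],
    batch_size: int = MAX_EMBEDDING_BATCH,
) -> list:
    """Embed texts batch by batch and return one flat list of results."""
    results = []
    for batch in chunked(texts, batch_size):
        results.extend(embed_batch(batch))
    return results


def fake_embed_batch(batch: Sequence[str]) -> list:
    """Hypothetical stub. A real version would return
    Embedding.create(model=..., input=list(batch)).embeddings."""
    return [[float(len(text))] for text in batch]


texts = [f"doc {i}" for i in range(320)]
print([len(b) for b in chunked(texts, MAX_EMBEDDING_BATCH)])  # [150, 150, 20]
print(len(embed_all(texts, fake_embed_batch)))  # 320
```

Passing the batch function in as an argument keeps the splitting logic testable on its own, independent of the API call.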

Text embedding with instruction (batching)

[!IMPORTANT]
Only one instruction is supported per batch.

response = Embedding.create(
    model="bge-large-en", 
    instruction="Represent this sentence for searching relevant passages:",
    input=[
        "3D ActionSLAM: wearable person tracking in multi-floor environments",
        "3D ActionSLAM: wearable person tracking in multi-floor environments"])
print(f'response.embeddings[0]: {response.embeddings[0]}\n')
print(f'response.embeddings[1]: {response.embeddings[1]}')
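
The instruction used above targets passage search, where a query embedding is compared against passage embeddings. A common way to compare them is cosine similarity, sketched below in plain Python. This is an illustration, not part of the library: it assumes each entry of response.embeddings is a list of floats, and the vectors here are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_passages(query: list[float], passages: list[list[float]]) -> list[int]:
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, p) for p in passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Made-up 3-dimensional vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 0.0]
passage_vecs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print(rank_passages(query_vec, passage_vecs))  # [1, 2, 0]
```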

Text completion

from databricks_genai_inference import Completion

Text completion

response = Completion.create(
    model="mpt-7b-instruct",
    prompt="Represent the Science title:")
print(f'response.text:{response.text}')

Text completion (async)

import httpx

# Run inside an async function, or in a notebook that supports top-level await.
async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct",
        prompt="Represent the Science title:")
    print(f'response.text:{response.text}')

Text completion (streaming)

[!IMPORTANT]
Streaming mode supports only a batch size of 1.

response = Completion.create(
    model="mpt-7b-instruct", 
    prompt="Count from 1 to 100:",
    stream=True)
print('response.text:')
for chunk in response:
    print(f'{chunk.text}', end="")

Text completion (streaming + async)

import httpx

# Run inside an async function, or in a notebook that supports top-level await.
async with httpx.AsyncClient() as client:
    response = await Completion.acreate(
        client=client,
        model="mpt-7b-instruct",
        prompt="Count from 1 to 10:",
        stream=True)
    print('response.text:')
    async for chunk in response:
        print(f'{chunk.text}', end="")

Text completion (batching)

[!IMPORTANT]
The maximum supported batch size is 16.

response = Completion.create(
    model="mpt-7b-instruct", 
    prompt=[
        "Represent the Science title:", 
        "Represent the Science title:"])
print(f'response.text[0]:{response.text[0]}')
print(f'response.text[1]:{response.text[1]}')

Chat completion

from databricks_genai_inference import ChatCompletion

[!IMPORTANT]
Batching is not supported for ChatCompletion

Chat completion

response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
    ],
)
print(f'response.message:{response.message}')

Chat completion (async)

import httpx

# Run inside an async function, or in a notebook that supports top-level await.
async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Knock knock."},
        ],
    )
    print(f'response.message:{response.message}')

Chat completion (streaming)

response = ChatCompletion.create(
    model="llama-2-70b-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count from 1 to 30, add one emoji after each number"},
    ],
    stream=True,
)
for chunk in response:
    print(f'{chunk.message}', end="")
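
When streaming, it is often useful to show text as it arrives and still keep the complete reply afterwards. The sketch below does both. It assumes each chunk exposes its text as a string in chunk.message, as the loop above suggests. collect_stream is a hypothetical helper, and the fake chunks stand in for a real streaming response.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print each chunk's message as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        print(chunk.message, end="", flush=True)
        parts.append(chunk.message)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Fake chunks standing in for what ChatCompletion.create(..., stream=True) yields.
fake_stream = (SimpleNamespace(message=piece) for piece in ["1 ", "2 ", "3"])
full_text = collect_stream(fake_stream)
print(repr(full_text))  # '1 2 3'
```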

Chat completion (streaming + async)

import httpx

# Run inside an async function, or in a notebook that supports top-level await.
async with httpx.AsyncClient() as client:
    response = await ChatCompletion.acreate(
        client=client,
        model="llama-2-70b-chat",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Count from 1 to 30, add one emoji after each number"},
        ],
        stream=True,
    )
    async for chunk in response:
        print(f'{chunk.message}', end="")

Chat session

from databricks_genai_inference import ChatSession

[!IMPORTANT]
Streaming mode is not supported for ChatSession

chat = ChatSession(model="llama-2-70b-chat")
chat.reply("Knock, knock!")
print(f'chat.last: {chat.last}')
chat.reply("Take a guess!")
print(f'chat.last: {chat.last}')

print(f'chat.history: {chat.history}')
print(f'chat.count: {chat.count}')
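
To illustrate what managing a multi-round chat involves, the sketch below keeps a message history in the same role/content format used by the ChatCompletion examples. It is not the ChatSession implementation. MiniChatSession, its complete callable, and the way count is defined are all hypothetical, and the stub model only echoes its input so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


class MiniChatSession:
    """Hypothetical, simplified illustration; not the library's ChatSession."""

    def __init__(self, complete: Callable[[list[Message]], str],
                 system_message: str = "You are a helpful assistant."):
        self._complete = complete
        self.history: list[Message] = [{"role": "system", "content": system_message}]

    def reply(self, user_text: str) -> str:
        """Append the user turn, request an answer, and record it."""
        self.history.append({"role": "user", "content": user_text})
        answer = self._complete(self.history)
        self.history.append({"role": "assistant", "content": answer})
        return answer

    @property
    def last(self) -> str:
        """Content of the most recent message in the history."""
        return self.history[-1]["content"]

    @property
    def count(self) -> int:
        """Number of completed user/assistant rounds (an assumed definition)."""
        return sum(1 for m in self.history if m["role"] == "assistant")


def fake_complete(messages: list[Message]) -> str:
    """Stub model; a real one could call ChatCompletion.create(messages=messages)."""
    return f"reply to: {messages[-1]['content']}"


chat = MiniChatSession(fake_complete)
chat.reply("Knock, knock!")
chat.reply("Take a guess!")
print(chat.last)   # reply to: Take a guess!
print(chat.count)  # 2
```

The key point is that every request carries the full history, which is what lets the model answer the second message in the context of the first.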

Release history

0.2.3: Mar 27, 2024
0.2.2: Mar 21, 2024
0.2.1: Feb 17, 2024
0.2.0: Feb 13, 2024 (this version)
0.1.3: Dec 20, 2023
0.1.2: Dec 7, 2023
0.1.1: Nov 13, 2023
0.1.0: Nov 10, 2023

Download files

Source Distribution

databricks-genai-inference-0.2.0.tar.gz (26.3 kB)

Uploaded Feb 13, 2024 Source

Built Distribution

databricks_genai_inference-0.2.0-py3-none-any.whl (17.6 kB)

Uploaded Feb 13, 2024 Python 3

Hashes for databricks-genai-inference-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`d931c14b94bd107200ca444e73b21479a7eb812a0dee91ebf5a94ee21cf66b67`
MD5	`2a4ecb480f72536083a93cf3e9cbf668`
BLAKE2b-256	`6ccbe40c516cee7717d1d1c5d4c3b8204293036db2c6ba473d19351cf37ca700`

Hashes for databricks_genai_inference-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b2c0d30f674ea965bc60b490f77cec74a3d97d16766e3ec058072656e36d52f1`
MD5	`5ccd5172d0b2125ec923964331f11579`
BLAKE2b-256	`365b136bbd2a3091d50733744ffdf26f66444805d7bd5ccbc748f16efc2389a4`