
LlamaIndex LLMs Integration: Optimum Intel IPEX Backend

Installation

To install the required packages, run:

%pip install llama-index-llms-optimum-intel
%pip install llama-index

Setup

Define Functions for Prompt Handling

Define helper functions that convert chat messages and plain-text completions into the prompt format the model expects:

from llama_index.llms.optimum_intel import OptimumIntelLLM


def messages_to_prompt(messages):
    """Build a single prompt string from a list of chat messages."""
    prompt = ""
    for message in messages:
        if message.role == "system":
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == "user":
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == "assistant":
            prompt += f"<|assistant|>\n{message.content}</s>\n"

    # Ensure we start with a system prompt, insert blank if needed
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    # Add final assistant prompt
    prompt = prompt + "<|assistant|>\n"

    return prompt


def completion_to_prompt(completion):
    """Wrap a plain completion in the same template, with an empty system prompt."""
    return f"<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"

Model Loading

Load a model by passing the model and tokenizer names, context window, and generation parameters to the OptimumIntelLLM constructor:

oi_llm = OptimumIntelLLM(
    model_name="Intel/neural-chat-7b-v3-3",
    tokenizer_name="Intel/neural-chat-7b-v3-3",
    context_window=3900,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="cpu",
)

response = oi_llm.complete("What is the meaning of life?")
print(str(response))
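
Because OptimumIntelLLM implements the standard LlamaIndex LLM interface, multi-turn conversations can also go through the chat method. A minimal sketch (the messages are illustrative):

from llama_index.core.llms import ChatMessage

# messages_to_prompt (defined above) renders these messages into the
# model's prompt format before generation.
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="In one sentence, what is IPEX?"),
]

print(oi_llm.chat(messages))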

Streaming Responses

For streaming output, use the stream_complete and stream_chat methods:

Using stream_complete

response = oi_llm.stream_complete("Who is Mother Teresa?")
for r in response:
    print(r.delta, end="")

Using stream_chat

from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system",
        content="You are an American chef in a small restaurant in New Orleans",
    ),
    ChatMessage(role="user", content="What is your dish of the day?"),
]

resp = oi_llm.stream_chat(messages)

for r in resp:
    print(r.delta, end="")
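
Beyond direct calls, the model can serve as the default LLM for other LlamaIndex components such as query and chat engines. A minimal sketch (retrieval pipelines additionally need an embedding model, which is configured separately):

from llama_index.core import Settings

# Make the Optimum Intel model the process-wide default LLM; engines
# built afterwards will pick it up automatically.
Settings.llm = oi_llm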

LLM Implementation Example

https://docs.llamaindex.ai/en/stable/examples/llm/optimum_intel/
