# LlamaIndex LLMs Integration: Optimum Intel IPEX Backend
## Installation

To install the required packages, run the following (the `%` and `!` prefixes are Jupyter notebook magics; drop them when installing from a shell):

```bash
%pip install llama-index-llms-optimum-intel
!pip install llama-index
```
## Setup

### Define Functions for Prompt Handling

You will need functions to convert messages and completions into the prompt format the model expects:
```python
from llama_index.llms.optimum_intel import OptimumIntelLLM


def messages_to_prompt(messages):
    prompt = ""
    for message in messages:
        if message.role == "system":
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == "user":
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == "assistant":
            prompt += f"<|assistant|>\n{message.content}</s>\n"

    # Ensure we start with a system prompt, insert blank if needed
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    # Add final assistant prompt
    prompt = prompt + "<|assistant|>\n"

    return prompt


def completion_to_prompt(completion):
    return f"<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"
```
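As a quick sanity check (not part of the original example), you can feed a short conversation through `messages_to_prompt` and inspect the resulting string. This sketch assumes `llama-index` is installed and uses its `ChatMessage` class; the message contents are illustrative:

```python
from llama_index.core.llms import ChatMessage

# Hypothetical check: format a two-message conversation and print the result.
sample_messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Hello!"),
]
print(messages_to_prompt(sample_messages))
# Expected output:
# <|system|>
# You are a helpful assistant.</s>
# <|user|>
# Hello!</s>
# <|assistant|>
```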
## Model Loading

Models are loaded by passing the model and tokenizer names, generation parameters, and the prompt-formatting functions defined above to the `OptimumIntelLLM` constructor:
```python
oi_llm = OptimumIntelLLM(
    model_name="Intel/neural-chat-7b-v3-3",
    tokenizer_name="Intel/neural-chat-7b-v3-3",
    context_window=3900,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="cpu",
)

response = oi_llm.complete("What is the meaning of life?")
print(str(response))
```
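Because `OptimumIntelLLM` implements the standard LlamaIndex LLM interface, a non-streaming `chat` call works as well. The following is a minimal sketch; the messages are illustrative, not from the original docs:

```python
from llama_index.core.llms import ChatMessage

# Non-streaming chat call; messages_to_prompt formats the conversation.
chat_response = oi_llm.chat(
    [
        ChatMessage(role="system", content="You are a concise assistant."),
        ChatMessage(role="user", content="What does IPEX optimize?"),
    ]
)
print(chat_response)
```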
## Streaming Responses

To stream tokens as they are generated, use the `stream_complete` and `stream_chat` methods:
### Using `stream_complete`

```python
response = oi_llm.stream_complete("Who is Mother Teresa?")
for r in response:
    print(r.delta, end="")
```
### Using `stream_chat`

```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system",
        content="You are an American chef in a small restaurant in New Orleans",
    ),
    ChatMessage(role="user", content="What is your dish of the day?"),
]

resp = oi_llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
```
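Once the model behaves as expected, a common next step (not shown in the original docs) is to register it as the default LLM for other LlamaIndex components via the global `Settings` object. A minimal sketch:

```python
from llama_index.core import Settings

# Make oi_llm the default LLM for query engines, chat engines, etc.
# (Embedding models are configured separately on Settings.)
Settings.llm = oi_llm
```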
## LLM Implementation Example

For a complete worked example, see the LlamaIndex documentation: https://docs.llamaindex.ai/en/stable/examples/llm/optimum_intel/