LlamaIndex Llms Integration: Optimum Intel IPEX backend
Installation
To install the required packages, run:
%pip install llama-index-llms-optimum-intel
!pip install llama-index
Setup
Define Functions for Prompt Handling
You will need helper functions that convert chat messages and plain completions into the prompt format the model expects:
from llama_index.llms.optimum_intel import OptimumIntelLLM


def messages_to_prompt(messages):
    prompt = ""
    for message in messages:
        if message.role == "system":
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == "user":
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == "assistant":
            prompt += f"<|assistant|>\n{message.content}</s>\n"

    # Ensure we start with a system prompt, insert blank if needed
    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    # Add final assistant prompt
    prompt = prompt + "<|assistant|>\n"

    return prompt


def completion_to_prompt(completion):
    return f"<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"
Model Loading
Load a model by passing its parameters to the OptimumIntelLLM constructor:
oi_llm = OptimumIntelLLM(
    model_name="Intel/neural-chat-7b-v3-3",
    tokenizer_name="Intel/neural-chat-7b-v3-3",
    context_window=3900,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    device_map="cpu",
)

response = oi_llm.complete("What is the meaning of life?")
print(str(response))
Streaming Responses
To stream responses, use the stream_complete and stream_chat methods:
Using stream_complete
response = oi_llm.stream_complete("Who is Mother Teresa?")
for r in response:
    print(r.delta, end="")
Using stream_chat
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system",
        content="You are an American chef in a small restaurant in New Orleans",
    ),
    ChatMessage(role="user", content="What is your dish of the day?"),
]

resp = oi_llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
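The same messages list also works with the blocking chat method from the standard LlamaIndex LLM interface, if you prefer to receive the full reply at once (a supplementary example, not from the original notebook):

# Non-streaming chat: returns the complete response in one call.
resp = oi_llm.chat(messages)
print(resp)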
LLM Implementation example
https://docs.llamaindex.ai/en/stable/examples/llm/optimum_intel/