LlamaIndex Llms Integration: Ollama
Installation
To install the required package, run:
%pip install llama-index-llms-ollama
Setup
- Follow the Ollama README to set up and run a local Ollama instance.
- When the Ollama app is running on your local machine, it will serve all of your local models on localhost:11434 (a quick connectivity check is sketched just after this list).
- Select your model when creating the Ollama instance by specifying model="<model family>:<version>".
- You can increase the default timeout (30 seconds) by setting Ollama(..., request_timeout=300.0).
- If you set llm = Ollama(..., model="<model family>") without a version, it will automatically look for the latest version.
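Before wiring up LlamaIndex, it helps to confirm the server is reachable and that the model you want has been pulled (for example with the Ollama CLI: ollama pull llama3.1). A minimal sketch, assuming the default endpoint above:
import urllib.request

# The Ollama app listens on localhost:11434 by default; its root
# endpoint answers with a short status string when the server is up.
with urllib.request.urlopen("http://localhost:11434") as resp:
    print(resp.read().decode())  # expected: "Ollama is running"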
Usage
Initialize Ollama
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
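By default the client talks to localhost:11434. If your server runs elsewhere, recent releases also accept a base_url argument; a hedged sketch, since the parameter name is an assumption worth checking against your installed version:
# base_url is assumed to default to http://localhost:11434;
# override it to reach a remote or containerized Ollama server.
llm = Ollama(
    model="llama3.1:latest",
    request_timeout=120.0,
    base_url="http://localhost:11434",
)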
Generate Completions
To generate a text completion for a prompt, use the complete method:
resp = llm.complete("Who is Paul Graham?")
print(resp)
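complete returns a CompletionResponse rather than a plain string; printing it shows the generated text, which is also available directly:
# The response object carries the raw completion string.
print(resp.text)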
Chat Responses
To send a chat message and receive a response, create a list of ChatMessage instances and use the chat method:
from llama_index.core.llms import ChatMessage
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality."
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.chat(messages)
print(resp)
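chat returns a ChatResponse; the assistant's reply is carried on its message attribute if you want the content without the role prefix:
# Role and content of the assistant's reply.
print(resp.message.role)
print(resp.message.content)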
Streaming Responses
Stream Complete
To stream responses for a prompt, use the stream_complete method:
response = llm.stream_complete("Who is Paul Graham?")
for r in response:
    print(r.delta, end="")
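Each streamed chunk carries only the newly generated text in its delta attribute, so reassembling the full completion is simple concatenation:
# Rebuild the complete text from the streamed deltas.
full_text = ""
for r in llm.stream_complete("Who is Paul Graham?"):
    full_text += r.delta
print(full_text)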
Stream Chat
To stream chat responses, use the stream_chat method:
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality."
    ),
    ChatMessage(role="user", content="What is your name?"),
]

resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
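Streamed chat chunks typically also carry the accumulated reply so far on their message attribute, so the last chunk holds the whole answer; treat this as an assumption to verify on your version:
# Hedged: keep the final chunk and read the accumulated reply from it.
last = None
for r in llm.stream_chat(messages):
    last = r
print(last.message.content)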
JSON Mode
Ollama supports a JSON mode to ensure all responses are valid JSON, which is useful for tools that need to parse structured outputs:
llm = Ollama(model="llama3.1:latest", request_timeout=120.0, json_mode=True)
response = llm.complete(
    "Who is Paul Graham? Output as a structured JSON object."
)
print(str(response))
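Because json_mode constrains the model to emit valid JSON, the completion can be handed straight to a parser; a short follow-on sketch using the standard library:
import json

# In json_mode the completion text should parse cleanly.
data = json.loads(response.text)
print(data)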
Structured Outputs
You can attach a Pydantic class to the LLM to ensure structured outputs:
from llama_index.core.bridge.pydantic import BaseModel


class Song(BaseModel):
    """A song with name and artist."""

    name: str
    artist: str


llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
sllm = llm.as_structured_llm(Song)

response = sllm.chat([ChatMessage(role="user", content="Name a random song!")])
print(
    response.message.content
)  # e.g., {"name": "Yesterday", "artist": "The Beatles"}
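Depending on your LlamaIndex version, the validated Pydantic object is typically also exposed on the response (commonly via the raw attribute); the attribute name here is an assumption:
# Hedged: .raw is assumed to carry the parsed Song instance.
song = response.raw
print(song.name, song.artist)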
Asynchronous Chat
You can also use asynchronous chat:
response = await sllm.achat(
    [ChatMessage(role="user", content="Name a random song!")]
)
print(response.message.content)
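Top-level await as written above works in a notebook; in a plain script, drive it with an event loop. A sketch that also shows async streaming via astream_chat on the base client (assumed available in recent releases):
import asyncio

async def main():
    # Async structured chat, as above.
    response = await sllm.achat(
        [ChatMessage(role="user", content="Name a random song!")]
    )
    print(response.message.content)

    # Async streaming: awaiting astream_chat yields an async generator.
    gen = await llm.astream_chat(
        [ChatMessage(role="user", content="What is your name?")]
    )
    async for r in gen:
        print(r.delta, end="")

asyncio.run(main())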
LLM Implementation example
For a complete, runnable walkthrough, see the Ollama LLM example in the LlamaIndex documentation.