A framework for building LLM based AI agents with llama-cpp-python.
Project description
llama-cpp-agent Framework
Introduction
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It provides a simple yet robust interface using llama-cpp-python, allowing users to chat with LLM models, execute structured function calls and get structured output.
Key Features
- Simple Chat Interface: Engage in seamless conversations with LLMs.
- Structured Output: Get structured output from LLMs.
- Function Calling: Execute structured outputs from LLMs, enhancing the interaction capabilities.
- Flexibility: Suited for various applications from casual chatting to specific function executions.
Installation
The llama-cpp-agent framework can be installed using pip:
pip install llama-cpp-agent
Usage
The llama-cpp-agent framework is designed to be easy to use. The following sections will guide you through the process of using the framework.
Chat usage
To chat with an LLM model, you need to create an instance of the LlamaCppAgent
class. The constructor takes the following parameters:
main_model
: The LLM model to use for the chat. This is an instance of theLlama
class from the llama-cpp-python library.name
: The name of the agent. Defaults tollamacpp_agent
.system_prompt
: The system prompt to use for the chat. Defaults toYou are a helpful assistant.
.predefined_messages_formatter_type
: The type of predefined messages formatter to use. Defaults toMessagesFormatterType.CHATML
.debug_output
: Whether to print debug output to the console. Defaults toFalse
.
Predefined Messages Formatter
The llama-cpp-agent framework uses custom messages formatters to format messages for the LLM model. The MessagesFormatterType
enum defines the available predefined formatters. The following predefined formatters are available:
MessagesFormatterType.CHATML
: Formats messages using the CHATML format.MessagesFormatterType.MIXTRAL
: Formats messages using the MIXTRAL format.MessagesFormatterType.VICUNA
: Formats messages using the VICUNA format.MessagesFormatterType.LLAMA_2
: Formats messages using the LLAMA 2 format.MessagesFormatterType.SYNTHIA
: Formats messages using the SYNTHIA format.MessagesFormatterType.NEURAL_CHAT
: Formats messages using the NEURAL CHAT format.MessagesFormatterType.SOLAR
: Formats messages using the SOLAR format.MessagesFormatterType.OPEN_CHAT
: Formats messages using the OPEN CHAT format.
You can also define your own custom messages formatter by creating an instance of the MessagesFormatter
class.
The MessagesFormatter
class takes the following parameters:
PRE_PROMPT
: The pre-prompt to use for the messages.SYS_PROMPT_START
: The system prompt start to use for the messages.SYS_PROMPT_END
: The system prompt end to use for the messages.USER_PROMPT_START
: The user prompt start to use for the messages.USER_PROMPT_END
: The user prompt end to use for the messages.ASSISTANT_PROMPT_START
: The assistant prompt start to use for the messages.ASSISTANT_PROMPT_END
: The assistant prompt end to use for the messages.INCLUDE_SYS_PROMPT_IN_MESSAGE
: Whether to include the system prompt in the message.DEFAULT_STOP_SEQUENCES
: The default stop sequences to use for the messages.
After creating an instance of the MessagesFormatter
class, you can use it by setting the messages_formatter
of the LlamaCppAgent
instance to the instance of the MessagesFormatter
class.
Chatting
To chat with the LLM model, you can use the get_chat_response
method of the LlamaCppAgent
class. The get_chat_response
method takes the following parameters:
message
: The message to send to the LLM model. Defaults toNone
.role
: The role of the message. Defaults touser
.system_prompt
: A override for the system prompt. Defaults toNone
and uses the agent system prompt passed at creation.grammar
: The grammar to use for constraining the LLM response. Defaults toNone
.function_tool_registry
: The function tool registry to use for the chat. Defaults toNone
.max_tokens
: The maximum number of tokens to use for the chat. Defaults to0
.temperature
: The temperature to use for the chat. Defaults to0.4
.top_k
: The top k to use for the chat. Defaults to0
.top_p
: The top p to use for the chat. Defaults to1.0
.min_p
: The min p to use for the chat. Defaults to0.05
.typical_p
: The typical p to use for the chat. Defaults to1.0
.repeat_penalty
: The repeat penalty to use for the chat. Defaults to1.0
.mirostat_mode
: The mirostat mode to use for the chat. Defaults to0
.mirostat_tau
: The mirostat tau to use for the chat. Defaults to5.0
.mirostat_eta
: The mirostat eta to use for the chat. Defaults to0.1
.tfs_z
: The tfs z to use for the chat. Defaults to1.0
.stop_sequences
: The stop sequences to use for the chat. Defaults toNone
.stream
: Whether to stream the chat. Defaults toTrue
.k_last_messages
: The k last messages to use for the chat. Defaults to-1
which takes all messages in the chat history.add_response_to_chat_history
: Whether to add the response to the chat history. Defaults toTrue
.add_message_to_chat_history
: Whether to add the message to the chat history. Defaults toTrue
.print_output
: Whether to print the output. Defaults toTrue
.
Structured Output Usage
To get structured output from an LLM model, you can use an instance of the StructuredOutputAgent
class. The constructor takes the following parameters:
main_model
: The LLM model to use for the structured output. This is an instance of theLlama
class from the llama-cpp-python library.messages_formatter_type
: The type of messages formatter to use. Defaults toMessagesFormatterType.CHATML
.debug_output
: Whether to print debug output to the console. Defaults toFalse
.
To set a custom messages formatter, you can use the llama_cpp_agent.messages_formatter
property of the StructuredOutputAgent
class.
Structured Output
To create structured output from the LLM model, you can use the create_object
method of the StructuredOutputAgent
class. The create_object
method takes the following parameters:
cls
: The pydantic class used for creating the structured output.data
: The data to use for the structured output. Defaults toNone
which creates a random object of cls class.
This will return an instance of the pydantic class.
Function Calling Usage
To utilize function calling with an LLM model, you can use the get_chat_response method of a LlamaCppAgent
with a function_tool_registry
. The function_tool_registry
is an instance of the LlamaCppFunctionToolRegistry
class. You can create a LlamaCppFunctionToolRegistry
instance by passing a list of LlamaCppFunctionTool
instances to the static get_function_tool_registry
method of the LlamaCppAgent
class. The LlamaCppFunctionTool
class takes the following parameters:
model
: The pydantic class defining the function call, it must have arun
method to actually execute the function call. You can also convert Python functions with type hints, automatically to pydantic models using the function:create_dynamic_model_from_function
under:llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models
has_markdown_code_block
: Whether the model has amarkdown_code_block
field. Defaults toFalse
. Amarkdown_code_block
field is a special field used to allow the LLM to write relatively unconstrained output by letting it write themarkdown_code_block
as a Markdown code block. Which is useful for file writing.has_triple_quoted_string
: Whether the model has atriple_quoted_string
field. Defaults toFalse
. Atriple_quoted_string
field is a special field used to allow the LLM to write relatively unconstrained output by letting it write thetriple_quoted_string
as a triple quoted string. Which is useful for file writing. After passing the list ofLlamaCppFunctionTool
instances to theget_function_tool_registry
method, you can use the returnedLlamaCppFunctionToolRegistry
instance as thefunction_tool_registry
parameter of theget_chat_response
method of theLlamaCppAgent
class.
Usage Examples
The following examples demonstrate the usage of the llama-cpp-agent framework.
You can find a lot more examples in the examples
folder.
Simple Chat Example
This example demonstrates how to initiate a chat with an LLM model.
from llama_cpp import Llama
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.messages_formatter import MessagesFormatterType
main_model = Llama(
"../gguf-models/dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf",
n_gpu_layers=13,
f16_kv=True,
use_mlock=False,
embedding=False,
n_threads=8,
n_batch=1024,
n_ctx=8192,
last_n_tokens_size=1024,
verbose=False,
seed=42,
)
llama_cpp_agent = LlamaCppAgent(main_model, debug_output=False,
system_prompt="You are Dolphin, an uncensored and unbiased AI assistant.", predefined_messages_formatter_type=MessagesFormatterType.CHATML)
while True:
user_input = input("User: ")
if user_input == "exit":
break
user_input = llama_cpp_agent.get_chat_response(user_input, temperature=0.7)
print("AI: " + user_input)
Structured Output
This example shows how to get structured JSON output using the StructureOutputAgent class.
# Example agent that uses the StructuredOutputAgent class to create a dataset entry of a book out of unstructured data.
from enum import Enum
from llama_cpp import Llama
from pydantic import BaseModel, Field
from llama_cpp_agent.structured_output_agent import StructuredOutputAgent
# Example enum for our output model
class Category(Enum):
Fiction = "Fiction"
NonFiction = "Non-Fiction"
# Example output model
class Book(BaseModel):
"""
Represents an entry about a book.
"""
title: str = Field(..., description="Title of the book.")
author: str = Field(..., description="Author of the book.")
published_year: int = Field(..., description="Publishing year of the book.")
keywords: list[str] = Field(..., description="A list of keywords.")
category: Category = Field(..., description="Category of the book.")
summary: str = Field(..., description="Summary of the book.")
main_model = Llama(
"../gguf-models/nous-hermes-2-solar-10.7b.Q6_K.gguf",
n_gpu_layers=49,
offload_kqv=True,
f16_kv=True,
use_mlock=False,
embedding=False,
n_threads=8,
n_batch=1024,
n_ctx=4096,
last_n_tokens_size=1024,
verbose=False,
seed=42,
)
structured_output_agent = StructuredOutputAgent(main_model, debug_output=True)
text = """The Feynman Lectures on Physics is a physics textbook based on some lectures by Richard Feynman, a Nobel laureate who has sometimes been called "The Great Explainer". The lectures were presented before undergraduate students at the California Institute of Technology (Caltech), during 1961–1963. The book's co-authors are Feynman, Robert B. Leighton, and Matthew Sands."""
print(structured_output_agent.create_object(Book, text))
Example output
{ "title": "The Feynman Lectures on Physics" , "author": "Richard Feynman, Robert B. Leighton, Matthew Sands" , "published_year": 1963 , "keywords": [ "physics" , "textbook" , "Nobel laureate" , "The Great Explainer" , "California Institute of Technology" , "undergraduate" , "lectures" ] , "category": "Non-Fiction" , "summary": "The Feynman Lectures on Physics is a physics textbook based on lectures by Nobel laureate Richard Feynman, known as 'The Great Explainer'. The lectures were presented to undergraduate students at Caltech between 1961 and 1963. Co-authors of the book are Feynman, Robert B. Leighton, and Matthew Sands." }
title='The Feynman Lectures on Physics' author='Richard Feynman, Robert B. Leighton, Matthew Sands' published_year=1963 keywords=['physics', 'textbook', 'Nobel laureate', 'The Great Explainer', 'California Institute of Technology', 'undergraduate', 'lectures'] category=<Category.NonFiction: 'Non-Fiction'> summary="The Feynman Lectures on Physics is a physics textbook based on lectures by Nobel laureate Richard Feynman, known as 'The Great Explainer'. The lectures were presented to undergraduate students at Caltech between 1961 and 1963. Co-authors of the book are Feynman, Robert B. Leighton, and Matthew Sands."
Function Calling Example
This example shows how to do function calling pydantic models.
You can also convert Python functions with type hints, automatically to pydantic models using the function:
create_dynamic_model_from_function
under: llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models
from enum import Enum
from llama_cpp import Llama
from pydantic import BaseModel, Field
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.messages_formatter import MessagesFormatterType
from llama_cpp_agent.function_call_tools import LlamaCppFunctionTool
# Simple calculator tool for the agent that can add, subtract, multiply, and divide.
class MathOperation(Enum):
ADD = "add"
SUBTRACT = "subtract"
MULTIPLY = "multiply"
DIVIDE = "divide"
class Calculator(BaseModel):
"""
Perform a math operation on two numbers.
"""
number_one: float = Field(..., description="First number.", max_precision=5, min_precision=2)
operation: MathOperation = Field(..., description="Math operation to perform.")
number_two: float = Field(..., description="Second number.", max_precision=5, min_precision=2)
def run(self):
if self.operation == MathOperation.ADD:
return self.number_one + self.number_two
elif self.operation == MathOperation.SUBTRACT:
return self.number_one - self.number_two
elif self.operation == MathOperation.MULTIPLY:
return self.number_one * self.number_two
elif self.operation == MathOperation.DIVIDE:
return self.number_one / self.number_two
else:
raise ValueError("Unknown operation.")
function_tools = [LlamaCppFunctionTool(Calculator)]
function_tool_registry = LlamaCppAgent.get_function_tool_registry(function_tools)
main_model = Llama(
"../gguf-models/dolphin-2.6-mistral-7b-Q8_0.gguf",
n_gpu_layers=35,
f16_kv=True,
use_mlock=False,
embedding=False,
n_threads=8,
n_batch=1024,
n_ctx=8192,
last_n_tokens_size=1024,
verbose=False,
seed=42,
)
llama_cpp_agent = LlamaCppAgent(main_model, debug_output=False,
system_prompt="You are an advanced AI, tasked to assist the user by calling functions in JSON format.\n\n\n" + function_tool_registry.get_documentation(),
predefined_messages_formatter_type=MessagesFormatterType.CHATML)
user_input = 'What is 42 * 42?'
print(llama_cpp_agent.get_chat_response(user_input, temperature=0.45, function_tool_registry=function_tool_registry))
Example output
{ "function": "calculator","function_parameters": { "number_one": 42.00000 , "operation": "multiply" , "number_two": 42.00000 }}
1764.0
Function Calling with Python Function Example
This example shows how to do function calling using actual Python functions.
from llama_cpp import Llama
from typing import Union
import math
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.messages_formatter import MessagesFormatterType
from llama_cpp_agent.function_call_tools import LlamaCppFunctionTool
from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models import create_dynamic_model_from_function
def calculate_a_to_the_power_b(a: Union[int | float], b: Union[int | float]):
print(f"Result: {math.pow(a, b)}")
DynamicSampleModel = create_dynamic_model_from_function(calculate_a_to_the_power_b)
function_tools = [LlamaCppFunctionTool(DynamicSampleModel)]
function_tool_registry = LlamaCppAgent.get_function_tool_registry(function_tools)
main_model = Llama(
"../gguf-models/dolphin-2.6-mistral-7b-Q8_0.gguf",
n_gpu_layers=35,
f16_kv=True,
use_mlock=False,
embedding=False,
n_threads=8,
n_batch=1024,
n_ctx=8192,
last_n_tokens_size=1024,
verbose=False,
seed=42,
)
llama_cpp_agent = LlamaCppAgent(main_model, debug_output=False,
system_prompt="You are an advanced AI, tasked to assist the user by calling functions in JSON format.\n\n\n" + function_tool_registry.get_documentation(),
predefined_messages_formatter_type=MessagesFormatterType.CHATML)
user_input = "a= 5, b = 42"
print(llama_cpp_agent.get_chat_response(user_input, temperature=0.45, function_tool_registry=function_tool_registry))
Example output
{ "function": "calculate-a-to-the-power-b","function_parameters": { "a": 5 , "b": 42 }}
Result: 2.2737367544323207e+29
Knowledge Graph Creation Example
This example, based on an example of the Instructor library for OpenAI, demonstrates how to create a knowledge graph using the llama-cpp-agent framework.
import json
from typing import List
from enum import Enum
from llama_cpp import Llama, LlamaGrammar
from pydantic import BaseModel, Field
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models import generate_gbnf_grammar_and_documentation
main_model = Llama(
"../gguf-models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
n_gpu_layers=13,
f16_kv=True,
use_mlock=False,
embedding=False,
n_threads=8,
n_batch=1024,
n_ctx=8192,
last_n_tokens_size=1024,
verbose=True,
seed=42,
)
class Node(BaseModel):
id: int
label: str
color: str
class Edge(BaseModel):
source: int
target: int
label: str
color: str = "black"
class KnowledgeGraph(BaseModel):
nodes: List[Node] = Field(..., default_factory=list)
edges: List[Edge] = Field(..., default_factory=list)
gbnf_grammar, documentation = generate_gbnf_grammar_and_documentation([KnowledgeGraph],False)
print(gbnf_grammar)
grammar = LlamaGrammar.from_string(gbnf_grammar, verbose=True)
llama_cpp_agent = LlamaCppAgent(main_model, debug_output=True,
system_prompt="You are an advanced AI assistant responding in JSON format.\n\nAvailable JSON response models:\n\n" + documentation)
from graphviz import Digraph
def visualize_knowledge_graph(kg: KnowledgeGraph):
dot = Digraph(comment="Knowledge Graph")
# Add nodes
for node in kg.nodes:
dot.node(str(node.id), node.label, color=node.color)
# Add edges
for edge in kg.edges:
dot.edge(str(edge.source), str(edge.target), label=edge.label, color=edge.color)
# Render the graph
dot.render("knowledge_graph.gv", view=True)
def generate_graph(user_input: str) -> KnowledgeGraph:
prompt = f'''Help me understand the following by describing it as a detailed knowledge graph: {user_input}'''.strip()
response = llama_cpp_agent.get_chat_response(message=prompt, temperature=0.65, mirostat_mode=0, mirostat_tau=3.0,
mirostat_eta=0.1, grammar=grammar)
knowledge_graph = json.loads(response)
cls = KnowledgeGraph
knowledge_graph = cls(**knowledge_graph)
return knowledge_graph
graph = generate_graph("Teach me about quantum mechanics")
visualize_knowledge_graph(graph)
Example Output:
Additional Information
- Dependencies: pydantic for grammars based generation and of course llama-cpp-python.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for llama_cpp_agent-0.0.4-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4d52b9d11c9f5619d459b9529adb3235425359484d38b65aeb924b976be36bbf |
|
MD5 | 477da63eeec20e1cffef5e1458647b9b |
|
BLAKE2b-256 | 0fb9393c39373816c72a41421207c5f007b42f08821df5b41f3d0491d588240e |