A framework for building LLM-based AI agents with llama-cpp-python.
llama-cpp-agent Framework
Introduction
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It provides a simple yet robust interface built on llama-cpp-python, allowing users to chat with LLMs, execute structured function calls, and get structured output.
Key Features
- Simple Chat Interface: Engage in seamless conversations with LLMs.
- Structured Output: Get structured output from LLMs.
- Function Calling: Let the LLM call functions by generating structured output that the framework executes, enhancing its interaction capabilities.
- Flexibility: Suited for various applications from casual chatting to specific function executions.
Installation
The llama-cpp-agent framework can be installed using pip:
pip install llama-cpp-agent
Usage
The llama-cpp-agent framework is designed to be easy to use. The following sections will guide you through the process of using the framework.
Chat usage
To chat with an LLM model, you need to create an instance of the LlamaCppAgent class. The constructor takes the following parameters:
- main_model: The LLM model to use for the chat. This is an instance of the Llama class from the llama-cpp-python library.
- name: The name of the agent. Defaults to llamacpp_agent.
- system_prompt: The system prompt to use for the chat. Defaults to "You are a helpful assistant.".
- predefined_messages_formatter_type: The type of predefined messages formatter to use. Defaults to MessagesFormatterType.CHATML.
- debug_output: Whether to print debug output to the console. Defaults to False.
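For example, a minimal agent setup might look like this (a sketch only; the model path is a placeholder and the settings are illustrative, see the complete examples below):

from llama_cpp import Llama
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.messages_formatter import MessagesFormatterType

# Load a local GGUF model (placeholder path).
main_model = Llama("path/to/your-model.gguf", n_ctx=4096)

# Create the agent with an explicit name, system prompt and prompt format.
llama_cpp_agent = LlamaCppAgent(
    main_model,
    name="my_agent",
    system_prompt="You are a helpful assistant.",
    predefined_messages_formatter_type=MessagesFormatterType.CHATML,
    debug_output=False,
)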
Predefined Messages Formatter
The llama-cpp-agent framework uses custom messages formatters to format messages for the LLM model. The MessagesFormatterType enum defines the available predefined formatters. The following predefined formatters are available:
- MessagesFormatterType.CHATML: Formats messages using the ChatML format.
- MessagesFormatterType.MIXTRAL: Formats messages using the Mixtral format.
- MessagesFormatterType.VICUNA: Formats messages using the Vicuna format.
- MessagesFormatterType.LLAMA_2: Formats messages using the Llama 2 format.
- MessagesFormatterType.SYNTHIA: Formats messages using the Synthia format.
- MessagesFormatterType.NEURAL_CHAT: Formats messages using the Neural Chat format.
- MessagesFormatterType.SOLAR: Formats messages using the Solar format.
- MessagesFormatterType.OPEN_CHAT: Formats messages using the OpenChat format.
You can also define your own custom messages formatter by creating an instance of the MessagesFormatter class.
The MessagesFormatter class takes the following parameters:
- PRE_PROMPT: The pre-prompt to place before the messages.
- SYS_PROMPT_START: The text that starts the system prompt.
- SYS_PROMPT_END: The text that ends the system prompt.
- USER_PROMPT_START: The text that starts a user message.
- USER_PROMPT_END: The text that ends a user message.
- ASSISTANT_PROMPT_START: The text that starts an assistant message.
- ASSISTANT_PROMPT_END: The text that ends an assistant message.
- INCLUDE_SYS_PROMPT_IN_MESSAGE: Whether to include the system prompt in the message.
- DEFAULT_STOP_SEQUENCES: The default stop sequences to use with this format.
After creating an instance of the MessagesFormatter class, you can use it by setting the messages_formatter attribute of the LlamaCppAgent instance to your MessagesFormatter instance, as shown in the sketch below.
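A hedged sketch of a custom ChatML-like formatter, assuming the constructor accepts the parameters above in the listed order (verify the exact signature against the messages_formatter module of your installed version):

from llama_cpp_agent.messages_formatter import MessagesFormatter

# Arguments follow the parameter list above, in order.
custom_chatml_formatter = MessagesFormatter(
    "",                          # PRE_PROMPT
    "<|im_start|>system\n",      # SYS_PROMPT_START
    "<|im_end|>\n",              # SYS_PROMPT_END
    "<|im_start|>user\n",        # USER_PROMPT_START
    "<|im_end|>\n",              # USER_PROMPT_END
    "<|im_start|>assistant\n",   # ASSISTANT_PROMPT_START
    "<|im_end|>\n",              # ASSISTANT_PROMPT_END
    False,                       # INCLUDE_SYS_PROMPT_IN_MESSAGE
    ["<|im_end|>"],              # DEFAULT_STOP_SEQUENCES
)

# Assign it to an existing LlamaCppAgent instance.
llama_cpp_agent.messages_formatter = custom_chatml_formatter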
Chatting
To chat with the LLM model, you can use the get_chat_response method of the LlamaCppAgent class. The get_chat_response method takes the following parameters:
- message: The message to send to the LLM model. Defaults to None.
- role: The role of the message. Defaults to "user".
- system_prompt: An override for the system prompt. Defaults to None, which uses the system prompt the agent was created with.
- grammar: The grammar used to constrain the LLM response. Defaults to None.
- function_tool_registry: The function tool registry to use for the chat. Defaults to None.
- max_tokens: The maximum number of tokens to generate. Defaults to 0.
- temperature: The sampling temperature. Defaults to 0.4.
- top_k: The top-k sampling value. Defaults to 0.
- top_p: The top-p sampling value. Defaults to 1.0.
- min_p: The min-p sampling value. Defaults to 0.05.
- typical_p: The typical-p sampling value. Defaults to 1.0.
- repeat_penalty: The repetition penalty. Defaults to 1.0.
- mirostat_mode: The mirostat mode. Defaults to 0.
- mirostat_tau: The mirostat tau value. Defaults to 5.0.
- mirostat_eta: The mirostat eta value. Defaults to 0.1.
- tfs_z: The tail-free sampling z value. Defaults to 1.0.
- stop_sequences: The stop sequences to use for the chat. Defaults to None.
- stream: Whether to stream the response. Defaults to True.
- k_last_messages: The number of most recent messages to include in the prompt. Defaults to -1, which includes all messages in the chat history.
- add_response_to_chat_history: Whether to add the response to the chat history. Defaults to True.
- add_message_to_chat_history: Whether to add the message to the chat history. Defaults to True.
- print_output: Whether to print the output. Defaults to True.
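For example, a non-streaming call that overrides the system prompt and limits the history window could look like this (a minimal sketch; it assumes a llama_cpp_agent instance created as shown above):

# Hypothetical call illustrating a few of the parameters above.
response = llama_cpp_agent.get_chat_response(
    "Summarize our conversation so far in one sentence.",
    system_prompt="You are a concise assistant.",  # overrides the agent's default system prompt
    temperature=0.4,
    stop_sequences=["<|im_end|>"],
    k_last_messages=4,      # only the last 4 messages of the chat history are used
    stream=False,           # return the full response instead of streaming it
    print_output=False,
)
print(response)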
Structured Output Usage
To get structured output from an LLM model, you can use an instance of the StructuredOutputAgent class. The constructor takes the following parameters:
- main_model: The LLM model to use for the structured output. This is an instance of the Llama class from the llama-cpp-python library.
- messages_formatter_type: The type of messages formatter to use. Defaults to MessagesFormatterType.CHATML.
- debug_output: Whether to print debug output to the console. Defaults to False.
To set a custom messages formatter, you can use the llama_cpp_agent.messages_formatter property of the StructuredOutputAgent class.
Structured Output
To create structured output from the LLM model, you can use the create_object method of the StructuredOutputAgent class. The create_object method takes the following parameters:
- cls: The pydantic class used for creating the structured output.
- data: The data to use for the structured output. Defaults to None, which creates a random object of the cls class.
This will return an instance of the pydantic class.
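For illustration, a hedged sketch that extracts a structured object from text and also generates a random one (the Song model and the input text are made up; main_model is a loaded Llama instance as in the examples below):

from pydantic import BaseModel, Field
from llama_cpp_agent.structured_output_agent import StructuredOutputAgent

class Song(BaseModel):
    """Represents an entry about a song."""
    title: str = Field(..., description="Title of the song.")
    artist: str = Field(..., description="Artist of the song.")
    year: int = Field(..., description="Release year of the song.")

structured_output_agent = StructuredOutputAgent(main_model)

# With data: the fields are extracted from the given text.
song = structured_output_agent.create_object(Song, "Bohemian Rhapsody was released by Queen in 1975.")

# Without data: a random, schema-valid Song instance is generated.
random_song = structured_output_agent.create_object(Song)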
Function Calling Usage
To utilize function calling with an LLM model, you can use the get_chat_response method of a LlamaCppAgent with a function_tool_registry. The function_tool_registry is an instance of the LlamaCppFunctionToolRegistry class. You can create a LlamaCppFunctionToolRegistry instance by passing a list of LlamaCppFunctionTool instances to the static get_function_tool_registry method of the LlamaCppAgent class. The LlamaCppFunctionTool class takes the following parameters:
- model: The pydantic class defining the function call; it must have a run method that actually executes the function call. You can also automatically convert Python functions with type hints to pydantic models using the create_dynamic_model_from_function function from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models.
- has_markdown_code_block: Whether the model has a markdown_code_block field. Defaults to False. A markdown_code_block field is a special field that lets the LLM write relatively unconstrained output as a Markdown code block, which is useful for writing files (see the sketch below).
- has_triple_quoted_string: Whether the model has a triple_quoted_string field. Defaults to False. A triple_quoted_string field is a special field that lets the LLM write relatively unconstrained output as a triple-quoted string, which is also useful for writing files.

After passing the list of LlamaCppFunctionTool instances to the get_function_tool_registry method, you can use the returned LlamaCppFunctionToolRegistry instance as the function_tool_registry parameter of the get_chat_response method of the LlamaCppAgent class.
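To illustrate the markdown_code_block mechanism, here is a hedged sketch of a file-writing tool; the field name and the has_markdown_code_block flag follow the description above, but the exact calling convention may differ between versions:

from pydantic import BaseModel, Field
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.function_call_tools import LlamaCppFunctionTool

class WriteTextFile(BaseModel):
    """
    Write text content to a file.
    """
    file_path: str = Field(..., description="Path of the file to write.")
    markdown_code_block: str = Field(..., description="The file content, written by the LLM as a Markdown code block.")

    def run(self):
        # The markdown_code_block field holds the relatively unconstrained content.
        with open(self.file_path, "w", encoding="utf-8") as f:
            f.write(self.markdown_code_block)
        return f"Wrote {self.file_path}"

function_tools = [LlamaCppFunctionTool(WriteTextFile, has_markdown_code_block=True)]
function_tool_registry = LlamaCppAgent.get_function_tool_registry(function_tools)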
Usage Examples
The following examples demonstrate the usage of the llama-cpp-agent framework.
You can find many more examples in the examples folder of the repository.
Simple Chat Example
This example demonstrates how to initiate a chat with an LLM model.
from llama_cpp import Llama
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.messages_formatter import MessagesFormatterType
main_model = Llama(
    "../gguf-models/dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf",
    n_gpu_layers=13,
    f16_kv=True,
    use_mlock=False,
    embedding=False,
    n_threads=8,
    n_batch=1024,
    n_ctx=8192,
    last_n_tokens_size=1024,
    verbose=False,
    seed=42,
)

llama_cpp_agent = LlamaCppAgent(
    main_model,
    debug_output=False,
    system_prompt="You are Dolphin, an uncensored and unbiased AI assistant.",
    predefined_messages_formatter_type=MessagesFormatterType.CHATML,
)

while True:
    user_input = input("User: ")
    if user_input == "exit":
        break
    response = llama_cpp_agent.get_chat_response(user_input, temperature=0.7)
    print("AI: " + response)
Structured Output
This example shows how to get structured JSON output using the StructuredOutputAgent class.
# Example agent that uses the StructuredOutputAgent class to create a dataset entry of a book out of unstructured data.
from enum import Enum
from llama_cpp import Llama
from pydantic import BaseModel, Field
from llama_cpp_agent.structured_output_agent import StructuredOutputAgent
# Example enum for our output model
class Category(Enum):
    Fiction = "Fiction"
    NonFiction = "Non-Fiction"

# Example output model
class Book(BaseModel):
    """
    Represents an entry about a book.
    """
    title: str = Field(..., description="Title of the book.")
    author: str = Field(..., description="Author of the book.")
    published_year: int = Field(..., description="Publishing year of the book.")
    keywords: list[str] = Field(..., description="A list of keywords.")
    category: Category = Field(..., description="Category of the book.")
    summary: str = Field(..., description="Summary of the book.")

main_model = Llama(
    "../gguf-models/nous-hermes-2-solar-10.7b.Q6_K.gguf",
    n_gpu_layers=49,
    offload_kqv=True,
    f16_kv=True,
    use_mlock=False,
    embedding=False,
    n_threads=8,
    n_batch=1024,
    n_ctx=4096,
    last_n_tokens_size=1024,
    verbose=False,
    seed=42,
)
structured_output_agent = StructuredOutputAgent(main_model, debug_output=True)
text = """The Feynman Lectures on Physics is a physics textbook based on some lectures by Richard Feynman, a Nobel laureate who has sometimes been called "The Great Explainer". The lectures were presented before undergraduate students at the California Institute of Technology (Caltech), during 1961–1963. The book's co-authors are Feynman, Robert B. Leighton, and Matthew Sands."""
print(structured_output_agent.create_object(Book, text))
Example output
{ "title": "The Feynman Lectures on Physics" , "author": "Richard Feynman, Robert B. Leighton, Matthew Sands" , "published_year": 1963 , "keywords": [ "physics" , "textbook" , "Nobel laureate" , "The Great Explainer" , "California Institute of Technology" , "undergraduate" , "lectures" ] , "category": "Non-Fiction" , "summary": "The Feynman Lectures on Physics is a physics textbook based on lectures by Nobel laureate Richard Feynman, known as 'The Great Explainer'. The lectures were presented to undergraduate students at Caltech between 1961 and 1963. Co-authors of the book are Feynman, Robert B. Leighton, and Matthew Sands." }
title='The Feynman Lectures on Physics' author='Richard Feynman, Robert B. Leighton, Matthew Sands' published_year=1963 keywords=['physics', 'textbook', 'Nobel laureate', 'The Great Explainer', 'California Institute of Technology', 'undergraduate', 'lectures'] category=<Category.NonFiction: 'Non-Fiction'> summary="The Feynman Lectures on Physics is a physics textbook based on lectures by Nobel laureate Richard Feynman, known as 'The Great Explainer'. The lectures were presented to undergraduate students at Caltech between 1961 and 1963. Co-authors of the book are Feynman, Robert B. Leighton, and Matthew Sands."
Function Calling Example
This example shows how to do function calling with pydantic models.
You can also automatically convert Python functions with type hints to pydantic models using the create_dynamic_model_from_function function from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models.
from enum import Enum
from llama_cpp import Llama
from pydantic import BaseModel, Field
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.messages_formatter import MessagesFormatterType
from llama_cpp_agent.function_call_tools import LlamaCppFunctionTool
# Simple calculator tool for the agent that can add, subtract, multiply, and divide.
class MathOperation(Enum):
    ADD = "add"
    SUBTRACT = "subtract"
    MULTIPLY = "multiply"
    DIVIDE = "divide"

class Calculator(BaseModel):
    """
    Perform a math operation on two numbers.
    """
    number_one: float = Field(..., description="First number.", max_precision=5, min_precision=2)
    operation: MathOperation = Field(..., description="Math operation to perform.")
    number_two: float = Field(..., description="Second number.", max_precision=5, min_precision=2)

    def run(self):
        if self.operation == MathOperation.ADD:
            return self.number_one + self.number_two
        elif self.operation == MathOperation.SUBTRACT:
            return self.number_one - self.number_two
        elif self.operation == MathOperation.MULTIPLY:
            return self.number_one * self.number_two
        elif self.operation == MathOperation.DIVIDE:
            return self.number_one / self.number_two
        else:
            raise ValueError("Unknown operation.")
function_tools = [LlamaCppFunctionTool(Calculator)]
function_tool_registry = LlamaCppAgent.get_function_tool_registry(function_tools)
main_model = Llama(
    "../gguf-models/dolphin-2.6-mistral-7b-Q8_0.gguf",
    n_gpu_layers=35,
    f16_kv=True,
    use_mlock=False,
    embedding=False,
    n_threads=8,
    n_batch=1024,
    n_ctx=8192,
    last_n_tokens_size=1024,
    verbose=False,
    seed=42,
)

llama_cpp_agent = LlamaCppAgent(
    main_model,
    debug_output=False,
    system_prompt="You are an advanced AI, tasked to assist the user by calling functions in JSON format.\n\n\n" + function_tool_registry.get_documentation(),
    predefined_messages_formatter_type=MessagesFormatterType.CHATML,
)
user_input = 'What is 42 * 42?'
print(llama_cpp_agent.get_chat_response(user_input, temperature=0.45, function_tool_registry=function_tool_registry))
Example output
{ "function": "calculator","function_parameters": { "number_one": 42.00000 , "operation": "multiply" , "number_two": 42.00000 }}
1764.0
Function Calling with Python Function Example
This example shows how to do function calling using actual Python functions.
from llama_cpp import Llama
from typing import Union
import math
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.messages_formatter import MessagesFormatterType
from llama_cpp_agent.function_call_tools import LlamaCppFunctionTool
from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models import create_dynamic_model_from_function
def calculate_a_to_the_power_b(a: Union[int, float], b: Union[int, float]):
    print(f"Result: {math.pow(a, b)}")

DynamicSampleModel = create_dynamic_model_from_function(calculate_a_to_the_power_b)

function_tools = [LlamaCppFunctionTool(DynamicSampleModel)]
function_tool_registry = LlamaCppAgent.get_function_tool_registry(function_tools)

main_model = Llama(
    "../gguf-models/dolphin-2.6-mistral-7b-Q8_0.gguf",
    n_gpu_layers=35,
    f16_kv=True,
    use_mlock=False,
    embedding=False,
    n_threads=8,
    n_batch=1024,
    n_ctx=8192,
    last_n_tokens_size=1024,
    verbose=False,
    seed=42,
)

llama_cpp_agent = LlamaCppAgent(
    main_model,
    debug_output=False,
    system_prompt="You are an advanced AI, tasked to assist the user by calling functions in JSON format.\n\n\n" + function_tool_registry.get_documentation(),
    predefined_messages_formatter_type=MessagesFormatterType.CHATML,
)
user_input = "a= 5, b = 42"
print(llama_cpp_agent.get_chat_response(user_input, temperature=0.45, function_tool_registry=function_tool_registry))
Example output
{ "function": "calculate-a-to-the-power-b","function_parameters": { "a": 5 , "b": 42 }}
Result: 2.2737367544323207e+29
Knowledge Graph Creation Example
This example, based on an example from the Instructor library for OpenAI, demonstrates how to create a knowledge graph using the llama-cpp-agent framework.
import json
from typing import List
from enum import Enum
from llama_cpp import Llama, LlamaGrammar
from pydantic import BaseModel, Field
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models import generate_gbnf_grammar_and_documentation
main_model = Llama(
    "../gguf-models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
    n_gpu_layers=13,
    f16_kv=True,
    use_mlock=False,
    embedding=False,
    n_threads=8,
    n_batch=1024,
    n_ctx=8192,
    last_n_tokens_size=1024,
    verbose=True,
    seed=42,
)
class Node(BaseModel):
    id: int
    label: str
    color: str

class Edge(BaseModel):
    source: int
    target: int
    label: str
    color: str = "black"

class KnowledgeGraph(BaseModel):
    nodes: List[Node] = Field(default_factory=list)
    edges: List[Edge] = Field(default_factory=list)

gbnf_grammar, documentation = generate_gbnf_grammar_and_documentation([KnowledgeGraph], False)
print(gbnf_grammar)
grammar = LlamaGrammar.from_string(gbnf_grammar, verbose=True)

llama_cpp_agent = LlamaCppAgent(
    main_model,
    debug_output=True,
    system_prompt="You are an advanced AI assistant responding in JSON format.\n\nAvailable JSON response models:\n\n" + documentation,
)
from graphviz import Digraph
def visualize_knowledge_graph(kg: KnowledgeGraph):
    dot = Digraph(comment="Knowledge Graph")
    # Add nodes
    for node in kg.nodes:
        dot.node(str(node.id), node.label, color=node.color)
    # Add edges
    for edge in kg.edges:
        dot.edge(str(edge.source), str(edge.target), label=edge.label, color=edge.color)
    # Render the graph
    dot.render("knowledge_graph.gv", view=True)

def generate_graph(user_input: str) -> KnowledgeGraph:
    prompt = f"Help me understand the following by describing it as a detailed knowledge graph: {user_input}"
    response = llama_cpp_agent.get_chat_response(
        message=prompt, temperature=0.65, mirostat_mode=0, mirostat_tau=3.0,
        mirostat_eta=0.1, grammar=grammar
    )
    return KnowledgeGraph(**json.loads(response))
graph = generate_graph("Teach me about quantum mechanics")
visualize_knowledge_graph(graph)
Example Output: the generated knowledge graph is rendered to knowledge_graph.gv and displayed as an image (not reproduced here).
Additional Information
- Dependencies: pydantic for grammar-based generation and, of course, llama-cpp-python.