
AutoGen Llama-CPP Chat Completion Extension

This extension provides a chat completion client backed by llama.cpp models. It integrates with the AutoGen ecosystem, enabling AI-powered chat completion with access to external tools.

Installation

To install the autogen-llama-cpp-chat-completion extension, run the following command:

pip install autogen-llama-cpp-chat-completion

Dependencies

  • autogen-core>=0.4,<0.5
  • pydantic
  • llama-cpp-python

Usage

Once installed, you can integrate this extension into your AutoGen system for chat-based completions using a llama.cpp model.

Example Usage

Here’s an example of how you can use the extension to create a chat session with Llama-CPP:

from autogen_llama_cpp_chat_completion.llama_cpp_extension import LlamaCppChatCompletionClient
from autogen_core.models import SystemMessage, UserMessage

# Initialize the LlamaCpp client
client = LlamaCppChatCompletionClient(
    repo_id="your_repo_id", 
    filename="path_to_model_file", 
    n_gpu_layers=-1, 
    seed=1337, 
    n_ctx=1000, 
    verbose=True
)

# Create chat messages
messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="What is the capital of France?")
]

# Get a response from the model
result = await client.create(messages)

# Print the result
print(result.content)
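
Note that these snippets use await at the top level, which works in an interactive session such as Jupyter. In a plain Python script, wrap the calls in a coroutine and run it with asyncio.run, for example (a minimal sketch reusing the client and messages defined above):

import asyncio

async def main() -> None:
    # `client` and `messages` are the objects built in the example above.
    result = await client.create(messages)
    print(result.content)

asyncio.run(main())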

Phi-4 Model Example

You can also use the Phi-4 model for chat completions. Here's an example of how to integrate it with the extension:

from autogen_llama_cpp_chat_completion.llama_cpp_extension import LlamaCppChatCompletionClient
from autogen_core.models import SystemMessage, UserMessage

# Initialize the Phi-4 LlamaCpp client
model_client = LlamaCppChatCompletionClient(
    repo_id="unsloth/phi-4-GGUF",
    filename="phi-4-Q2_K_L.gguf",
    n_gpu_layers=-1,
    seed=1337,
    n_ctx=16384,
    verbose=False,
)

# Create chat messages
messages = [
    SystemMessage(content="You are an assistant with the Phi-4 model."),
    UserMessage(content="What is the latest breakthrough in AI research?")
]

# Get a response from the model
result = await model_client.create(messages)

# Print the result
print(result.content)

This example uses the Phi-4 model from the "unsloth/phi-4-GGUF" repository with a larger context window (n_ctx=16384).

Streaming Mode

You can also use streaming mode to generate responses incrementally:

response_generator = client.create_stream(messages)

# Iterate through the response stream, printing tokens as they arrive
async for token in response_generator:
    print(token, end="", flush=True)
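
To keep the full reply, accumulate the streamed chunks as they arrive (a sketch that assumes, as in the example above, that create_stream yields text chunks):

# Collect the streamed chunks into a single reply string.
# Reuses `client` and `messages` from the earlier examples.
chunks = []
async for token in client.create_stream(messages):
    chunks.append(token)

reply = "".join(chunks)
print(reply)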

Configuration

When initializing the LlamaCppChatCompletionClient, you can provide the following configuration parameters (a CPU-only example follows the list):

  • repo_id: The Hugging Face repository ID of the model to use (e.g. "unsloth/phi-4-GGUF").
  • filename: The name of the GGUF model file within that repository.
  • n_gpu_layers: The number of model layers to offload to the GPU; -1 offloads all layers (default: -1).
  • seed: The random seed used for generation (default: 1337).
  • n_ctx: The context window size in tokens (default: 1000).
  • verbose: Whether to print llama.cpp debug output (default: True).
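
For example, a CPU-only setup might look like this (a sketch that uses only the documented parameters; the repository and file names are placeholders):

# CPU-only configuration (sketch): n_gpu_layers=0 keeps every layer on the CPU.
client = LlamaCppChatCompletionClient(
    repo_id="your_repo_id",   # placeholder
    filename="model.gguf",    # placeholder
    n_gpu_layers=0,
    seed=1337,
    n_ctx=1000,
    verbose=True,
)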

Tools Integration

You can dynamically register tools for the model to use during an interaction. If the model's response invokes a tool, the call is detected and the corresponding tool is executed.

Tools should be passed as part of the tools argument when calling the create or create_stream methods.

Example Tool Usage

If you have a tool, such as a request-validation tool, you can register it and the model will call it when needed. With autogen-core, a plain Python function can be wrapped as a tool using FunctionTool:

from autogen_core.tools import FunctionTool

def validate_request(data: str) -> bool:
    """Validates request data (toy example)."""
    return len(data) > 0

tools = [FunctionTool(validate_request, description="Validates request data")]

result = await client.create(messages, tools=tools)
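
In general, a model decides whether to invoke a tool based on the tool's name and description, so keep both accurate and specific.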

Running Tests

To ensure that everything is working correctly, you can run tests with pytest:

pytest
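
If you are adding your own tests, a minimal smoke test might look like this (a sketch that only checks the package imports):

# tests/test_import.py: a minimal smoke test (sketch)
def test_import():
    from autogen_llama_cpp_chat_completion.llama_cpp_extension import (
        LlamaCppChatCompletionClient,
    )

    assert LlamaCppChatCompletionClient is not None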

Contributing

If you’d like to contribute to this extension, feel free to open an issue or submit a pull request.

License

This extension is open source and available under the MIT License. See the LICENSE file for more information.
