
Langchain Chat Model + LLamaCPP

A well tested 🧪 and working solution for integrating LlamaCPP with LangChain, fully compatible with the ChatModel interface and with LangGraph. It provides a direct interface to the LlamaCPP library, without any additional wrapper layers, so you keep full configurability and control over LlamaCPP functionality.

If you find this project useful, please give it a star ⭐!

Supports:

  • ✅ invoke
  • ✅ ainvoke
  • ✅ stream
  • ✅ astream
  • ✅ Structured output (JSON mode)
  • ✅ Tool/Function calling
  • ✅ LlamaProxy

Quick Install

pip
pip install langchain-llamacpp-chat-model
# When using llama_proxy
pip install "langchain-llamacpp-chat-model[llama_proxy]"
poetry
poetry add langchain-llamacpp-chat-model
# When using llama_proxy
poetry add "langchain-llamacpp-chat-model[llama_proxy]"

Usage

Using Llama

A Llama instance lets you create a chat model backed by a single llama.cpp model.

import os
from langchain_core.pydantic_v1 import BaseModel, Field
from llama_cpp import Llama
from langchain_llamacpp_chat_model import LlamaChatModel
from langchain_core.tools import tool

model_path = os.path.join(
    os.path.expanduser("~/.cache/lm-studio/models"),
    "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
)

llama = Llama(
    model_path=model_path,
    n_gpu_layers=-1,  # offload all layers to the GPU (set to 0 for CPU-only)
    chat_format="chatml-function-calling", # https://llama-cpp-python.readthedocs.io/en/latest/#function-calling
)
chat_model = LlamaChatModel(llama=llama)

Invoke

result = chat_model.invoke("Tell me a joke about cats")
print(
    result.content
)  # Why was the cat sitting on the computer? Because it wanted to keep an eye on the mouse!

Stream

stream = chat_model.stream("Tell me a joke about cats")
final_content = ""
for token in stream:
    final_content += token.content

print(
    final_content
)  # Why was the cat sitting on the computer? Because it wanted to keep an eye on the mouse!
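
Async (ainvoke and astream)

The feature list also includes ainvoke and astream. A minimal async sketch, reusing the chat_model constructed above (these method names come from LangChain's standard chat model interface):

import asyncio


async def main():
    # Async single call
    result = await chat_model.ainvoke("Tell me a joke about cats")
    print(result.content)

    # Async streaming
    final_content = ""
    async for token in chat_model.astream("Tell me a joke about cats"):
        final_content += token.content
    print(final_content)


asyncio.run(main())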

Structured Output

class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")


structured_llm = chat_model.with_structured_output(Joke)

result = structured_llm.invoke("Tell me a joke about cats")

assert isinstance(result, Joke)
print(result.setup)  # Why was the cat sitting on the computer?
print(result.punchline)  # Because it wanted to keep an eye on the mouse!

⚠️ Ensure temperature is set to 0 (or near 0). The open-source models (tested with Phi3 and LLama3) do not perform well when calling functions, as their behavior can be unpredictable.
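
If LlamaChatModel exposes a temperature parameter (an assumption here; check the package source for the exact name), pinning it at construction time would look like:

chat_model = LlamaChatModel(llama=llama, temperature=0)  # temperature kwarg is assumed, not confirmed by this page
structured_llm = chat_model.with_structured_output(Joke)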

Function calling

@tool
def magic_number_tool(input: int) -> int:
    """Applies a magic function to an input."""
    return input + 2


llm_with_tool = chat_model.bind_tools(
    [magic_number_tool], tool_choice="magic_number_tool"
)

result = llm_with_tool.invoke("What is the magic number of 2?")

assert result.tool_calls[0]["name"] == "magic_number_tool"

⚠️ Ensure temperature is set to 0 (or near 0). The open-source models (tested with Phi3 and LLama3) do not perform well when calling functions, as their behavior can be unpredictable.
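
The description above also mentions LangGraph compatibility. A minimal sketch using LangGraph's prebuilt ReAct agent, reusing chat_model and magic_number_tool from above (this assumes langgraph is installed; the agent wiring is illustrative, not part of this package's API):

from langgraph.prebuilt import create_react_agent

# Wrap the llama.cpp-backed chat model in a ReAct-style tool-calling agent
agent = create_react_agent(chat_model, [magic_number_tool])

state = agent.invoke({"messages": [("user", "What is the magic number of 2?")]})
print(state["messages"][-1].content)  # final answer produced after the tool call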

Using LlamaProxy

LlamaProxy allows you to define multiple models and use one of them by specifying its model alias. Very useful in a server environment.

import os
from llama_cpp.server.app import LlamaProxy, ModelSettings
from langchain_llamacpp_chat_model import LlamaProxyChatModel

llama3_model_path = os.path.join(
    os.path.expanduser("~/.cache/lm-studio/models"),
    "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
)
phi3_model_path = os.path.join(
    os.path.expanduser("~/.cache/lm-studio/models"),
    "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
)

llama_proxy = LlamaProxy(
    models=[
        ModelSettings(model=llama3_model_path, model_alias="llama3"),
        ModelSettings(model=phi3_model_path, model_alias="phi3"),
    ]
)
llama3_chat_model = LlamaProxyChatModel(llama_proxy=llama_proxy, model="llama3")
phi3_chat_model = LlamaProxyChatModel(llama_proxy=llama_proxy, model="phi3")


# Invoke
# --------------------------------------------------------
llama3_result = llama3_chat_model.invoke("Tell me a joke about cats")
print(llama3_result.content)

phi3_result = phi3_chat_model.invoke("Tell me a joke about cats")
print(phi3_result.content)

# Stream
# --------------------------------------------------------
stream = llama3_chat_model.stream("Tell me a joke about cats")
final_content = ""
for token in stream:
    final_content += token.content

print(final_content)
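
Since the proxy targets server setups, here is a purely illustrative sketch of exposing both models behind a small FastAPI endpoint (FastAPI, the route shape, and the query parameter are assumptions, not part of this package):

from fastapi import FastAPI, HTTPException

app = FastAPI()
chat_models = {"llama3": llama3_chat_model, "phi3": phi3_chat_model}


@app.post("/chat/{alias}")
def chat(alias: str, prompt: str):
    # Pick the chat model by the alias registered in LlamaProxy
    chat_model = chat_models.get(alias)
    if chat_model is None:
        raise HTTPException(status_code=404, detail=f"Unknown model alias: {alias}")
    result = chat_model.invoke(prompt)
    return {"model": alias, "content": result.content}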

