# Langchain Chat Model + LLamaCPP
A well-tested 🧪 working solution for integrating LlamaCPP with LangChain, fully compatible with the ChatModel interface and LangGraph integration. It provides a direct interface to the LlamaCPP library, without additional wrapper layers, to maintain full configurability and control over LlamaCPP functionality.

If you find this project useful, please give it a star ⭐!
Support:

- ✅ `invoke`
- ✅ `ainvoke`
- ✅ `stream`
- ✅ `astream`
- ✅ Structured output (JSON mode)
- ✅ Tool/Function calling
- ✅ `LlamaProxy`
## Quick Install

### pip

```sh
pip install langchain-llamacpp-chat-model

# When using llama_proxy
pip install "langchain-llamacpp-chat-model[llama_proxy]"
```

### poetry

```sh
poetry add langchain-llamacpp-chat-model

# When using llama_proxy
poetry add "langchain-llamacpp-chat-model[llama_proxy]"
```
## Usage

### Using Llama

A `Llama` instance lets you create a chat model for a single Llama model.
```python
import os

from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.tools import tool
from llama_cpp import Llama

from langchain_llamacpp_chat_model import LlamaChatModel

model_path = os.path.join(
    os.path.expanduser("~/.cache/lm-studio/models"),
    "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
)

llama = Llama(
    model_path=model_path,
    n_gpu_layers=-1,  # Offload all layers to the GPU
    chat_format="chatml-function-calling",  # https://llama-cpp-python.readthedocs.io/en/latest/#function-calling
)
chat_model = LlamaChatModel(llama=llama)
```
#### Invoke

```python
result = chat_model.invoke("Tell me a joke about cats")
print(
    result.content
)  # Why was the cat sitting on the computer? Because it wanted to keep an eye on the mouse!
```
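#### Ainvoke

`ainvoke` is the async counterpart of `invoke`. A minimal sketch, reusing `chat_model` from above (the `asyncio.run` wrapper is only needed outside an existing event loop):

```python
import asyncio

async def main():
    # Same chat model as above, awaited instead of called synchronously
    result = await chat_model.ainvoke("Tell me a joke about cats")
    print(result.content)

asyncio.run(main())
```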
#### Stream

```python
stream = chat_model.stream("Tell me a joke about cats")

final_content = ""
for token in stream:
    final_content += token.content

print(
    final_content
)  # Why was the cat sitting on the computer? Because it wanted to keep an eye on the mouse!
```
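#### Astream

`astream` is the async counterpart of `stream` and yields chunks as they are generated. A minimal sketch:

```python
import asyncio

async def main():
    final_content = ""
    # Chunks arrive asynchronously as the model generates them
    async for token in chat_model.astream("Tell me a joke about cats"):
        final_content += token.content
    print(final_content)

asyncio.run(main())
```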
#### Structured Output

```python
class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")

structured_llm = chat_model.with_structured_output(Joke)

result = structured_llm.invoke("Tell me a joke about cats")

assert isinstance(result, Joke)
print(result.setup)  # Why was the cat sitting on the computer?
print(result.punchline)  # Because it wanted to keep an eye on the mouse!
```
#### Function calling

```python
@tool
def magic_number_tool(input: int) -> int:
    """Applies a magic function to an input."""
    return input + 2

llm_with_tool = chat_model.bind_tools(
    [magic_number_tool], tool_choice="magic_number_tool"
)

result = llm_with_tool.invoke("What is the magic number of 2?")

assert result.tool_calls[0]["name"] == "magic_number_tool"
```
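The model only emits the tool call; running the tool is up to the caller. A minimal sketch that executes the requested tool with the arguments LangChain parsed (the printed result assumes the model passed `input=2`):

```python
tool_call = result.tool_calls[0]
# tool_call["args"] is a dict of the arguments the model chose, e.g. {"input": 2}
tool_output = magic_number_tool.invoke(tool_call["args"])
print(tool_output)  # 4
```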
### Using LlamaProxy

`LlamaProxy` lets you define multiple models and select one of them by specifying the model name. Very useful in a server environment.
```python
import os

from llama_cpp.server.app import LlamaProxy, ModelSettings

from langchain_llamacpp_chat_model import LlamaProxyChatModel

llama3_model_path = os.path.join(
    os.path.expanduser("~/.cache/lm-studio/models"),
    "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf",
)
phi3_model_path = os.path.join(
    os.path.expanduser("~/.cache/lm-studio/models"),
    "microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-q4.gguf",
)

llama_proxy = LlamaProxy(
    models=[
        ModelSettings(model=llama3_model_path, model_alias="llama3"),
        ModelSettings(model=phi3_model_path, model_alias="phi3"),
    ]
)

llama3_chat_model = LlamaProxyChatModel(llama_proxy=llama_proxy, model="llama3")
phi3_chat_model = LlamaProxyChatModel(llama_proxy=llama_proxy, model="phi3")

# Invoke
# --------------------------------------------------------
llama3_result = llama3_chat_model.invoke("Tell me a joke about cats")
print(llama3_result.content)

phi3_result = phi3_chat_model.invoke("Tell me a joke about cats")
print(phi3_result.content)

# Stream
# --------------------------------------------------------
stream = llama3_chat_model.stream("Tell me a joke about cats")

final_content = ""
for token in stream:
    final_content += token.content

print(final_content)
```
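### Using with LangGraph

The chat models above are standard LangChain chat models, so they plug into LangGraph directly. A minimal sketch of a one-node graph (assumes the `langgraph` package is installed and reuses `chat_model` from the `Llama` example):

```python
from typing import Annotated, TypedDict

from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages

class State(TypedDict):
    # add_messages appends new messages instead of overwriting the list
    messages: Annotated[list, add_messages]

def chatbot(state: State):
    # Run the LlamaCPP-backed chat model on the conversation so far
    return {"messages": [chat_model.invoke(state["messages"])]}

builder = StateGraph(State)
builder.add_node("chatbot", chatbot)
builder.add_edge(START, "chatbot")
builder.add_edge("chatbot", END)
graph = builder.compile()

result = graph.invoke({"messages": [("user", "Tell me a joke about cats")]})
print(result["messages"][-1].content)
```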