LlamaIndex Llms Integration: Llama Cpp
Installation
Install the required Python packages:
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-llama-cpp
!pip install llama-index
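By default, llama-cpp-python is built with CPU-only support. To offload layers to a GPU (as the n_gpu_layers setting later in this guide assumes), it may need to be reinstalled with hardware-specific build flags; for example, for CUDA (flags per the llama-cpp-python installation docs; verify them for your platform):
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python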
Basic Usage
Import Required Libraries
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)
Initialize LlamaCPP
Set up the model URL and initialize the LlamaCPP LLM. Note that current llama.cpp releases only load models in GGUF format; the GGML file below works only with older llama-cpp-python versions, so substitute an equivalent GGUF model (e.g. from TheBloke's Llama-2-13B-chat-GGUF repository on Hugging Face) if the file fails to load:
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
llm = LlamaCPP(
    # pass in the URL of a model file to download and cache it automatically
    model_url=model_url,
    temperature=0.1,
    max_new_tokens=256,
    # Llama 2 has a 4096-token context window; set it lower to leave some headroom
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__(); set n_gpu_layers to at least 1 to use the GPU
    model_kwargs={"n_gpu_layers": 1},
    # transform inputs into the Llama 2 prompt format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
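If you already have a model file on disk, LlamaCPP also accepts a local model_path in place of model_url; a minimal sketch (the file path below is a placeholder):
llm = LlamaCPP(
    model_path="./models/llama-2-13b-chat.Q4_0.gguf",  # placeholder path to a local model file
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)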
Generate Completions
Use the complete method to generate a response:
response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(response.text)
Stream Completions
You can also stream completions for a prompt:
response_iter = llm.stream_complete("Can you write me a poem about fast cars?")
for response in response_iter:
    print(response.delta, end="", flush=True)
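Since messages_to_prompt is provided, the model can also be driven through the chat interface; a minimal sketch using llama-index's ChatMessage:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Tell me a joke about llamas."),
]
response = llm.chat(messages)
print(response.message.content)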
Set Up Query Engine with LlamaCPP
Change the global tokenizer to match the LLM:
from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer
set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)
Use Hugging Face Embeddings
Set up the embedding model and load documents:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
documents = SimpleDirectoryReader(
    "../../../examples/paul_graham_essay/data"
).load_data()
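As a quick sanity check, the embedding model can also be called directly; a minimal sketch (bge-small-en-v1.5 produces 384-dimensional vectors):
embedding = embed_model.get_text_embedding("Hello, world!")
print(len(embedding))  # 384 for bge-small-en-v1.5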
Create Vector Store Index
Create a vector store index from the loaded documents:
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
Set Up Query Engine
Set up the query engine with the LlamaCPP LLM:
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What did the author do growing up?")
print(response)
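The query engine can also stream its output token by token; a minimal sketch using the streaming option of as_query_engine:
query_engine = index.as_query_engine(llm=llm, streaming=True)
streaming_response = query_engine.query("What did the author do growing up?")
streaming_response.print_response_stream()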
LLM Implementation Example
https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp/