LlamaIndex Llms Integration: Llama Cpp
Installation
Install the required Python packages:
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-llama-cpp
!pip install llama-index
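llama-cpp-python builds for CPU by default; to use the n_gpu_layers offloading shown below, it must be built with a GPU backend. The exact flag depends on your hardware and library version, so the CUDA line below is only a sketch (older releases used -DLLAMA_CUBLAS=on, and Metal on macOS uses -DGGML_METAL=on):

CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python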
Basic Usage
Import Required Libraries
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)
Initialize LlamaCPP
Set up the model URL and initialize the LlamaCPP LLM. Note that recent versions of llama-cpp-python load only GGUF-format model files, so the GGML URL below works only with older releases; if loading fails, substitute a GGUF equivalent (for example, from the TheBloke/Llama-2-13B-chat-GGUF repository):
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
llm = LlamaCPP(
    model_url=model_url,
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
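If the model file is already on disk, you can pass model_path instead of model_url. A minimal sketch, assuming a local GGUF file (the path below is a placeholder):

# Load a local model file instead of downloading one.
llm = LlamaCPP(
    model_path="./models/llama-2-13b-chat.Q4_0.gguf",  # placeholder: point to your model
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": 1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)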
Generate Completions
Use the complete method to generate a response:
response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(response.text)
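LlamaCPP also supports the chat interface, with the messages_to_prompt helper configured above handling the role formatting. A minimal sketch:

from llama_index.core.llms import ChatMessage

# Build a short conversation and send it through the chat API.
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Name three dog breeds."),
]
chat_response = llm.chat(messages)
print(chat_response.message.content)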
Stream Completions
You can also stream completions for a prompt:
response_iter = llm.stream_complete("Can you write me a poem about fast cars?")
for response in response_iter:
    print(response.delta, end="", flush=True)
Set Up Query Engine with LlamaCPP
Change the global tokenizer to match the LLM:
from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer
set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)
Use Hugging Face Embeddings
Set up the embedding model and load documents:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
documents = SimpleDirectoryReader(
    "../../../examples/paul_graham_essay/data"
).load_data()
Create Vector Store Index
Create a vector store index from the loaded documents:
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
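Re-embedding the documents on every run is slow, so it can help to persist the index to disk. A minimal sketch, assuming an arbitrary "./storage" directory:

from llama_index.core import StorageContext, load_index_from_storage

# Save the index to disk.
index.storage_context.persist(persist_dir="./storage")

# Later: reload it without re-embedding the documents.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, embed_model=embed_model)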
Set Up Query Engine
Set up the query engine with the LlamaCPP LLM:
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What did the author do growing up?")
print(response)
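For longer answers, the query engine can also stream the response token by token. A minimal sketch using the same index and LLM:

# Enable streaming so tokens are printed as they are generated.
query_engine = index.as_query_engine(llm=llm, streaming=True)
streaming_response = query_engine.query("What did the author do growing up?")
streaming_response.print_response_stream()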
LLM Implementation Example
A complete example notebook is available in the LlamaIndex documentation:
https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp/
File details
Details for the file llama_index_llms_llama_cpp-0.3.0.tar.gz.
File metadata
- Download URL: llama_index_llms_llama_cpp-0.3.0.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.10 Darwin/22.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | d640e8515d07ffbd77cf007a486f3a6498a57d06476c6d0952ea160185fd42d7
MD5 | a85f72a893d21b6f1f156fcb5095950a
BLAKE2b-256 | 444d1463013fdade95052638640e102ba246bb4e406eafbae7c0c813b7b16812
File details
Details for the file llama_index_llms_llama_cpp-0.3.0-py3-none-any.whl.
File metadata
- Download URL: llama_index_llms_llama_cpp-0.3.0-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.10 Darwin/22.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | ea8655f41a4dd1e74ec0065e7021e1add2bc8e7c550df37b3dbfbdaee21a4d84
MD5 | a4e1bb51506bd0d587dc88f368269708
BLAKE2b-256 | 01153718f06671f292430e71fff04a7359d38cc8444c9b1f065db4289ca53478