

Project description

LlamaIndex Llms Integration: Llama Cpp

Installation

  1. Install the required Python packages:

    %pip install llama-index-embeddings-huggingface
    %pip install llama-index-llms-llama-cpp
    %pip install llama-index
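
llama-cpp-python is pulled in automatically as a dependency of llama-index-llms-llama-cpp and builds with CPU-only support by default. If you want GPU offload (used below via n_gpu_layers), you can reinstall it with the CMake flag for your backend; the flag shown here is only an example and varies by llama-cpp-python version and platform, so check the llama-cpp-python documentation first.

    # Optional: rebuild llama-cpp-python with GPU offload support.
    # The CMake flag is version- and backend-dependent (e.g. -DGGML_CUDA=on for CUDA,
    # -DGGML_METAL=on for Metal); verify against the llama-cpp-python docs.
    !CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python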
    

Basic Usage

Import Required Libraries

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

Initialize LlamaCPP

Set up the model URL and initialize the LlamaCPP LLM:

model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
llm = LlamaCPP(
    model_url=model_url,
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    model_kwargs={"n_gpu_layers": 1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
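
If you already have a model on disk, LlamaCPP can load it through model_path instead of model_url. Note that recent llama-cpp-python releases only read GGUF files, so a local .gguf file may be needed; the path below is a hypothetical placeholder.

llm = LlamaCPP(
    # Hypothetical local path; point this at your own GGUF file.
    model_path="./models/llama-2-13b-chat.Q4_0.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": 1},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)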

Generate Completions

Use the complete method to generate a response:

response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(response.text)

Stream Completions

You can also stream completions for a prompt:

response_iter = llm.stream_complete("Can you write me a poem about fast cars?")
for response in response_iter:
    print(response.delta, end="", flush=True)
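
Because messages_to_prompt was supplied above, the same LLM can also be driven through the chat interface. A minimal sketch (the messages are illustrative):

from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Name three famous cats from fiction."),
]
chat_response = llm.chat(messages)
print(chat_response.message.content)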

Set Up Query Engine with LlamaCPP

Change the global tokenizer to match the LLM:

from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode
)

Use Hugging Face Embeddings

Set up the embedding model and load documents:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
documents = SimpleDirectoryReader(
    "../../../examples/paul_graham_essay/data"
).load_data()

Create Vector Store Index

Create a vector store index from the loaded documents:

index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
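
Optionally, the index can be persisted to disk and reloaded later so documents don't have to be re-embedded on every run. A minimal sketch, assuming a ./storage directory of your choice:

from llama_index.core import StorageContext, load_index_from_storage

# Persist the index ("./storage" is a placeholder directory).
index.storage_context.persist(persist_dir="./storage")

# Later: reload it, passing the same embedding model.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, embed_model=embed_model)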

Set Up Query Engine

Set up the query engine with the LlamaCPP LLM:

query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What did the author do growing up?")
print(response)
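
The query engine can also stream its answer, mirroring stream_complete above. A minimal sketch:

# Stream the synthesized answer instead of waiting for the full response.
streaming_query_engine = index.as_query_engine(llm=llm, streaming=True)
streaming_response = streaming_query_engine.query(
    "What did the author do growing up?"
)
streaming_response.print_response_stream()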

LLM Implementation Example

https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp/

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama_index_llms_llama_cpp-0.3.0.tar.gz (6.3 kB)


Built Distribution

llama_index_llms_llama_cpp-0.3.0-py3-none-any.whl

File details

Details for the file llama_index_llms_llama_cpp-0.3.0.tar.gz.

File metadata

File hashes

Hashes for llama_index_llms_llama_cpp-0.3.0.tar.gz

Algorithm    Hash digest
SHA256       d640e8515d07ffbd77cf007a486f3a6498a57d06476c6d0952ea160185fd42d7
MD5          a85f72a893d21b6f1f156fcb5095950a
BLAKE2b-256  444d1463013fdade95052638640e102ba246bb4e406eafbae7c0c813b7b16812


File details

Details for the file llama_index_llms_llama_cpp-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for llama_index_llms_llama_cpp-0.3.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       ea8655f41a4dd1e74ec0065e7021e1add2bc8e7c550df37b3dbfbdaee21a4d84
MD5          a4e1bb51506bd0d587dc88f368269708
BLAKE2b-256  01153718f06671f292430e71fff04a7359d38cc8444c9b1f065db4289ca53478

