
LlamaIndex Llms Integration: llamafile

Setup Steps

1. Download a llamafile

Use the following command to download a llamafile (TinyLlama 1.1B Chat) from Hugging Face:

wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

2. Make the File Executable

On Unix-like systems, run the following command:

chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

On Windows, no chmod is needed; instead, rename the file so it ends in .exe (for example, TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile.exe).

3. Start the Model Server

Run the following command to start the model server, which listens on http://localhost:8080 by default (--nobrowser keeps it from opening the built-in web UI, and --embedding enables the embeddings endpoint):

./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding
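Before wiring up LlamaIndex, it can be worth confirming the server is reachable. A minimal sanity check, assuming the default port and that the requests package is installed:

import requests

# The llamafile server answers HTTP on port 8080 by default; a
# successful response confirms the process is up and listening.
resp = requests.get("http://localhost:8080")
print(resp.status_code)  # expect 200 once the server has started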

Using LlamaIndex

If you are working in a notebook environment such as Google Colab, install the required packages first (in a plain shell, use pip install without the % and ! prefixes):

%pip install llama-index-llms-llamafile
!pip install llama-index

Import Required Libraries

from llama_index.llms.llamafile import Llamafile
from llama_index.core.llms import ChatMessage

Initialize the LLM

Create an instance of the Llamafile LLM:

llm = Llamafile(temperature=0, seed=0)
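The client targets http://localhost:8080 by default. If your server listens on a different host or port, the constructor also accepts a base_url argument, shown here with the default value:

# Point the client at a non-default server address if needed.
llm = Llamafile(base_url="http://localhost:8080", temperature=0, seed=0)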

Generate Completions

To generate a completion for a prompt, use the complete method:

resp = llm.complete("Who is Octavia Butler?")
print(resp)
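Note that complete returns a CompletionResponse object rather than a plain string; printing it prints the generated text, which is also available directly on the response:

print(resp.text)  # the completion text as a plain string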

Chat with a List of Messages

You can also interact with the LLM using a list of messages:

messages = [
    ChatMessage(
        role="system",
        content="Pretend you are a pirate with a colorful personality.",
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.chat(messages)
print(resp)
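As with completions, chat returns a structured object (a ChatResponse); the assistant's reply is carried on its message attribute:

print(resp.message.content)  # just the assistant's reply text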

Streaming Responses

To stream a completion as it is generated, call the stream_complete method:

response = llm.stream_complete("Who is Octavia Butler?")
for r in response:
    print(r.delta, end="")

You can also stream chat responses:

messages = [
    ChatMessage(
        role="system",
        content="Pretend you are a pirate with a colorful personality.",
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")

LLM Implementation Example

A full worked example is available in the LlamaIndex documentation:

https://docs.llamaindex.ai/en/stable/examples/llm/llamafile/
