
LlamaIndex Llms Integration: llamafile

Setup Steps

1. Download a llamafile

Use the following command to download a llamafile from Hugging Face:

wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

2. Make the File Executable

On Unix-like systems, run the following command:

chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

On Windows, rename the file so that its name ends with .exe.

3. Start the Model Server

Run the following command to start the model server, which will listen on http://localhost:8080 by default:

./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding
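Once running, the server speaks an HTTP API that the LlamaIndex client calls for you. As a rough sketch of what a raw request body might look like (the /completion endpoint path and field names are assumptions based on the llama.cpp server that llamafile embeds, not taken from this package's docs):

```python
import json

# Build a request body for the server's completion endpoint.
# NOTE: the endpoint path ("/completion") and field names below are
# assumptions based on the llama.cpp server bundled inside llamafile.
payload = {
    "prompt": "Who is Octavia Butler?",
    "temperature": 0,   # deterministic sampling, matching the examples below
    "n_predict": 128,   # cap on the number of tokens to generate
}
body = json.dumps(payload)
print(body)

# To actually send it (requires the server started above):
#   curl -X POST http://localhost:8080/completion \
#        -H "Content-Type: application/json" -d "$body"
```

In practice you will not build these requests by hand; the Llamafile class below does it for you.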

Using LlamaIndex

Install the integration package along with llama-index. The % and ! prefixes are for notebook environments such as Google Colab; drop them when running pip in a regular shell:

%pip install llama-index-llms-llamafile
!pip install llama-index

Import Required Libraries

from llama_index.llms.llamafile import Llamafile
from llama_index.core.llms import ChatMessage

Initialize the LLM

Create an instance of the Llamafile LLM, which talks to the server started above:

llm = Llamafile(temperature=0, seed=0)

Generate Completions

To generate a completion for a prompt, use the complete method:

resp = llm.complete("Who is Octavia Butler?")
print(resp)

Call Chat with a List of Messages

You can also interact with the LLM using a list of messages:

messages = [
    ChatMessage(
        role="system",
        content="Pretend you are a pirate with a colorful personality.",
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.chat(messages)
print(resp)
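Each chat message pairs a role with its content, and the request carries them as an ordered list. A minimal stand-in sketch of that shape, using only the standard library (ChatMessage here is a hypothetical stand-in, not the llama-index class, and the list-of-dicts wire format is an assumption about the transport):

```python
from dataclasses import dataclass

# Hypothetical stand-in for llama_index.core.llms.ChatMessage,
# used only to illustrate the role/content structure of a chat request.
@dataclass
class ChatMessage:
    role: str
    content: str

messages = [
    ChatMessage(role="system", content="Pretend you are a pirate with a colorful personality."),
    ChatMessage(role="user", content="What is your name?"),
]

# Serialize to an ordered list of role/content dicts.
wire = [{"role": m.role, "content": m.content} for m in messages]
print(wire)
```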

Streaming Responses

To use the streaming capabilities, you can call the stream_complete method:

response = llm.stream_complete("Who is Octavia Butler?")
for r in response:
    print(r.delta, end="")

You can also stream chat responses:

messages = [
    ChatMessage(
        role="system",
        content="Pretend you are a pirate with a colorful personality.",
    ),
    ChatMessage(role="user", content="What is your name?"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
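In both streaming loops, each chunk carries only the newly generated text in its delta attribute, so printing the deltas in order reconstructs the full response. A self-contained sketch of that accumulation pattern, with stand-in chunks in place of a live server (Chunk is a hypothetical stand-in for the streamed response objects):

```python
from dataclasses import dataclass

# Hypothetical stand-in for a streamed response chunk; the real objects
# come from llm.stream_complete / llm.stream_chat.
@dataclass
class Chunk:
    delta: str

stream = [Chunk("Octavia"), Chunk(" Butler"), Chunk(" was a novelist.")]

# Print each delta as it "arrives" and accumulate the full text.
full_text = ""
for chunk in stream:
    print(chunk.delta, end="")
    full_text += chunk.delta
```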

LLM Implementation Example

https://docs.llamaindex.ai/en/stable/examples/llm/llamafile/
