llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU and CUDA 12.5 execution.
NOTE: The only operating system currently supported is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you want an OpenAI © compatible Chat Completions API, install the openai extra:
pip install llama-cpp-cffi[openai]
IMPORTANT: If you want to take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12.5 installed. If you don't, follow the instructions at https://developer.nvidia.com/cuda-downloads .
Supported GPU compute capabilities are compute_61, compute_70, compute_75, compute_80, compute_86 and compute_89, which covers most GPUs from the GeForce GTX 1050 up to the NVIDIA H100. See NVIDIA's GPU Compute Capability list for details.
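To sanity-check your setup before installing, a minimal sketch like the one below (a hypothetical helper, not part of llama-cpp-cffi) can shell out to the standard NVIDIA tools. It assumes nvcc and nvidia-smi are on your PATH and that your driver is recent enough to support the compute_cap query field:

# check_cuda.py - hypothetical helper, not shipped with llama-cpp-cffi
import subprocess

def run(cmd: list[str]) -> str | None:
    # return the command's stdout, or None if the tool is missing or fails
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None

# CUDA toolkit release, e.g. 'Cuda compilation tools, release 12.5, V12.5.40'
nvcc_out = run(['nvcc', '--version'])
if nvcc_out:
    release = [line for line in nvcc_out.splitlines() if 'release' in line]
    print('nvcc:', release[0].strip() if release else nvcc_out.strip())
else:
    print('nvcc not found; install CUDA 12.5 for GPU acceleration')

# compute capability per GPU, e.g. 'NVIDIA GeForce RTX 3090, 8.6'
smi_out = run(['nvidia-smi', '--query-gpu=name,compute_cap', '--format=csv,noheader'])
if smi_out:
    print('GPUs:', smi_out.strip())
else:
    print('nvidia-smi not found, or compute_cap query unsupported by this driver')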
Example
Library Usage
examples/demo_0.py
from llama import llama_generate, get_config, Model, Options

# model to download from Hugging Face: original repo plus its GGUF quantization
model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

# read the creator repo's config to get the maximum context size
config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

# stream generated text chunk by chunk
for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
OpenAI © compatible Chat Completions API - Server and Client
Run OpenAI compatible server:
python -B llama/openai.py
Run the OpenAI compatible client example examples/demo_1.py:
from openai import OpenAI
from llama import Model

# point the official OpenAI client at the local llama-cpp-cffi server
client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]


def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)


def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        # the final streamed chunk carries no content, so guard against None
        print(chunk.choices[0].delta.content or '', flush=True, end='')

    print()


if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
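With the server from the previous step running in another terminal, run the client example the same way as the other demos:

python -B examples/demo_1.py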
Demos
#
# run demos
#
python -B examples/demo_0.py
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py
Download files
Built Distributions
Hashes for llama_cpp_cffi-0.1.8-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 176efa30e19bda320bf32b7dd732f261e1ae94e048fd872a4d49f9db1beb978b
MD5 | 7311fd6d4fad4cc673508112b6e7f6dc
BLAKE2b-256 | 94c6ed698e90c267299dde706e9bfab46e4a691921d2483cd2fe49ab6476bd2b

Hashes for llama_cpp_cffi-0.1.8-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 9a8b0b318121a4f2a1dce659f22f730a4da63718984f7eaa5f63e6c9041065f4
MD5 | 99fecda20663fa44c1a99d1ffbdb9606
BLAKE2b-256 | 1068dd9be2a17daff43ee49266848272e8d90892c6cbbaaedb14e775561902cb

Hashes for llama_cpp_cffi-0.1.8-cp312-cp312-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | d73adfe63daa5bdea2b58a66ac4de3ed932542ae1663168c000250712b5c4803
MD5 | 46b6a5e2de8a5d275546d556006d1a7b
BLAKE2b-256 | caee2445945028c8ee047e7ec5f154cc79d49cc96410d8a6029e9bbb4875b8e9

Hashes for llama_cpp_cffi-0.1.8-cp312-cp312-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 424adbb514237086bf68cb7c46642937423a56e589e01421bed059222f759fae
MD5 | 1bae01099094090383bee41e867df58f
BLAKE2b-256 | 961ed3bc25fc347bc039f4e78bcb3dc05f6c6da3644ae67ae21bfbeacaa74504

Hashes for llama_cpp_cffi-0.1.8-cp312-cp312-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | d65dc8459e8bb07427bd48dc2be4a57d01a018fb5d19fece23dad95cf91b5e20
MD5 | 31403112645e5dff518662e19ca6d48e
BLAKE2b-256 | a1e5733b4fc5028ee1e155892c27e9007b830c5e1b977fa0fb31f0d6a0e49779

Hashes for llama_cpp_cffi-0.1.8-cp312-cp312-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 9daff13137996e73a11a0ba46d91b60f42746276025af62607388e07ec83988a
MD5 | 228fd4201997297e200253885dfeef62
BLAKE2b-256 | 706a7a7de731fc288d5b76e912df2be1861199bebd7d377ffa6896805352057d

Hashes for llama_cpp_cffi-0.1.8-cp311-cp311-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 140b6f70fa71ed9058eb1608c0db8241d1f894641794ca70e5d58fe1a19bea60
MD5 | 20543b7d0fcfe278409f0a4bed36a5c4
BLAKE2b-256 | 85f02df24f57bcb4a14e85e45f62b5f06b0d9aa64e047f4c52b2a01937c713ff

Hashes for llama_cpp_cffi-0.1.8-cp311-cp311-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | da95fad2f40362c6bd77e0f17d0ea3304bccd65f271f4c24a2067c61a1023270
MD5 | b233a32e224ec6ff715a54b9a5b68542
BLAKE2b-256 | 566763ec79841a8323a34596db7764880c9d555374dcab3ca7bbd91b9333dcb0

Hashes for llama_cpp_cffi-0.1.8-cp311-cp311-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 09f4e141b09bf14d92bb2a7b11d94edb6f22ed60bfac8f5ba15f7d36bc8195c4
MD5 | 3dbccd9ac1c1f0c780ffe9a900f87e3d
BLAKE2b-256 | d2411f80eaec046dc47494bf8aad5860c42e1b58d2847c89732f2a4ed08b675c

Hashes for llama_cpp_cffi-0.1.8-cp311-cp311-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 90ae0af45fcf334a0b4214425d13f266aaff76d35da7b13ee14c4165ae9bf402
MD5 | eac3d41583ea07a2bdf0bab6b9c93892
BLAKE2b-256 | e0af613fdaa14935aa6ad4be25f8654fba9dde0880164716549acf61f22b5990

Hashes for llama_cpp_cffi-0.1.8-cp310-cp310-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 51282316d62281a6982f5ee2055936ac851b206e7f0650dfa647775203316f8c
MD5 | d00dbf2fef5592f9f4a43cc41c615108
BLAKE2b-256 | c55775f8213f959f87ff3bd11d129a827b781d681d1b44db32c533495e4e7ee4

Hashes for llama_cpp_cffi-0.1.8-cp310-cp310-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 00fd61ee3a129446504e5a2ff1c816b61f1f332ea15e771e0136044a630e3615
MD5 | 15f74f28fe370226ef62a8298051a252
BLAKE2b-256 | 532772b9b3e47fce7ffa5bd98d140754de300489450a00d9e6c92a3817bbc624

Hashes for llama_cpp_cffi-0.1.8-cp310-cp310-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 2b7dee76c023f362ff1907c569817df965ed81e7cf5b2673be0cbbf1afdebe53
MD5 | 933b7f3cccbae8a23203c5aedccf6502
BLAKE2b-256 | 863a4f85b0dbb03e8b5a11fd9d84f62beafa93395e950e9e7c6c6deb304aef52

Hashes for llama_cpp_cffi-0.1.8-cp310-cp310-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 840c3fb9cf775116e17f154a5898c007bedb420d2562c6735a611cfa1cf2cca6
MD5 | 28f2cc917102bd68727e5c6c8b2f54e1
BLAKE2b-256 | 26dd41c192e10444230478a7baa3659dabc016e53d90132232bede20cbab235b