llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU and CUDA 12.5 execution.
NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI © compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
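(Note: in shells such as zsh, the brackets need quoting: pip install 'llama-cpp-cffi[openai]'.)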
IMPORTANT: If you want to take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12.5 installed. If you don't have CUDA 12.5 installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads.
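A quick way to confirm which toolkit version is visible is to ask nvcc (this assumes the CUDA toolkit's nvcc binary is on your PATH; nvidia-smi reports the driver's maximum supported CUDA version, not the installed toolkit):

import subprocess

# sanity check: look for "release 12.5" in the nvcc output
print(subprocess.run(['nvcc', '--version'], capture_output=True, text=True).stdout)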
Supported GPU Compute Capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100. See NVIDIA's GPU Compute Capability table for details.
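To check which compute capability your own card has, you can query nvidia-smi (a small sketch; the compute_cap query field requires a reasonably recent NVIDIA driver):

import subprocess

# prints e.g. "NVIDIA GeForce RTX 3090, 8.6" -> compute_86
print(subprocess.run(
    ['nvidia-smi', '--query-gpu=name,compute_cap', '--format=csv,noheader'],
    capture_output=True, text=True,
).stdout)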
Example
Library Usage
examples/demo_0.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
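llama_generate yields the completion as a stream of text chunks; if you prefer the whole completion as a single string, you can simply join the iterator (a minimal sketch reusing the options object from above, assuming each chunk is a plain string as in the loop):

def generate_text(options) -> str:
    # drain the chunk iterator into one string instead of printing as we go
    return ''.join(llama_generate(options))

print(generate_text(options))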
OpenAI © compatible Chat Completions API - Server and Client
Run the OpenAI compatible server:
python -B llama/openai.py
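The client example below assumes the server is reachable at http://localhost:11434/v1; adjust base_url in the client if you run the server on a different host or port.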
Run the OpenAI compatible client examples/demo_1.py:
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]


def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)


def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()


if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
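Note that with stream=True some OpenAI-compatible servers emit a final chunk whose delta carries no content; if you ever see a literal None printed at the end of the stream, a defensive variant of the streaming demo (a sketch meant to live in the same file, reusing client, model and messages) simply skips empty deltas:

def demo_chat_completions_stream_safe():
    print('demo_chat_completions_stream_safe:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        content = chunk.choices[0].delta.content

        if content is not None:  # skip empty/None deltas in the final chunk
            print(content, flush=True, end='')

    print()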
Demos
#
# run demos
#
python -B examples/demo_0.py
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py