llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU and CUDA 12.5 execution.
NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
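To verify the install, you can print the installed version using Python's standard importlib.metadata (a generic check that does not depend on this library's API):
python -c "from importlib.metadata import version; print(version('llama-cpp-cffi'))"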
IMPORTANT: If you want to take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12.5 installed. If you don't, follow the instructions at https://developer.nvidia.com/cuda-downloads .
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, and compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100. You can look up your GPU's compute capability at https://developer.nvidia.com/cuda-gpus .
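To see which compute capability your own GPU reports, recent NVIDIA drivers (roughly R510 and newer) can query it directly via nvidia-smi:
nvidia-smi --query-gpu=name,compute_cap --format=csv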
Example
Library Usage
examples/demo_0.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

# Read the model config from the creator repo so the context size matches the model.
config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,  # -2 = generate until the context is filled (llama.cpp n_predict semantics)
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
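llama_generate yields the completion as a stream of text chunks. If you would rather have the whole response as a single string, you can simply join the chunks (a minimal sketch reusing the options object from above):

# Accumulate the streamed chunks into a single string.
completion = ''.join(llama_generate(options))
print(completion)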
OpenAI compatible Chat Completions API - Server and Client
Run the OpenAI compatible server:
python -B llama/openai.py
Run the OpenAI compatible client (examples/demo_1.py):
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        # The final streamed chunk may carry no content, so guard against None.
        print(chunk.choices[0].delta.content or '', flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
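Start the server in one terminal, then run the client from another:
python -B examples/demo_1.py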
Demos
#
# run demos
#
python -B examples/demo_0.py
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py