llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, Vulkan 1.x, and CUDA 12.6 runtimes on x86_64 and aarch64 platforms.
NOTE: Currently the only supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you want an OpenAI-compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: If you want to take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.X installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads .
Supported GPU Compute Capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100 (see NVIDIA's GPU Compute Capability table).
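Whether a given card is covered can be checked by comparing its compute capability (as reported by nvidia-smi or NVIDIA's table) against the list above. A minimal sketch; is_covered is a hypothetical helper for illustration, not part of the llama-cpp-cffi API:

```python
# Compute capabilities the prebuilt CUDA wheels target (from the list above).
SUPPORTED_COMPUTE_CAPS = {'61', '70', '75', '80', '86', '89'}

def is_covered(compute_cap: str) -> bool:
    """Return True if a compute capability such as '8.6' matches one of
    the capabilities the prebuilt CUDA wheels are compiled for."""
    return compute_cap.replace('.', '') in SUPPORTED_COMPUTE_CAPS

print(is_covered('6.1'))  # GTX 1050 family
print(is_covered('5.2'))  # e.g. GTX 970: too old for the prebuilt wheels
```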
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
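If you want the full completion as a single string rather than printing chunk by chunk, the streamed chunks can simply be joined. A small sketch; collect_stream is a hypothetical helper written for this example, not part of the library's API:

```python
from typing import Iterable

def collect_stream(chunks: Iterable[str]) -> str:
    # Join text chunks as they are yielded by a streaming generator
    # such as llama_generate(options).
    return ''.join(chunks)

# Usage with the example above:
#   text = collect_stream(llama_generate(options))
```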
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
Run the OpenAI-compatible client example examples/demo_openai_0.py:

python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
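Note that with the OpenAI Python client, a streamed chunk's delta.content can be None (for example on the final chunk), so it is safer to normalize it before printing. A minimal sketch; delta_text is a hypothetical helper for illustration, not part of this project:

```python
def delta_text(content):
    # The OpenAI streaming API may deliver None as delta.content;
    # normalize to an empty string so print loops do not emit 'None'.
    return content if content is not None else ''

# In the streaming loop above, one would then write:
#   print(delta_text(chunk.choices[0].delta.content), flush=True, end='')
```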
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py