llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, Vulkan 1.x and CUDA 12.6 runtimes, x86_64 and aarch64 platforms.
NOTE: Linux is currently the only supported operating system (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
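As a quick pre-install check, the platform constraints above can be sketched in a few lines. This is only an illustration: `is_supported` and the two sets are hypothetical names, not part of llama-cpp-cffi, and the check does not tell glibc (manylinux) apart from musl (musllinux).

```python
import platform

# Platforms named in this README: Linux on x86_64 or aarch64.
# These sets are an illustration, not something the library exports.
SUPPORTED_SYSTEMS = {'Linux'}
SUPPORTED_MACHINES = {'x86_64', 'aarch64'}


def is_supported(system=None, machine=None):
    """Return True if the (system, machine) pair matches the README's list.

    When an argument is omitted, the value reported for the running
    interpreter is used instead.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system in SUPPORTED_SYSTEMS and machine in SUPPORTED_MACHINES
```

Calling `is_supported()` with no arguments checks the current machine; passing explicit values makes it easy to test other combinations.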
Install
Basic library install:
pip install llama-cpp-cffi
To also install the OpenAI-compatible Chat Completions API server:
pip install llama-cpp-cffi[openai]
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure that CUDA 12 is installed. If you don't have CUDA 12.x installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads .
GPU Compute Capability: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 to the NVIDIA H100.
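To see whether a particular GPU is one of the listed targets, a capability string such as `8.6` can be compared against the list. The sketch below is a hypothetical helper, not part of llama-cpp-cffi. It only tests for an exact match and makes no claim about how capabilities outside the list behave. On recent NVIDIA drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` is one way to obtain such a string, though that depends on your environment.

```python
# Compute capabilities listed in this README, as (major, minor) pairs.
LISTED_CAPABILITIES = [(6, 1), (7, 0), (7, 5), (8, 0), (8, 6), (8, 9)]


def parse_capability(text):
    """Parse a capability string such as '8.6' into the tuple (8, 6)."""
    major, minor = text.strip().split('.')
    return int(major), int(minor)


def is_listed(text):
    """Return True if the capability is exactly one of the listed targets."""
    return parse_capability(text) in LISTED_CAPABILITIES
```

For example, `is_listed('8.6')` is true, while `is_listed('5.2')` is false.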
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
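The loop above prints each chunk as it arrives. If the full reply is needed as a single string instead, the chunks can be joined. The following is a minimal sketch that assumes the chunks are plain `str` values, as the `print` call suggests. `collect_chunks` and `fake_generate` are hypothetical names, and `fake_generate` exists only so the sketch runs without loading a model.

```python
def collect_chunks(chunks):
    """Join an iterable of text chunks into one string."""
    return ''.join(chunks)


def fake_generate():
    """Stand-in generator used only to make this sketch runnable."""
    yield from ['1 + 2', ' = ', '3']


text = collect_chunks(fake_generate())
print(text)
```

With the library installed, the same helper could be called as `collect_chunks(llama_generate(options))`, under the assumption stated above.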
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
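Once the server is running, any HTTP client should be able to talk to it. As a rough sketch, the request an OpenAI-style client would send can be built with the standard library alone. The assumptions here are that the server accepts the usual `POST <base_url>/chat/completions` JSON body and a bearer token. `build_chat_request` is a hypothetical helper, and `'example-model'` is a placeholder rather than a real model identifier. The request is built but not sent.

```python
import json
from urllib.request import Request


def build_chat_request(base_url, model, messages, api_key='llama-cpp-cffi'):
    """Build (but do not send) a Chat Completions style POST request."""
    payload = {'model': model, 'messages': messages, 'temperature': 0.0}
    return Request(
        url=base_url.rstrip('/') + '/chat/completions',
        data=json.dumps(payload).encode('utf-8'),
        headers={
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {api_key}',
        },
        method='POST',
    )


request = build_chat_request(
    'http://localhost:11434/v1',
    'example-model',  # placeholder; the real model string is an assumption
    [{'role': 'user', 'content': '1 + 1 = ?'}],
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it, which only makes sense while the server is listening on port 11434.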
Run the OpenAI-compatible client examples/demo_openai_0.py:
python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]


def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)


def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        # delta.content may be None (for example on the last chunk), so fall back to ''
        print(chunk.choices[0].delta.content or '', flush=True, end='')

    print()


if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
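With the OpenAI Python client, a streamed chunk's `delta.content` can be `None`, typically on the final chunk, so code that accumulates the reply usually skips those values. The sketch below shows one way to do that. `join_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only mimic the attribute layout used in the demo above so the example runs without a server.

```python
from types import SimpleNamespace


def join_stream(chunks):
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices at all
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return ''.join(parts)


def fake_chunk(content):
    """Build an object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


text = join_stream([fake_chunk('1 + 2'), fake_chunk(' = 3'), fake_chunk(None)])
print(text)
```

In the demo, the `response` object returned with `stream=True` could be passed to `join_stream` in place of the fake chunks.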
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py