llama-cpp-cffi

Python binding for llama.cpp using cffi. Supports CPU, CUDA 12.5.1, and CUDA 12.6 execution.

NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2); Windows and macOS versions are in progress.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI-compatible Chat Completions API server and client support:
pip install llama-cpp-cffi[openai]
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.x installed, follow the instructions at https://developer.nvidia.com/cuda-downloads .
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, and compute_89, covering most GPUs from the GeForce GTX 1050 to the NVIDIA H100.
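As a quick illustration (not part of llama-cpp-cffi), the supported set above can be checked against a GPU's compute capability. The helper below is hypothetical; on a real system the capability string could be obtained from nvidia-smi, though flag availability varies by driver version.

```python
# Illustrative sketch, not part of llama-cpp-cffi: check whether a GPU's
# compute capability is covered by the prebuilt CUDA wheels listed above.
SUPPORTED_COMPUTE_CAPS = {'6.1', '7.0', '7.5', '8.0', '8.6', '8.9'}

def is_supported(compute_cap: str) -> bool:
    """compute_cap is a 'major.minor' string, e.g. '8.6' for an RTX 3090."""
    return compute_cap in SUPPORTED_COMPUTE_CAPS

# On a real system the value can typically be queried with:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
print(is_supported('8.6'))  # → True  (GeForce RTX 30-series)
print(is_supported('5.2'))  # → False (GTX 9xx predates the supported range)
```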
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
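For orientation: before generation, the messages list is rendered into a single prompt string via the model's chat template (llama-cpp-cffi derives this from the model configuration through get_config). The sketch below approximates the Zephyr-style template that TinyLlama-1.1B-Chat uses; render_chat_prompt is a hypothetical helper for illustration, not a library function.

```python
# Hypothetical helper, for illustration only: approximates how a messages
# list is flattened into a Zephyr-style prompt string. llama-cpp-cffi does
# this internally; this function is not part of its API.
def render_chat_prompt(messages: list[dict]) -> str:
    parts = [f"<|{m['role']}|>\n{m['content']}</s>" for m in messages]
    parts.append('<|assistant|>\n')  # cue the model to reply as the assistant
    return '\n'.join(parts)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
]

print(render_chat_prompt(messages))
```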
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
Run the OpenAI-compatible client example examples/demo_openai_0.py:
python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        # the final chunk's delta may carry no content
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py