llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, CUDA 12.5.1 and CUDA 12.6 execution.
NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI © compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: If you want to take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.x installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads .
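You can check what is already available on your machine; nvcc reports the installed CUDA toolkit version, while nvidia-smi shows the driver and the highest CUDA version it supports:
nvcc --version
nvidia-smi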
GPU Compute Capability: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100. See GPU Compute Capability for details.
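If you are unsure which compute capability your GPU has, a reasonably recent NVIDIA driver can report it directly (the compute_cap query field is not available on older drivers; NVIDIA's GPU Compute Capability page lists it as well):
nvidia-smi --query-gpu=compute_cap --format=csv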
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
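llama_generate yields the completion as a stream of text chunks. If you would rather collect the whole completion into a single string instead of printing it as it arrives, you can simply join the chunks; a minimal sketch reusing the options object from the example above:
completion = ''.join(llama_generate(options))
print(completion)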
OpenAI © compatible Chat Completions API - Server and Client
Run OpenAI compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
Run OpenAI compatible client examples/demo_openai_0.py:
python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py