llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, Vulkan 1.x and CUDA 12.6 runtimes, x86_64 and aarch64 platforms.
NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2 wheels), but we are working on Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI-compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
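After installing, a quick sanity check is to import the same names used in the examples below and print the installed version (importlib.metadata is part of the standard library):
# Post-install sanity check: import the public API used in this README
# and print the installed package version.
from importlib.metadata import version

from llama import llama_generate, get_config, Model, Options  # noqa: F401

print('llama-cpp-cffi', version('llama-cpp-cffi'))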
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.x installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100. See NVIDIA's GPU Compute Capability table for the full list.
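If you are unsure which compute capability your GPU has, the sketch below queries it via nvidia-smi. This is only a convenience, not part of the llama-cpp-cffi API, and the compute_cap query field is assumed to be available, which requires a reasonably recent NVIDIA driver:
# Query the compute capability of each visible GPU via nvidia-smi.
# The 'compute_cap' field requires a reasonably recent NVIDIA driver;
# otherwise, look up your GPU at https://developer.nvidia.com/cuda-gpus.
import subprocess

def gpu_compute_capabilities() -> list[str]:
    out = subprocess.run(
        ['nvidia-smi', '--query-gpu=compute_cap', '--format=csv,noheader'],
        capture_output=True, text=True, check=True,
    )
    # One line per GPU, e.g. '8.6' becomes 'compute_86'.
    return ['compute_' + line.strip().replace('.', '') for line in out.stdout.splitlines() if line.strip()]

print(gpu_compute_capabilities())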
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
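Since llama_generate(options) is a generator yielding text chunks (plain strings, as the print loop above suggests), you can also collect the whole completion into one string instead of streaming it to stdout:
# Collect the streamed chunks into a single string.
output = ''.join(llama_generate(options))
print(output)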
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
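The server can also be called without the openai package. The sketch below is a hedged example that assumes the standard OpenAI wire format at /v1/chat/completions (the same route and credentials the client example below uses) and relies on the third-party requests library:
# Call the OpenAI-compatible server over plain HTTP.
# Assumes the default bind address used above (localhost:11434)
# and the standard /v1/chat/completions route.
import requests

from llama import Model

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

response = requests.post(
    'http://localhost:11434/v1/chat/completions',
    headers={'Authorization': 'Bearer llama-cpp-cffi'},
    json={
        'model': str(model),
        'messages': [
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': '1 + 1 = ?'},
        ],
        'temperature': 0.0,
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()['choices'][0]['message']['content'])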
Run the OpenAI-compatible client example examples/demo_openai_0.py:
python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py
Built Distributions
Hashes for llama_cpp_cffi-0.1.19-cp312-cp312-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 0fc3c69c2331199e49b857b4fed08b6595f593dd13a9f51ba845a5deb079f3c0
MD5 | 9fb3c2152cf97ee42f63c346e68a5708
BLAKE2b-256 | 61b26c5be8fdace28d001348fe79789697b945864f4dbfdd6344d69fb16a7e7f

Hashes for llama_cpp_cffi-0.1.19-cp312-cp312-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | c677e1aa323a890675c8799dffaacca4d43b242a633f077d10932aa3309952b2
MD5 | b3deabaad9e3737cfd9909f1cb8ee9f7
BLAKE2b-256 | 052af0c3993146ce253f7592f0f5b29e85706b70f63af2c128fd823f1dc506ae

Hashes for llama_cpp_cffi-0.1.19-cp312-cp312-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 72201d327bcf93c72fb2266b32aa52c4b7485b8e6428caf958003df14637d680
MD5 | 8d3bedd7aa73877061f43c88ac75078a
BLAKE2b-256 | ff81679355491e69c2be7759985fc6a10b02918eeaffd04fa40665e29cb0cdcf

Hashes for llama_cpp_cffi-0.1.19-cp311-cp311-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 6fe2efaa44f10a71b498aa7c1be381270d4dd160a2a29f7716e1156686249d80
MD5 | aedb7d7d4d4fe5f9445e7a59ff633fe0
BLAKE2b-256 | 98c6c6bd0337285288d5e6625c9e1f36657d8a26ea7889a7ac35ec6b5ad14556

Hashes for llama_cpp_cffi-0.1.19-cp311-cp311-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | df078e9b471fe28ed75182f1291a27b266699945a29e5a754d96851804cf5415
MD5 | c04edd4ed1d879db8e0669e08baa1773
BLAKE2b-256 | 4f5fe95ad54d73710847e5e2679e5117444a0179b2184daffc2949c86ab082dc

Hashes for llama_cpp_cffi-0.1.19-cp311-cp311-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 7797b59172e9d25e015271271cf7efafdb1c99dfc66e02228e17ee88d72a4c4c
MD5 | cb952f0b86115afde125ad747ca1c5b1
BLAKE2b-256 | 0e6612dd17c9200da9c875e257fd10abfe1be811b46f870f1dbd33264bba41fb

Hashes for llama_cpp_cffi-0.1.19-cp310-cp310-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 0fb7980d5d9200eec9c69b2b981d4abec2ba55e94954d60ee3746b7eaaf2f2ff
MD5 | c7a99b3a01a6f6e815e61584f71b0b86
BLAKE2b-256 | 713a53ae880f0205bd6691fa6cb5e16ec5c051ceff2f922703d8948793e4f57a

Hashes for llama_cpp_cffi-0.1.19-cp310-cp310-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 8c862eb34790fd1c98a6ce7ae198c3cd8660917e2458bc05eb44194755af02c4
MD5 | 7dafc8abbe4bbbeae4181fe0dacdb7b8
BLAKE2b-256 | a3d5553e0df271667d9f6c3e79710fcf79ec7e7a291c54299f674ab1fd2ea6ea

Hashes for llama_cpp_cffi-0.1.19-cp310-cp310-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 6ce18ba4ac029613570e868bcea7b7e91a337665e599094401207ed11cbdc7f7
MD5 | f025e46851f9852f7d59c5dffe40dda2
BLAKE2b-256 | c516500ce4c662ca66dbdd4780b6684c4d139515e15599a70d17e938716a107b