Python binding for llama.cpp using cffi
Project description
llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, CUDA 12.5.1 and CUDA 12.6 execution.
NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2); Windows and macOS versions are in the works.
Install
Basic library install:
pip install llama-cpp-cffi
In case you want an OpenAI-compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: If you want to take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.x installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads .
Supported GPU Compute Capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100.
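For orientation, the capability codes above correspond roughly to the following NVIDIA architecture generations and example GPUs. This is a hedged summary based on NVIDIA's public CUDA documentation, not on this project's build configuration; check NVIDIA's CUDA GPU list for your exact card:

```python
# Rough mapping from the supported compute capabilities to NVIDIA
# architecture generations and example GPUs (illustrative only).
SUPPORTED_CAPABILITIES = {
    'compute_61': ('Pascal', 'GeForce GTX 1050-1080 Ti, Tesla P40'),
    'compute_70': ('Volta', 'Tesla V100, Titan V'),
    'compute_75': ('Turing', 'GeForce GTX 16xx / RTX 20xx, Tesla T4'),
    'compute_80': ('Ampere', 'A100'),
    'compute_86': ('Ampere', 'GeForce RTX 30xx, A40'),
    'compute_89': ('Ada Lovelace', 'GeForce RTX 40xx, L40'),
}

def is_supported(capability: str) -> bool:
    """Check whether a compute capability string is in the supported set."""
    return capability in SUPPORTED_CAPABILITIES

print(is_supported('compute_75'))  # True
print(is_supported('compute_52'))  # e.g. GTX 9xx series: False
```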
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
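llama_generate streams the completion piece by piece, so accumulating the full response works the same way against any generator of strings. A minimal stand-in sketch (fake_generate is a hypothetical placeholder so the snippet runs without loading a model):

```python
from typing import Iterator

def fake_generate() -> Iterator[str]:
    """Stand-in for llama_generate: yields the completion in small chunks."""
    yield from ['1 + 2 ', 'evaluates ', 'to ', '3']

# Stream each chunk to stdout while also accumulating the full response.
chunks = []
for chunk in fake_generate():
    print(chunk, flush=True, end='')
    chunks.append(chunk)
print()

full_response = ''.join(chunks)
print(full_response)  # 1 + 2 evaluates to 3
```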
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
Run the OpenAI-compatible client examples/demo_openai_0.py:
python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
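Under the hood, the OpenAI client posts a standard Chat Completions JSON body to http://localhost:11434/v1/chat/completions. A sketch of that payload, built here without any network call (model_id stands in for the str(model) value the real client would send):

```python
import json

# Hypothetical placeholder for str(model).
model_id = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'

payload = {
    'model': model_id,
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': '1 + 1 = ?'},
    ],
    'temperature': 0.0,
    'stream': True,  # ask the server for incremental delta chunks
}

body = json.dumps(payload)
print(body)
```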
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py
Project details
Download files
Download the file for your platform.
Source Distributions
Built Distributions
Hashes for llama_cpp_cffi-0.1.15-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 8b3b512efabcbf6149495383d3adf3a65dbfdd63f91b373982474ee41b97ad59
MD5 | c86052bc23a53635e25f86c064622c1f
BLAKE2b-256 | d793e17bae835bfaa6b977bf2762d4d58a59e617831ecd4eec069168a88e130b

Hashes for llama_cpp_cffi-0.1.15-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | b08de28aee6d4bf9e17037837482e986e013901fce7e9d760ae38275f5a20fe4
MD5 | 7bc85990ab1ee509b7327a4833805912
BLAKE2b-256 | c9bdd7699840ebfbdfeebfa479bf8aecf559170465b1dfd848effed33cd58845

Hashes for llama_cpp_cffi-0.1.15-cp313-cp313-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | c9e1195f7a01208db9dbfc493025318aaf89779800bbbfcfde6c5ffc9d126628
MD5 | fd6eb5c4814364bf3ebd09505ba2eff9
BLAKE2b-256 | 6aa571213c8b8befc7b7df7572cc9ad7be2d0985c314692cd8e650aa4163c229

Hashes for llama_cpp_cffi-0.1.15-cp313-cp313-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | eb7a602382e437e075b6fc6906bc688d24884c5d65943579293752549ebd7550
MD5 | d98dcca5349f22c4c1a3bda7348f3ba6
BLAKE2b-256 | b53be337978946a12b2a9b05e436401bd0de1b4df0a2ef880ae7d0b296f71e1c

Hashes for llama_cpp_cffi-0.1.15-cp313-cp313-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 37e0e9b9a44f572b0cc52598e98a7dea8c5f4e1d76de029ac824190581d4827b
MD5 | 9298e30a22b3f7266713f50a2e33b9e8
BLAKE2b-256 | d0448d3fd79cb793f3fe61f26fa32f3524299c9c996ef24d752ebaf32f30cba1

Hashes for llama_cpp_cffi-0.1.15-cp313-cp313-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | b4418ff10dfdc196f06e69242e9f7f7417c4058ca735d47e899ed0e7b28975cc
MD5 | c8101db4cf5039dff5d027722a2ab929
BLAKE2b-256 | 5e22841c2e07b712e2a4a52a65961098aa2b1e9c864559a53fd4583989c6ec6a

Hashes for llama_cpp_cffi-0.1.15-cp312-cp312-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 6e7319dea730522f724d30855507c70ebaa984b12815a62b3130386f1c1d105b
MD5 | 50c1489228f40b5c74c46f91e58aba0d
BLAKE2b-256 | f1044a7b157216fb9b84339677708fc353d65e8a0b58c7d8c021e3533d00eba6

Hashes for llama_cpp_cffi-0.1.15-cp312-cp312-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 10d1be65ae1a129bda0389ccfacd0c15a15fa0a9b0a3f2a86b4b578631d0d33a
MD5 | 6fc11df726041f7c37ad533d0e6ef4e1
BLAKE2b-256 | 6944d76e0ab445b534df3c36ee7614f3f1a9da90f721139cd39bc61db2ed2853

Hashes for llama_cpp_cffi-0.1.15-cp312-cp312-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | ab653b7f0ae384a2a1dc2220c2596efca06e3351bd845810f1792d1bb12104b3
MD5 | 1fbe2615498849de81af28d9fefb7c62
BLAKE2b-256 | aab17baee56c4872746f6ca1af8d7e4c07a7e3c1444122e63a97d709c88b4698

Hashes for llama_cpp_cffi-0.1.15-cp312-cp312-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 11464695ba584df673b66cb4d15f0ae9c16e676f05ccc00e48d10ab32205c7e5
MD5 | 5cbc054725ba77d05902ff35e8497a74
BLAKE2b-256 | 0a5c574c86fa9cbd2be31725af4cfca62c5cc6f741a8acb42d0431407d791b2a

Hashes for llama_cpp_cffi-0.1.15-cp311-cp311-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 1f3ecc130a30a2d9033fd18617a43c2a5104659f77118a5075d27633b26d6d45
MD5 | 6ff25ad338999d631067e0543933d633
BLAKE2b-256 | 291b140cc9fb1e75db667cfb73be755fb4675ee30a8e9a6a94819b4d87ee2852

Hashes for llama_cpp_cffi-0.1.15-cp311-cp311-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | e5b3baebafc4700dd7350eb51a7ec9291781e1519365e2cf95c9e209fee0a0ea
MD5 | 0a9519697b4eb18c26446e1dfe2af1be
BLAKE2b-256 | 79f6d5704b5d08b4dfa518e492f555c5601fe44816077deda3653ed727b17f2d

Hashes for llama_cpp_cffi-0.1.15-cp311-cp311-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | cc65ed0bf4d3308cddb3221cb24b38b13ee048a2081649906f084378ada963cd
MD5 | de0e9ae8cc3543c1e3f5947ed5ede0a4
BLAKE2b-256 | ca7c4e23fd265a436dcbfd51709f4f5cea8b10755980b0cbc6da29e1ccc99abc

Hashes for llama_cpp_cffi-0.1.15-cp311-cp311-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 1f35eaa4e1c0bedf9e52f6d1decc8d1a6d4488d0a8b41a5ce4725033fafa45d0
MD5 | 152dd15f729ac5657e7160b17e27fd96
BLAKE2b-256 | 50d5c6226a03534ff762baeb3b37ca95e69648cfe03b222891e7dd9528db1779

Hashes for llama_cpp_cffi-0.1.15-cp310-cp310-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 69ebf7de1291e2dcbeabdd8ca138a05aac12d908d1d87702c03c61e49d02336d
MD5 | 6570db524ff27a076eea9de082575b7f
BLAKE2b-256 | 13137c345a20cb38c6206907199af8a721008dc04f4294691a107acdfc10ccd6

Hashes for llama_cpp_cffi-0.1.15-cp310-cp310-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | c956cb0a6ddfd39a1c7c57ca7e1248d98f56935c1747bf74c46470aa66ca4239
MD5 | cc8b3015c02067beda4576e83000d1ce
BLAKE2b-256 | 906adacf175eea621127b236bcc35f17145f1036d02bbb547341bb11ee413851

Hashes for llama_cpp_cffi-0.1.15-cp310-cp310-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 14e5c94560f4c312f4141b0a3b817025b879afcefeaf9f2618bd96f5d1c6315c
MD5 | a1ab8620d6c750defd9a4016309e2767
BLAKE2b-256 | 2fffb5574e81653ce93890134d28ad71101f13a5bc0223a134d1aeb334411faa

Hashes for llama_cpp_cffi-0.1.15-cp310-cp310-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | ea0a4bb66a1d177ef1cec9d4709156f4afe9885a0d5f908240769cf5463014e3
MD5 | 26d38f2630ff3dd93e64319a34470a08
BLAKE2b-256 | 8ad4a20d8415be2cdddc6558fa6b52241a7b835d8cc600218ec3f508f1ebb9d1