llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, CUDA 12.5.1 and CUDA 12.6 execution.
NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:

```shell
pip install llama-cpp-cffi
```
If you also want the OpenAI-compatible Chat Completions API:

```shell
pip install llama-cpp-cffi[openai]
```
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.x installed, follow the instructions at https://developer.nvidia.com/cuda-downloads.
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 to the NVIDIA H100.
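To check which compute capability your own GPU reports, one option (an illustrative sketch, assuming a driver recent enough that `nvidia-smi` supports the `compute_cap` query field) is:

```python
import shutil
import subprocess


def gpu_compute_caps() -> list[str]:
    """Return compute capabilities (e.g. ['8.6']) of visible NVIDIA GPUs, or [] if none."""
    if shutil.which('nvidia-smi') is None:
        return []

    result = subprocess.run(
        ['nvidia-smi', '--query-gpu=compute_cap', '--format=csv,noheader'],
        capture_output=True,
        text=True,
    )

    if result.returncode != 0:
        return []

    # One line per GPU, e.g. "8.6"
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == '__main__':
    print(gpu_compute_caps())
```

A capability of 6.1 maps to compute_61, 8.6 to compute_86, and so on.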
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
```python
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
```
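Since `llama_generate` yields the response as text chunks, you can just as easily collect the full completion into a single string instead of printing as you go. The sketch below uses a stand-in generator in place of `llama_generate` so it runs without downloading a model:

```python
from typing import Iterator


def fake_generate() -> Iterator[str]:
    # Stand-in for llama_generate(options): yields the response piece by piece.
    yield from ['1 + 2 ', 'evaluates ', 'to 3']


def collect(chunks: Iterator[str]) -> str:
    # Accumulate streamed chunks into the complete response text.
    return ''.join(chunks)


completion = collect(fake_generate())
print(completion)  # → 1 + 2 evaluates to 3
```

With the real library, `collect(llama_generate(options))` would give you the whole completion at once, at the cost of losing incremental output.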
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:

```shell
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
```
Run the OpenAI-compatible client example examples/demo_openai_0.py:

```shell
python -B examples/demo_openai_0.py
```
```python
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]


def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)


def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()


if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
```
Demos
```shell
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py
```