llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU and CUDA 12.5 execution.
NOTE: The only operating system currently supported is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI © compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: If you want to take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12.5 installed. If you don't, follow the instructions at https://developer.nvidia.com/cuda-downloads.
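Not sure whether the CUDA 12.5 toolkit is already present? A minimal sketch (not part of llama-cpp-cffi) that checks for it, assuming nvcc is on your PATH as it is after a default toolkit install:

# Sketch: check whether the CUDA toolkit is visible on this machine.
# Assumes `nvcc` ends up on PATH, which a default CUDA install does.
import shutil
import subprocess

nvcc = shutil.which('nvcc')

if nvcc is None:
    print('nvcc not found - the CUDA toolkit does not appear to be installed')
else:
    # `nvcc --version` prints a release line such as
    # "Cuda compilation tools, release 12.5, V12.5.40"
    output = subprocess.run([nvcc, '--version'], capture_output=True, text=True).stdout
    print(output)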
GPU Compute Capability: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100.
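To confirm that your GPU falls inside that range, one option is to query its compute capability directly. This is a sketch, not part of the library, and it assumes a reasonably recent NVIDIA driver whose nvidia-smi supports the compute_cap query field:

# Sketch: query the installed GPU's compute capability via nvidia-smi.
# Assumes nvidia-smi is on PATH and the driver exposes `compute_cap`.
import subprocess

SUPPORTED = {'6.1', '7.0', '7.5', '8.0', '8.6', '8.9'}  # compute_61 ... compute_89

out = subprocess.run(
    ['nvidia-smi', '--query-gpu=compute_cap', '--format=csv,noheader'],
    capture_output=True, text=True,
).stdout

for line in out.splitlines():
    cap = line.strip()
    status = 'covered by the prebuilt wheels' if cap in SUPPORTED else 'not in the prebuilt wheel targets'
    print(f'compute capability {cap}: {status}')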
Example
Library Usage
examples/demo_0.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
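llama_generate yields the completion incrementally. If you would rather have the whole response as one string, a small variation on the example above (reusing the same options object) is to join the chunks:

# Variation on demo_0.py: collect the streamed chunks into a single string
# instead of printing them as they arrive.
completion = ''.join(llama_generate(options))
print(completion)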
OpenAI © compatible Chat Completions API - Server and Client
Run OpenAI compatible server:
python -B llama/openai.py
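Before running the client you may want to confirm the server is listening. The following is a sketch, not part of the library; it is a plain TCP probe of the default address used in the client example below (localhost:11434) and does not assume any particular HTTP endpoint:

# Sketch: check that the OpenAI-compatible server is accepting connections
# on the host/port used in demo_1.py.
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(2.0)
    try:
        s.connect(('localhost', 11434))
        print('server is reachable on localhost:11434')
    except OSError as exc:
        print(f'server is not reachable: {exc}')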
Run the OpenAI compatible client examples/demo_1.py:
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
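One caveat worth noting (not shown in the original example): with the OpenAI Python client, the final streamed chunk's delta.content can be None, which the loop above would print as the literal text None. A guarded helper you could drop into demo_1.py (print_stream is a hypothetical name, not part of llama-cpp-cffi):

def print_stream(response):
    # Skip chunks whose delta carries no content (e.g. the final chunk).
    for chunk in response:
        content = chunk.choices[0].delta.content
        if content is not None:
            print(content, flush=True, end='')
    print()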
Demos
#
# run demos
#
python -B examples/demo_0.py
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py
Download files
Source Distributions
Built Distributions
Hashes for llama_cpp_cffi-0.1.4-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 4fd994569b7500a64a6c0fad0d08eedc1a02012abb19c9410de3d1b8a804ef65
MD5 | b55177fdba6ea64446a1271dc7f521e0
BLAKE2b-256 | 5201cf2e3114785fd945b70aa1b37843e1aadf7229de897804304005b0309e0b

Hashes for llama_cpp_cffi-0.1.4-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | a01cc8d8a6ca0415e6a535bed3e6974a48e933f959158999c7d5e0aab1a0c698
MD5 | 18d07a5a4630246016abcc8dc844e897
BLAKE2b-256 | 4c3fc688c8af86977fd6eb0c0b94ed36d3777a24a0de10124a12729970392562

Hashes for llama_cpp_cffi-0.1.4-cp312-cp312-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 103d21ee32bb1f94234e5bd70e9bbbc90a110357eb9d7bb3b655704377f5d8f5
MD5 | c6b93b9458732af7a11340377f8ef33c
BLAKE2b-256 | 050c2bdf91d1aa6b4daf5a16bb5069040ee4c16365254def3533522d6ae5aa41

Hashes for llama_cpp_cffi-0.1.4-cp312-cp312-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 0bd52d73667da5443df186f450d5dee4619bb5a9c4d2da0e47722d58e5a641a5
MD5 | 592e7bf4d8ec88fd9e70330c6318877f
BLAKE2b-256 | 9f39ef95dd2a45d84e886edbde7e8582531027ed2e462f5b15b3ea7e3f1d2eb1

Hashes for llama_cpp_cffi-0.1.4-cp312-cp312-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | b3d64d6ab7ada5c1ab9c1d8e481be1c14ae299eba88cee71eedefb642f83cffc
MD5 | 46484b543f5a1cb75efbe432f849c5e3
BLAKE2b-256 | 8e333c6bd453970a68d73cb9b41e9fc86d0ef0e57d74fdd40c58fed4f377a273

Hashes for llama_cpp_cffi-0.1.4-cp312-cp312-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 2c09a7b40441752573c096f32071d354898b4d0e1297ef5f7354af072f075f92
MD5 | caaa0eb1f68b0bd1ce382cd795a743fb
BLAKE2b-256 | 6dcc92815a780f581bfd9c7c399224c1dd20521347ebf722064686fc39ffebf3

Hashes for llama_cpp_cffi-0.1.4-cp311-cp311-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 3488e91e3f6821fce571c2cd358bcc029789f01d321fbecb88f3d01429d2859a
MD5 | 910c3a3b58d7685f31ff58fa93d4d0e4
BLAKE2b-256 | c9ab50c0bb7591a3b008f11e015ee422203437cfb244bcf6cb4b8edfee7bed04

Hashes for llama_cpp_cffi-0.1.4-cp311-cp311-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 6a6acdff62e27c25c9f8cf3463fce910ad85b909277744782887fa8177df9992
MD5 | 2120a1b5323e64ba1f4b681423663e5d
BLAKE2b-256 | 392e8d8db04a6d3bbd0d4e84819f4d8cee3a21c4d52607a13fc0a5f31421e090

Hashes for llama_cpp_cffi-0.1.4-cp311-cp311-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 12dc7ad9fcbf3390acf11a93fbf1a0a9e3ce00d81450469b307931f82b6f4711
MD5 | b2cb8bbdc8d9cb3e187a46e0d9d00c35
BLAKE2b-256 | 0ca8ca098d1e2c82de6f8e2e7bfc66417c96a3f7c26f90fb2469ca6c8c0caa94

Hashes for llama_cpp_cffi-0.1.4-cp311-cp311-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | c698952f415f8cc4989a4bc5b1fe8ac131216146db11bfe3771c6dce789e2293
MD5 | e64ec15723c94c501eed729f58f9fc9a
BLAKE2b-256 | fcc9fb67f26e183055a30380cd33e543bb43a00032cf1a3a1d472befd19c4721

Hashes for llama_cpp_cffi-0.1.4-cp310-cp310-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | f5dd1a325d6df080aae5ab1baaea24bafc953e40900c843ddea261efaddc1252
MD5 | 651857e60ae81345ed887c5953ed6312
BLAKE2b-256 | ede8ec901138d889fbcb858467ceaa86ba4dbcb5f85de72353b81509de231fa4

Hashes for llama_cpp_cffi-0.1.4-cp310-cp310-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 575a5d010f07c9e82be81986ad5a71c0dbae830a2f06659a625a647224582958
MD5 | 72bf7af1e33af81bffd4ccfc3636e4ee
BLAKE2b-256 | fab44caf4ede83ada5565ede104703be0a2f2c21ce33ed636250494ef53775c0

Hashes for llama_cpp_cffi-0.1.4-cp310-cp310-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 0497130f2f58bde7d4f0344293c20872bd3350b51e84c9ec76660d6ae2c5f92b
MD5 | de4d4bbca39179528e848fbb633174c6
BLAKE2b-256 | 5671183300b06e6fa0613d16f432ba003eb34a8d40944805582682c01fd1764e

Hashes for llama_cpp_cffi-0.1.4-cp310-cp310-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 86f4e063ea19eacb4274082dbf74f25b018de08240364632b0eb447744b9f13a
MD5 | d31e65b3086585570145865089e7485d
BLAKE2b-256 | 3ccc5a317b6d2ae25c494c3c7352fa7d18d2de0343687cf0e34f1e768d089c4f