llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, Vulkan 1.x, and CUDA 12.6 runtimes on the x86_64 and aarch64 platforms.
NOTE: The only operating system currently supported is Linux (manylinux_2_28 and musllinux_1_2), but Windows and macOS versions are in the works.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI-compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't, follow the instructions at https://developer.nvidia.com/cuda-downloads .
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100.
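For reference, those compute capabilities map onto NVIDIA GPU generations roughly as follows. This lookup table is general NVIDIA background, not part of the library's API:

```python
# Rough mapping from CUDA compute capability to NVIDIA GPU generation.
COMPUTE_CAPABILITY = {
    'compute_61': 'Pascal (GeForce GTX 10xx, Tesla P40)',
    'compute_70': 'Volta (Tesla V100, Titan V)',
    'compute_75': 'Turing (GeForce RTX 20xx, GTX 16xx, T4)',
    'compute_80': 'Ampere data center (A100)',
    'compute_86': 'Ampere consumer/pro (GeForce RTX 30xx, A40)',
    'compute_89': 'Ada Lovelace (GeForce RTX 40xx, L40)',
}

for cap, generation in COMPUTE_CAPABILITY.items():
    print(f'{cap}: {generation}')
```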
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
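`llama_generate` yields the response as a stream of text chunks. If you want the whole reply as a single string, you can simply join the stream. A small sketch, shown with a stub generator so it runs standalone:

```python
def collect(chunks):
    # Join streamed text chunks into the full response.
    return ''.join(chunks)

# With llama-cpp-cffi this would be: text = collect(llama_generate(options))
# Stub generator standing in for llama_generate:
def stub_chunks():
    yield from ['1 + 2 ', 'evaluates ', 'to 3']

text = collect(stub_chunks())
print(text)  # 1 + 2 evaluates to 3
```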
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
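Because the server speaks the standard Chat Completions wire format, any OpenAI-style HTTP client can talk to it, not just the `openai` package. A minimal sketch of a request body, assuming the usual OpenAI schema (the model id below is a placeholder; the endpoint matches the bind address above):

```python
import json

# Request body in the standard Chat Completions schema.
payload = {
    'model': 'TinyLlama/TinyLlama-1.1B-Chat-v1.0',  # placeholder model id
    'messages': [
        {'role': 'user', 'content': '1 + 1 = ?'},
    ],
    'temperature': 0.0,
}

# POST this to http://localhost:11434/v1/chat/completions, e.g.:
#   curl http://localhost:11434/v1/chat/completions \
#        -H 'Content-Type: application/json' \
#        -d "$BODY"
body = json.dumps(payload)
print(body)
```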
Run the OpenAI-compatible client examples/demo_openai_0.py:
python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py
Download files

Built Distributions
Hashes for llama_cpp_cffi-0.1.20-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | b0aac483ed983cb91687550972b82569efdf88de26bee5e27f90db86dbbb23a9 |
| MD5 | b86ac8c9e13cf99813366526a538d4d1 |
| BLAKE2b-256 | 8e8c9d2837fa025453ad5276b01eae3f6cd22e42fba3eee7fd3ce3347dddfb7a |

Hashes for llama_cpp_cffi-0.1.20-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | fe3c59ee33b21bd1e773637eb180313dfee1eded12e33e11b762422934a96445 |
| MD5 | 4518f82312bc802b1b4ab310a90ee260 |
| BLAKE2b-256 | 84134b5cf2f26af88ecc2102058a9a042c444bcbea93c51f12ad9eaf6a3e01a2 |

Hashes for llama_cpp_cffi-0.1.20-cp313-cp313-musllinux_1_2_x86_64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 5c90cc34751286f7107576b8ac4462c98b289ad9f533f6e7e237ffbc54cd7a3c |
| MD5 | e0cb68e91516777696b23252831d372c |
| BLAKE2b-256 | 607f991fcadf6c0b4d30785d8553391673d4dca6cfb4f7872562e90dda191a6c |

Hashes for llama_cpp_cffi-0.1.20-cp313-cp313-musllinux_1_2_aarch64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | aa2d88ab248fb9bf916db66702298e07a3e733a9c8a5a4957da39fc9b7d741da |
| MD5 | 9af63eb4276bbb7c786f41abd25f34c7 |
| BLAKE2b-256 | 1f33fd4456585b3b67f71e41dc3fa59e7c6d7c150bbc1f3d9b6ff7dc712619ba |

Hashes for llama_cpp_cffi-0.1.20-cp313-cp313-manylinux_2_28_x86_64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 389647bdff82b61b33573658c5f05f92f6a349bc8062bdfb12005ce3f6dbe7b8 |
| MD5 | d01c679724c2cfe3cd6427f3a99e6a98 |
| BLAKE2b-256 | 61d74daf21115c88b482ee3488a20c0a6cf30a33a261c8e53fd6065dbc8770e7 |

Hashes for llama_cpp_cffi-0.1.20-cp313-cp313-manylinux_2_28_aarch64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | c329ed2ac6fb91e7208d7891774bbb154feb41a84640df01ba13d212f223d5d3 |
| MD5 | b95fd35c05905c345153eb4103895aef |
| BLAKE2b-256 | baa62b26cf9bb39259d0ff4ba11be8ff2596a2354eef371ec10b8d29f9c0577c |

Hashes for llama_cpp_cffi-0.1.20-cp312-cp312-musllinux_1_2_x86_64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | c4b92835995d9e5f1df5479a5ff3619595ad6208cf6169a40b532d2328c72d65 |
| MD5 | bc7925ca071a16c2a0a9acbab4aa71c6 |
| BLAKE2b-256 | 5e5c8039a213632c941112118c1379cbf8cea7100136ff414d8ba7792755f2a4 |

Hashes for llama_cpp_cffi-0.1.20-cp312-cp312-musllinux_1_2_aarch64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 9332f48a7778a332495b7e8e0cbcb8bb552c47887118afb961618839d4795a35 |
| MD5 | 8a6e80c66e96e8103bebefbbf086e25d |
| BLAKE2b-256 | ba0993062dfb7e6b0bdf99bbe6ce419b0624595dc00a9f54f0126ec67ca991cc |

Hashes for llama_cpp_cffi-0.1.20-cp312-cp312-manylinux_2_28_x86_64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 99beeac702e97a1d186bf9f22aa6c3bf04f100854db36a834865702a6f6cb8c2 |
| MD5 | f8cc5cb3809abf3ce2b19206ff224804 |
| BLAKE2b-256 | 60d3127d4d5a9314376485591a2a74c2d5e6061149391813757c6688d0b20777 |

Hashes for llama_cpp_cffi-0.1.20-cp312-cp312-manylinux_2_28_aarch64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 1ebde71515a59446131d26d859602fed979f9156a51ccbf7a7abe14e26c6fa83 |
| MD5 | 9b3a16002c5c0b0c1188aa4250647d98 |
| BLAKE2b-256 | ab80fbcc94f279f325ba49a97e9380129decc2eab6e4347f448e2fc637571f7b |

Hashes for llama_cpp_cffi-0.1.20-cp311-cp311-musllinux_1_2_x86_64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 3d65e21515b5b774491a54e2c8ae32c43316b722e429045a52b639c326f14542 |
| MD5 | 6c4a5f271af5fcde08b0bbfc67cddee7 |
| BLAKE2b-256 | 4af87214d6dcd643c514759dd94dfcee17e446d7b63026d2bf744067c3bc2709 |

Hashes for llama_cpp_cffi-0.1.20-cp311-cp311-musllinux_1_2_aarch64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | c36f7298bc5c506b90587db4b76d327b23a667abde00612f805896e832c6498d |
| MD5 | 22ae817fa9c674e237e02e9cab4004cf |
| BLAKE2b-256 | 6b4a87bd785a3e1ff8c9df5588e54108609ffbff1725436136365225f7f50932 |

Hashes for llama_cpp_cffi-0.1.20-cp311-cp311-manylinux_2_28_x86_64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 6f71e9ddc38c870b5b1e3cb950a4010893b9e562de46ea32d765d0cf8a641292 |
| MD5 | 4361a18cc5dc8b93abc352dd974bd048 |
| BLAKE2b-256 | cfe5a197c8bb9bb0088a4c79cb44d5f4623b38069880de07e22235621ed0e584 |

Hashes for llama_cpp_cffi-0.1.20-cp311-cp311-manylinux_2_28_aarch64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 780266354df85e6d7ec47eb342d170cc6cd8b098f823ff071200664cad97bba7 |
| MD5 | 062742573aa9b5f033e781f1e624946d |
| BLAKE2b-256 | 7cdec09623dec4f9836d118be80b1273af56f899599f4f152dadc28f196e6db7 |

Hashes for llama_cpp_cffi-0.1.20-cp310-cp310-musllinux_1_2_x86_64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 0fc592ed479241b77a4b6290b7f066d5e503be0cacfb1e1f03ec26bd91ec5640 |
| MD5 | 241998164160ff84d6e2e4c766cc572e |
| BLAKE2b-256 | 33a0864b3113958c03febb24cbce8c0caf7729f89b9ff3fd5fb88071444228cd |

Hashes for llama_cpp_cffi-0.1.20-cp310-cp310-musllinux_1_2_aarch64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 248870c27f423d3377256dfe1548da380945a438f62b4495ce78cc8db1b3c506 |
| MD5 | 7e30cb72a4fd5a23a00a1bbe29ff13a8 |
| BLAKE2b-256 | 98646b2bbd2f265cd31713b977dbd057bb735023d2583fb8bb4ffdedc3fb7eb6 |

Hashes for llama_cpp_cffi-0.1.20-cp310-cp310-manylinux_2_28_x86_64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 32141fb9a77300ab883d05aa78b0efe90735b96368f925c8ee6439a9b8850020 |
| MD5 | 3a0e94d5070ee404d6c2645699dc1e3b |
| BLAKE2b-256 | e7ae56756d150a5d89bcc6762ebc79e0f6d492b8c0c54515af5f7fbb594ffa67 |

Hashes for llama_cpp_cffi-0.1.20-cp310-cp310-manylinux_2_28_aarch64.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 5d58a5d2d4917bd0a701887a504ef5383871acb7c7398f8a22c6775efcf0d5ad |
| MD5 | f66699b0f8818a9665febe2f240c617b |
| BLAKE2b-256 | e732ffa72f84ea398190fef766e8e4e0e3d43af639db5325f6cb30e968d08787 |