llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, CUDA 12.5.1, and CUDA 12.6 execution.
NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you want an OpenAI-compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.x installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads .
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, and compute_89, covering most GPUs from the GeForce GTX 1050 to the NVIDIA H100.
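If you're unsure whether your GPU is covered, you can compare its compute capability (as reported by `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on recent drivers) against the list above. This is a hypothetical helper, not part of llama-cpp-cffi:

```python
# Compute capabilities the prebuilt CUDA wheels target (from the list above).
SUPPORTED_ARCHS = {'6.1', '7.0', '7.5', '8.0', '8.6', '8.9'}

def is_supported(compute_cap: str) -> bool:
    """Return True if the given capability (e.g. '8.6') has a prebuilt kernel."""
    return compute_cap.strip() in SUPPORTED_ARCHS

print(is_supported('6.1'))  # GeForce GTX 1050 -> True
print(is_supported('5.2'))  # GeForce GTX 980 (older than GTX 1050) -> False
```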
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
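The loop above streams chunks straight to stdout; to build the full response as a single string instead, accumulate the chunks. A minimal sketch of the pattern, using a stand-in generator in place of llama_generate(options) so it is self-contained:

```python
def fake_generate():
    # Stand-in for llama_generate(options): yields text chunks as they arrive.
    yield from ['1 + 2 ', 'evaluates ', 'to 3.']

# Join the streamed chunks into the complete response string.
response = ''.join(fake_generate())
print(response)  # -> 1 + 2 evaluates to 3.
```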
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
Run the OpenAI-compatible client examples/demo_openai_0.py:
python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py