llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU and CUDA 12.5 execution.
NOTE: The only operating system currently supported is Linux (manylinux_2_28 and musllinux_1_2), but Windows and macOS versions are in the works.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI © compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12.5 installed. If you don't, follow the instructions at https://developer.nvidia.com/cuda-downloads .
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89. This covers most GPUs from the GeForce GTX 1050 up to the NVIDIA H100.
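To check whether a local GPU falls in this range, you can query its compute capability before installing the CUDA build. A minimal sketch using only the standard library; it assumes the installed nvidia-smi supports the compute_cap query field, which requires a reasonably recent NVIDIA driver:

```python
import subprocess

# Compute capabilities the prebuilt CUDA wheels target (compute_61 ... compute_89).
SUPPORTED_CAPS = {'6.1', '7.0', '7.5', '8.0', '8.6', '8.9'}


def gpu_compute_caps():
    """Return the compute capability of each visible GPU, e.g. ['8.6'].

    Returns [] if nvidia-smi is missing or fails.
    """
    try:
        out = subprocess.run(
            ['nvidia-smi', '--query-gpu=compute_cap', '--format=csv,noheader'],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []

    return [line.strip() for line in out.splitlines() if line.strip()]


def is_supported(cap: str) -> bool:
    return cap in SUPPORTED_CAPS
```

If `is_supported` returns False for your GPU (or `gpu_compute_caps` returns an empty list), fall back to the CPU build.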
Example
Library Usage
examples/demo_0.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
OpenAI © compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -B llama/openai.py
Run the OpenAI-compatible client, examples/demo_1.py:
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        # delta.content can be None on the final chunk of a stream
        print(chunk.choices[0].delta.content or '', flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
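Because the server speaks the standard Chat Completions wire format, it can also be called without the openai package. A minimal sketch using only the standard library; MODEL_ID is a hypothetical placeholder (in practice pass the string produced by str(model) as in the example above), and the server is assumed to be running on localhost:11434:

```python
import json
import urllib.request

BASE_URL = 'http://localhost:11434/v1'
MODEL_ID = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'  # placeholder; use str(model) in practice


def build_chat_request(model_id, messages, temperature=0.0):
    # Assemble an OpenAI-style /chat/completions POST request (no network I/O here).
    payload = {'model': model_id, 'messages': messages, 'temperature': temperature}

    return urllib.request.Request(
        f'{BASE_URL}/chat/completions',
        data=json.dumps(payload).encode('utf-8'),
        headers={
            'Content-Type': 'application/json',
            'Authorization': 'Bearer llama-cpp-cffi',
        },
        method='POST',
    )


req = build_chat_request(MODEL_ID, [{'role': 'user', 'content': '1 + 1 = ?'}])

# Requires the server started with `python -B llama/openai.py`:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)['choices'][0]['message']['content'])
```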
Demos
#
# run demos
#
python -B examples/demo_0.py
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py
Built Distributions
Hashes for llama_cpp_cffi-0.1.11-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | bd0c5d6331064763a9a4d742bb9f1538860c58c4b44e21e91e22513256fb376d
MD5 | 4332e06ee7d87c28d5c18fe33e5ebc81
BLAKE2b-256 | 5b6c1443154f4d9f6ad5034d381439fddff588602097a9cbab86019fd20cf4d4

Hashes for llama_cpp_cffi-0.1.11-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 56b4d3e7ec3dc68557097f9c52f2097ae23cd8f0fe6da127e6307faf17f6d4e2
MD5 | dd07267847cdc405292acdadafc7cd4b
BLAKE2b-256 | cb93c4453c9ba4a870b1c4f04403509dd8c080e57d10d1e71b6616e9df5d3219

Hashes for llama_cpp_cffi-0.1.11-cp313-cp313-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 33d1aa2c21e91b3fc2a2adc58156a458bdfab5e1609df13016208c2988e7de14
MD5 | c4dd68164a058dd3604bc365741622d7
BLAKE2b-256 | 93419db410c701c92cd2d36e1b8dff6d74da642ea586c2d8855108046eff0ec1

Hashes for llama_cpp_cffi-0.1.11-cp313-cp313-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 2e5189cf4441f4a086f5d42e1810255efcae5e001042a68f42c97230a0f8616b
MD5 | ca1f3649c9327c121d80f9a32ae11ff9
BLAKE2b-256 | 5607b979bb1c54377a70031d135412d300e61fbe9cd7694303e59afae3ef53bf

Hashes for llama_cpp_cffi-0.1.11-cp313-cp313-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | fe682ea6ab0f8116ba10a210493d0b2bd0d4aa5dead1b1e06f8f15c38f1e00b9
MD5 | 7410d780225611b7a364f151e0501951
BLAKE2b-256 | 3d50be2b866141650032fdd1e4ebc0600627132c2bd28c7c79f539a254d74157

Hashes for llama_cpp_cffi-0.1.11-cp313-cp313-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 9b64308e18702ef8abaebeb863a7093c611674f42171dc9b899f1671eec29c1a
MD5 | 85c19a0cdc2c037462ea026450c01841
BLAKE2b-256 | 5478abefb4dadd231b65e1964be2ed7e03f90b4d6dc20dc4076c9dc5a3f1abc0

Hashes for llama_cpp_cffi-0.1.11-cp312-cp312-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | ae5831b235578943a5c876152c08b86850727d39703874de702a581e6a76ef4d
MD5 | 0af5b8c4c261f580a4798e78e9e638ae
BLAKE2b-256 | 961abea67b95bab5e62e26abfb5f9d669b22ca9ae378d77f94c0b2e12cf1f301

Hashes for llama_cpp_cffi-0.1.11-cp312-cp312-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 3e648a3ec019fe7a589931260f0d9623fb10e66de1e212d7b5ff3f44868025b4
MD5 | 6602203ef8da2f2966fbc0d16dfa0fce
BLAKE2b-256 | 0398d0bbf590cac3cd1485ff5b7c2473d4a3c18c44ea0082a78945a43512ac71

Hashes for llama_cpp_cffi-0.1.11-cp312-cp312-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 119e2bf4228d30547ba30be2b83617f6e38b2e6be070d6c22be50a460717b6d5
MD5 | 4b6572e3891b13a1e7143d905329fac9
BLAKE2b-256 | 629ea57c6d705e0a6215df914d60c2ee70e38c947addffa5beda5714cae7ce43

Hashes for llama_cpp_cffi-0.1.11-cp312-cp312-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 6753890953a8a5983e58a861d1aa3dcf351872290649b11ac1fa6d81902cac5a
MD5 | 47038fec7487f3bace186babd2d9e5cf
BLAKE2b-256 | c6ce874513608345fa1d10e919460a92140cc1abc4c6a1ff84408f2ef6f35888

Hashes for llama_cpp_cffi-0.1.11-cp311-cp311-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 505a617f344874cf2bb0dc6c7f45fe05f0e186af4b702428640beac1262fff6a
MD5 | 4b544a792391eed46dc9ccdaa5d7773d
BLAKE2b-256 | f1d952e5ad5cb85fc12c86df66e3f6a346af18a994b5482fdafaa4b003f47070

Hashes for llama_cpp_cffi-0.1.11-cp311-cp311-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | c724aeba814ea96b0ef701a1a91bcedb0cd09dc8c190ccc10498a9a474622935
MD5 | bcabe47135fb783cbe81bf16ec5c8e38
BLAKE2b-256 | 728197323a64d8eefde22249bc0a3190a624e76212ae4ca9f7917afb62cf6da9

Hashes for llama_cpp_cffi-0.1.11-cp311-cp311-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 432f19368fa6363fac6b3327c5ba36eec3225ba4080fe1a484639a16b1eebb9f
MD5 | 102d060ffaee72b4f64c9434cb208e40
BLAKE2b-256 | 63a097163b0cd2e88665947aebcc78eb7ddf2d072453d15a3523825def675087

Hashes for llama_cpp_cffi-0.1.11-cp311-cp311-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 2ef9943f86df1f1cdad45ebc8428bcebeeb7d5b1ef0def6c768fe33acbbfcca7
MD5 | f9cf8b2ae848edee83f62f2cf070d547
BLAKE2b-256 | c60851c53aaea9c360de88c3dc8ed6977520a1f88c01819d2cf76d651c0add02

Hashes for llama_cpp_cffi-0.1.11-cp310-cp310-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | b611b692ed1b79a32d5551738365b41c553143fd5e6520abff4087e854e506d1
MD5 | c79ffa6d8a276aaf12165c648f25af47
BLAKE2b-256 | a23533a99dbb8265f40b4a7c82e24c0c595ddfa3b346e8366328d484535e3a85

Hashes for llama_cpp_cffi-0.1.11-cp310-cp310-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 8da1ef8ba5528fe5fd2d798b3c2d61869898d844d6caa0e4007292615e1ccec7
MD5 | d513103558f414875e4776a86e3b1ef0
BLAKE2b-256 | f669429a3032ed44a2965627a55fc605b6a421642f8de220765cd8179790ffd5

Hashes for llama_cpp_cffi-0.1.11-cp310-cp310-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 9712b960d45d30eaf6294c0bca9ea971ea55c01b1c1f37288c4f33b46ce67cab
MD5 | 02b25a1a9426a1901b63b60635e818d3
BLAKE2b-256 | 95e29f743183aa4467b8b0b560cd82494a0293427e66f2a08cd2a273fa2d3cf7

Hashes for llama_cpp_cffi-0.1.11-cp310-cp310-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 3ec58c450c799284a34b723f6112fba9f57e6b5b6ec12ccca97683d38735c6c8
MD5 | 66b0649ca9651accb5f13f21381a79d7
BLAKE2b-256 | b3e56f36ea3db19d37e5c9c238486d5db7ea558d32c42d3c4539716ce5e4bb0b