llama-cpp-cffi
Python binding for llama.cpp using cffi and ctypes. Supports CPU and CUDA 12.5 execution.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want an OpenAI © compatible Chat Completions API, install the openai extra:
pip install llama-cpp-cffi[openai]
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12.5 installed. If you don't have CUDA 12.5 installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads
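You can check which CUDA toolkit version is currently installed (assuming nvcc is on your PATH) with:
nvcc --version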
Example
Library Usage
examples/demo_0.py
from llama import llama_generate, Model, Options
from llama import get_config


model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
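llama_generate yields the completion incrementally, which is what the loop above streams to stdout. If you would rather get the whole completion as a single string, the chunks can simply be joined; a minimal sketch reusing the options object from above:

# collect the streamed chunks into one string (llama_generate is a generator)
text = ''.join(llama_generate(options))
print(text)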
OpenAI © compatible Chat Completions (TBD)
Run the OpenAI compatible server:
python -B llama/openai.py
Run the example examples/demo_1.py using the openai module:
from openai import OpenAI
from llama import Model


client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]


def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)


def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()


if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
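Because the server is meant to expose an OpenAI compatible API, any HTTP client should be able to talk to it as well. A hypothetical curl request, assuming the standard /v1/chat/completions route implied by the base_url above; <model> is a placeholder for whatever str(model) produces in the example:

# hypothetical request; route and payload follow the standard OpenAI schema
curl http://localhost:11434/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "<model>", "messages": [{"role": "user", "content": "1 + 1 = ?"}], "temperature": 0.0}'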
Demos
#
# run demos
#
python -B examples/demo_0.py
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py
Download files
Source Distributions
No source distribution files are available for this release.
Built Distributions
Hashes for llama_cpp_cffi-0.1.3-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6c44a77dd1b0614f87edcfa03f850fbfe7660812e4e0b853b26fc2bff64b36ca
MD5 | 5db21236eadf9d50225e105b133a7ba1
BLAKE2b-256 | 929d9f38696e48c6acc134fdbb23216d6e849a55e54674e4d8dfdad982d39c58
Hashes for llama_cpp_cffi-0.1.3-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | 1eb6bf1103c5361978f9c927ce8c766c99f71bfc6028f82f761695d091d9869c
MD5 | fe0709ceeb8caa9ca7c37464cc6ab199
BLAKE2b-256 | 658ea1cb681dc50a0dadb7705d9e094401e88e75bc5a887f24301aa9718336be
Hashes for llama_cpp_cffi-0.1.3-cp312-cp312-musllinux_1_2_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b3a19cafec0352a1f121e342f882b877fb254717bb41a3f7b1d73cb209a4e3f4
MD5 | da3e818ca5e9270007961f3b92d710c3
BLAKE2b-256 | ced571d68358d1b4a55617e04d81ecba821e3bb8b815e4245d8ddaf8fc1b27c1
Hashes for llama_cpp_cffi-0.1.3-cp312-cp312-musllinux_1_2_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | 46195495c210d8c709cc4d7faea77903b355324fa35df4da088aaec7ca1e9ffc
MD5 | 7bd37795fe8b074641af6770e66531f4
BLAKE2b-256 | ee73d36c5a63140df3e287b8da8c35dc467b2cf8fb595654f8e54acef3bc207b
Hashes for llama_cpp_cffi-0.1.3-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | bdd2e7ca98f7bc92a3452466f275c3e9c3779206b84f0ef653899cdbe6aab515
MD5 | 4c20c3f805a1333cfe429c30de2c8441
BLAKE2b-256 | e01f1aa7b0cb05a55a33c8d9c59e4cfd21c290b437389cfebfcba6e900cb4c2f
Hashes for llama_cpp_cffi-0.1.3-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | b8d9ff96e8b746659715eea81770cc1d56189ef7a986e24a7c1b38c17948af8d
MD5 | 9c9bac8e9a6c2263dda6c2cb23ec7017
BLAKE2b-256 | d013dd764a8732f3639fb89f8136740168d6a4dafb403b7108dc1daf26f04b91
Hashes for llama_cpp_cffi-0.1.3-cp311-cp311-musllinux_1_2_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | a7d5527913617ea3bc1c103a7d6deeef1477da02436b0f31fbe3d0139433c1c1
MD5 | 799159adebdd8d74a82e8d73cc5894f9
BLAKE2b-256 | 7ca84e480b32ffb14bacc18e6c7c72b7677258df109777281d8e0190b84430de
Hashes for llama_cpp_cffi-0.1.3-cp311-cp311-musllinux_1_2_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | ad0f6ecd4d7425d4383f0356cecef61926bfdec97e1c1faaecfcbdaa7720cc1b
MD5 | af675ee0ef0cc7f52297840100d168c0
BLAKE2b-256 | 48c892af50abb1d735f9a4c4221306cb67dd6bf645ea7d26c1e9e3a3a9613b9c
Hashes for llama_cpp_cffi-0.1.3-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c7e4ed3121cfaab552783d03a2890dfbf4a5d16bdd6d229ebbb9d906b7e84fed
MD5 | 17016a5f2c6b26bcc3f37b8094069ee6
BLAKE2b-256 | 30b6939cc70dce85c7fb972a0ce474a9c722e8e2228655873e786848df7eb15a
Hashes for llama_cpp_cffi-0.1.3-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | eab324a64fcd3c1e86e600b2e28724566d510fd90220aa454af910c419736d73
MD5 | b810c03b61dc1bdeaaee52400fc52894
BLAKE2b-256 | 18b0efb8dfa1e5b2c7d07b26aa212867f9b4ba66b3dcb98967c6725c972f417a
Hashes for llama_cpp_cffi-0.1.3-cp310-cp310-musllinux_1_2_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 123ca9fe88fa255575998ba68b5988128b8ce1b4952eb306bc7ff739e3ebf0b6
MD5 | cd02061a8f31c834239cd3ec96086093
BLAKE2b-256 | 14351b2de41a9609ae9ae653b276ac7ba746ee7d3ec1496f30c28afbe6cf8568
Hashes for llama_cpp_cffi-0.1.3-cp310-cp310-musllinux_1_2_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | e1c3103dbf559a497513e8c70cb5caf8687ad90f7d40886122a25578d1728d7f
MD5 | 707939db2f7a5859cf21a5baf66b9c3d
BLAKE2b-256 | 4fc8700c66d7e894b360ce27e8096e3261745e7b495284972e453f33fbd9d600
Hashes for llama_cpp_cffi-0.1.3-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | afc4c445b2d19fe4958aa681492ee2dbc7f9ddad4288d0eb0b22bdafd685819c
MD5 | afc73598463f55206d869f4a505488e5
BLAKE2b-256 | e0287e075db7e6161096f42e0d65e168500c4eb13073612af9733715ba935bdf
Hashes for llama_cpp_cffi-0.1.3-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | fdcbf46b1885e9e9a062c89eb1e899575ad2affec56e2bd0510c076adce9b79d
MD5 | 50a3a92b8b25d15b400e4da3dc71d63d
BLAKE2b-256 | db579ab758aa5b227cf306db91e69a2eec7762d59446f3ebf6c5a1474b37b135