llama-cpp-cffi

Python binding for llama.cpp using cffi. Supports CPU and CUDA 12.5 execution.
NOTE: Currently the only supported operating system is Linux (manylinux_2_28 and musllinux_1_2 wheels); Windows and macOS versions are in progress.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI-compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12.5 installed. If you don't, follow the instructions at https://developer.nvidia.com/cuda-downloads .
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100.
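For orientation, each supported compute capability corresponds to a GPU generation. The mapping below is an illustrative sketch (the example cards are our additions, not part of llama-cpp-cffi):

```python
# Illustrative mapping of the supported compute capabilities to GPU
# generations; the example cards are assumptions, not from the library.
SUPPORTED_CAPABILITIES = {
    'compute_61': 'Pascal (e.g. GeForce GTX 1050 / GTX 1080)',
    'compute_70': 'Volta (e.g. Tesla V100)',
    'compute_75': 'Turing (e.g. GeForce RTX 2080, T4)',
    'compute_80': 'Ampere (e.g. A100)',
    'compute_86': 'Ampere (e.g. GeForce RTX 3090)',
    'compute_89': 'Ada Lovelace (e.g. GeForce RTX 4090)',
}

def is_supported(capability: str) -> bool:
    """Check whether a compute capability is covered by the prebuilt wheels."""
    return capability in SUPPORTED_CAPABILITIES

print(is_supported('compute_86'))  # True
```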
Example
Library Usage
examples/demo_0.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
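llama_generate yields the reply incrementally. If you want the full text at the end as well as live streaming, collect the chunks as you print them. A plain-Python sketch (the chunk list below is a stand-in for llama_generate(options)):

```python
# Stand-in for the chunks that llama_generate(options) would yield.
chunks = ['1 + 2 ', 'evaluates ', 'to 3 in Python.']

parts = []

for chunk in chunks:
    print(chunk, flush=True, end='')  # stream to the terminal as it arrives
    parts.append(chunk)               # keep each piece for later use

print()

full_reply = ''.join(parts)
```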
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -B llama/openai.py
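Under the hood, any OpenAI-compatible client POSTs a standard Chat Completions JSON body to the server's /v1/chat/completions endpoint. A minimal sketch of such a payload (the field names are the standard OpenAI ones; the model string is illustrative):

```python
import json

# A minimal Chat Completions request body, as an OpenAI-compatible client
# would send it. The model string here is illustrative.
payload = {
    'model': 'TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': '1 + 1 = ?'},
    ],
    'temperature': 0.0,
    'stream': False,
}

body = json.dumps(payload)
print(body)
```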
Run the OpenAI-compatible client, examples/demo_1.py:
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        # delta.content can be None (e.g. on the final chunk), so guard it
        print(chunk.choices[0].delta.content or '', flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
Demos
#
# run demos
#
python -B examples/demo_0.py
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py
Download files
Built Distributions
Hashes for llama_cpp_cffi-0.1.9-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 24711ca8d1e9441b78b92b7c5ca5fe8da9f2e7a407225540c7602c8d23702ecd
MD5 | 87c8e32ed34b90f495835f781f19d5ca
BLAKE2b-256 | bece1773789aff2ac6f0bfb3f7be59dafd04a6e50a5722d836a0d783ed3bcf81

Hashes for llama_cpp_cffi-0.1.9-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 349694778bbe023b9f59cd95a84fa7c09b185db289ee5b1d3c42109288055db4
MD5 | e9489ad97cee1dc2436085d09efb5185
BLAKE2b-256 | ee2464cfb2db94f7a21ef947b4907be9cc48d6afafbeae57ce9a67ec62b854ae

Hashes for llama_cpp_cffi-0.1.9-cp312-cp312-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 5da944171681c2b7a9c39b41bfe0a8e0267dcd05ea5f4be940f813ee9206e668
MD5 | edc4a33d36fd483e48fb81ad3b8a9d19
BLAKE2b-256 | fcf0ce41cee1fab9b46b1110118969881e8f26530b860c3ed6a3566cb3f02b03

Hashes for llama_cpp_cffi-0.1.9-cp312-cp312-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | f933b7f8755878b3d566377f8e56101b52844656f2e754db1e3599d99ece1482
MD5 | bfa93fbdb63ac74914fe9485d02c8b6d
BLAKE2b-256 | a72c2bf64ffb7a3424a4f215ec173d2e76f25414ebd7e5d5b5ad86fccad8ac37

Hashes for llama_cpp_cffi-0.1.9-cp312-cp312-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 253afae201a5e7e9ba42c5352aa503152926370eea7c34a9df9a95e58bef2fe9
MD5 | 3ca78d463e52ab8bc6cac625365b6a43
BLAKE2b-256 | 2bac474af40b9ef917f2c6de71d4717be5e6d5a2ac5ec233b2e5462a20bce827

Hashes for llama_cpp_cffi-0.1.9-cp312-cp312-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | a5fbeacdff8f230d7b03a3b28a4835feab4a2352a3337e88d9fdd888442a4162
MD5 | 7d94949c18519dd7a66ccbad2333f17a
BLAKE2b-256 | 478709fa95f2b564bf48031b754dc63687ce7049a0f2cdd432bd3af2fba242be

Hashes for llama_cpp_cffi-0.1.9-cp311-cp311-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 2f03a21c957860e0b1da17625323cd81667c9feb58f97c2c3cfdb9470c0996d1
MD5 | 1ba60d9296ac194a3260bb9a7b67cf3f
BLAKE2b-256 | eefdd3c316ecbbee16c447b8987156b7669dc410bc3401f200d001134bb09971

Hashes for llama_cpp_cffi-0.1.9-cp311-cp311-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | d8663b4a77ce2e77b86335081afe6296244e6e65e1dd00e63828d2f6aaa778c0
MD5 | 315ebf975af8e231462431882bcc5d0b
BLAKE2b-256 | 2f8105e9f9d403c1199d6bbe7e6d4817e4d83fae8ad055747fcd5f61fb531f88

Hashes for llama_cpp_cffi-0.1.9-cp311-cp311-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 313f9a15ce9edc4c55df865bab41f2a2427f8f9f90eaf0664c2fa1057aeb641d
MD5 | f2110bbccc84360eb6fbfaa9ef80f15c
BLAKE2b-256 | 0f37386f689588201988285c070a7ec2f25bedaca2caa6fae857a8ada0a09aef

Hashes for llama_cpp_cffi-0.1.9-cp311-cp311-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 49123c217a73d289c7c2c325159d5856b962caafc466cd26f0251e189d92ddbd
MD5 | 76497ee326ab37b00314c56f53a6e5d2
BLAKE2b-256 | 3868875d2948dae794afbaa8d8363c0c3050f86633f6dedcfd7e57e80f56d851

Hashes for llama_cpp_cffi-0.1.9-cp310-cp310-musllinux_1_2_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | 9214d5b9ce034f7a8408df6584c27a7d0a731b66ee8c0837eac3c393b462576f
MD5 | 3ace3b2fc52f1abb35a981befff8d677
BLAKE2b-256 | 93805711f87d664a0cbfc948e2ce81a60049558dae7490ecda0cf82c2424988e

Hashes for llama_cpp_cffi-0.1.9-cp310-cp310-musllinux_1_2_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | e1def4e253620e01311f6eebefa304fba65297b02a34c452eec4e9084762630c
MD5 | 4589df7cc6affdd9d5a4a9f2b4ca9e44
BLAKE2b-256 | 58da4d76b3de5ad08a46545088d0c28607d7701955c3ea47d385dcf88d138d72

Hashes for llama_cpp_cffi-0.1.9-cp310-cp310-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
---|---
SHA256 | d2dd3fa8c840276ecc1efdbb3b3e8ef04a342f0f47c48000b2cfcaffe7aacd92
MD5 | 3908b9949ca61a209a9772c87a48592c
BLAKE2b-256 | 5c7ed1f928542ca6dde6dd47042ceb3656bd921996aaf41fa73d585c6d247e04

Hashes for llama_cpp_cffi-0.1.9-cp310-cp310-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
---|---
SHA256 | 62352358ed4dcd741812aed7f91aa7a1fe180fbe86c67a1d8fbb1d95c3238630
MD5 | e20160249f1804642ddf60589e17ff24
BLAKE2b-256 | 542a303282ccde3166bd56b014ae4b67f17811924732d0cefee895ebd16d4b39