llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU and CUDA 12.5 execution.
NOTE: The only currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI © compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: If you want to take advantage of NVIDIA GPU acceleration, make sure that you have CUDA 12.5 installed. If you don't have CUDA 12.5 installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads
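A quick way to confirm the toolkit version before installing with GPU support is to ask nvcc for it. This is only a sanity-check sketch and assumes the CUDA toolkit's nvcc binary is on your PATH:

# sanity-check sketch: print the installed CUDA toolkit version
# (assumption: the CUDA toolkit is installed and `nvcc` is on PATH)
import subprocess

output = subprocess.run(['nvcc', '--version'], capture_output=True, text=True, check=True).stdout
print(output)  # look for a line such as "Cuda compilation tools, release 12.5"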
Supported GPU Compute Capabilities: compute_61, compute_70, compute_75, compute_80, compute_86 and compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100. See GPU Compute Capability for details.
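To check which compute capability your own GPU reports, recent NVIDIA drivers expose it through nvidia-smi. The sketch below assumes a driver new enough to support the compute_cap query field:

# sketch: print the name and compute capability of each visible GPU
# (assumption: a recent NVIDIA driver that supports the `compute_cap` query field)
import subprocess

result = subprocess.run(
    ['nvidia-smi', '--query-gpu=name,compute_cap', '--format=csv,noheader'],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # e.g. "NVIDIA GeForce RTX 3090, 8.6" corresponds to compute_86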
Example
Library Usage
examples/demo_0.py
from llama import llama_generate, get_config, Model, Options


model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

# read the creator model's config to size the context window
config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,  # -2: generate until the context window is full
    model=model,
    prompt=messages,
)

# llama_generate yields text chunks as they are produced
for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
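Because llama_generate yields text chunks as they are produced, the output can also be collected into a single string instead of being streamed to stdout. A minimal sketch, reusing the options object from the example above:

# collect the whole completion at once (reuses `options` from demo_0.py above)
completion = ''.join(llama_generate(options))
print(completion)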
OpenAI © compatible Chat Completions API - Server and Client
Run OpenAI compatible server:
python -B llama/openai.py
Run the OpenAI compatible client, examples/demo_1.py:
from openai import OpenAI
from llama import Model


client = OpenAI(
    base_url='http://localhost:11434/v1',  # the local llama-cpp-cffi server
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]


def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)


def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()


if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
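Because the server speaks the standard OpenAI Chat Completions wire format, any HTTP client should work as well. A minimal standard-library sketch, assuming the server started above is listening on http://localhost:11434 as in demo_1.py:

# minimal sketch: call the chat completions endpoint with only the standard library
# (assumption: the server from `python -B llama/openai.py` listens on localhost:11434, as in demo_1.py)
import json
import urllib.request

from llama import Model

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

payload = {
    'model': str(model),
    'messages': [{'role': 'user', 'content': '1 + 1 = ?'}],
    'temperature': 0.0,
}

request = urllib.request.Request(
    'http://localhost:11434/v1/chat/completions',
    data=json.dumps(payload).encode('utf-8'),
    headers={
        'Content-Type': 'application/json',
        'Authorization': 'Bearer llama-cpp-cffi',  # same api_key as in the OpenAI client example
    },
)

with urllib.request.urlopen(request) as response:
    body = json.loads(response.read())
    print(body['choices'][0]['message']['content'])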
Demos
#
# run demos
#
python -B examples/demo_0.py
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py
Download files
Built Distributions
Hashes for llama_cpp_cffi-0.1.10-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | d0dc0418e9265de46c1a1ae65dc86c168a6c6f682fc6619f1ac6f70bfe9c3d3f
MD5 | 2fecca5141c36f57da256665710f7dad
BLAKE2b-256 | e7ac422415e7bbebe921439d8e2431853dcfa831f1a599d5ee56d65ebe7a273e
Hashes for llama_cpp_cffi-0.1.10-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | 2b9a75a7ed285ba3bc6d59cd4e61dfc6a49a7f819214cfc8800b29c933318667
MD5 | 10bc3978804e679b50aa1e1c82a548d8
BLAKE2b-256 | 74ca45960c25f86b1c11a27bc4e85dafd5276e5ce6f05a111910d6dd9ff8d0e2
Hashes for llama_cpp_cffi-0.1.10-cp312-cp312-musllinux_1_2_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | f8b5b812882e98bc59cefb5868e68530117e1bd07ddc64060fd68921c094b68b
MD5 | e0a8634dab953d2a4df8b43753b05c2a
BLAKE2b-256 | aecbce285aa8e6a2f9ec9f87b2b25bb71ffc6d7a64ea2fefdf0836bdfbcbd6be
Hashes for llama_cpp_cffi-0.1.10-cp312-cp312-musllinux_1_2_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | a05c21916f99ab7107b45763db1c0271a614baa967e4817d0cd20cd344bce62d
MD5 | fc5a17a2f259e5a492706ec75efbdb32
BLAKE2b-256 | 271f3454a9a46e9fcc57013fe4ccc065beba798f56b9b1a961cdbbe528c90753
Hashes for llama_cpp_cffi-0.1.10-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 46d7b7f0f40b85e8640c19ecbfc0e9727ab1482c5f3ba3da0a42eb7ffcc0c2d6
MD5 | d4bbea2ec0a3aeeb9a700282c199ee5f
BLAKE2b-256 | 7bf909363a8fa1d53f009e8b4d23fb7cb4937ef5cbb44404a7f70ebb15e143b3
Hashes for llama_cpp_cffi-0.1.10-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | 4d84875ee5786911aa0c9880ccd2e2ac3f64d0fbeb7c0e21456f0cfce3039d8c
MD5 | 0a7ff71519416afe6b29da04aab95dd1
BLAKE2b-256 | 279865bf505942130628583c2c9581631676fbbf5e9e21c58f878dacffe39d49
Hashes for llama_cpp_cffi-0.1.10-cp311-cp311-musllinux_1_2_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | a283b66e9733362bd85e514eb9b25c9b20d5104c33595ddeb9584cdaf518be9c
MD5 | 013f6d052edff05fcd5f8e8547761e59
BLAKE2b-256 | 1c6a854400a2ef147358b200533d84e781923066e91751033b7ef737145e6d41
Hashes for llama_cpp_cffi-0.1.10-cp311-cp311-musllinux_1_2_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | 03d7adcbe8203c9f0d530132bb51d3c2777537bc75c349269c80c40a1754cd52
MD5 | e71bf4564d78029753f732b7ed0135ed
BLAKE2b-256 | 5e67b8927e948d2511f2d31bde9cbd2c3bdbba13d56c7875f2db0b64ae767efd
Hashes for llama_cpp_cffi-0.1.10-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c203c8dc0b49d8feeb76421b2704e8fe760b4f4f1c17c29e420484004d9e761c
MD5 | 44321808c3fb69eb3df1aaf37aba9f41
BLAKE2b-256 | e76e38248b191745b1b2800fd42c73152995d0074c7b33730b77c8e6d679d0df
Hashes for llama_cpp_cffi-0.1.10-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | af6f2fade8abe37604e1ae66e7aba7371414948767ea1d539994ce79ae0c0c33
MD5 | a40056b9743c717eeef4e6c4a93071eb
BLAKE2b-256 | e509a28b964d127d5a519924764ca31768beb72c10bbe9c195d32ca3f93b2b25
Hashes for llama_cpp_cffi-0.1.10-cp310-cp310-musllinux_1_2_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 3fab8ad481d5810042f83dedaf5f96ea4df7e034199567c4f57a37c567ac832d
MD5 | 7d2e15167f1bc52736a6ffca084c746e
BLAKE2b-256 | 5555b14323ce98d17b7d3e9425796715c8e85d0bc6c139869c5aa340730f8af2
Hashes for llama_cpp_cffi-0.1.10-cp310-cp310-musllinux_1_2_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | ce01be7db76d6c28accf13e74e18135e3c29da7643f562a7d9147e09cf909be2
MD5 | b01534dcc5f2c65fd9a173d5d21137d4
BLAKE2b-256 | 1e06038a1868bfe57a4cbffacd24a6360f72ef142f72c7bdbd835c8765886f6b
Hashes for llama_cpp_cffi-0.1.10-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 5ef8a82f19da4cce620fc00c02096738a53adccfdbbc9cab951bf869d46cd3ef
MD5 | 8f43086478bd9bcea61a1c73ff782b22
BLAKE2b-256 | 4633f1802b7979f77e641e9ca33ae2789d4f6fd6fec70a73318dd450007be063
Hashes for llama_cpp_cffi-0.1.10-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm | Hash digest
---|---
SHA256 | 10a4eada54e3d5e2e696d6fad1a90f42d0746c546d802a2779b1554a500c910f
MD5 | e99d39988a0a84b08ff2453393f07ba7
BLAKE2b-256 | f98379b2772445d16590a17f99e62bdd806f351a0929b5b63bf27aea847be939