llama-cpp-cffi

Python binding for llama.cpp using cffi. Supports CPU, Vulkan 1.x, and CUDA 12.6 runtimes on x86_64 and aarch64 platforms.

NOTE: The only currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but Windows and macOS versions are in the works.
Install
Basic library install:
pip install llama-cpp-cffi
If you want the OpenAI-compatible Chat Completions API as well:
pip install llama-cpp-cffi[openai]
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't, follow the instructions here: https://developer.nvidia.com/cuda-downloads .
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, compute_89, covering most GPUs from the GeForce GTX 1050 to the NVIDIA H100.
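To check whether your GPU falls in the supported range, you can query its compute capability with `nvidia-smi` (the `compute_cap` query field requires a reasonably recent driver). A minimal sketch; the helper names are our own, not part of llama-cpp-cffi:

```python
import subprocess

# Compute capabilities the CUDA wheels are built for (see list above).
SUPPORTED_CAPS = {'6.1', '7.0', '7.5', '8.0', '8.6', '8.9'}


def is_supported(compute_cap: str) -> bool:
    """Return True if a compute capability string (e.g. '8.6') is supported."""
    return compute_cap.strip() in SUPPORTED_CAPS


def detect_gpu_caps() -> list[str]:
    """Query local GPU compute capabilities via nvidia-smi, one per GPU."""
    out = subprocess.run(
        ['nvidia-smi', '--query-gpu=compute_cap', '--format=csv,noheader'],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

For example, `is_supported('8.6')` is true for an RTX 3090, while a compute_90 card such as the H100 SXM would need a wheel built for that capability.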
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
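Since `llama_generate` yields the completion as string chunks, a small helper can accumulate them when you want the full text rather than a live stream. A minimal sketch; the stub generator below stands in for `llama_generate(options)` and is not part of the library:

```python
from typing import Iterable, Iterator


def collect(chunks: Iterable[str]) -> str:
    """Join streamed text chunks into the full completion string."""
    return ''.join(chunks)


# Stub standing in for llama_generate(options), which yields str chunks.
def fake_generate() -> Iterator[str]:
    yield from ('1 + 2 ', 'evaluates ', 'to 3.')


print(collect(fake_generate()))  # prints: 1 + 2 evaluates to 3.
```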
OpenAI-compatible Chat Completions API: Server and Client
Run the OpenAI-compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
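Because the server speaks the standard Chat Completions protocol, you can also call it without the `openai` package. A minimal standard-library sketch; the base URL matches the server command above, while the `'model'` value here is a placeholder (in practice pass `str(Model(...))` as in the client example below):

```python
import json
from urllib import request

# Hand-built Chat Completions request payload.
payload = {
    'model': 'TinyLlama/TinyLlama-1.1B-Chat-v1.0',  # placeholder; use str(model)
    'messages': [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': '1 + 1 = ?'},
    ],
    'temperature': 0.0,
}


def post_chat_completion(url: str = 'http://localhost:11434/v1/chat/completions') -> str:
    """POST the payload and return the assistant message content."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            'Content-Type': 'application/json',
            'Authorization': 'Bearer llama-cpp-cffi',
        },
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())['choices'][0]['message']['content']
```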
Run the OpenAI-compatible client example examples/demo_openai_0.py:
python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        print(chunk.choices[0].delta.content, flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py