llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU and CUDA 12.5 execution.
NOTE: Currently the only supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you want the OpenAI © compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
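To verify the install, you can import the names used throughout the examples below; a minimal sanity check:

# Sanity check: these are the public names used in the examples below.
from llama import llama_generate, get_config, Model, Options

print('llama-cpp-cffi imported OK')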
IMPORTANT: If you want to take advantage of Nvidia GPU acceleration, make sure that you have CUDA 12.5 installed. If you don't, follow the instructions here: https://developer.nvidia.com/cuda-downloads .
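You can confirm which CUDA toolkit release is on your PATH before installing; a minimal sketch (assumes the toolkit's nvcc compiler is installed and on PATH):

# Sketch: report the CUDA toolkit release that nvcc belongs to.
# Assumes the CUDA toolkit is installed and nvcc is on PATH.
import re
import subprocess

out = subprocess.run(['nvcc', '--version'], capture_output=True, text=True).stdout
match = re.search(r'release (\d+\.\d+)', out)
print('CUDA toolkit release:', match.group(1) if match else 'nvcc output not recognized')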
Supported GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, and compute_89, covering most GPUs from the GeForce GTX 1050 up to the NVIDIA H100. See NVIDIA's GPU Compute Capability reference for the full list.
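To check whether your card falls in that range, you can ask nvidia-smi for its compute capability; a hedged sketch (assumes a recent driver whose nvidia-smi supports the compute_cap query field):

# Sketch: compare the GPU's compute capability against the set of
# capabilities listed above. Assumes nvidia-smi is on PATH and supports
# the `compute_cap` query field (recent drivers do).
import subprocess

SUPPORTED = {'6.1', '7.0', '7.5', '8.0', '8.6', '8.9'}

out = subprocess.run(
    ['nvidia-smi', '--query-gpu=compute_cap', '--format=csv,noheader'],
    capture_output=True, text=True,
).stdout

for cap in out.split():
    print(f'compute capability {cap}:', 'supported' if cap in SUPPORTED else 'not covered')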
Example
Library Usage
examples/demo_0.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
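Since llama_generate returns an iterator of text chunks, you can also collect the whole completion into a single string instead of streaming it; a tiny sketch reusing the options object from above:

# Collect the streamed chunks into one string instead of printing them
# as they arrive (reuses `options` from the example above).
completion = ''.join(llama_generate(options))
print(completion)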
OpenAI © compatible Chat Completions API - Server and Client
Run the OpenAI compatible server:
python -B llama/openai.py
Run the OpenAI compatible client example examples/demo_1.py:
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]


def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)


def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        # delta.content can be None on role-only or terminal chunks
        print(chunk.choices[0].delta.content or '', flush=True, end='')

    print()


if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
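Because the server speaks the standard Chat Completions wire protocol, you can also call it without the openai client. A hedged sketch using the third-party requests package (assumes the server above is running on localhost:11434 and that requests is installed):

# Sketch: POST directly to the OpenAI-compatible endpoint. The model id
# is the same str(model) string the openai client sends above.
import requests

from llama import Model

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

resp = requests.post(
    'http://localhost:11434/v1/chat/completions',
    headers={'Authorization': 'Bearer llama-cpp-cffi'},
    json={
        'model': str(model),
        'messages': [{'role': 'user', 'content': '1 + 1 = ?'}],
        'temperature': 0.0,
    },
)

print(resp.json()['choices'][0]['message']['content'])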
Demos
#
# run demos
#
python -B examples/demo_0.py
python -B examples/demo_cffi_cpu.py
python -B examples/demo_cffi_cuda_12_5.py
Download files
Download the file for your platform.
Built Distributions
Hashes for the llama_cpp_cffi-0.1.6 wheels:

File | SHA256 | MD5 | BLAKE2b-256
---|---|---|---
llama_cpp_cffi-0.1.6-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl | 06f4ed2007329f9ae6e6d135b49236f8229721aa72f5c985f86d2a45bee922bd | 24212c5bfa7c1af6a26048bae20f2696 | b2233ee925bf4cb903829327a56ccacb44ad9cca4bfe2760f046b4e6cd9c27c7
llama_cpp_cffi-0.1.6-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl | b2bc56aef2830214941e20e8e064cf943692ca3f0ea86fdaf233e797727b7aca | 441fd691495b6eaa40470590fda85643 | a05d8a7e56f6bd717eaf540188f60545ec498a9982f30a9b3f61e8ca88e8ba8c
llama_cpp_cffi-0.1.6-cp312-cp312-musllinux_1_2_x86_64.whl | 9dc22b32c71d9c4faecaf8c5b2b3582aa9e9fd9ce3add1ead181f177b9143ebc | c317a1dd1d1b79a858bebaf9ee638dfb | 580ce03d40d6d658bc075171932e760a9100460d4f1a06d149831865d46c71ca
llama_cpp_cffi-0.1.6-cp312-cp312-musllinux_1_2_aarch64.whl | 6be0dd232f747c61da0aecb411c4df31dbc36b8a31fb7675b877c8c0204064b2 | 672c6e2079ac805b366a646aa5243d4f | 078f107d3cb870399b584b4f87f2d35f24d526267480ec15b6322ca2cba15e13
llama_cpp_cffi-0.1.6-cp312-cp312-manylinux_2_28_x86_64.whl | 78eda9383bdb73d0522bd66f32ee284e0ccc84616c4c583d11d21a323cc730b3 | 35e5812bbcc47a287b4fcc4908340841 | 51e96898da20041c05411b83b135ac32473527a1d6a717ffb683bf974711c2d0
llama_cpp_cffi-0.1.6-cp312-cp312-manylinux_2_28_aarch64.whl | f909011aea1a9300d324c6ffc3e7ad1d1e49441a9de3f9ad23c80e82ada2fe1d | 12425fc503e63c3750ccd0ab26fa036e | 0843577aaa57fa668484e206b6b57ffb106a66f4cc2c6d5e12093973196625e3
llama_cpp_cffi-0.1.6-cp311-cp311-musllinux_1_2_x86_64.whl | bd935c850006df97c0080c97b8d10c8bbb091d86d6262d3590606b35ef5d40c7 | d7713985802d5e36d8496a4d14eea406 | 7646f5424d6cdda5bb2315ecf27a84853e9069fe647dd9e7b870154ff6c3238a
llama_cpp_cffi-0.1.6-cp311-cp311-musllinux_1_2_aarch64.whl | 908be769106d8907b61692c4e368061e270cfec6606dc00299dfeef0f352b0cb | ba4f094be80b069d93f8a8beb9cd3df4 | bf01d2a5091be9c4b01da1f767ce8f52b229153445e14516aa3e3e329fc7164d
llama_cpp_cffi-0.1.6-cp311-cp311-manylinux_2_28_x86_64.whl | 3b7b6e62bf09eceb54dfb9ad1499e918fc5c5146c5d9cdf876fce50c8c1df64a | 9b8db31ade02f6f9a5e001f0c4c5e421 | b7338e7b721f558409064a79a20204459217b3f61802d98d82a904c67724ef2e
llama_cpp_cffi-0.1.6-cp311-cp311-manylinux_2_28_aarch64.whl | 41d31380fc4a60802885c606307e807308004d1a815bd2da604804c5a9a57d80 | 68a2c6ab80988c4570659fe89a990c54 | aad14b1e76a361cdc504001421ff5b594931f342c12dcd6ec483afe492eccd22
llama_cpp_cffi-0.1.6-cp310-cp310-musllinux_1_2_x86_64.whl | a4cfa491024a44a4845c7512a6625d61e63e26bbca6f821ee6b400a376946525 | dea6e2112d8ac6350e3b33fcf47ee765 | 7bbc9e2f8975e1fd251f9d38460afb4493dcf6c3c767cc3b43b87eb7876772c4
llama_cpp_cffi-0.1.6-cp310-cp310-musllinux_1_2_aarch64.whl | 5e5925f8a6d6a2d705e3bc0392b2d9b9045d3a10a96fe16a6221f6d829970bce | 413c000c078ffc1a8838407746d5c790 | 43b8160c6844235f0101ccadebbd37e31a7f307d991f05ef44811cc1ca5db331
llama_cpp_cffi-0.1.6-cp310-cp310-manylinux_2_28_x86_64.whl | db03641bc7ede680b53b09cc506254b795fd37a36533c31f11f8a674408e6eab | a6e041c4e13e236a69014183981bd2c7 | 99efc525756c55d4998ac78342f8a1b1c8e0375bd183a82d33ba2cea9f2581f6
llama_cpp_cffi-0.1.6-cp310-cp310-manylinux_2_28_aarch64.whl | 02db4ca225f4ec13eb6a08165f7aaa5e2e5dc4c0187cc46fbcd8ce3648bc3ecc | 8c37a127fe096495e5117235fa540767 | a272a0a264db796264036ab70b4a199ad04816dad1e36d3be97e45e25b6d0693