llama-cpp-cffi
Python binding for llama.cpp using cffi. Supports CPU, CUDA 12.5.1, and CUDA 12.6 execution.
NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2), but we are working on both Windows and macOS versions.
Install
Basic library install:
pip install llama-cpp-cffi
If you also want the OpenAI-compatible Chat Completions API:
pip install llama-cpp-cffi[openai]
IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.x installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads .
GPU Compute Capability: compute_61, compute_70, compute_75, compute_80, compute_86, and compute_89, covering most GPUs from the GeForce GTX 1050 to the NVIDIA H100.
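For orientation, those compute capabilities correspond roughly to the following GPU architectures (an illustrative mapping with example cards, not an exhaustive support list):

```python
# Rough mapping of CUDA compute capability to GPU architecture and examples.
COMPUTE_CAPABILITY = {
    'compute_61': 'Pascal (GeForce GTX 1050-1080 Ti, Tesla P40)',
    'compute_70': 'Volta (Tesla V100, Titan V)',
    'compute_75': 'Turing (GeForce RTX 20xx, Tesla T4)',
    'compute_80': 'Ampere (A100)',
    'compute_86': 'Ampere (GeForce RTX 30xx, A40)',
    'compute_89': 'Ada Lovelace (GeForce RTX 40xx, L40)',
}
```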
Example
Library Usage
References:
examples/demo_tinyllama_chat.py
examples/demo_tinyllama_tool.py
from llama import llama_generate, get_config, Model, Options

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
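Note that `prompt` accepts the message list directly and the model's chat template is applied for you. As intuition only, a toy flattening in the Zephyr/TinyLlama-Chat style might look like this (hypothetical, not the library's actual rendering):

```python
def flatten_messages(messages: list[dict]) -> str:
    """Toy chat-template rendering: tag each turn with its role and
    end with an assistant cue so the model continues from there.
    Illustrative only; real templates are model-specific."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>" for m in messages]
    parts.append('<|assistant|>')  # generation cue
    return '\n'.join(parts)
```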
OpenAI-compatible Chat Completions API - Server and Client
Run the OpenAI-compatible server:
python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
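Once the server is up, it should accept standard Chat Completions requests over plain HTTP as well. This sketch builds such a request by hand (the port 11434 and the `/v1` prefix are taken from the examples here; field names follow the OpenAI API):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str,
                       messages: list[dict]) -> urllib.request.Request:
    """Assemble a Chat Completions POST request without the openai package."""
    payload = {'model': model, 'messages': messages, 'temperature': 0.0}
    return urllib.request.Request(
        url=f'{base_url}/chat/completions',
        data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json',
                 'Authorization': 'Bearer llama-cpp-cffi'},
        method='POST',
    )
```

Sending it with `urllib.request.urlopen(req)` should return the usual JSON completion object.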
Run the OpenAI-compatible client example examples/demo_openai_0.py:
python -B examples/demo_openai_0.py
from openai import OpenAI
from llama import Model

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)

def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        # the final streamed chunk carries delta.content == None
        print(chunk.choices[0].delta.content or '', flush=True, end='')

    print()

if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()
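When streaming, the last chunk's `delta.content` is typically `None`. A small helper (hypothetical, not part of llama-cpp-cffi or the `openai` package) to accumulate streamed deltas into the full reply:

```python
def collect_stream(chunks) -> str:
    """Concatenate streamed delta contents, skipping None/empty deltas."""
    return ''.join(c.choices[0].delta.content or '' for c in chunks)
```

Use it as `text = collect_stream(response)` when you need the complete message rather than incremental printing.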
Demos
python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py