
Python binding for llama.cpp using cffi

Project description

llama-cpp-cffi


Python binding for llama.cpp using cffi. Supports CPU, Vulkan 1.x and CUDA 12.6 runtimes, x86_64 and aarch64 platforms.

NOTE: The currently supported operating system is Linux (manylinux_2_28 and musllinux_1_2); Windows and macOS versions are in progress.

Install

Basic library install:

pip install llama-cpp-cffi

If you want an OpenAI-compatible Chat Completions API:

pip install llama-cpp-cffi[openai]
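
A quick way to confirm the install (and the optional extra) before running the examples is to check that the relevant modules are importable. This is a sketch that assumes `llama` is the import name, as used in the examples below, and that the `[openai]` extra pulls in the `openai` package:

```python
import importlib.util


def is_importable(name: str) -> bool:
    """Return True if a module can be found without actually importing it."""
    return importlib.util.find_spec(name) is not None


# 'llama' is the import name used in the examples below;
# 'openai' should only be present if the [openai] extra was installed.
for module in ('llama', 'openai'):
    print(f"{module}: {'ok' if is_importable(module) else 'missing'}")
```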

IMPORTANT: To take advantage of NVIDIA GPU acceleration, make sure you have CUDA 12 installed. If you don't have CUDA 12.x installed, follow the instructions here: https://developer.nvidia.com/cuda-downloads .

GPU compute capabilities: compute_61, compute_70, compute_75, compute_80, compute_86, and compute_89, covering most GPUs from the GeForce GTX 1050 to the NVIDIA H100.
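
If you are unsure which compute capability your GPU has, recent NVIDIA drivers can report it directly. This is a hedged sketch that assumes `nvidia-smi` is on the PATH and supports the `compute_cap` query field (available in newer driver releases); it falls back to an empty list otherwise:

```python
import subprocess


def gpu_compute_caps() -> list:
    """Return compute capabilities reported by nvidia-smi, or [] if unavailable."""
    try:
        out = subprocess.run(
            ['nvidia-smi', '--query-gpu=compute_cap', '--format=csv,noheader'],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (FileNotFoundError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return []  # no NVIDIA driver, or a driver too old for this query field
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]


print(gpu_compute_caps() or 'no NVIDIA GPU detected')
```

A reported value such as '8.6' corresponds to the compute_86 build target listed above.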

Example

Library Usage

References:

  • examples/demo_tinyllama_chat.py
  • examples/demo_tinyllama_tool.py

from llama import llama_generate, get_config, Model, Options


model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

config = get_config(model.creator_hf_repo)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'},
]

options = Options(
    ctx_size=config.max_position_embeddings,
    predict=-2,
    model=model,
    prompt=messages,
)

for chunk in llama_generate(options):
    print(chunk, flush=True, end='')

# newline
print()
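
llama_generate(options) yields the completion incrementally, so the loop above prints tokens as they arrive. If you also want the full text afterwards, accumulate the chunks as you stream them; this is sketched with a stand-in generator, since running the real model requires downloading the GGUF file:

```python
def fake_stream():
    # Stand-in for llama_generate(options): yields text chunks as produced.
    yield from ['1 + 2', ' evaluates', ' to 3']


chunks = []

for chunk in fake_stream():
    print(chunk, flush=True, end='')  # stream to the terminal...
    chunks.append(chunk)              # ...while keeping every chunk

full_response = ''.join(chunks)
```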

OpenAI-compatible Chat Completions API - Server and Client

Run the OpenAI-compatible server:

python -m llama.openai
# or
python -B -u -m gunicorn --bind '0.0.0.0:11434' --timeout 300 --workers 1 --worker-class aiohttp.GunicornWebWorker 'llama.openai:build_app()'
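
Because the server speaks the OpenAI wire format, you can also talk to it without the `openai` client. This sketch builds a raw Chat Completions request using only the standard library; the endpoint path, payload shape, and the `llama-cpp-cffi` API key are assumptions based on the client example below:

```python
import json
import urllib.request


def build_request(model: str, messages: list, base_url: str = 'http://localhost:11434/v1'):
    """Build an OpenAI-style Chat Completions request for the server above."""
    payload = {'model': model, 'messages': messages, 'temperature': 0.0}

    return urllib.request.Request(
        f'{base_url}/chat/completions',
        data=json.dumps(payload).encode(),
        headers={
            'Content-Type': 'application/json',
            'Authorization': 'Bearer llama-cpp-cffi',
        },
    )


req = build_request(
    'TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    [{'role': 'user', 'content': '1 + 1 = ?'}],
)

# Requires the server started above to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)['choices'][0]['message']['content'])
```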

Run the OpenAI-compatible client examples/demo_openai_0.py:

python -B examples/demo_openai_0.py

from openai import OpenAI
from llama import Model


client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='llama-cpp-cffi',
)

model = Model(
    creator_hf_repo='TinyLlama/TinyLlama-1.1B-Chat-v1.0',
    hf_repo='TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF',
    hf_file='tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf',
)

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': '1 + 1 = ?'},
    {'role': 'assistant', 'content': '2'},
    {'role': 'user', 'content': 'Evaluate 1 + 2 in Python.'}
]


def demo_chat_completions():
    print('demo_chat_completions:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
    )

    print(response.choices[0].message.content)


def demo_chat_completions_stream():
    print('demo_chat_completions_stream:')

    response = client.chat.completions.create(
        model=str(model),
        messages=messages,
        temperature=0.0,
        stream=True,
    )

    for chunk in response:
        # delta.content is None on the final streamed chunk; guard against printing 'None'
        print(chunk.choices[0].delta.content or '', flush=True, end='')

    print()


if __name__ == '__main__':
    demo_chat_completions()
    demo_chat_completions_stream()

Demos

python -B examples/demo_tinyllama_chat.py
python -B examples/demo_tinyllama_tool.py

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

llama_cpp_cffi-0.1.21-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl (160.4 MB)

Uploaded PyPy manylinux: glibc 2.28+ x86-64

llama_cpp_cffi-0.1.21-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl (13.4 MB)

Uploaded PyPy manylinux: glibc 2.28+ ARM64

llama_cpp_cffi-0.1.21-cp313-cp313-musllinux_1_2_x86_64.whl (26.1 MB)

Uploaded CPython 3.13 musllinux: musl 1.2+ x86-64

llama_cpp_cffi-0.1.21-cp313-cp313-musllinux_1_2_aarch64.whl (11.2 MB)

Uploaded CPython 3.13 musllinux: musl 1.2+ ARM64

llama_cpp_cffi-0.1.21-cp313-cp313-manylinux_2_28_x86_64.whl (160.4 MB)

Uploaded CPython 3.13 manylinux: glibc 2.28+ x86-64

llama_cpp_cffi-0.1.21-cp313-cp313-manylinux_2_28_aarch64.whl (13.5 MB)

Uploaded CPython 3.13 manylinux: glibc 2.28+ ARM64

llama_cpp_cffi-0.1.21-cp312-cp312-musllinux_1_2_x86_64.whl (26.1 MB)

Uploaded CPython 3.12 musllinux: musl 1.2+ x86-64

llama_cpp_cffi-0.1.21-cp312-cp312-musllinux_1_2_aarch64.whl (11.2 MB)

Uploaded CPython 3.12 musllinux: musl 1.2+ ARM64

llama_cpp_cffi-0.1.21-cp312-cp312-manylinux_2_28_x86_64.whl (160.4 MB)

Uploaded CPython 3.12 manylinux: glibc 2.28+ x86-64

llama_cpp_cffi-0.1.21-cp312-cp312-manylinux_2_28_aarch64.whl (13.4 MB)

Uploaded CPython 3.12 manylinux: glibc 2.28+ ARM64

llama_cpp_cffi-0.1.21-cp311-cp311-musllinux_1_2_x86_64.whl (26.1 MB)

Uploaded CPython 3.11 musllinux: musl 1.2+ x86-64

llama_cpp_cffi-0.1.21-cp311-cp311-musllinux_1_2_aarch64.whl (11.2 MB)

Uploaded CPython 3.11 musllinux: musl 1.2+ ARM64

llama_cpp_cffi-0.1.21-cp311-cp311-manylinux_2_28_x86_64.whl (160.4 MB)

Uploaded CPython 3.11 manylinux: glibc 2.28+ x86-64

llama_cpp_cffi-0.1.21-cp311-cp311-manylinux_2_28_aarch64.whl (13.4 MB)

Uploaded CPython 3.11 manylinux: glibc 2.28+ ARM64

llama_cpp_cffi-0.1.21-cp310-cp310-musllinux_1_2_x86_64.whl (26.1 MB)

Uploaded CPython 3.10 musllinux: musl 1.2+ x86-64

llama_cpp_cffi-0.1.21-cp310-cp310-musllinux_1_2_aarch64.whl (11.2 MB)

Uploaded CPython 3.10 musllinux: musl 1.2+ ARM64

llama_cpp_cffi-0.1.21-cp310-cp310-manylinux_2_28_x86_64.whl (160.5 MB)

Uploaded CPython 3.10 manylinux: glibc 2.28+ x86-64

llama_cpp_cffi-0.1.21-cp310-cp310-manylinux_2_28_aarch64.whl (13.4 MB)

Uploaded CPython 3.10 manylinux: glibc 2.28+ ARM64

File details

Details for the file llama_cpp_cffi-0.1.21-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b76c6cac450e57343f648e253416397ac85e20eeb0ede19bd5467d98c5eb5740
MD5 4a9134ea7fc99acaae859158c5ca9609
BLAKE2b-256 1fe81c57c27dd292856a6015fae89d77b8c7bb1a086638c6032e21a110697bb6

See more details on using hashes here.

File details

Details for the file llama_cpp_cffi-0.1.21-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 99d9b188bf49daffbef2380fb1b32057427f31c2f3719b61726a562bcefb8add
MD5 17c3f529ecfd2eb404aeaec29f010926
BLAKE2b-256 eff48ec85a771dd51a1e075f222979ec8078e1d673d83e047aa3dc254cd16c3c

File details

Details for the file llama_cpp_cffi-0.1.21-cp313-cp313-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp313-cp313-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 8057b4471d22dbcb895a335ed433b57020304309e74fcab38f230b6f38a75444
MD5 82199838931ece4d28c90e46860957d2
BLAKE2b-256 519f624e877bda369e1432aa77f0b7d7990bac1abd148b48de31bbd504866ef7

File details

Details for the file llama_cpp_cffi-0.1.21-cp313-cp313-musllinux_1_2_aarch64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp313-cp313-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 42215c3c15432bc81b7a830b3100a2188ce8c1ad9f013ee47367afabd88972b2
MD5 752529145441045f9e6f3f210022e1b1
BLAKE2b-256 df2941db1daf222787e21112e7a2cb5feceec7b4ed61be4dee1523835a4ed58c

File details

Details for the file llama_cpp_cffi-0.1.21-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 304e844a38d41db9908af9affec082fb6be23cd44caedc9a0e0d80a3c7821085
MD5 4ca420cf2f60f4952bb7f5282f1806e1
BLAKE2b-256 459929441aa32c8991bdf43e0da7867450558cae5bb8ce6c5f58e4fbd83b413b

File details

Details for the file llama_cpp_cffi-0.1.21-cp313-cp313-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 8d3694fb133a1f06d198dce91b4292eb7b2d9cd06af4256626fbe15a9e511384
MD5 3f76696a136c2de2d495678a93abdc57
BLAKE2b-256 2d52961620408199589feb112d5a15523d06b73616fe5bf5e9a33d119931c4f5

File details

Details for the file llama_cpp_cffi-0.1.21-cp312-cp312-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp312-cp312-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 73b7e570e21613c1c2ba3e2404e03065cd7ec3b1e458d749cd7b696aea6bda0e
MD5 f798f4394f24be856b32b7b7a2088d8c
BLAKE2b-256 10c0aa89774f11056f720f0ee7e8478bef11aea465f666b495cd30e1e2314af2

File details

Details for the file llama_cpp_cffi-0.1.21-cp312-cp312-musllinux_1_2_aarch64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp312-cp312-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 06ee6250087c0a26a7e872e896cb9026f878b231e3cf41118098d134ef6b790c
MD5 d8eeb34b3b53e55e319135bac504382e
BLAKE2b-256 81d658df058138d79b655c00637dcb999cfc485ed62746e7a688fe2890e42db3

File details

Details for the file llama_cpp_cffi-0.1.21-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 12d185b0fea9b8eade16849d8ec11a706a3302fc4522a41b6acede469b5e0566
MD5 30e8bfb233c5772804ef7ff5ff690cf7
BLAKE2b-256 70109fdc9c7a29d77362c95240e839c6ca48281c2efa13773b664895156c710c

File details

Details for the file llama_cpp_cffi-0.1.21-cp312-cp312-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 6eef28a95898a94f83c79d3d7bf17ce10cdd35b3c60dd6e8beb5a0084d3156bf
MD5 f92f1c99159853c28d0b9a992bee8e9f
BLAKE2b-256 54b5711a98cfe569780f21332d45dc10270b8f593230cd354bb86ca1a8734f72

File details

Details for the file llama_cpp_cffi-0.1.21-cp311-cp311-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp311-cp311-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 977dd1776a17dfb7c8351a9af4f8885f0866c725064eac90a673df708fb20dd0
MD5 92f6c7fd2810efa12990a424c57ceca1
BLAKE2b-256 2dcd69860e80804c9ce00463bae45b922a94f869f9c0a25b5018b391e5dcb256

File details

Details for the file llama_cpp_cffi-0.1.21-cp311-cp311-musllinux_1_2_aarch64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp311-cp311-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 9f71db2a5b6c8a99d59ea0967f7a2908f4219aff61228a7fa0d77b60857676f2
MD5 d3a08bdaf2c8d41261e0119cdbd90c06
BLAKE2b-256 99383d3ccfb4cd8dce33fd7769a48e2292b12e7072a26adf42e567f2a1163551

File details

Details for the file llama_cpp_cffi-0.1.21-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 357ef98a40e0e1408860d042a83a44a41fad5d0b5402fb589ee8b4e867c5b3ed
MD5 3cb4ec71b113d08c82c7842f72c0cce8
BLAKE2b-256 dfb18ed375ccd272d92cf6f3e141a0da11407a95c11f57367b6511910a262d25

File details

Details for the file llama_cpp_cffi-0.1.21-cp311-cp311-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 f13254bd8fc48fa9660be938e8df56e4b541342d28334a487409a7ba9797f4da
MD5 42d5d856091765284d14a83bacf75138
BLAKE2b-256 046d3e0d917dd845c3f22b6dde918a984e85113e2dfef57f5fb8fb44beb760bf

File details

Details for the file llama_cpp_cffi-0.1.21-cp310-cp310-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp310-cp310-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 9eefa81bd791d1cd2e37b6fe01123baa558f6927ccf231c530207b167e03a94b
MD5 18209ae4f1bac3d6fec8cdda2b38c7ee
BLAKE2b-256 6745ea7b0bfefe5c9b0c811261fc6e638be38a5e9498fc07194c6b4dcb608397

File details

Details for the file llama_cpp_cffi-0.1.21-cp310-cp310-musllinux_1_2_aarch64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp310-cp310-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 e72353095dd7715c6f18e038eb761af5e7f9f4aa52c2624f1bea66a3ff61e3bd
MD5 d9822f15e319d03691e8a0807f0fd9fb
BLAKE2b-256 955989888a71cb636717211beef4bfaa122a499d876b9a1713dc288fbf687bef

File details

Details for the file llama_cpp_cffi-0.1.21-cp310-cp310-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d78b5e0adde6f9112bab381bf828ac88a02d57533b12f9f284573c64d7eb420e
MD5 abeb4eb3a7519dd20b75b6b6acba4b19
BLAKE2b-256 63fd13d05141c1e706d626f8738f5e18b039499b1a0cc5fdd8c9628959153741

File details

Details for the file llama_cpp_cffi-0.1.21-cp310-cp310-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for llama_cpp_cffi-0.1.21-cp310-cp310-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 01c8a1c3949736dd7c48421547107ce4d9852b37e489e5a2d518cc69f76886a8
MD5 83c782aeaf266b8ec9133679a3549709
BLAKE2b-256 149dca3696b50c20646a99a5b78df186217f77d553ac40654c5066dcb3470950
