xllamacpp - a Python wrapper of llama.cpp
This project is a fork of cyllama and provides a Python wrapper for @ggerganov's llama.cpp, which is likely the most active open-source compiled LLM inference engine. It was spun off from my earlier, now frozen, llama.cpp wrapper project, llamalib, which provided early-stage but functional wrappers using cython, pybind11, and nanobind. Further development of xllamacpp, the cython wrapper from llamalib, will continue in this project.
Development goals are to:
- Stay up-to-date with bleeding-edge llama.cpp (last stable build with llama.cpp b4381).
- Produce a minimal, performant, compiled, thin python wrapper around the core llama-cli feature-set of llama.cpp.
- Integrate and wrap llava-cli features.
- Integrate and wrap features from related projects such as whisper.cpp and stable-diffusion.cpp.
- Learn about the internals of this popular C++/C LLM inference engine along the way. For me at least, this is definitely the most efficient way to learn about the underlying technologies.
Given that there is a fairly mature, well-maintained and performant ctypes-based wrapper provided by @abetlen's llama-cpp-python project, and that LLM inference is gpu-driven rather than cpu-driven, this may all seem quite redundant. Nonetheless, we anticipate some benefits to using a compiled cython-based wrapper instead of ctypes:
- Cython functions and extension classes can enforce strong type checking.
- Packaging benefits with respect to self-contained statically compiled extension modules, which include simpler compilation and reduced package size.
- There may be some performance improvements in the use of compiled wrappers over the use of ctypes.
- It may be possible to incorporate external optimizations more readily into compiled wrappers.
- It may be useful in case one wants to de-couple the python frontend and wrapper backends to existing frameworks: for example, to just replace the ctypes wrapper part in llama-cpp-python with compiled cython wrappers and contribute it back as a PR.
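For context on the type-checking point, here is a minimal sketch of what a ctypes-style binding involves: the C signature is declared by hand at runtime, and argument checks only happen once that declaration is in place. It uses the C library's strlen rather than anything from llama.cpp, and it assumes a POSIX system (Linux or macOS) where `ctypes.CDLL(None)` exposes the symbols already loaded into the process.

```python
import ctypes

# Assumption: POSIX only. CDLL(None) gives access to symbols already loaded
# into the current process, which includes the C library's strlen.
libc = ctypes.CDLL(None)

# With ctypes, the C signature is declared by hand, at runtime.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"llama.cpp")
print(length)

# Once argtypes is set, an argument of the wrong type is rejected at call time.
try:
    strlen(3.14)
    rejected = False
except ctypes.ArgumentError:
    rejected = True
print(rejected)
```

In a cython wrapper the equivalent signature is part of the compiled extension, so the declaration step is done once at build time instead of in Python code at import time.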
Status
Development is done only on macOS to keep things simple, with intermittent testing to ensure it works on Linux.
The following table provides an overview of the current wrapping/dev status:
| status | xllamacpp |
|---|---|
| wrapper-type | cython |
| wrap llama.h + other headers | yes |
| wrap high-level simple-cli | yes |
| wrap low-level simple-cli | yes |
| wrap low-level llama-cli | WIP |
The initial milestone entailed creating a high-level wrapper of the simple.cpp llama.cpp example, followed by a low-level one. The next objective is to fully wrap the functionality of llama-cli which is ongoing (see: xllamacpp.__init__.py).
It goes without saying that any help / collaboration / contributions to accelerate the above would be welcome!
Wrapping Guidelines
As the intent is to provide a very thin wrapping layer and play to the strengths of the original C++ library as well as python, the approach to wrapping intentionally adopts the following guidelines:
- In general, key structs are implemented as cython extension classes with related functions implemented as methods of said classes.
- Be as consistent as possible with llama.cpp's naming of its api elements, except when it makes sense to shorten function names which are used as methods.
- Minimize non-wrapper python code.
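The shape these guidelines describe can be sketched in plain Python so it runs without a compiler. Everything here is hypothetical: the `demo_model_*` functions stand in for a C-style API and `DemoModel` stands in for an extension class. None of these names come from llama.cpp or xllamacpp.

```python
# Hypothetical stand-ins for a C-style API: free functions that take a
# struct handle as their first argument. Not real llama.cpp calls.
def demo_model_load(path):
    return {"path": path, "n_ctx": 512}


def demo_model_n_ctx(handle):
    return handle["n_ctx"]


def demo_model_path(handle):
    return handle["path"]


class DemoModel:
    """Thin wrapper: one class per struct, one method per related function."""

    def __init__(self, path):
        # The class owns the underlying handle.
        self._handle = demo_model_load(path)

    def n_ctx(self):
        # demo_model_n_ctx(handle) becomes model.n_ctx(): the name stays
        # recognizable, only the redundant prefix is dropped.
        return demo_model_n_ctx(self._handle)

    def path(self):
        return demo_model_path(self._handle)


model = DemoModel("models/example.gguf")
print(model.n_ctx(), model.path())
```

Each method only forwards to the underlying function, which keeps the non-wrapper python code to a minimum.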
Setup
To build xllamacpp:
- A recent version of python3 (testing on python 3.12)
- Git clone the latest version of xllamacpp:

    git clone https://github.com/shakfu/xllamacpp.git
    cd xllamacpp
    git submodule init
    git submodule update

- Install dependencies of cython, setuptools, and pytest for testing:

    pip install -r requirements.txt

- Type make in the terminal.

This will:
- Download and build llama.cpp
- Install it into bin, include, and lib in the cloned xllamacpp folder
- Build xllamacpp
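After the build, a quick check that the bin, include, and lib folders described above exist can help spot a failed step. This is a minimal sketch: the helper name `missing_build_dirs` is made up for illustration and is not part of the project. The demo runs against a temporary directory so it does not depend on a real build.

```python
import tempfile
from pathlib import Path

# Folders the build is described as installing llama.cpp into.
EXPECTED_DIRS = ("bin", "include", "lib")


def missing_build_dirs(root):
    """Return the expected folders that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Demo on a throwaway directory that simulates an incomplete build.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "bin").mkdir()
    (Path(tmp) / "lib").mkdir()
    demo_missing = missing_build_dirs(tmp)

print(demo_missing)
```

From the cloned xllamacpp folder, `missing_build_dirs(".")` returning an empty list would suggest the install step completed.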
Testing
The tests directory in this repo provides extensive examples of using xllamacpp.
However, as a first step, you should download a smallish llm in the .gguf format from huggingface. A good model to start with, and the one assumed by the tests, is Llama-3.2-1B-Instruct-Q8_0.gguf. xllamacpp expects models to be stored in a models folder in the cloned xllamacpp directory. So to create the models directory if it doesn't exist and download this model, you can just type:
make download
This basically just does:
cd xllamacpp
mkdir models && cd models
wget https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q8_0.gguf
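Where wget is not available, the same steps can be approximated with the Python standard library. This is a sketch, not what the Makefile runs: the function names are invented for illustration, and only the URL and the models folder come from the commands above. The demo at the end only resolves the destination path in a temporary directory and does not download anything.

```python
import tempfile
import urllib.request
from pathlib import Path

MODEL_URL = (
    "https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/"
    "resolve/main/Llama-3.2-1B-Instruct-Q8_0.gguf"
)


def model_destination(root, url=MODEL_URL):
    """Create <root>/models if needed and return the target file path."""
    models_dir = Path(root) / "models"
    models_dir.mkdir(parents=True, exist_ok=True)
    return models_dir / url.rsplit("/", 1)[-1]


def download_model(root=".", url=MODEL_URL):
    """Download the model unless a file with that name is already present."""
    dest = model_destination(root, url)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


# Dry run: resolve the destination in a temporary directory, no network use.
with tempfile.TemporaryDirectory() as tmp:
    demo_dest = model_destination(tmp)
    demo_name = demo_dest.name
    demo_dir_created = demo_dest.parent.is_dir()

print(demo_name, demo_dir_created)
```

Calling `download_model()` from the cloned xllamacpp folder would place the file under models, where the tests are said to expect it.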
Now you can test it using llama-cli or llama-simple:
bin/llama-cli -c 512 -n 32 -m models/Llama-3.2-1B-Instruct-Q8_0.gguf \
-p "Is mathematics discovered or invented?"
You can also run the test suite with pytest by typing pytest or:
make test
If all tests pass, you can type python3 -i scripts/start.py or ipython -i scripts/start.py and explore the xllamacpp library with a pre-configured repl:
from xllamacpp import Llama
llm = Llama(model_path='models/Llama-3.2-1B-Instruct-Q8_0.gguf')
llm.ask("what is the age of the universe?")
'estimated age of the universe\nThe estimated age of the universe is around 13.8 billion years'
TODO
- wrap llama-simple
- wrap llama-cli (WIP: see: xllamacpp.__init__)
- wrap llama-llava-cli
- wrap whisper.cpp
- wrap stable-diffusion.cpp