Python bindings for @ggerganov's llama.cpp
Building the Python bindings
macOS
brew install pybind11 # Installs dependency
git submodule init && git submodule update
poetry install
From PyPI
pip install llamacpp
Get the model weights
You will need to obtain the LLaMA weights yourself. There are a few torrents floating around, as well as some Hugging Face repositories (e.g. https://huggingface.co/nyanko7/LLaMA-7B/). Once you have them, copy them into the models folder.
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model
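As a quick sanity check after copying, a short Python sketch can confirm that the expected entries are present. The names below follow the listing above; adjust the path and the parts you actually downloaded to match your layout.

```python
from pathlib import Path

def missing_weight_files(models_dir, parts=("7B", "tokenizer.model", "tokenizer_checklist.chk")):
    """Return the expected entries (per the listing above) not found under models_dir."""
    root = Path(models_dir)
    return [name for name in parts if not (root / name).exists()]

if __name__ == "__main__":
    # Prints the names still missing from ./models (empty list means all present).
    print(missing_weight_files("./models"))
```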
Convert the weights to GGML format using llamacpp-convert, then quantize them to INT4 using llamacpp-quantize. For example, for the 7B parameter model, run
llamacpp-convert ./models/7B/ 1
llamacpp-quantize ./models/7B/
llamacpp-cli
Note that running llamacpp-convert requires torch, sentencepiece and numpy to be installed. These packages are not installed by default when you install llamacpp.
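A minimal way to check for those extra packages before running llamacpp-convert (the package names are taken from the note above):

```python
import importlib.util

def missing_convert_deps(names=("torch", "sentencepiece", "numpy")):
    """Return the packages from the note above that are not importable."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    # An empty list means llamacpp-convert's dependencies are all available.
    print(missing_convert_deps())
```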
Command line interface
The package installs the command line entry point llamacpp-cli, which points to llamacpp/cli.py and should provide roughly the same functionality as the main program in the original C++ repository. There is also an experimental llamacpp-chat that is supposed to bring up a chat interface, but it is not working correctly yet.
Demo script
See llamacpp/cli.py for a detailed example. The simplest demo would be something like the following:
import llamacpp

params = llamacpp.gpt_params(
    './models/7B/ggml_model_q4_0.bin',  # model
    "A llama is a ",  # prompt
    "",      # reverse_prompt
    512,     # ctx_size
    100,     # n_predict
    40,      # top_k
    0.95,    # top_p
    0.85,    # temp
    1.30,    # repeat_penalty
    -1,      # seed
    8,       # threads
    64,      # repeat_last_n
    8,       # batch_size
    False,   # color
    False,   # interactive or args.interactive_start
    False,   # interactive_start
)
model = llamacpp.PyLLAMA(params)
model.add_bos()  # Adds "beginning of string" token
model.update_input(params.prompt)
model.print_startup_stats()
model.prepare_context()

input_noecho = False  # echo ingested input to stdout
model.ingest_all_pending_input(True)
while not model.is_finished():
    model.ingest_all_pending_input(not input_noecho)
    text, is_finished = model.infer_text()
    print(text, end="")
    if is_finished:
        break
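For non-interactive use you may want the generated text as a single string rather than streamed to stdout. A small helper sketch, assuming the same is_finished()/infer_text() interface as the demo above (max_pieces is a hypothetical safety cap, not a library parameter):

```python
def generate(model, max_pieces=512):
    """Collect the pieces streamed by infer_text() into one string."""
    pieces = []
    while not model.is_finished() and len(pieces) < max_pieces:
        text, is_finished = model.infer_text()
        pieces.append(text)
        if is_finished:
            break
    return "".join(pieces)
```

After the setup calls in the demo (add_bos, update_input, prepare_context, ingest_all_pending_input), `generate(model)` would return the completion in one go.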
ToDo
- Use poetry to build package
- Add command line entry point for quantize script
- Publish wheel to PyPI
- Add chat interface based on tinygrad