Python bindings for @ggerganov's llama.cpp
Project description
Building the Python bindings
macOS
brew install pybind11
Install the Python package
From PyPI
pip install llamacpp
From source
poetry install
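To confirm the bindings installed correctly with either method, a quick import check is enough (a minimal sketch; it assumes nothing beyond the module being importable):

import llamacpp
print("llamacpp loaded from", llamacpp.__file__)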
Get the model weights
You will need to obtain the weights for LLaMA yourself. There are a few torrents floating around as well as some Hugging Face repositories (e.g. https://huggingface.co/nyanko7/LLaMA-7B/). Once you have them, copy them into the models folder.
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model
Convert the weights to GGML format using convert-pth-to-ggml.py and use the llamacpp-quantize command to quantize them into INT4. For example, for the 7B parameter model, run
python3 convert-pth-to-ggml.py models/7B/ 1
llamacpp-quantize ./models/7B/
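After quantization you should end up with a q4_0 model file next to the original weights. Here is a small sketch to check for it (the f16 filename is an assumption based on the converter's usual output name; the q4_0 name matches the demo script below):

import os

model_dir = "./models/7B"
for name in ("ggml-model-f16.bin", "ggml-model-q4_0.bin"):
    path = os.path.join(model_dir, name)
    print(path, "found" if os.path.exists(path) else "missing")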
Run demo script
import llamacpp

model_path = "./models/7B/ggml-model-q4_0.bin"

# The first two positional arguments are the model path and the prompt;
# the remaining values are generation/sampling parameters (see the
# gpt_params signature in the bindings for their exact meaning).
params = llamacpp.gpt_params(
    model_path,
    "Hi, I'm a llama.",
    4096,
    40,
    0.1,
    0.7,
    2.0,
)

model = llamacpp.PyLLAMA(model_path, params)
model.predict("Hello, I'm a llama.", 10)
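Save the script as, say, demo.py (any name works) in the repository root so that the relative ./models path resolves, then run it with python3 demo.py.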
ToDo
- Use poetry to build package
- Add command line entry point for quantize script
- Publish wheel to PyPI
- Add chat interface based on tinygrad
Download files
Download the file for your platform.
Source Distribution
llamacpp-0.1.2.tar.gz (3.8 kB)
Built Distribution
Hashes for llamacpp-0.1.2-cp310-cp310-macosx_13_0_arm64.whl
Algorithm | Hash digest
---|---
SHA256 | ff086605ee4614b201c4afe1c95e643b2976ad749f2eee688eb87d18d77b9db1
MD5 | ac57c38b769ea81ff2cea71e49b828d5
BLAKE2b-256 | 46df6d2b947020baa0a8fe574c8c0ace2f064b1a664908039eb6c29d0afab10b