llama.cpp server binary built from source
llama-cpp-bin
Pre-built llama.cpp server binaries distributed as a Python package. Install the wheel for your platform and run it.
Install
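Pre-built wheels (recommended)
Each backend has its own package index (cpu, cu124, cu131, rocm, vulkan). Run only the command that matches your backend: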
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cpu llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu131 llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/rocm llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/vulkan llama-cpp-bin
Pin to a specific version:
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin==9095.0.0
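The install commands above all follow one pattern: a fixed base URL, a backend name, and an optional version pin. As a minimal sketch of that pattern, the helper below builds the argument list for a given backend. The function name `pip_install_args` is hypothetical and not part of llama-cpp-bin; the base URL and backend names are copied from the commands above.

```python
from typing import List, Optional

# Base URL and backend names are taken from the install commands above.
INDEX_BASE = "https://vladlearns.github.io/llama-cpp-bin/whl"
BACKENDS = ("cpu", "cu124", "cu131", "rocm", "vulkan")


def pip_install_args(backend: str, version: Optional[str] = None) -> List[str]:
    """Return the pip command for one backend (hypothetical helper)."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {BACKENDS}")
    # Append an exact version pin only when one is requested.
    requirement = "llama-cpp-bin" if version is None else f"llama-cpp-bin=={version}"
    return ["pip", "install", "--index-url", f"{INDEX_BASE}/{backend}", requirement]
```

Calling `pip_install_args("cu124", "9095.0.0")` reproduces the pinned command shown above as a list that can be passed to `subprocess.run(..., check=True)`.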
PyPI (builds from source)
If no pre-built wheel matches your platform, pip falls back to building from the sdist on PyPI:
pip install llama-cpp-bin
You will need CMake, a C++ compiler, and the llama.cpp source submodule.
Dev
git clone --recurse-submodules https://github.com/vladlearns/llama-cpp-bin
cd llama-cpp-bin
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -v .
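The `CMAKE_ARGS="..." pip install` form above relies on POSIX shell syntax for setting an environment variable inline. The sketch below shows one way to run the same build from Python, which may be convenient on shells that lack that syntax. The helper names `build_env` and `install_from_checkout` are hypothetical; only `CMAKE_ARGS` and the `-DGGML_CUDA=ON` flag come from the command above.

```python
import os
import subprocess
import sys
from typing import Dict, List


def build_env(cmake_flags: List[str]) -> Dict[str, str]:
    """Copy the current environment and set CMAKE_ARGS (hypothetical helper)."""
    env = dict(os.environ)
    # Multiple CMake flags are passed as one space-separated string.
    env["CMAKE_ARGS"] = " ".join(cmake_flags)
    return env


def install_from_checkout(checkout_dir: str, cmake_flags: List[str]) -> None:
    """Run `pip install -v .` inside the cloned repository."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-v", "."],
        cwd=checkout_dir,
        env=build_env(cmake_flags),
        check=True,  # raise CalledProcessError if the build fails
    )
```

For example, `install_from_checkout("llama-cpp-bin", ["-DGGML_CUDA=ON"])` corresponds to the CUDA build shown above, run from the parent directory of the clone.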
Run
CLI:
llama-cpp-server -m your-model.gguf --port 8080
Python:
from llama_cpp_bin import run_server
proc = run_server("your-model.gguf", port=8080)
proc.wait()
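`proc.wait()` blocks until the server exits on its own. If the server should be stopped from the calling program instead, something like the following sketch may help. It assumes that `run_server` returns a `subprocess.Popen`-like object, which is consistent with the `proc.wait()` call above but is not confirmed by this page; `stop_process` is a hypothetical helper name.

```python
import subprocess


def stop_process(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Ask a child process to exit, escalating to kill() after `timeout` seconds."""
    if proc.poll() is None:  # process is still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The process ignored the polite request; force it to stop.
            proc.kill()
            proc.wait()
    return proc.returncode
```

A typical pattern would be to call `stop_process(proc)` inside a `finally:` block so the server does not outlive the script that started it.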
Or get the binary path and run it yourself:
import llama_cpp_bin
import subprocess
binary = llama_cpp_bin.get_binary_path()
subprocess.Popen([binary, "--model", "your-model.gguf"])
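When launching the binary yourself, `Popen` returns immediately, before the server is listening. The sketch below polls the TCP port until it accepts a connection. `wait_for_port` and `launch_and_wait` are hypothetical helpers; only `get_binary_path`, `--model`, and `--port` come from the examples above, and the host `127.0.0.1` is an assumption. An open port only shows that the process is listening, so the model may still be loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not listening yet; retry shortly
    return False


def launch_and_wait(model_path: str, port: int = 8080):
    """Start the bundled server and block until its port accepts connections."""
    import subprocess
    import llama_cpp_bin  # imported lazily so wait_for_port works on its own

    binary = llama_cpp_bin.get_binary_path()
    proc = subprocess.Popen([binary, "--model", model_path, "--port", str(port)])
    if not wait_for_port("127.0.0.1", port):  # assumed listen address
        proc.terminate()
        raise RuntimeError("server did not open its port in time")
    return proc
```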
Download files
Source Distribution
llama_cpp_bin-9802.0.0.tar.gz (34.5 MB)
File details
Details for the file llama_cpp_bin-9802.0.0.tar.gz.
File metadata
- Download URL: llama_cpp_bin-9802.0.0.tar.gz
- Upload date:
- Size: 34.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6a49a118f1dc18a9b57e3d8437fe0a1af8c68017a367792fc4d016378aca5cb |
| MD5 | ce1c5975488951f77628b91eb55b379b |
| BLAKE2b-256 | 1b95a35203706b0ec367190ef685359c311f67ed76fd6b6a7ec4f0b26ad44e14 |
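A downloaded copy of the sdist can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; `sha256_of_file` and `matches_expected` are hypothetical helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib
import hmac

# SHA256 digest of llama_cpp_bin-9802.0.0.tar.gz as listed above.
EXPECTED_SHA256 = "f6a49a118f1dc18a9b57e3d8437fe0a1af8c68017a367792fc4d016378aca5cb"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

For example, `matches_expected("llama_cpp_bin-9802.0.0.tar.gz")` should return `True` for an intact download of that file.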
Provenance
The following attestation bundles were made for llama_cpp_bin-9802.0.0.tar.gz:
Publisher: build-everything.yml on vladlearns/llama-cpp-bin
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: llama_cpp_bin-9802.0.0.tar.gz
  - Subject digest: f6a49a118f1dc18a9b57e3d8437fe0a1af8c68017a367792fc4d016378aca5cb
- Sigstore transparency entry: 1960993751
- Sigstore integration time:
- Permalink: vladlearns/llama-cpp-bin@1662f978942bdb22ba0b5ab482f403919147c5f8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/vladlearns
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-everything.yml@1662f978942bdb22ba0b5ab482f403919147c5f8
- Trigger Event: workflow_dispatch
- Statement type: