llama.cpp server binary built from source
Project description
llama-cpp-bin
Pre-built llama.cpp server binaries as a Python package. Install the wheel for your platform and run it.
Install
Pre-built wheels (recommended)
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cpu llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu131 llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/rocm llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/vulkan llama-cpp-bin
Pin to a specific version:
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin==9095.0.0
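The commands above differ only in the last path segment of the index URL and in an optional version pin. As a minimal sketch, the helper below assembles the same pip arguments from a backend name. `pip_install_args` is a hypothetical name used for illustration and is not part of llama-cpp-bin; the index base URL and backend names are copied from the commands above.

```python
from typing import List, Optional

# Index base URL and backend names are taken from the install commands above.
INDEX_BASE = "https://vladlearns.github.io/llama-cpp-bin/whl"
BACKENDS = ("cpu", "cu124", "cu131", "rocm", "vulkan")


def pip_install_args(backend: str, version: Optional[str] = None) -> List[str]:
    """Build the pip argument list for one backend (hypothetical helper)."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {BACKENDS}")
    # Append an exact version pin only when one is requested.
    requirement = "llama-cpp-bin" if version is None else f"llama-cpp-bin=={version}"
    return ["pip", "install", "--index-url", f"{INDEX_BASE}/{backend}", requirement]
```

The returned list can be passed to `subprocess.run(..., check=True)` or joined with spaces to print the equivalent shell command.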
PyPI (builds from source)
If no pre-built wheel matches your platform, pip falls back to building from the sdist on PyPI:
pip install llama-cpp-bin
You will need CMake, a C++ compiler, and the llama.cpp source submodule.
Dev
git clone --recurse-submodules https://github.com/vladlearns/llama-cpp-bin
cd llama-cpp-bin
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -v .
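The build command above passes CMake flags through the `CMAKE_ARGS` environment variable. If you drive the build from a Python script, a sketch like the following sets that variable without touching the parent environment. `build_env` is a hypothetical helper name; the only flag shown is the one from the command above.

```python
import os
from typing import Dict, List, Mapping, Optional


def build_env(cmake_flags: List[str],
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with CMAKE_ARGS set (hypothetical helper)."""
    # Copy so the caller's environment (or os.environ) is left unchanged.
    env = dict(os.environ if base is None else base)
    env["CMAKE_ARGS"] = " ".join(cmake_flags)
    return env


# Usage, run from the cloned repository root:
#   import subprocess, sys
#   subprocess.run([sys.executable, "-m", "pip", "install", "-v", "."],
#                  env=build_env(["-DGGML_CUDA=ON"]), check=True)
```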
Run
CLI:
llama-cpp-server -m your-model.gguf --port 8080
Python:
from llama_cpp_bin import run_server
proc = run_server("your-model.gguf", port=8080)
proc.wait()
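Judging by the example above, `run_server` hands back a process object, so the server may still be starting when the call returns. One way to wait before sending requests is to poll the TCP port. This is a sketch using only the standard library; `wait_for_port` is a hypothetical helper, and the host `127.0.0.1` in the usage comment is an assumption about where the server listens.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False


# Usage (assumes the server listens on 127.0.0.1):
#   proc = run_server("your-model.gguf", port=8080)
#   if not wait_for_port("127.0.0.1", 8080):
#       proc.terminate()
```

An open port only shows that the process is accepting connections. It does not guarantee that the model has finished loading.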
Or get the binary path and run it yourself:
import llama_cpp_bin
import subprocess
binary = llama_cpp_bin.get_binary_path()
subprocess.Popen([binary, "--model", "your-model.gguf"])
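When you launch the binary yourself, you are also responsible for stopping it. The sketch below wraps `subprocess.Popen` in a context manager that terminates the child when the block exits. `running` is a hypothetical helper name, not something the package provides.

```python
import subprocess
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def running(cmd: List[str], grace: float = 10.0) -> Iterator[subprocess.Popen]:
    """Start cmd and make sure it is stopped when the with-block exits."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        if proc.poll() is None:       # child is still running
            proc.terminate()          # ask it to stop first
            try:
                proc.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                proc.kill()           # force stop if it ignored terminate
                proc.wait()


# Usage with the binary path from the example above:
#   with running([binary, "--model", "your-model.gguf"]) as proc:
#       ...  # talk to the server here
```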
Project details
Download files
Source Distribution
llama_cpp_bin-9830.0.0.tar.gz (34.6 MB)
File details
Details for the file llama_cpp_bin-9830.0.0.tar.gz.
File metadata
- Download URL: llama_cpp_bin-9830.0.0.tar.gz
- Upload date:
- Size: 34.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b155e65ac4f478798a1070115def2f032d3ef76b02a4d7e97af94c68969bc630 |
| MD5 | eb28ef8880973941d8a6c860b5cfee96 |
| BLAKE2b-256 | 327a5b4210ac263ce2ca667c97ee2f1fd1115daf58ed064e67fac3241b8d7131 |
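To check a downloaded sdist against the SHA256 digest listed above, you can hash the file locally and compare. This is a minimal sketch using `hashlib`; `sha256_of` is a hypothetical helper name, and the file path in the usage comment assumes the archive is in the current directory.

```python
import hashlib

# SHA256 digest published for llama_cpp_bin-9830.0.0.tar.gz (from the table above).
EXPECTED_SHA256 = "b155e65ac4f478798a1070115def2f032d3ef76b02a4d7e97af94c68969bc630"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (assumes the archive is in the current directory):
#   if sha256_of("llama_cpp_bin-9830.0.0.tar.gz") != EXPECTED_SHA256:
#       raise SystemExit("SHA256 mismatch: do not install this file")
```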
Provenance
The following attestation bundles were made for llama_cpp_bin-9830.0.0.tar.gz:
Publisher: build-everything.yml on vladlearns/llama-cpp-bin
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llama_cpp_bin-9830.0.0.tar.gz
- Subject digest: b155e65ac4f478798a1070115def2f032d3ef76b02a4d7e97af94c68969bc630
- Sigstore transparency entry: 1995256350
- Sigstore integration time:
- Permalink: vladlearns/llama-cpp-bin@8d1252ffd8db1a8918feef4a6445a4317469c6ea
- Branch / Tag: refs/heads/main
- Owner: https://github.com/vladlearns
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-everything.yml@8d1252ffd8db1a8918feef4a6445a4317469c6ea
- Trigger Event: workflow_dispatch
-
Statement type: