llama.cpp server binary built from source
llama-cpp-bin
Pre-built llama.cpp server binaries as a Python package. Install the wheel for your platform and run the server.
Install
Pre-built wheels (recommended)
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cpu llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu131 llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/rocm llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/vulkan llama-cpp-bin
Pin to a specific version:
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin==9095.0.0
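The index URLs above differ only in their last path segment (cpu, cu124, cu131, rocm, vulkan), so a script can pick one at install time. The following is a minimal sketch of that idea; the helper name `pip_install_args` is hypothetical and not part of the package, and it only builds the command rather than running it.

```python
import sys

# Backend names and index root copied from the install commands above.
BACKENDS = ("cpu", "cu124", "cu131", "rocm", "vulkan")
INDEX_ROOT = "https://vladlearns.github.io/llama-cpp-bin/whl"


def pip_install_args(backend, version=None):
    """Build the pip argv for one backend index (hypothetical helper)."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {BACKENDS}")
    # Pin the version only when the caller asks for one.
    requirement = "llama-cpp-bin" if version is None else f"llama-cpp-bin=={version}"
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", f"{INDEX_ROOT}/{backend}",
        requirement,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch has no side effects.
    print(" ".join(pip_install_args("cu124", version="9095.0.0")))
```

Passing the returned list to `subprocess.run(..., check=True)` would perform the install with the same interpreter that runs the script.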
PyPI (builds from source)
If no pre-built wheel matches your platform, pip falls back to building from the sdist on PyPI:
pip install llama-cpp-bin
You will need CMake, a C++ compiler, and the llama.cpp source submodule.
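Before starting a source build, it can help to check that the tools are on PATH. This is a rough sketch, not something the package provides: `missing_build_tools` is a hypothetical helper, and the compiler names it probes (c++, g++, clang++, cl) are an assumption about common setups. The build backend may also locate tools in other ways, so treat a failure here as a hint only.

```python
import shutil


def missing_build_tools(which=shutil.which):
    """Return the names of build prerequisites not found on PATH.

    `which` is injectable so the check can be exercised without
    depending on the machine it runs on.
    """
    missing = []
    if which("cmake") is None:
        missing.append("cmake")
    # Assumed list of common C++ compiler executables; adjust for your toolchain.
    if not any(which(name) for name in ("c++", "g++", "clang++", "cl")):
        missing.append("C++ compiler")
    return missing


if __name__ == "__main__":
    problems = missing_build_tools()
    if problems:
        print("Not found on PATH:", ", ".join(problems))
    else:
        print("CMake and a C++ compiler were found.")
```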
Dev
git clone --recurse-submodules https://github.com/vladlearns/llama-cpp-bin
cd llama-cpp-bin
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -v .
Run
CLI:
llama-cpp-server -m your-model.gguf --port 8080
Python:
from llama_cpp_bin import run_server
proc = run_server("your-model.gguf", port=8080)
proc.wait()
Or get the binary path and run it yourself:
import llama_cpp_bin
import subprocess
binary = llama_cpp_bin.get_binary_path()
subprocess.Popen([binary, "--model", "your-model.gguf"])
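Both Python variants return as soon as the process is spawned, so a client that connects immediately may be refused. One way to handle this is to poll the port before sending requests. The sketch below assumes the server listens on 127.0.0.1; `wait_for_port` is a hypothetical helper, not part of `llama_cpp_bin`, and an open port does not guarantee that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.25):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False


# Usage with the example above (requires llama-cpp-bin and a model file):
#   proc = run_server("your-model.gguf", port=8080)
#   if not wait_for_port("127.0.0.1", 8080):
#       proc.terminate()
#       raise RuntimeError("server did not start listening in time")
```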
Download files
Source Distribution
llama_cpp_bin-9785.0.0.tar.gz (34.5 MB)
File details
Details for the file llama_cpp_bin-9785.0.0.tar.gz.
File metadata
- Download URL: llama_cpp_bin-9785.0.0.tar.gz
- Upload date:
- Size: 34.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd8b3bea281ec6df765859ab02837e0838cba1cd5dc1365b3b83378fa8a0dc4a |
| MD5 | 03460a47882cbfa26c6771d2860eca15 |
| BLAKE2b-256 | 018ff29a2a713e81d79c99f501ae86bf505c9efdf32c3744b6abadf370e91f80 |
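The SHA256 digest can be used to check a downloaded copy of the sdist before building from it. The following is a minimal sketch; the function names are hypothetical, and the file is assumed to sit in the current directory under the name listed above.

```python
import hashlib

# SHA256 digest published above for llama_cpp_bin-9785.0.0.tar.gz.
EXPECTED_SHA256 = "fd8b3bea281ec6df765859ab02837e0838cba1cd5dc1365b3b83378fa8a0dc4a"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a file's SHA256 against the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the sdist was downloaded to the current directory:
#   if not matches_published_hash("llama_cpp_bin-9785.0.0.tar.gz"):
#       raise SystemExit("hash mismatch: do not install this file")
```

pip can enforce the same check itself when the requirement is installed from a requirements file with `--require-hashes`.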
Provenance
The following attestation bundles were made for llama_cpp_bin-9785.0.0.tar.gz:
Publisher: build-everything.yml on vladlearns/llama-cpp-bin
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: llama_cpp_bin-9785.0.0.tar.gz
  - Subject digest: fd8b3bea281ec6df765859ab02837e0838cba1cd5dc1365b3b83378fa8a0dc4a
- Sigstore transparency entry: 1945771296
- Sigstore integration time:
- Permalink: vladlearns/llama-cpp-bin@12f224f6422ab741a7c7387dbd69435f2fd80aed
- Branch / Tag: refs/heads/main
- Owner: https://github.com/vladlearns
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-everything.yml@12f224f6422ab741a7c7387dbd69435f2fd80aed
- Trigger Event: workflow_dispatch
- Statement type: