llama.cpp server binary built from source
Project description
llama-cpp-bin
Pre-built llama.cpp server binaries as a Python package. Install the wheel for your platform and run it.
Install
Pre-built wheels (recommended)
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cpu llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu131 llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/rocm llama-cpp-bin
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/vulkan llama-cpp-bin
Pin to a specific version:
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin==9095.0.0
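The install commands above differ only in the last path segment of the index URL (the backend) and in the optional version pin. A minimal sketch of building the same command from Python is shown here; the helper name `pip_install_args` is hypothetical and not part of this package, and the backend list is simply the five indexes listed above.

```python
import sys

# Base of the wheel indexes listed in the install commands above.
BASE_URL = "https://vladlearns.github.io/llama-cpp-bin/whl"
# Backends taken from the listed index URLs; other values are rejected.
BACKENDS = ("cpu", "cu124", "cu131", "rocm", "vulkan")


def pip_install_args(backend, version=None):
    """Return an argv list for installing llama-cpp-bin from one backend index.

    Hypothetical helper for illustration; it only assembles the command.
    """
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}, expected one of {BACKENDS}")
    requirement = "llama-cpp-bin" if version is None else f"llama-cpp-bin=={version}"
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", f"{BASE_URL}/{backend}",
        requirement,
    ]
```

Passing the result to `subprocess.run(pip_install_args("cu124", "9095.0.0"), check=True)` would run the pinned CUDA 12.4 install in the current interpreter's environment.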
PyPI (builds from source)
If no pre-built wheel matches your platform, pip falls back to building from the sdist on PyPI:
pip install llama-cpp-bin
You will need CMake, a C++ compiler, and the llama.cpp source submodule.
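Before attempting a source build, it can help to check that the tools are on `PATH`. The sketch below is an illustration, not part of the package: the function name and the list of compiler executable names are assumptions, and a compiler installed under a different name would not be detected.

```python
import shutil


def missing_build_tools(compilers=("c++", "g++", "clang++", "cl")):
    """Return the names of build prerequisites that were not found on PATH.

    The compiler names are a guess at common executables, not an exhaustive list.
    """
    missing = []
    if shutil.which("cmake") is None:
        missing.append("cmake")
    # Any one of the candidate compilers is considered sufficient.
    if not any(shutil.which(name) for name in compilers):
        missing.append("C++ compiler")
    return missing
```

An empty list means both CMake and at least one of the candidate compilers were found; it does not check the llama.cpp submodule.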
Dev
git clone --recurse-submodules https://github.com/vladlearns/llama-cpp-bin
cd llama-cpp-bin
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -v .
Run
CLI:
llama-cpp-server -m your-model.gguf --port 8080
Python:
from llama_cpp_bin import run_server
proc = run_server("your-model.gguf", port=8080)
proc.wait()
Or get the binary path and run it yourself:
import llama_cpp_bin
import subprocess
binary = llama_cpp_bin.get_binary_path()
subprocess.Popen([binary, "--model", "your-model.gguf"])
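Either way, the server is a separate process that needs time to load the model before it accepts requests. A minimal sketch of waiting for it is shown here. It assumes the bundled binary exposes the `GET /health` endpoint of the upstream llama.cpp server and that the endpoint returns HTTP 200 once the model is loaded; `wait_until_ready` is a hypothetical helper, not part of `llama_cpp_bin`.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_until_ready(port, host="127.0.0.1", timeout=60.0, interval=0.5):
    """Poll http://host:port/health until it answers 200 or the timeout expires.

    Assumes the upstream llama.cpp server's /health endpoint is available.
    Returns True if the server became ready, False otherwise.
    """
    url = f"http://{host}:{port}/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Not listening yet, still loading, or connection dropped: retry.
            pass
        time.sleep(interval)
    return False
```

With the `run_server` example above, this would be called as `wait_until_ready(8080)` right after starting the process, and `proc.terminate()` could be used if it returns `False`, assuming `run_server` returns a `subprocess.Popen`-like object as the `proc.wait()` call suggests.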
Download files
Source Distribution
llama_cpp_bin-9763.0.0.tar.gz (34.5 MB)
File details
Details for the file llama_cpp_bin-9763.0.0.tar.gz.
File metadata
- Download URL: llama_cpp_bin-9763.0.0.tar.gz
- Upload date:
- Size: 34.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9574128aff5166152236e69989540d50716c7f9c99e7cc764d3d36be6f946313 |
| MD5 | 4a84ac8dbb31f0550967f20a26ed7ce7 |
| BLAKE2b-256 | cb8dfc702b576814e04eba6dd209569b33a8e4fcec3dcb9fb7a36a4a3e5f7def |
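A downloaded sdist can be checked against the SHA256 digest in the table above. The sketch below uses only the standard library; the function names are hypothetical, and the expected digest is copied from the table for `llama_cpp_bin-9763.0.0.tar.gz`.

```python
import hashlib

# SHA256 digest published above for llama_cpp_bin-9763.0.0.tar.gz.
EXPECTED_SHA256 = "9574128aff5166152236e69989540d50716c7f9c99e7cc764d3d36be6f946313"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("llama_cpp_bin-9763.0.0.tar.gz")` should return `True` for an intact download of that file.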
Provenance
The following attestation bundles were made for llama_cpp_bin-9763.0.0.tar.gz:
Publisher: build-everything.yml on vladlearns/llama-cpp-bin
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: llama_cpp_bin-9763.0.0.tar.gz
  - Subject digest: 9574128aff5166152236e69989540d50716c7f9c99e7cc764d3d36be6f946313
- Sigstore transparency entry: 1919811721
- Sigstore integration time:
- Permalink: vladlearns/llama-cpp-bin@1bcea6d8ee6f3c72bc462dd90efb8549fc64dd0b
- Branch / Tag: refs/heads/main
- Owner: https://github.com/vladlearns
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build-everything.yml@1bcea6d8ee6f3c72bc462dd90efb8549fc64dd0b
- Trigger Event: workflow_dispatch
Statement type: