llama.cpp server binary built from source

Project description

llama-cpp-bin

Pre-built llama.cpp server binaries packaged as a Python package. Install the wheel for your platform and run it.

Install

Pre-built wheels (recommended)

Pick the index that matches your compute backend:

# CPU only
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cpu llama-cpp-bin
# CUDA 12.4
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin
# CUDA 13.1
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu131 llama-cpp-bin
# ROCm
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/rocm llama-cpp-bin
# Vulkan
pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/vulkan llama-cpp-bin

Pin to a specific version:

pip install --index-url https://vladlearns.github.io/llama-cpp-bin/whl/cu124 llama-cpp-bin==9095.0.0
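
If the install step is scripted (for example in a provisioning script), the per-backend index URL and the optional version pin can be assembled programmatically. The following is a minimal sketch, not part of the package: `pip_install_args` is a hypothetical helper, and the backend names and base URL are copied from the commands above.

```python
import sys

# Index base URL and backend names are taken from the install commands above.
INDEX_BASE = "https://vladlearns.github.io/llama-cpp-bin/whl"
BACKENDS = ("cpu", "cu124", "cu131", "rocm", "vulkan")


def pip_install_args(backend, version=None):
    """Return the argument list that installs llama-cpp-bin for one backend.

    Hypothetical helper for illustration; it only builds the command.
    """
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {BACKENDS}")
    requirement = "llama-cpp-bin" if version is None else f"llama-cpp-bin=={version}"
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", f"{INDEX_BASE}/{backend}",
        requirement,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is installed by accident.
    print(" ".join(pip_install_args("cu124", version="9095.0.0")))
```

Pass the returned list to `subprocess.run(..., check=True)` to perform the install with the same interpreter that runs the script.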

PyPI (builds from source)

If no pre-built wheel matches your platform, pip falls back to building from the sdist on PyPI:

pip install llama-cpp-bin

Building from source requires CMake, a C++ compiler, and the llama.cpp source submodule.

Dev

git clone --recurse-submodules https://github.com/vladlearns/llama-cpp-bin
cd llama-cpp-bin
CMAKE_ARGS="-DGGML_CUDA=ON" pip install -v .

Run

CLI:

llama-cpp-server -m your-model.gguf --port 8080

Python:

from llama_cpp_bin import run_server
proc = run_server("your-model.gguf", port=8080)
proc.wait()
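
The server needs some time to load the model before it can answer requests. A minimal sketch of a readiness check follows, assuming the bundled binary is the upstream llama.cpp server, which exposes a `GET /health` endpoint that returns HTTP 200 once the model is loaded. `wait_until_ready` is a hypothetical helper, not part of `llama_cpp_bin`.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url, timeout=60.0, interval=0.5):
    """Poll base_url + '/health' until it answers HTTP 200 or the timeout expires.

    Hypothetical helper; assumes the server exposes GET /health.
    Returns True when the server is ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval + 1) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, or a non-2xx status while the model loads: retry.
            pass
        time.sleep(interval)
    return False
```

With the server started as above on port 8080, `wait_until_ready("http://127.0.0.1:8080")` would be called before sending the first request and before `proc.wait()`.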

Or get the binary path and run it yourself:

import llama_cpp_bin
import subprocess
binary = llama_cpp_bin.get_binary_path()
subprocess.Popen([binary, "--model", "your-model.gguf"])
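
When you launch the binary yourself, you are also responsible for stopping it. The sketch below shows one common pattern: request termination, wait for a grace period, then force-kill. `stop_process` is a hypothetical helper, and the demo uses a sleeping Python child as a stand-in for the server so it runs without a model file.

```python
import subprocess
import sys


def stop_process(proc, grace=10.0):
    """Ask a child process to exit; force-kill it if it ignores the request.

    Hypothetical helper for illustration. Returns the process return code.
    """
    if proc.poll() is None:  # still running
        proc.terminate()     # SIGTERM on POSIX, TerminateProcess on Windows
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()      # the process did not exit within the grace period
            proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in child; in practice this would be the Popen object for the server binary.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    stop_process(child, grace=5.0)
    print("child exited:", child.poll() is not None)
```

Calling the helper from a `try`/`finally` block around your client code keeps a server process from outliving the script.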


Download files

Source Distribution

llama_cpp_bin-9305.0.0.tar.gz (4.2 MB)

Uploaded Source

File details

Details for the file llama_cpp_bin-9305.0.0.tar.gz.

File metadata

  • Download URL: llama_cpp_bin-9305.0.0.tar.gz
  • Upload date:
  • Size: 4.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llama_cpp_bin-9305.0.0.tar.gz

Algorithm    Hash digest
SHA256       8663f6d84a183b8b287b7cbb5559171c7b63c9a01f883d36a0375c91ac4f87a9
MD5          98546e0ddf44a096392dcaf3c0d30fa8
BLAKE2b-256  e40dd9e3eeb1d5a4e4b290e59a5394b8dfa882c2c3550f4b5ef22e77b2a2da80
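
A downloaded sdist can be checked against the SHA256 digest listed above before building from it. This is a minimal sketch using only the standard library; `sha256_of` and `matches_expected` are hypothetical helper names, and the local file path is whatever you saved the archive as.

```python
import hashlib

# SHA256 digest listed above for llama_cpp_bin-9305.0.0.tar.gz.
EXPECTED_SHA256 = "8663f6d84a183b8b287b7cbb5559171c7b63c9a01f883d36a0375c91ac4f87a9"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("llama_cpp_bin-9305.0.0.tar.gz")` should return `True` for an intact download of that release. pip can enforce the same check itself through hash-checking mode with a `--hash` entry in a requirements file.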

Provenance

The following attestation bundles were made for llama_cpp_bin-9305.0.0.tar.gz:

Publisher: build-everything.yml on vladlearns/llama-cpp-bin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.