Skip to main content

Out-of-tree GGUF quantization plugin for vLLM

Project description

vLLM GGUF Quantization Plugin

This plugin provides out-of-tree GGUF quantization support for vLLM after in-tree support deprecation (vllm-project/vllm#39583).

Installation

Prerequisites

  • CUDA toolkit or ROCm toolkit

We recommend uv for package management. If you don't have it installed:

curl -LsSf https://astral.sh/uv/install.sh | sh

From Source

  1. Clone this repository:

    git clone https://github.com/vllm-project/vllm-gguf-plugin
    cd vllm-gguf-plugin
    
  2. Install the plugin in development mode:

    uv pip install -e . --torch-backend=auto
    

Or install directly:

uv pip install . --torch-backend=auto

Development

uv pip install -e .[dev] --torch-backend=auto
pre-commit install
pre-commit run --all-files

The same hooks also run in GitHub Actions on every push and pull request.

Usage

vllm serve Qwen/Qwen3-0.6B-GGUF:Q8_0 --tokenizer Qwen/Qwen3-0.6B

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vllm_gguf_plugin-0.0.1.tar.gz (78.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vllm_gguf_plugin-0.0.1-cp310-abi3-manylinux_2_28_x86_64.whl (6.3 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.28+ x86-64

File details

Details for the file vllm_gguf_plugin-0.0.1.tar.gz.

File metadata

  • Download URL: vllm_gguf_plugin-0.0.1.tar.gz
  • Upload date:
  • Size: 78.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for vllm_gguf_plugin-0.0.1.tar.gz
Algorithm Hash digest
SHA256 c071d0bb5436f0123b6a66a226d046e0d07f1574dd78dcec7b40fe8ca2f79ec3
MD5 b24e2efed1aaef3c04f74acaba71fd8e
BLAKE2b-256 e599a5dfc548d9cf66dbf51974d66ecddf034c101f48a42651402fe9d0a8601a

See more details on using hashes here.

Provenance

The following attestation bundles were made for vllm_gguf_plugin-0.0.1.tar.gz:

Publisher: release.yml on vllm-project/vllm-gguf-plugin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vllm_gguf_plugin-0.0.1-cp310-abi3-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for vllm_gguf_plugin-0.0.1-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d05d6b72922e27fafbdf8cc583321346744bc1c62d486bab94b6bf76d98700cb
MD5 a78f06a64a620baef3413f446f8803ba
BLAKE2b-256 e9f8347c325e062cf6ed8bed571247d61db779878fae8ce0ef8bc3b06f87476e

See more details on using hashes here.

Provenance

The following attestation bundles were made for vllm_gguf_plugin-0.0.1-cp310-abi3-manylinux_2_28_x86_64.whl:

Publisher: release.yml on vllm-project/vllm-gguf-plugin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page