Skip to main content

a lightweight vLLM implementation built from scratch

Project description

GeeeekExplorer%2Fnano-vllm | Trendshift

Nano-vLLM

A lightweight vLLM implementation built from scratch.

Key Features

  • 🚀 Fast offline inference - Comparable inference speeds to vLLM
  • 📖 Readable codebase - Clean implementation in ~ 1,200 lines of Python code
  • Optimization Suite - Prefix caching, Tensor Parallelism, Torch compilation, CUDA graph, etc.

Installation

pip install git+https://github.com/GeeeekExplorer/nano-vllm.git

Model Download

To download the model weights manually, use the following command:

huggingface-cli download --resume-download Qwen/Qwen3-0.6B \
  --local-dir ~/huggingface/Qwen3-0.6B/ \
  --local-dir-use-symlinks False

Quick Start

See example.py for usage. The API mirrors vLLM's interface with minor differences in the LLM.generate method:

from wickyvllm import LLM, SamplingParams
llm = LLM("/YOUR/MODEL/PATH", enforce_eager=True, tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Hello, Nano-vLLM."]
outputs = llm.generate(prompts, sampling_params)
outputs[0]["text"]

Benchmark

See bench.py for benchmark.

Test Configuration:

  • Hardware: RTX 4070 Laptop (8GB)
  • Model: Qwen3-0.6B
  • Total Requests: 256 sequences
  • Input Length: Randomly sampled between 100–1024 tokens
  • Output Length: Randomly sampled between 100–1024 tokens

Performance Results:

Inference Engine Output Tokens Time (s) Throughput (tokens/s)
vLLM 133,966 98.37 1361.84
Nano-vLLM 133,966 93.41 1434.13

Star History

Star History Chart

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wicky_vllm-0.1.0.tar.gz (14.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wicky_vllm-0.1.0-py3-none-any.whl (19.8 kB view details)

Uploaded Python 3

File details

Details for the file wicky_vllm-0.1.0.tar.gz.

File metadata

  • Download URL: wicky_vllm-0.1.0.tar.gz
  • Upload date:
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for wicky_vllm-0.1.0.tar.gz
Algorithm Hash digest
SHA256 172b0a15920388c611a3aa80a09ae462bf6e4628fc5b6896ddf4ce02270a00f1
MD5 4e8796ae021284ed88138aa03fc47e26
BLAKE2b-256 25b214bdc8298b54d883c6aa82c82c44f53418dcd50ead4ed36dfe167aad0286

See more details on using hashes here.

File details

Details for the file wicky_vllm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: wicky_vllm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for wicky_vllm-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c1e1278769bdd085b7809a1b320462cd77c21d0d93b5fdd6659099f373d1a07f
MD5 fa11481ba3890fbd75b0208f7c00f259
BLAKE2b-256 7c7038bdb1ce34997c4d54afc3f7ca7c217776da3bac635700764bcaef7ac9e6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page