Nano-vLLM

A lightweight vLLM implementation built from scratch.

Key Features

🚀 Fast offline inference - Inference speed comparable to vLLM
📖 Readable codebase - Clean implementation in ~1,200 lines of Python code
⚡ Optimization suite - Prefix caching, tensor parallelism, Torch compilation, CUDA graphs, etc.

Installation

pip install git+https://github.com/GeeeekExplorer/nano-vllm.git

Model Download

To download the model weights manually, use the following command:

huggingface-cli download --resume-download Qwen/Qwen3-0.6B \
  --local-dir ~/huggingface/Qwen3-0.6B/ \
  --local-dir-use-symlinks False
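
The same download can also be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and relies on its `snapshot_download` function. The helper names `resolve_local_dir` and `download_model` are hypothetical and are not part of Nano-vLLM.

```python
from pathlib import Path


def resolve_local_dir(local_dir: str) -> Path:
    """Expand '~' so the path matches what the shell command would use."""
    return Path(local_dir).expanduser()


def download_model(repo_id: str = "Qwen/Qwen3-0.6B",
                   local_dir: str = "~/huggingface/Qwen3-0.6B/") -> str:
    """Download a model snapshot into local_dir and return the resulting path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = resolve_local_dir(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_model()` needs network access. The returned path can then be passed to `LLM(...)` in place of `/YOUR/MODEL/PATH`.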

Quick Start

See example.py for usage. The API mirrors vLLM's interface with minor differences in the LLM.generate method:

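from nanovllm import LLM, SamplingParams

# Load the model from a local directory; enforce_eager=True runs in eager mode
# (as in vLLM, this skips CUDA graph capture).
llm = LLM("/YOUR/MODEL/PATH", enforce_eager=True, tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Hello, Nano-vLLM."]
outputs = llm.generate(prompts, sampling_params)
# Each output is indexed by key; the generated text is stored under "text".
print(outputs[0]["text"])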
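
For more than one prompt, it can help to pair each prompt with its generated text. The following is a minimal sketch built only on the calls shown in the Quick Start. It assumes that `llm.generate` returns one output per prompt, in the same order as the prompts, and the helper names `pair_prompts_with_text` and `generate_texts` are hypothetical.

```python
from typing import List, Mapping, Sequence, Tuple


def pair_prompts_with_text(prompts: Sequence[str],
                           outputs: Sequence[Mapping[str, object]]) -> List[Tuple[str, str]]:
    """Zip prompts with the "text" field of each output (assumes matching order)."""
    if len(prompts) != len(outputs):
        raise ValueError("expected exactly one output per prompt")
    return [(prompt, str(out["text"])) for prompt, out in zip(prompts, outputs)]


def generate_texts(model_path: str, prompts: Sequence[str],
                   temperature: float = 0.6, max_tokens: int = 256) -> List[Tuple[str, str]]:
    """Run a batch of prompts through Nano-vLLM and return (prompt, text) pairs."""
    # Imported lazily: nanovllm needs a GPU and model weights to be useful.
    from nanovllm import LLM, SamplingParams

    llm = LLM(model_path, enforce_eager=True, tensor_parallel_size=1)
    params = SamplingParams(temperature=temperature, max_tokens=max_tokens)
    outputs = llm.generate(list(prompts), params)
    return pair_prompts_with_text(prompts, outputs)
```

`pair_prompts_with_text` has no dependency on the engine, so it can be checked with hand-written outputs such as `[{"text": "hi"}]` before any model is loaded.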

Benchmark

See bench.py for the benchmark script.

Test Configuration:

Hardware: RTX 4070 Laptop (8GB)
Model: Qwen3-0.6B
Total Requests: 256 sequences
Input Length: Randomly sampled between 100 and 1024 tokens
Output Length: Randomly sampled between 100 and 1024 tokens
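
A workload of this shape could be generated as follows. This is an illustrative sketch and not the contents of bench.py: the function name `make_workload`, the fixed seed, and the vocabulary size of 10,000 used for random token IDs are all assumptions.

```python
import random
from typing import List, Tuple


def make_workload(num_seqs: int = 256, min_len: int = 100, max_len: int = 1024,
                  vocab_size: int = 10_000, seed: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Return (prompt_token_ids, output_lengths) for a synthetic benchmark run."""
    rng = random.Random(seed)  # local RNG so the workload is reproducible
    # Each prompt is a list of random token IDs with a random length in [min_len, max_len].
    prompts = [
        [rng.randrange(vocab_size) for _ in range(rng.randint(min_len, max_len))]
        for _ in range(num_seqs)
    ]
    # Each request also gets its own random output-length budget in the same range.
    output_lens = [rng.randint(min_len, max_len) for _ in range(num_seqs)]
    return prompts, output_lens


prompts, output_lens = make_workload()
print(len(prompts), "requests,", sum(output_lens), "output tokens requested")
```

Fixing the seed gives both engines the same requests, which is what makes the two rows of a throughput comparison directly comparable.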

Performance Results:

Inference Engine	Output Tokens	Time (s)	Throughput (tokens/s)
vLLM	133,966	98.37	1361.84
Nano-vLLM	133,966	93.41	1434.13
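
The throughput column is output tokens divided by wall-clock time. The short sketch below recomputes it from the table and derives the relative speedup; the variable and function names are illustrative. Because the times in the table are rounded to two decimals, the recomputed values may differ slightly from the published ones in the last digits.

```python
def throughput(output_tokens: int, seconds: float) -> float:
    """Tokens generated per second of wall-clock time."""
    return output_tokens / seconds


# (output tokens, time in seconds) copied from the results table
results = {"vLLM": (133_966, 98.37), "Nano-vLLM": (133_966, 93.41)}

rates = {name: throughput(tokens, secs) for name, (tokens, secs) in results.items()}
speedup = rates["Nano-vLLM"] / rates["vLLM"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} tokens/s")
print(f"Nano-vLLM / vLLM throughput ratio: {speedup:.3f}")
```

With these numbers, Nano-vLLM comes out roughly 5% faster than vLLM on this single hardware and model configuration.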

Project details

Version 20260210, released Feb 10, 2026.
