a lightweight vLLM implementation built from scratch

These details have not been verified by PyPI

Project links

Homepage

Project description

Nano-vLLM

A lightweight vLLM implementation built from scratch.

Key Features

🚀 Fast offline inference - Inference speed comparable to vLLM
📖 Readable codebase - Clean implementation in ~1,200 lines of Python code
⚡ Optimization suite - Prefix caching, tensor parallelism, Torch compilation, CUDA graphs, etc.

Installation

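pip install git+https://github.com/GeeeekExplorer/nano-vllm.git

The Quick Start examples below import wickyvllm. To install the wicky-vllm release published on PyPI instead:

pip install wicky-vllm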

Model Download

To download the model weights manually, use the following command:

huggingface-cli download --resume-download Qwen/Qwen3-0.6B \
  --local-dir ~/huggingface/Qwen3-0.6B/ \
  --local-dir-use-symlinks False

Quick Start

See example.py for usage. The API mirrors vLLM's interface with minor differences in the LLM.generate method:

Using local model path:

from wickyvllm import LLM, SamplingParams

# Load the model from a local directory on a single GPU.
llm = LLM("/YOUR/MODEL/PATH", enforce_eager=True, tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Hello, Nano-vLLM."]
outputs = llm.generate(prompts, sampling_params)
# Print the generated text for the first prompt.
print(outputs[0]["text"])

Using Hugging Face model ID (auto-download):

from wickyvllm import LLM, SamplingParams

# Pass a Hugging Face model ID; the weights are downloaded automatically.
llm = LLM("Qwen/Qwen3-0.6B", enforce_eager=True, tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Hello, Nano-vLLM."]
outputs = llm.generate(prompts, sampling_params)
# Print the generated text for the first prompt.
print(outputs[0]["text"])
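
The outputs[0]["text"] access is where the two interfaces differ: vLLM's generate returns RequestOutput objects whose text is read as .outputs[0].text, while the examples above index a plain dict. The following is a minimal sketch of a helper that accepts either shape. The name extract_text is hypothetical and belongs to neither library, and the dict layout is assumed only from the snippets above. Stand-in objects are used so the sketch runs without a GPU or a model.

```python
from types import SimpleNamespace
from typing import Any


def extract_text(output: Any) -> str:
    """Return the generated text from one element of a generate() result."""
    if isinstance(output, dict):
        # Shape used in the Quick Start examples: outputs[0]["text"]
        return output["text"]
    # vLLM shape: RequestOutput.outputs is a list of completions with .text
    return output.outputs[0].text


# Stand-ins for real results (assumed shapes, not produced by a model).
dict_style = {"text": "Hi there."}
vllm_style = SimpleNamespace(outputs=[SimpleNamespace(text="Hi there.")])

print(extract_text(dict_style))
print(extract_text(vllm_style))
```

A helper like this lets the same post-processing code run against either engine when comparing outputs.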

Benchmark

See bench.py for the benchmark script.

Attention Kernel Benchmark:

To benchmark the Triton attention kernel performance:

python benchmark_attention.py

Test Configuration:

Hardware: RTX 4070 Laptop (8GB)
Model: Qwen3-0.6B
Total Requests: 256 sequences
Input Length: Randomly sampled between 100–1024 tokens
Output Length: Randomly sampled between 100–1024 tokens
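
To make the configuration concrete, here is a minimal sketch of how a workload with this shape could be drawn. It is not taken from bench.py: the uniform distribution, the seed, and the name make_workload are assumptions for illustration, and only the request count and the length bounds come from the list above.

```python
import random

NUM_SEQS = 256                 # Total Requests
MIN_LEN, MAX_LEN = 100, 1024   # bounds for both input and output length


def make_workload(seed: int = 0) -> list[tuple[int, int]]:
    """Draw (input_len, output_len) pairs uniformly from [MIN_LEN, MAX_LEN]."""
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    return [
        (rng.randint(MIN_LEN, MAX_LEN), rng.randint(MIN_LEN, MAX_LEN))
        for _ in range(NUM_SEQS)
    ]


workload = make_workload()
total_output_tokens = sum(out_len for _, out_len in workload)
print(len(workload), total_output_tokens)
```

Because the output lengths are sampled, the total number of output tokens depends on the particular draw.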

Performance Results:

Inference Engine	Output Tokens	Time (s)	Throughput (tokens/s)
vLLM	133,966	98.37	1361.84
Nano-vLLM	133,966	93.41	1434.13
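
The throughput column is output tokens divided by wall-clock time. The short sketch below recomputes it from the rounded figures in the table. The recomputed values agree with the table to within about 0.05 tokens/s, and the small gap is presumably due to the times being rounded to two decimals.

```python
# (output tokens, time in seconds) copied from the results table above
results = {
    "vLLM": (133_966, 98.37),
    "Nano-vLLM": (133_966, 93.41),
}


def throughput(tokens: int, seconds: float) -> float:
    """Output tokens generated per second of wall-clock time."""
    return tokens / seconds


for name, (tokens, seconds) in results.items():
    print(f"{name}: {throughput(tokens, seconds):.2f} tokens/s")

# Both runs produce the same number of tokens, so the throughput ratio
# equals the inverse ratio of the run times.
ratio = throughput(*results["Nano-vLLM"]) / throughput(*results["vLLM"])
print(f"Nano-vLLM / vLLM throughput ratio: {ratio:.3f}")
```

On these numbers the ratio is a little above 1.05, which is a modest lead on this one hardware and model configuration.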

Project details

Release history

Version	Release date
0.1.6 (this version)	Jan 25, 2026
0.1.5	Jan 24, 2026
0.1.4	Jan 24, 2026
0.1.3	Jan 24, 2026
0.1.2	Jan 24, 2026
0.1.1	Jan 24, 2026
0.1.0	Jan 24, 2026

Download files

Source Distribution

wicky_vllm-0.1.6.tar.gz (17.0 kB)

Uploaded Jan 25, 2026 Source

Built Distribution

wicky_vllm-0.1.6-py3-none-any.whl (22.2 kB)

Uploaded Jan 25, 2026 Python 3

File details

Details for the file wicky_vllm-0.1.6.tar.gz.

File metadata

Download URL: wicky_vllm-0.1.6.tar.gz
Upload date: Jan 25, 2026
Size: 17.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.26 (uv publish, macOS)

File hashes

Hashes for wicky_vllm-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`162b4d83c60f1cb8fd80ac4c1833af07ce5b855cf57ddaddbe855b17c0e56123`
MD5	`dbff022e23697c3495ff00168eceb45c`
BLAKE2b-256	`d0d706f75dc7882297e046728fd0931e9bdc7087558d32f94059d7c6e0603df5`

File details

Details for the file wicky_vllm-0.1.6-py3-none-any.whl.

File metadata

Download URL: wicky_vllm-0.1.6-py3-none-any.whl
Upload date: Jan 25, 2026
Size: 22.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.26 (uv publish, macOS)

File hashes

Hashes for wicky_vllm-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`06046310477ea188eb4d9bbdb6efedcdca7cd285189c5573759fa2f7f9badede`
MD5	`12c839ee04199e6d5f07991fbe9e85b2`
BLAKE2b-256	`1f7a1046ead7205b43d51e68449ac10e5fa9ead83754344ae4cd991e1feeff44`
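
The SHA256 digests listed for the two files can be checked against a local download with the standard library. This is a minimal sketch: the helper names sha256_of and verify are illustrative, and the files are assumed to be in the current directory under their published names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the File hashes tables on this page.
EXPECTED_SHA256 = {
    "wicky_vllm-0.1.6.tar.gz":
        "162b4d83c60f1cb8fd80ac4c1833af07ce5b855cf57ddaddbe855b17c0e56123",
    "wicky_vllm-0.1.6-py3-none-any.whl":
        "06046310477ea188eb4d9bbdb6efedcdca7cd285189c5573759fa2f7f9badede",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a file against the digest listed for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        candidate = Path(name)
        if candidate.exists():
            print(name, "OK" if verify(candidate) else "MISMATCH")
```

pip can perform the same check at install time when a requirements file pins hashes and is installed with --require-hashes.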

wicky-vllm 0.1.6
