smol-vllm
From-scratch paged-attention inference engine: paged KV cache, continuous batching, preemption. Pure Python, no external deps.
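To make the paged KV cache idea concrete, here is a minimal sketch of block-table bookkeeping. It is an illustration only: `ToyBlockAllocator`, `blocks_needed`, `ensure_capacity` and `release` are hypothetical names, not smol-vllm's actual classes or methods. It assumes the cache is split into fixed-size blocks and that each sequence keeps a table of the physical blocks it owns.

```python
import math


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Number of fixed-size KV blocks required to hold num_tokens tokens."""
    return math.ceil(num_tokens / block_size)


class ToyBlockAllocator:
    """Hypothetical free-list allocator; not smol-vllm's actual class."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free = list(range(num_blocks))      # ids of unused physical blocks
        self.tables: dict[int, list[int]] = {}   # sequence id -> block table

    def ensure_capacity(self, seq_id: int, num_tokens: int) -> bool:
        """Grow seq_id's block table to cover num_tokens; False if out of blocks."""
        table = self.tables.setdefault(seq_id, [])
        missing = blocks_needed(num_tokens, self.block_size) - len(table)
        if missing > len(self.free):
            return False  # a real engine would preempt a sequence here
        for _ in range(max(missing, 0)):
            table.append(self.free.pop())
        return True

    def release(self, seq_id: int) -> None:
        """Return all of a sequence's blocks to the free list (finish or preempt)."""
        self.free.extend(self.tables.pop(seq_id, []))


alloc = ToyBlockAllocator(num_blocks=64, block_size=16)
alloc.ensure_capacity(seq_id=0, num_tokens=5)    # 5 prompt tokens fit in 1 block
alloc.ensure_capacity(seq_id=0, num_tokens=25)   # 5 prompt + 20 generated need 2
```

Under these assumptions, a sequence only takes a new block when it crosses a block boundary, and releasing a preempted or finished sequence makes its blocks available again right away.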
Install
pip install smol-vllm
Or from source:
pip install .
Usage
from smol_vllm import LLMEngine

engine = LLMEngine(num_gpu_blocks=64, block_size=16, max_batch_size=8)

# Single request (streaming)
for token in engine.generate([1, 2, 3, 4, 5], max_tokens=20):
    print(token, end=" ")

# Batched: add requests and step
engine.add_request([10, 20, 30], max_tokens=10)
engine.add_request([40, 50, 60], max_tokens=10)
while True:
    outputs = engine.step()
    for out in outputs:
        print(out.output_tokens)
    if all(o.finished for o in outputs):
        break
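The add-then-step pattern above is what continuous batching looks like from the caller's side. As a rough sketch of what could happen inside each step, the toy scheduler below admits waiting requests whenever a batch slot is free and drops finished ones immediately. Everything in it is an assumption made for illustration: `ToyRequest`, `ToyScheduler` and `has_unfinished` are hypothetical names, the "model" just emits dummy token ids, and none of it is taken from smol-vllm's source.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    prompt: list[int]
    max_tokens: int
    output_tokens: list[int] = field(default_factory=list)

    @property
    def finished(self) -> bool:
        return len(self.output_tokens) >= self.max_tokens


class ToyScheduler:
    """Hypothetical continuous-batching loop; not smol-vllm's implementation."""

    def __init__(self, max_batch_size: int) -> None:
        self.max_batch_size = max_batch_size
        self.waiting: deque[ToyRequest] = deque()
        self.running: list[ToyRequest] = []

    def add_request(self, prompt: list[int], max_tokens: int) -> ToyRequest:
        req = ToyRequest(prompt, max_tokens)
        self.waiting.append(req)
        return req

    def step(self) -> list[ToyRequest]:
        # Admit waiting requests into free batch slots before every step.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        batch = list(self.running)
        for req in batch:
            # Stand-in for a model forward pass: emit a dummy token id.
            req.output_tokens.append(len(req.prompt) + len(req.output_tokens))
        # Finished requests leave at once, freeing slots for the next step.
        self.running = [r for r in self.running if not r.finished]
        return batch

    def has_unfinished(self) -> bool:
        return bool(self.waiting or self.running)


sched = ToyScheduler(max_batch_size=2)
a = sched.add_request([10, 20, 30], max_tokens=1)
b = sched.add_request([40, 50, 60], max_tokens=3)
c = sched.add_request([70, 80], max_tokens=2)

steps = 0
while sched.has_unfinished():
    sched.step()
    steps += 1
```

With a batch size of 2, request `c` waits only until `a` finishes after the first step and then joins `b` mid-flight, so the batch never has to drain completely before new work starts.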
Demo
pip install smol-vllm
smol-vllm-demo
License
MIT
Download files
Source Distribution
smol_vllm-0.1.0.tar.gz (8.4 kB)
Built Distribution
smol_vllm-0.1.0-py3-none-any.whl (9.9 kB)
File details
Details for the file smol_vllm-0.1.0.tar.gz.
File metadata
- Download URL: smol_vllm-0.1.0.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 80a13e1e4609b502af1158fddc91232d3d9841cc868639bb13203ba9f0e691df |
| MD5 | 46d223e517c912f773eb177f25e8a5da |
| BLAKE2b-256 | d59071cf22d077c9f1648b474c8e00802139f265c855da671f195537959da791 |
File details
Details for the file smol_vllm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: smol_vllm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b9a5f524104f23a4e04a14ef8b406c1185ad9a61931b321862126ef948ebedd3 |
| MD5 | 8d0fdc23382c8bdf9f55eac350280f96 |
| BLAKE2b-256 | b14f417d10c300e6f853c00f920ce2a5293195b7edb94216c93c21f5f30c2a57 |