smol-vllm

From-scratch paged-attention inference engine: paged KV cache, continuous batching, preemption. Pure Python, no external deps.
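To make "paged KV cache" concrete, here is a minimal sketch of the general idea: the cache is a pool of fixed-size blocks, and each sequence keeps a block table that maps its tokens onto whichever physical blocks happen to be free. ToyBlockAllocator and all of its methods are hypothetical names made up for this illustration; they are not taken from smol-vllm's source and may differ from how the package does it.

```python
import math


class ToyBlockAllocator:
    """Hypothetical fixed-size block pool (illustration only, not smol-vllm's class)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq id -> list of physical block ids
        self.seq_lens = {}      # seq id -> number of cached tokens

    def blocks_needed(self, num_tokens: int) -> int:
        # A block holds block_size tokens, so round up.
        return math.ceil(num_tokens / self.block_size)

    def append_tokens(self, seq_id: int, num_new_tokens: int) -> bool:
        """Grow a sequence; return False and change nothing if the pool is too small."""
        new_len = self.seq_lens.get(seq_id, 0) + num_new_tokens
        table = self.block_tables.get(seq_id, [])
        extra = self.blocks_needed(new_len) - len(table)
        if extra > len(self.free_blocks):
            return False
        self.block_tables[seq_id] = table + [self.free_blocks.pop() for _ in range(extra)]
        self.seq_lens[seq_id] = new_len
        return True

    def free(self, seq_id: int) -> None:
        # Finishing or preempting a sequence returns its blocks to the pool.
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


alloc = ToyBlockAllocator(num_blocks=4, block_size=16)
assert alloc.append_tokens(seq_id=0, num_new_tokens=5)       # 5 tokens fit in 1 block
assert alloc.append_tokens(seq_id=0, num_new_tokens=12)      # 17 tokens need 2 blocks
assert alloc.append_tokens(seq_id=1, num_new_tokens=32)      # 2 blocks, pool is now empty
assert not alloc.append_tokens(seq_id=0, num_new_tokens=16)  # 33 tokens would need a 3rd block
alloc.free(1)                                                # sequence 1 gives its blocks back
assert alloc.append_tokens(seq_id=0, num_new_tokens=16)      # now the 3rd block is available
```

Because blocks are handed out one at a time, a sequence wastes at most one partially filled block, and memory freed by one sequence can be reused by any other.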

Install

pip install smol-vllm

Or from a source checkout:

pip install .

Usage

from smol_vllm import LLMEngine

engine = LLMEngine(num_gpu_blocks=64, block_size=16, max_batch_size=8)

# Single request (streaming)
for token in engine.generate([1, 2, 3, 4, 5], max_tokens=20):
    print(token, end=" ")

# Batched: add requests and step
engine.add_request([10, 20, 30], max_tokens=10)
engine.add_request([40, 50, 60], max_tokens=10)
while True:
    outputs = engine.step()
    for out in outputs:
        print(out.output_tokens)
    if all(o.finished for o in outputs):
        break
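The add_request/step loop above is the usual shape of continuous batching: requests join and leave the running batch between steps rather than waiting for a whole batch to finish. The sketch below is a toy model of that loop with preemption, assuming strict first-come-first-served admission and assuming a preempted request keeps its generated tokens and simply gives up its blocks. ToyRequest and toy_schedule are hypothetical names for this illustration, not smol-vllm's scheduler, and the real engine may make different choices.

```python
import math
from dataclasses import dataclass
from collections import deque


@dataclass
class ToyRequest:
    req_id: int
    prompt_len: int
    max_tokens: int
    generated: int = 0

    def blocks_for_next_token(self, block_size: int) -> int:
        # Blocks needed to hold the prompt, the tokens so far, and one more token.
        return math.ceil((self.prompt_len + self.generated + 1) / block_size)


def toy_schedule(requests, num_blocks, block_size, max_batch_size):
    """Run toy requests to completion; return (finish order, preemption count)."""
    waiting, running = deque(requests), []
    finished, preemptions = [], 0
    while waiting or running:
        # Admit waiting requests (FCFS) while batch slots and blocks are available.
        used = sum(r.blocks_for_next_token(block_size) for r in running)
        while waiting and len(running) < max_batch_size:
            need = waiting[0].blocks_for_next_token(block_size)
            if used + need > num_blocks:
                break
            used += need
            running.append(waiting.popleft())
        if not running:
            raise RuntimeError("the next request does not fit in the block pool")
        # Preempt the most recently admitted requests until the next step fits.
        while sum(r.blocks_for_next_token(block_size) for r in running) > num_blocks:
            if len(running) == 1:
                raise RuntimeError("a single request outgrew the block pool")
            waiting.appendleft(running.pop())  # its blocks count as free again
            preemptions += 1
        # One decode step: every running request produces one token.
        for r in running:
            r.generated += 1
        finished.extend(r.req_id for r in running if r.generated >= r.max_tokens)
        running = [r for r in running if r.generated < r.max_tokens]
    return finished, preemptions


reqs = [ToyRequest(0, prompt_len=3, max_tokens=6),
        ToyRequest(1, prompt_len=3, max_tokens=2),
        ToyRequest(2, prompt_len=2, max_tokens=2)]
order, preemptions = toy_schedule(reqs, num_blocks=3, block_size=4, max_batch_size=2)
print(order, preemptions)  # -> [0, 1, 2] 1
```

In this run, request 1 is preempted once when request 0 grows into a second block, then resumes after request 0 finishes. Preempting the newest request first keeps the oldest one moving, so the loop always makes progress.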

Demo

pip install smol-vllm
smol-vllm-demo

License

MIT
