mini-vllm

An educational implementation of an inference engine.

A minimal implementation of vLLM's core ideas: PagedAttention and continuous batching.
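The core PagedAttention idea is that the KV cache is carved into fixed-size blocks, and each sequence keeps a "block table" mapping its logical token positions to physical blocks, so memory is claimed one block at a time instead of being reserved up front. A toy sketch of that bookkeeping (illustrative names only, not mini-vllm's actual internals):

```python
# Toy sketch of PagedAttention-style KV-cache bookkeeping.
# Class and method names are illustrative, not mini-vllm's API.

BLOCK_SIZE = 16  # tokens per KV-cache block (matches block_size above)

class BlockAllocator:
    """Hands out physical block ids from a fixed pool."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def allocate(self):
        if not self.free:
            raise RuntimeError("out of KV-cache blocks")
        return self.free.pop()

    def release(self, block_id):
        self.free.append(block_id)

class Sequence:
    """Tracks one request's logical-to-physical block mapping."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []   # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # A new physical block is allocated only when the current one fills
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

allocator = BlockAllocator(num_blocks=100)
seq = Sequence(allocator)
for _ in range(40):             # 40 tokens -> ceil(40/16) = 3 blocks
    seq.append_token()
print(len(seq.block_table))     # 3
```

Because blocks are allocated on demand and freed when a sequence finishes, many sequences can share one fixed pool with little fragmentation — this is what makes large batch sizes fit in GPU memory.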

Installation

pip install mini-vllm

Requirements: Python 3.10+ and a CUDA-capable GPU

Quick Start

from mini_vllm import LLMEngine

# Initialize the engine
engine = LLMEngine(
    model_name="meta-llama/Llama-3.2-1B",
    block_size=16,
    num_gpu_blocks=100
)

# Add a request
req_id = engine.add_request("The meaning of life is")

# Generate tokens
while True:
    outputs = engine.step()
    if not outputs:
        break
    
    # Check if generation is complete
    if req_id in outputs:
        print(outputs[req_id])
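The other core idea, continuous batching, is that new requests join the running batch between decode steps instead of waiting for the whole batch to drain. The scheduling concept can be shown with a toy simulator (no GPU needed; this is the concept only, not mini-vllm's scheduler):

```python
from collections import deque

# Toy continuous-batching scheduler: a request is admitted into the
# running batch as soon as a slot frees up, mid-generation.

class ToyScheduler:
    def __init__(self, max_batch=4):
        self.waiting = deque()
        self.running = {}        # req_id -> tokens still to generate
        self.max_batch = max_batch
        self.finished = []

    def add_request(self, req_id, num_tokens):
        self.waiting.append((req_id, num_tokens))

    def step(self):
        # Admit waiting requests into any free batch slots
        while self.waiting and len(self.running) < self.max_batch:
            req_id, n = self.waiting.popleft()
            self.running[req_id] = n
        # Decode one token for every running request
        for req_id in list(self.running):
            self.running[req_id] -= 1
            if self.running[req_id] == 0:
                del self.running[req_id]
                self.finished.append(req_id)

sched = ToyScheduler(max_batch=2)
sched.add_request("a", 3)
sched.add_request("b", 1)
sched.step()                 # "b" finishes; a batch slot frees up
sched.add_request("c", 2)    # "c" joins mid-flight, alongside "a"
while sched.running or sched.waiting:
    sched.step()
print(sched.finished)        # ['b', 'a', 'c']
```

Note that "c" never waits for "a" to finish — short requests leave early and their slots are refilled immediately, which is why throughput scales so well with batch size in the benchmarks below.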

Benchmarks

Hardware: NVIDIA A100 (Modal)
Model: meta-llama/Llama-3.2-1B
Max tokens per request: 50
Prompt: "The meaning of life is"

mini-vllm Performance

Batch Size   Duration   Total Tokens   Throughput (tokens/sec)
1            4.59 s     50             10.90
4            1.01 s     250            248.48
16           1.20 s     1050           872.23
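Throughput here is total tokens divided by wall-clock duration. Recomputing from the rounded durations in the table lands close to, but not exactly on, the reported figures — the reported throughput was presumably computed from unrounded timings:

```python
# (batch size, duration in seconds, total tokens) from the table above
runs = [
    (1, 4.59, 50),
    (4, 1.01, 250),
    (16, 1.20, 1050),
]
for batch, seconds, tokens in runs:
    print(f"batch {batch:>2}: {tokens / seconds:.2f} tokens/sec")
# batch  1: 10.89 tokens/sec
# batch  4: 247.52 tokens/sec
# batch 16: 875.00 tokens/sec
```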

Comparison with vLLM

Batch Size   mini-vllm (tokens/sec)   vLLM (tokens/sec)   Ratio (vLLM/mini)
1            10.90                    213.73              19.6x
4            248.48                   977.46              3.9x
16           872.23                   3510.41             4.0x

