Last released Mar 22, 2026
From-scratch paged-attention inference engine: paged KV cache, continuous batching, preemption
Last released Mar 9, 2026
Zero-code OpenTelemetry tracing for AI agents
Last released Feb 3, 2026
Relay: minimal LLM inference server for heterogeneous devices
Supported by