2 projects
thaw-vllm
The fork primitive for LLM inference. Snapshot a running session (weights, KV cache, and scheduler state) and hydrate it into N divergent children that skip prefill. Built for RL rollouts, parallel coding agents, and agent branching. Supports vLLM and SGLang.
thaw-native
Rust+CUDA native extension for thaw (pipelined DMA freeze/restore)