Last released Apr 22, 2024
A high-throughput and memory-efficient inference and serving engine for LLMs