Last released Oct 30, 2024
GenZ is designed to simplify analysis of the relationship between the hardware platform used for serving Large Language Models (LLMs) and inference serving metrics such as latency and memory.