Last released Mar 21, 2025
GenZ is designed to simplify the relationship between the hardware platform used to serve Large Language Models (LLMs) and inference serving metrics such as latency and memory.