5 projects
vllm-cpu
A high-throughput and memory-efficient inference and serving engine for LLMs
vllm-cpu-amxbf16
vLLM CPU inference engine (AVX512 + VNNI + BF16 + AMX optimized)
vllm-cpu-avx512bf16
vLLM CPU inference engine (AVX512 + VNNI + BF16 optimized)
vllm-cpu-avx512vnni
vLLM CPU inference engine (AVX512 + VNNI optimized)
vllm-cpu-avx512
vLLM CPU inference engine (AVX512 optimized)
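
The four suffixed packages differ only in which instruction-set extensions they target, so one way to choose among them is to compare the CPU's advertised feature flags against each variant's requirements, most specialized first. The sketch below is a minimal illustration of that idea, not part of any of these packages. It assumes a Linux host where /proc/cpuinfo reports the flags `avx512f`, `avx512_vnni`, `avx512_bf16` and `amx_bf16`. The flag-to-package mapping is inferred from the descriptions above, and treating `vllm-cpu` as the generic fallback is likewise an assumption. `pick_package` and `read_cpu_flags` are hypothetical helper names.

```python
from pathlib import Path

# Ordered from most to least specialized. Flag names follow Linux
# /proc/cpuinfo; the mapping to package names is an assumption inferred
# from the package descriptions, not taken from the packages themselves.
VARIANTS = [
    ("vllm-cpu-amxbf16", {"avx512f", "avx512_vnni", "avx512_bf16", "amx_bf16"}),
    ("vllm-cpu-avx512bf16", {"avx512f", "avx512_vnni", "avx512_bf16"}),
    ("vllm-cpu-avx512vnni", {"avx512f", "avx512_vnni"}),
    ("vllm-cpu-avx512", {"avx512f"}),
]

# Assumed generic fallback when no AVX-512 variant matches.
FALLBACK = "vllm-cpu"


def pick_package(flags):
    """Return the first variant whose required flags are all present."""
    available = set(flags)
    for name, required in VARIANTS:
        if required <= available:
            return name
    return FALLBACK


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the CPU feature flags on Linux; return an empty set elsewhere."""
    cpuinfo = Path(path)
    if not cpuinfo.exists():
        return set()
    for line in cpuinfo.read_text().splitlines():
        # x86 Linux lists features on a line of the form "flags : fpu vme ..."
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    print(pick_package(read_cpu_flags()))
```

Checking the most demanding variant first means a CPU that supports AMX is not matched by the plain AVX512 build, even though it also satisfies that build's requirements.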