2 projects
lsglang
SGLang is a fast serving framework for large language models and vision-language models.
lk-moe
lk_moe is a NUMA extension of vLLM that makes full use of CPU and memory resources to reduce GPU memory requirements. It combines GPU parallelism with NUMA parallelism to support hybrid inference for large MoE models.