6 projects
miniforge
High-performance MiniMax M2.7 inference library optimized for GMKtech M7
fastvq
TurboQuant: Extreme compression for AI models with near-optimal distortion rates
petallm
PetaLLM lets a single 4GB GPU run 70B large language models without quantization, distillation, or pruning; 8GB of VRAM runs 405B Llama 3.1.
kimi-k2-optimizer
Inference optimization suite for Kimi K2.5 (1.1T) on the RTX 3090, with aggressive RAM usage reduction
zeroquant
Zero-config model quantization for notebooks and Python
ommi-llm
Run 70B+ LLMs on consumer GPUs with layer-wise inference