AirLLM allows a single 4GB GPU card to run 70B large language models without quantization, distillation, or pruning.
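As a quick illustration, here is a minimal inference sketch assuming the `airllm` package's `AutoModel` interface and a CUDA-capable GPU; the model id and argument names are examples and may differ between airllm versions.

```python
# Minimal AirLLM inference sketch (assumes the airllm AutoModel interface;
# class and argument names may vary across versions).
from airllm import AutoModel

MAX_LENGTH = 128

# Layers are streamed from disk one at a time during inference,
# so a 70B model can fit within a few GB of GPU memory.
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

input_text = ["What is the capital of the United States?"]

input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    truncation=True,
    max_length=MAX_LENGTH,
)

generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)

print(model.tokenizer.decode(generation_output.sequences[0]))
```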