Last released Feb 28, 2026
Run 70B+ LLMs on a single 4GB GPU — no quantization required. Layer-streaming inference for consumer hardware.
Last released Dec 25, 2024
A Python backend framework inspired by NestJS
Supported by