Last released Jun 30, 2025
llama-optimus is a lightweight Python tool to automatically optimize llama.cpp performance flags for maximum tg & pp token/s throughput. Powered by Bayesian optimization with Optuna
Supported by