Unofficial implementation of Momentum Low-Rank Compression (MLorc) for memory-efficient LLM fine-tuning
Project description
MLorc - Momentum Low-Rank Compression for Memory-Efficient LLM Fine-tuning
Unofficial implementation of "MLorc: Momentum Low-rank Compression for Large Language Model Adaptation"
This repository provides MLorc (Momentum Low-rank Compression), a memory-efficient paradigm that substantially reduces the memory footprint of full-parameter fine-tuning for large language models. Based on the paper "MLorc: Momentum Low-rank Compression for Large Language Model Adaptation", this method offers a compelling alternative to existing memory-efficient techniques.
How MLorc Works
MLorc's core innovation lies in its approach to momentum compression and reconstruction:
- Direct Momentum Compression: It directly compresses and reconstructs both first and second-order momentum using Randomized SVD (RSVD) at each optimization step.
- Adaptive Second-Order Momentum Handling: To keep the reconstruction stable, the non-negative second-order momentum is passed through ReLU during reconstruction, and a small constant is adaptively added to the entries ReLU zeroes out. Both steps are illustrated in the sketch below.
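A minimal sketch of this compress/reconstruct cycle, assuming PyTorch and using `torch.svd_lowrank` as the RSVD. This illustrates the general technique, not the repository's exact code; in particular, the fixed `eps` here stands in for the adaptively chosen constant described above:

```python
import torch

def compress(m: torch.Tensor, rank: int):
    """Compress a 2-D momentum matrix into rank-r factors via randomized SVD."""
    U, S, V = torch.svd_lowrank(m, q=rank, niter=2)
    return U, S, V  # only these factors are kept between steps

def reconstruct(U, S, V, second_order: bool = False, eps: float = 1e-30):
    """Rebuild the (approximate) momentum matrix from its factors."""
    m = U @ torch.diag(S) @ V.T
    if second_order:
        # Second-order momentum must stay non-negative: clip negatives
        # introduced by the low-rank approximation with ReLU, then add a
        # small constant where entries were zeroed so later divisions are
        # safe. (The paper chooses this constant adaptively; a fixed eps
        # is used here purely for illustration.)
        m = torch.relu(m)
        m = torch.where(m == 0, torch.full_like(m, eps), m)
    return m
```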
Key Advantages of MLorc
MLorc is broadly applicable to any momentum-based optimizer (e.g., Adam, Lion; see the sketch after this list) and delivers strong performance:
- State-of-the-Art Performance: Empirically, MLorc consistently outperforms other memory-efficient methods such as LoRA and GaLore in validation accuracy, and can match or exceed full fine-tuning even at a small rank (e.g., rank = 4).
- Memory and Time Efficiency: It matches LoRA's memory efficiency while being faster than GaLore.
- Theoretical Guarantees: MLorc offers a theoretical guarantee for convergence, matching the convergence rate of the original Lion optimizer under reasonable assumptions.
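To make the "any momentum-based optimizer" claim concrete, here is a sketch of a single Lion step that keeps its momentum only in compressed form, reusing the `compress`/`reconstruct` helpers from the sketch above. All names and hyperparameter values are illustrative, and weight decay is omitted:

```python
import torch

def mlorc_lion_step(w, grad, factors, rank=4, lr=1e-4, beta1=0.9, beta2=0.99):
    """One Lion update where the momentum lives only as low-rank factors.

    Call under torch.no_grad() when w is a trainable Parameter.
    """
    m = reconstruct(*factors)                                # decompress momentum
    w -= lr * torch.sign(beta1 * m + (1.0 - beta1) * grad)   # Lion sign update
    m = beta2 * m + (1.0 - beta2) * grad                     # new momentum
    return compress(m, rank)                                 # keep only the factors
```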
Included MLorc-Integrated Optimizers
This repository integrates MLorc into six momentum-based optimizers, each with additional enhancements for improved performance and stability:
- `MLorc_AdamW`: AdamW with MLorc compression, featuring:
  - Fused backward pass.
  - Gradient Descent with Adaptive Momentum Scaling (Grams): for better performance and faster convergence.
  - atan2 smoothing & scaling: a robust replacement for `eps` (no tuning required) that also incorporates gradient clipping; if enabled, `eps` is ignored. (See the sketch after this list.)
  - OrthoGrad: prevents "naïve loss minimization" (NLM), which can lead to overfitting, by removing the gradient component parallel to the weights, improving generalization. (See the sketch after this list.)
- `MLorc_Prodigy`:
  - Same features as `MLorc_AdamW`.
  - Combines MLorc with the Prodigy adaptive step-size method and its associated features.
- `MLorc_Lion`: Lion with MLorc compression, featuring:
  - Fused backward pass.
  - OrthoGrad.
  - `use_cautious`: use the cautious variant of Lion.
  - `clip_threshold`: clip the gradient norm per parameter, as proposed in "Lions and Muons: Optimization via Stochastic Frank-Wolfe", to make Lion more stable (default: 5.0, from the paper).
- `MLorc_DAdapt_Lion`:
  - Same features as `MLorc_Lion`.
  - Combines MLorc with the D-Adaptation method for Lion, and includes the `slice_p` feature (from Prodigy).
- `MLorc_Adopt`:
  - Same features as `MLorc_AdamW`.
  - Implements the method of "ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate".
- `MLorc_CAME`:
  - Same features as `MLorc_AdamW`.
  - The first moment (momentum) is compressed using MLorc's low-rank factorization, while the adaptive pre-conditioning and confidence-guided updates follow "CAME: Confidence-guided Adaptive Memory Efficient Optimization".
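The atan2 smoothing and OrthoGrad features referenced above can be sketched as standalone per-parameter helpers. This is an illustration of the general techniques, not the repository's exact code; the scaling constants follow common Adam-atan2 implementations:

```python
import torch

def orthograd(w: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """OrthoGrad: strip the gradient component parallel to the weights,
    then rescale so the overall gradient norm is preserved."""
    wf, gf = w.flatten(), g.flatten()
    proj = torch.dot(wf, gf) / (torch.dot(wf, wf) + 1e-30)
    g_orth = g - proj * w
    return g_orth * (g.norm() / (g_orth.norm() + 1e-30))

def atan2_update(exp_avg, exp_avg_sq, a=1.2732395, b=0.2053985):
    """atan2 smoothing & scaling: replaces exp_avg / (exp_avg_sq.sqrt() + eps).
    atan2 is bounded, so the update is implicitly clipped and eps disappears."""
    return a * torch.atan2(exp_avg, b * exp_avg_sq.sqrt())
```

Inside the optimizers above, helpers like these would be applied per parameter tensor: OrthoGrad to the raw gradient before the momentum update, and the atan2 form in place of the usual Adam division when forming the step.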
Download files
Download the file for your platform.
- Source Distribution: MLorc_optim-0.1.2.tar.gz
- Built Distribution: MLorc_optim-0.1.2-py3-none-any.whl
File details
Details for the file MLorc_optim-0.1.2.tar.gz.
File metadata
- Download URL: MLorc_optim-0.1.2.tar.gz
- Upload date:
- Size: 7.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b3d958469faa03b79aa60ac7a34163bb4802099f0251d278c4d98998b2165831 |
| MD5 | 856fdb9c59f8445e1dfc90e64abf7b68 |
| BLAKE2b-256 | 592d41a03213ce6e5e5b1fb3c0536433467defbd5f9b9430a607da6148753202 |
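To verify a downloaded file against the digests listed here (for either distribution), a few lines of standard-library Python suffice:

```python
import hashlib

# Compare a downloaded sdist against the SHA256 digest listed above.
with open("MLorc_optim-0.1.2.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == "b3d958469faa03b79aa60ac7a34163bb4802099f0251d278c4d98998b2165831"
```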
File details
Details for the file MLorc_optim-0.1.2-py3-none-any.whl.
File metadata
- Download URL: MLorc_optim-0.1.2-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 278f86be4b8815964eeef823b499d0282cb50468aa7aa134211405d6fc62fad4 |
| MD5 | 6d93b82ebf8ed1a777a0ba3fda9a4a2f |
| BLAKE2b-256 | e39a4f29a9d033c8a19d88bf6dbf3bc72b9b359fd710961948c65d59c6bf5359 |