ZClip: Adaptive Spike Mitigation for LLM Pre-Training
Official PyTorch Lightning implementation of our paper:
ZClip: Adaptive Spike Mitigation for LLM Pre-Training
Abhay Kumar, Louis Owen, Nilabhra Roy Chowdhury, Fabian Güra
BluOrion
🚀 Installation
You can install this package using pip:
Basic Installation
pip install git+https://github.com/bluorion-com/ZClip.git
With PyTorch Lightning Support
pip install "git+https://github.com/bluorion-com/ZClip.git#egg=zclip[lightning]"
🧠 Algorithm Overview
ZClip is an adaptive gradient clipping technique designed to mitigate gradient spikes by tracking running statistics of gradient norms through Exponential Moving Averages (EMA). At each training step, it updates the mean and variance of the gradient norm without storing historical data, allowing it to respond quickly to shifts in training dynamics.
When the current gradient norm deviates significantly from recent trends, ZClip dynamically computes a clipping threshold from the observed variance. This automatically suppresses unusually large gradient updates, which are often the cause of loss spikes, without relying on fixed, manually tuned thresholds.
By continuously adapting to the evolving scale and variability of gradients, ZClip improves training stability and preserves learning efficiency, even under high learning rates or aggressive scheduling.
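To make the mechanism concrete, here is a minimal sketch of the idea in plain PyTorch. It mirrors the fixed `"percentile"`-style threshold (EMA mean plus z_thresh × std) described in the parameter table further down; the class name `EMANormClipper`, the single-step warmup, and the EMA update order are illustrative assumptions, not the library's actual implementation, and the `"adaptive_scaling"` variant follows the paper instead.

```python
import torch

class EMANormClipper:
    """Illustrative EMA-based norm clipper (assumed names, simplified logic)."""

    def __init__(self, alpha=0.97, z_thresh=2.5):
        self.alpha = alpha
        self.z_thresh = z_thresh
        self.mean = None   # EMA of the gradient norm
        self.var = 0.0     # EMA of the squared deviation from the mean

    def step(self, model):
        # Compute the total gradient norm without clipping (max_norm=inf).
        norm = torch.nn.utils.clip_grad_norm_(model.parameters(), float("inf")).item()
        if self.mean is None:
            # Seed the statistics on the first step (the library instead
            # collects norms for `warmup_steps` steps).
            self.mean = norm
            return
        std = self.var ** 0.5
        threshold = self.mean + self.z_thresh * std  # "percentile"-style threshold
        if norm > threshold:
            # Spike detected: rescale all gradients down to the threshold.
            for p in model.parameters():
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
            norm = threshold  # update the EMA with the clipped value
        # EMA updates; no gradient-norm history is stored.
        self.mean = self.alpha * self.mean + (1 - self.alpha) * norm
        self.var = self.alpha * self.var + (1 - self.alpha) * (norm - self.mean) ** 2
```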
📚 Usage
Basic Usage
```python
import torch
from zclip import ZClip

model = YourModel()  # Your PyTorch model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Initialize ZClip
zclip = ZClip(alpha=0.97, z_thresh=2.5)

# Training loop
for batch in dataloader:
    # Forward and backward pass
    loss = model(batch)
    loss.backward()

    # Apply ZClip before the optimizer step
    zclip.step(model)

    # Update weights
    optimizer.step()
    optimizer.zero_grad()
```
PyTorch Lightning (with optional dependency)
```python
from lightning import Trainer
from zclip import ZClipLightningCallback

# Create a Lightning Trainer with ZClip
trainer = Trainer(
    callbacks=[
        ZClipLightningCallback(alpha=0.97, z_thresh=2.5)
    ]
)

# Train your model
trainer.fit(model, dataloader)
```
📉 Example Impact
(Figures: training loss and gradient norm after clipping; see the repository for the plots.)
⚙️ Implementation Details
Our code is built within the PyTorch Lightning framework, utilizing its callback system for seamless integration into the training pipeline. It is fully compatible with FSDP and requires no code changes to work out of the box.
You can also use ZClip directly with standard PyTorch by calling .step(model) after loss.backward() and before optimizer.step().
🔬 Testing & Development
ZClip comes with a comprehensive test suite to ensure reliability and correctness.
Running Tests
./run_tests.sh
Continuous Integration
We use CircleCI for continuous integration; the test suite runs on every commit and pull request.
🧪 Usage (All Parameters)
PyTorch
```python
from zclip import ZClip

zclip = ZClip(
    mode="zscore",
    alpha=0.97,
    z_thresh=2.5,
    clip_option="adaptive_scaling",
    max_grad_norm=1.0,
    clip_factor=1.0,
)

for batch in dataloader:
    optimizer.zero_grad()
    loss = model(batch)
    loss.backward()
    zclip.step(model)
    optimizer.step()
```
PyTorch Lightning
```python
import lightning as L
from zclip import ZClipLightningCallback

zclip_cb = ZClipLightningCallback(
    mode="zscore",
    alpha=0.97,
    z_thresh=2.5,
    clip_option="adaptive_scaling",
    max_grad_norm=1.0,
    clip_factor=1.0,
)

trainer = L.Trainer(callbacks=[zclip_cb])
```
🔍 ZClip Parameters
| Argument | Description | Default |
|---|---|---|
| `mode` | Clipping mode. Options: `"zscore"` (z-score based clipping) or `"percentile"` (fixed threshold clipping, defined as EMA mean plus z_thresh × std). | `"zscore"` |
| `z_thresh` | Threshold value. In `"zscore"` mode it sets the z-score threshold; in `"percentile"` mode it is the multiplier for the std. | 2.5 |
| `alpha` | EMA smoothing factor for updating the gradient norm statistics. | 0.97 |
| `clip_option` | (`"zscore"` mode only) Clipping strategy: `"adaptive_scaling"` (compute an adaptive threshold when the z-score is high) or `"mean"` (clip to the EMA mean). | `"adaptive_scaling"` |
| `clip_factor` | Constant multiplier for the adaptive scaling threshold. Values between 0.5 and 0.9 yield more aggressive clipping; higher values (default 1.0) are less aggressive. | 1.0 |
| `max_grad_norm` | Optional maximum gradient norm that caps the clipping threshold. | 1.0 |
| `warmup_steps` | Number of steps used to collect gradient norms before the EMA statistics are initialized. | 25 |
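As a quick illustration of the table above, here is how the two modes might be selected. The argument combinations follow the table directly, with the defaults spelled out for clarity:

```python
from zclip import ZClip

# Fixed-threshold variant: clips when the gradient norm exceeds
# EMA mean + z_thresh * std.
zclip_percentile = ZClip(mode="percentile", alpha=0.97, z_thresh=2.5)

# Z-score variant that clips detected outliers back to the EMA mean
# rather than to an adaptively scaled threshold.
zclip_mean = ZClip(mode="zscore", clip_option="mean", alpha=0.97, z_thresh=2.5)
```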
Aggressive Hyperparameter Settings
When training models with volatile gradients or noisy data, or when using curriculum learning strategies, more aggressive gradient clipping can be beneficial. In such scenarios, consider adjusting the following parameters (see the example after this list):

- **`alpha`**: The `alpha` parameter controls the smoothing of the EMA for gradient norm statistics. A lower value (e.g., around 0.90-0.95) makes the EMA more responsive to recent gradients, which can help when gradient distributions change rapidly. However, setting it too low can introduce noise into the EMA estimate, so it must be balanced carefully.
- **`z_thresh`**: Consider reducing `z_thresh` slightly (for example, from the default 2.5 to around 2.0) to tighten the criteria for clipping.
- **`clip_factor`**: Lowering `clip_factor` to a value between 0.5 and 0.9 reduces the adaptive threshold in `"adaptive_scaling"` mode, resulting in more aggressive clipping. This can help stabilize training by curbing large gradient spikes.

These settings are particularly useful when the gradient distribution is highly dynamic. Adjust and monitor these hyperparameters based on your specific model, dataset, and training dynamics to achieve optimal performance.
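Putting these suggestions together, an aggressive configuration might look like the sketch below; the specific values are illustrative picks from the ranges above, not tuned recommendations:

```python
from zclip import ZClip

zclip_aggressive = ZClip(
    mode="zscore",
    clip_option="adaptive_scaling",
    alpha=0.92,       # more responsive EMA (suggested range: 0.90-0.95)
    z_thresh=2.0,     # tighter spike criterion than the default 2.5
    clip_factor=0.6,  # stronger adaptive clipping (suggested range: 0.5-0.9)
)
```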
Citation
```bibtex
@misc{kumar2025zclipadaptivespikemitigation,
      title={ZClip: Adaptive Spike Mitigation for LLM Pre-Training},
      author={Abhay Kumar and Louis Owen and Nilabhra Roy Chowdhury and Fabian Güra},
      year={2025},
      eprint={2504.02507},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.02507},
}
```
📜 License
Apache-2.0 license
File details
Details for the file zclip-1.0.0.tar.gz.

File metadata
- Download URL: zclip-1.0.0.tar.gz
- Size: 358.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8fd6098e4b7ba81884c2493b187cc87d527e2be5478e2c8d4d77ab85318e622f |
| MD5 | 68f162a13723389061f9ad0951238cbd |
| BLAKE2b-256 | 051ceaafbd39b958ee37c07e17263b45c3df53fe834e48d3f17f4557ab19a495 |
Provenance
The following attestation bundle was made for zclip-1.0.0.tar.gz:

Publisher: ci-cd.yml on bluorion-com/ZClip

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zclip-1.0.0.tar.gz
- Subject digest: 8fd6098e4b7ba81884c2493b187cc87d527e2be5478e2c8d4d77ab85318e622f
- Sigstore transparency entry: 251558536
- Permalink: bluorion-com/ZClip@d344952266ee3336db8ba8a25103b438261c4484
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/bluorion-com
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci-cd.yml@d344952266ee3336db8ba8a25103b438261c4484
- Trigger Event: push
File details
Details for the file zclip-1.0.0-py3-none-any.whl.

File metadata
- Download URL: zclip-1.0.0-py3-none-any.whl
- Size: 16.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b4156306a92d9f1fe94f8d4785cabe02889a883cf725c596ef03bf7e8140e8e3 |
| MD5 | a9f77388c31aec476a27bd6c58884db7 |
| BLAKE2b-256 | 7c9b5a168261fce9f35b1b37d2cc9053b3fdb41407905acc82e94ae1b554964d |

Provenance
The following attestation bundle was made for zclip-1.0.0-py3-none-any.whl:

Publisher: ci-cd.yml on bluorion-com/ZClip

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: zclip-1.0.0-py3-none-any.whl
- Subject digest: b4156306a92d9f1fe94f8d4785cabe02889a883cf725c596ef03bf7e8140e8e3
- Sigstore transparency entry: 251558538
- Permalink: bluorion-com/ZClip@d344952266ee3336db8ba8a25103b438261c4484
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/bluorion-com
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci-cd.yml@d344952266ee3336db8ba8a25103b438261c4484
- Trigger Event: push