pytorch-scheduler
A comprehensive, research-driven collection of learning rate schedulers for PyTorch — with 17 ready-to-use schedulers, composable warmup wrappers, opinionated presets, and first-class paper references.
Installation
pip install pytorch-scheduler
# With visualization support
pip install "pytorch-scheduler[viz]"
Which Scheduler Should I Use?
| Training Scenario | Recommended Preset | Scheduler | Key Idea |
|---|---|---|---|
| LLM pre-training | llm_pretrain | WSD | Warmup → stable → cosine decay |
| LLM fine-tuning | llm_finetune | CosineWithWarmup | Short warmup + cosine decay |
| Vision fine-tuning | vision_finetune | CosineWithWarmup | Moderate warmup + cosine decay |
| Vision pre-training | vision_pretrain | WarmupHoldCosine | Warmup → hold → cosine decay |
| Transfer learning (small data) | transfer_small_data | SlantedTriangular | Short warmup + long linear decay |
| Fixed compute budget | budgeted_training | Rex | Budget-optimal allocation |
| HPO → full training | — | HyperbolicLR / ExpHyperbolicLR | Tune at small epochs, train at large epochs |
Using Presets (Recommended)
import torch
from pytorch_scheduler import create_from_preset
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# One line — the library picks the right scheduler and defaults
scheduler = create_from_preset(optimizer, 'llm_pretrain', total_steps=100000)
for step in range(100000):
    loss = train_step(model, optimizer)
    scheduler.step()
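The loop above assumes a train_step helper; a minimal sketch of what it might look like (hypothetical, not part of the library) is:
def train_step(model, optimizer):
    # Hypothetical helper: one forward/backward/update pass on dummy data.
    x = torch.randn(32, 10)
    y = torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()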
Using Schedulers Directly
from pytorch_scheduler import CosineWithWarmupScheduler, RexScheduler, WarmupScheduler
# Most common: cosine with warmup
scheduler = CosineWithWarmupScheduler(
    optimizer, total_steps=10000, warmup_steps=500, min_lr=1e-5
)
# Budget-optimal
scheduler = RexScheduler(optimizer, total_steps=10000)
# Compose any scheduler with warmup
base = RexScheduler(optimizer, total_steps=10000)
scheduler = WarmupScheduler(optimizer, base, warmup_steps=500, warmup_type="cosine")
Auto-Config from Training Plan
from pytorch_scheduler import create_scheduler_from_plan
# Automatically computes total_steps = epochs × steps_per_epoch ÷ grad_accum_steps
scheduler = create_scheduler_from_plan(
    optimizer, "wsd",
    epochs=10, steps_per_epoch=500, grad_accum_steps=4,
    warmup_steps=100, stable_steps=500,
)
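With the plan above, that formula gives total_steps = 10 × 500 ÷ 4 = 1250 optimizer steps, so warmup_steps=100 and stable_steps=500 leave 650 steps for the decay phase.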
All Schedulers (17)
| Scheduler | Description | Paper | Year |
|---|---|---|---|
| CosineWithWarmupScheduler | Linear warmup + cosine decay (no restart) — the modern default | — | — |
| WarmupHoldCosineScheduler | Warmup → hold at peak LR → cosine decay — three-phase schedule | — | — |
| InverseSqrtScheduler | Inverse square-root with built-in warmup (Transformer default) | Attention is All You Need | 2017 |
| CosineAnnealingWarmupRestarts | SGDR with per-cycle warmup and max-LR decay | SGDR: Stochastic Gradient Descent with Warm Restarts | 2017 |
| TanhDecayScheduler | Hyperbolic-tangent decay with steepness control | Online Learning Rate Adaptation with Hypergradient Descent | 2018 |
| SlantedTriangularScheduler | Short warmup + longer linear decay (ULMFiT) | Universal Language Model Fine-tuning for Text Classification | 2018 |
| KDecayScheduler | k-decay modifier on cosine schedule | k-decay: A New Method for Learning Rate Schedule | 2020 |
| PowerDecayScheduler | Power-law decay step^{-alpha} after warmup | Scaling Laws for Neural Language Models | 2020 |
| RexScheduler | Reciprocal decay (1-t)/(1-t/2) for budgeted training | Revisiting Budgeted Training with an Improved Schedule | 2022 |
| LinearDecayScheduler | Linear warmup then linear decay to min_lr | Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations | 2024 |
| WSDScheduler | Warmup–Stable–Decay with cosine/linear/sqrt decay | MiniCPM: Unveiling the Potential of Small Language Models | 2024 |
| HyperbolicLRScheduler | Hyperbolic decay curve, epoch-insensitive | HyperbolicLR: Epoch Insensitive Learning Rate Scheduler | 2024 |
| ExpHyperbolicLRScheduler | Exponential variant of hyperbolic decay | HyperbolicLR: Epoch Insensitive Learning Rate Scheduler | 2024 |
| TrapezoidalScheduler | Three-phase: warmup / constant / linear decay | — | — |
| FlatCosineScheduler | Flat phase at base_lr then cosine annealing | — | — |
| PolynomialScheduler | Polynomial decay with optional cycling | — | — |
| ChebyshevScheduler | Non-monotonic schedule using Chebyshev nodes | — | — |
Scheduler Cards
CosineWithWarmup — The Modern Default
When to use: Fine-tuning LLMs/ViTs, general-purpose training
When NOT to use: Pre-training from scratch (consider WSD instead)
Key parameters: warmup_steps, min_lr
Shape: Linear ramp → smooth cosine decay
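As a rough reference for the shape (a sketch of the standard formula, not necessarily the library's exact implementation):
import math

def cosine_with_warmup_lr(step, base_lr, total_steps, warmup_steps, min_lr=0.0):
    # Reference sketch only: linear ramp, then cosine decay to min_lr.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))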
WSD (Warmup-Stable-Decay) — LLM Pre-training
When to use: Large-scale pre-training with known compute budget
When NOT to use: Short fine-tuning runs
Key parameters: warmup_steps, stable_steps, decay_type
Shape: Linear ramp → flat → cosine/linear decay
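A rough piecewise sketch of this shape (cosine decay variant shown; not the library's exact code):
import math

def wsd_lr(step, base_lr, total_steps, warmup_steps, stable_steps, min_lr=0.0):
    # Reference sketch: warmup -> stable plateau -> cosine decay.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    if step < warmup_steps + stable_steps:
        return base_lr
    decay_steps = max(1, total_steps - warmup_steps - stable_steps)
    progress = (step - warmup_steps - stable_steps) / decay_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))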
WarmupHoldCosine — Vision Pre-training
When to use: Pre-training ViTs on large image datasets; maximizes time at peak LR
When NOT to use: Quick fine-tuning experiments
Key parameters: warmup_steps, hold_steps, min_lr
Shape: Linear ramp → flat hold at peak → cosine decay
Rex — Budget-Optimal
When to use: Fixed compute budget, want optimal LR allocation
When NOT to use: When you need warmup (wrap with WarmupScheduler)
Key parameters: None beyond total_steps
Shape: Smooth reciprocal decay
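A sketch of the reciprocal decay described above, with t = step / total_steps (illustrative only, not the library's exact code):
def rex_lr(step, base_lr, total_steps):
    # Reference sketch: reciprocal decay (1 - t) / (1 - t/2) for t in [0, 1].
    t = step / max(1, total_steps)
    return base_lr * (1 - t) / (1 - t / 2)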
HyperbolicLR / ExpHyperbolicLR — HPO-Friendly
When to use: Hyperparameter optimization (HPO) workflows where you tune at small epoch counts and then train the best config at large epoch counts. The epoch-insensitive decay curve means the LR schedule shape is preserved regardless of total epochs, so rankings from short HPO runs transfer reliably to full training.
When NOT to use: When your training length is fixed and you want precise control over decay timing
Key parameters: upper_bound, max_iter (HyperbolicLR, ExpHyperbolicLR)
Shape: Hyperbolic (linear variant) or exponential-hyperbolic decay — consistent shape across different epoch counts
Warmup Composition
Any scheduler can be wrapped with a warmup phase using WarmupScheduler:
from pytorch_scheduler import WarmupScheduler, KDecayScheduler
base = KDecayScheduler(optimizer, total_steps=10000, k=2.0)
scheduler = WarmupScheduler(
    optimizer,
    base_scheduler=base,
    warmup_steps=500,
    warmup_type="linear",  # "linear" | "cosine" | "exponential"
)
Note:
InverseSqrtScheduler has warmup built into its formula. Do not wrap it with WarmupScheduler — doing so would apply warmup twice.
Presets
from pytorch_scheduler import list_presets, get_preset_info, create_from_preset
# See all presets
print(list_presets())
# ['llm_pretrain', 'llm_finetune', 'vision_finetune', 'vision_pretrain', 'transfer_small_data', 'budgeted_training']
# Get details
info = get_preset_info('llm_pretrain')
print(info['description']) # Warmup-Stable-Decay for large language model pretraining
# Create from preset with overrides
scheduler = create_from_preset(optimizer, 'llm_pretrain', total_steps=50000, decay_type='linear')
Experimental Features
SequentialComposer
Chain multiple schedulers at specified step milestones:
from pytorch_scheduler.experimental import SequentialComposer
from pytorch_scheduler import LinearDecayScheduler, RexScheduler
s1 = RexScheduler(optimizer, total_steps=500)
s2 = LinearDecayScheduler(optimizer, total_steps=500)
scheduler = SequentialComposer(
    optimizer,
    schedulers=[s1, s2],
    milestones=[500],  # switch to s2 at step 500
)
ScheduleFreeWrapper
Schedule-free optimization via online-to-batch conversion:
from pytorch_scheduler.experimental import ScheduleFreeWrapper
wrapper = ScheduleFreeWrapper(optimizer, warmup_steps=1000, beta=0.9)
for step in range(total_steps):
    wrapper.train()
    loss = model(x).sum()
    loss.backward()
    wrapper.step()
    optimizer.zero_grad()

# For evaluation
wrapper.eval()
val_loss = evaluate(model)
Reference: The Road Less Scheduled (Defazio et al., 2024)
Visualization
Plots use the SciencePlots (science + nature) style automatically when it is installed, falling back to the default matplotlib style otherwise.
import torch
from pytorch_scheduler import RexScheduler, CosineAnnealingWarmupRestarts, WSDScheduler
from pytorch_scheduler.visualization import compare_schedules
optimizer = torch.optim.AdamW([torch.randn(2, requires_grad=True)], lr=0.1)
total_steps = 10000
fig = compare_schedules(
    {
        "Rex": RexScheduler(optimizer, total_steps=total_steps),
        "CosineAnnealing": CosineAnnealingWarmupRestarts(
            optimizer, first_cycle_steps=2000, warmup_steps=200,
            max_lr=0.1, min_lr=0.001, gamma=0.9,
        ),
        "WSD": WSDScheduler(
            optimizer, total_steps=total_steps,
            warmup_steps=500, stable_steps=5000,
        ),
    },
    total_steps=total_steps,
)
fig.savefig("comparison.png", dpi=300)
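Without the viz extra, you can still trace a schedule yourself using the standard LRScheduler API; a minimal sketch assuming matplotlib is available:
import matplotlib.pyplot as plt

sched = RexScheduler(optimizer, total_steps=total_steps)
lrs = []
for _ in range(total_steps):
    lrs.append(sched.get_last_lr()[0])  # record the LR used at this step
    sched.step()

plt.plot(lrs)
plt.xlabel("step")
plt.ylabel("learning rate")
plt.savefig("rex.png", dpi=300)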
Factory API
from pytorch_scheduler import create_scheduler, create_scheduler_from_plan, load_scheduler, get_supported_schedulers
# List available schedulers
print(get_supported_schedulers()) # all 17
print(get_supported_schedulers("*cosine*")) # pattern matching
# Load by name
cls = load_scheduler("rex")
scheduler = cls(optimizer, total_steps=10000)
# Create directly
scheduler = create_scheduler(optimizer, "wsd", total_steps=10000, warmup_steps=500, stable_steps=3000)
# Auto-compute total_steps from training plan
scheduler = create_scheduler_from_plan(
    optimizer, "wsd",
    epochs=10, steps_per_epoch=500, grad_accum_steps=4,
    warmup_steps=100, stable_steps=500,
)
API Reference
All schedulers follow PyTorch's LRScheduler protocol:
scheduler.step() # advance one step
scheduler.get_last_lr() # current LR(s)
scheduler.state_dict() # serialize state
scheduler.load_state_dict() # restore state
# Pure functional interface — compute LR at any step without side effects
lr = scheduler._lr_at(step, base_lrs)
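Because the schedulers follow the standard protocol, checkpointing works the same way as with built-in PyTorch schedulers; a minimal sketch:
# Save scheduler state alongside the model and optimizer
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
}
torch.save(checkpoint, "checkpoint.pt")

# Restore later and resume from the same point in the schedule
checkpoint = torch.load("checkpoint.pt")
scheduler.load_state_dict(checkpoint["scheduler"])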
Every scheduler exposes paper metadata:
scheduler.paper_title # str
scheduler.paper_url # str
scheduler.paper_year # int
Step-semantics metadata:
scheduler.step_unit # 'step' | 'epoch'
scheduler.needs_total_steps # bool
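A generic training loop could use step_unit to decide when to advance the scheduler; for example, with a hypothetical helper (not part of the library):
def advance(scheduler, *, end_of_batch=False, end_of_epoch=False):
    # Hypothetical helper: step per optimizer update or per epoch, based on metadata.
    if scheduler.step_unit == "step" and end_of_batch:
        scheduler.step()
    elif scheduler.step_unit == "epoch" and end_of_epoch:
        scheduler.step()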
Acknowledgments
This project is inspired by pytorch-optimizer — a fantastic collection of optimization algorithms for PyTorch with paper references and clean API design. If you're looking for optimizers to pair with these schedulers, check it out!
License
Apache-2.0