Shampoo: Second-Order Optimizer for Deep Learning (Optax)
Project description
Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second-order statistics of the data, are far less prevalent despite their strong theoretical properties, because of their prohibitive computation, memory, and communication costs.
Here we present a scalable implementation of a second-order preconditioning method (concretely, a variant of full-matrix Adagrad) that provides significant convergence and wall-clock time improvements compared to conventional first-order methods on state-of-the-art deep models.
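The preconditioning scheme described above can be sketched in pure Python for a single 2×2 weight matrix: Shampoo keeps two second-order statistics per layer, L += G·Gᵀ and R += Gᵀ·G, and preconditions the gradient as L^(-1/4) · G · R^(-1/4). This is an illustrative sketch of the update rule, not the package's actual API; the epsilon and learning-rate values are arbitrary, and a real implementation works on arbitrary tensor shapes.

```python
import math

def inv_fourth_root_2x2(m, eps=1e-6):
    """Inverse fourth root of a symmetric PSD 2x2 matrix via eigendecomposition."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mean = (a + c) / 2.0
    d = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    l1, l2 = mean + d, mean - d  # eigenvalues
    if abs(b) < 1e-12:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        n = math.hypot(l1 - c, b)
        v1 = ((l1 - c) / n, b / n)  # unit eigenvector for l1
    v2 = (-v1[1], v1[0])            # orthogonal eigenvector for l2
    s1 = (l1 + eps) ** -0.25        # eps guards against zero eigenvalues
    s2 = (l2 + eps) ** -0.25
    out = [[0.0, 0.0], [0.0, 0.0]]
    for s, v in ((s1, v1), (s2, v2)):  # reconstruct V diag(s) V^T
        for i in range(2):
            for j in range(2):
                out[i][j] += s * v[i] * v[j]
    return out

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(x):
    return [[x[j][i] for j in range(2)] for i in range(2)]

def shampoo_step(w, g, L, R, lr=0.1):
    """One Shampoo update on a 2x2 weight matrix w with gradient g.

    L and R are the running left/right statistics, updated in place.
    """
    GGt, GtG = matmul(g, transpose(g)), matmul(transpose(g), g)
    for i in range(2):
        for j in range(2):
            L[i][j] += GGt[i][j]
            R[i][j] += GtG[i][j]
    # Precondition: L^{-1/4} G R^{-1/4}
    P = matmul(matmul(inv_fourth_root_2x2(L), g), inv_fourth_root_2x2(R))
    return [[w[i][j] - lr * P[i][j] for j in range(2)] for i in range(2)]
```

With freshly accumulated statistics the preconditioned step stays close to the raw gradient direction; the fourth-root preconditioners shape the step as gradients with varying correlation structure accumulate over training.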
Paper preprint: https://arxiv.org/abs/2002.09018
Download files
Hashes for optax_shampoo-0.0.5-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `6403875978f2a0183ce5c622addbd26a47a4771d8476879792bcaaf2dc538c6c` |
| MD5 | `a4078174b288aa771d755692514adb61` |
| BLAKE2b-256 | `d0029f9676c643af72c522e844d7f05ca6c068123f734ccd5227503fd962c5ad` |