Shampoo: Second-Order Optimizer for Deep Learning (Optax)
Project description
Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second-order statistics of the data, are far less prevalent despite their strong theoretical properties, due to prohibitive computation, memory, and communication costs.
Here we present a scalable implementation of a second-order preconditioning method (concretely, a variant of full-matrix Adagrad) that provides significant convergence and wall-clock time improvements compared to conventional first-order methods on state-of-the-art deep models.
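To make the preconditioning idea concrete, here is a minimal NumPy sketch of one Shampoo update for a single matrix-shaped parameter, following the scheme in the linked paper: accumulate left and right gradient statistics L and R, then precondition the gradient with their inverse fourth roots. This is an illustrative toy, not the package's actual API; the function names (`matrix_inv_pth_root`, `shampoo_step`) and hyperparameters are assumptions for the example.

```python
import numpy as np

def matrix_inv_pth_root(mat, p, eps=1e-6):
    # Inverse p-th root of a symmetric PSD matrix via eigendecomposition;
    # eigenvalues are clamped at eps for numerical stability.
    w, v = np.linalg.eigh(mat)
    w = np.maximum(w, eps)
    return (v * w ** (-1.0 / p)) @ v.T

def shampoo_step(W, G, L, R, lr=0.1, eps=1e-6):
    # Accumulate second-order statistics from the gradient G of W.
    L = L + G @ G.T          # left statistics,  shape (m, m)
    R = R + G.T @ G          # right statistics, shape (n, n)
    # Precondition the gradient: L^{-1/4} G R^{-1/4}.
    pG = matrix_inv_pth_root(L, 4, eps) @ G @ matrix_inv_pth_root(R, 4, eps)
    return W - lr * pG, L, R
```

In the 1x1 case the preconditioned gradient collapses to the sign of the gradient, which makes the scale-invariance of the update easy to see. The real implementation adds blocking, grafting, and distributed computation of the inverse roots to make this tractable at scale.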
Paper preprint: https://arxiv.org/abs/2002.09018
Hashes for optax_shampoo-0.0.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19bfee2c9a14eda670dcba63f210b07269269a6cbffbd1013a41be1fc9f239b0 |
| MD5 | cd97872f9bab4acaa8da3d7564af32b3 |
| BLAKE2b-256 | 0751044fa0ea3c5d197c0c4790c4d7abb82dbf01c8fa2246bbdf2a7fec4a02b4 |