Distributed Shampoo (Second-Order Optimizer for Deep Learning) Optax Optimizer
Project description
Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second-order statistics of the data, are far less prevalent despite strong theoretical properties, due to their prohibitive computation, memory, and communication costs.
Here we present a scalable implementation of a second-order preconditioning method (concretely, a variant of full-matrix Adagrad) that provides significant convergence and wall-clock time improvements compared to conventional first-order methods on state-of-the-art deep models.
Paper preprint: https://arxiv.org/abs/2002.09018
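Since the package ships the optimizer as an Optax gradient transformation, it drops into the usual Optax `init`/`update` loop. The sketch below illustrates this on a toy regression problem; the `distributed_shampoo` factory and its `learning_rate`/`block_size` arguments follow the upstream implementation, but the import path and the remaining details shown here are illustrative assumptions, not a definitive API reference.

```python
import jax
import jax.numpy as jnp
import optax
from optax_shampoo import distributed_shampoo  # import path is an assumption; check the package docs

# Toy least-squares loss; Shampoo preconditions each parameter tensor
# with Kronecker-factored second-moment statistics (full-matrix-Adagrad style).
def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

params = {"w": jnp.zeros((3, 1)), "b": jnp.zeros((1,))}

# block_size caps each preconditioner block's dimension so the matrix
# root computations stay tractable for large layers.
optimizer = distributed_shampoo(learning_rate=1e-3, block_size=128)
opt_state = optimizer.init(params)

x = jnp.ones((8, 3))
y = jnp.zeros((8, 1))

grads = jax.grad(loss_fn)(params, x, y)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```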
Download files
Hashes for optax_shampoo-0.0.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 935bf236149365e6afbaef5de2384008c78cb5a7d2ac74b0c75181fadf14c6aa
MD5 | 179f8497a911f496081b02a7683d8fb2
BLAKE2b-256 | 803b6609f0b3a98527a7fe2c2b4b48944b455ab65a3c9223e86d3cd5cf823a5e
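To verify a downloaded wheel against the digests above, hash the file locally and compare. A minimal sketch using only the standard library (the filename and expected digest are taken from the table above):

```python
import hashlib

# Published SHA256 digest for optax_shampoo-0.0.6-py3-none-any.whl (from the table above).
EXPECTED_SHA256 = "935bf236149365e6afbaef5de2384008c78cb5a7d2ac74b0c75181fadf14c6aa"

with open("optax_shampoo-0.0.6-py3-none-any.whl", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

# A mismatch means the download is corrupted or is not the published artifact.
assert digest == EXPECTED_SHA256, f"hash mismatch: {digest}"
print("SHA256 verified")
```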