Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
- Greg Yang,
- Edward J. Hu,
- Igor Babuschkin,
- Szymon Sidor,
- David Farhi,
- Jakub Pachocki,
- Xiaodong Liu,
- Weizhu Chen,
- Jianfeng Gao
Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization (μP), many optimal HPs remain stable even as model size changes. This leads to a new HP tuning paradigm we call μTransfer: parametrize the target model in μP, tune the HPs indirectly on a smaller model, and zero-shot transfer them to the full-sized model, i.e., without directly tuning the latter at all. We verify μTransfer on Transformer and ResNet. For example, 1) by transferring pretraining HPs from a model of 13M parameters, we outperform published numbers of BERT-large (350M parameters), with a total tuning cost equivalent to pretraining BERT-large once; 2) by transferring from 40M parameters, we outperform published numbers of the 6.7B GPT-3 model, with tuning cost only 7% of total pretraining cost. A PyTorch implementation of our technique can be found at github.com/microsoft/mup and is installable via `pip install mup`.
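As a rough illustration of the μTransfer recipe described above (parametrize in μP, tune HPs on a small proxy, reuse them on the large target), the sketch below follows the usage pattern of the released `mup` package. The two-layer MLP is a made-up example, not a model from the paper, and exact API details should be checked against github.com/microsoft/mup.

```python
# Minimal sketch of μTransfer with the `mup` package (github.com/microsoft/mup).
# The MLP here is a hypothetical example model, not one from the paper.
import torch.nn as nn
from mup import MuReadout, MuAdam, set_base_shapes


class MLP(nn.Module):
    def __init__(self, width=128, d_in=784, d_out=10):
        super().__init__()
        self.fc = nn.Linear(d_in, width)
        # MuReadout replaces the final nn.Linear so the output layer
        # scales correctly under μP as width grows.
        self.readout = MuReadout(width, d_out)

    def forward(self, x):
        return self.readout(self.fc(x).relu())


# Base and delta models tell mup which dimensions scale with width.
base, delta = MLP(width=64), MLP(width=128)

# Small proxy model: tune HPs (e.g. learning rate) cheaply here.
proxy = set_base_shapes(MLP(width=256), base, delta=delta)

# Large target model: reuse the proxy's HPs zero-shot.
target = set_base_shapes(MLP(width=4096), base, delta=delta)
opt = MuAdam(target.parameters(), lr=1e-2)  # lr carried over from the proxy
```

Because both models are set up with the same base shapes, μP keeps the optimal learning rate (and other HPs) approximately stable across the width sweep, which is what makes the zero-shot transfer work.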
Publication Downloads
Maximal Update Parametrization (μP)
March 8, 2022
Maximal Update Parametrization (μP) and Hyperparameter Transfer (μTransfer), in association with the paper: Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer.