Xavier Initialization
We first observe the influence of the non-linear activation functions. Xavier Glorot and Yoshua Bengio, "Understanding the difficulty of training deep feedforward neural networks" http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the…
All these experimental results were obtained with new initialization or training mechanisms.
Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs les…
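The initialization scheme the paper proposes keeps the variance of activations and gradients roughly constant across layers by scaling weights with both fan-in and fan-out: Var(W) = 2 / (fan_in + fan_out), which for a uniform distribution gives the bound a = sqrt(6 / (fan_in + fan_out)). Below is a minimal sketch of that Glorot/Xavier uniform initializer, assuming NumPy; the function name and shapes are illustrative, not from the source.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Glorot/Xavier uniform initialization: W ~ U(-a, a) with
    # a = sqrt(6 / (fan_in + fan_out)), so that
    # Var(W) = a^2 / 3 = 2 / (fan_in + fan_out).
    rng = np.random.default_rng() if rng is None else rng
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))

# Illustrative usage: a 256 -> 128 fully connected layer.
W = xavier_uniform(256, 128, rng=np.random.default_rng(0))
print(W.shape)  # (256, 128)
```

The symmetric fan-in/fan-out term is the key design choice: using fan-in alone preserves activation variance on the forward pass, fan-out alone preserves gradient variance on the backward pass, and their average is the paper's compromise between the two.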