<English sentences that may be hard to interpret correctly using F.o.R. alone>
Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the…
During training, dropout samples from an exponential number of different “thinned” networks. Nitish Srivastava, et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" http://jmlr.org/papers/volume15/srivastava14a/sriva…
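The "thinned networks" in this sentence refers to randomly zeroing units during training. As an illustration (not code from the paper), a minimal NumPy sketch of inverted dropout might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Inverted dropout: drop each unit with probability p and rescale
    the survivors by 1/(1-p) so the expected activation is unchanged."""
    if not train:
        return x  # at test time the full (un-thinned) network is used
    mask = rng.random(x.shape) >= p  # each mask is one sampled "thinned" network
    return x * mask / (1.0 - p)

print(dropout(np.ones(8), p=0.5))
```

Each forward pass samples a fresh binary mask, so a layer of n units implicitly chooses among 2^n thinned sub-networks, which is the "exponential number" the sentence describes.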
Our contributions is two-fold. Diederik P Kingma, et al., "Auto-Encoding Variational Bayes" https://arxiv.org/abs/1312.6114 Unlike an ordinary autoencoder (Autoencoder), the variational autoenc… assumes that the observed data were generated according to some probability distribution
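One of the paper's two contributions is the reparameterization trick, which makes the sampling step differentiable. A minimal NumPy sketch (an illustration, not the paper's code) of sampling a latent z this way:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Reparameterization trick: draw z ~ N(mu, sigma^2) as
    z = mu + sigma * eps with eps ~ N(0, I), so that gradients
    can flow through mu and log_var during training."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

z = reparameterize(np.zeros(4), np.zeros(4))  # a draw from N(0, I)
print(z)
```

Writing the sample as a deterministic function of (mu, log_var) plus external noise is what lets the encoder be trained by ordinary backpropagation.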
Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. Jacob Devlin, et al., "BERT: Pre-train…
We introduce two simple global hyperparameters that efficiently trade off between latency and accuracy. Andrew G. Howard, et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" https://arxiv.org/abs/17…
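The "two simple global hyperparameters" are the width multiplier (alpha, which thins the channels) and the resolution multiplier (rho, which shrinks the feature map). A small sketch of the paper's cost model for one depthwise-separable conv layer (an illustration with example layer sizes, not code from the paper):

```python
def sep_conv_mults(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of one depthwise-separable conv layer (kernel dk x dk,
    m input / n output channels, df x df feature map) after applying the
    width multiplier alpha and resolution multiplier rho."""
    m, n = int(alpha * m), int(alpha * n)
    df = int(rho * df)
    depthwise = dk * dk * m * df * df  # per-channel spatial filtering
    pointwise = m * n * df * df       # 1x1 conv combining channels
    return depthwise + pointwise

full = sep_conv_mults(3, 512, 512, 14)
thin = sep_conv_mults(3, 512, 512, 14, alpha=0.5, rho=0.5)
print(full, thin, full / thin)
```

Halving both multipliers cuts this layer's multiply-adds by roughly 16x at some accuracy cost, which is the latency/accuracy trade-off the sentence describes.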
SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. Wei Liu, et al., "…