The RNN gradient signal can end up being multiplied a large number of times (as many as the number of timesteps).
This means that the magnitude of the weights in the transition matrix can have a strong impact on the learning process.
\begin{itemize}
\item\begin{bf}Vanishing gradients:\end{bf}
if the weights in this matrix are small (or, more formally, if the leading eigenvalue of the transition matrix is smaller than 1.0), the gradient signal shrinks exponentially with the number of timesteps, and long-range dependencies become very slow or impossible to learn.
\item\begin{bf}Exploding gradients:\end{bf} if the weights in this matrix are large (or, again, more formally, if the leading eigenvalue of the transition matrix is larger than 1.0), the gradient signal grows exponentially with the number of timesteps, and learning can diverge.
\end{itemize}
\end{frame}
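The repeated-multiplication argument above can be illustrated numerically. Below is a minimal sketch (not from the original slides): it multiplies a unit-norm gradient vector by the transpose of a transition matrix once per timestep, as backpropagation through time does, using two hypothetical matrices whose leading eigenvalues sit on either side of 1.0.

```python
import numpy as np

def gradient_norm_after_steps(W, timesteps, seed=0):
    """Repeatedly multiply a unit-norm vector by W^T, as backprop
    through time does with the transition matrix, and return its norm."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(W.shape[0])
    g /= np.linalg.norm(g)
    for _ in range(timesteps):
        g = W.T @ g
    return np.linalg.norm(g)

# Hypothetical transition matrices for illustration:
W_small = 0.5 * np.eye(4)  # leading eigenvalue 0.5 < 1.0 -> vanishing
W_large = 1.5 * np.eye(4)  # leading eigenvalue 1.5 > 1.0 -> exploding

vanished = gradient_norm_after_steps(W_small, 50)  # ~0.5**50, essentially zero
exploded = gradient_norm_after_steps(W_large, 50)  # ~1.5**50, astronomically large
```

After only 50 timesteps the two norms differ by more than twenty orders of magnitude, which is why the leading eigenvalue, not just the raw weight scale, is the quantity to watch.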
\begin{frame}
\frametitle{History}
\begin{itemize}
\item The original LSTM was introduced in 1997 by Hochreiter, S., \& Schmidhuber, J.
\item The forget gate was introduced in 2000 by Gers, F. A., Schmidhuber, J., \& Cummins, F.
\item Virtually all implementations in use today include the forget gate.
\end{itemize}
\end{frame}
\begin{frame}{Conclusion}
Theano/Pylearn2/libgpuarray provide an environment for machine learning that is:
\begin{bf}Fast to develop\end{bf}\newline
...
...