Second, we compute the value of $f_t$, the activation of the memory cells’ forget gates at time $t$:
\begin{equation}
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
\end{equation}
\framebreak
Given the value of the input gate activation $i_t$, the forget gate activation $f_t$, and the candidate state value $\widetilde{C_t}$, we can compute $C_t$, the memory cells’ new state at time $t$:
\begin{equation}
C_t = i_t * \widetilde{C_t} + f_t * C_{t-1}
\end{equation}
With the new state of the memory cells, we can compute the value of their output gates and, subsequently, their outputs:
\item The model we use in this tutorial is a variation of the standard LSTM model. In this variant, the activation of a cell’s output gate does not depend on the memory cell’s state $C_t$. This allows us to perform part of the computation more efficiently (see the implementation note below for details). It also means that, in the variant we have implemented, there is no matrix $V_o$, and equation (5) is replaced by equation (7):
\begin{equation}
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\end{equation}
\end{itemize}
\end{frame}
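The equations above can be sketched in plain NumPy. This is an illustrative single-example step of the LSTM variant described on this slide (equation numbers in the comments refer to the slides); the function and parameter names are our own, not those of the tutorial code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, U, b):
    """One LSTM step. W, U, b are dicts of per-gate parameters keyed by
    'i' (input gate), 'c' (candidate state), 'f' (forget gate), 'o'
    (output gate). All names here are illustrative."""
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])      # input gate, eq. (1)
    C_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate state, eq. (2)
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])      # forget gate, eq. (3)
    C_t = i_t * C_tilde + f_t * C_prev                          # new cell state, eq. (4)
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])      # output gate, eq. (7)
    h_t = o_t * np.tanh(C_t)                                    # cell output
    return h_t, C_t
```

Note that $o_t$ here does not use $C_t$, which is exactly what makes the variant's batched computation (next frame) possible.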
\begin{frame}
\frametitle{Implementation Note}
\begin{itemize}
\item In the code included with this tutorial, equations (1), (2), (3) and (7) are computed in parallel to make the computation more efficient. This is possible because none of these equations relies on a result produced by the others. It is achieved by concatenating the four matrices $W_*$ into a single weight matrix $W$, and performing the same concatenation on the weight matrices $U_*$ to produce the matrix $U$ and on the bias vectors $b_*$ to produce the vector $b$. The pre-nonlinearity activations can then be computed with:
\vspace{-1em}
\begin{equation*}
z = \sigma(W x_t + U h_{t-1} + b)
\end{equation*}
\vspace{-2em}% don't remove the blank line
The result is then sliced to obtain the pre-nonlinearity activations for $i$, $f$, $\widetilde{C_t}$, and $o$, and the non-linearities are then applied independently to each.
\end{itemize}
\end{frame}
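A minimal NumPy sketch of this trick, with illustrative names and dimensions chosen for the example (the tutorial code does the same thing with Theano tensors):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim_in, dim_h = 3, 4
rng = np.random.default_rng(1)

# Hypothetical per-gate parameters, in the order i, c, f, o.
Ws = [rng.standard_normal((dim_h, dim_in)) for _ in range(4)]
Us = [rng.standard_normal((dim_h, dim_h)) for _ in range(4)]
bs = [np.zeros(dim_h) for _ in range(4)]

# Concatenate once, as in the implementation note.
W = np.concatenate(Ws, axis=0)   # shape (4 * dim_h, dim_in)
U = np.concatenate(Us, axis=0)   # shape (4 * dim_h, dim_h)
b = np.concatenate(bs)           # shape (4 * dim_h,)

x_t = rng.standard_normal(dim_in)
h_prev = rng.standard_normal(dim_h)

# A single matrix product per input yields all four pre-nonlinearity activations.
z = W @ x_t + U @ h_prev + b

# Slice, then apply each non-linearity independently.
pre_i, pre_c, pre_f, pre_o = np.split(z, 4)
i_t = sigmoid(pre_i)
C_tilde = np.tanh(pre_c)
f_t = sigmoid(pre_f)
o_t = sigmoid(pre_o)
```

One large matrix product is generally faster on GPU (and BLAS) than four small ones, which is the point of the concatenation.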
\begin{frame}{LSTM Tips For Training}
\begin{itemize}
\item Do not use plain SGD; use an adaptive method such as Adagrad or RMSProp.
\item Initialize any recurrent weights as orthogonal matrices (\texttt{orth\_weights}). This helps optimization.
\item Take any operation that does not have to be inside \texttt{scan} out of it.
Theano does this in many cases, but not all.
\item Rescale (clip) the $L_2$ norm of the gradient, if necessary.
\item You can use weight noise or dropout at the output of the recurrent layer for regularization.
\end{itemize}
\end{frame}
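The gradient-clipping tip above can be sketched as follows; this is an illustrative NumPy version (the function name is ours), rescaling a list of gradient arrays whenever their joint $L_2$ norm exceeds a threshold:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so that their joint L2 norm
    does not exceed max_norm; gradients below the threshold pass
    through unchanged. Illustrative sketch, not the tutorial code."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```

Clipping the norm (rather than each component) preserves the gradient's direction while bounding the size of the update, which helps with the exploding-gradient problem in recurrent nets.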
\section{Exercises}
\begin{frame}{Exercises}
\begin{itemize}
\item Feed the outputs of both LSTMs to the logistic regression. (No solutions provided)
\end{itemize}
Deep Learning Tutorial on LSTM: \url{http://deeplearning.net/tutorial/lstm.html}