Commit dda5846d authored by Frederic Bastien

small fix following review

Parent 7b1ec344
@@ -21,7 +21,7 @@ Département d'Informatique et de Recherche Opérationnelle \newline
 Université de Montréal \newline
 Montréal, Canada \newline
 \texttt{bastienf@iro.umontreal.ca} \newline \newline
-Presentation prepared with Pierre-Luc Carrier, KyungHyun Cho and \newline
+Presentation prepared with Pierre Luc Carrier, KyungHyun Cho and \newline
 Çağlar Gülçehre
 }
@@ -42,7 +42,7 @@ Presentation prepared with Pierre-Luc Carrier, KyungHyun Cho and \newline
 \begin{frame}
 \frametitle{Task}
-This is a classification task where we need to tell if the review was
+This is a classification task where we need to tell if the movie review was
 positive or negative.
 We use the IMDB dataset.
@@ -151,7 +151,7 @@ We use the IMDB dataset.
 \begin{itemize}
 \item Syntax as close to NumPy as possible
 \item Compiles most common expressions to C for CPU and/or GPU
-\item Limited expressivity means more opportunities optimizations
+\item Limited expressivity means more opportunities for optimizations
 \begin{itemize}
 \item No subroutines -> global optimization
 \item Strongly typed -> compiles to C
@@ -445,7 +445,7 @@ array(3.0)
 \begin{itemize}
 \item It’s hard to do much with purely functional programming
 \item ``shared variables'' add just a little bit of imperative programming
-\item A “shared variable” is a buffer that stores a numerical value for a Theano variable
+\item A ``shared variable'' is a buffer that stores a numerical value for a Theano variable
 \item Can write to as many shared variables as you want, once each, at the end of the function
 \item Modify outside Theano function with get\_value() and set\_value() methods.
 \end{itemize}
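The write-once-at-the-end behaviour of shared variables can be sketched in plain Python with NumPy (an analogy only, not the Theano API itself; the real mechanism is `theano.shared`, the `updates` argument of `theano.function`, and the `get_value()`/`set_value()` methods mentioned above):

```python
import numpy as np

# Plain-NumPy analogy of a Theano shared variable (illustrative only).
# `state` plays the role of the shared buffer; `accumulate` plays the
# role of a compiled function whose `updates` write the buffer exactly
# once, at the end of each call.
state = np.array(0.0)

def accumulate(inc):
    global state
    new_state = state + inc   # the symbolic expression state + inc
    state = new_state         # the single write-back per call
    return new_state

accumulate(1.0)
accumulate(10.0)
print(state)  # -> 11.0
```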
@@ -517,7 +517,7 @@ modes regard as fine.
 \begin{itemize}
 \item Theano's current back-end only supports 32 bit on GPU
 \item libgpuarray (new back-end) supports all dtypes
-\item CUDA supports 64 bit, but is slow in gamer card
+\item CUDA supports 64 bit, but is slow on gamer GPUs
 \item T.fscalar, T.fvector, T.fmatrix are all 32 bit
 \item T.scalar, T.vector, T.matrix resolve to 32 bit or 64 bit depending on Theano's floatX flag
 \item floatX is float64 by default; set it to float32
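The dtype rules above can be illustrated with plain NumPy, standing in for what the floatX flag controls (a sketch, not Theano code; variable names here are illustrative):

```python
import numpy as np

# NumPy stand-in for the floatX rule: float data defaults to 64 bit,
# so arrays intended for the 32-bit GPU back-end must be cast explicitly.
x64 = np.asarray([0.0, 1.0, 2.0])    # float64 by default
x32 = x64.astype('float32')          # what floatX=float32 expects
print(x64.dtype, x32.dtype)          # -> float64 float32
```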
@@ -864,9 +864,9 @@ print f([0, 1, 2])
 \begin{frame}
 \frametitle{Scan}
 \begin{itemize}
-\item Allow looping (for, map, while)
-\item Allow recursion (reduce)
-\item Allow recursion with dependency on many of the previous time step
+\item Allows looping (for, map, while)
+\item Allows recursion (reduce)
+\item Allows recursion with dependency on many of the previous time steps
 \item Optimizes some cases, like moving computation outside of scan
 \item The Scan grad is done via Backpropagation Through Time (BPTT)
 \end{itemize}
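The looping behaviour of scan can be sketched as an ordinary Python loop (a simplified analogy: the real `theano.scan` additionally builds a symbolic graph, supports taps into earlier steps, and derives the BPTT gradient automatically; `scan_like` is a made-up name for illustration):

```python
# Simplified Python analogy of theano.scan: a step function receives
# the current input and the previous output, and outputs are collected.
def scan_like(step, sequence, outputs_info):
    outputs, prev = [], outputs_info
    for x in sequence:
        prev = step(x, prev)
        outputs.append(prev)
    return outputs

# Running sum, the classic scan example:
print(scan_like(lambda x, acc: x + acc, [0, 1, 2, 3], 0))  # -> [0, 1, 3, 6]
```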
@@ -874,11 +874,11 @@ print f([0, 1, 2])
 \begin{frame}{When not to use scan}
 \begin{itemize}
-\item If you only need for ``vectorization'' or
+\item If you only need it for ``vectorization'' or
 ``broadcasting''. tensor and numpy.ndarray support them
 natively. This will be much better for that use case.
-\item You do a fixed number of iteration that is very small (2,3). You
+\item If you do a fixed number of iterations that is very small (2, 3), you
 are probably better off just unrolling the graph.
 \end{itemize}
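The first point above, in NumPy terms (an illustrative sketch of why elementwise work needs no scan):

```python
import numpy as np

# Elementwise work needs no scan: broadcasting does it in one expression.
x = np.arange(4.0)
y_loop = np.array([xi * 2.0 + 1.0 for xi in x])  # what a scan would compute
y_vec = x * 2.0 + 1.0                            # vectorized, preferred
assert np.array_equal(y_loop, y_vec)
```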
@@ -1120,7 +1120,7 @@ o_t = \sigma(W_o x_t + U_o h_{t-1} + b_1)
 \begin{frame}
 \frametitle{Implementation Note}
-Implementation note : In the code included this tutorial, the equations (1), (2), (3) and (7) are performed in parallel to make the computation more efficient. This is possible because none of these equations rely on a result produced by the other ones. It is achieved by concatenating the four matrices $W_*$ into a single weight matrix W and performing the same concatenation on the weight matrices $U_*$ to produce the matrix U and the bias vectors $b_*$ to produce the vector b. Then, the pre-nonlinearity activations can be computed with :
+In the code included with this tutorial, equations (1), (2), (3) and (7) are computed in parallel to make the computation more efficient. This is possible because none of these equations relies on a result produced by the others. It is achieved by concatenating the four matrices $W_*$ into a single weight matrix $W$, performing the same concatenation on the weight matrices $U_*$ to produce the matrix $U$, and on the bias vectors $b_*$ to produce the vector $b$. Then, the pre-nonlinearity activations can be computed with:
 \vspace{-1em}
 \begin{equation*}
 z = \sigma(W x_t + U h_{t-1} + b)
...
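The concatenation trick can be sketched in NumPy (illustrative shapes and random values only; the tutorial's actual code builds these matrices as Theano shared variables and applies the gate nonlinearities after slicing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                 # hidden size (illustrative)

# Stack the four per-gate weight matrices so that a single matrix
# product computes all four pre-nonlinearity activations at once.
W = rng.standard_normal((4 * n, n))   # rows: the four W_* stacked
U = rng.standard_normal((4 * n, n))   # rows: the four U_* stacked
b = np.zeros(4 * n)                   # the four b_* concatenated

x_t = rng.standard_normal(n)
h_prev = np.zeros(n)

z = W @ x_t + U @ h_prev + b          # all four activations in one go
z_i, z_f, z_o, z_c = np.split(z, 4)   # slice back out, one per gate
```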