James Bergstra, Olivier Breuleux, Frederic Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, Olivier Delalleau, Arnaud Bergeron, Josh Bleecher Snyder, Ian Goodfellow, Fran\c{c}ois Savard, Xavier Glorot, Douglas Eck, Dumitru Erhan, Michael Mandel, Philippe Hamel, Simon Lemieux, Thierry Bertin-Mahieux, Yoshua Bengio
Multi-Layer Perceptron: a 60x784 matrix times a 784x500 matrix, tanh, times a 500x10 matrix, an elementwise op, then all of it in reverse for backpropagation
\begin{center}
\includegraphics[width=3.in]{pics/mlp.pdf}
\end{center}
}
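The MLP benchmark above can be sketched in NumPy to make the shapes concrete; the random weight values here are placeholders, not part of the benchmark, and the final elementwise op is assumed to be a sigmoid for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward pass: (60x784) @ (784x500) -> tanh -> @ (500x10) -> elementwise
x  = rng.standard_normal((60, 784)).astype('float32')   # minibatch
W1 = rng.standard_normal((784, 500)).astype('float32')  # hidden weights
W2 = rng.standard_normal((500, 10)).astype('float32')   # output weights

h   = np.tanh(x @ W1)                   # 60x500 hidden activations
out = 1.0 / (1.0 + np.exp(-(h @ W2)))   # 60x10, elementwise sigmoid

print(h.shape, out.shape)
```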
\frame{
\frametitle{Benchmark Convolutional Network}
Convolutional Network: 256x256 images convolved with 6 7x7 filters, downsampled to 6x50x50, tanh, convolution with 16 6x7x7 filters, tanh, matrix multiply, elementwise op, then all in reverse
\item NVIDIA Quadro FX 580 (71 GFLOP/s single precision), compute capability 1.1, 32 cores (\$140, but a ``professional'' card)
\end{itemize}
%Device 0: "Quadro FX 580"
% Total amount of global memory: 536150016 bytes
% Multiprocessors x Cores/MP = Cores: 4 (MP) x 8 (Cores/MP) = 32 (Cores)
% Clock rate: 1.12 GHz
% Run time limit on kernels: Yes
% Compute mode: Default (multiple host
%threads can use this device simultaneously)
}
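The two convolutional stages of this benchmark can be sketched in NumPy to verify the feature-map shapes; the filter values are random placeholders, and the valid 7x7 convolution plus a stride-5 subsampling are assumed as the downsampling that takes 256x256 to 6x50x50.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

img = rng.standard_normal((256, 256)).astype('float32')   # input image
f1  = rng.standard_normal((6, 7, 7)).astype('float32')    # 6 7x7 filters
f2  = rng.standard_normal((16, 6, 7, 7)).astype('float32')  # 16 6x7x7 filters

# Stage 1: valid 7x7 convolution -> (6, 250, 250), subsample -> (6, 50, 50)
win1 = sliding_window_view(img, (7, 7))            # (250, 250, 7, 7)
fm1  = np.einsum('xyij,kij->kxy', win1, f1)        # (6, 250, 250)
fm1  = np.tanh(fm1[:, ::5, ::5])                   # (6, 50, 50)

# Stage 2: valid 6x7x7 convolution over all 6 maps -> (16, 44, 44)
win2 = sliding_window_view(fm1, (7, 7), axis=(1, 2))  # (6, 44, 44, 7, 7)
fm2  = np.tanh(np.einsum('cxyij,kcij->kxy', win2, f2))  # (16, 44, 44)

print(fm1.shape, fm2.shape)
```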
\frame{
\frametitle{Theano Exercises}
\begin{itemize}
\item Run the simple example
\item Run the real example
\item Modify your version to run in float32 using floatX
\item Run your version on the CPU and on the GPU
\item Do you see a speed-up with the GPU? Where does it come from? (Try profiling it)
\item Scan: modify the polynomial example so that the reduction is done by scan
\end{itemize}
}
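For the scan exercise, the per-step reduction that theano.scan would carry out over the coefficients can be mirrored in plain Python; the coefficients and the evaluation point x below are hypothetical placeholders.

```python
# Evaluate p(x) = sum_i coeffs[i] * x**i by looping over the
# coefficients, mimicking the step function theano.scan would apply
# at each position of the sequence.
coeffs = [1.0, 0.0, 2.0]   # hypothetical: p(x) = 1 + 2*x**2
x = 3.0

result = 0.0
for power, c in enumerate(coeffs):
    result += c * x ** power   # one scan step: accumulate c * x**power

print(result)  # 1 + 0 + 2*9 = 19.0
```

In Theano itself, the loop body becomes the `fn` argument of scan, the coefficients and powers become its `sequences`, and the final sum is the reduction over scan's outputs.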
\section{PyCUDA}
\subsection{PyCUDA}
\frame{
\frametitle{Intro}
Author: Andreas Kl\"{o}ckner

PyCUDA lets you access Nvidia's CUDA parallel computation API from Python. Several wrappers of the CUDA API already exist. So what's so special about PyCUDA?
\begin{itemize}
\item Object cleanup tied to lifetime of objects (RAII, Resource Acquisition Is Initialization).
\begin{itemize}
\item Makes it much easier to write correct, leak- and crash-free code
\item PyCUDA knows about dependencies, too, so (for example) it won't detach from a context before all memory allocated in it is also freed
\end{itemize}
\item Convenience
\begin{itemize}
\item Abstractions to compile CUDA code from Python: \texttt{pycuda.driver.SourceModule}
\item A GPU memory buffer: \texttt{pycuda.gpuarray.GPUArray}
\end{itemize}
\item Completeness
\begin{itemize}
\item Binding to all of CUDA's driver API
\end{itemize}
\item Automatic Error Checking
\begin{itemize}
\item All CUDA errors are automatically translated into Python exceptions