Commit 36eefee4 authored by f0k

Updated documentation to match current convolution code

Parent: 29024484
.. note::
    As of October 21st, 2014, the default GPU image convolution
    changed: By default, if `cuDNN <https://developer.nvidia.com/cuDNN>`_
    is available, we will use it, otherwise we will fall back to using the
    gemm version (slower than cuDNN in most cases and uses more memory, but
    faster than the legacy version we used before).

    Both cuDNN and the gemm version can be disabled using the Theano flags
    ``optimizer_excluding=conv_dnn`` and ``optimizer_excluding=conv_gemm``,
    respectively. In this case, we will fall back to using the legacy
    convolution code, which is slower, but does not require extra memory.
    To verify that cuDNN is used, you can supply the Theano flag
    ``optimizer_including=cudnn``. This will raise an error if cuDNN is
    unavailable.

    It is not advised to ever disable cuDNN, as this is usually the fastest
    option. Disabling the gemm version is only useful if cuDNN is unavailable
    and you run out of GPU memory.

    There are two other implementations: An FFT-based convolution integrated
    into Theano, and an implementation by Alex Krizhevsky available via
    Pylearn2. See the documentation below on how to use them.
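The precedence described in the note can be summarized in a small sketch. This is an illustrative model in pure Python, not Theano's actual optimizer code; the function name ``pick_conv_impl`` is hypothetical:

```python
def pick_conv_impl(cudnn_available, excluded=()):
    """Model of the fallback order: cuDNN if available, then the gemm
    version, then the legacy code. `excluded` mimics the effect of the
    optimizer_excluding Theano flags."""
    if cudnn_available and "conv_dnn" not in excluded:
        return "cudnn"
    if "conv_gemm" not in excluded:
        return "gemm"
    return "legacy"

print(pick_conv_impl(True))                           # cudnn
print(pick_conv_impl(False))                          # gemm
print(pick_conv_impl(True, excluded=("conv_dnn",)))   # gemm
print(pick_conv_impl(False, excluded=("conv_gemm",))) # legacy
```

As the note says, excluding both optimizations lands you on the legacy code, which needs no extra memory but is the slowest option.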
TODO: Give examples on how to use these things! They are pretty complicated.
- Implemented operators for neural network 2D / image convolution:

- :func:`nnet.conv2d <theano.tensor.nnet.conv.conv2d>`.
  This is the standard operator for convolutional neural networks working
  with batches of multi-channel 2D images, available for CPU and GPU. It
  computes a convolution, i.e., it flips the kernel.
  Most of the more efficient GPU implementations listed below can be
  inserted automatically as a replacement for nnet.conv2d via graph
  optimizations. Some of these graph optimizations are enabled by default,
  others can be enabled via Theano flags.
- :func:`conv2d_fft <theano.sandbox.cuda.fftconv.conv2d_fft>` This
  is a GPU-only version of nnet.conv2d that uses an FFT transform
  to perform the work. It flips the kernel just like ``conv2d``.
  conv2d_fft should not be used directly as
  it does not provide a gradient. Instead, use nnet.conv2d and
  allow Theano's graph optimizer to replace it by the FFT version
  by setting 'THEANO_FLAGS=optimizer_including=conv_fft'
  in your environment. If enabled, it will take precedence over cuDNN
  and the gemm version. It is not enabled by default because it
  has some restrictions on input and uses a lot more memory. Also
  note that it requires CUDA >= 5.0, scikits.cuda >= 0.5.0 and
  PyCUDA to run. To deactivate the FFT optimization on a specific
  nnet.conv2d while the optimization flag is active, you can set
  its ``version`` parameter to ``'no_fft'``. To enable it for just
  one Theano function:

  .. code-block:: python

      mode = theano.compile.get_default_mode()
      mode = mode.including('conv_fft')
      f = theano.function(..., mode=mode)
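The FFT approach rests on the convolution theorem: a circular convolution in the spatial domain is a pointwise product in the frequency domain. The principle can be sketched in pure Python in 1D with a naive O(n^2) DFT (illustrative only; conv2d_fft itself handles the padding needed to turn this into the linear convolution nnet.conv2d computes):

```python
import cmath

def dft(x, inverse=False):
    # Naive discrete Fourier transform; a real FFT computes the
    # same result in O(n log n).
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
               for k in range(n)) for j in range(n)]
    return [v / n for v in out] if inverse else out

def circular_convolve(x, w):
    # Direct O(n^2) circular convolution, for reference.
    n = len(x)
    return [sum(x[k] * w[(i - k) % n] for k in range(n)) for i in range(n)]

def fft_convolve(x, w):
    # Convolution theorem: transform, multiply pointwise, transform back.
    X, W = dft(x), dft(w)
    y = dft([a * b for a, b in zip(X, W)], inverse=True)
    return [round(v.real, 10) for v in y]

x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 0.0, 0.0, 1.0]
print(circular_convolve(x, w))  # [3.0, 5.0, 7.0, 5.0]
print(fft_convolve(x, w))       # [3.0, 5.0, 7.0, 5.0]
```

This is also why the FFT version needs extra memory: it must hold the transformed (and padded) inputs and filters in the frequency domain.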
  Wrapper for an open-source GPU-only implementation of conv2d by Alex
  Krizhevsky, very fast, but with several restrictions on input and kernel
  shapes, and with a different memory layout for the input. It does not
  flip the kernel.

  This is in Pylearn2, where it is normally called from the `linear transform
  <http://deeplearning.net/software/pylearn2/library/linear.html>`_
  implementation, but it can also be used `directly from within Theano
  <http://benanne.github.io/2014/04/03/faster-convolutions-in-theano.html>`_
  as a manual replacement for nnet.conv2d.
- :func:`GpuCorrMM <theano.sandbox.cuda.blas.GpuCorrMM>`
  This is a GPU-only 2d correlation implementation taken from
  `caffe <https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu>`_
  and also used by Torch. It does not flip the kernel.
  For each element in a batch, it first creates a
  `Toeplitz <http://en.wikipedia.org/wiki/Toeplitz_matrix>`_ matrix in a CUDA kernel.
  ``(no of channels * filter width * filter height, output width * output height)``.
  As it provides a gradient, you can use it as a replacement for nnet.conv2d.
  But usually, you will just use nnet.conv2d and allow Theano's graph
  optimizer to automatically replace it by the GEMM version if cuDNN is not
  available. To explicitly disable the graph optimizer, set
  ``THEANO_FLAGS=optimizer_excluding=conv_gemm`` in your environment.
  If using it, please see the warning about a bug in CUDA 5.0 to 6.0 below.
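The Toeplitz ("im2col") trick can be sketched in pure Python: unrolling all input patches into a matrix of shape ``(no of channels * filter width * filter height, output width * output height)`` turns the correlation into a single matrix product, which is what the gemm call computes. This is a simplified illustration (one batch element, no striding or padding; the function names are hypothetical, not Theano's):

```python
def im2col(img, kh, kw):
    # img: list of channels, each a 2D list of rows. Builds the "Toeplitz"
    # matrix of shape (channels * kh * kw, out_h * out_w): each row holds
    # one (channel, i, j) filter tap across all output positions.
    c = len(img)
    h, w = len(img[0]), len(img[0][0])
    oh, ow = h - kh + 1, w - kw + 1
    return [[img[ch][y + i][x + j] for y in range(oh) for x in range(ow)]
            for ch in range(c) for i in range(kh) for j in range(kw)]

def corr_gemm(img, filt):
    # Correlation (no kernel flip, like GpuCorrMM) as one matrix product:
    # the flattened filter (a row vector) times the im2col matrix.
    kh, kw = len(filt[0]), len(filt[0][0])
    cols = im2col(img, kh, kw)
    frow = [filt[ch][i][j] for ch in range(len(filt))
            for i in range(kh) for j in range(kw)]
    return [sum(f * col[p] for f, col in zip(frow, cols))
            for p in range(len(cols[0]))]

img = [[[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]]          # 1 channel, 3x3 image
filt = [[[1, 0],
         [0, -1]]]           # 1 channel, 2x2 filter
print(corr_gemm(img, filt))  # [-4, -4, -4, -4] (the flattened 2x2 output)
```

The extra memory the text mentions is exactly this unrolled matrix: every input pixel is duplicated once per filter tap that touches it.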
- :func:`dnn_conv <theano.sandbox.cuda.dnn.dnn_conv>` GPU-only
  convolution using NVIDIA's cuDNN library. This requires that you have
  cuDNN installed and available, which in turn requires CUDA 6.5 and a GPU
  with compute capability 3.0 or more.

  If cuDNN is available, by default, Theano will replace all nnet.conv2d
  operations with dnn_conv. To explicitly disable it, set
  ``THEANO_FLAGS=optimizer_excluding=conv_dnn`` in your environment.
  As dnn_conv has a gradient defined, you can also use it manually.

- Implemented operators for neural network 3D / video convolution:
- :func:`conv3D <theano.tensor.nnet.Conv3D.conv3D>`
  3D Convolution applying multi-channel 3D filters to batches of
  multi-channel 3D images. It does not flip the kernel.

- :func:`conv3d_fft <theano.sandbox.cuda.fftconv.conv3d_fft>`
  GPU-only version of conv3D using FFT transform. conv3d_fft should
  not be called directly as it does not provide a gradient.