Commit 4ad5236b
Authored November 18, 2014 by Frédéric Bastien
Merge pull request #2255 from f0k/cleaner-conv-opt
Slightly clean up registration of GPU convolution optimizers
Parents: f423ac63, 0befaea4
Showing 4 changed files with 91 additions and 104 deletions.
doc/library/sandbox/cuda/dnn.txt    +13  -6
doc/library/tensor/nnet/conv.txt    +48  -64
theano/sandbox/cuda/dnn.py          +1   -1
theano/sandbox/cuda/opt.py          +29  -33
doc/library/sandbox/cuda/dnn.txt
...
...
@@ -13,12 +13,19 @@ installed with CUDA 6.5. You must download and install it
 yourself.

 To install it, decompress the downloaded file and make the ``*.h`` and
-``*.so*`` files available to the compilation environment. On Linux,
-this can be done by setting the environment variables
-``LD_LIBRARY_PATH``, ``LIBRARY_PATH`` and ``CPATH`` to the
-uncompressed directory path. Separate multiple directory with ``:`` as
-the ``PATH`` environment variable. Or you can copy the ``*.h`` files
-to ``/usr/include`` and the ``*.so*`` files to ``/lib64``.
+``*.so*`` files available to the compilation environment.
+There are at least three possible ways of doing so:
+
+- The easiest is to include them in your CUDA installation. Copy the
+  ``*.h`` files to ``CUDA_ROOT/include`` and the ``*.so*`` files to
+  ``CUDA_ROOT/lib64`` (by default, ``CUDA_ROOT`` is ``/usr/local/cuda``
+  on Linux).
+- Alternatively, on Linux, you can set the environment variables
+  ``LD_LIBRARY_PATH``, ``LIBRARY_PATH`` and ``CPATH`` to the directory
+  extracted from the download. If needed, separate multiple directories
+  with ``:`` as in the ``PATH`` environment variable.
+- And as a third way, also on Linux, you can copy the ``*.h`` files
+  to ``/usr/include`` and the ``*.so*`` files to ``/lib64``.

 By default, Theano will detect if it can use cuDNN. If so, it will use
 it. If not, Theano optimizations will not introduce cuDNN ops. So
...
...
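The environment-variable route in the new docs can be scripted. A minimal sketch, assuming the cuDNN download was extracted to `~/cudnn-6.5` (a hypothetical path; substitute wherever you unpacked it):

```shell
# Hypothetical extraction directory; adjust to where you unpacked cuDNN.
CUDNN_DIR="$HOME/cudnn-6.5"

# Make the headers and shared libraries visible to the compiler and the
# dynamic loader, separating multiple directories with ':' as in PATH.
export CPATH="$CUDNN_DIR:$CPATH"
export LIBRARY_PATH="$CUDNN_DIR:$LIBRARY_PATH"
export LD_LIBRARY_PATH="$CUDNN_DIR:$LD_LIBRARY_PATH"
```

Putting these lines in your shell profile makes the setting persistent across sessions.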
doc/library/tensor/nnet/conv.txt
...
...
@@ -25,64 +25,59 @@
 .. note::

     As of October 21st, 2014, the default GPU image convolution
-    changed. Here is the algo:
+    changed: By default, if :ref:`cuDNN <_libdoc_cuda_dnn>`
+    is available, we will use it, otherwise we will fall back to using the
+    gemm version (slower then cuDNN in most cases, uses more memory, but
+    faster than the legacy version we used before).

-    - If we can use `cuDNN <https://developer.nvidia.com/cuDNN>`_, use it.
-    - If not, use gemm version (slower then cuDNN, uses more memory).
+    Both cuDNN and the gemm version can be disabled using the Theano flags
+    ``optimizer_excluding=conv_dnn`` and ``optimizer_excluding=conv_gemm``,
+    respectively. In this case, we will fall back to using the legacy
+    convolution code, which is slower, but does not require extra memory.

-    If the users do not want the extra memory usage of the gemm
-    version, they can enable the legacy code that is even slower, but
-    does not use extra memory. For this, use the Theano flag
-    ``optimizer_excluding=conv_gemm``.
+    To verify that cuDNN is used, you can supply the Theano flag
+    ``optimizer_including=cudnn``. This will raise an error if cuDNN is
+    unavailable.

-    There is no reason to use the legacy code or the gemm version if
-    cuDNN is available.
+    It is not advised to ever disable cuDNN, as this is usually the fastest
+    option. Disabling the gemm version is only useful if cuDNN is unavailable
+    and you run out of GPU memory.

-    2 other options:
-
-    - There is also the fft version that is the fastest in some cases,
-      but uses even more memory. It does not support striding to remove
-      computation and has some shapes restriction.
-    - There is also the cuda_convnet convolution in Pylearn2. It uses a
-      different memory layout, has shapes restrictions, but does not use
-      extra memory and is faster then the legacy convolution.
-
-    If you want to verify the usage of cuDNN, you can use the Theano
-    flag ``optimizer_including=cudnn``. This will raise an error if we
-    can't use cuDNN.
+    There are two other implementations: An FFT-based convolution integrated
+    into Theano, and an implementation by Alex Krizhevsky available via
+    Pylearn2. See the documentation below on how to use them.

 TODO: Give examples on how to use these things! They are pretty complicated.

-- Convolution operators implemented:
-
-    - :func:`signal.conv2d <theano.tensor.signal.conv.conv2d>`. See note above.
+- Implemented operators for neural network 2D / image convolution:

     - :func:`nnet.conv2d <theano.tensor.nnet.conv.conv2d>`.
       This is the standard operator for convolutional neural networks working
-      with batches of multi-channel 2D images, available for CPU and GPU.
-      Most of the more efficient GPU implementations listed below can be used
-      as an automatic replacement for nnet.conv2d by enabling specific graph
-      optimizations. It flip the kernel.
+      with batches of multi-channel 2D images, available for CPU and GPU. It
+      computes a convolution, i.e., it flips the kernel.
+      Most of the more efficient GPU implementations listed below can be
+      inserted automatically as a replacement for nnet.conv2d via graph
+      optimizations. Some of these graph optimizations are enabled by default,
+      others can be enabled via Theano flags.

     - :func:`conv2d_fft <theano.sandbox.cuda.fftconv.conv2d_fft>` This
       is a GPU-only version of nnet.conv2d that uses an FFT transform
-      to perform the work. It flip the kernel as ``conv2d``.
+      to perform the work. It flips the kernel just like ``conv2d``.
       conv2d_fft should not be used directly as
       it does not provide a gradient. Instead, use nnet.conv2d and
       allow Theano's graph optimizer to replace it by the FFT version
-      by setting 'THEANO_FLAGS=optimizer_including=conv_fft_valid:conv_fft_full'
-      in your environement. This is not enabled by default because it
+      by setting 'THEANO_FLAGS=optimizer_including=conv_fft'
+      in your environment. If enabled, it will take precedence over cuDNN
+      and the gemm version. It is not enabled by default because it
       has some restrictions on input and uses a lot more memory. Also
       note that it requires CUDA >= 5.0, scikits.cuda >= 0.5.0 and
       PyCUDA to run. To deactivate the FFT optimization on a specific
-      nnet.conv2d while the optimization flags are active, you can set
+      nnet.conv2d while the optimization flag is active, you can set
       its ``version`` parameter to ``'no_fft'``. To enable it for just
       one Theano function:

       .. code-block:: python

          mode = theano.compile.get_default_mode()
-         mode = mode.including('conv_fft_valid', 'conv_fft_full')
+         mode = mode.including('conv_fft')

          f = theano.function(..., mode=mode)
...
...
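The note in this hunk describes a fixed fallback order: FFT when explicitly enabled, otherwise cuDNN when available, otherwise the gemm version, otherwise the legacy code. That decision can be summarized as plain logic. A minimal sketch, not Theano code — the function and its parameters are hypothetical and model only the order described:

```python
def pick_conv_impl(dnn_available, fft_enabled=False,
                   dnn_excluded=False, gemm_excluded=False):
    """Toy model of the convolution implementation choice described above."""
    # FFT takes precedence over everything, but only when explicitly enabled
    # (optimizer_including=conv_fft).
    if fft_enabled:
        return "fft"
    # Otherwise prefer cuDNN when installed and not excluded via conv_dnn.
    if dnn_available and not dnn_excluded:
        return "cudnn"
    # Fall back to the gemm version unless excluded via conv_gemm.
    if not gemm_excluded:
        return "gemm"
    # Last resort: the slow legacy convolution, which needs no extra memory.
    return "legacy"

print(pick_conv_impl(dnn_available=True))  # cudnn is the default when available
```

Excluding both `conv_dnn` and `conv_gemm` is what forces the legacy path, matching the note's advice that this is only useful when cuDNN is unavailable and GPU memory is tight.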
@@ -90,17 +85,18 @@ TODO: Give examples on how to use these things! They are pretty complicated.
       Wrapper for an open-source GPU-only implementation of conv2d by Alex
       Krizhevsky, very fast, but with several restrictions on input and kernel
-      shapes, and with a different memory layout for the input.
+      shapes, and with a different memory layout for the input. It does not
+      flip the kernel.
       This is in Pylearn2, where it is normally called from the `linear transform
       <http://deeplearning.net/software/pylearn2/library/linear.html>`_
       implementation, but it can also be used `directly from within Theano
       <http://benanne.github.io/2014/04/03/faster-convolutions-in-theano.html>`_
-      as a manual replacement for nnet.conv2d.
-      It does not flip the kernel.
+      as a manual replacement for nnet.conv2d.

     - :func:`GpuCorrMM <theano.sandbox.cuda.blas.GpuCorrMM>`
       This is a GPU-only 2d correlation implementation taken from
       `caffe <https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu>`_
-      and also used by Torch.
+      and also used by Torch.
+      It does not flip the kernel.
       For each element in a batch, it first creates a
       `Toeplitz <http://en.wikipedia.org/wiki/Toeplitz_matrix>`_ matrix in a CUDA kernel.
...
...
@@ -110,36 +106,24 @@ TODO: Give examples on how to use these things! They are pretty complicated.
       ``(no of channels * filter width * filter height, output width * output height)``.
       As it provides a gradient, you can use it as a replacement for nnet.conv2d.
-      Alternatively, you can use nnet.conv2d and allow Theano's graph optimizer
-      to replace it by the GEMM version by setting
-      ``THEANO_FLAGS=optimizer_including=conv_gemm`` in your environment.
-      This is not enabled by default because it uses some extra memory, but the
-      overhead is small compared to conv2d_fft, there are no restrictions on
-      input or kernel shapes and it is sometimes still faster than cuda-convnet.
+      But usually, you will just use nnet.conv2d and allow Theano's graph
+      optimizer to automatically replace it by the GEMM version if cuDNN is not
+      available. To explicitly disable the graph optimizer, set
+      ``THEANO_FLAGS=optimizer_excluding=conv_gemm`` in your environment.
       If using it, please see the warning about a bug in CUDA 5.0 to 6.0 below.
-      To enable it for just one Theano function:
-
-      .. code-block:: python
-
-         mode = theano.compile.get_default_mode()
-         mode = mode.including('conv_gemm')
-
-         f = theano.function(..., mode=mode)

     - :func:`dnn_conv <theano.sandbox.cuda.dnn.dnn_conv>` GPU-only
-      convolution using NVIDIA's cuDNN library. To have conv2d()
-      automatically converted set
-      ``THEANO_FLAGS=optimizer_including=cudnn`` in your environment.
-      This will also replace other operations by their a
-      cuDNN-accelerated equivalent. This requires that you have cuDNN
-      installed and available. It requires a GPU with compute
-      capability 3.0 or more.
-      Since it has a gradient defined it can also be used manually.
+      convolution using NVIDIA's cuDNN library. This requires that you have
+      cuDNN installed and available, which in turn requires CUDA 6.5 and a GPU
+      with compute capability 3.0 or more.
+      If cuDNN is available, by default, Theano will replace all nnet.conv2d
+      operations with dnn_conv. To explicitly disable it, set
+      ``THEANO_FLAGS=optimizer_excluding=conv_dnn`` in your environment.
+      As dnn_conv has a gradient defined, you can also use it manually.

 - Implemented operators for neural network 3D / video convolution:

     - :func:`conv3D <theano.tensor.nnet.Conv3D.conv3D>`
       3D Convolution applying multi-channel 3D filters to batches of
-      multi-channel 3D images. It do not flip the kernel.
+      multi-channel 3D images. It does not flip the kernel.
     - :func:`conv3d_fft <theano.sandbox.cuda.fftconv.conv3d_fft>`
       GPU-only version of conv3D using FFT transform. conv3d_fft should
       not be called directly as it does not provide a gradient.
...
...
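The ``optimizer_excluding`` flags mentioned in these doc entries are normally passed through the ``THEANO_FLAGS`` environment variable, with multiple optimizer names joined by ``:`` (the same convention the docs show for ``optimizer_including``). A hypothetical session:

```shell
# Exclude both the cuDNN and the gemm convolution optimizers, forcing the
# legacy convolution code path (flag names taken from the docs above).
export THEANO_FLAGS="optimizer_excluding=conv_dnn:conv_gemm"
echo "$THEANO_FLAGS"
```

Any Theano process started from this shell inherits the setting; unset the variable to return to the defaults.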
theano/sandbox/cuda/dnn.py
...
...
@@ -1089,7 +1089,7 @@ if cuda_available:
     from theano.sandbox.cuda.opt import (local_optimizer, gpu_optimizer,
                                          gpu_seqopt)

-@register_opt('cudnn')
+#@register_opt('cudnn')  # this optimizer is registered in opt.py instead.
 @local_optimizer([GpuConv])
 def local_conv_dnn(node):
     if not dnn_available():
...
...
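The dnn.py change only comments out the decorator; opt.py can register the same function later because ``@register_opt('cudnn')`` is just sugar for applying ``register_opt('cudnn')`` to the function after it is defined. A minimal sketch with a stand-in registry (not Theano's actual ``register_opt``):

```python
def register_opt(*tags):
    """Stand-in for Theano's register_opt: returns a wrapper that records
    the optimizer under the given tags and hands it back unchanged."""
    def wrapper(fn):
        register_opt.registry.append((fn.__name__, tags))
        return fn
    return wrapper

register_opt.registry = []

def local_conv_dnn(node):
    return None  # placeholder for the real optimizer body

# Registration from "outside" the defining module, as opt.py now does
# instead of decorating local_conv_dnn at its definition site:
register_opt('cudnn')(local_conv_dnn)

print(register_opt.registry)  # [('local_conv_dnn', ('cudnn',))]
```

Deferring the call lets opt.py decide at import time whether to register the optimizer at all (e.g. only when cuDNN is available), which the decorator form could not express.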
theano/sandbox/cuda/opt.py
...
...
@@ -1105,12 +1105,9 @@ def local_gpu_softmax_with_bias(node):
             return [host_from_gpu(gpu_sm)]
     return False

-# Convolution, maxpooling
+# Convolution
 from theano.tensor.nnet import conv

-# We need a fixed order for the user interface.
-conv_groupopt = theano.gof.optdb.LocalGroupDB()
-conv_groupopt.__name__ = "gpu_conv_opts"
-register_opt('fast_compile', 'fast_run', 'gpu')(conv_groupopt)
-
 def _gpu_conv_to_fftconv(node):
...
@@ -1163,22 +1160,8 @@ def local_conv_fft_full(node):
             return

-@local_optimizer([GpuConv])
-def local_gpu_conv(node):
-    """
-    If cudnn is available, use it. Otherwise, use the gemm version.
-    """
-    if (isinstance(node.op, GpuConv) and
-            theano.sandbox.cuda.dnn.dnn_available()):
-        return theano.sandbox.cuda.dnn.local_conv_dnn.transform(node)
-    # If dnn isn't avail, the local_gpu_conv_legacy wil introduce the
-    # legacy opt. Then the local_conv_gemm will convert it to gemm
-    # opt.
-
 @local_optimizer([gpu_from_host, conv.ConvOp])
-def local_gpu_conv_legacy(node):
+def local_gpu_conv(node):
     """
     gpu_from_host(conv) -> gpu_conv(gpu_from_host)
...
...
@@ -1334,19 +1317,31 @@ def local_conv_gemm(node):
                              gpu_contiguous(kern), gpu_contiguous(img))]

-# Legacy opt first, as this is the only that move to the GPU.
-# Then fft, as disabled dy default. So if use enable it, it have prio
-# Then default, use dnn if avail
-# Then default, use gemm if dnn or fft didn't worked.
-# Normally, gemm should catch all case, so the legacy should never run.
-conv_groupopt.register('local_gpu_conv_legacy', local_gpu_conv_legacy, 0,
-                       'fast_compile', 'fast_run')
-conv_groupopt.register("conv_fft_valid", local_conv_fft_valid, 1)
-conv_groupopt.register("conv_fft_full", local_conv_fft_full, 1)
-# Use dnn if avail, so have the dnn tag to be able to disable it.
-conv_groupopt.register('local_gpu_conv', local_gpu_conv, 10,
-                       'fast_compile', 'fast_run', 'cudnn')
-conv_groupopt.register('local_conv_gemm', local_conv_gemm, 12,
+# First we register the optimizer that moves convolutions to the GPU.
+register_opt()(local_gpu_conv)
+
+# Then we create a group of optimizers that replace the legacy GpuConv
+# with other implementations. They are tried in a specific order so we
+# can control which ones take precedence over others.
+conv_groupopt = theano.gof.optdb.LocalGroupDB()
+conv_groupopt.__name__ = "gpu_conv_opts"
+register_opt()(conv_groupopt)
+
+# FFT gets the highest priority (lowest number), but is disabled by default.
+# It can be enabled by including 'conv_fft'.
+conv_groupopt.register('conv_fft_valid', local_conv_fft_valid, 10, 'conv_fft')
+conv_groupopt.register('conv_fft_full', local_conv_fft_full, 10, 'conv_fft')
+
+# cuDNN is the second, but only registered if cuDNN is available.
+# It can be disabled by excluding 'conv_dnn' or 'cudnn'.
+from . import dnn
+if dnn.dnn_available():
+    conv_groupopt.register('conv_dnn', dnn.local_conv_dnn, 20,
+                           'fast_compile', 'fast_run', 'cudnn')
+
+# The GEMM-based convolution comes last to catch all remaining cases.
+# It can be disabled by excluding 'conv_gemm'.
+conv_groupopt.register('conv_gemm', local_conv_gemm, 30,
+                       'fast_compile', 'fast_run')
...
...
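In the registrations above, the position numbers (10, 20, 30) control the order in which the group's optimizers are tried, and the tags control what the user can switch off. A toy model of just those two mechanisms — not ``theano.gof.optdb.LocalGroupDB``, and simplified in that everything here is enabled unless excluded, whereas the real 'conv_fft' entries stay off unless explicitly included:

```python
class GroupDB:
    """Toy stand-in for LocalGroupDB: entries are tried in order of their
    position number; tags let a user exclude entries."""

    def __init__(self):
        self.entries = []

    def register(self, name, opt, position, *tags):
        self.entries.append((position, name, opt, set(tags)))

    def query(self, excluding=()):
        # Sort by position only; Python's stable sort keeps the
        # registration order for equal positions.
        excluded = set(excluding)
        ordered = sorted(self.entries, key=lambda e: e[0])
        return [name for position, name, opt, tags in ordered
                if not (tags & excluded)]

db = GroupDB()
# Mirror of the registration order above (the optimizer objects are dummies).
db.register('conv_fft_valid', object(), 10, 'conv_fft')
db.register('conv_fft_full', object(), 10, 'conv_fft')
db.register('conv_dnn', object(), 20, 'fast_run', 'cudnn')
db.register('conv_gemm', object(), 30, 'fast_run')

print(db.query(excluding=['cudnn']))
# → ['conv_fft_valid', 'conv_fft_full', 'conv_gemm']
```

This is the point of the cleanup: one group with explicit, well-spaced priorities replaces the old scheme where the legacy, fft, dnn and gemm optimizers were interleaved at positions 0, 1, 10 and 12.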
@@ -1500,6 +1495,7 @@ def local_convtransp3d_gemm(node):
 gpu_optimizer.register("convtransp3d_gemm", local_convtransp3d_gemm)

+# Pooling
 import theano.tensor.signal.downsample as downsample
...
...