testgroup / pytensor / Commits / 4ad5236b

Commit 4ad5236b authored Nov 18, 2014 by Frédéric Bastien

Merge pull request #2255 from f0k/cleaner-conv-opt

Slightly clean up registration of GPU convolution optimizers

Parents: f423ac63, 0befaea4
Showing 4 changed files with 91 additions and 104 deletions:

- doc/library/sandbox/cuda/dnn.txt (+13 -6)
- doc/library/tensor/nnet/conv.txt (+48 -64)
- theano/sandbox/cuda/dnn.py (+1 -1)
- theano/sandbox/cuda/opt.py (+29 -33)
doc/library/sandbox/cuda/dnn.txt (+13 -6)

@@ -13,12 +13,19 @@ installed with CUDA 6.5. You must download and install it

yourself.

To install it, decompress the downloaded file and make the ``*.h`` and
``*.so*`` files available to the compilation environment.
There are at least three possible ways of doing so:

- The easiest is to include them in your CUDA installation. Copy the
  ``*.h`` files to ``CUDA_ROOT/include`` and the ``*.so*`` files to
  ``CUDA_ROOT/lib64`` (by default, ``CUDA_ROOT`` is ``/usr/local/cuda``
  on Linux).
- Alternatively, on Linux, you can set the environment variables
  ``LD_LIBRARY_PATH``, ``LIBRARY_PATH`` and ``CPATH`` to the directory
  extracted from the download. If needed, separate multiple directories
  with ``:`` as in the ``PATH`` environment variable.
- And as a third way, also on Linux, you can copy the ``*.h`` files
  to ``/usr/include`` and the ``*.so*`` files to ``/lib64``.

By default, Theano will detect if it can use cuDNN. If so, it will use
it. If not, Theano optimizations will not introduce cuDNN ops. So

...
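The environment-variable route above can be sanity-checked from Python. The sketch below is a hypothetical helper for illustration only (it is not part of Theano, which performs its own detection); it looks for ``cudnn.h`` in the directories a compiler would consult, following the ``CPATH`` and ``CUDA_ROOT`` conventions described in the text:

```python
import os


def find_cudnn_header(env=None):
    """Return the first directory containing cudnn.h, or None.

    Hypothetical helper, not a Theano function. It checks CPATH
    (':'-separated, like PATH), then CUDA_ROOT/include, then /usr/include.
    """
    env = os.environ if env is None else env
    # CPATH may list several directories separated by ':'.
    candidates = [d for d in env.get("CPATH", "").split(":") if d]
    cuda_root = env.get("CUDA_ROOT", "/usr/local/cuda")
    candidates.append(os.path.join(cuda_root, "include"))
    candidates.append("/usr/include")
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, "cudnn.h")):
            return directory
    return None
```

If this returns ``None`` with your intended settings, the compilation environment will most likely not find cuDNN either.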
doc/library/tensor/nnet/conv.txt (+48 -64)

@@ -25,64 +25,59 @@

.. note::

    As of October 21st, 2014, the default GPU image convolution
    changed: By default, if :ref:`cuDNN <_libdoc_cuda_dnn>`
    is available, we will use it, otherwise we will fall back to using the
    gemm version (slower than cuDNN in most cases, uses more memory, but
    faster than the legacy version we used before).

    Both cuDNN and the gemm version can be disabled using the Theano flags
    ``optimizer_excluding=conv_dnn`` and ``optimizer_excluding=conv_gemm``,
    respectively. In this case, we will fall back to using the legacy
    convolution code, which is slower, but does not require extra memory.

    To verify that cuDNN is used, you can supply the Theano flag
    ``optimizer_including=cudnn``. This will raise an error if cuDNN is
    unavailable.

    It is not advised to ever disable cuDNN, as this is usually the fastest
    option. Disabling the gemm version is only useful if cuDNN is unavailable
    and you run out of GPU memory.

    There are two other implementations: an FFT-based convolution integrated
    into Theano, and an implementation by Alex Krizhevsky available via
    Pylearn2. See the documentation below on how to use them.

TODO: Give examples on how to use these things! They are pretty complicated.

- Implemented operators for neural network 2D / image convolution:

    - :func:`nnet.conv2d <theano.tensor.nnet.conv.conv2d>`.
      This is the standard operator for convolutional neural networks working
      with batches of multi-channel 2D images, available for CPU and GPU. It
      computes a convolution, i.e., it flips the kernel.
      Most of the more efficient GPU implementations listed below can be
      inserted automatically as a replacement for nnet.conv2d via graph
      optimizations. Some of these graph optimizations are enabled by default,
      others can be enabled via Theano flags.

    - :func:`conv2d_fft <theano.sandbox.cuda.fftconv.conv2d_fft>` This
      is a GPU-only version of nnet.conv2d that uses an FFT transform
      to perform the work. It flips the kernel just like ``conv2d``.
      conv2d_fft should not be used directly as
      it does not provide a gradient. Instead, use nnet.conv2d and
      allow Theano's graph optimizer to replace it by the FFT version
      by setting 'THEANO_FLAGS=optimizer_including=conv_fft'
      in your environment. If enabled, it will take precedence over cuDNN
      and the gemm version. It is not enabled by default because it
      has some restrictions on input and uses a lot more memory. Also
      note that it requires CUDA >= 5.0, scikits.cuda >= 0.5.0 and
      PyCUDA to run. To deactivate the FFT optimization on a specific
      nnet.conv2d while the optimization flag is active, you can set
      its ``version`` parameter to ``'no_fft'``. To enable it for just
      one Theano function:

      .. code-block:: python

          mode = theano.compile.get_default_mode()
          mode = mode.including('conv_fft')
          f = theano.function(..., mode=mode)

...
@@ -90,17 +85,18 @@ TODO: Give examples on how to use these things! They are pretty complicated.

      Wrapper for an open-source GPU-only implementation of conv2d by Alex
      Krizhevsky, very fast, but with several restrictions on input and kernel
      shapes, and with a different memory layout for the input. It does not
      flip the kernel.
      This is in Pylearn2, where it is normally called from the `linear transform
      <http://deeplearning.net/software/pylearn2/library/linear.html>`_
      implementation, but it can also be used `directly from within Theano
      <http://benanne.github.io/2014/04/03/faster-convolutions-in-theano.html>`_
      as a manual replacement for nnet.conv2d.

    - :func:`GpuCorrMM <theano.sandbox.cuda.blas.GpuCorrMM>`
      This is a GPU-only 2d correlation implementation taken from
      `caffe <https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu>`_
      and also used by Torch.
      It does not flip the kernel.
      For each element in a batch, it first creates a
      `Toeplitz <http://en.wikipedia.org/wiki/Toeplitz_matrix>`_ matrix in a CUDA kernel.

...
@@ -110,36 +106,24 @@ TODO: Give examples on how to use these things! They are pretty complicated.

      ``(no of channels * filter width * filter height, output width * output height)``.
      As it provides a gradient, you can use it as a replacement for nnet.conv2d.
      But usually, you will just use nnet.conv2d and allow Theano's graph
      optimizer to automatically replace it by the GEMM version if cuDNN is not
      available. To explicitly disable the graph optimizer, set
      ``THEANO_FLAGS=optimizer_excluding=conv_gemm`` in your environment.
      If using it, please see the warning about a bug in CUDA 5.0 to 6.0 below.

    - :func:`dnn_conv <theano.sandbox.cuda.dnn.dnn_conv>` GPU-only
      convolution using NVIDIA's cuDNN library. This requires that you have
      cuDNN installed and available, which in turn requires CUDA 6.5 and a GPU
      with compute capability 3.0 or more.
      If cuDNN is available, by default, Theano will replace all nnet.conv2d
      operations with dnn_conv. To explicitly disable it, set
      ``THEANO_FLAGS=optimizer_excluding=conv_dnn`` in your environment.
      As dnn_conv has a gradient defined, you can also use it manually.

- Implemented operators for neural network 3D / video convolution:

    - :func:`conv3D <theano.tensor.nnet.Conv3D.conv3D>`
      3D Convolution applying multi-channel 3D filters to batches of
      multi-channel 3D images. It does not flip the kernel.

    - :func:`conv3d_fft <theano.sandbox.cuda.fftconv.conv3d_fft>`
      GPU-only version of conv3D using FFT transform. conv3d_fft should
      not be called directly as it does not provide a gradient.

...
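The fallback order described in these docs (FFT only when explicitly included, then cuDNN, then gemm, then the legacy code) can be sketched as a small priority-based selection. The following is a toy model, not Theano's actual optimizer database; the function name and selection logic are simplifying assumptions, but the tags mirror ``conv_fft``, ``conv_dnn``/``cudnn`` and ``conv_gemm`` from the text above:

```python
def pick_convolution(available, including=(), excluding=()):
    """Toy model of optimizer_including/optimizer_excluding selection.

    Hypothetical illustration only, not Theano code. `available` is the set
    of implementations usable on this machine.
    """
    # Candidates in priority order: (name, tags, enabled_by_default)
    candidates = [
        ("fft", {"conv_fft"}, False),     # fastest in some cases, most memory
        ("cudnn", {"conv_dnn", "cudnn"}, True),
        ("gemm", {"conv_gemm"}, True),
    ]
    for name, tags, default_on in candidates:
        if tags & set(excluding):
            continue  # disabled via optimizer_excluding
        if not default_on and not (tags & set(including)):
            continue  # off by default, and not requested via optimizer_including
        if name in available:
            return name
    return "legacy"  # slower, but needs no extra memory
```

For example, with cuDNN and gemm available the default pick is ``"cudnn"``, and excluding both ``conv_dnn`` and ``conv_gemm`` falls back to ``"legacy"``, matching the note above.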
theano/sandbox/cuda/dnn.py (+1 -1)

@@ -1089,7 +1089,7 @@ if cuda_available:

     from theano.sandbox.cuda.opt import (local_optimizer, gpu_optimizer,
                                          gpu_seqopt)

-    @register_opt('cudnn')
+    #@register_opt('cudnn') # this optimizer is registered in opt.py instead.
     @local_optimizer([GpuConv])
     def local_conv_dnn(node):
         if not dnn_available():

...
theano/sandbox/cuda/opt.py (+29 -33)

@@ -1105,12 +1105,9 @@ def local_gpu_softmax_with_bias(node):

             return [host_from_gpu(gpu_sm)]
     return False

-# Convolution, maxpooling
+# Convolution
 from theano.tensor.nnet import conv

-# We need a fixed order for the user interface.
-conv_groupopt = theano.gof.optdb.LocalGroupDB()
-conv_groupopt.__name__ = "gpu_conv_opts"
-register_opt('fast_compile', 'fast_run', 'gpu')(conv_groupopt)

 def _gpu_conv_to_fftconv(node):

...
@@ -1163,22 +1160,8 @@ def local_conv_fft_full(node):

         return

-@local_optimizer([GpuConv])
-def local_gpu_conv(node):
-    """
-    If cudnn is available, use it. Otherwise, use the gemm version.
-    """
-    if (isinstance(node.op, GpuConv) and
-            theano.sandbox.cuda.dnn.dnn_available()):
-        return theano.sandbox.cuda.dnn.local_conv_dnn.transform(node)
-    # If dnn isn't avail, the local_gpu_conv_legacy wil introduce the
-    # legacy opt. Then the local_conv_gemm will convert it to gemm
-    # opt.

 @local_optimizer([gpu_from_host, conv.ConvOp])
-def local_gpu_conv_legacy(node):
+def local_gpu_conv(node):
     """
     gpu_from_host(conv) -> gpu_conv(gpu_from_host)

...
@@ -1334,19 +1317,31 @@ def local_conv_gemm(node):

                 gpu_contiguous(kern), gpu_contiguous(img))]

-# Legacy opt first, as this is the only that move to the GPU.
-# Then fft, as disabled dy default. So if use enable it, it have prio
-# Then default, use dnn if avail
-# Then default, use gemm if dnn or fft didn't worked.
-# Normally, gemm should catch all case, so the legacy should never run.
-conv_groupopt.register('local_gpu_conv_legacy', local_gpu_conv_legacy, 0,
-                       'fast_compile', 'fast_run')
-conv_groupopt.register("conv_fft_valid", local_conv_fft_valid, 1)
-conv_groupopt.register("conv_fft_full", local_conv_fft_full, 1)
-# Use dnn if avail, so have the dnn tag to be able to disable it.
-conv_groupopt.register('local_gpu_conv', local_gpu_conv, 10,
-                       'fast_compile', 'fast_run', 'cudnn')
-conv_groupopt.register('local_conv_gemm', local_conv_gemm, 12,
-                       'fast_compile', 'fast_run')
+# First we register the optimizer that moves convolutions to the GPU.
+register_opt()(local_gpu_conv)
+
+# Then we create a group of optimizers that replace the legacy GpuConv
+# with other implementations. They are tried in a specific order so we
+# can control which ones take precedence over others.
+conv_groupopt = theano.gof.optdb.LocalGroupDB()
+conv_groupopt.__name__ = "gpu_conv_opts"
+register_opt()(conv_groupopt)
+# FFT gets the highest priority (lowest number), but is disabled by default.
+# It can be enabled by including 'conv_fft'.
+conv_groupopt.register('conv_fft_valid', local_conv_fft_valid, 10, 'conv_fft')
+conv_groupopt.register('conv_fft_full', local_conv_fft_full, 10, 'conv_fft')
+# cuDNN is the second, but only registered if cuDNN is available.
+# It can be disabled by excluding 'conv_dnn' or 'cudnn'.
+from . import dnn
+if dnn.dnn_available():
+    conv_groupopt.register('conv_dnn', dnn.local_conv_dnn, 20,
+                           'fast_compile', 'fast_run', 'cudnn')
+# The GEMM-based convolution comes last to catch all remaining cases.
+# It can be disabled by excluding 'conv_gemm'.
+conv_groupopt.register('conv_gemm', local_conv_gemm, 30,
+                       'fast_compile', 'fast_run')

...
@@ -1500,6 +1495,7 @@ def local_convtransp3d_gemm(node):

 gpu_optimizer.register("convtransp3d_gemm", local_convtransp3d_gemm)

+# Pooling
 import theano.tensor.signal.downsample as downsample

...
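The registration scheme introduced in opt.py can be illustrated with a self-contained sketch. ``MiniGroupDB`` below is a hypothetical stand-in for ``theano.gof.optdb.LocalGroupDB`` (the real class has a different API); it only demonstrates how numeric positions and tags produce the FFT-before-cuDNN-before-GEMM ordering and the include/exclude behaviour:

```python
class MiniGroupDB:
    """Hypothetical stand-in for theano.gof.optdb.LocalGroupDB."""

    def __init__(self):
        self._entries = []  # (position, name, tags, opt)

    def register(self, name, opt, position, *tags):
        # The registered name also acts as a tag, so an optimizer can be
        # excluded by its own name (as with 'conv_dnn' above).
        self._entries.append((position, name, set(tags) | {name}, opt))

    def query(self, including=(), excluding=()):
        selected = []
        # Lower positions run first; the sort is stable for equal positions.
        for position, name, tags, opt in sorted(self._entries,
                                                key=lambda e: e[0]):
            if tags & set(excluding):
                continue  # dropped via optimizer_excluding
            if (not tags & {"fast_compile", "fast_run"}
                    and not tags & set(including)):
                continue  # opt-in only, not requested via optimizer_including
            selected.append(name)
        return selected


# Mirroring the registrations above (None stands in for the optimizers):
db = MiniGroupDB()
db.register("conv_fft_valid", None, 10, "conv_fft")
db.register("conv_fft_full", None, 10, "conv_fft")
db.register("conv_dnn", None, 20, "fast_compile", "fast_run", "cudnn")
db.register("conv_gemm", None, 30, "fast_compile", "fast_run")
```

By default ``db.query()`` yields ``['conv_dnn', 'conv_gemm']``; including ``'conv_fft'`` puts the FFT optimizers first, and excluding ``'cudnn'`` or ``'conv_gemm'`` removes the corresponding entries, which is exactly the precedence the commit message and docs describe.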