testgroup / pytensor · Commits · 8cb9d50e

Commit 8cb9d50e, authored Aug 04, 2015 by Frédéric Bastien

    Merge pull request #3245 from carriepl/v3

    CuDNN v3

Parents: 2b83b6ac, 4d4be316

Showing 10 changed files with 2137 additions and 317 deletions.
doc/library/sandbox/cuda/dnn.txt                     +61  -8
theano/sandbox/cuda/cudnn_helper.h                   +93  -0
theano/sandbox/cuda/dnn.py                           +781 -242
theano/sandbox/cuda/dnn_base.c                       +30  -38
theano/sandbox/cuda/dnn_conv_base.c                  +37  -2
theano/sandbox/cuda/dnn_fwd.c                        +199 -7
theano/sandbox/cuda/dnn_gi.c                         +206 -4
theano/sandbox/cuda/dnn_gw.c                         +207 -4
theano/sandbox/cuda/tests/test_conv_cuda_ndarray.py  +29  -8
theano/sandbox/cuda/tests/test_dnn.py                +494 -4
doc/library/sandbox/cuda/dnn.txt

@@ -36,12 +36,61 @@ To get an error if Theano can not use cuDNN, use this Theano flag:

 .. note::

-    CuDNN v2 is now released, if you used any v2 release candidate, we
-    strongly suggest that you update it to the final version. From now
-    on, we only support the final release.
-
-    CuDNN v2 is much faster than v1. We recommend that everybody
-    updates to v2.
+    CuDNN v3 has now been released. CuDNN v2 remains supported but CuDNN v3 is
+    faster and offers many more options. We recommend that everybody update to
+    v3.
+
+.. note::
+
+    Starting in CuDNN v3, multiple convolution implementations are offered and
+    it is possible to use heuristics to automatically choose a convolution
+    implementation well suited to the parameters of the convolution.
+
+    The Theano flag ``dnn.conv.algo_fwd`` allows specifying the CuDNN
+    convolution implementation that Theano should use for forward convolutions.
+    Possible values include:
+
+    * ``small`` (default): use a convolution implementation with small memory
+      usage
+    * ``none``: use a slower implementation with minimal memory usage
+    * ``large``: use a faster implementation with large memory usage
+    * ``fft``: use the Fast Fourier Transform implementation of convolution
+      (very high memory usage)
+    * ``guess_once``: the first time a convolution is executed, the
+      implementation to use is chosen according to CuDNN's heuristics and
+      reused for every subsequent execution of the convolution.
+    * ``guess_on_shape_change``: like ``guess_once`` but a new convolution
+      implementation is selected every time the shapes of the inputs and
+      kernels don't match the shapes from the last execution.
+    * ``time_once``: the first time a convolution is executed, every
+      convolution implementation offered by CuDNN is executed and timed. The
+      fastest is reused for every subsequent execution of the convolution.
+    * ``time_on_shape_change``: like ``time_once`` but a new convolution
+      implementation is selected every time the shapes of the inputs and
+      kernels don't match the shapes from the last execution.
+
+    The Theano flag ``dnn.conv.algo_bwd`` allows specifying the CuDNN
+    convolution implementation that Theano should use for gradient
+    convolutions. Possible values include:
+
+    * ``none`` (default): use the default non-deterministic convolution
+      implementation
+    * ``deterministic``: use a slower but deterministic implementation
+    * ``fft``: use the Fast Fourier Transform implementation of convolution
+      (very high memory usage)
+    * ``guess_once``: the first time a convolution is executed, the
+      implementation to use is chosen according to CuDNN's heuristics and
+      reused for every subsequent execution of the convolution.
+    * ``guess_on_shape_change``: like ``guess_once`` but a new convolution
+      implementation is selected every time the shapes of the inputs and
+      kernels don't match the shapes from the last execution.
+    * ``time_once``: the first time a convolution is executed, every
+      convolution implementation offered by CuDNN is executed and timed. The
+      fastest is reused for every subsequent execution of the convolution.
+    * ``time_on_shape_change``: like ``time_once`` but a new convolution
+      implementation is selected every time the shapes of the inputs and
+      kernels don't match the shapes from the last execution.

@@ -51,13 +100,16 @@ To get an error if Theano can not use cuDNN, use this Theano flag:

 .. note::

-    The documentation of CUDNN R1 and R2 tells that, for the following
-    2 operations, the reproducibility is not guaranteed:
+    The documentation of CUDNN tells that, for the 2 following operations, the
+    reproducibility is not guaranteed with the default implementation:
     `cudnnConvolutionBackwardFilter` and `cudnnConvolutionBackwardData`.
     Those correspond to the gradient wrt the weights and the gradient wrt the
     input of the convolution. They are also used sometimes in the forward
     pass, when they give a speed up.
+
+    The Theano flag ``dnn.conv.algo_bwd`` can be used to force the use of a
+    slower but deterministic convolution implementation.

 .. note::

     There is a problem we do not understand yet when cudnn paths are

@@ -79,7 +131,8 @@ Convolution Ops
 ===============

 .. automodule:: theano.sandbox.cuda.dnn
-    :members: GpuDnnConvDesc, GpuDnnConv, GpuDnnConvGradW, GpuDnnConvGradI
+    :members: GpuDnnConvDesc, GpuDnnConv, GpuDnnConv3d, GpuDnnConvGradW,
+              GpuDnnConv3dGradW, GpuDnnConvGradI, GpuDnnConv3dGradI

 Pooling Ops
 ===========
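The four ``guess_*``/``time_*`` policies described in the note above differ only in *when* a new implementation is selected. A minimal standalone sketch of that selection logic (not Theano code; `make_selector` and `choose_impl` are illustrative names, where `choose_impl` stands in for CuDNN's heuristic or for timing every implementation):

```python
def make_selector(policy, choose_impl):
    """Return a function mapping (img_shape, kern_shape) -> implementation.

    "*_once" policies choose on the first call and keep that choice forever;
    "*_on_shape_change" policies re-choose whenever the shapes differ from
    the previous call.
    """
    state = {}  # remembers the last choice and the shapes it was made for

    def select(img_shape, kern_shape):
        key = (img_shape, kern_shape)
        if 'impl' not in state:
            state['impl'] = choose_impl(img_shape, kern_shape)
            state['key'] = key
        elif policy.endswith('on_shape_change') and state['key'] != key:
            state['impl'] = choose_impl(img_shape, kern_shape)
            state['key'] = key
        return state['impl']

    return select

calls = []
sel = make_selector('guess_on_shape_change',
                    lambda i, k: calls.append((i, k)) or len(calls))
sel((1, 1, 8, 8), (1, 1, 3, 3))   # first call: chooses
sel((1, 1, 8, 8), (1, 1, 3, 3))   # same shapes: reuses
sel((2, 1, 8, 8), (1, 1, 3, 3))   # shapes changed: chooses again
```

Under a ``*_once`` policy the third call would have reused the first choice.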
theano/sandbox/cuda/cudnn_helper.h

@@ -41,6 +41,20 @@ static inline const char *cudnnGetErrorString(cudnnStatus_t err) {

 typedef cudnnTensor4dDescriptor_t cudnnTensorDescriptor_t;

+static inline cudnnStatus_t cudnnSetTensorNdDescriptor(
+    cudnnTensorDescriptor_t tensorDesc, cudnnDataType_t dataType,
+    int nbDims, const int dimA[], const int strideA[]) {
+  /* Only the 4d case can be forwarded to the old fixed-rank API. */
+  if (nbDims != 4)
+    return CUDNN_STATUS_NOT_SUPPORTED;
+  return cudnnSetTensor4dDescriptorEx(tensorDesc, dataType,
+                                      dimA[0], dimA[1], dimA[2], dimA[3],
+                                      strideA[0], strideA[1],
+                                      strideA[2], strideA[3]);
+}
+
 static inline cudnnStatus_t cudnnGetConvolution2dForwardOutputDim(
     const cudnnConvolutionDescriptor_t convDesc,
@@ -183,6 +197,85 @@ cudnnConvolutionBackwardData_v2(

 #define cudnnConvolutionBackwardData cudnnConvolutionBackwardData_v2

+static inline cudnnStatus_t cudnnSetPoolingNdDescriptor(
+    cudnnPoolingDescriptor_t poolingDesc, const cudnnPoolingMode_t mode,
+    int nbDims, const int windowDimA[], const int paddingA[],
+    const int strideA[]) {
+  /* The old API is 2d only and has no padding support. */
+  if (nbDims != 2)
+    return CUDNN_STATUS_NOT_SUPPORTED;
+  if (paddingA[0] != 0 || paddingA[1] != 0)
+    return CUDNN_STATUS_NOT_SUPPORTED;
+  return cudnnSetPoolingDescriptor(poolingDesc, mode,
+                                   windowDimA[0], windowDimA[1],
+                                   strideA[0], strideA[1]);
+}
+
+static inline cudnnStatus_t cudnnGetPoolingNdDescriptor(
+    const cudnnPoolingDescriptor_t poolingDesc, const int nbDimsRequested,
+    cudnnPoolingMode_t *mode, int *nbDims,
+    int windowA[], int paddingA[], int strideA[]) {
+  int win0, win1, str0, str1;
+  cudnnStatus_t err;
+  if (nbDimsRequested < 2)
+    return CUDNN_STATUS_NOT_SUPPORTED;
+  err = cudnnGetPoolingDescriptor(poolingDesc, mode, &win0, &win1,
+                                  &str0, &str1);
+  if (err != CUDNN_STATUS_SUCCESS)
+    return err;
+  *nbDims = 2;
+  paddingA[0] = 0;
+  paddingA[1] = 0;
+  windowA[0] = win0;
+  windowA[1] = win1;
+  strideA[0] = str0;
+  strideA[1] = str1;
+  return CUDNN_STATUS_SUCCESS;
+}
+
+static inline cudnnStatus_t cudnnPoolingForward_v2(
+    cudnnHandle_t handle, const cudnnPoolingDescriptor_t poolingDesc,
+    const void *alpha, const cudnnTensorDescriptor_t srcDesc,
+    const void *srcData, const void *beta,
+    const cudnnTensorDescriptor_t destDesc, void *destData) {
+  /* The old API has no alpha/beta scaling, so only the identity case
+     (alpha == 1, beta == 0) is supported. */
+  if (*(float *)alpha != 1.0 || *(float *)beta != 0.0)
+    return CUDNN_STATUS_NOT_SUPPORTED;
+  return cudnnPoolingForward(handle, poolingDesc, srcDesc, srcData,
+                             destDesc, destData);
+}
+#define cudnnPoolingForward cudnnPoolingForward_v2
+
+static inline cudnnStatus_t cudnnPoolingBackward_v2(
+    cudnnHandle_t handle, const cudnnPoolingDescriptor_t poolingDesc,
+    const void *alpha, const cudnnTensorDescriptor_t srcDesc,
+    const void *srcData, const cudnnTensorDescriptor_t srcDiffDesc,
+    const void *srcDiffData, const cudnnTensorDescriptor_t destDesc,
+    const void *destData, const void *beta,
+    const cudnnTensorDescriptor_t destDiffDesc, void *destDiffData) {
+  if (*(float *)alpha != 1.0 || *(float *)beta != 0.0)
+    return CUDNN_STATUS_NOT_SUPPORTED;
+  return cudnnPoolingBackward(handle, poolingDesc, srcDesc, srcData,
+                              srcDiffDesc, srcDiffData, destDesc, destData,
+                              destDiffDesc, destDiffData);
+}
+#define cudnnPoolingBackward cudnnPoolingBackward_v2
+
 // Needed for R2 rc2
 #define CUDNN_POOLING_AVERAGE_COUNT_INCLUDE_PADDING CUDNN_POOLING_AVERAGE

 #else
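The wrappers in this header all follow one pattern: emulate a newer N-dimensional entry point on top of an older fixed-rank call, returning `CUDNN_STATUS_NOT_SUPPORTED` for any case the old API cannot express. The same pattern, sketched in Python (names like `set_pooling_nd` are illustrative, not a real API):

```python
NOT_SUPPORTED = 'CUDNN_STATUS_NOT_SUPPORTED'
SUCCESS = 'CUDNN_STATUS_SUCCESS'

def set_pooling_2d(desc, mode, wh, ww, sh, sw):
    # Stands in for the old fixed-rank cudnnSetPoolingDescriptor.
    desc.update(mode=mode, window=(wh, ww), stride=(sh, sw))
    return SUCCESS

def set_pooling_nd(desc, mode, nb_dims, window, padding, stride):
    # Only the 2D, zero-padding case can be forwarded to the old API;
    # everything else is rejected rather than silently mishandled.
    if nb_dims != 2:
        return NOT_SUPPORTED
    if padding[0] != 0 or padding[1] != 0:
        return NOT_SUPPORTED
    return set_pooling_2d(desc, mode, window[0], window[1],
                          stride[0], stride[1])

desc = {}
assert set_pooling_nd(desc, 'max', 2, [2, 2], [0, 0], [1, 1]) == SUCCESS
assert set_pooling_nd(desc, 'max', 3, [2, 2, 2], [0, 0, 0],
                      [1, 1, 1]) == NOT_SUPPORTED
```

Rejecting the unexpressible cases up front lets the caller fall back cleanly instead of getting a silently wrong descriptor.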
theano/sandbox/cuda/dnn.py

 import os
 import numpy
+import warnings

 import theano
-from theano import Apply, gof, tensor, config, Variable
-from theano.scalar import as_scalar, constant
+from theano import Apply, tensor, config, Variable
+from theano.scalar import as_scalar, constant, Log
 from theano.gradient import DisconnectedType, grad_not_implemented
 from theano.gof import Optimizer, local_optimizer, COp
 from theano.gof.type import CDataType, Generic

@@ -17,7 +18,8 @@ from theano.sandbox.cuda import GpuOp

 from theano.sandbox.cuda.basic_ops import (as_cuda_ndarray_variable,
                                            host_from_gpu,
                                            gpu_contiguous, HostFromGpu,
-                                           gpu_alloc_empty, GpuAllocEmpty)
+                                           gpu_alloc_empty, GpuAllocEmpty,
+                                           GpuElemwise)
 from theano.sandbox.cuda.blas import (GpuConv, GpuDownsampleFactorMax,
                                       GpuDownsampleFactorMaxGrad)
 from theano.sandbox.cuda.nnet import GpuSoftmax
@@ -236,10 +238,10 @@ class GpuDnnConvDesc(GpuOp):

     def __init__(self, border_mode, subsample=(1, 1), conv_mode='conv'):
         if isinstance(border_mode, int):
-            border_mode = (border_mode, border_mode)
+            border_mode = (border_mode,) * len(subsample)
         if isinstance(border_mode, tuple):
-            pad_h, pad_w = map(int, border_mode)
-            border_mode = (pad_h, pad_w)
+            assert len(border_mode) == len(subsample)
+            border_mode = tuple(map(int, border_mode))
         if not ((isinstance(border_mode, tuple) and min(border_mode) >= 0) or
                 border_mode in ('valid', 'full')):
             raise ValueError(

@@ -247,7 +249,7 @@ class GpuDnnConvDesc(GpuOp):

                 '"valid", "full", an integer or a pair of'
                 ' integers'.format(border_mode))
         self.border_mode = border_mode
-        assert len(subsample) == 2
+        assert len(subsample) in [2, 3]
         self.subsample = subsample
         assert conv_mode in ('conv', 'cross')
         self.conv_mode = conv_mode
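The `__init__` change above generalizes padding from a fixed `(pad_h, pad_w)` pair to a tuple whose length follows the number of spatial dimensions in `subsample`. A standalone sketch of that normalization (`normalize_border_mode` is an illustrative name, not a Theano function):

```python
def normalize_border_mode(border_mode, subsample):
    # An integer means the same padding on every spatial dimension.
    if isinstance(border_mode, int):
        border_mode = (border_mode,) * len(subsample)
    # A tuple must have one entry per spatial dimension.
    if isinstance(border_mode, tuple):
        assert len(border_mode) == len(subsample)
        border_mode = tuple(map(int, border_mode))
    if not ((isinstance(border_mode, tuple) and min(border_mode) >= 0) or
            border_mode in ('valid', 'full')):
        raise ValueError('invalid border_mode: %r' % (border_mode,))
    return border_mode

assert normalize_border_mode(1, (1, 1)) == (1, 1)
assert normalize_border_mode(2, (1, 1, 1)) == (2, 2, 2)
assert normalize_border_mode('valid', (1, 1)) == 'valid'
```

The same normalization works unchanged for the 2D and the new 3D case, which is why `len(subsample) in [2, 3]` is the only rank check needed.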
@@ -266,12 +268,14 @@ class GpuDnnConvDesc(GpuOp):

         img_shape, kern_shape = inputs
         desc, = outputs
+        nb_dim = len(self.subsample)

         if isinstance(self.border_mode, tuple):
-            pad_h_spec, pad_w_spec = map(int, self.border_mode)
-            assert pad_h_spec >= 0 and pad_w_spec >= 0
+            pad_desc = map(int, self.border_mode)
+            assert min(pad_desc) >= 0
             bmode = 2
         else:
-            pad_h_spec = pad_w_spec = 0
+            pad_desc = [0] * nb_dim
             if self.border_mode == "valid":
                 bmode = 1

@@ -284,11 +288,13 @@ class GpuDnnConvDesc(GpuOp):

         else:
             conv_flag = 'CUDNN_CROSS_CORRELATION'

+        pad_str = ", ".join([str(s) for s in pad_desc])
+        subsample_str = ", ".join([str(s) for s in self.subsample])
+        upscale_str = ", ".join(["1"] * nb_dim)
+
         return """
 {
   cudnnStatus_t err;
-  int pad_h%(name)s;
-  int pad_w%(name)s;

   if ((err = cudnnCreateConvolutionDescriptor(&%(desc)s)) != CUDNN_STATUS_SUCCESS) {
     PyErr_Format(PyExc_MemoryError, "could not allocate convolution "
@@ -296,42 +302,29 @@ class GpuDnnConvDesc(GpuOp):

     %(fail)s
   }

-  if (%(bmode)d == 2) {
-    pad_h%(name)s = %(pad_h_spec)d;
-    pad_w%(name)s = %(pad_w_spec)d;
-  } else if (%(bmode)d == 1) {
-    pad_h%(name)s = 0;
-    pad_w%(name)s = 0;
-  } else if (%(bmode)d == 0) {
-    pad_h%(name)s = *(npy_int64 *)PyArray_GETPTR1(%(kern_shape)s, 2) - 1;
-    pad_w%(name)s = *(npy_int64 *)PyArray_GETPTR1(%(kern_shape)s, 3) - 1;
-  } else {
-    PyErr_SetString(PyExc_ValueError, "bad border mode");
-    %(fail)s
-  }
-#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 20
-  err = cudnnSetConvolution2dDescriptor(
-  %(desc)s,
-  pad_h%(name)s,
-  pad_w%(name)s,
-  %(subsx)d, %(subsy)d, 1, 1,
-  %(conv_flag)s
-  );
-#else
-  err = cudnnSetConvolutionDescriptorEx(
-  %(desc)s,
-  *(npy_int64 *)PyArray_GETPTR1(%(img_shape)s, 0),
-  *(npy_int64 *)PyArray_GETPTR1(%(img_shape)s, 1),
-  *(npy_int64 *)PyArray_GETPTR1(%(img_shape)s, 2),
-  *(npy_int64 *)PyArray_GETPTR1(%(img_shape)s, 3),
-  *(npy_int64 *)PyArray_GETPTR1(%(kern_shape)s, 0),
-  *(npy_int64 *)PyArray_GETPTR1(%(kern_shape)s, 2),
-  *(npy_int64 *)PyArray_GETPTR1(%(kern_shape)s, 3),
-  pad_h%(name)s,
-  pad_w%(name)s,
-  %(subsx)d, %(subsy)d, 1, 1,
-  %(conv_flag)s
-  );
+#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 30
+  int pad[%(nb_dim)d] = {%(pad_str)s};
+  int subsample[%(nb_dim)d] = {%(subsample_str)s};
+  int upscale[%(nb_dim)d] = {%(upscale_str)s};
+
+  // Adjust padding values if using full convolution
+  if (%(bmode)d == 0) {
+    pad[0] = *(npy_int64 *)PyArray_GETPTR1(%(kern_shape)s, 2) - 1;
+    pad[1] = *(npy_int64 *)PyArray_GETPTR1(%(kern_shape)s, 3) - 1;
+    if (%(nb_dim)d >= 3) {
+      pad[2] = *(npy_int64 *)PyArray_GETPTR1(%(kern_shape)s, 4) - 1;
+    }
+  }
+
+  err = cudnnSetConvolutionNdDescriptor(
+  %(desc)s,
+  %(nb_dim)d,
+  pad, subsample, upscale,
+  %(conv_flag)s
+  );
+#else
+  PyErr_Format(PyExc_RuntimeError, "could not set op descriptor: CUDNN_VERSION must be >= 30");
 #endif
   if (err != CUDNN_STATUS_SUCCESS) {
     PyErr_Format(PyExc_RuntimeError, "could not set op descriptor: %%s",
@@ -341,21 +334,43 @@ class GpuDnnConvDesc(GpuOp):

   }
 """ % dict(name=name, img_shape=img_shape, kern_shape=kern_shape, desc=desc,
            bmode=bmode, conv_flag=conv_flag, fail=sub['fail'],
-           subsx=self.subsample[0], subsy=self.subsample[1],
-           pad_h_spec=pad_h_spec, pad_w_spec=pad_w_spec)
+           pad_str=pad_str, subsample_str=subsample_str,
+           upscale_str=upscale_str, nb_dim=nb_dim)

     def c_code_cache_version(self):
         return (2, version())
 AddConfigVar('dnn.conv.workmem',
-             "Default value for the workmem attribute of cudnn convolutions.",
-             EnumStr('small', 'none', 'large'),
+             "This flag is deprecated; use dnn.conv.algo_fwd.",
+             EnumStr(''),
+             in_c_key=False)
+
+AddConfigVar('dnn.conv.workmem_bwd',
+             "This flag is deprecated; use dnn.conv.algo_bwd.",
+             EnumStr(''),
+             in_c_key=False)
+
+AddConfigVar('dnn.conv.algo_fwd',
+             "Default implementation to use for CuDNN forward convolution.",
+             EnumStr('small', 'none', 'large', 'fft', 'guess_once',
+                     'guess_on_shape_change', 'time_once',
+                     'time_on_shape_change'),
+             in_c_key=False)
+
+AddConfigVar('dnn.conv.algo_bwd',
+             "Default implementation to use for CuDNN backward convolution.",
+             EnumStr('none', 'deterministic', 'fft', 'guess_once',
+                     'guess_on_shape_change', 'time_once',
+                     'time_on_shape_change'),
              in_c_key=False)
 # scalar constants
 _zero = constant(numpy.asarray(0.0, dtype='float32'))
 _one = constant(numpy.asarray(1.0, dtype='float32'))
-_ifour = constant(numpy.asarray(4, dtype='int32'))
+_ifive = constant(numpy.asarray(5, dtype='int32'))

 def ensure_float(val, default, name):

@@ -372,6 +387,20 @@ def ensure_float(val, default, name):

     return val

+def ensure_int(val, default, name):
+    if val is None:
+        return default.clone()
+    if not isinstance(val, Variable):
+        val = constant(val)
+    if hasattr(val, 'ndim') and val.ndim == 0:
+        val = as_scalar(val)
+    if not isinstance(val.type, theano.scalar.Scalar):
+        raise TypeError("%s: expected a scalar value" % (name,))
+    if not val.type.dtype == 'int32':
+        raise TypeError("%s: type is not int32" % (name,))
+    return val
+
 class GpuDnnConv(DnnBase, COp):
     """
     The forward convolution.
@@ -380,29 +409,59 @@ class GpuDnnConv(DnnBase, COp):

     :param kernel:
     :param descr: the convolution descriptor

     """
-    __props__ = ('workmem', 'inplace')
+    __props__ = ('algo', 'inplace')
     __input_name__ = ('image', 'kernel', 'output',
                       'descriptor', 'alpha', 'beta')

-    def __init__(self, workmem=None, inplace=False):
+    def __init__(self, workmem=None, inplace=False, algo=None):
         """
-        :param workmem: either 'none', 'small' or 'large'. Default is
-        the value of :attr:`config.dnn.conv.workmem`.
+        :param workmem: *deprecated*, use param algo instead
+        :param algo: either 'small', 'none', 'large', 'fft', 'guess_once',
+        'guess_on_shape_change', 'time_once' or 'time_on_shape_change'.
+        Default is the value of :attr:`config.dnn.conv.algo_fwd`.
         """
         COp.__init__(self, ["dnn_base.c", "dnn_conv_base.c", "dnn_fwd.c"],
                      "APPLY_SPECIFIC(conv_fwd)")
-        if workmem is None:
-            workmem = config.dnn.conv.workmem
-        self.workmem = workmem
+        if workmem is not None:
+            warnings.warn(("GpuDnnConv: parameter 'workmem' is deprecated. "
+                           "Use 'algo' instead."), stacklevel=3)
+            assert algo is None
+            self.algo = workmem
+        else:
+            if algo is None:
+                algo = config.dnn.conv.algo_fwd
+            self.algo = algo
         self.inplace = inplace
         if self.inplace:
             self.destroy_map = {0: [2]}
-        assert self.workmem in ['none', 'small', 'large']
+        # In CuDNN version older than V3, the FFT implementation and the
+        # option to time the different implementations to get the fastest
+        # are both unavailable.
+        if version() < (3000, 3000):
+            if self.algo == 'fft':
+                raise RuntimeError("CuDNN FFT convolution requires CuDNN v3")
+            elif self.algo in ['guess_once', 'guess_on_shape_change']:
+                raise RuntimeError("CuDNN selection of convolution "
+                                   "implementation based on heuristics "
+                                   "requires CuDNN v3")
+            elif self.algo in ['time_once', 'time_on_shape_change']:
+                raise RuntimeError("CuDNN convolution timing requires CuDNN "
+                                   "v3")
+        assert self.algo in ['none', 'small', 'large', 'fft', 'guess_once',
+                             'guess_on_shape_change', 'time_once',
+                             'time_on_shape_change']

     def __setstate__(self, d):
         self.__dict__.update(d)
-        if not hasattr(self, 'workmem'):
-            self.workmem = 'none'
+        if not hasattr(self, 'algo'):
+            if hasattr(self, 'workmem'):
+                self.algo = self.workmem
+            else:
+                self.algo = config.dnn.conv.algo_fwd
         if not hasattr(self, 'inplace'):
             self.inplace = False
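The deprecation handling above reduces to a small precedence rule: an explicit `workmem` (with a warning) wins, then an explicit `algo`, then the config default. A standalone sketch of just that rule (`resolve_algo` is an illustrative name, not a Theano function):

```python
import warnings

def resolve_algo(workmem, algo, config_default):
    # Precedence: deprecated workmem > explicit algo > config default.
    if workmem is not None:
        warnings.warn("'workmem' is deprecated. Use 'algo' instead.")
        assert algo is None  # passing both is a usage error
        return workmem
    return algo if algo is not None else config_default

with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    assert resolve_algo('small', None, 'large') == 'small'
assert resolve_algo(None, 'fft', 'large') == 'fft'
assert resolve_algo(None, None, 'large') == 'large'
```

`__setstate__` applies the same rule when unpickling old graphs, which is why graphs saved with a `workmem` attribute keep working.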
@@ -411,17 +470,44 @@ class GpuDnnConv(DnnBase, COp):

             inpl_def = [('CONV_INPLACE', '1')]
         else:
             inpl_def = []

+        choose_alg = '0'
+        choose_alg_once = '0'
+        choose_alg_time = '0'
+
         if version() == -1:
-            alg_def = ('CONV_ALGO', "0")
+            alg = "0"
         else:
-            if self.workmem == 'none':
+            if self.algo == 'none':
                 alg = 'CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM'
-            elif self.workmem == 'small':
+            elif self.algo == 'small':
                 alg = 'CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM'
-            elif self.workmem == 'large':
+            elif self.algo == 'large':
                 alg = 'CUDNN_CONVOLUTION_FWD_ALGO_GEMM'
+            elif self.algo == 'fft':
+                alg = 'CUDNN_CONVOLUTION_FWD_ALGO_FFT'
+            elif self.algo in ['guess_once', 'guess_on_shape_change']:
+                # The convolution implementation should be chosen according
+                # to a heuristic
+                alg = 'CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM'
+                choose_alg = '1'
+                if self.algo == 'guess_once':
+                    choose_alg_once = '1'
+            elif self.algo in ['time_once', 'time_on_shape_change']:
+                # The convolution implementation should be chosen by timing
+                # every available implementation
+                alg = 'CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM'
+                choose_alg = '1'
+                choose_alg_time = '1'
+                if self.algo == 'time_once':
+                    choose_alg_once = '1'

-            alg_def = ('CONV_ALGO', alg)
+        alg_def = ('CONV_ALGO', alg)
+        alg_choose_def = ('CHOOSE_ALGO', choose_alg)
+        alg_choose_once_def = ('CHOOSE_ALGO_ONCE', choose_alg_once)
+        alg_choose_time_def = ('CHOOSE_ALGO_TIME', choose_alg_time)

-        return [alg_def] + inpl_def
+        return [alg_def, alg_choose_def, alg_choose_once_def,
+                alg_choose_time_def] + inpl_def

     def make_node(self, img, kern, output, desc, alpha=None, beta=None):
         img = as_cuda_ndarray_variable(img)
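The defines emitted by `get_op_params` above can be summarized as a pure function from the algo name to the four C preprocessor values. A sketch of the forward case (`fwd_defines` is an illustrative name; the `version()` checks are ignored):

```python
def fwd_defines(algo):
    """Map an algo name to (CONV_ALGO, CHOOSE_ALGO, CHOOSE_ALGO_ONCE,
    CHOOSE_ALGO_TIME), mirroring the branches above."""
    table = {
        'none': 'CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM',
        'small': 'CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM',
        'large': 'CUDNN_CONVOLUTION_FWD_ALGO_GEMM',
        'fft': 'CUDNN_CONVOLUTION_FWD_ALGO_FFT',
    }
    if algo in table:
        # Fixed choice: the generated C code never re-selects.
        return (table[algo], '0', '0', '0')
    # guess_*/time_* start from PRECOMP_GEMM and let the C code re-select.
    choose_time = '1' if algo.startswith('time') else '0'
    choose_once = '1' if algo.endswith('once') else '0'
    return ('CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM',
            '1', choose_once, choose_time)

assert fwd_defines('large')[0] == 'CUDNN_CONVOLUTION_FWD_ALGO_GEMM'
assert fwd_defines('guess_once') == (
    'CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM', '1', '1', '0')
```

Keeping the selection policy in compile-time defines means a graph compiled with `guess_once` and one compiled with `small` produce different cached C modules, which is why these values go through `get_op_params` rather than runtime state.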
@@ -450,8 +536,10 @@ class GpuDnnConv(DnnBase, COp):

         top = gpu_contiguous(top)

-        d_img = GpuDnnConvGradI()(kerns, top, gpu_alloc_empty(*img.shape), desc)
-        d_kerns = GpuDnnConvGradW()(img, top, gpu_alloc_empty(*kerns.shape), desc)
+        d_img = GpuDnnConvGradI()(kerns, top, gpu_alloc_empty(*img.shape),
+                                  desc)
+        d_kerns = GpuDnnConvGradW()(img, top, gpu_alloc_empty(*kerns.shape),
+                                    desc)
         d_alpha = grad_not_implemented(self, 4, alpha)
         d_beta = grad_not_implemented(self, 5, beta)
@@ -497,6 +585,108 @@ class GpuDnnConv(DnnBase, COp):

         return [shape[2]]

+class GpuDnnConv3d(GpuDnnConv):
+    """
+    The forward convolution.
+
+    :param image:
+    :param kernel:
+    :param descr: the convolution descriptor
+    """
+    __props__ = ('algo', 'inplace')
+    __input_name__ = ('image', 'kernel', 'output',
+                      'descriptor', 'alpha', 'beta')
+
+    def __init__(self, workmem=None, inplace=False, algo=None):
+        """
+        :param workmem: *deprecated*, use param algo instead
+        :param algo: either 'none', 'guess_once', 'guess_on_shape_change',
+        'time_once' or 'time_on_shape_change'.
+        Default is the value of :attr:`config.dnn.conv.algo_fwd`.
+        """
+        if workmem is not None:
+            warnings.warn(("GpuDnnConv3d: parameter 'workmem' is deprecated. "
+                           "Use 'algo' instead."), stacklevel=3)
+            assert algo is None
+            algo = workmem
+
+        super(GpuDnnConv3d, self).__init__(inplace=inplace, algo='none')
+
+        assert self.algo in ['none', 'guess_once', 'guess_on_shape_change',
+                             'time_once', 'time_on_shape_change']
+
+    def make_node(self, img, kern, output, desc, alpha=None, beta=None):
+        img = as_cuda_ndarray_variable(img)
+        kern = as_cuda_ndarray_variable(kern)
+        output = as_cuda_ndarray_variable(output)
+        if img.type.ndim != 5:
+            raise TypeError('img must be 5D tensor')
+        if kern.type.ndim != 5:
+            raise TypeError('kern must be 5D tensor')
+        if output.type.ndim != 5:
+            raise TypeError('output must be a 5D tensor')
+
+        if not isinstance(desc.type, CDataType) \
+                or desc.type.ctype != 'cudnnConvolutionDescriptor_t':
+            raise TypeError('desc must be cudnnConvolutionDescriptor_t')
+
+        alpha = ensure_float(alpha, _one, 'alpha')
+        beta = ensure_float(beta, _zero, 'beta')
+
+        return Apply(self, [img, kern, output, desc, alpha, beta],
+                     [output.type()])
+
+    def grad(self, inp, grads):
+        img, kerns, output, desc, alpha, beta = inp
+        top, = grads
+
+        top = gpu_contiguous(top)
+
+        d_img = GpuDnnConv3dGradI()(kerns, top, gpu_alloc_empty(*img.shape),
+                                    desc)
+        d_kerns = GpuDnnConv3dGradW()(img, top,
+                                      gpu_alloc_empty(*kerns.shape), desc)
+        d_alpha = grad_not_implemented(self, 4, alpha)
+        d_beta = grad_not_implemented(self, 5, beta)
+
+        return [d_img * alpha, d_kerns * alpha, top * beta,
+                DisconnectedType()(), d_alpha, d_beta]
+
+    @staticmethod
+    def get_out_shape(ishape, kshape, border_mode, subsample):
+        """
+        This function computes the output shape for a convolution with
+        the specified parameters. `ishape` and `kshape` can be symbolic
+        or scalar.
+        """
+        b = ishape[0]   # Number of inputs
+        d = ishape[2]   # Depth of input feature maps
+        h = ishape[3]   # Height of input feature maps
+        w = ishape[4]   # Width of input feature maps
+        nb = kshape[0]  # Number of output feature maps
+        kd = kshape[2]  # Depth of each filter
+        kh = kshape[3]  # Height of each filter
+        kw = kshape[4]  # Width of each filter
+
+        sd, sh, sw = subsample
+
+        if border_mode == 'full':
+            padd = kd - 1
+            padh = kh - 1
+            padw = kw - 1
+        elif isinstance(border_mode, tuple):
+            padd, padh, padw = border_mode
+        else:
+            assert border_mode == 'valid'
+            padd = 0
+            padh = 0
+            padw = 0
+
+        return (b, nb,
+                (d + 2 * padd - kd) // sd + 1,
+                (h + 2 * padh - kh) // sh + 1,
+                (w + 2 * padw - kw) // sw + 1)
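The arithmetic in `get_out_shape` above is the standard convolution output-size formula applied per spatial axis. A concrete worked instance (a sketch, independent of Theano; `out_extent` is an illustrative helper):

```python
def out_extent(n, k, pad, stride):
    # One spatial axis: (input + 2*padding - kernel) // stride + 1
    return (n + 2 * pad - k) // stride + 1

# Batch of 2, 1 input channel, a 16x16x16 volume; 4 filters of 3x3x3,
# 'valid' padding (0 on every axis) and unit strides:
shape = (2, 4) + tuple(out_extent(16, 3, 0, 1) for _ in range(3))
assert shape == (2, 4, 14, 14, 14)
```

With 'full' mode the padding becomes `kernel - 1` per axis, so the same volume would come out as 18x18x18 instead.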
 class GpuDnnConvGradW(DnnBase, COp):
     """
     The convolution gradient with respect to the weights.

@@ -506,18 +696,43 @@ class GpuDnnConvGradW(DnnBase, COp):

     :param descr: the convolution descriptor

     """
-    __props__ = ('inplace',)
+    __props__ = ('algo', 'inplace',)
     __input_name__ = ('image', 'grad', 'output', 'descriptor', 'alpha', 'beta')

-    def __init__(self, inplace=False):
+    def __init__(self, inplace=False, workmem=None, algo=None):
+        """
+        :param workmem: *deprecated*, use param algo instead
+        :param algo: either 'none', 'deterministic', 'fft', 'guess_once',
+        'guess_on_shape_change', 'time_once' or 'time_on_shape_change'.
+        Default is the value of :attr:`config.dnn.conv.algo_bwd`.
+        """
         COp.__init__(self, ["dnn_base.c", "dnn_conv_base.c", "dnn_gw.c"],
                      "APPLY_SPECIFIC(conv_gw)")
+        if workmem is not None:
+            warnings.warn(("GpuDnnConvGradW: parameter 'workmem' is "
+                           "deprecated. Use 'algo' instead."), stacklevel=3)
+            assert algo is None
+            self.algo = workmem
+        else:
+            if algo is None:
+                algo = config.dnn.conv.algo_bwd
+            self.algo = algo
         self.inplace = inplace
         if self.inplace:
             self.destroy_map = {0: [2]}
+        assert self.algo in ['none', 'deterministic', 'fft', 'guess_once',
+                             'guess_on_shape_change', 'time_once',
+                             'time_on_shape_change']

     def __setstate__(self, d):
         self.__dict__.update(d)
+        if not hasattr(self, 'algo'):
+            if hasattr(self, 'workmem'):
+                self.algo = self.workmem
+            else:
+                self.algo = config.dnn.conv.algo_bwd
         if not hasattr(self, 'inplace'):
             self.inplace = False
@@ -527,7 +742,8 @@ class GpuDnnConvGradW(DnnBase, COp):

         kerns = gpu_contiguous(kerns)

         d_img = GpuDnnConvGradI()(kerns, top, gpu_alloc_empty(*img.shape),
                                   desc)
-        d_top = GpuDnnConv()(img, kerns, gpu_alloc_empty(*top.shape), desc)
+        d_top = GpuDnnConv()(img, kerns, gpu_alloc_empty(*top.shape),
+                             desc)
         d_alpha = grad_not_implemented(self, 4, alpha)
         d_beta = grad_not_implemented(self, 5, beta)
...
@@ -541,9 +757,46 @@ class GpuDnnConvGradW(DnnBase, COp):

     def get_op_params(self):
         if self.inplace:
-            return [('CONV_INPLACE', '1')]
+            inplace_def = [('CONV_INPLACE', '1')]
         else:
-            return []
+            inplace_def = []
+
+        choose_alg = '0'
+        choose_alg_once = '0'
+        choose_alg_time = '0'
+
+        if version() == -1 or version() < (3000, 3000):
+            alg = "0"
+        else:
+            if self.algo == 'none':
+                alg = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0'
+            elif self.algo == 'deterministic':
+                alg = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1'
+            elif self.algo == 'fft':
+                alg = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT'
+            elif self.algo in ['guess_once', 'guess_on_shape_change']:
+                # The convolution implementation should be chosen according
+                # to a heuristic
+                alg = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0'
+                choose_alg = '1'
+                if self.algo == 'guess_once':
+                    choose_alg_once = '1'
+            elif self.algo in ['time_once', 'time_on_shape_change']:
+                # The convolution implementation should be chosen according
+                # to timing
+                alg = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0'
+                choose_alg = '1'
+                choose_alg_time = '1'
+                if self.algo == 'time_once':
+                    choose_alg_once = '1'
+
+        alg_def = ('CONV_ALGO', alg)
+        alg_choose_def = ('CHOOSE_ALGO', choose_alg)
+        alg_choose_once_def = ('CHOOSE_ALGO_ONCE', choose_alg_once)
+        alg_choose_time_def = ('CHOOSE_ALGO_TIME', choose_alg_time)
+
+        return inplace_def + [alg_def, alg_choose_def, alg_choose_once_def,
+                              alg_choose_time_def]
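`get_op_params` above turns the chosen `algo` into C preprocessor definitions that the op's C sources (dnn_gw.c) consume. A pure-Python sketch of the same mapping, assuming the cuDNN v3 enum names shown in the diff (the helper `bwd_filter_macros` and the default version tuple are illustrative, not a real Theano API):

```python
def bwd_filter_macros(algo, inplace=False, cudnn_version=(3007, 3007)):
    # Mirrors GpuDnnConvGradW.get_op_params: returns (macro, value) pairs.
    defines = [('CONV_INPLACE', '1')] if inplace else []
    choose, once, timed = '0', '0', '0'
    if cudnn_version == -1 or cudnn_version < (3000, 3000):
        alg = '0'  # pre-v3 cuDNN exposes a single implementation
    elif algo == 'none':
        alg = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0'
    elif algo == 'deterministic':
        alg = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1'
    elif algo == 'fft':
        alg = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT'
    elif algo in ('guess_once', 'guess_on_shape_change'):
        # Implementation chosen at runtime by a cuDNN heuristic.
        alg, choose = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0', '1'
        once = '1' if algo == 'guess_once' else '0'
    elif algo in ('time_once', 'time_on_shape_change'):
        # Implementation chosen at runtime by timing the candidates.
        alg, choose, timed = 'CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0', '1', '1'
        once = '1' if algo == 'time_once' else '0'
    return defines + [('CONV_ALGO', alg), ('CHOOSE_ALGO', choose),
                      ('CHOOSE_ALGO_ONCE', once), ('CHOOSE_ALGO_TIME', timed)]

print(dict(bwd_filter_macros('fft'))['CONV_ALGO'])
# → CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT
```

The `*_once` variants pin the choice after the first call, while `*_on_shape_change` re-select whenever the input shapes change; the C side reads `CHOOSE_ALGO_ONCE`/`CHOOSE_ALGO_TIME` to decide which behaviour to compile in.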
     def make_node(self, img, topgrad, output, desc, alpha=None, beta=None):
         img = as_cuda_ndarray_variable(img)
...
@@ -570,6 +823,73 @@ class GpuDnnConvGradW(DnnBase, COp):
         return [shape[2]]
+
+class GpuDnnConv3dGradW(GpuDnnConvGradW):
+    """
+    The convolution gradient with respect to the weights.
+
+    :param image:
+    :param kernel:
+    :param descr: the convolution descriptor
+
+    """
+    __props__ = ('algo', 'inplace',)
+    __input_name__ = ('image', 'grad', 'output', 'descriptor', 'alpha',
+                      'beta')
+
+    def __init__(self, inplace=False, workmem=None, algo=None):
+        """
+        :param workmem: *deprecated*, use param algo instead
+        :param algo: either 'none', 'guess_once', 'guess_on_shape_change',
+            'time_once' or 'time_on_shape_change'.
+            Default is the value of :attr:`config.dnn.conv.algo_bwd`.
+        """
+        if workmem is not None:
+            warnings.warn(("GpuDnnConv3dGradW: parameter 'workmem' is "
+                           "deprecated. Use 'algo' instead."), stacklevel=3)
+            assert algo is None
+            algo = workmem
+
+        super(GpuDnnConv3dGradW, self).__init__(inplace=inplace,
+                                                algo='none')
+        assert self.algo in ['none', 'guess_once', 'guess_on_shape_change',
+                             'time_once', 'time_on_shape_change']
+
+    def grad(self, inp, grads):
+        img, top, output, desc, alpha, beta = inp
+        kerns, = grads
+
+        kerns = gpu_contiguous(kerns)
+
+        d_img = GpuDnnConv3dGradI()(kerns, top,
+                                    gpu_alloc_empty(*img.shape), desc)
+        d_top = GpuDnnConv3d()(img, kerns,
+                               gpu_alloc_empty(*top.shape), desc)
+        d_alpha = grad_not_implemented(self, 4, alpha)
+        d_beta = grad_not_implemented(self, 5, beta)
+
+        return (d_img * alpha, d_top * alpha, kerns * beta,
+                DisconnectedType()(), d_alpha, d_beta)
+
+    def make_node(self, img, topgrad, output, desc, alpha=None, beta=None):
+        img = as_cuda_ndarray_variable(img)
+        topgrad = as_cuda_ndarray_variable(topgrad)
+        output = as_cuda_ndarray_variable(output)
+        if img.type.ndim != 5:
+            raise TypeError('img must be 5D tensor')
+        if topgrad.type.ndim != 5:
+            raise TypeError('topgrad must be 5D tensor')
+        if output.type.ndim != 5:
+            raise TypeError('output must be 5D tensor')
+
+        if not isinstance(desc.type, CDataType) \
+                or desc.type.ctype != 'cudnnConvolutionDescriptor_t':
+            raise TypeError('desc must be cudnnConvolutionDescriptor_t')
+
+        alpha = ensure_float(alpha, _one, 'alpha')
+        beta = ensure_float(beta, _zero, 'beta')
+
+        return Apply(self, [img, topgrad, output, desc, alpha, beta],
+                     [output.type()])
+
 class GpuDnnConvGradI(DnnBase, COp):
     """
     The convolution gradient with respect to the inputs.
...
@@ -579,16 +899,46 @@ class GpuDnnConvGradI(DnnBase, COp):
     :param descr: the convolution descriptor

     """
-    __props__ = ('inplace',)
+    __props__ = ('algo', 'inplace',)
     __input_name__ = ('kernel', 'grad', 'output', 'descriptor',
                       'alpha', 'beta')

-    def __init__(self, inplace=False):
+    def __init__(self, inplace=False, workmem=None, algo=None):
+        """
+        :param workmem: *deprecated*, use param algo instead
+        :param algo: either 'none', 'deterministic', 'fft', 'guess_once',
+            'guess_on_shape_change', 'time_once' or 'time_on_shape_change'.
+            Default is the value of :attr:`config.dnn.conv.algo_bwd`.
+        """
         COp.__init__(self, ["dnn_base.c", "dnn_conv_base.c", "dnn_gi.c"],
                      "APPLY_SPECIFIC(conv_gi)")
+
+        if workmem is not None:
+            warnings.warn(("GpuDnnConvGradI: parameter 'workmem' is "
+                           "deprecated. Use 'algo' instead."), stacklevel=3)
+            assert algo is None
+            self.algo = workmem
+        else:
+            if algo is None:
+                algo = config.dnn.conv.algo_bwd
+            self.algo = algo
+
         self.inplace = inplace
         if self.inplace:
             self.destroy_map = {0: [2]}
+        assert self.algo in ['none', 'deterministic', 'fft', 'guess_once',
+                             'guess_on_shape_change', 'time_once',
+                             'time_on_shape_change']
+
+    def __setstate__(self, d):
+        self.__dict__.update(d)
+        if not hasattr(self, 'algo'):
+            if hasattr(self, 'workmem'):
+                self.algo = self.workmem
+            else:
+                self.algo = config.dnn.conv.algo_bwd
+        if not hasattr(self, 'inplace'):
+            self.inplace = False
     def grad(self, inp, grads):
         kerns, top, output, desc, alpha, beta = inp
...
@@ -596,7 +946,8 @@ class GpuDnnConvGradI(DnnBase, COp):
         img = gpu_contiguous(img)

         d_kerns = GpuDnnConvGradW()(img, top, gpu_alloc_empty(*kerns.shape),
                                     desc)
         d_top = GpuDnnConv()(img, kerns, gpu_alloc_empty(*top.shape), desc)
         d_alpha = grad_not_implemented(self, 4, alpha)
         d_beta = grad_not_implemented(self, 5, beta)
...
@@ -610,9 +961,46 @@ class GpuDnnConvGradI(DnnBase, COp):

     def get_op_params(self):
         if self.inplace:
-            return [('CONV_INPLACE', '1')]
+            inplace_def = [('CONV_INPLACE', '1')]
         else:
-            return []
+            inplace_def = []
+
+        choose_alg = '0'
+        choose_alg_once = '0'
+        choose_alg_time = '0'
+
+        if version() == -1 or version() < (3000, 3000):
+            alg = "0"
+        else:
+            if self.algo == 'none':
+                alg = 'CUDNN_CONVOLUTION_BWD_DATA_ALGO_0'
+            elif self.algo == 'deterministic':
+                alg = 'CUDNN_CONVOLUTION_BWD_DATA_ALGO_1'
+            elif self.algo == 'fft':
+                alg = 'CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT'
+            elif self.algo in ['guess_once', 'guess_on_shape_change']:
+                # The convolution implementation should be chosen according
+                # to a heuristic
+                alg = 'CUDNN_CONVOLUTION_BWD_DATA_ALGO_0'
+                choose_alg = '1'
+                if self.algo == 'guess_once':
+                    choose_alg_once = '1'
+            elif self.algo in ['time_once', 'time_on_shape_change']:
+                # The convolution implementation should be chosen according
+                # to timing
+                alg = 'CUDNN_CONVOLUTION_BWD_DATA_ALGO_0'
+                choose_alg = '1'
+                choose_alg_time = '1'
+                if self.algo == 'time_once':
+                    choose_alg_once = '1'
+
+        alg_def = ('CONV_ALGO', alg)
+        alg_choose_def = ('CHOOSE_ALGO', choose_alg)
+        alg_choose_once_def = ('CHOOSE_ALGO_ONCE', choose_alg_once)
+        alg_choose_time_def = ('CHOOSE_ALGO_TIME', choose_alg_time)
+
+        return inplace_def + [alg_def, alg_choose_def, alg_choose_once_def,
+                              alg_choose_time_def]

     def make_node(self, kern, topgrad, output, desc, alpha=None, beta=None):
         kern = as_cuda_ndarray_variable(kern)
...
@@ -639,8 +1027,76 @@ class GpuDnnConvGradI(DnnBase, COp):
         return [shape[2]]
+
+class GpuDnnConv3dGradI(GpuDnnConvGradI):
+    """
+    The convolution gradient with respect to the inputs.
+
+    :param image:
+    :param kernel:
+    :param descr: the convolution descriptor
+
+    """
+    __props__ = ('algo', 'inplace',)
+    __input_name__ = ('kernel', 'grad', 'output', 'descriptor', 'alpha',
+                      'beta')
+
+    def __init__(self, inplace=False, workmem=None, algo=None):
+        """
+        :param workmem: *deprecated*, use param algo instead
+        :param algo: either 'none', 'guess_once', 'guess_on_shape_change',
+            'time_once' or 'time_on_shape_change'.
+            Default is the value of :attr:`config.dnn.conv.algo_bwd`.
+        """
+        if workmem is not None:
+            warnings.warn(("GpuDnnConv3dGradI: parameter 'workmem' is "
+                           "deprecated. Use 'algo' instead."), stacklevel=3)
+            assert algo is None
+            algo = workmem
+
+        super(GpuDnnConv3dGradI, self).__init__(inplace=inplace,
+                                                algo="none")
+        assert self.algo in ['none', 'guess_once', 'guess_on_shape_change',
+                             'time_once', 'time_on_shape_change']
+
+    def grad(self, inp, grads):
+        kerns, top, output, desc, alpha, beta = inp
+        img, = grads
+
+        img = gpu_contiguous(img)
+
+        d_kerns = GpuDnnConv3dGradW()(img, top,
+                                      gpu_alloc_empty(*kerns.shape), desc)
+        d_top = GpuDnnConv3d()(img, kerns,
+                               gpu_alloc_empty(*top.shape), desc)
+        d_alpha = grad_not_implemented(self, 4, alpha)
+        d_beta = grad_not_implemented(self, 5, beta)
+
+        return (d_kerns * alpha, d_top * alpha, img * beta,
+                DisconnectedType()(), d_alpha, d_beta)
+
+    def make_node(self, kern, topgrad, output, desc, alpha=None, beta=None):
+        kern = as_cuda_ndarray_variable(kern)
+        topgrad = as_cuda_ndarray_variable(topgrad)
+        output = as_cuda_ndarray_variable(output)
+        if kern.type.ndim != 5:
+            raise TypeError('kern must be 5D tensor')
+        if topgrad.type.ndim != 5:
+            raise TypeError('topgrad must be 5D tensor')
+        if output.type.ndim != 5:
+            raise TypeError('output must be 5D tensor')
+
+        if not isinstance(desc.type, CDataType) \
+                or desc.type.ctype != 'cudnnConvolutionDescriptor_t':
+            raise TypeError('desc must be cudnnConvolutionDescriptor_t')
+
+        alpha = ensure_float(alpha, _one, 'alpha')
+        beta = ensure_float(beta, _zero, 'beta')
+
+        return Apply(self, [kern, topgrad, output, desc, alpha, beta],
+                     [output.type()])
+
 def dnn_conv(img, kerns, border_mode='valid', subsample=(1, 1),
-             conv_mode='conv', direction_hint=None, workmem=None):
+             conv_mode='conv', direction_hint=None, workmem=None,
+             algo=None):
     """
     GPU convolution using cuDNN from NVIDIA.
...
@@ -662,16 +1118,28 @@ def dnn_conv(img, kerns, border_mode='valid', subsample=(1, 1),
         *not* 'forward!', it will use GpuDnnConvGradI.
         This parameter is used internally by graph optimizers and may be
         removed at any time without a deprecation period. You have been warned.
-    :param workmem: Specify the amount of working memory allowed.
-        More memory is usually faster. One of 'none', 'small' or
-        'large'. (default is None which takes its value from
-        :attr:`config.dnn.conv.workmem`)
+    :param workmem: *deprecated*, use param algo instead
+    :param algo: convolution implementation to use. One of 'none', 'small',
+        'large', 'fft', 'guess_once', 'guess_on_shape_change', 'time_once' or
+        'time_on_shape_change'. Some of these values may require certain
+        versions of CuDNN to be installed. Default is the value of
+        :attr:`config.dnn.conv.algo_fwd`.

     :warning: The cuDNN library only works with GPUs that have a compute
         capability of 3.0 or higher. This means that older GPUs will not
         work with this Op.
     """
+
+    # Check if deprecated param 'workmem' is used
+    if workmem is not None:
+        warnings.warn(("dnn_conv: parameter 'workmem' is deprecated. Use "
+                       "'algo' instead."), stacklevel=3)
+        assert algo is None
+        algo = workmem
+
+    # Ensure the value of direction_hint is supported
+    assert direction_hint in [None, 'bprop weights', 'forward']
+
     fgraph = getattr(img, 'fgraph', None) or getattr(kerns, 'fgraph', None)
     if (border_mode == 'valid' and subsample == (1, 1) and
             direction_hint == 'bprop weights'):
...
@@ -693,10 +1161,11 @@ def dnn_conv(img, kerns, border_mode='valid', subsample=(1, 1),
         return as_cuda_ndarray_variable(conv.dimshuffle(1, 0, 2, 3))
     elif (border_mode == 'full' and subsample == (1, 1) and
-          direction_hint != 'forward!'):
-        # Special case: We can be faster by using GpuDnnConvGradI to compute
-        # the full convolution as the backward pass of a valid convolution.
-        # We just need to set up a suitable 'fake' valid convolution.
+          direction_hint != 'forward!' and version() == -1):
+        # Special case: In CuDNN v1, we can be faster by using GpuDnnConvGradI
+        # to compute the full convolution as the backward pass of a valid
+        # convolution. We just need to set up a suitable 'fake' valid
+        # convolution.
         img = gpu_contiguous(img)  # cudnn v1 and v2 rc3 need contiguous data
         kerns = gpu_contiguous(kerns.dimshuffle(1, 0, 2, 3))
         conv_mode = 'cross' if conv_mode == 'conv' else 'conv'
...
@@ -720,7 +1189,87 @@ def dnn_conv(img, kerns, border_mode='valid', subsample=(1, 1),
                                      desc_op.border_mode,
                                      desc_op.subsample)
     out = gpu_alloc_empty(*out_shp)
-    return GpuDnnConv(workmem=workmem)(img, kerns, out, desc)
+    return GpuDnnConv(algo=algo)(img, kerns, out, desc)
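`dnn_conv` ends by allocating the output with `GpuDnnConv.get_out_shape` before running the op. The shape arithmetic it relies on is the standard convolution formula, `(i + 2*pad - k) // subsample + 1` per spatial dimension, with `pad = 0` for 'valid' and `pad = k - 1` for 'full'. A plain-Python sketch of that computation (`conv_out_shape` is a hypothetical helper for illustration, not the Theano API):

```python
def conv_out_shape(img_shape, kern_shape, border_mode, subsample):
    """Output shape of a cuDNN-style convolution.

    img_shape:  (batch, channels, i0, i1, ...)
    kern_shape: (n_kernels, channels, k0, k1, ...)
    """
    nd = len(subsample)
    if border_mode == 'valid':
        pad = (0,) * nd
    elif border_mode == 'full':
        pad = tuple(k - 1 for k in kern_shape[2:])
    elif isinstance(border_mode, int):
        pad = (border_mode,) * nd  # explicit symmetric padding
    else:
        pad = tuple(border_mode)   # one padding value per dimension
    spatial = [(i + 2 * p - k) // s + 1
               for i, p, k, s in zip(img_shape[2:], pad,
                                     kern_shape[2:], subsample)]
    return (img_shape[0], kern_shape[0]) + tuple(spatial)

print(conv_out_shape((64, 3, 32, 32), (16, 3, 5, 5), 'valid', (1, 1)))
# → (64, 16, 28, 28)
```

The same formula covers the 5-D case used by `dnn_conv3d` below; only the number of spatial dimensions changes.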
+
+
+def dnn_conv3d(img, kerns, border_mode='valid', subsample=(1, 1, 1),
+               conv_mode='conv', direction_hint=None, workmem=None,
+               algo='none'):
+    """
+    GPU convolution using cuDNN from NVIDIA.
+
+    The memory layout to use is 'bct01', that is 'batch', 'channel',
+    'first dim', 'second dim', 'third dim' in that order.
+
+    :param img: images to do the convolution over
+    :param kerns: convolution filters
+    :param border_mode: one of 'valid', 'full'; additionally, the padding size
+        could be directly specified by an integer or a pair of integers
+    :param subsample: perform subsampling of the output (default: (1, 1, 1))
+    :param conv_mode: perform convolution (kernels flipped) or
+        cross-correlation. One of 'conv', 'cross'. (default: 'conv')
+    :param direction_hint: Used by graph optimizers to change algorithm choice.
+        By default, GpuDnnConv will be used to carry out the convolution.
+        If border_mode is 'valid', subsample is (1, 1, 1) and direction_hint
+        is 'bprop weights', it will use GpuDnnConvGradW.
+        This parameter is used internally by graph optimizers and may be
+        removed at any time without a deprecation period. You have been warned.
+    :param workmem: *deprecated*, use param algo instead
+    :param algo: convolution implementation to use. Only 'none' is implemented
+        for the conv3d. Default is the value of
+        :attr:`config.dnn.conv.algo_fwd`.
+
+    :warning: The cuDNN library only works with GPUs that have a compute
+        capability of 3.0 or higher. This means that older GPUs will not
+        work with this Op.
+    :warning: dnn_conv3d only works with cuDNN library 3.0
+
+    """
+
+    # Check if deprecated param 'workmem' is used
+    if workmem is not None:
+        warnings.warn(("dnn_conv3d: parameter 'workmem' is deprecated. Use "
+                       "'algo' instead."), stacklevel=3)
+        assert algo == 'none'
+        algo = workmem
+
+    # Ensure the value of direction_hint is supported
+    assert direction_hint in [None, 'bprop weights', 'forward']
+
+    fgraph = getattr(img, 'fgraph', None) or getattr(kerns, 'fgraph', None)
+    if (border_mode == 'valid' and subsample == (1, 1, 1) and
+            direction_hint == 'bprop weights'):
+        # Special case: We are asked to use GpuDnnConvGradW. We need to set
+        # up a suitable 'fake' convolution to compute the gradient for.
+        img = gpu_contiguous(img.dimshuffle(1, 0, 2, 3, 4))
+        if conv_mode == 'conv':
+            # We need to flip manually. These 'kerns' are not the kernels
+            # that would be flipped by conv_mode='conv' in GpuDnnConvGradW.
+            kerns = kerns[:, :, ::-1, ::-1, ::-1]
+        kerns = gpu_contiguous(kerns.dimshuffle(1, 0, 2, 3, 4))
+        shape2 = shape_i(img, 2, fgraph) - shape_i(kerns, 2, fgraph) + 1
+        shape3 = shape_i(img, 3, fgraph) - shape_i(kerns, 3, fgraph) + 1
+        shape4 = shape_i(img, 4, fgraph) - shape_i(kerns, 4, fgraph) + 1
+        out = gpu_alloc_empty(shape_i(kerns, 1, fgraph),
+                              shape_i(img, 1, fgraph),
+                              shape2, shape3, shape4)
+        desc = GpuDnnConvDesc(border_mode='valid', subsample=(1, 1, 1),
+                              conv_mode='cross')(img.shape, out.shape)
+        conv = GpuDnnConv3dGradW()(img, kerns, out, desc)
+        return as_cuda_ndarray_variable(conv.dimshuffle(1, 0, 2, 3, 4))
+
+    # Standard case: We use GpuDnnConv with suitable padding.
+    # contig_version will return a gpu_contiguous copy
+    # if the img contains negative strides
+    img = gpu_contiguous(img)
+    kerns = gpu_contiguous(kerns)
+    desc = GpuDnnConvDesc(border_mode=border_mode, subsample=subsample,
+                          conv_mode=conv_mode)(img.shape, kerns.shape)
+    desc_op = desc.owner.op
+    out_shp = GpuDnnConv3d.get_out_shape(img.shape, kerns.shape,
+                                         desc_op.border_mode,
+                                         desc_op.subsample)
+    out = gpu_alloc_empty(*out_shp)
+    return GpuDnnConv3d(algo=algo)(img, kerns, out, desc)
+
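The 'bprop weights' special case in `dnn_conv3d` flips the kernels by hand (`kerns[:, :, ::-1, ::-1, ::-1]`) and then hands them to an op running in cross-correlation mode. A 1-D, pure-Python sketch of the identity it relies on: a valid convolution is exactly a valid cross-correlation with the kernel reversed (the helper names here are illustrative only):

```python
def correlate_valid(x, k):
    # Valid cross-correlation: slide k over x without flipping it.
    n = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n)]

def convolve_valid(x, k):
    # Valid convolution: the kernel is indexed back-to-front.
    n = len(x) - len(k) + 1
    m = len(k)
    return [sum(x[i + j] * k[m - 1 - j] for j in range(m)) for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
k = [1.0, 0.0, -1.0]
# Convolving with k equals correlating with the reversed kernel.
assert convolve_valid(x, k) == correlate_valid(x, k[::-1])
print(convolve_valid(x, k))  # → [2.0, 2.0, 2.0]
```

Pre-flipping and switching to `conv_mode='cross'` therefore leaves the numerical result unchanged while letting the gradient op do the heavy lifting; the 3-D code applies the same reversal independently along each spatial axis.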
 class GpuDnnPoolDesc(GpuOp):
...
@@ -758,14 +1307,20 @@ class GpuDnnPoolDesc(GpuOp):
             mode = 'average_inc_pad'
         assert mode in ('max', 'average_inc_pad', 'average_exc_pad')
         self.mode = mode
-        assert len(ws) == 2
+
+        assert len(ws) == len(stride) and len(stride) == len(pad)
+        assert len(ws) in (2, 3)
         self.ws = ws
-        assert len(stride) == 2
         self.stride = stride
         self.pad = pad
         if (pad[0] != 0 or pad[1] != 0) and version() == -1:
             raise RuntimeError("CuDNN pooling with padding requires CuDNN v2")
+        if self.get_ndim() == 3 and version() < (3000, 3000):
+            raise RuntimeError("CuDNN 3d pooling requires CuDNN v3")
+
+    def get_ndim(self):
+        return len(self.ws)

     def __setstate__(self, d):
         self.__dict__.update(d)
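`GpuDnnPoolDesc` now accepts 2-D or 3-D windows, and the pooled output shape later computed by `GpuDnnPool.infer_shape` follows `(i + 2*pad - window) // stride + 1` per spatial dimension. A plain-Python sketch of that calculation, including the descriptor's own consistency checks (`pool_out_shape` is a hypothetical helper, not part of dnn.py):

```python
def pool_out_shape(img_shape, ws, stride, pad):
    # Same validation as GpuDnnPoolDesc.__init__: matching lengths, 2-D or 3-D.
    assert len(ws) == len(stride) == len(pad) and len(ws) in (2, 3)
    # Batch and channel dimensions pass through unchanged.
    spatial = [(i + 2 * p - w) // s + 1
               for i, w, s, p in zip(img_shape[2:], ws, stride, pad)]
    return tuple(img_shape[:2]) + tuple(spatial)

# 2x2 max-pool with stride 2 on 28x28 feature maps halves each spatial dim.
print(pool_out_shape((10, 4, 28, 28), (2, 2), (2, 2), (0, 0)))
# → (10, 4, 14, 14)
```

The C code in `GpuDnnPool.c_code` performs the identical arithmetic on `win`/`pad`/`str` retrieved from the descriptor, adding a fifth output dimension when `ndims == 3`.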
...
@@ -803,22 +1358,14 @@ class GpuDnnPoolDesc(GpuOp):
                  "descriptor: %%s", cudnnGetErrorString(err));
     %(fail)s
   }
-#ifndef CUDNN_VERSION
-  err = cudnnSetPoolingDescriptor(
-  %(desc)s,
-  %(mode_flag)s,
-  %(wsX)d, %(wsY)d,
-  %(stridex)d, %(stridey)d
-  );
-#else
-  err = cudnnSetPooling2dDescriptor(
-  %(desc)s,
-  %(mode_flag)s,
-  %(wsX)d, %(wsY)d,
-  %(padX)d, %(padY)d,
-  %(stridex)d, %(stridey)d
-  );
-#endif
+  {
+    int win[%(nd)d] = {%(win)s};
+    int pad[%(nd)d] = {%(pad)s};
+    int str[%(nd)d] = {%(str)s};
+    err = cudnnSetPoolingNdDescriptor(
+    %(desc)s,
+    %(mode_flag)s, %(nd)d,
+    win, pad, str);
+  }
   if (err != CUDNN_STATUS_SUCCESS) {
     PyErr_Format(PyExc_RuntimeError, "could not set op descriptor: %%s",
                  cudnnGetErrorString(err));
...
@@ -826,46 +1373,49 @@ class GpuDnnPoolDesc(GpuOp):
   }
 }
 """ % dict(name=name, desc=desc,
            mode_flag=mode_flag, fail=sub['fail'],
-           wsX=self.ws[0], wsY=self.ws[1],
-           stridex=self.stride[0], stridey=self.stride[1],
-           padX=self.pad[0], padY=self.pad[1])
+           nd=self.get_ndim(),
+           win=', '.join(str(w) for w in self.ws),
+           pad=', '.join(str(p) for p in self.pad),
+           str=', '.join(str(s) for s in self.stride))

     def c_code_cache_version(self):
-        return (2, version())
+        return (3, version())
 class GpuDnnPool(DnnBase):
     """
     Pooling.

-    :param img: the image 4d tensor.
+    :param img: the image 4d or 5d tensor.
     :param desc: the pooling descriptor.
     """
     __props__ = ()

     def make_node(self, img, desc):
         img = as_cuda_ndarray_variable(img)
-        if img.type.ndim != 4:
-            raise TypeError('img must be 4D tensor')
         if not isinstance(desc.type, CDataType) \
                 or desc.type.ctype != 'cudnnPoolingDescriptor_t':
             raise TypeError('desc must be cudnnPoolingDescriptor_t')
-        return Apply(self, [img, desc], [img.type()])
+
+        dop = desc.owner.op
+        e_ndim = dop.get_ndim() + 2  # 4 or 5
+
+        if img.type.ndim != e_ndim:
+            raise TypeError('img must be %dD tensor' % e_ndim)
+
+        return Apply(self, [img, desc], [img.type()])
     def infer_shape(self, node, shape):
         desc = node.inputs[1].owner.op
-        kh, kw = desc.ws
-        sh, sw = desc.stride
-        padh, padw = desc.pad
-        return [(
-            shape[0][0],
-            shape[0][1],
-            (shape[0][2] + 2*padh - kh)//sh + 1,
-            (shape[0][3] + 2*padw - kw)//sw + 1
-        )]
+        nd = desc.get_ndim()
+        w = desc.ws
+        s = desc.stride
+        p = desc.pad
+        ret = [shape[0][0], shape[0][1],
+               (shape[0][2] + 2*p[0] - w[0])//s[0] + 1,
+               (shape[0][3] + 2*p[1] - w[1])//s[1] + 1]
+        if nd == 3:
+            ret.append((shape[0][4] + 2*p[2] - w[2])//s[2] + 1)
+        return [ret]
     def c_support_code_struct(self, node, name):
         return """
...
@@ -879,12 +1429,12 @@ cudnnStatus_t err%(name)s;
input%(name)s = NULL;
output%(name)s = NULL;
if ((err%(name)s = cudnnCreateTensorDescriptor(&input%(name)s)) != CUDNN_STATUS_SUCCESS) {
-  PyErr_Format(PyExc_MemoryError, "could not allocate tensor 4d descriptor "
+  PyErr_Format(PyExc_MemoryError, "could not allocate tensor descriptor "
               "(inp): %%s", cudnnGetErrorString(err%(name)s));
  %(fail)s
}
if ((err%(name)s = cudnnCreateTensorDescriptor(&output%(name)s)) != CUDNN_STATUS_SUCCESS) {
-  PyErr_Format(PyExc_MemoryError, "could not allocate tensor 4d descriptor "
+  PyErr_Format(PyExc_MemoryError, "could not allocate tensor descriptor "
               "(out): %%s", cudnnGetErrorString(err%(name)s));
  %(fail)s
}
...
@@ -900,65 +1450,51 @@ if (output%(name)s != NULL) { cudnnDestroyTensorDescriptor(output%(name)s); }
         desc = inputs[1]
         out, = outputs
-        set_in = c_set_tensor4d(inputs[0], "input" + str(name),
-                                'err' + name, sub['fail'])
-        set_out = c_set_tensor4d(out, "output" + str(name),
-                                 'err' + name, sub['fail'])
         return """
cudnnStatus_t err%(name)s;
-int %(out)s_dims[4];
+int %(out)s_dims[5];

if (!CudaNdarray_is_c_contiguous(%(input)s)) {
  PyErr_SetString(PyExc_ValueError, "Only contiguous inputs are supported.");
  %(fail)s
}

-%(set_in)s
+if (c_set_tensorNd(%(input)s, %(input_desc)s) != 0)
+  %(fail)s

cudnnPoolingMode_t mode;
-int wsX, wsY, vpad, hpad, strideX, strideY;
-#ifndef CUDNN_VERSION
-err%(name)s = cudnnGetPoolingDescriptor(
-        %(desc)s, &mode,
-        &wsX, &wsY,
-        &strideX, &strideY);
-#else
-err%(name)s = cudnnGetPooling2dDescriptor(
-        %(desc)s, &mode,
-        &wsX, &wsY,
-        &vpad, &hpad,
-        &strideX, &strideY);
-#endif
+int win[3];
+int pad[3];
+int str[3];
+int ndims;
+err%(name)s = cudnnGetPoolingNdDescriptor(
+        %(desc)s, 3,
+        &mode, &ndims,
+        win, pad, str);

if (err%(name)s != CUDNN_STATUS_SUCCESS) {
  PyErr_Format(PyExc_RuntimeError,
-               "GpuDnnPool: error doing cudnnGetPoolingDescriptor operation: %%s",
+               "GpuDnnPool: error doing cudnnGetPoolingNdDescriptor operation: %%s",
               cudnnGetErrorString(err%(name)s));
  %(fail)s
}

%(out)s_dims[0] = CudaNdarray_HOST_DIMS(%(input)s)[0];
%(out)s_dims[1] = CudaNdarray_HOST_DIMS(%(input)s)[1];
-%(out)s_dims[2] = (CudaNdarray_HOST_DIMS(%(input)s)[2] + (vpad*2) - wsX) / strideX + 1;
-%(out)s_dims[3] = (CudaNdarray_HOST_DIMS(%(input)s)[3] + (hpad*2) - wsY) / strideY + 1;
+%(out)s_dims[2] = (CudaNdarray_HOST_DIMS(%(input)s)[2] + (pad[0]*2) - win[0]) / str[0] + 1;
+%(out)s_dims[3] = (CudaNdarray_HOST_DIMS(%(input)s)[3] + (pad[1]*2) - win[1]) / str[1] + 1;
+if (ndims == 3)
+  %(out)s_dims[4] = (CudaNdarray_HOST_DIMS(%(input)s)[4] + (pad[2]*2) - win[2]) / str[2] + 1;

-if (CudaNdarray_prep_output(&%(out)s, 4, %(out)s_dims) != 0)
+if (CudaNdarray_prep_output(&%(out)s, ndims+2, %(out)s_dims) != 0)
{
  %(fail)s
}

-%(set_out)s
+if (c_set_tensorNd(%(out)s, %(output_desc)s) != 0)
+  %(fail)s

-#ifndef CUDNN_VERSION
-err%(name)s = cudnnPoolingForward(
-_handle,
-%(desc)s,
-%(input_desc)s, CudaNdarray_DEV_DATA(%(input)s),
-%(output_desc)s, CudaNdarray_DEV_DATA(%(out)s)
-);
-#else
{
const float alpha = 1;
const float beta = 0;
...
@@ -971,7 +1507,6 @@ _handle,
%(output_desc)s, CudaNdarray_DEV_DATA(%(out)s)
);
}
-#endif

if (err%(name)s != CUDNN_STATUS_SUCCESS) {
  PyErr_Format(PyExc_RuntimeError,
               "GpuDnnPool: error doing cudnnPoolingForward operation: %%s",
...
@@ -979,8 +1514,7 @@ if (err%(name)s != CUDNN_STATUS_SUCCESS) {
  %(fail)s
}
""" % dict(out=out, desc=desc, fail=sub['fail'],
-           name=name, set_in=set_in,
-           set_out=set_out, input=inputs[0],
+           name=name, input=inputs[0],
           input_desc="input" + name,
           output_desc="output" + name)
...
@@ -1001,7 +1535,7 @@ if (err%(name)s != CUDNN_STATUS_SUCCESS) {
         return [[1], [0]]

     def c_code_cache_version(self):
-        return (6, version())
+        return (7, version())
 class GpuDnnPoolGrad(DnnBase):
...
@@ -1010,27 +1544,30 @@ class GpuDnnPoolGrad(DnnBase):
     :param inp: the input of the pooling.
     :param out: the output of the pooling in the forward.
-    :param inp_grad: same size as out, but is the corresponding gradient information.
+    :param inp_grad: same size as out, but is the corresponding gradient
+        information.
     :param desc: The pooling descriptor.
     """
     __props__ = ()

     def make_node(self, inp, out, inp_grad, desc):
+        if not isinstance(desc.type, CDataType) \
+                or desc.type.ctype != 'cudnnPoolingDescriptor_t':
+            raise TypeError('desc must be cudnnPoolingDescriptor_t')
+
+        nd = desc.owner.op.get_ndim() + 2  # 4 or 5
+
         inp = as_cuda_ndarray_variable(inp)
-        if inp.type.ndim != 4:
-            raise TypeError('inp must be 4D tensor')
+        if inp.type.ndim != nd:
+            raise TypeError('inp must be %dD tensor' % (nd,))

         inp_grad = as_cuda_ndarray_variable(inp_grad)
-        if inp_grad.type.ndim != 4:
-            raise TypeError('inp_grad must be 4D tensor')
+        if inp_grad.type.ndim != nd:
+            raise TypeError('inp_grad must be %dD tensor' % (nd,))

         out = as_cuda_ndarray_variable(out)
-        if out.type.ndim != 4:
-            raise TypeError('out must be 4D tensor')
+        if out.type.ndim != nd:
+            raise TypeError('out must be %dD tensor' % (nd,))

-        if not isinstance(desc.type, CDataType) \
-                or desc.type.ctype != 'cudnnPoolingDescriptor_t':
-            raise TypeError('desc must be cudnnPoolingDescriptor_t')

         return Apply(self, [inp, out, inp_grad, desc],
                      [inp.type()])
...
@@ -1091,18 +1628,6 @@ if (output_grad%(name)s != NULL) { cudnnDestroyTensorDescriptor(output_grad%(nam
...
@@ -1091,18 +1628,6 @@ if (output_grad%(name)s != NULL) { cudnnDestroyTensorDescriptor(output_grad%(nam
out
,
inp
,
inp_grad
,
desc
=
inputs
out
,
inp
,
inp_grad
,
desc
=
inputs
out_grad
,
=
outputs
out_grad
,
=
outputs
set_in
=
"
\n
"
.
join
([
c_set_tensor4d
(
inp
,
"input"
+
name
,
'err'
+
name
,
sub
[
'fail'
]),
c_set_tensor4d
(
inp_grad
,
"input_grad"
+
name
,
'err'
+
name
,
sub
[
'fail'
]),
c_set_tensor4d
(
out
,
"output"
+
name
,
'err'
+
name
,
sub
[
'fail'
])
])
set_out
=
c_set_tensor4d
(
out
,
"output_grad"
+
name
,
'err'
+
name
,
sub
[
'fail'
])
return
"""
return
"""
cudnnStatus_t err
%(name)
s;
cudnnStatus_t err
%(name)
s;
...
@@ -1124,25 +1649,35 @@ if (!CudaNdarray_is_c_contiguous(%(output)s)) {
   %(fail)s
 }

-%(set_in)s
+if (c_set_tensorNd(%(input)s, %(input_desc)s) != 0)
+  %(fail)s
+if (c_set_tensorNd(%(input_grad)s, %(input_grad_desc)s) != 0)
+  %(fail)s
+if (c_set_tensorNd(%(output)s, %(output_desc)s) != 0)
+  %(fail)s

-if (CudaNdarray_prep_output(&%(output_grad)s, 4,
+if (CudaNdarray_prep_output(&%(output_grad)s, %(output)s->nd,
                             CudaNdarray_HOST_DIMS(%(output)s)) != 0)
 {
   %(fail)s
 }

-%(set_out)s
-
-// Get the pooling_mode to be used. Variable 'tmp' is used because we don't
-// care about the other outputs of the function
-cudnnPoolingMode_t pooling_mode;
-int tmp;
-err%(name)s = cudnnGetPoolingNdDescriptor(%(desc)s, 0, &pooling_mode, &tmp,
-                                          &tmp, &tmp, &tmp);
-if (err%(name)s != CUDNN_STATUS_SUCCESS) {
-  PyErr_Format(PyExc_RuntimeError,
-               "GpuDnnPoolGrad: could not obtain pooling mode");
-  %(fail)s
-}
+if (c_set_tensorNd(%(output_grad)s, %(output_grad_desc)s) != 0)
+  %(fail)s
+
+#ifndef CUDNN_VERSION
+err%(name)s = cudnnPoolingBackward(
+_handle,
+%(desc)s,
+%(input_desc)s, CudaNdarray_DEV_DATA(%(input)s),
+%(input_grad_desc)s, CudaNdarray_DEV_DATA(%(input_grad)s),
+%(output_desc)s, CudaNdarray_DEV_DATA(%(output)s),
+%(output_grad_desc)s, CudaNdarray_DEV_DATA(%(output_grad)s)
+);
+#else
 {
 const float alpha = 1;
 const float beta = 0;
...
@@ -1157,45 +1692,22 @@
 %(output_grad_desc)s, CudaNdarray_DEV_DATA(%(output_grad)s)
 );
 }
+#endif

 if (err%(name)s != CUDNN_STATUS_SUCCESS) {
   PyErr_Format(PyExc_RuntimeError,
-               "GpuDnnPoolGrad: error doing operation: %%s. "
-               "input.shape=(%%d, %%d, %%d, %%d) "
-               "input_grad.shape=(%%d, %%d, %%d, %%d) "
-               "output.shape=(%%d, %%d, %%d, %%d) "
-               "output_grad.shape=(%%d, %%d, %%d, %%d)",
-               cudnnGetErrorString(err%(name)s),
-               CudaNdarray_HOST_DIMS(%(input)s)[0],
-               CudaNdarray_HOST_DIMS(%(input)s)[1],
-               CudaNdarray_HOST_DIMS(%(input)s)[2],
-               CudaNdarray_HOST_DIMS(%(input)s)[3],
-               CudaNdarray_HOST_DIMS(%(input_grad)s)[0],
-               CudaNdarray_HOST_DIMS(%(input_grad)s)[1],
-               CudaNdarray_HOST_DIMS(%(input_grad)s)[2],
-               CudaNdarray_HOST_DIMS(%(input_grad)s)[3],
-               CudaNdarray_HOST_DIMS(%(output)s)[0],
-               CudaNdarray_HOST_DIMS(%(output)s)[1],
-               CudaNdarray_HOST_DIMS(%(output)s)[2],
-               CudaNdarray_HOST_DIMS(%(output)s)[3],
-               CudaNdarray_HOST_DIMS(%(output_grad)s)[0],
-               CudaNdarray_HOST_DIMS(%(output_grad)s)[1],
-               CudaNdarray_HOST_DIMS(%(output_grad)s)[2],
-               CudaNdarray_HOST_DIMS(%(output_grad)s)[3]
-               );
+               "GpuDnnPoolGrad: error doing operation: %%s.",
+               cudnnGetErrorString(err%(name)s));
   %(fail)s
 }
 """ % dict(output_grad=out_grad, desc=desc,
            fail=sub['fail'], name=name,
-           set_in=set_in,
-           set_out=set_out, input=inp, input_grad=inp_grad, output=out,
+           input=inp, input_grad=inp_grad, output=out,
            input_desc="input" + name,
            input_grad_desc="input_grad" + name,
            output_desc="output" + name,
            output_grad_desc="output_grad" + name)

     def c_code_cache_version(self):
-        return (5, version())
+        return (7, version())

     def infer_shape(self, node, shape):
         return [shape[0]]
...
@@ -1216,6 +1728,9 @@ def dnn_pool(img, ws, stride=(1, 1), mode='max', pad=(0, 0)):
     :param pad: (padX, padY) padding information.
         padX is the size of the left and right borders,
         padY is the size of the top and bottom borders.
+    :param nd: dimensions of pooling, can be 2 or 3 for 2d or 3d pooling.
+        If set to 3 all other params (except mode) must have an extra
+        dimension to match. 3 is only available for cudnn v3

     :warning: The cuDNN library only works with GPU that have a compute
         capability of 3.0 or higer. This means that older GPU will not
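As a reviewer-side note on the new `nd` parameter above: whether 2d or 3d pooling is requested, each pooled dimension shrinks by the usual window/stride/padding formula. A minimal sketch, not part of the patch (the helper name is illustrative only):

```python
def pool_out_shape(dims, ws, stride, pad):
    # One entry per pooled dimension: floor((d + 2*pad - ws) / stride) + 1
    return tuple((d + 2 * p - w) // s + 1
                 for d, w, s, p in zip(dims, ws, stride, pad))

# 2d pooling: 32x32 image, 2x2 window, stride 2, no padding
print(pool_out_shape((32, 32), (2, 2), (2, 2), (0, 0)))  # (16, 16)

# 3d pooling (cudnn v3): every parameter gets one extra entry
print(pool_out_shape((32, 32, 8), (2, 2, 2), (2, 2, 2), (0, 0, 0)))  # (16, 16, 4)
```

This is why, with `nd=3`, `ws`, `stride` and `pad` must all carry an extra dimension to match.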
...
@@ -1232,8 +1747,9 @@ class GpuDnnSoftmaxBase(DnnBase):
     Op for the cuDNN Softmax.

     :param tensor_format: Whether the data format is 'bc01' or 'b01c'.
-    :param algo: 'fast' or 'accurate' indicating whether computations should be
-        optimized for speed or accuracy respectively.
+    :param algo: 'fast', 'accurate' or 'log' indicating whether, respectively,
+        computations should be optimized for speed, for accuracy, or if CuDNN
+        should rather compute the log-softmax instead.
     :param mode: 'instance' or 'channel' indicating whether the softmax should
         be computed per image across 'c01' or per spatial location '01' per
         image across 'c'.
...
@@ -1246,7 +1762,10 @@ class GpuDnnSoftmaxBase(DnnBase):
         DnnBase.__init__(self)
         self.tensor_format = tensor_format

-        assert(algo in ('fast', 'accurate'))
+        if algo == 'log' and version() < (3000, 3000):
+            raise RuntimeError("CuDNN log-softmax requires CuDNN v3")
+
+        assert(algo in ('fast', 'accurate', 'log'))
         self.algo = algo

         assert(mode in ('instance', 'channel'))
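For context on the new `'log'` algorithm: computing the log-softmax in one fused pass (what `CUDNN_SOFTMAX_LOG` does) avoids the underflow that the naive `log(softmax(x))` composition hits for strongly negative logits. A NumPy sketch of the equivalent stable computation (not the CuDNN implementation itself):

```python
import numpy as np

def log_softmax(x):
    # Stable log-softmax: shift by the max, then subtract the log-sum-exp.
    shifted = x - x.max()
    return shifted - np.log(np.exp(shifted).sum())

x = np.array([1.0, 2.0, 3.0])
naive = np.log(np.exp(x) / np.exp(x).sum())
print(np.allclose(log_softmax(x), naive))  # True

# The naive form underflows to log(0) = -inf here; the fused form stays finite.
print(np.isfinite(log_softmax(np.array([-1000.0, 0.0]))).all())  # True
```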
...
@@ -1319,9 +1838,11 @@ cudnnStatus_t err%(name)s;
         mode = 0

         if self.algo == 'fast':
-            algo = 1
+            algo = "CUDNN_SOFTMAX_FAST"
+        elif self.algo == "log":
+            algo = "CUDNN_SOFTMAX_LOG"
         else:
-            algo = 0
+            algo = "CUDNN_SOFTMAX_ACCURATE"

         # Setup configuration variables.
         result = """
...
@@ -1330,9 +1851,7 @@ cudnnTensorFormat_t format%(name)s = CUDNN_TENSOR_NCHW;
 if (%(tensor_format)d == 1)
   format%(name)s = CUDNN_TENSOR_NHWC;

-cudnnSoftmaxAlgorithm_t algo%(name)s = CUDNN_SOFTMAX_ACCURATE;
-if (%(algo)d == 1)
-  algo%(name)s = CUDNN_SOFTMAX_FAST;
+cudnnSoftmaxAlgorithm_t algo%(name)s = %(algo)s;

 cudnnSoftmaxMode_t mode%(name)s = CUDNN_SOFTMAX_MODE_CHANNEL;
 if (%(mode)d == 1)
...
@@ -1557,7 +2076,7 @@ if True:
                     isinstance(dest.owner.op, GpuAllocEmpty) and
                     len(dest.clients) > 1):
                 inputs[2] = gpu_alloc_empty(*dest.owner.inputs)
-            return [GpuDnnConv(workmem=node.op.workmem, inplace=True)(*inputs)]
+            return [GpuDnnConv(algo=node.op.algo, inplace=True)(*inputs)]

     @local_optimizer([GpuDnnConvGradW], inplace=True)
     def local_dnn_convgw_inplace(node):
...
@@ -1595,7 +2114,7 @@ if True:
     def local_dnn_conv_alpha_merge(node, *inputs):
         if not dnn_available() or version() == -1:
             return None
-        return [GpuDnnConv(workmem=node.op.workmem)(*inputs)]
+        return [GpuDnnConv(algo=node.op.algo)(*inputs)]

     @register_opt('cudnn')
     @alpha_merge(GpuDnnConvGradW, alpha_in=4, beta_in=5, nd=4)
...
@@ -1615,7 +2134,7 @@ if True:
     @output_merge(GpuDnnConv, alpha_in=4, beta_in=5, out_in=2, nd=4)
     def local_dnn_conv_output_merge(node, *inputs):
         inputs = inputs[0:2] + (gpu_contiguous(inputs[2]),) + inputs[3:]
-        return [GpuDnnConv(workmem=node.op.workmem)(*inputs)]
+        return [GpuDnnConv(algo=node.op.algo)(*inputs)]

     @register_opt('cudnn')
     @output_merge(GpuDnnConvGradW, alpha_in=4, beta_in=5, out_in=2, nd=4)
...
@@ -1713,6 +2232,26 @@ if True:
             out = as_cuda_ndarray_variable(out.dimshuffle(0, 1))
             return [out]

+    @register_opt('cudnn')
+    @local_optimizer([GpuElemwise])
+    def local_log_softmax_dnn(node):
+        # The log-softmax implementation is only available starting at CuDNN V3
+        if not dnn_available() or version() < (3000, 3000):
+            return
+
+        if (isinstance(node.op, GpuElemwise) and
+                isinstance(node.op.scalar_op, Log) and
+                node.inputs[0].owner and
+                isinstance(node.inputs[0].owner.op, GpuDnnSoftmax) and
+                len(node.inputs[0].owner.out.clients) == 1):
+
+            log_input = node.inputs[0]
+            softmax_node = log_input.owner
+
+            new_softmax_node = GpuDnnSoftmax(softmax_node.op.tensor_format,
+                                             'log', softmax_node.op.mode)
+            new_log_softmax = new_softmax_node(softmax_node.inputs[0])
+            return [new_log_softmax]
+
     class NoCuDNNRaise(Optimizer):
         def apply(self, fgraph):
             """ Raise a RuntimeError if cudnn can't be used"""
...
@@ -1730,8 +2269,8 @@ if True:
     def local_softmax_dnn_grad(node):
         if (isinstance(node.op, SoftmaxGrad) and
             ((node.inputs[0].owner and
               isinstance(node.inputs[0].owner.op, HostFromGpu))
              or
              (node.inputs[1].owner and
               isinstance(node.inputs[1].owner.op, HostFromGpu)))):
             if not dnn_available():
                 return
...
theano/sandbox/cuda/dnn_base.c View file @ 8cb9d50e

 #section support_code

 static cudnnHandle_t _handle = NULL;

 static int
-c_set_tensor4d(CudaNdarray *var, cudnnTensorDescriptor_t desc) {
-  cudnnStatus_t err = cudnnSetTensor4dDescriptorEx(
-    desc, CUDNN_DATA_FLOAT,
-    CudaNdarray_HOST_DIMS(var)[0],
-    CudaNdarray_HOST_DIMS(var)[1],
-    CudaNdarray_HOST_DIMS(var)[2],
-    CudaNdarray_HOST_DIMS(var)[3],
-    CudaNdarray_HOST_STRIDES(var)[0] ? CudaNdarray_HOST_STRIDES(var)[0] :
-      CudaNdarray_HOST_DIMS(var)[2] * CudaNdarray_HOST_DIMS(var)[3] * CudaNdarray_HOST_DIMS(var)[1],
-    CudaNdarray_HOST_STRIDES(var)[1] ? CudaNdarray_HOST_STRIDES(var)[1] :
-      CudaNdarray_HOST_DIMS(var)[2] * CudaNdarray_HOST_DIMS(var)[3],
-    CudaNdarray_HOST_STRIDES(var)[2] ? CudaNdarray_HOST_STRIDES(var)[2] :
-      CudaNdarray_HOST_DIMS(var)[3],
-    CudaNdarray_HOST_STRIDES(var)[3] ? CudaNdarray_HOST_STRIDES(var)[3] : 1
-  );
+c_set_tensorNd(CudaNdarray *var, cudnnTensorDescriptor_t desc) {
+  int dim = CudaNdarray_NDIM(var);
+  int strides[dim];
+  int default_str = 1;
+
+  for (int i = dim - 1; i >= 0; i--)
+  {
+    if (CudaNdarray_HOST_STRIDES(var)[i])
+      strides[i] = CudaNdarray_HOST_STRIDES(var)[i];
+    else
+      strides[i] = default_str;
+    default_str *= CudaNdarray_HOST_DIMS(var)[i];
+  }
+
+  cudnnStatus_t err = cudnnSetTensorNdDescriptor(desc, CUDNN_DATA_FLOAT, dim,
+                                                 CudaNdarray_HOST_DIMS(var),
+                                                 strides);
   if (err != CUDNN_STATUS_SUCCESS) {
     PyErr_Format(PyExc_RuntimeError,
-                 "Could not set tensor4d descriptor: %s"
-                 "shapes=%d %d %d %d strides=%d %d %d %d",
-                 cudnnGetErrorString(err),
-                 CudaNdarray_HOST_DIMS(var)[0],
-                 CudaNdarray_HOST_DIMS(var)[1],
-                 CudaNdarray_HOST_DIMS(var)[2],
-                 CudaNdarray_HOST_DIMS(var)[3],
-                 CudaNdarray_HOST_STRIDES(var)[0] ? CudaNdarray_HOST_STRIDES(var)[0] :
-                   CudaNdarray_HOST_DIMS(var)[2] * CudaNdarray_HOST_DIMS(var)[3] * CudaNdarray_HOST_DIMS(var)[1],
-                 CudaNdarray_HOST_STRIDES(var)[1] ? CudaNdarray_HOST_STRIDES(var)[1] :
-                   CudaNdarray_HOST_DIMS(var)[2] * CudaNdarray_HOST_DIMS(var)[3],
-                 CudaNdarray_HOST_STRIDES(var)[2] ? CudaNdarray_HOST_STRIDES(var)[2] :
-                   CudaNdarray_HOST_DIMS(var)[3],
-                 CudaNdarray_HOST_STRIDES(var)[3] ? CudaNdarray_HOST_STRIDES(var)[3] : 1
-                 );
+                 "Could not set tensorNd descriptor: %s"
+                 "dim=%d",
+                 cudnnGetErrorString(err), dim);
     return -1;
   }
   return 0;
 }

 static int
-c_set_filter(CudaNdarray *var, cudnnFilterDescriptor_t desc) {
+c_set_filterNd(CudaNdarray *var, cudnnFilterDescriptor_t desc) {
   if (!CudaNdarray_is_c_contiguous(var)) {
     PyErr_SetString(PyExc_ValueError,
                     "Only contiguous filters (kernels) are supported.");
     return -1;
   }
-  cudnnStatus_t err = cudnnSetFilter4dDescriptor(
-    desc, CUDNN_DATA_FLOAT,
-    CudaNdarray_HOST_DIMS(var)[0],
-    CudaNdarray_HOST_DIMS(var)[1],
-    CudaNdarray_HOST_DIMS(var)[2],
-    CudaNdarray_HOST_DIMS(var)[3]
-  );
+  int dim = CudaNdarray_NDIM(var);
+  cudnnStatus_t err = cudnnSetFilterNdDescriptor(desc, CUDNN_DATA_FLOAT, dim,
+                                                 CudaNdarray_HOST_DIMS(var));
   if (err != CUDNN_STATUS_SUCCESS) {
     PyErr_Format(PyExc_RuntimeError,
                  "Could not set filter descriptor: %s."
-                 " dims= %d %d %d %d",
-                 cudnnGetErrorString(err),
-                 CudaNdarray_HOST_DIMS(var)[0],
-                 CudaNdarray_HOST_DIMS(var)[1],
-                 CudaNdarray_HOST_DIMS(var)[2],
-                 CudaNdarray_HOST_DIMS(var)[3]);
+                 " dims= %d",
+                 cudnnGetErrorString(err), dim);
     return -1;
   }
   return 0;
...
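The stride-defaulting loop in the new `c_set_tensorNd` above can be summarized briefly: any axis whose stored stride is 0 (a broadcastable axis in a CudaNdarray) gets the C-contiguous stride computed from the dimensions to its right. A Python sketch of that loop, for review purposes only:

```python
def default_strides(dims, strides):
    # Mirror of the loop in c_set_tensorNd: walk from the last axis,
    # keep non-zero strides, and replace 0 strides with the running
    # C-contiguous stride; the running stride grows by each dim either way.
    out = list(strides)
    default_str = 1
    for i in range(len(dims) - 1, -1, -1):
        if not out[i]:
            out[i] = default_str
        default_str *= dims[i]
    return out

print(default_strides([2, 3, 4], [0, 0, 0]))  # [12, 4, 1]
```

For an already contiguous tensor the stored strides are kept unchanged, so `default_strides([2, 3, 4], [12, 4, 1])` also gives `[12, 4, 1]`.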
theano/sandbox/cuda/dnn_conv_base.c View file @ 8cb9d50e

...
@@ -3,6 +3,24 @@ cudnnTensorDescriptor_t APPLY_SPECIFIC(input);
 cudnnTensorDescriptor_t APPLY_SPECIFIC(output);
 cudnnFilterDescriptor_t APPLY_SPECIFIC(kerns);

+/* Keep track, from one execution to another, of the dimension of the data
+and the algorithms, if any, that were selected according to these dimensions
+and according to the amount of memory available at that time.
+
+Note : Implementation selection for backward convolution only exists starting
+at V3.
+*/
+int APPLY_SPECIFIC(previous_input_shape)[5];
+int APPLY_SPECIFIC(previous_kerns_shape)[5];
+int APPLY_SPECIFIC(previous_output_shape)[5];
+bool APPLY_SPECIFIC(previous_algo_set);
+cudnnConvolutionFwdAlgo_t APPLY_SPECIFIC(previous_algo);
+#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 3000
+cudnnConvolutionBwdFilterAlgo_t APPLY_SPECIFIC(previous_bwd_f_algo);
+cudnnConvolutionBwdDataAlgo_t APPLY_SPECIFIC(previous_bwd_d_algo);
+#endif
+

 #section init_code_struct

 cudnnStatus_t APPLY_SPECIFIC(err);
...
@@ -10,12 +28,12 @@ APPLY_SPECIFIC(input) = NULL;
 APPLY_SPECIFIC(output) = NULL;
 APPLY_SPECIFIC(kerns) = NULL;

 if ((APPLY_SPECIFIC(err) = cudnnCreateTensorDescriptor(&APPLY_SPECIFIC(input))) != CUDNN_STATUS_SUCCESS) {
-  PyErr_Format(PyExc_MemoryError, "could not allocate tensor4d descriptor "
+  PyErr_Format(PyExc_MemoryError, "could not allocate tensor descriptor "
                "(inp): %s", cudnnGetErrorString(APPLY_SPECIFIC(err)));
   FAIL;
 }
 if ((APPLY_SPECIFIC(err) = cudnnCreateTensorDescriptor(&APPLY_SPECIFIC(output))) != CUDNN_STATUS_SUCCESS) {
-  PyErr_Format(PyExc_MemoryError, "could not allocate tensor4d descriptor "
+  PyErr_Format(PyExc_MemoryError, "could not allocate tensor descriptor "
                "(out): %s", cudnnGetErrorString(APPLY_SPECIFIC(err)));
   FAIL;
 }
...
@@ -25,6 +43,23 @@ if ((APPLY_SPECIFIC(err) = cudnnCreateFilterDescriptor(&APPLY_SPECIFIC(kerns)))
   FAIL;
 }

+for (int i = 0; i < 5; i++)
+{
+  APPLY_SPECIFIC(previous_input_shape)[i] = 0;
+  APPLY_SPECIFIC(previous_kerns_shape)[i] = 0;
+  APPLY_SPECIFIC(previous_output_shape)[i] = 0;
+}
+
+APPLY_SPECIFIC(previous_algo_set) = false;
+
+// Select default implementations for the case where the convolution
+// implementations should be selected based on the size of the data.
+APPLY_SPECIFIC(previous_algo) = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM;
+#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 3000
+APPLY_SPECIFIC(previous_bwd_f_algo) = CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0;
+APPLY_SPECIFIC(previous_bwd_d_algo) = CUDNN_CONVOLUTION_BWD_DATA_ALGO_0;
+#endif
+

 #section cleanup_code_struct

 if (APPLY_SPECIFIC(input) != NULL)
...
theano/sandbox/cuda/dnn_fwd.c View file @ 8cb9d50e

...
@@ -4,6 +4,7 @@ int
 APPLY_SPECIFIC(conv_fwd)(CudaNdarray *input, CudaNdarray *kerns,
                          CudaNdarray *om, cudnnConvolutionDescriptor_t desc,
                          float alpha, float beta, CudaNdarray **output) {
   cudnnStatus_t err = CUDNN_STATUS_SUCCESS;
+
   if (CudaNdarray_HOST_DIMS(input)[1] != CudaNdarray_HOST_DIMS(kerns)[1]) {
     PyErr_SetString(PyExc_ValueError,
...
@@ -11,35 +12,227 @@ APPLY_SPECIFIC(conv_fwd)(CudaNdarray *input, CudaNdarray *kerns,
     return 1;
   }

-  if (c_set_tensor4d(input, APPLY_SPECIFIC(input)) == -1)
+  if (c_set_tensorNd(input, APPLY_SPECIFIC(input)) == -1)
     return 1;
-  if (c_set_filter(kerns, APPLY_SPECIFIC(kerns)) == -1)
+  if (c_set_filterNd(kerns, APPLY_SPECIFIC(kerns)) == -1)
     return 1;

+  int nb_dim = CudaNdarray_NDIM(input);
+
 #ifdef CONV_INPLACE
   Py_XDECREF(*output);
   *output = om;
   Py_INCREF(*output);
 #else
-  if (CudaNdarray_prep_output(output, 4, CudaNdarray_HOST_DIMS(om)) != 0)
+  if (CudaNdarray_prep_output(output, nb_dim, CudaNdarray_HOST_DIMS(om)) != 0)
     return 1;
   if (beta != 0.0 && CudaNdarray_CopyFromCudaNdarray(*output, om))
     return 1;
 #endif

-  if (c_set_tensor4d(*output, APPLY_SPECIFIC(output)) == -1)
+  if (c_set_tensorNd(*output, APPLY_SPECIFIC(output)) == -1)
     return 1;

   {
     size_t worksize;
     void *workspace;
+    cudnnConvolutionFwdAlgo_t chosen_algo;
+
+    if (CHOOSE_ALGO)
+    {
+      // A new convolution implementation should be selected, based either on
+      // timing or heuristics, if in one of the two following cases :
+      // - The implementation should only be chosen during the first execution
+      //   of an apply node and this is the first execution of the apply node.
+      // - The implementation should be chosen as often as necessary and the
+      //   shapes of the inputs differ from the last time an implementation
+      //   was chosen.
+      bool reuse_previous_algo;
+      if (CHOOSE_ALGO_ONCE)
+      {
+        // Only choose a new implementation if none has been chosen before.
+        reuse_previous_algo = APPLY_SPECIFIC(previous_algo_set);
+      }
+      else
+      {
+        // Reuse the previous implementation if the inputs and the kernels
+        // have the same shapes as they had when the previous implementation
+        // was selected
+        bool same_shapes = true;
+        for (int i = 0; (i < nb_dim) && same_shapes; i++)
+        {
+          same_shapes &= (CudaNdarray_HOST_DIMS(input)[i] ==
+                          APPLY_SPECIFIC(previous_input_shape)[i]);
+          same_shapes &= (CudaNdarray_HOST_DIMS(kerns)[i] ==
+                          APPLY_SPECIFIC(previous_kerns_shape)[i]);
+        }
+        reuse_previous_algo = same_shapes;
+      }
+
+      // If the previously chosen implementation can't be reused, select a
+      // new one based on the shapes of the current inputs
+      if (!reuse_previous_algo)
+      {
+        // Obtain a convolution algorithm appropriate for the input and kernel
+        // shapes. Either by choosing one according to heuristics or by making
+        // CuDNN time every implementation and choose the best one.
+        if (CHOOSE_ALGO_TIME)
+        {
+#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 3000
+          // Time the different implementations to choose the best one
+          int requestedCount = 1;
+          int count;
+          cudnnConvolutionFwdAlgoPerf_t choosen_algo_perf;
+          err = cudnnFindConvolutionForwardAlgorithm(_handle,
+                                                     APPLY_SPECIFIC(input),
+                                                     APPLY_SPECIFIC(kerns),
+                                                     desc,
+                                                     APPLY_SPECIFIC(output),
+                                                     requestedCount,
+                                                     &count,
+                                                     &choosen_algo_perf);
+          if (err != CUDNN_STATUS_SUCCESS) {
+            PyErr_Format(PyExc_RuntimeError,
+                         "GpuDnnConv: error selecting convolution algo: %s",
+                         cudnnGetErrorString(err));
+            return 1;
+          }
+
+          chosen_algo = choosen_algo_perf.algo;
+#endif
+        }
+        else
+        {
+          // The implementation should be chosen using heuristics based on the
+          // input shapes and the amount of memory available.
+
+          // Get the amount of available memory
+          size_t free = 0, total = 0;
+          cudaError_t err2 = cudaMemGetInfo(&free, &total);
+          if (err2 != cudaSuccess){
+            cudaGetLastError();
+            fprintf(stderr,
+                    "Error when trying to find the memory information"
+                    " on the GPU: %s\n", cudaGetErrorString(err2));
+            return 1;
+          }
+
+          // Use heuristics to choose the implementation
+          err = cudnnGetConvolutionForwardAlgorithm(_handle,
+                                                    APPLY_SPECIFIC(input),
+                                                    APPLY_SPECIFIC(kerns),
+                                                    desc,
+                                                    APPLY_SPECIFIC(output),
+                                                    CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
+                                                    free,
+                                                    &chosen_algo);
+          if (err != CUDNN_STATUS_SUCCESS) {
+            PyErr_Format(PyExc_RuntimeError,
+                         "GpuDnnConv: error selecting convolution algo: %s",
+                         cudnnGetErrorString(err));
+            return 1;
+          }
+        }
+
+        // Store the shapes of the inputs and kernels as well as the chosen
+        // algorithm for future use.
+        APPLY_SPECIFIC(previous_algo) = chosen_algo;
+        for (int i = 0; i < nb_dim; i++)
+        {
+          APPLY_SPECIFIC(previous_input_shape)[i] =
+                                            CudaNdarray_HOST_DIMS(input)[i];
+          APPLY_SPECIFIC(previous_kerns_shape)[i] =
+                                            CudaNdarray_HOST_DIMS(kerns)[i];
+        }
+      }
+      else
+      {
+        // Reuse the previously chosen convolution implementation
+        chosen_algo = APPLY_SPECIFIC(previous_algo);
+      }
+    }
+    else
+    {
+      chosen_algo = CONV_ALGO;
+    }
+
+#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 3000
+    // The FFT implementation (only in V3 and onward) does not support strides,
+    // 1x1 filters or inputs with a spatial dimension larger than 1024.
+    // If the chosen implementation is FFT, validate that it can be used
+    // on the current data and default on a safe implementation if it
+    // can't.
+    // Following code is 2d-specific, but it is fine as fft is defined only for
+    // 2d-filters
+    if (chosen_algo == CUDNN_CONVOLUTION_FWD_ALGO_FFT && nb_dim == 4)
+    {
+      // Extract the properties of the convolution descriptor
+      int pad_h, pad_w, stride_v, stride_h, upscale_x, upscale_y;
+      cudnnConvolutionMode_t mode;
+      err = cudnnGetConvolution2dDescriptor(desc, &pad_h, &pad_w,
+                                            &stride_v, &stride_h,
+                                            &upscale_x, &upscale_y,
+                                            &mode);
+      if (err != CUDNN_STATUS_SUCCESS) {
+        PyErr_Format(PyExc_RuntimeError,
+                     "GpuDnnConv: error getting convolution properties: %s",
+                     cudnnGetErrorString(err));
+        return 1;
+      }
+
+      // Extract the spatial size of the filters
+      int filter_h = CudaNdarray_HOST_DIMS(kerns)[3];
+      int filter_w = CudaNdarray_HOST_DIMS(kerns)[4];
+
+      // Extract the spatial size of the input
+      int input_h = CudaNdarray_HOST_DIMS(input)[3];
+      int input_w = CudaNdarray_HOST_DIMS(input)[4];
+
+      // Ensure that the selected implementation supports the requested
+      // convolution. Fall back to a safe implementation otherwise.
+      if (stride_v != 1 || stride_h != 1 || input_h > 1024 ||
+          input_w > 1024 || (filter_h == 1 && filter_w == 1))
+      {
+        chosen_algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
+      }
+    }
+#endif
+
+#if defined(CUDNN_VERSION) && CUDNN_VERSION < 3000
+    // In versions before V3, CuDNN did not support kernels larger than the
+    // inputs in any spatial dimension, even if padding was used such that the
+    // padded inputs were larger than the kernels. If the kernels are larger
+    // than the inputs, raise an error message.
+    bool shape_mismatch = false;
+    for (int i = 2; i < nb_dim; i++){
+      shape_mismatch = shape_mismatch ||
+                       (CudaNdarray_HOST_DIMS(kerns)[i] >
+                        CudaNdarray_HOST_DIMS(input)[i]);
+    }
+    if (shape_mismatch){
+      PyErr_Format(PyExc_RuntimeError,
+                   "GpuDnnConv: the current version of CuDNN does not support "
+                   "kernels larger than the inputs in any spatial dimension, "
+                   "even if the inputs are padded such that the padded inputs "
+                   "are larger than the kernels. Update your installation of "
+                   "CuDNN to V3 or more recent to solve the issue.");
+      return 1;
+    }
+#endif
+
     err = cudnnGetConvolutionForwardWorkspaceSize(_handle,
                                                   APPLY_SPECIFIC(input),
                                                   APPLY_SPECIFIC(kerns),
                                                   desc,
                                                   APPLY_SPECIFIC(output),
-                                                  CONV_ALGO,
+                                                  chosen_algo,
                                                   &worksize);
     if (err != CUDNN_STATUS_SUCCESS) {
       PyErr_Format(PyExc_RuntimeError,
...
@@ -47,7 +240,6 @@ APPLY_SPECIFIC(conv_fwd)(CudaNdarray *input, CudaNdarray *kerns,
                    cudnnGetErrorString(err));
       return 1;
     }
-
     workspace = get_work_mem(worksize);
     if (workspace == NULL && worksize != 0)
       return 1;
...
@@ -58,7 +250,7 @@ APPLY_SPECIFIC(conv_fwd)(CudaNdarray *input, CudaNdarray *kerns,
                               APPLY_SPECIFIC(input), CudaNdarray_DEV_DATA(input),
                               APPLY_SPECIFIC(kerns), CudaNdarray_DEV_DATA(kerns),
                               desc,
-                              CONV_ALGO,
+                              chosen_algo,
                               workspace, worksize,
                               (void *)&beta,
                               APPLY_SPECIFIC(output), CudaNdarray_DEV_DATA(*output));
...
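The selection machinery added to dnn_fwd.c (and mirrored for the gradients below) amounts to a small per-node cache keyed either on a "chosen once" flag or on the input/kernel shapes. A Python sketch of that control flow for review, with `choose_once` standing in for `CHOOSE_ALGO_ONCE` and `select` for the CuDNN find/get calls (all names here are illustrative, not part of the patch):

```python
def pick_algo(cache, in_shape, k_shape, choose_once, select):
    # Reuse the cached algorithm either unconditionally after the first
    # selection (choose_once) or only while the shapes stay the same.
    if choose_once:
        reuse = cache.get('set', False)
    else:
        reuse = (cache.get('in') == in_shape and cache.get('k') == k_shape)
    if not reuse:
        cache.update(algo=select(in_shape, k_shape), set=True,
                     **{'in': in_shape, 'k': k_shape})
    return cache['algo']

calls = []
cache = {}
select = lambda i, k: calls.append((i, k)) or 'gemm'
pick_algo(cache, (1, 3, 8, 8), (4, 3, 3, 3), False, select)
pick_algo(cache, (1, 3, 8, 8), (4, 3, 3, 3), False, select)  # cache hit
print(len(calls))  # 1
```

With `choose_once` the expensive selection (e.g. `cudnnFindConvolutionForwardAlgorithm` timing every implementation) runs a single time; otherwise it reruns whenever the shapes change, which matches the shape-comparison loops in the C code.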
theano/sandbox/cuda/dnn_gi.c
浏览文件 @
8cb9d50e
...
@@ -12,25 +12,225 @@ APPLY_SPECIFIC(conv_gi)(CudaNdarray *kerns, CudaNdarray *output,
...
@@ -12,25 +12,225 @@ APPLY_SPECIFIC(conv_gi)(CudaNdarray *kerns, CudaNdarray *output,
return
1
;
return
1
;
}
}
if
(
c_set_tensor
4
d
(
output
,
APPLY_SPECIFIC
(
output
))
==
-
1
)
if
(
c_set_tensor
N
d
(
output
,
APPLY_SPECIFIC
(
output
))
==
-
1
)
return
1
;
return
1
;
if
(
c_set_filter
(
kerns
,
APPLY_SPECIFIC
(
kerns
))
==
-
1
)
if
(
c_set_filter
Nd
(
kerns
,
APPLY_SPECIFIC
(
kerns
))
==
-
1
)
return
1
;
return
1
;
int
nb_dim
=
CudaNdarray_NDIM
(
output
);
#ifdef CONV_INPLACE
#ifdef CONV_INPLACE
Py_XDECREF
(
*
input
);
Py_XDECREF
(
*
input
);
*
input
=
im
;
*
input
=
im
;
Py_INCREF
(
*
input
);
Py_INCREF
(
*
input
);
#else
#else
if
(
CudaNdarray_prep_output
(
input
,
4
,
CudaNdarray_HOST_DIMS
(
im
))
!=
0
)
if
(
CudaNdarray_prep_output
(
input
,
nb_dim
,
CudaNdarray_HOST_DIMS
(
im
))
!=
0
)
return
1
;
return
1
;
if
(
beta
!=
0
.
0
&&
CudaNdarray_CopyFromCudaNdarray
(
*
input
,
im
))
if
(
beta
!=
0
.
0
&&
CudaNdarray_CopyFromCudaNdarray
(
*
input
,
im
))
return
1
;
return
1
;
#endif
#endif
if
(
c_set_tensor4d
(
*
input
,
APPLY_SPECIFIC
(
input
))
==
-
1
)
if
(
c_set_tensorNd
(
*
input
,
APPLY_SPECIFIC
(
input
))
==
-
1
)
return
1
;
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 3000
{
size_t
worksize
;
void
*
workspace
;
cudnnConvolutionBwdDataAlgo_t
chosen_algo
;
if
(
CHOOSE_ALGO
)
{
    // A new convolution implementation should be selected, based either on
    // timing or heuristics, if in one of the two following cases :
    // - The implementation should only be chosen during the first execution
    //   of an apply node and this is the first execution of the apply node.
    // - The implementation should be chosen as often as necessary and the
    //   shapes of the inputs differ from the last time an implementation
    //   was chosen.
    bool reuse_previous_algo;
    if (CHOOSE_ALGO_ONCE)
    {
      // Only choose a new implementation if none has been chosen before.
      reuse_previous_algo = APPLY_SPECIFIC(previous_algo_set);
    }
    else
    {
      // Reuse the previous implementation if the kernels and the outputs
      // have the same shapes as they had when the previous implementation
      // was selected
      bool same_shapes = true;
      for (int i = 0; (i < nb_dim) && same_shapes; i++)
      {
        same_shapes &= (CudaNdarray_HOST_DIMS(kerns)[i] ==
                        APPLY_SPECIFIC(previous_kerns_shape)[i]);
        same_shapes &= (CudaNdarray_HOST_DIMS(output)[i] ==
                        APPLY_SPECIFIC(previous_output_shape)[i]);
      }
      reuse_previous_algo = same_shapes;
    }

    // If the previously chosen implementation can't be reused, select a
    // new one based on the shapes of the current inputs
    if (!reuse_previous_algo)
    {
      // Obtain a convolution algorithm appropriate for the kernel and output
      // shapes. Either by choosing one according to heuristics or by making
      // CuDNN time every implementation and choosing the best one.
      if (CHOOSE_ALGO_TIME)
      {
        // Time the different implementations to choose the best one
        int requestedCount = 1;
        int count;
        cudnnConvolutionBwdDataAlgoPerf_t chosen_algo_perf;
        err = cudnnFindConvolutionBackwardDataAlgorithm(_handle,
                                                        APPLY_SPECIFIC(kerns),
                                                        APPLY_SPECIFIC(output),
                                                        desc,
                                                        APPLY_SPECIFIC(input),
                                                        requestedCount,
                                                        &count,
                                                        &chosen_algo_perf);
        if (err != CUDNN_STATUS_SUCCESS)
        {
          PyErr_Format(PyExc_RuntimeError,
                       "GpuDnnConvGradI: error selecting convolution algo: "
                       "%s", cudnnGetErrorString(err));
          return 1;
        }
        chosen_algo = chosen_algo_perf.algo;
      }
      else
      {
        // Choose the convolution implementation using heuristics based on the
        // shapes of the inputs and the amount of memory available.

        // Get the amount of available memory
        size_t free = 0, total = 0;
        cudaError_t err2 = cudaMemGetInfo(&free, &total);
        if (err2 != cudaSuccess)
        {
          cudaGetLastError();
          fprintf(stderr,
                  "Error when trying to find the memory information"
                  " on the GPU: %s\n", cudaGetErrorString(err2));
          return 1;
        }

        // Use heuristics to choose the implementation
        err = cudnnGetConvolutionBackwardDataAlgorithm(_handle,
                                                       APPLY_SPECIFIC(kerns),
                                                       APPLY_SPECIFIC(output),
                                                       desc,
                                                       APPLY_SPECIFIC(input),
                                                       CUDNN_CONVOLUTION_BWD_DATA_SPECIFY_WORKSPACE_LIMIT,
                                                       free,
                                                       &chosen_algo);
        if (err != CUDNN_STATUS_SUCCESS)
        {
          PyErr_Format(PyExc_RuntimeError,
                       "GpuDnnConvGradI: error selecting convolution algo: %s",
                       cudnnGetErrorString(err));
          return 1;
        }
      }

      // Store the shapes of the kernels and output as well as the chosen
      // algorithm for future use.
      APPLY_SPECIFIC(previous_bwd_d_algo) = chosen_algo;
      for (int i = 0; i < nb_dim; i++)
      {
        APPLY_SPECIFIC(previous_kerns_shape)[i] =
                                            CudaNdarray_HOST_DIMS(kerns)[i];
        APPLY_SPECIFIC(previous_output_shape)[i] =
                                            CudaNdarray_HOST_DIMS(output)[i];
      }
    }
    else
    {
      // Reuse the previously chosen convolution implementation
      chosen_algo = APPLY_SPECIFIC(previous_bwd_d_algo);
    }
  }
  else
  {
    chosen_algo = CONV_ALGO;
  }
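The reuse policy above (choose once per apply node, or re-choose whenever the shapes change) can be sketched in Python. This is a hypothetical illustration, not Theano code: `make_algo_selector`, `choose_algo` and the `policy` values stand in for the `CHOOSE_ALGO_ONCE` / shape-comparison branches and for the `cudnnFind*`/`cudnnGet*` calls.

```python
# Hypothetical sketch of the algorithm-reuse policy implemented above.
def make_algo_selector(choose_algo, policy="shapes"):
    # state mirrors the APPLY_SPECIFIC(previous_*) variables
    state = {"algo": None, "key": None}

    def select(kerns_shape, output_shape):
        key = (tuple(kerns_shape), tuple(output_shape))
        if policy == "once":
            # CHOOSE_ALGO_ONCE: reuse as soon as one algorithm was chosen
            reuse = state["algo"] is not None
        else:
            # otherwise reuse only while the shapes are unchanged
            reuse = state["key"] == key
        if not reuse:
            state["algo"] = choose_algo(kerns_shape, output_shape)
            state["key"] = key
        return state["algo"]

    return select
```

The expensive selection (timing every implementation) then runs only when the cached key is invalidated, which matches the C code's behavior across repeated executions of the same apply node.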
  // The FFT implementation (only in v3 and onward) does not support strides,
  // 1x1 filters or inputs with a spatial dimension larger than 1024.
  // If the chosen implementation is FFT, validate that it can be used
  // on the current data and default to a safe implementation if it
  // can't.
  if (chosen_algo == CUDNN_CONVOLUTION_BWD_DATA_ALGO_FFT && nb_dim == 4)
  {
    // Extract the properties of the convolution descriptor
    int pad_h, pad_w, stride_v, stride_h, upscale_x, upscale_y;
    cudnnConvolutionMode_t mode;
    err = cudnnGetConvolution2dDescriptor(desc, &pad_h, &pad_w,
                                          &stride_v, &stride_h,
                                          &upscale_x, &upscale_y,
                                          &mode);
    if (err != CUDNN_STATUS_SUCCESS)
    {
      PyErr_Format(PyExc_RuntimeError,
                   "GpuDnnConvGradI: error getting convolution properties: %s",
                   cudnnGetErrorString(err));
      return 1;
    }

    // Extract the spatial size of the filters
    int filter_h = CudaNdarray_HOST_DIMS(kerns)[2];
    int filter_w = CudaNdarray_HOST_DIMS(kerns)[3];

    // Extract the spatial size of the input
    int input_h = CudaNdarray_HOST_DIMS(*input)[2];
    int input_w = CudaNdarray_HOST_DIMS(*input)[3];

    // Ensure that the selected implementation supports the requested
    // convolution. Fall back to a safe implementation otherwise.
    if (stride_v != 1 || stride_h != 1 || input_h > 1024 ||
        input_w > 1024 || (filter_h == 1 && filter_w == 1))
    {
      chosen_algo = CUDNN_CONVOLUTION_BWD_DATA_ALGO_0;
    }
  }
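The fallback rule above can be summarized as a small predicate. A minimal Python sketch with hypothetical names (`pick_safe_algo`, and the `"ALGO_FFT"` / `"ALGO_0"` strings standing in for the CuDNN enum values): the FFT algorithm is kept only when all strides are 1, no spatial input dimension exceeds 1024, and the filter is not 1x1.

```python
# Hypothetical sketch of the FFT-eligibility check above.
def pick_safe_algo(chosen, stride_hw, input_hw, filter_hw, safe="ALGO_0"):
    # only the FFT algorithm has these restrictions
    if chosen != "ALGO_FFT":
        return chosen
    if any(s != 1 for s in stride_hw):       # FFT needs unit strides
        return safe
    if any(d > 1024 for d in input_hw):      # spatial dims capped at 1024
        return safe
    if filter_hw == (1, 1):                  # 1x1 filters unsupported
        return safe
    return chosen
```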
  // Infer required workspace size from the chosen implementation
  err = cudnnGetConvolutionBackwardDataWorkspaceSize(_handle,
                                                     APPLY_SPECIFIC(kerns),
                                                     APPLY_SPECIFIC(output),
                                                     desc,
                                                     APPLY_SPECIFIC(input),
                                                     chosen_algo,
                                                     &worksize);
  if (err != CUDNN_STATUS_SUCCESS)
  {
    PyErr_Format(PyExc_RuntimeError,
                 "GpuDnnConvGradI: error getting worksize: %s",
                 cudnnGetErrorString(err));
    return 1;
  }

  // Allocate workspace for the convolution
  workspace = get_work_mem(worksize);
  if (workspace == NULL && worksize != 0)
    return 1;

  // Perform the convolution
  err = cudnnConvolutionBackwardData_v3(
    _handle,
    (void *)&alpha,
    APPLY_SPECIFIC(kerns), CudaNdarray_DEV_DATA(kerns),
    APPLY_SPECIFIC(output), CudaNdarray_DEV_DATA(output),
    desc,
    chosen_algo,
    workspace, worksize,
    (void *)&beta,
    APPLY_SPECIFIC(input), CudaNdarray_DEV_DATA(*input));
  }
#else
  err = cudnnConvolutionBackwardData(
    _handle,
    (void *)&alpha,
...
@@ -39,6 +239,8 @@ APPLY_SPECIFIC(conv_gi)(CudaNdarray *kerns, CudaNdarray *output,
    desc,
    (void *)&beta,
    APPLY_SPECIFIC(input), CudaNdarray_DEV_DATA(*input));
#endif

  if (err != CUDNN_STATUS_SUCCESS)
  {
    PyErr_Format(PyExc_RuntimeError,
                 "GpuDnnConvGradI: error doing operation: %s",
                 cudnnGetErrorString(err));
...
theano/sandbox/cuda/dnn_gw.c
@@ -12,25 +12,226 @@ APPLY_SPECIFIC(conv_gw)(CudaNdarray *input, CudaNdarray *output,
    return 1;
  }

  if (c_set_tensorNd(input, APPLY_SPECIFIC(input)) == -1)
    return 1;
  if (c_set_tensorNd(output, APPLY_SPECIFIC(output)) == -1)
    return 1;

  int nb_dim = CudaNdarray_NDIM(output);

#ifdef CONV_INPLACE
  Py_XDECREF(*kerns);
  *kerns = km;
  Py_INCREF(*kerns);
#else
  if (CudaNdarray_prep_output(kerns, nb_dim, CudaNdarray_HOST_DIMS(km)) != 0)
    return 1;
  if (beta != 0.0 && CudaNdarray_CopyFromCudaNdarray(*kerns, km))
    return 1;
#endif

  if (c_set_filterNd(*kerns, APPLY_SPECIFIC(kerns)) == -1)
    return 1;

#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 3000
  {
  size_t worksize;
  void *workspace;
  cudnnConvolutionBwdFilterAlgo_t chosen_algo;

  if (CHOOSE_ALGO)
  {
    // A new convolution implementation should be selected, based either on
    // timing or heuristics, if in one of the two following cases :
    // - The implementation should only be chosen during the first execution
    //   of an apply node and this is the first execution of the apply node.
    // - The implementation should be chosen as often as necessary and the
    //   shapes of the inputs differ from the last time an implementation
    //   was chosen.
    bool reuse_previous_algo;
    if (CHOOSE_ALGO_ONCE)
    {
      // Only choose a new implementation if none has been chosen before.
      reuse_previous_algo = APPLY_SPECIFIC(previous_algo_set);
    }
    else
    {
      // Reuse the previous implementation if the inputs and the outputs
      // have the same shapes as they had when the previous implementation
      // was selected
      bool same_shapes = true;
      for (int i = 0; (i < nb_dim) && same_shapes; i++)
      {
        same_shapes &= (CudaNdarray_HOST_DIMS(input)[i] ==
                        APPLY_SPECIFIC(previous_input_shape)[i]);
        same_shapes &= (CudaNdarray_HOST_DIMS(output)[i] ==
                        APPLY_SPECIFIC(previous_output_shape)[i]);
      }
      reuse_previous_algo = same_shapes;
    }

    // If the previously chosen implementation can't be reused, select a
    // new one based on the shapes of the current inputs
    if (!reuse_previous_algo)
    {
      // Obtain a convolution algorithm appropriate for the input and output
      // shapes. Either by choosing one according to heuristics or by making
      // CuDNN time every implementation and choosing the best one.
      if (CHOOSE_ALGO_TIME)
      {
        // Time the different implementations to choose the best one
        int requestedCount = 1;
        int count;
        cudnnConvolutionBwdFilterAlgoPerf_t chosen_algo_perf;
        err = cudnnFindConvolutionBackwardFilterAlgorithm(_handle,
                                                          APPLY_SPECIFIC(input),
                                                          APPLY_SPECIFIC(output),
                                                          desc,
                                                          APPLY_SPECIFIC(kerns),
                                                          requestedCount,
                                                          &count,
                                                          &chosen_algo_perf);
        if (err != CUDNN_STATUS_SUCCESS)
        {
          PyErr_Format(PyExc_RuntimeError,
                       "GpuDnnConvGradW: error selecting convolution algo: "
                       "%s", cudnnGetErrorString(err));
          return 1;
        }
        chosen_algo = chosen_algo_perf.algo;
      }
      else
      {
        // Choose the convolution implementation using heuristics based on the
        // shapes of the inputs and the amount of memory available.

        // Get the amount of available memory
        size_t free = 0, total = 0;
        cudaError_t err2 = cudaMemGetInfo(&free, &total);
        if (err2 != cudaSuccess)
        {
          cudaGetLastError();
          fprintf(stderr,
                  "Error when trying to find the memory information"
                  " on the GPU: %s\n", cudaGetErrorString(err2));
          return 1;
        }

        // Use heuristics to choose the implementation
        err = cudnnGetConvolutionBackwardFilterAlgorithm(_handle,
                                                         APPLY_SPECIFIC(input),
                                                         APPLY_SPECIFIC(output),
                                                         desc,
                                                         APPLY_SPECIFIC(kerns),
                                                         CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT,
                                                         free,
                                                         &chosen_algo);
        if (err != CUDNN_STATUS_SUCCESS)
        {
          PyErr_Format(PyExc_RuntimeError,
                       "GpuDnnConvGradW: error selecting convolution algo: %s",
                       cudnnGetErrorString(err));
          return 1;
        }
      }

      // Store the shapes of the inputs and kernels as well as the chosen
      // algorithm for future use.
      APPLY_SPECIFIC(previous_bwd_f_algo) = chosen_algo;
      for (int i = 0; i < nb_dim; i++)
      {
        APPLY_SPECIFIC(previous_input_shape)[i] =
                                            CudaNdarray_HOST_DIMS(input)[i];
        APPLY_SPECIFIC(previous_output_shape)[i] =
                                            CudaNdarray_HOST_DIMS(output)[i];
      }
    }
    else
    {
      // Reuse the previously chosen convolution implementation
      chosen_algo = APPLY_SPECIFIC(previous_bwd_f_algo);
    }
  }
  else
  {
    chosen_algo = CONV_ALGO;
  }
  // The FFT implementation (only in v3 and onward) does not support strides,
  // 1x1 filters or inputs with a spatial dimension larger than 1024.
  // If the chosen implementation is FFT, validate that it can be used
  // on the current data and default to a safe implementation if it
  // can't.
  if (chosen_algo == CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT && nb_dim == 4)
  {
    // Extract the properties of the convolution descriptor
    int pad_h, pad_w, stride_v, stride_h, upscale_x, upscale_y;
    cudnnConvolutionMode_t mode;
    err = cudnnGetConvolution2dDescriptor(desc, &pad_h, &pad_w,
                                          &stride_v, &stride_h,
                                          &upscale_x, &upscale_y,
                                          &mode);
    if (err != CUDNN_STATUS_SUCCESS)
    {
      PyErr_Format(PyExc_RuntimeError,
                   "GpuDnnConvGradW: error getting convolution properties: %s",
                   cudnnGetErrorString(err));
      return 1;
    }

    // Extract the spatial size of the filters
    int filter_h = CudaNdarray_HOST_DIMS(*kerns)[2];
    int filter_w = CudaNdarray_HOST_DIMS(*kerns)[3];

    // Extract the spatial size of the input
    int input_h = CudaNdarray_HOST_DIMS(input)[2];
    int input_w = CudaNdarray_HOST_DIMS(input)[3];

    // Ensure that the selected implementation supports the requested
    // convolution. Fall back to a safe implementation otherwise.
    if (stride_v != 1 || stride_h != 1 || input_h > 1024 ||
        input_w > 1024 || (filter_h == 1 && filter_w == 1))
    {
      chosen_algo = CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0;
    }
  }
  // Infer required workspace size from the chosen implementation
  err = cudnnGetConvolutionBackwardFilterWorkspaceSize(_handle,
                                                       APPLY_SPECIFIC(input),
                                                       APPLY_SPECIFIC(output),
                                                       desc,
                                                       APPLY_SPECIFIC(kerns),
                                                       chosen_algo,
                                                       &worksize);
  if (err != CUDNN_STATUS_SUCCESS)
  {
    PyErr_Format(PyExc_RuntimeError,
                 "GpuDnnConvGradW: error getting worksize: %s",
                 cudnnGetErrorString(err));
    return 1;
  }

  // Allocate workspace for the convolution
  workspace = get_work_mem(worksize);
  if (workspace == NULL && worksize != 0)
    return 1;

  // Perform the convolution
  err = cudnnConvolutionBackwardFilter_v3(
    _handle,
    (void *)&alpha,
    APPLY_SPECIFIC(input), CudaNdarray_DEV_DATA(input),
    APPLY_SPECIFIC(output), CudaNdarray_DEV_DATA(output),
    desc,
    chosen_algo,
    workspace, worksize,
    (void *)&beta,
    APPLY_SPECIFIC(kerns), CudaNdarray_DEV_DATA(*kerns));
  }
#else
  err = cudnnConvolutionBackwardFilter(
    _handle,
    (void *)&alpha,
...
@@ -39,6 +240,8 @@ APPLY_SPECIFIC(conv_gw)(CudaNdarray *input, CudaNdarray *output,
    desc,
    (void *)&beta,
    APPLY_SPECIFIC(kerns), CudaNdarray_DEV_DATA(*kerns));
#endif

  if (err != CUDNN_STATUS_SUCCESS)
  {
    PyErr_Format(PyExc_RuntimeError,
                 "GpuDnnConvGradW: error doing operation: %s",
                 cudnnGetErrorString(err));
...
theano/sandbox/cuda/tests/test_conv_cuda_ndarray.py
@@ -452,7 +452,8 @@ def test_default_conv():
                for a in f.maker.fgraph.apply_nodes])


def _test_full(cls, mode=None, version=[-1], extra_shapes=[],
               test_bigger_kernels=True):
    seed_rng()
    shapes = get_basic_shapes()
    shapes += get_shapes2()
...
@@ -481,14 +482,18 @@ def _test_full(cls, mode=None, version=[-1], extra_shapes=[]):
              , ((16, 5, 64, 64), (8, 5, 8, 8), (1, 1), (1, 1), (1, 1))  # a big one
              , ((16, 1, 28, 28), (20, 1, 5, 5), (1, 1), (1, 1), (1, 1))  # MNIST LeNET layer 1
              , ((20, 16, 32, 32), (1, 16, 28, 28), (1, 1), (1, 1), (1, 1))  # layer 1 backprop to weights
              ]

    if test_bigger_kernels:
        # Shapes where the kernel is larger than the image in some dimension
        shapes += [((3, 1, 1, 1), (2, 1, 5, 3), (1, 1), (1, 1), (1, 1))
                   , ((3, 2, 1, 1), (4, 2, 1, 1), (1, 1), (1, 1), (1, 1))
                   , ((3, 2, 4, 4), (4, 2, 2, 6), (1, 1), (1, 1), (1, 1))
                   , ((3, 2, 4, 4), (4, 2, 8, 6), (1, 1), (1, 1), (1, 1))  # kernel bigger than image
                   , ((4, 2, 10, 10), (3, 2, 2, 12), (1, 1), (1, 1), (1, 1))
                   ]

    shapes += [
        # ((60,1,28,28),(20,1,5,5), (1, 1), (1, 1), (1, 1))#test_lenet_28 1 layers
        # , ((60,20,12,12),(30,20,5,5), (1, 1), (1, 1), (1, 1))#test_lenet_28 2 layers
...
@@ -516,9 +521,16 @@ def _test_full(cls, mode=None, version=[-1], extra_shapes=[]):
def test_full():
    # If using CuDNN version before v3, only run the tests where the
    # kernels are not larger than the input in any spatial dimension.
    if cuda.dnn.dnn_available() and cuda.dnn.version() < (3000, 3000):
        test_bigger_kernels = False
    else:
        test_bigger_kernels = True

    for t in _test_full(None, mode=theano_mode, version=[-1],
                        test_bigger_kernels=test_bigger_kernels):
        yield t
...
@@ -531,7 +543,16 @@ def test_gemm_full():
def test_dnn_full():
    if not cuda.dnn.dnn_available():
        raise SkipTest(cuda.dnn.dnn_available.msg)
    # If using CuDNN version before v3, only run the tests where the
    # kernels are not larger than the input in any spatial dimension.
    if cuda.dnn.version() < (3000, 3000):
        test_bigger_kernels = False
    else:
        test_bigger_kernels = True

    for t in _test_full(DnnBase, mode=theano_mode.including("cudnn"),
                        test_bigger_kernels=test_bigger_kernels):
        yield t
...
theano/sandbox/cuda/tests/test_dnn.py
@@ -2,7 +2,7 @@ import logging
from nose.plugins.skip import SkipTest
import numpy
from itertools import chain, product

import theano
from six import StringIO
...
@@ -13,6 +13,7 @@ from theano.tensor.signal.downsample import max_pool_2d
from theano.tensor.signal.downsample import DownsampleFactorMaxGrad
import theano.sandbox.cuda.dnn as dnn
from theano.sandbox.cuda.basic_ops import GpuAllocEmpty, gpu_alloc_empty
from theano.sandbox.cuda import float32_shared_constructor as shared

# Skip test if cuda_ndarray is not available.
import theano.sandbox.cuda as cuda
...
@@ -160,8 +161,28 @@ def test_dnn_conv_inplace():
    assert len([n for n in topo if isinstance(n.op, GpuAllocEmpty)]) == 2


def pool3d2d(input, ds=(2, 2, 2), strides=None, pad=(0, 0, 0),
             pool_func=T.max, mode='ignore_borders'):
    if strides is None:
        strides = ds

    shape = input.shape
    # reshape to B, C*0, 1, 2 and do the pooling on 1, 2
    first = input.reshape((shape[0], shape[1] * shape[2],
                           shape[3], shape[4]))
    pooled1 = pool_2d_i2n(first, ds=ds[1:], strides=strides[1:],
                          pad=pad[1:], pool_function=pool_func, mode=mode)
    shp1 = pooled1.shape

    # reshape to B, C, 0, 1*2 and do the pooling on 0
    second = pooled1.reshape((shape[0], shape[1], shape[2],
                              shp1[2] * shp1[3]))
    pooled2 = pool_2d_i2n(second, ds=(ds[0], 1), strides=(strides[0], 1),
                          pad=(pad[0], 0), pool_function=pool_func, mode=mode)
    shp2 = pooled2.shape
    return pooled2.reshape((shape[0], shape[1], shp2[2], shp1[2], shp1[3]))


def pool_2d_i2n(input, ds=(2, 2), strides=None, pad=(0, 0),
                pool_function=T.max, mode='ignore_borders'):
    if strides is None:
        strides = ds
...
@@ -301,6 +322,111 @@ def test_pooling():
    assert numpy.allclose(c_out, g_out)
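The reshape trick `pool3d2d` relies on (pool the last two axes with the depth folded into the channel axis, then pool the depth axis with the pooled spatial axes folded together) can be sketched directly in NumPy for the unpadded, non-overlapping case. `max_pool3d` is a hypothetical helper for illustration, not part of the test suite, and assumes each dimension divides evenly by its window.

```python
import numpy as np

def max_pool3d(x, ws):
    # x: (batch, channel, depth, height, width); ws: (dd, hh, ww) windows
    b, c, d, h, w = x.shape
    dd, hh, ww = ws
    # 2D pool over (h, w) with depth folded into the channel axis
    y = x.reshape(b, c * d, h // hh, hh, w // ww, ww).max(axis=(3, 5))
    y = y.reshape(b, c, d, h // hh, w // ww)
    # 1D pool over d with the pooled (h, w) axes folded together
    y = y.reshape(b, c, d // dd, dd, (h // hh) * (w // ww)).max(axis=3)
    return y.reshape(b, c, d // dd, h // hh, w // ww)
```

Because max is associative, pooling the spatial axes first and the depth axis second yields the same result as a true 3D window, which is why the test below can compare `pool3d2d` against CuDNN's `dnn_pool`.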
def test_pooling3d():
    # CuDNN 3d pooling requires CuDNN v3. Don't test if the CuDNN version is
    # too old.
    if not cuda.dnn.dnn_available() or cuda.dnn.version() < (3000, 3000):
        raise SkipTest(cuda.dnn.dnn_available.msg)

    x = T.TensorType(broadcastable=(False, False, False, False, False),
                     dtype='float32')()
    for mode, pad in product(('max', 'average_inc_pad', 'average_exc_pad'),
                             ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
                              (2, 3, 2), (3, 2, 2), (2, 2, 3))):
        if mode == 'max':
            func = T.max
        else:
            func = T.mean
        if pad != (0, 0, 0) and cuda.dnn.version() == -1:
            continue
        if pad != (0, 0, 0) and func is T.mean:
            continue

        for ws in (4, 2, 5):
            for stride in (2, 3):
                if stride > ws:
                    continue
                if pad[0] > stride or pad[1] > stride or pad[2] > stride:
                    # Not implemented
                    continue
                out1 = cuda.dnn.dnn_pool(x, (ws, ws, ws),
                                         stride=(stride, stride, stride),
                                         pad=pad, mode=mode)
                out2 = pool3d2d(x, ds=(ws, ws, ws),
                                strides=(stride, stride, stride),
                                pad=pad, pool_func=func)

                # For max pooling pool3d2d explicitly pads the input with
                # -inf. Because of this, the compilation mode for the function
                # that uses pool3d2d should not check for infinite values or
                # it will falsely believe there is an error in the graph.
                mode_without_gpu2 = mode_without_gpu.including()
                mode_without_gpu2.check_isfinite = False

                f1 = theano.function([x], out1, mode=mode_with_gpu)
                assert any([isinstance(node.op, cuda.dnn.GpuDnnPool)
                            for node in f1.maker.fgraph.apply_nodes])
                f2 = theano.function([x], out2, mode=mode_without_gpu2)
                assert not any([isinstance(node.op, cuda.dnn.GpuDnnPool)
                                for node in f2.maker.fgraph.apply_nodes])

                for shp in [(1, 10, 100, 100, 100),
                            (1, 3, 99, 99, 99),
                            (32, 1, 147, 197, 37),
                            ]:
                    data = numpy.random.normal(0, 1, shp).astype("float32")
                    a = f1(data).__array__()
                    b = f2(data).__array__()
                    utt.assert_allclose(a, b,
                                        atol=numpy.finfo(numpy.float32).eps)

        # Test the grad
        for shp in [(1, 1, 2, 2, 2),
                    (1, 1, 3, 3, 3),
                    (1, 1, 3, 3, 4),
                    (1, 1, 3, 4, 3),
                    (1, 1, 4, 3, 3),
                    (1, 1, 4, 4, 4),
                    (1, 1, 5, 5, 5)]:
            data = numpy.random.normal(0, 1, shp).astype("float32") * 10

            ws = 2
            stride = 2
            if pad[0] > stride or pad[1] > stride or pad[2] > stride:
                # Not implemented
                continue

            # Test the GPU grad + GPU implementation
            def fn(x):
                dnn_op = cuda.dnn.dnn_pool(
                    x, ws=(ws, ws, ws),
                    stride=(stride, stride, stride),
                    pad=pad,
                    mode=mode)
                return dnn_op
            theano.tests.unittest_tools.verify_grad(
                fn, [data], cast_to_output_type=False, mode=mode_with_gpu)
            # Confirm that we get the good op.
            fg = theano.function([x], theano.grad(fn(x).sum(), x),
                                 mode=mode_with_gpu)
            assert any([isinstance(node.op, cuda.dnn.GpuDnnPoolGrad)
                        for node in fg.maker.fgraph.toposort()])
            g_out = fg(data)

            # Compare against the CPU result
            out = pool3d2d(x, (ws, ws, ws),
                           strides=(stride, stride, stride),
                           pad=pad, pool_func=func)
            fc = theano.function([x], theano.grad(out.sum(), x),
                                 mode=mode_without_gpu)
            c_out = fc(data)
            assert numpy.allclose(c_out, g_out)
def test_pooling_opt():
    if not cuda.dnn.dnn_available():
        raise SkipTest(cuda.dnn.dnn_available.msg)
...
@@ -324,6 +450,45 @@ def test_pooling_opt():
               for n in f.maker.fgraph.toposort()])
def test_log_softmax():
    # This is a test for an optimization that depends on CuDNN v3 or
    # more recent. Don't test if the CuDNN version is too old.
    if not cuda.dnn.dnn_available() or cuda.dnn.version() < (3000, 3000):
        raise SkipTest(cuda.dnn.dnn_available.msg)

    x = T.ftensor4()
    softmax_out = dnn.GpuDnnSoftmax('bc01', 'accurate', 'channel')(x)
    log_out = T.log(T.as_tensor_variable(softmax_out))

    f = theano.function([x], log_out, mode=mode_with_gpu)

    # Ensure that the optimization has been applied
    dnn_softmax_nodes = [n for n in f.maker.fgraph.toposort() if
                         isinstance(n.op, cuda.dnn.GpuDnnSoftmax)]
    assert len(dnn_softmax_nodes) == 1
    assert dnn_softmax_nodes[0].op.algo == "log"

    # Ensure that the output of the function is valid
    input_shapes = [(3, 4, 5, 6),
                    (1025, 2, 3, 4),
                    (2, 1025, 3, 4),
                    (2, 3, 1025, 4),
                    (2, 3, 4, 1025),
                    (66000, 2, 3, 4),
                    (2, 66000, 3, 4),
                    (2, 3, 66000, 4),
                    (2, 3, 4, 66000)]

    for inp_shape in input_shapes:
        input_val = numpy.random.normal(0, 1, inp_shape).astype("float32")

        out = f(input_val)
        expected_out = numpy.log(numpy.exp(input_val) /
                                 numpy.exp(input_val).sum(1)[:, None, :, :])

        utt.assert_allclose(out, expected_out)
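For reference, the quantity the `"log"` algorithm is expected to produce is log(softmax(x)) over the channel axis. A NumPy sketch of the numerically stable form (a hypothetical helper, not part of the tests): subtracting the per-row maximum before exponentiating avoids overflow while leaving the result unchanged.

```python
import numpy as np

def log_softmax(x, axis=1):
    # log(softmax(x)) = (x - m) - log(sum(exp(x - m))), with m = max over axis
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))
```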
def test_dnn_tag():
    """
    Test that if cudnn isn't avail we crash and that if it is avail, we use it.
...
@@ -432,6 +597,45 @@ class TestDnnInferShapes(utt.InferShapeTester):
            dnn.GpuDnnConv
        )
    def test_conv3d(self):
        if not (cuda.dnn.dnn_available() and dnn.version() >= (2000, 2000)):
            raise SkipTest('CuDNN 3D convolution requires CuDNN v2')
        ftensor5 = T.TensorType(dtype="float32", broadcastable=(False,) * 5)
        img = ftensor5('img')
        kerns = ftensor5('kerns')
        out = ftensor5('out')
        img_val = numpy.asarray(
            numpy.random.rand(7, 2, 6, 4, 11),
            dtype='float32'
        )
        kern_vals = numpy.asarray(
            numpy.random.rand(8, 2, 4, 3, 1),
            dtype='float32'
        )

        for params in product(
            ['valid', 'full'],
            [(1, 1, 1), (2, 2, 2)],
            ['conv', 'cross']
        ):
            out_vals = numpy.zeros(
                dnn.GpuDnnConv3d.get_out_shape(img_val.shape, kern_vals.shape,
                                               border_mode=params[0],
                                               subsample=params[1]),
                dtype='float32')
            desc = dnn.GpuDnnConvDesc(
                border_mode=params[0],
                subsample=params[1],
                conv_mode=params[2]
            )(img.shape, kerns.shape)
            conv = dnn.GpuDnnConv3d()(img, kerns, out, desc)
            self._compile_and_check(
                [img, kerns, out],
                [conv],
                [img_val, kern_vals, out_vals],
                dnn.GpuDnnConv3d
            )

    def test_conv_gradw(self):
        if not dnn.dnn_available():
            raise SkipTest(dnn.dnn_available.msg)
...
@@ -481,6 +685,51 @@ class TestDnnInferShapes(utt.InferShapeTester):
            dnn.GpuDnnConvGradW
        )
    def test_conv3d_gradw(self):
        if not (cuda.dnn.dnn_available() and dnn.version() >= (2000, 2000)):
            raise SkipTest('CuDNN 3D convolution requires CuDNN v2')
        ftensor5 = T.TensorType(dtype="float32", broadcastable=(False,) * 5)
        img = ftensor5('img')
        kerns = ftensor5('kerns')
        out = ftensor5('out')
        img_val = numpy.asarray(
            numpy.random.rand(9, 2, 4, 8, 7),
            dtype='float32'
        )
        kern_vals = numpy.asarray(
            numpy.random.rand(11, 2, 3, 1, 4),
            dtype='float32'
        )

        for params in product(
            ['valid', 'full'],
            [(1, 1, 1), (2, 2, 2)],
            ['conv', 'cross']
        ):
            out_vals = numpy.zeros(
                dnn.GpuDnnConv3d.get_out_shape(img_val.shape, kern_vals.shape,
                                               border_mode=params[0],
                                               subsample=params[1]),
                dtype='float32')
            desc = dnn.GpuDnnConvDesc(
                border_mode=params[0],
                subsample=params[1],
                conv_mode=params[2]
            )(img.shape, out.shape)
            conv_grad_w = dnn.GpuDnnConv3dGradW()(
                img,
                out,
                kerns,
                desc,
            )
            self._compile_and_check(
                [img, out, kerns],
                [conv_grad_w],
                [img_val, out_vals, kern_vals],
                dnn.GpuDnnConv3dGradW
            )

    def test_conv_gradi(self):
        if not dnn.dnn_available():
            raise SkipTest(dnn.dnn_available.msg)
...
@@ -492,7 +741,7 @@ class TestDnnInferShapes(utt.InferShapeTester):
            dtype='float32'
        )
        kern_vals = numpy.asarray(
            numpy.random.rand(4, 14, 15, 16),
            dtype='float32'
        )
...
@@ -526,6 +775,51 @@ class TestDnnInferShapes(utt.InferShapeTester):
            dnn.GpuDnnConvGradI
        )
    def test_conv3d_gradi(self):
        if not (cuda.dnn.dnn_available() and dnn.version() >= (2000, 2000)):
            raise SkipTest('CuDNN 3D convolution requires CuDNN v2')
        ftensor5 = T.TensorType(dtype="float32", broadcastable=(False,) * 5)
        img = ftensor5('img')
        kerns = ftensor5('kerns')
        out = ftensor5('out')
        img_val = numpy.asarray(
            numpy.random.rand(8, 4, 6, 7, 5),
            dtype='float32'
        )
        kern_vals = numpy.asarray(
            numpy.random.rand(9, 4, 5, 1, 2),
            dtype='float32'
        )

        for params in product(
            ['valid', 'full'],
            [(1, 1, 1), (2, 2, 2)],
            ['conv', 'cross']
        ):
            out_vals = numpy.zeros(
                dnn.GpuDnnConv3d.get_out_shape(img_val.shape, kern_vals.shape,
                                               border_mode=params[0],
                                               subsample=params[1]),
                dtype='float32')
            desc = dnn.GpuDnnConvDesc(
                border_mode=params[0],
                subsample=params[1],
                conv_mode=params[2]
            )(img.shape, kerns.shape)
            conv_grad_i = dnn.GpuDnnConv3dGradI()(
                kerns,
                out,
                img,
                desc,
            )
            self._compile_and_check(
                [kerns, out, img],
                [conv_grad_i],
                [kern_vals, out_vals, img_val],
                dnn.GpuDnnConv3dGradI
            )

    def test_pool(self):
        if not dnn.dnn_available():
            raise SkipTest(dnn.dnn_available.msg)
...
@@ -763,6 +1057,202 @@ def test_dnn_conv_grad():
    utt.verify_grad(dconvw, [img_val, kern_val, out_val])
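As an illustration of the reference computation the 3D convolution tests compare CuDNN against, here is a direct NumPy sketch of a 'valid' cross-correlation with unit strides. `corr3d_valid` is a hypothetical helper for illustration only, and is deliberately written as plain loops over output positions rather than anything optimized.

```python
import numpy as np

def corr3d_valid(inputs, filters):
    # inputs:  (batch, channel, depth, height, width)
    # filters: (n_filters, channel, fd, fh, fw)
    b, c, d, h, w = inputs.shape
    n, _, fd, fh, fw = filters.shape
    out = np.zeros((b, n, d - fd + 1, h - fh + 1, w - fw + 1))
    for z in range(out.shape[2]):
        for y in range(out.shape[3]):
            for x in range(out.shape[4]):
                patch = inputs[:, :, z:z + fd, y:y + fh, x:x + fw]
                # contract over channel and filter volume for every
                # (batch, filter) pair at once
                out[:, :, z, y, x] = np.tensordot(
                    patch, filters, axes=([1, 2, 3, 4], [1, 2, 3, 4]))
    return out
```

Other border modes reduce to this one by zero-padding the input first, and `conv_mode='conv'` by flipping the filters along the three spatial axes, which is exactly how the reference implementation in the tests is built.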

def get_conv3d_test_cases():
    # Every element of test_shapes follows the format
    # [input_shape, filter_shape, subsample]
    test_shapes = [[(128, 3, 5, 5, 5), (64, 3, 1, 2, 4), (1, 1, 1)],
                   [(8, 4, 20, 12, 15), (5, 4, 6, 12, 4), (2, 2, 2)],
                   [(8, 1, 20, 12, 15), (5, 1, 6, 12, 4), (3, 3, 3)],
                   [(8, 1, 20, 12, 15), (5, 1, 6, 12, 4), (3, 2, 1)],
                   [(8, 1, 20, 12, 15), (5, 1, 6, 12, 4), (3, 2, 1)],
                   # Test with 1x1x1 filters
                   [(8, 1, 10, 10, 10), (10, 1, 1, 1, 1), (1, 1, 1)],
                   # Test with dimensions larger than 1024 (thread block dim)
                   [(1025, 1, 2, 3, 4), (5, 1, 1, 2, 3), (1, 1, 1)],
                   [(8, 1, 2, 3, 4), (1025, 1, 1, 2, 3), (1, 1, 1)],
                   [(8, 1025, 2, 3, 4), (5, 1025, 1, 1, 2), (1, 1, 1)],
                   [(8, 1, 1030, 3, 4), (5, 1, 1025, 1, 1), (1, 1, 1)],
                   [(8, 1, 2, 1030, 4), (5, 1, 2, 1025, 1), (1, 1, 1)],
                   [(8, 1, 2, 3, 1030), (5, 1, 1, 2, 1025), (1, 1, 1)],
                   # The equivalent of this caused a crash with conv2d
                   [(1, 1, 1, 44800, 1), (6, 1, 1, 1, 1), (1, 1, 1)]]

    # With border mode 'full', test with kernel bigger than image in some/all
    # dimensions
    test_shapes_full = [[(6, 2, 2, 2, 2), (4, 2, 3, 1, 1), (1, 1, 1)],
                        [(6, 2, 2, 2, 2), (4, 2, 1, 3, 1), (1, 1, 1)],
                        [(6, 2, 2, 2, 2), (4, 2, 1, 1, 3), (1, 1, 1)],
                        [(6, 2, 2, 2, 2), (4, 2, 5, 5, 5), (1, 1, 1)]]
    border_modes = ['valid', 'full', (1, 2, 3), (3, 2, 1), 1, 2]
    conv_modes = ['conv', 'cross']

    if cuda.dnn.dnn_available() and dnn.version() >= (3000, 3000):
        itt = chain(product(test_shapes, border_modes, conv_modes),
                    product(test_shapes_full, ['full'], conv_modes))
    else:
        # CuDNN, before v3, did not support kernels larger than the inputs,
        # even if the original inputs were padded so they would be larger than
        # the kernels. If using a version older than v3, don't run the tests
        # with kernels larger than the unpadded inputs.
        itt = product(test_shapes, border_modes, conv_modes)

    return itt
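The border-mode handling exercised by these test cases can be sketched in isolation. The helper below is hypothetical (it is not part of the test suite); it shows how a `border_mode` of `'valid'`, `'full'`, an int, or a 3-tuple maps to `numpy.pad` widths for a 5D `(batch, channel, d0, d1, d2)` input, mirroring the padding logic used by the reference implementations in the tests.

```python
import numpy


def pad_widths(border_mode, filters_shape):
    """Return numpy.pad widths for a 5D input given a border mode.

    Hypothetical helper mirroring the tests' padding logic.
    """
    if border_mode == 'valid':
        pad_per_dim = [0, 0, 0]
    elif border_mode == 'full':
        # 'full' pads each spatial dimension by kernel size - 1
        pad_per_dim = [filters_shape[i] - 1 for i in range(2, 5)]
    elif isinstance(border_mode, int):
        pad_per_dim = [border_mode] * 3
    else:
        pad_per_dim = list(border_mode)
    # No padding on the batch and channel dimensions
    return [(0, 0), (0, 0)] + [(p, p) for p in pad_per_dim]


# For filters (4, 2, 3, 1, 1), 'full' pads only the first spatial axis
widths = pad_widths('full', (4, 2, 3, 1, 1))
# widths == [(0, 0), (0, 0), (2, 2), (0, 0), (0, 0)]
padded = numpy.pad(numpy.ones((6, 2, 2, 2, 2), dtype='float32'),
                   widths, 'constant')
# padded.shape == (6, 2, 6, 2, 2)
```

With this mapping, the reference convolution with `border_mode='valid'` on the padded input reproduces the padded-mode CuDNN result.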

def test_conv3d_fwd():

    if not (cuda.dnn.dnn_available() and dnn.version() >= (2000, 2000)):
        raise SkipTest('CuDNN 3D convolution requires CuDNN v2')

    def run_conv3d_fwd(inputs_shape, filters_shape, subsample,
                       border_mode, conv_mode):

        inputs_val = numpy.random.random(inputs_shape).astype('float32')
        filters_val = numpy.random.random(filters_shape).astype('float32')

        # Scale down the input values to prevent very large absolute errors
        # due to float rounding
        inputs_val /= 10
        filters_val /= 10

        inputs = shared(inputs_val)
        filters = shared(filters_val)
        bias = shared(numpy.zeros(filters_shape[0]).astype('float32'))

        # Compile a theano function for the CuDNN implementation
        conv = dnn.dnn_conv3d(img=inputs, kerns=filters,
                              border_mode=border_mode, subsample=subsample,
                              conv_mode=conv_mode)
        f = theano.function([], conv, mode=mode_with_gpu)

        # If conv_mode is 'conv' the reference implementation should use
        # filters flipped according to the width, height and time axis
        if conv_mode == 'conv':
            flipped_filters = filters[:, :, ::-1, ::-1, ::-1]
        else:
            flipped_filters = filters

        # If border mode is anything but 'valid', the reference implementation
        # should operate on padded inputs
        if border_mode == 'valid':
            padded_inputs = inputs
        else:
            if border_mode == 'full':
                pad_per_dim = [filters_shape[i] - 1 for i in range(2, 5)]
            else:
                if isinstance(border_mode, int):
                    pad_per_dim = [border_mode] * 3
                else:
                    pad_per_dim = border_mode

            pad_before_after = ([(0, 0), (0, 0)] +
                                [(p, p) for p in pad_per_dim])
            padded_inputs_val = numpy.pad(inputs_val, pad_before_after,
                                          'constant')
            padded_inputs = shared(padded_inputs_val)

        # Compile a theano function for the reference implementation
        conv_ref = theano.tensor.nnet.conv3D(
            V=padded_inputs.dimshuffle(0, 2, 3, 4, 1),
            W=flipped_filters.dimshuffle(0, 2, 3, 4, 1),
            b=bias, d=subsample)
        f_ref = theano.function([], conv_ref.dimshuffle(0, 4, 1, 2, 3))

        # Compare the results of the two implementations
        res_ref = f_ref()
        res = f()
        utt.assert_allclose(res_ref, res)

    test_cases = get_conv3d_test_cases()
    for (i_shape, f_shape, subsample), border_mode, conv_mode in test_cases:
        yield (run_conv3d_fwd, i_shape, f_shape, subsample, border_mode,
               conv_mode)
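The filter flipping done by the reference path above can be shown standalone: with `conv_mode='conv'`, reversing the filters along the three spatial axes turns cross-correlation into true convolution. A small numpy sketch (shapes chosen arbitrarily):

```python
import numpy

# 5D filters: (num_filters, channels, depth, height, width)
filters = numpy.arange(2 * 3 * 2 * 2 * 2).reshape(2, 3, 2, 2, 2)

# Flip only the three spatial axes; batch and channel axes are untouched
flipped = filters[:, :, ::-1, ::-1, ::-1]

# The first spatial element of the flipped kernel is the last of the original
assert flipped[0, 0, 0, 0, 0] == filters[0, 0, -1, -1, -1]
```

Cross-correlating with `flipped` is therefore equivalent to convolving with `filters`, which is why the tests only flip when `conv_mode == 'conv'`.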

def test_conv3d_bwd():

    if not (cuda.dnn.dnn_available() and dnn.version() >= (2000, 2000)):
        raise SkipTest('CuDNN 3D convolution requires CuDNN v2')

    def run_conv3d_bwd(inputs_shape, filters_shape, subsample,
                       border_mode, conv_mode):

        inputs_val = numpy.random.random(inputs_shape).astype('float32')
        filters_val = numpy.random.random(filters_shape).astype('float32')

        inputs = shared(inputs_val)
        filters = shared(filters_val)
        bias = shared(numpy.zeros(filters_shape[0]).astype('float32'))

        # Compile a theano function for the CuDNN implementation
        conv = dnn.dnn_conv3d(img=inputs, kerns=filters,
                              border_mode=border_mode, subsample=subsample,
                              conv_mode=conv_mode)

        grad_i, grad_w = theano.tensor.grad(conv.sum(), [inputs, filters])

        f = theano.function([], [grad_i, grad_w], mode=mode_with_gpu)

        # If conv_mode is 'conv' the reference implementation should use
        # filters flipped according to the width, height and time axis
        if conv_mode == 'conv':
            flipped_filters = filters[:, :, ::-1, ::-1, ::-1]
        else:
            flipped_filters = filters

        # If border mode is anything but 'valid', the reference implementation
        # should operate on padded inputs
        if border_mode == 'valid':
            padded_inputs = inputs
        else:
            if border_mode == 'full':
                pad_per_dim = [filters_shape[i] - 1 for i in range(2, 5)]
            else:
                if isinstance(border_mode, int):
                    pad_per_dim = [border_mode] * 3
                else:
                    pad_per_dim = border_mode

            pad_before_after = ([(0, 0), (0, 0)] +
                                [(p, p) for p in pad_per_dim])
            padded_inputs_val = numpy.pad(inputs_val, pad_before_after,
                                          'constant')
            padded_inputs = shared(padded_inputs_val)

        # Compile a theano function for the reference implementation
        conv_ref = theano.tensor.nnet.conv3D(
            V=padded_inputs.dimshuffle(0, 2, 3, 4, 1),
            W=flipped_filters.dimshuffle(0, 2, 3, 4, 1),
            b=bias, d=subsample)
        (grad_padded_i_ref,
         grad_w_ref) = theano.tensor.grad(conv_ref.sum(),
                                          [padded_inputs, filters])

        # Recover grad_i_ref from grad_padded_i_ref
        if border_mode == 'valid':
            grad_i_ref = grad_padded_i_ref
        else:
            shp = grad_padded_i_ref.shape
            grad_i_ref = grad_padded_i_ref[
                :, :,
                pad_per_dim[0]:shp[2] - pad_per_dim[0],
                pad_per_dim[1]:shp[3] - pad_per_dim[1],
                pad_per_dim[2]:shp[4] - pad_per_dim[2]]

        f_ref = theano.function([], [grad_i_ref, grad_w_ref])

        # Compare the results of the two implementations
        res_ref = f_ref()
        res = f()
        utt.assert_allclose(res_ref[0], res[0])
        utt.assert_allclose(res_ref[1], res[1])

    test_cases = get_conv3d_test_cases()
    for (i_shape, f_shape, subsample), border_mode, conv_mode in test_cases:
        yield (run_conv3d_bwd, i_shape, f_shape, subsample, border_mode,
               conv_mode)
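The pad-margin slicing used above to recover `grad_i_ref` from `grad_padded_i_ref` can be demonstrated with plain numpy. This sketch (shapes and pad widths chosen arbitrarily) slices the three spatial axes back to the unpadded extent, since the gradient of a zero-padded input with respect to the original input is just the interior of the padded gradient.

```python
import numpy

pad_per_dim = (1, 2, 3)

# Gradient w.r.t. a padded input of original spatial shape (5, 6, 7)
grad_padded = numpy.random.random(
    (2, 4, 5 + 2 * 1, 6 + 2 * 2, 7 + 2 * 3)).astype('float32')

# Slice off the pad margins on the three spatial axes
shp = grad_padded.shape
grad_unpadded = grad_padded[:, :,
                            pad_per_dim[0]:shp[2] - pad_per_dim[0],
                            pad_per_dim[1]:shp[3] - pad_per_dim[1],
                            pad_per_dim[2]:shp[4] - pad_per_dim[2]]
# grad_unpadded.shape == (2, 4, 5, 6, 7)
```

The batch and channel axes are never padded, so only the last three axes are sliced.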

def test_version():
    if not cuda.dnn.dnn_available():
        raise SkipTest(cuda.dnn.dnn_available.msg)
    ...