testgroup / pytensor / Commits / c4135899

Commit c4135899, authored Aug 10, 2015 by Iban Harlouchet
Parent: 257d4b5f

    numpydoc for theano/sandbox/cuda/blas.py

Showing 1 changed file with 198 additions and 131 deletions:

theano/sandbox/cuda/blas.py  (+198, -131)
...
@@ -217,6 +217,7 @@ batched_dot = BatchedDotOp()
 class GpuDot22(GpuOp):
     """
     Implement dot(2d, 2d) on the gpu.
+
     """
     def __str__(self):
         return 'GpuDot22'
...
@@ -299,7 +300,10 @@ class GpuDot22Scalar(GpuOp):
     """
     Implement dot(2d, 2d) * scalar on the gpu.
-    :note: Not used anymore. Keep to allow unpickle of old graph.
+
+    Notes
+    -----
+    Not used anymore. Keep to allow unpickle of old graph.
+
     """
     def __str__(self):
         return 'GpuDot22Scalar'
...
@@ -707,16 +711,22 @@ gpu_ger_inplace = GpuGer(inplace=True)
 class BaseGpuCorrMM(GpuOp):
-    """Base class for `GpuCorrMM`, `GpuCorrMM_gradWeights` and
+    """
+    Base class for `GpuCorrMM`, `GpuCorrMM_gradWeights` and
     `GpuCorrMM_gradInputs`. Cannot be used directly.
-    :param border_mode: one of 'valid', 'full', 'half'; additionally, the
-        padding size could be directly specified by an integer or a pair of
-        integers
-    :param subsample: perform subsampling of the output (default: (1, 1))
-    :param pad: *deprecated*, now you should always use border_mode
+
+    Parameters
+    ----------
+    border_mode : {'valid', 'full', 'half'}
+        Additionally, the padding size could be directly specified by an integer
+        or a pair of integers
+    subsample
+        Perform subsampling of the output (default: (1, 1)).
+    pad
+        *deprecated*, now you should always use border_mode.
+
     """
     check_broadcast = False
     __props__ = ('border_mode', 'subsample')
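The border_mode values documented in this hunk ('valid', 'full', 'half', an integer, or a pair of integers) all reduce to an explicit padding pair. A minimal sketch of that mapping, assuming a known 2-D kernel shape; `normalize_border_mode` is a hypothetical helper for illustration, not code from blas.py:

```python
def normalize_border_mode(border_mode, kernel_shape):
    """Map 'valid'/'full'/'half'/int/(int, int) to an explicit padding pair."""
    kh, kw = kernel_shape
    if border_mode == 'valid':
        return (0, 0)
    if border_mode == 'full':
        return (kh - 1, kw - 1)    # every overlap position contributes
    if border_mode == 'half':
        return (kh // 2, kw // 2)  # "same" convolution for odd-sized kernels
    if isinstance(border_mode, int):
        return (border_mode, border_mode)  # same padding on all sides
    return tuple(border_mode)      # already an explicit (rows, cols) pair

print(normalize_border_mode('half', (5, 5)))  # (2, 2)
```

This is the usual convention for these mode names; the GPU op resolves the string shortcuts at runtime, as the docstrings below note.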
...
@@ -757,7 +767,10 @@ class BaseGpuCorrMM(GpuOp):
                 str(self.subsample))

     def flops(self, inp, outp):
-        """ Useful with the hack in profilemode to print the MFlops"""
+        """
+        Useful with the hack in profilemode to print the MFlops.
+
+        """
         # if the output shape is correct, then this gives the correct
         # flops for any direction, sampling, padding, and border mode
         inputs, filters = inp
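The comment above says the count is correct for any direction, sampling, and padding as long as the output shape is right. The usual arithmetic behind such an estimate is one multiply and one add per filter tap per output element; `corr_mm_flops` below is a hedged sketch of that formula (illustrative names, not the method's actual code):

```python
def corr_mm_flops(inputs_shape, filters_shape, out_shape):
    """Estimate flops of a dense 2-D correlation from (b, c, h, w) shapes."""
    batch, in_channels, _, _ = inputs_shape
    n_filters, _, kh, kw = filters_shape
    _, _, out_h, out_w = out_shape
    # 2 flops (one multiply + one add) per filter tap per output value
    return 2.0 * batch * n_filters * out_h * out_w * in_channels * kh * kw

# a single 3x3 filter applied at one position costs 9 muls + 9 adds:
print(corr_mm_flops((1, 1, 3, 3), (1, 1, 3, 3), (1, 1, 1, 1)))  # 18.0
```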
...
@@ -794,32 +807,40 @@ class BaseGpuCorrMM(GpuOp):
         Depending on the direction, one of bottom, weights, top will
         receive the output, while the other two serve as inputs.
-        :param bottom: Variable name of the input images in the forward pass,
-            or the gradient of the input images in backprop wrt. inputs
-        :param weights: Variable name of the filters in the forward pass,
-            or the gradient of the filters in backprop wrt. weights
-        :param top: Variable name of the output images / feature maps in the
-            forward pass, or the gradient of the outputs in the backprop passes
-        :param direction: "forward" to correlate bottom with weights and store
-            results in top,
-            "backprop weights" to do a valid convolution of bottom with top
-            (swapping the first two dimensions) and store results in weights,
-            and "backprop inputs" to do a full convolution of top with weights
-            (swapping the first two dimensions) and store results in bottom.
-        :param sub: Dictionary of substitutions useable to help generating the
-            C code.
-        :param height: If self.subsample[0] != 1, a variable giving the
-            height of the filters for direction="backprop weights" or the
-            height of the input images for direction="backprop inputs".
-            If self.border_mode == 'half', a variable giving the height of the
-            filters for direction="backprop weights". Ignored otherwise.
-        :param width: If self.subsample[1] != 1, a variable giving the width
-            of the filters for direction="backprop weights" or the width of the
-            input images for direction="backprop inputs".
-            If self.border_mode == 'half', a variable giving the width of the
-            filters for direction="backprop weights". Ignored otherwise.
+
+        Parameters
+        ----------
+        bottom
+            Variable name of the input images in the forward pass,
+            or the gradient of the input images in backprop wrt. inputs
+        weights
+            Variable name of the filters in the forward pass,
+            or the gradient of the filters in backprop wrt. weights
+        top
+            Variable name of the output images / feature maps in the
+            forward pass, or the gradient of the outputs in the backprop passes
+        direction : {'forward', 'backprop weights', 'backprop inputs'}
+            "forward" to correlate bottom with weights and store
+            results in top,
+            "backprop weights" to do a valid convolution of bottom with top
+            (swapping the first two dimensions) and store results in weights,
+            and "backprop inputs" to do a full convolution of top with weights
+            (swapping the first two dimensions) and store results in bottom.
+        sub
+            Dictionary of substitutions useable to help generating the
+            C code.
+        height
+            If self.subsample[0] != 1, a variable giving the height of the
+            filters for direction="backprop weights" or the height of the input
+            images for direction="backprop inputs".
+            If self.border_mode == 'half', a variable giving the height of the
+            filters for direction="backprop weights".
+            Ignored otherwise.
+        width
+            If self.subsample[1] != 1, a variable giving the width of the
+            filters for direction="backprop weights" or the width of the
+            input images for direction="backprop inputs".
+            If self.border_mode == 'half', a variable giving the width of the
+            filters for direction="backprop weights".
+            Ignored otherwise.
+
         """
         dH, dW = self.subsample
         if self.border_mode == "half":
...
@@ -993,9 +1014,13 @@ class BaseGpuCorrMM(GpuOp):
 class GpuCorrMM(BaseGpuCorrMM):
-    """GPU correlation implementation using Matrix Multiplication.
-    :param border_mode: the width of a border of implicit zeros to pad the
+    """
+    GPU correlation implementation using Matrix Multiplication.
+
+    Parameters
+    ----------
+    border_mode
+        The width of a border of implicit zeros to pad the
         input with. Must be a tuple with 2 elements giving the numbers of rows
         and columns to pad on each side, or a single integer to pad the same
         on all sides, or a string shortcut setting the padding at runtime:
...
@@ -1004,27 +1029,31 @@ class GpuCorrMM(BaseGpuCorrMM):
         ``'half'`` for ``(kernel_rows // 2, kernel_columns // 2)`` (same
         convolution for odd-sized kernels). Note that the two widths are each
         applied twice, once per side (left and right, top and bottom).
-    :param subsample: the subsample operation applied to each output image.
+    subsample
+        The subsample operation applied to each output image.
         Should be a tuple with 2 elements.
         `(sv, sh)` is equivalent to `GpuCorrMM(...)(...)[:,:,::sv, ::sh]`,
         but faster.
         Set to `(1, 1)` to disable subsampling.
-    :param pad: deprecated alias for `border_mode`.
-    :note: Currently, the Op requires the inputs, filters and outputs to be
-        C-contiguous. Use :func:`gpu_contiguous
-        <theano.sandbox.cuda.basic_ops.gpu_contiguous>` on these arguments
-        if needed.
-    :note: You can either enable the Theano flag `optimizer_including=conv_gemm`
-        to automatically replace all convolution operations with `GpuCorrMM`
-        or one of its gradients, or you can use it as a replacement for
-        :func:`conv2d <theano.tensor.nnet.conv.conv2d>`, called as
-        `GpuCorrMM(subsample=...)(image, filters)`. The latter is currently
-        faster, but note that it computes a correlation -- if you need to
-        compute a convolution, flip the filters as `filters[:,:,::-1,::-1]`.
-    :warning: For 700 series Nvidia GPUs of compute capability 3.5 and CUDA 5.0
+    pad
+        Deprecated alias for `border_mode`.
+
+    Notes
+    -----
+    Currently, the Op requires the inputs, filters and outputs to be
+    C-contiguous. Use :func:`gpu_contiguous
+    <theano.sandbox.cuda.basic_ops.gpu_contiguous>` on these arguments
+    if needed.
+
+    You can either enable the Theano flag `optimizer_including=conv_gemm`
+    to automatically replace all convolution operations with `GpuCorrMM`
+    or one of its gradients, or you can use it as a replacement for
+    :func:`conv2d <theano.tensor.nnet.conv.conv2d>`, called as
+    `GpuCorrMM(subsample=...)(image, filters)`. The latter is currently
+    faster, but note that it computes a correlation -- if you need to
+    compute a convolution, flip the filters as `filters[:,:,::-1,::-1]`.
+
+    .. warning:: For 700 series Nvidia GPUs of compute capability 3.5 and CUDA 5.0
         to 6.0, there is a bug in CUBLAS' matrix multiplication function that
         can make GpuCorrMM or its gradients crash for some input and filter
         shapes. So if you have a Tesla K20, Tesla K40, Quadro K6000, GeForce GT
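The note in this hunk stresses that GpuCorrMM computes a correlation, and that flipping the filters (`filters[:,:,::-1,::-1]`) turns it into a convolution. The same identity in a 1-D NumPy analogue (the GPU op itself is not involved here):

```python
import numpy as np

signal = np.array([1., 2., 3., 4., 5.])
kernel = np.array([0.2, 0.3, 0.5])  # deliberately asymmetric

# correlation with kernel == convolution with the flipped kernel
corr = np.correlate(signal, kernel, mode='valid')
conv_of_flipped = np.convolve(signal, kernel[::-1], mode='valid')

assert np.allclose(corr, conv_of_flipped)
print(corr)
```

For a symmetric kernel the flip is a no-op, which is why the distinction only matters for asymmetric filters.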
...
@@ -1032,6 +1061,7 @@ class GpuCorrMM(BaseGpuCorrMM):
     and experience a crash, switching to CUDA 6.5 or CUDA 4.2 should fix it.
     If this is not possible, changing the input or filter shapes (e.g., the
     batchsize or number of filters) may also work around the CUBLAS bug.
+
     """
     def __init__(self, border_mode="valid",
                  subsample=(1, 1),
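The border_mode and subsample parameters above determine the output size by the standard correlation formula `out = (in + 2*pad - kernel) // stride + 1`; a sketch of that arithmetic (a generic formula, not code from blas.py):

```python
def corr_out_size(in_size, kernel_size, pad, stride):
    """Output extent of a strided, zero-padded correlation along one axis."""
    return (in_size + 2 * pad - kernel_size) // stride + 1

print(corr_out_size(7, 3, 0, 1))       # 5  ('valid': no padding)
print(corr_out_size(7, 3, 3 - 1, 1))   # 9  ('full': pad = kernel - 1)
print(corr_out_size(7, 3, 3 // 2, 1))  # 7  ('half': pad = kernel // 2, odd kernel)
print(corr_out_size(7, 3, 1, 2))       # 4  subsample of 2 along this axis
```

The 'half' case reproduces the "same convolution for odd-sized kernels" behavior mentioned in the docstring.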
...
@@ -1068,11 +1098,13 @@ class GpuCorrMM(BaseGpuCorrMM):
 class GpuCorrMM_gradWeights(BaseGpuCorrMM):
-    """Gradient wrt. filters for `GpuCorrMM`.
-    :note: You will not want to use this directly, but rely on
-        Theano's automatic differentiation or graph optimization to
-        use it as needed.
+    """
+    Gradient wrt. filters for `GpuCorrMM`.
+
+    Notes
+    -----
+    You will not want to use this directly, but rely on Theano's automatic
+    differentiation or graph optimization to use it as needed.
+
     """
...
@@ -1126,11 +1158,13 @@ class GpuCorrMM_gradWeights(BaseGpuCorrMM):
 class GpuCorrMM_gradInputs(BaseGpuCorrMM):
-    """Gradient wrt. inputs for `GpuCorrMM`.
-    :note: You will not want to use this directly, but rely on
-        Theano's automatic differentiation or graph optimization to
-        use it as needed.
+    """
+    Gradient wrt. inputs for `GpuCorrMM`.
+
+    Notes
+    -----
+    You will not want to use this directly, but rely on Theano's automatic
+    differentiation or graph optimization to use it as needed.
+
     """
...
@@ -1180,8 +1214,12 @@ class GpuCorrMM_gradInputs(BaseGpuCorrMM):
 class BaseGpuCorr3dMM(GpuOp):
-    """Base class for `GpuCorr3dMM`, `GpuCorr3dMM_gradWeights` and
-    `GpuCorr3dMM_gradInputs`. Cannot be used directly."""
+    """
+    Base class for `GpuCorr3dMM`, `GpuCorr3dMM_gradWeights` and
+    `GpuCorr3dMM_gradInputs`. Cannot be used directly.
+
+    """
     __props__ = ('border_mode', 'subsample', 'pad')

     def __init__(self, border_mode="valid",
...
@@ -1245,38 +1283,47 @@ class BaseGpuCorr3dMM(GpuOp):
         Depending on the direction, one of bottom, weights, top will
         receive the output, while the other two serve as inputs.
-        :param bottom: Variable name of the input images in the forward pass,
-            or the gradient of the input images in backprop wrt. inputs
-        :param weights: Variable name of the filters in the forward pass,
-            or the gradient of the filters in backprop wrt. weights
-        :param top: Variable name of the output images / feature maps in the
-            forward pass, or the gradient of the outputs in the backprop passes
-        :param direction: "forward" to correlate bottom with weights and store
-            results in top,
-            "backprop weights" to do a valid convolution of bottom with top
-            (swapping the first two dimensions) and store results in weights,
-            and "backprop inputs" to do a full convolution of top with weights
-            (swapping the first two dimensions) and store results in bottom.
-        :param sub: Dictionary of substitutions useable to help generating the
-            C code.
-        :param height: If self.subsample[0] != 1, a variable giving the height
-            of the filters for direction="backprop weights" or the height of the
-            input images for direction="backprop inputs".
-            If self.pad == 'half', a variable giving the height of the filters
-            for direction="backprop weights".
-            Ignored otherwise.
-        :param width: If self.subsample[1] != 1, a variable giving the width
-            of the filters for direction="backprop weights" or the width of the
-            input images for direction="backprop inputs".
-            If self.pad == 'half', a variable giving the width of the filters
-            for direction="backprop weights".
-            Ignored otherwise.
-        :param depth: If self.subsample[2] != 1, a variable giving the depth
-            of the filters for direction="backprop weights" or the depth of the
-            input images for direction="backprop inputs".
-            If self.pad == 'half', a variable giving the depth of the filters
-            for direction="backprop weights".
-            Ignored otherwise.
+
+        Parameters
+        ----------
+        bottom
+            Variable name of the input images in the forward pass,
+            or the gradient of the input images in backprop wrt. inputs.
+        weights
+            Variable name of the filters in the forward pass,
+            or the gradient of the filters in backprop wrt. weights.
+        top
+            Variable name of the output images / feature maps in the
+            forward pass, or the gradient of the outputs in the backprop passes.
+        direction : {'forward', 'backprop weights', 'backprop inputs'}
+            "forward" to correlate bottom with weights and store results in top,
+            "backprop weights" to do a valid convolution of bottom with top
+            (swapping the first two dimensions) and store results in weights,
+            and "backprop inputs" to do a full convolution of top with weights
+            (swapping the first two dimensions) and store results in bottom.
+        sub
+            Dictionary of substitutions useable to help generating the C code.
+        height
+            If self.subsample[0] != 1, a variable giving the height
+            of the filters for direction="backprop weights" or the height of the
+            input images for direction="backprop inputs".
+            If self.pad == 'half', a variable giving the height of the filters
+            for direction="backprop weights".
+            Ignored otherwise.
+        width
+            If self.subsample[1] != 1, a variable giving the width
+            of the filters for direction="backprop weights" or the width of the
+            input images for direction="backprop inputs".
+            If self.pad == 'half', a variable giving the width of the filters
+            for direction="backprop weights".
+            Ignored otherwise.
+        depth
+            If self.subsample[2] != 1, a variable giving the depth
+            of the filters for direction="backprop weights" or the depth of the
+            input images for direction="backprop inputs".
+            If self.pad == 'half', a variable giving the depth of the filters
+            for direction="backprop weights".
+            Ignored otherwise.
+
         """
         if self.border_mode != "valid":
             raise ValueError("mode must be 'valid'")
...
@@ -1503,7 +1550,34 @@ class BaseGpuCorr3dMM(GpuOp):
 class GpuCorr3dMM(BaseGpuCorr3dMM):
     """GPU correlation implementation using Matrix Multiplication.
-    :warning: For 700 series Nvidia GPUs of compute capability 3.5 and CUDA 5.0
+
+    Parameters
+    ----------
+    border_mode
+        Currently supports "valid" only; "full" can be simulated by setting
+        `pad="full"` (at the cost of performance), or by using
+        `GpuCorrMM_gradInputs`.
+    subsample
+        The subsample operation applied to each output image. Should be a tuple
+        with 3 elements. `(sv, sh, sl)` is equivalent to
+        `GpuCorrMM(...)(...)[:,:,::sv, ::sh, ::sl]`, but faster.
+        Set to `(1, 1, 1)` to disable subsampling.
+    pad
+        The width of a border of implicit zeros to pad the input image with.
+        Should be a tuple with 3 elements giving the numbers of rows and columns
+        to pad on each side, or "half" to set the padding
+        to `(kernel_rows // 2, kernel_columns // 2, kernel_depth // 2)`,
+        or "full" to set the padding
+        to `(kernel_rows - 1, kernel_columns - 1, kernel_depth - 1)` at runtime.
+        Set to `(0, 0, 0)` to disable padding.
+
+    Notes
+    -----
+    Currently, the Op requires the inputs, filters and outputs to be
+    C-contiguous. Use :func:`gpu_contiguous
+    <theano.sandbox.cuda.basic_ops.gpu_contiguous>` on these arguments
+    if needed.
+
+    .. warning:: For 700 series Nvidia GPUs of compute capability 3.5 and CUDA 5.0
         to 6.0, there is a bug in CUBLAS' matrix multiplication function that
         can make GpuCorrMM or its gradients crash for some input and filter
         shapes. So if you have a Tesla K20, Tesla K40, Quadro K6000, GeForce GT
...
@@ -1511,31 +1585,9 @@ class GpuCorr3dMM(BaseGpuCorr3dMM):
     and experience a crash, switching to CUDA 6.5 or CUDA 4.2 should fix it.
     If this is not possible, changing the input or filter shapes (e.g., the
     batchsize or number of filters) may also work around the CUBLAS bug.
     """
     def __init__(self, border_mode="valid",
                  subsample=(1, 1, 1),
                  pad=(0, 0, 0)):
-        """
-        :param border_mode: currently supports "valid" only; "full" can be
-            simulated by setting `pad="full"` (at the cost of performance), or
-            by using `GpuCorrMM_gradInputs`
-        :param subsample: the subsample operation applied to each output image.
-            Should be a tuple with 3 elements.
-            `(sv, sh, sl)` is equivalent to `GpuCorrMM(...)(...)[:,:,::sv, ::sh, ::sl]`,
-            but faster.
-            Set to `(1, 1, 1)` to disable subsampling.
-        :param pad: the width of a border of implicit zeros to pad the input
-            image with. Should be a tuple with 3 elements giving the numbers of
-            rows and columns to pad on each side, or "half" to set the padding
-            to `(kernel_rows // 2, kernel_columns // 2, kernel_depth // 2)`, or "full" to set the
-            padding to `(kernel_rows - 1, kernel_columns - 1, kernel_depth - 1)` at runtime.
-            Set to `(0, 0, 0)` to disable padding.
-        :note: Currently, the Op requires the inputs, filters and outputs to be
-            C-contiguous. Use :func:`gpu_contiguous
-            <theano.sandbox.cuda.basic_ops.gpu_contiguous>` on these arguments
-            if needed.
-        """
         super(GpuCorr3dMM, self).__init__(border_mode, subsample, pad)

     def make_node(self, img, kern):
...
@@ -1570,8 +1622,11 @@ class GpuCorr3dMM(BaseGpuCorr3dMM):
 class GpuCorr3dMM_gradWeights(BaseGpuCorr3dMM):
     """Gradient wrt. filters for `GpuCorr3dMM`.
-    :note: You will not want to use this directly, but rely on Theano's
-        automatic differentiation or graph optimization to use it as needed.
+
+    Notes
+    -----
+    You will not want to use this directly, but rely on Theano's
+    automatic differentiation or graph optimization to use it as needed.
+
     """
     def __init__(self, border_mode="valid",
...
@@ -1627,8 +1682,11 @@ class GpuCorr3dMM_gradWeights(BaseGpuCorr3dMM):
 class GpuCorr3dMM_gradInputs(BaseGpuCorr3dMM):
     """Gradient wrt. inputs for `GpuCorr3dMM`.
-    :note: You will not want to use this directly, but rely on Theano's
-        automatic differentiation or graph optimization to use it as needed.
+
+    Notes
+    -----
+    You will not want to use this directly, but rely on Theano's
+    automatic differentiation or graph optimization to use it as needed.
+
     """
     def __init__(self, border_mode="valid",
...
@@ -1683,6 +1741,48 @@ class GpuCorr3dMM_gradInputs(BaseGpuCorr3dMM):
 class GpuConv(GpuOp):
     """
     Implement the batched and stacked 2d convolution on the gpu.
+
+    Parameters
+    ----------
+    version
+        Each version of c_code implements many kernels for the
+        convolution. By default we try to guess the best one.
+        You can force one version with this parameter. This
+        parameter is used by the tests.
+    direction_hint : {'forward', 'bprop weights', 'bprop inputs'}
+        Serves as a hint for graph optimizers replacing
+        GpuConv by other implementations. If the GpuConv is
+        inserted automatically, we take its value from ConvOp.
+    verbose
+        For values of 1, 2 and 3. Print more information during
+        the execution of the convolution. Mostly used for
+        optimization or debugging.
+    kshp
+        The size of the kernel. If provided, can generate
+        faster code. If the GpuConv op is automatically inserted,
+        we take its value automatically from the Conv op.
+    imshp
+        The size of the image. Not used for code generation but
+        allows to select an experimental new version in another repo.
+    max_threads_dim0
+        The maximum number of threads for block size dimension 0
+        (blockDim.x) used by the GPU function.
+    nkern
+        The number of kernels. Not used for this op, but can be
+        used by graph optimizers to select a more optimal
+        convolution implementation. If the GpuConv op is inserted
+        automatically, we take its value from the Conv op.
+    bsize
+        The batch size. Not used for this op, but can be used by graph
+        optimizers to select a more optimal convolution implementation.
+        If the GpuConv op is inserted automatically, we take its value from
+        the Conv op.
+    fft_opt
+        Deactivate fft_opt optimization at the op level when set to False.
+        Note that by default fft optimizations aren't enabled.
+        See :ref:`convolution documentation <libdoc_tensor_nnet_conv>`
+        to enable them.
+
     """
     check_broadcast = False
...
@@ -1708,42 +1808,6 @@ class GpuConv(GpuOp):
                  nkern=None,
                  bsize=None,
                  fft_opt=True):
-        """
-        :param version: each version of c_code implements many kernel for the
-            convolution. By default we try to guess the best one.
-            You can force one version with this parameter. This
-            parameter is used by the tests.
-        :param direction_hint: 'forward', 'bprop weights' or 'bprop inputs'.
-            Serves as a hint for graph optimizers replacing
-            GpuConv by other implementations. If the GpuConv is
-            inserted automatically, we take its value from ConvOp.
-        :param verbose: for value of 1,2 and 3. Print more information during
-            the execution of the convolution. Mostly used for
-            optimization or debugging.
-        :param kshp: The size of the kernel. If provided, can generate
-            faster code. If the GpuConv op is automatically inserted,
-            we take its value automatically from the Conv op.
-        :param imshp: The size of the image. Not used for code generation but
-            allows to select an experimental new version in another
-            repo.
-        :param max_threads_dim0: The maximum number of threads for the
-            block size dimensions 0 (blockDim.x) used by the
-            GPU function.
-        :param nkern: The number of kernels. Not used for this op, but can be
-            used by graph optimizers to select a more optimal
-            convolution implementation. If the GpuConv op is inserted
-            automatically, we take its value from the Conv op.
-        :param bsize: The batch size. Not used for this op, but can be
-            used by graph optimizers to select a more optimal
-            convolution implementation. If the GpuConv op is inserted
-            automatically, we take its value from the Conv op.
-        :param fft_opt: deactivate fft_opt optimization at the op level when
-            set to False. Note that by default fft optimization
-            aren't enabled. See
-            :ref:`convolution documentation <libdoc_tensor_nnet_conv>`
-            to enable them.
-        """
         self.border_mode = border_mode
         if version != -1:
             raise Exception(
...
@@ -1956,6 +2020,7 @@ class GpuConv(GpuOp):
 class GpuDownsampleFactorMax(GpuOp):
     """
     Implement downsample with max on the gpu.
+
     """
     def __init__(self, ds, ignore_border=False):
         self.ds = tuple(ds)
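The op above downsamples with max over non-overlapping windows of size `ds`, with an `ignore_border` switch. A CPU sketch of those semantics in NumPy (`downsample_max` is a hypothetical illustration, not the GPU kernel; the exact border handling of the op is assumed, not shown in this diff):

```python
import numpy as np

def downsample_max(x, ds, ignore_border=False):
    """Max-downsample a 2-D array by factors ds=(dh, dw).

    With ignore_border=True, trailing rows/cols that do not fill a
    complete window are dropped; otherwise they form partial windows.
    """
    dh, dw = ds
    h, w = x.shape
    out_h = h // dh if ignore_border else -(-h // dh)  # floor vs ceil division
    out_w = w // dw if ignore_border else -(-w // dw)
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * dh:(i + 1) * dh, j * dw:(j + 1) * dw].max()
    return out

x = np.arange(25.).reshape(5, 5)
print(downsample_max(x, (2, 2), ignore_border=True).shape)   # (2, 2)
print(downsample_max(x, (2, 2), ignore_border=False).shape)  # (3, 3)
```

The grad ops that follow (GpuDownsampleFactorMaxGrad and its grad) route gradients back only to the element that attained each window's max.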
...
@@ -2149,6 +2214,7 @@ class GpuDownsampleFactorMax(GpuOp):
 class GpuDownsampleFactorMaxGrad(GpuOp):
     """
     Implement the grad of downsample with max on the gpu.
+
     """
     def __init__(self, ds, ignore_border):
         self.ds = tuple(ds)
...
@@ -2371,6 +2437,7 @@ class GpuDownsampleFactorMaxGrad(GpuOp):
 class GpuDownsampleFactorMaxGradGrad(GpuOp):
     """
     Implement the grad of downsample with max on the gpu.
+
     """
     __props__ = ('ds', 'ignore_border')
...