testgroup / pytensor · Commits

Commit 1b8b9149, authored Jun 23, 2016 by Pascal Lamblin, committed by GitHub, Jun 23, 2016

Merge pull request #4584 from abergeron/gpua_doc

Add some documentation on how to write gpu ops.

Parents: eb95000e cb836344

Showing 6 changed files with 557 additions and 4 deletions
doc/extending/extending_theano_gpu.txt        +252  -0
doc/extending/index.txt                         +1  -0
doc/extending/using_params.txt                 +15  -0
theano/gpuarray/basic_ops.py                  +165  -4
theano/gpuarray/tests/test_cgpukernelbase.py   +72  -0
theano/gpuarray/tests/tstgpueye.c              +52  -0
doc/extending/extending_theano_gpu.txt (new file, mode 100644)
.. _extending_theano_gpu:
==============================
Extending Theano with a GPU Op
==============================
.. note::

   This covers the :ref:`gpuarray <gpuarray>` back-end for the GPU.
This tutorial covers how to extend Theano with an op that offers a GPU
implementation. It assumes you are familiar with how to write new
Theano ops. If that is not the case you should probably follow the
:ref:`extending_theano` and :ref:`extending_theano_c` sections before
continuing on.
Writing a new GPU op can be done in Python for some simple tasks, but
will usually be done in C to access the complete API and avoid paying
the overhead of a Python function call.
Dealing With the Context
========================
One of the major differences with GPU ops is that they require a
context (a.k.a. device) to execute. Most of the time you can infer
the context to run on from your inputs. There is a way for the user
to transfer things between contexts and to tag certain variables for
transfer. It might also be the case that your inputs are not all from
the same context and you would have to choose which one to run on.
In order to support all of those options and have a consistent
interface, :func:`theano.gpuarray.basic_ops.infer_context_name` was
written. An example usage is below::
    def make_node(self, a, b, c):
        ctx = infer_context_name(a, b, c)
        a = as_gpuarray_variable(a, ctx)
        b = as_gpuarray_variable(b, ctx)
        c = as_gpuarray_variable(c, ctx)
        return Apply(self, [a, b, c], [a.type()])
In this example the Op takes three inputs, all on the GPU. In case
one or more of your inputs is not supposed to be on the GPU, you
should not pass it to :func:`infer_context_name` or call
:func:`as_gpuarray_variable` on it.
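The selection logic can be pictured with a small self-contained sketch. This is a toy stand-in, not the real ``infer_context_name`` (which also honours user transfer annotations and checks consistency between contexts); the class and function names below are invented for illustration:

```python
# Toy stand-in for infer_context_name, for illustration only.

class FakeGpuVar(object):
    """Minimal mock of a GPU variable carrying a context name."""
    def __init__(self, context_name):
        self.context_name = context_name


class FakeCpuVar(object):
    """Minimal mock of a CPU variable (no context attribute)."""


def toy_infer_context_name(*variables):
    # Return the context of the first input that lives on a GPU.
    for v in variables:
        if hasattr(v, 'context_name'):
            return v.context_name
    raise ValueError("no GPU variable among the inputs")


a = FakeCpuVar()
b = FakeGpuVar('dev0')
print(toy_infer_context_name(a, b))  # -> dev0
```

A CPU-only input is simply skipped, which mirrors the advice above: inputs that are not supposed to be on the GPU are not consulted for the context.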
Also note that :func:`theano.gpuarray.basic_ops.as_gpuarray_variable`
takes ``context_name`` as a mandatory parameter. This is because it's
not enough to know you want the value to be on the GPU, you also want
to know which GPU to put it on. In almost all cases, you can pass in
the return value of :func:`infer_context_name` there.
If you also need the context during runtime (for example to allocate
the output), you can use the context of one of your inputs to know
which one to use. Here is another example::
    def perform(self, node, inputs, output_storage):
        A, B = inputs
        C, = output_storage
        C[0] = pygpu.empty([A.shape[0], B.shape[1]], dtype=A.dtype,
                           context=A.context)
        pygpu.blas.gemm(1, A, B, 0, C[0], overwrite_c=True)
Finally, if you require the context before ``perform``, such as during
``make_thunk()`` to initialize kernels and such, you can access the
context of your inputs through the type of the variables::

    def make_thunk(self, node, storage_map, compute_map, no_recycling):
        ctx = node.inputs[0].type.context
Note that ``GpuArrayType`` objects also have a ``context_name``
attribute, which is the symbolic equivalent of ``context``. It can't
be used for calls to pygpu or libgpuarray, but it should be used for
Theano operations and variables.
The last place where you might need the context is in the C
initialization code. For that you will have to use the :ref:`params
<extending_op_params>`. The params type should be
:data:`theano.gpuarray.type.gpu_context_type` and the params object
should be a context object from one of your input variables::
    def get_params(self, node):
        return node.inputs[0].type.context
If you don't have any input variables on the GPU you can follow the
example of :class:`GpuFromHost
<theano.gpuarray.basic_ops.GpuFromHost>` or :class:`GpuEye
<theano.gpuarray.basic_ops.GpuEye>`. This is not a case that you
should encounter often, so it will not be covered further.
Defining New Kernels
====================
If your op needs to do some transformation on the data, chances are
that you will need to write a new kernel. The best way to do this is
to leverage :class:`GpuKernelBase
<theano.gpuarray.basic_ops.GpuKernelBase>` (or :class:`CGpuKernelBase
<theano.gpuarray.basic_ops.CGpuKernelBase>` if you want to use the
:class:`COp <theano.gof.op.COp>` functionality).
For plain :class:`GpuKernelBase
<theano.gpuarray.basic_ops.GpuKernelBase>`, you have to define a
method called ``gpu_kernels`` which returns a list of :class:`Kernel
<theano.gpuarray.basic_ops.Kernel>` objects. You can define as many
kernels as you want for a single op. An example would look like
this::
    def gpu_kernels(self, node, name):
        code = """
    KERNEL void k(GLOBAL_MEM ga_double *a, ga_size n, ga_size m) {
        ga_size nb = n < m ? n : m;
        for (ga_size i = LID_0; i < nb; i += LDIM_0) {
            a[i*m + i] = 1;
        }
    }"""
        return [Kernel(
                code=code, name="k",
                params=[gpuarray.GpuArray, gpuarray.SIZE, gpuarray.SIZE],
                flags=Kernel.get_flags('float64'))]
If you want to use ``COp``, then you should use ``CGpuKernelBase``
instead. It adds a new section to the parsed files whose tag is
``kernels``. Inside that section you can define some kernels with
``#kernel name:params:flags``.
Here ``name`` is the name of the kernel function in the following
code and ``params`` is a comma-separated list of numpy typecode names.
There are three exceptions: ``size_t``, which should be noted as
``size``; ``ssize_t``, which should be noted as ``ssize``; and
pointers, which should be noted as ``*``.
``flags`` is a ``|``-separated list of C kernel flag values (can be
empty). The same kernel definition as above would look like this with
``CGpuKernelBase``::
    #section kernels

    #kernel k : *, size, size : GA_USE_DOUBLE

    KERNEL void k(GLOBAL_MEM ga_double *a, ga_size n, ga_size m) {
        ga_size nb = n < m ? n : m;
        for (ga_size i = LID_0; i < nb; i += LDIM_0) {
            a[i*m + i] = 1;
        }
    }
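The way such a ``#kernel`` line splits into its three fields can be sketched in pure Python. This is a simplified sketch only; the real parsing happens in ``CGpuKernelBase.gpu_kernels``, which additionally maps the typecode names to actual types:

```python
# Simplified sketch of parsing a "name : params : flags" kernel spec.
# The real parser (CGpuKernelBase in theano/gpuarray/basic_ops.py) also
# converts '*', 'size', 'ssize' and numpy typecode names to real types.

def parse_kernel_spec(spec):
    name, params, flags = [s.strip() for s in spec.split(':')]
    # '*' means a buffer pointer; 'size'/'ssize' stand in for
    # ga_size/ga_ssize; anything else is a numpy typecode name.
    param_list = [p.strip() for p in params.split(',')]
    # flags is a '|'-separated list and may be empty.
    flag_list = [f for f in (x.strip() for x in flags.split('|')) if f]
    return name, param_list, flag_list


print(parse_kernel_spec("k : *, size, size : GA_USE_DOUBLE"))
# -> ('k', ['*', 'size', 'size'], ['GA_USE_DOUBLE'])
```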
The second method is to handle the kernel compilation and cache on
your own. This is not recommended because there are lots of details
to pay attention to that can cripple your performance if not done
right, which GpuKernelBase handles for you. But if you really want to
go this way, then you can look up the C API for kernels in
libgpuarray.
In any case you will need to call your compiled kernel with some data,
in most cases in your :meth:`c_code` method. This is done using the
`GpuKernel_call()
<http://deeplearning.net/software/libgpuarray/c_api.html#GpuKernel_call>`_
function in your C code. An example calling the above kernel would
be::
    size_t ls, gs;
    size_t dims[2];
    void *args[3];

    // ...

    args[0] = input->ga.data;
    args[1] = &dims[0];
    args[2] = &dims[1];
    ls = 1;
    gs = 256;
    err = GpuKernel_call(&k_k, 1, &ls, &gs, 0, args);

    // ...
The name of the kernel object depends on the name you passed to
``Kernel()`` when you declared it (or the name in your `#kernel`
statement). It defaults to `'k_' + name`.
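Those naming defaults (documented on the ``Kernel`` class) can be summed up in a tiny helper; the function name here is invented for illustration:

```python
# Default C variable names derived from a kernel's name, mirroring the
# codevar/binvar/objvar defaults documented on the Kernel class.

def default_kernel_varnames(name):
    return {'codevar': 'kcode_' + name,   # source code string
            'binvar': 'kbin_' + name,     # compiled binary blob
            'objvar': 'k_' + name}        # kernel object to call


print(default_kernel_varnames('eye')['objvar'])  # -> k_eye
```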
For other operations in the C code you should refer to the
`libgpuarray documentation
<http://deeplearning.net/software/libgpuarray/>`_.
A Complete Example
==================
This is a complete example using both approaches for an implementation
of the Eye operation.
GpuKernelBase
-------------
Python File
~~~~~~~~~~~
.. literalinclude:: ../../theano/gpuarray/basic_ops.py
:language: python
:pyobject: GpuEye
CGpuKernelBase
--------------
Python File
~~~~~~~~~~~
.. literalinclude:: ../../theano/gpuarray/tests/test_cgpukernelbase.py
:language: python
:pyobject: GpuEye
``tstgpueye.c``
~~~~~~~~~~~~~~~
.. literalinclude:: ../../theano/gpuarray/tests/tstgpueye.c
:language: C
Wrapping Existing Libraries
============================
PyCUDA
------
For things in PyCUDA (or things wrapped with PyCUDA), we usually need
to create a PyCUDA context. This can be done with the following
code::
    with gpuarray_cuda_context:
        pycuda_context = pycuda.driver.Context.attach()
If you don't need to create a context, because the library doesn't
require it, you can also just use the pygpu context with a ``with``
statement, as above, around all your code; this makes that context the
current context on the CUDA context stack.
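The "current context on a stack" behaviour can be pictured with a toy context manager. This is purely illustrative; pygpu contexts implement this natively against the real CUDA context stack:

```python
# Toy illustration of `with`-based context stacking, mimicking how
# entering a GPU context makes it current until the block exits.

class ToyContext(object):
    stack = []  # shared "current context" stack

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        ToyContext.stack.append(self)
        return self

    def __exit__(self, *exc):
        ToyContext.stack.pop()


with ToyContext('dev0') as ctx:
    # Inside the block, 'dev0' is the current context.
    assert ToyContext.stack[-1] is ctx
# On exit the context is popped and nothing is current.
assert ToyContext.stack == []
```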
GpuArray objects are compatible with PyCUDA and will expose the
necessary interface so that they can be used in most things. One
notable exception is PyCUDA kernels which require native objects. If
you need to convert a pygpu GpuArray to a PyCUDA GPUArray, this code
should do the trick::
    assert pygpu_array.flags['IS_C_CONTIGUOUS']
    pycuda_array = pycuda.gpuarray.GPUArray(pygpu_array.shape,
                                            pygpu_array.dtype,
                                            base=pygpu_array,
                                            gpudata=(pygpu_array.gpudata +
                                                     pygpu_array.offset))
As long as the computations happen on the NULL stream, there are no
special considerations to watch for with regards to synchronization.
Otherwise, you will have to make sure that you synchronize the pygpu
objects by calling the ``.sync()`` method before scheduling any work,
and synchronize with the work that happens in the library after all
the work is scheduled.
doc/extending/index.txt

@@ -45,6 +45,7 @@ with Theano itself.
    ctype
    cop
    using_params
+   extending_theano_gpu
    optimization
    tips
    unittest
doc/extending/using_params.txt

@@ -29,6 +29,13 @@ Making a purpose-built class may require more upfront work, but can
 pay off if you reuse the type for a lot of Ops, by not having to re-do
 all of the python manipulation.

+The params object
+-----------------
+
+The object that you use to store your param values must be hashable
+and comparable for equality, because it will be stored in a dictionary
+at some point.  Apart from those requirements it can be anything that
+matches what you have declared as the params type.
+
 Defining a params type
 ~~~~~~~~~~~~~~~~~~~~~~

@@ -175,6 +182,14 @@ weights.
         self.alpha = alpha
         self.beta = beta

+    def __hash__(self):
+        return hash((type(self), self.alpha, self.beta))
+
+    def __eq__(self, other):
+        return (type(self) == type(other) and
+                self.alpha == other.alpha and
+                self.beta == other.beta)
+
 class Mix(Op):
     params_type = Generic()
theano/gpuarray/basic_ops.py

 from __future__ import absolute_import, print_function, division
 import os
+import copy
+import re
 import numpy
 from theano import Op, Apply, Type, Variable

@@ -8,7 +9,7 @@ from theano import tensor, config
 from theano.gradient import grad_undefined
 from theano.tensor.basic import Alloc, Join, Split
-from theano.gof import HideC
+from theano.gof import HideC, COp
 from theano.gof.utils import MethodNotDefined
 from collections import deque

@@ -124,6 +125,51 @@ class Kernel(object):
     """
     This class groups together all the attributes of a gpu kernel.
+
+    `params` should contain the data type for each argument.  Buffer
+    arguments should use the GpuArray class as the data type and
+    scalar should use their equivalent numpy dtype.  For ga_size and
+    ga_ssize, use gpuarray.SIZE and gpuarray.SSIZE.
+
+    If the `ctypes` flags is set to `True` then it should be a C
+    string which represent the typecode to use.
+
+    `flags` can contain the following keys whose values are booleans:
+
+        have_double
+            the kernel uses double-typed variables somewhere
+        have_small
+            the kernel uses variables whose type takes less than 4
+            bytes somewhere
+        have_complex
+            the kernel uses complex values somewhere
+        have_half
+            the kernel uses half-floats somewhere
+        ctypes
+            the `params` list consists of C typecodes
+
+    It can also have the key `cflags` which is a string of C flag
+    values like this `"GA_USE_DOUBLE|GA_USE_CLUDA"`.
+
+    Parameters
+    ----------
+    code: str
+        The source code of the kernel.
+    params: list
+        list of parameter types.
+    name: str
+        the name of the kernel function in the source.
+    flags: dict
+        dictionary of flags
+    codevar: str
+        the name of the variable for the code object.
+        (defaults to `kcode_` + name)
+    binvar: str
+        the name of the variable for the binary object.
+        (defaults to `kbin_` + name)
+    objvar: str
+        the name of the variable for the kernel object.
+        (defaults to `k_` + name)
+
     """
     def __init__(self, code, params, name, flags,

@@ -167,6 +213,8 @@ class Kernel(object):
     def _get_c_flags(self):
         res = []
+        if self.flags.get('cflags', '') != '':
+            res.append(self.flags['cflags'])
         if self.flags.get('cluda', False):
             res.append('GA_USE_CLUDA')
         if self.flags.get('have_double', False):

@@ -176,9 +224,26 @@ class Kernel(object):
         if self.flags.get('have_complex', False):
             res.append('GA_USE_COMPLEX')
         if self.flags.get('have_half', False):
-            res.append('GA_USE_SMALL')
+            res.append('GA_USE_HALF')
         return '|'.join(res)

+    def _get_py_flags(self):
+        res = dict(self.flags)
+        cflags = res.pop('cflags', '')
+        for fl in cflags.split('|'):
+            fl = fl.strip()
+            if fl == 'GA_USE_CLUDA':
+                res['cluda'] = True
+            if fl == 'GA_USE_DOUBLE':
+                res['have_double'] = True
+            if fl == 'GA_USE_SMALL':
+                res['have_small'] = True
+            if fl == 'GA_USE_COMPLEX':
+                res['have_complex'] = True
+            if fl == 'GA_USE_HALF':
+                res['have_half'] = True
+        return res
+
     def _get_c_types(self):
         def m(t):
             if t == gpuarray.GpuArray:

@@ -215,7 +280,7 @@ class GpuKernelBase(object):
     def _generate_kernel_bin(self, k, ctx):
         gk = gpuarray.GpuKernel(k.code, k.name, k.params, context=ctx,
-                                **k.flags)
+                                **k._get_py_flags())
         bin = gk._binary
         bcode = ','.join(hex(c) for c in iterbytes(bin))
         return ("""static const char %(bname)s[] = { %(bcode)s };""" %

@@ -313,6 +378,102 @@ class GpuKernelBase(object):
         return (4, self.get_params(node).bin_id)

+def forward_string_meth(name):
+    def f(*args):
+        res = getattr(GpuKernelBase, name)(*args)
+        try:
+            res = res + '\n' + getattr(COp, name)(*args)
+        except MethodNotDefined:
+            pass
+        return res
+    f.__name__ = name
+    return f
+
+
+def get_dtype(s):
+    if s == '*':
+        return gpuarray.GpuArray
+    if s == 'size':
+        return gpuarray.SIZE
+    if s == 'ssize':
+        return gpuarray.SSIZE
+    else:
+        return numpy.dtype(s)
+
+
+class CGpuKernelBase(COp, GpuKernelBase):
+    """
+    Class to combine GpuKernelBase and COp.
+
+    It adds a new section type 'kernels' where you can define kernels
+    with the '#kernel' tag
+    """
+    SECTIONS = copy.copy(COp.SECTIONS)
+    SECTIONS.add('kernels')
+
+    kernel_re = re.compile(r'^#kernel ([a-zA-Z_].*?)$', re.MULTILINE)
+
+    c_support_code = forward_string_meth('c_support_code')
+    c_support_code_apply = forward_string_meth('c_support_code_apply')
+    c_support_code_struct = forward_string_meth('c_support_code_struct')
+    c_init_code_struct = forward_string_meth('c_init_code_struct')
+    c_cleanup_code_struct = forward_string_meth('c_cleanup_code_struct')
+
+    def _type_macros(self, node):
+        define_template = "#define %s %s\n"
+        undef_template = "#undef %s\n"
+        define_macros = []
+        undef_macros = []
+        for i, v in enumerate(node.inputs):
+            if isinstance(v.type, GpuArrayType):
+                macro_name = "DTYPE_i%d" % (i,)
+                macro_value = pygpu.gpuarray.dtype_to_ctype(v.dtype)
+                define_macros.append(define_template %
+                                     (macro_name, macro_value))
+                undef_macros.append(undef_template % macro_name)
+        for i, v in enumerate(node.outputs):
+            if isinstance(v.type, GpuArrayType):
+                macro_name = "DTYPE_o%d" % (i,)
+                macro_value = pygpu.gpuarray.dtype_to_ctype(v.dtype)
+                define_macros.append(define_template %
+                                     (macro_name, macro_value))
+                undef_macros.append(undef_template % macro_name)
+        return ''.join(define_macros), ''.join(undef_macros)
+
+    def gpu_kernels(self, node, name):
+        if hasattr(self, '_cached_kernels'):
+            return self._cached_kernels
+        if 'kernels' in self.code_sections:
+            code = self.code_sections['kernels']
+            split = self.kernel_re.split(code)
+            if split[0].strip() != '':
+                raise ValueError("Stray code in kernels section before the "
+                                 "first #kernel statement.")
+            def_macros, undef_macros = self._type_macros(node)
+            n = 1
+            res = []
+            while n < len(split):
+                kspec = split[n]
+                kcode = split[n + 1]
+                splt2 = kspec.split(':')
+                if len(splt2) != 3:
+                    raise ValueError("Bad kernel spec: %s" % (kspec,))
+                kname = splt2[0].strip()
+                ktypes = [get_dtype(s.strip())
+                          for s in splt2[1].split(',')]
+                kflags = splt2[2].strip()
+                kcode = def_macros + '\n' + kcode + '\n' + undef_macros
+                res.append(Kernel(kcode, ktypes, kname,
+                                  flags=dict(cluda=True, cflags=kflags)))
+                n += 2
+            self._cached_kernels = res
+            return res
+        else:
+            return GpuKernelBase.gpu_kernels(self, node, name)
+
+
 class HostFromGpu(Op):
     """
     Transfer data to CPU.
theano/gpuarray/tests/test_cgpukernelbase.py (new file, mode 100644)

from __future__ import division, absolute_import, print_function

import numpy

from six.moves import xrange

import theano
from theano import tensor, config, Apply, Op
from theano.gradient import grad_undefined

from .config import mode_with_gpu, test_ctx_name

from ..basic_ops import CGpuKernelBase
from ..type import GpuArrayType, get_context

from pygpu.gpuarray import dtype_to_typecode


# This is an implementation to test that CGpuKernelBase works and also
# to use as an example in the docs.  It is not used for user graphs.
class GpuEye(CGpuKernelBase, Op):
    """
    Eye for GPU.
    """
    __props__ = ('dtype', 'context_name')
    _f16_ok = True

    def __init__(self, dtype=None, context_name=None):
        if dtype is None:
            dtype = config.floatX
        self.dtype = dtype
        self.context_name = context_name
        CGpuKernelBase.__init__(self, ['tstgpueye.c'],
                                'APPLY_SPECIFIC(tstgpueye)')

    def get_params(self, node):
        return get_context(self.context_name)

    def c_headers(self):
        return ['<gpuarray/types.h>', '<gpuarray/kernel.h>']

    def make_node(self, n, m):
        n = tensor.as_tensor_variable(n)
        m = tensor.as_tensor_variable(m)
        assert n.ndim == 0
        assert m.ndim == 0
        otype = GpuArrayType(dtype=self.dtype,
                             broadcastable=(False, False),
                             context_name=self.context_name)
        return Apply(self, [n, m], [otype()])

    def infer_shape(self, node, in_shapes):
        out_shape = [node.inputs[0], node.inputs[1]]
        return [out_shape]

    def grad(self, inp, grads):
        return [grad_undefined(self, i, inp[i]) for i in xrange(2)]

    def get_op_params(self):
        return [('TYPECODE', str(dtype_to_typecode(self.dtype)))]


def test_cgpukernelbase():
    op = GpuEye(dtype='int32', context_name=test_ctx_name)

    f = theano.function([], op(4, 5), mode=mode_with_gpu)

    r = f()

    assert (numpy.asarray(r) == numpy.eye(4, 5, dtype='int32')).all()
theano/gpuarray/tests/tstgpueye.c (new file, mode 100644)

#section kernels

#kernel eye : *, size, size :

/* The eye name will be used to generate supporting objects.  The only
   one you probably need to care about is the kernel object, which will
   be named 'k_' + <the name above> (k_eye in this case).  This name
   also has to match the kernel function name below.
*/

KERNEL void eye(GLOBAL_MEM DTYPE_o0 *a, ga_size n, ga_size m) {
  ga_size nb = n < m ? n : m;
  for (ga_size i = LID_0; i < nb; i += LDIM_0) {
    a[i*m + i] = 1;
  }
}

#section support_code_struct

int APPLY_SPECIFIC(tstgpueye)(PyArrayObject *n, PyArrayObject *m,
                              PyGpuArrayObject **z, PyGpuContextObject *ctx) {
  size_t dims[2] = {0, 0};
  size_t ls, gs;
  void *args[3];
  int err;

  dims[0] = ((DTYPE_INPUT_0 *)PyArray_DATA(n))[0];
  dims[1] = ((DTYPE_INPUT_1 *)PyArray_DATA(m))[0];

  Py_XDECREF(*z);
  *z = pygpu_zeros(2, dims, TYPECODE, GA_C_ORDER, ctx, Py_None);
  if (*z == NULL)
    return -1;

  args[0] = (*z)->ga.data;
  args[1] = &dims[0];
  args[2] = &dims[1];
  ls = 1;
  gs = 256;
  /* The k_eye name comes from the kernel declaration above. */
  err = GpuKernel_call(&k_eye, 1, &ls, &gs, 0, args);
  if (err != GA_NO_ERROR) {
    PyErr_Format(PyExc_RuntimeError,
                 "gpuarray error: kEye: %s. n%lu, m=%lu.",
                 GpuKernel_error(&k_eye, err),
                 (unsigned long)dims[0], (unsigned long)dims[1]);
    return -1;
  }
  return 0;
}