Commit 67d7d461 authored by Frédéric Bastien

Merge pull request #2356 from abergeron/cudnn_r2

Cudnn r2
......@@ -688,14 +688,13 @@ To help with this, Theano defines a class, ``COp``, from which new C ops
can inherit. The class ``COp`` aims to simplify the process of implementing
C ops by doing the following:
* It allows you to define the C implementation of your op in a distinct
C code file. This makes it easier to keep your Python and C code
readable and well indented.
* It can automatically handle all the methods that return C code,
  as well as :meth:`Op.c_code_cache_version()`, based on the
  provided external C implementation.
To illustrate how much simpler the class ``COp`` makes the process of defining
a new op with a C implementation, let's revisit the second example of this
......@@ -740,7 +739,7 @@ C file named vectorTimesVector.c :
.. code-block:: c
#section support_code
// Support code function
bool vector_same_shape(PyArrayObject* arr1, PyArrayObject* arr2)
......@@ -749,7 +748,7 @@ C file named vectorTimesVector.c :
}
#section support_code_apply
// Apply-specific support function
void APPLY_SPECIFIC(vector_elemwise_mult)(
......@@ -822,43 +821,46 @@ this new version of the VectorTimesVector op :
* Parent class: instead of inheriting from the class :class:`Op`,
  VectorTimesVector inherits from the class ``COp``.
* Constructor: in our new op, the ``__init__()`` method has an
  important use: to inform the constructor of the ``COp`` class
  of the location, on the filesystem, of the C implementation of
  this op. To do this, it gives a list of paths to the files
  containing the C code for this op. To auto-generate the
  ``c_code`` method with a function call, you can specify the
  function name as the second parameter. The paths should be
  given relative to the folder where the descendant of the
  ``COp`` class is defined.
* ``make_node()`` : the ``make_node()`` method is absolutely
identical to the one in our old example. Using the ``COp``
class doesn't change anything here.
* External C code : the external C code implements the various
functions associated with the op. Writing this C code
involves a few subtleties which deserve their own respective
sections.
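The constructor behaviour described above can be illustrated without Theano installed. The ``COp`` stand-in below only mirrors the argument handling described in this section (a single path or a list of paths, plus an optional function name); it is a hypothetical sketch, not the real class:

```python
# Minimal stand-in for theano.gof.COp, mirroring only the constructor
# behaviour described above (illustrative sketch, not the real class).
class COp(object):
    def __init__(self, func_files, func_name=None):
        # A single path is promoted to a one-element list.
        if not isinstance(func_files, list):
            func_files = [func_files]
        self.func_files = func_files
        self.func_name = func_name


class VectorTimesVector(COp):
    def __init__(self):
        # The path is interpreted relative to the folder defining
        # this subclass.
        COp.__init__(self, "vectorTimesVector.c",
                     "APPLY_SPECIFIC(vector_times_vector)")


op = VectorTimesVector()
print(op.func_files)   # -> ['vectorTimesVector.c']
```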
Main function
-------------
If you pass a function name to the ``__init__()`` method of the
``COp`` class, it must respect the following constraints:
* It must return an int. The value of that int indicates whether
  the op could perform its task or not. A value of 0 indicates
  success, while any non-zero value will interrupt the execution
  of the Theano function. When returning non-zero, the function
  must set a Python exception indicating the details of the
  problem.
* It must receive one argument for each input to the op followed
  by one pointer to an argument for each output of the op. The
  types of these arguments depend on the Types (that is,
  Theano Types) of your inputs and outputs.
For example, the main C function of an op that takes two TensorTypes
(which have ``PyArrayObject *`` as their C type) as inputs and returns
both their sum and the difference between them would have four
parameters (two for the op's inputs and two for its outputs) and its
signature would look something like this:
......@@ -870,11 +872,21 @@ signature would look something like this :
Macros
------
For certain section tags, your C code can benefit from a number of
pre-defined macros. These section tags have no macros: ``init_code``,
``support_code``. All other tags will have the support macros
discussed below.
* ``APPLY_SPECIFIC(str)`` which will automatically append a name
  unique to the :ref:`Apply` node that applies the Op at the end
  of the provided ``str``. The use of this macro is discussed
  further below.
For every input which has a :attr:`dtype` attribute (this means
Tensors, and equivalent types on GPU), the following macros will be
defined unless your Op class has an :attr:`Op.check_input` attribute
set to False. In these descriptions, 'i' refers to the position
(indexed from 0) in the inputs list.
* ``DTYPE_INPUT_{i}`` : NumPy dtype of the data in the array.
This is the variable type corresponding to the NumPy dtype, not the
......@@ -889,71 +901,87 @@ defined:
* ``TYPENUM_INPUT_{i}`` : Typenum of the data in the array
* ``ITEMSIZE_INPUT_{i}`` : Size, in bytes, of the elements in
the array.
In the same way, the macros ``DTYPE_OUTPUT_{i}``,
``ITEMSIZE_OUTPUT_{i}`` and ``TYPENUM_OUTPUT_{i}`` are defined for
every output 'i' of the op.
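As a rough illustration of the values behind these macros, the snippet below computes, for a hypothetical input position and dtype name, what ``DTYPE_INPUT_{i}``, ``TYPENUM_INPUT_{i}`` and ``ITEMSIZE_INPUT_{i}`` would expand to. This is a sketch of the mapping described above, not Theano's actual implementation:

```python
import numpy


def input_macros(i, dtype_name):
    # Map a NumPy dtype name to the three macro values described above.
    d = numpy.dtype(dtype_name)
    return {
        "DTYPE_INPUT_%d" % i: "npy_" + dtype_name,   # C-level type name
        "TYPENUM_INPUT_%d" % i: d.num,               # NumPy typenum
        "ITEMSIZE_INPUT_%d" % i: d.itemsize,         # element size in bytes
    }


m = input_macros(0, "float32")
print(m["ITEMSIZE_INPUT_0"])   # -> 4
```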
In addition to these macros, the ``init_code_struct``, ``code``, and
``code_cleanup`` section tags also have the following macros:
* ``FAIL`` : Code to insert at error points. A Python exception
  should be set prior to this code. An invocation looks like this:
.. code-block:: c
if (error) {
// Set python exception
FAIL
}
You can add a semicolon after the macro if it makes your editor
happy.
* ``CONTEXT`` : Name of the context variable for this node (only
  for Ops which have a context, which is discussed elsewhere).
Finally, the ``code`` and ``code_cleanup`` tags have macros to
pass the input and output names. These are named ``INPUT_{i}`` and
``OUTPUT_{i}``, where `i` is the 0-based position in the inputs
and outputs lists respectively.
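The generation of these name-passing macros can be sketched as follows. This is a simplified illustration of what the ``COp`` machinery does; the C variable names used here are hypothetical:

```python
import os


def io_macros(inputs, outputs):
    # Build #define/#undef pairs mapping INPUT_i/OUTPUT_i to the
    # C variable names used in the generated code.
    defines, undefs = [], []
    for i, v in enumerate(inputs):
        defines.append("#define INPUT_%d %s" % (i, v))
        undefs.append("#undef INPUT_%d" % i)
    for i, v in enumerate(outputs):
        defines.append("#define OUTPUT_%d %s" % (i, v))
        undefs.append("#undef OUTPUT_%d" % i)
    return os.linesep.join(defines), os.linesep.join(undefs)


d, u = io_macros(["x_V3", "y_V5"], ["z_V1"])
print(d.splitlines()[2])   # -> #define OUTPUT_0 z_V1
```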
Support code
------------
Certain sections are limited in what you can place in them due to
semantic and syntactic restrictions of the C++ language. Most of
these restrictions apply to the tags that end in ``_struct``.
When we defined the VectorTimesVector op without using the ``COp``
class, we had to make a distinction between two types of support
code: the support code that was apply-specific and the support code
that wasn't. The apply-specific code was defined in the
``c_support_code_apply()`` method and the elements defined in that
code (global variables and functions) had to include the name of the
Apply node in their own names to avoid conflicts between the different
versions of the apply-specific code. The code that wasn't
apply-specific was simply defined in the ``c_support_code()`` method.
To make identifiers that include the :ref:`Apply` node name, use the
``APPLY_SPECIFIC(str)`` macro. In the above example, this macro is
used when defining the functions ``vector_elemwise_mult()`` and
``vector_times_vector()`` as well as when calling the function
``vector_elemwise_mult()`` from inside ``vector_times_vector()``.
:note:
The macro ``APPLY_SPECIFIC(str)`` should only ever be used for
apply-specific code. It should not be used for apply-agnostic code.
When using the ``COp`` class, we still have to make the distinction
between the C code for each of the methods of a C class. These
sections of code are separated by ``#section <tag>`` markers. The tag
determines the name of the method this C code applies to, with the
rule that ``<tag>`` applies to ``c_<tag>``. Unknown tags are an error
and will be reported. Duplicate tags will be merged together in the
order they appear in the C files.
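The splitting and merging behaviour described above can be sketched with a few lines of standalone Python. This is a simplified sketch; the real implementation also validates tags against a known set:

```python
import re

# Same marker pattern the parser uses: a "#section <tag>" line.
SECTION_RE = re.compile(r'^#section ([a-zA-Z0-9_]+)$', re.MULTILINE)


def parse_sections(code):
    # Split on "#section <tag>" lines; the capturing group keeps the tags.
    parts = SECTION_RE.split(code)
    if parts[0].strip() != '':
        raise ValueError("stray code before first #section statement")
    sections = {}
    for tag, body in zip(parts[1::2], parts[2::2]):
        # Duplicate tags are concatenated in order of appearance.
        sections[tag] = sections.get(tag, '') + body
    return sections


src = """#section support_code
int helper() { return 1; }
#section support_code_apply
int APPLY_SPECIFIC(f)() { return helper(); }
#section support_code
int helper2() { return 2; }
"""
secs = parse_sections(src)
print(sorted(secs))   # -> ['support_code', 'support_code_apply']
```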
The rules for deciding where a piece of code should go can
sometimes be tricky. The key thing to remember is that things that
can be shared between instances of the op should be apply-agnostic
and go into a section which does not end in ``_apply`` or
``_struct``. The distinction between ``_apply`` and ``_struct``
mostly hinges on how you want to manage the lifetime of the object.
Note that to use an apply-specific object, you have to be in an
apply-specific section, so some portions of the code that might seem
apply-agnostic may still be apply-specific because of the data they
use (this does not include arguments).
In the above example, the function ``vector_same_shape()`` is
apply-agnostic because it uses none of the macros defined by the class
``COp`` and it doesn't rely on any apply-specific code. The function
``vector_elemwise_mult()`` is apply-specific because it uses the
macros defined by ``COp``. Finally, the function
``vector_times_vector()`` is apply-specific because it uses those same
macros and also because it calls ``vector_elemwise_mult()``, which is
an apply-specific function.
Final Note
==========
......
......@@ -17,6 +17,7 @@ import logging
import numpy
import os
import sys
import re
import warnings
import theano
......@@ -973,6 +974,32 @@ int main( int argc, const char* argv[] )
compute_map, no_recycling)
def simple_meth(tag):
def f(self):
if tag in self.code_sections:
return self.code_sections[tag]
else:
raise utils.MethodNotDefined(
'c_' + tag, type(self), type(self).__name__)
f.__name__ = 'c_' + tag
return f
def apply_meth(tag):
def f(self, node, name):
if tag in self.code_sections:
code = self.code_sections[tag]
define_macros, undef_macros = self.get_c_macros(node, name)
return os.linesep.join([define_macros, code,
undef_macros])
else:
raise utils.MethodNotDefined(
'c_' + tag, type(self), type(self).__name__)
f.__name__ = 'c_' + tag
return f
class COp(Op):
""" Class to allow an op to have an external C implementation.
......@@ -981,118 +1008,98 @@ class COp(Op):
the C implementation and the name of the function, in that file, to call
to perform the computations for the op.
"""
section_re = re.compile(r'^#section ([a-zA-Z0-9_]+)$', re.MULTILINE)
backward_re = re.compile(r'^THEANO_(APPLY|SUPPORT)_CODE_SECTION$', re.MULTILINE)
# This is the set of allowed markers
SECTIONS = set([
'init_code', 'init_code_apply', 'init_code_struct',
'support_code', 'support_code_apply', 'support_code_struct',
'cleanup_code_struct',
'code', 'code_cleanup'])
@classmethod
def get_path(cls, f):
"""
Convert a path relative to the location of the class file into
an absolute path. Paths that are already absolute are passed
through unchanged.
"""
if not os.path.isabs(f):
class_file = inspect.getfile(cls)
class_dir = os.path.dirname(class_file)
f = os.path.realpath(os.path.join(class_dir, f))
return f
def __init__(self, func_files, func_name=None):
"""
Sections are loaded from files in order with sections in later
files overriding sections in previous files.
"""
if not isinstance(func_files, list):
func_files = [func_files]
self.func_files = [self.get_path(f) for f in func_files]
self.func_name = func_name
self.load_c_code()
if len(self.code_sections) == 0:
raise ValueError("No sections were defined in C files")
if self.func_name is not None:
if 'code' in self.code_sections:
# maybe a warning instead (and clearing the key)
raise ValueError('Cannot have a "code" section and '
'specify the func_name')
if 'code_cleanup' in self.code_sections:
# maybe a warning instead (and clearing the key)
raise ValueError('Cannot have a "code_cleanup" section '
'and specify the func_name')
def load_c_code(self):
self.func_codes = []
for func_file in self.func_files:
with open(func_file, 'r') as f:
self.func_codes.append(f.read())
self.code_sections = dict()
for i, code in enumerate(self.func_codes):
if ('THEANO_APPLY_CODE_SECTION' in code or
'THEANO_SUPPORT_CODE_SECTION' in code):
# This is backward compat code that will go away in a while
split = self.backward_re.split(code)
n = 1
while n < len(split):
if split[n] == 'APPLY':
self.code_sections['support_code_apply'] = split[n+1]
elif split[n] == 'SUPPORT':
self.code_sections['support_code'] = split[n+1]
n += 2
continue
split = self.section_re.split(code)
if split[0].strip() != '':
raise ValueError('Stray code before first #section '
'statement (in file %s): %s' %
(self.func_files[i], split[0]))
n = 1
while n < len(split):
if split[n] not in self.SECTIONS:
raise ValueError("Unknown section type (in file %s): %s" %
(self.func_files[i], split[n]))
if split[n] not in self.code_sections:
self.code_sections[split[n]] = ""
self.code_sections[split[n]] += split[n+1]
n += 2
def c_code_cache_version(self):
return hash(tuple(self.func_codes))
c_init_code = simple_meth('init_code')
c_init_code_apply = apply_meth('init_code_apply')
c_support_code = simple_meth('support_code')
c_support_code_apply = apply_meth('support_code_apply')
c_support_code_struct = apply_meth('support_code_struct')
c_cleanup_code_struct = apply_meth('cleanup_code_struct')
def format_c_function_args(self, inp, out):
# Generate an string containing the arguments sent to the external C
......@@ -1100,73 +1107,149 @@ class COp(Op):
# "input0, input1, input2, &output0, &output1"
return ", ".join(list(inp) + ["&%s" % o for o in out])
def get_c_macros(self, node, name, check_input=None):
    define_template = "#define %s %s"
    undef_template = "#undef %s"
    define_macros = []
    undef_macros = []

    if check_input is None:
        check_input = getattr(self, 'check_input', True)

    if check_input:
        # Extract the various properties of the input and output variables
        variables = node.inputs + node.outputs
        variable_names = (["INPUT_%i" % i for i in range(len(node.inputs))] +
                          ["OUTPUT_%i" % i for i in range(len(node.outputs))])

        # Generate dtype, typenum and itemsize macros
        for i, v in enumerate(variables):
            if not hasattr(v, 'dtype'):
                continue
            vname = variable_names[i]

            macro_name = "DTYPE_" + vname
            macro_value = "npy_" + v.dtype

            define_macros.append(define_template % (macro_name, macro_value))
            undef_macros.append(undef_template % macro_name)

            d = numpy.dtype(v.dtype)

            macro_name = "TYPENUM_" + vname
            macro_value = d.num

            define_macros.append(define_template % (macro_name, macro_value))
            undef_macros.append(undef_template % macro_name)

            macro_name = "ITEMSIZE_" + vname
            macro_value = d.itemsize

            define_macros.append(define_template % (macro_name, macro_value))
            undef_macros.append(undef_template % macro_name)

    # Generate a macro to mark code as being apply-specific
    define_macros.append(define_template % ("APPLY_SPECIFIC(str)",
                                            "str##_%s" % name))
    undef_macros.append(undef_template % "APPLY_SPECIFIC")

    return os.linesep.join(define_macros), os.linesep.join(undef_macros)
def _lquote_macro(self, txt):
res = []
spl = txt.split('\n')
for l in spl[:-1]:
res.append(l + ' \\')
res.append(spl[-1])
return os.linesep.join(res)
def get_sub_macros(self, sub):
define_macros = []
undef_macros = []
define_macros.append("#define FAIL %s" %
(self._lquote_macro(sub['fail']),))
undef_macros.append("#undef FAIL")
if 'context' in sub:
define_macros.append("#define CONTEXT %s" % (sub['context'],))
undef_macros.append("#undef CONTEXT")
return os.linesep.join(define_macros), os.linesep.join(undef_macros)
def get_io_macros(self, inputs, outputs):
    define_macros = []
    undef_macros = []

    for i, inp in enumerate(inputs):
        define_macros.append("#define INPUT_%d %s" % (i, inp))
        undef_macros.append("#undef INPUT_%d" % (i,))

    for i, out in enumerate(outputs):
        define_macros.append("#define OUTPUT_%d %s" % (i, out))
        undef_macros.append("#undef OUTPUT_%d" % (i,))

    return os.linesep.join(define_macros), os.linesep.join(undef_macros)
def c_init_code_struct(self, node, name, sub):
if 'init_code_struct' in self.code_sections:
op_code = self.code_sections['init_code_struct']
def_macros, undef_macros = self.get_c_macros(node, name)
def_sub, undef_sub = self.get_sub_macros(sub)
return os.linesep.join([def_macros, def_sub,
op_code,
undef_sub, undef_macros])
else:
raise utils.MethodNotDefined(
'c_init_code_struct', type(self), type(self).__name__)
def c_code(self, node, name, inp, out, sub):
    if self.func_name is not None:
        assert 'code' not in self.code_sections

        define_macros, undef_macros = self.get_c_macros(node, name,
                                                        check_input=False)

        # Generate the C code
        return """
%(define_macros)s
{
  if (%(func_name)s(%(func_args)s) != 0) {
    %(fail)s
  }
}
%(undef_macros)s
""" % dict(func_name=self.func_name, fail=sub['fail'],
           func_args=self.format_c_function_args(inp, out),
           define_macros=define_macros, undef_macros=undef_macros)
    else:
        if 'code' in self.code_sections:
            op_code = self.code_sections['code']

            def_macros, undef_macros = self.get_c_macros(node, name)
            def_sub, undef_sub = self.get_sub_macros(sub)
            def_io, undef_io = self.get_io_macros(inp, out)

            return os.linesep.join([def_macros, def_sub, def_io,
                                    op_code,
                                    undef_io, undef_sub, undef_macros])
        else:
            raise utils.MethodNotDefined(
                'c_code', type(self), type(self).__name__)
def c_code_cleanup(self, node, name, inputs, outputs, sub):
if 'code_cleanup' in self.code_sections:
op_code = self.code_sections['code_cleanup']
def_macros, undef_macros = self.get_c_macros(node, name)
def_sub, undef_sub = self.get_sub_macros(sub)
def_io, undef_io = self.get_io_macros(inputs, outputs)
return os.linesep.join([def_macros, def_sub, def_io,
op_code,
undef_io, undef_sub, undef_macros])
else:
raise utils.MethodNotDefined(
'c_code_cleanup', type(self), type(self).__name__)
......@@ -3,6 +3,12 @@
#include <cudnn.h>
#ifndef CUDNN_VERSION
#include <assert.h>
// Here we define the R2 API in terms of functions in the R1 interface
// This is only for what we use
static inline const char *cudnnGetErrorString(cudnnStatus_t err) {
switch (err) {
case CUDNN_STATUS_SUCCESS:
......@@ -28,4 +34,118 @@ static inline const char *cudnnGetErrorString(cudnnStatus_t err) {
}
}
// some macros to help support cudnn R1 while using R2 code.
#define cudnnCreateTensorDescriptor cudnnCreateTensor4dDescriptor
#define cudnnDestroyTensorDescriptor cudnnDestroyTensor4dDescriptor
#define cudnnSetFilter4dDescriptor cudnnSetFilterDescriptor
typedef cudnnTensor4dDescriptor_t cudnnTensorDescriptor_t;
static inline cudnnStatus_t
cudnnGetConvolution2dForwardOutputDim(
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t inputTensorDesc,
const cudnnFilterDescriptor_t filterDesc,
int *n,
int *c,
int *h,
int *w) {
return cudnnGetOutputTensor4dDim(convDesc, CUDNN_CONVOLUTION_FWD,
n, c, h, w);
}
typedef int cudnnConvolutionFwdAlgo_t;
typedef int cudnnConvolutionFwdPreference_t;
#define CUDNN_CONVOLUTION_FWD_NO_WORKSPACE 0
static inline cudnnStatus_t
cudnnGetConvolutionForwardAlgorithm(
cudnnHandle_t handle,
const cudnnTensorDescriptor_t srcDesc,
const cudnnFilterDescriptor_t filterDesc,
const cudnnConvolutionDescriptor_t convDesc,
const cudnnTensorDescriptor_t destDesc,
cudnnConvolutionFwdPreference_t preference,
size_t memoryLimitInbytes,
cudnnConvolutionFwdAlgo_t *algo) {
*algo = 0;
return CUDNN_STATUS_SUCCESS;
}
static inline cudnnStatus_t
cudnnConvolutionForward_v2(
cudnnHandle_t handle,
const void *alpha,
const cudnnTensorDescriptor_t srcDesc,
const void *srcData,
const cudnnFilterDescriptor_t filterDesc,
const void *filterData,
const cudnnConvolutionDescriptor_t convDesc,
cudnnConvolutionFwdAlgo_t algo,
void *workSpace,
size_t workSpaceSizeInBytes,
const void *beta,
const cudnnTensorDescriptor_t destDesc,
void *destData) {
assert(*(float *)alpha == 1.0);
assert(*(float *)beta == 0.0);
return cudnnConvolutionForward(handle, srcDesc, srcData,
filterDesc, filterData,
convDesc, destDesc, destData,
CUDNN_RESULT_NO_ACCUMULATE);
}
#define cudnnConvolutionForward cudnnConvolutionForward_v2
static inline cudnnStatus_t
cudnnConvolutionBackwardFilter_v2(
cudnnHandle_t handle,
const void *alpha,
const cudnnTensorDescriptor_t srcDesc,
const void *srcData,
const cudnnTensorDescriptor_t diffDesc,
const void *diffData,
const cudnnConvolutionDescriptor_t convDesc,
const void *beta,
const cudnnFilterDescriptor_t gradDesc,
void *gradData) {
assert(*(float *)alpha == 1.0);
assert(*(float *)beta == 0.0);
return cudnnConvolutionBackwardFilter(handle, srcDesc, srcData,
diffDesc, diffData,
convDesc, gradDesc, gradData,
CUDNN_RESULT_NO_ACCUMULATE);
}
#define cudnnConvolutionBackwardFilter cudnnConvolutionBackwardFilter_v2
static inline cudnnStatus_t
cudnnConvolutionBackwardData_v2(
cudnnHandle_t handle,
const void *alpha,
const cudnnFilterDescriptor_t filterDesc,
const void *filterData,
const cudnnTensorDescriptor_t diffDesc,
const void *diffData,
const cudnnConvolutionDescriptor_t convDesc,
const void *beta,
const cudnnTensorDescriptor_t gradDesc,
void *gradData) {
assert(*(float *)alpha == 1.0);
assert(*(float *)beta == 0.0);
return cudnnConvolutionBackwardData(handle,
(cudnnFilterDescriptor_t)filterDesc,
filterData,
(cudnnTensorDescriptor_t)diffDesc,
diffData,
(cudnnConvolutionDescriptor_t)convDesc,
(cudnnTensorDescriptor_t)gradDesc,
gradData,
CUDNN_RESULT_NO_ACCUMULATE);
}
#define cudnnConvolutionBackwardData cudnnConvolutionBackwardData_v2
#endif
#endif
......@@ -2,8 +2,10 @@ import os
import theano
from theano import Apply, gof, tensor
from theano.scalar import as_scalar
from theano.gradient import DisconnectedType
from theano.gof import Optimizer, local_optimizer, COp
from theano.gof.type import CDataType, Generic
from theano.compat import PY3
from theano.tensor.nnet import SoftmaxGrad
from theano.sandbox.cuda.type import CudaNdarrayType
......@@ -50,12 +52,18 @@ if ((err = cudnnCreate(&_handle)) != CUDNN_STATUS_SUCCESS) {
try_run=True, output=True)
dnn_available.avail = comp and run
if not dnn_available.avail:
dnn_available.msg = (
"Theano is not able to use cuDNN. We got this error: \n" +
str(err))
else:
v = version()
if isinstance(v, tuple) and v[0] != v[1]:
dnn_available.avail = False
dnn_available.msg = ("Mixed dnn version. The header is"
" from one version, but we link with"
" a different version %s" % str(v))
raise RuntimeError(dnn_available.msg)
return dnn_available.avail
......@@ -77,14 +85,25 @@ def c_set_tensor4d(var, desc, err, fail):
CudaNdarray_HOST_STRIDES(%(var)s)[3]?CudaNdarray_HOST_STRIDES(%(var)s)[3]:1
);
if (%(err)s != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "could not set tensor4d descriptor: %%s",
cudnnGetErrorString(%(err)s));
PyErr_Format(PyExc_RuntimeError,
"could not set tensor4d descriptor: %%s. "
"shapes=%%d %%d %%d %%d strides=%%d %%d %%d %%d",
cudnnGetErrorString(%(err)s),
CudaNdarray_HOST_DIMS(%(var)s)[0],
CudaNdarray_HOST_DIMS(%(var)s)[1],
CudaNdarray_HOST_DIMS(%(var)s)[2],
CudaNdarray_HOST_DIMS(%(var)s)[3],
CudaNdarray_HOST_STRIDES(%(var)s)[0]?CudaNdarray_HOST_STRIDES(%(var)s)[0]:CudaNdarray_HOST_DIMS(%(var)s)[2]*CudaNdarray_HOST_DIMS(%(var)s)[3]*CudaNdarray_HOST_DIMS(%(var)s)[1],
CudaNdarray_HOST_STRIDES(%(var)s)[1]?CudaNdarray_HOST_STRIDES(%(var)s)[1]:CudaNdarray_HOST_DIMS(%(var)s)[2]*CudaNdarray_HOST_DIMS(%(var)s)[3],
CudaNdarray_HOST_STRIDES(%(var)s)[2]?CudaNdarray_HOST_STRIDES(%(var)s)[2]:CudaNdarray_HOST_DIMS(%(var)s)[3],
CudaNdarray_HOST_STRIDES(%(var)s)[3]?CudaNdarray_HOST_STRIDES(%(var)s)[3]:1
);
%(fail)s
}
""" % dict(var=var, err=err, desc=desc, fail=fail)
class DnnBase(GpuOp):
class DnnBase(GpuOp, COp):
"""
Creates a handle for cudnn and pulls in the cudnn libraries and headers.
"""
......@@ -92,6 +111,9 @@ class DnnBase(GpuOp):
# the input broadcasting pattern.
check_broadcast = False
def __init__(self):
COp.__init__(self, "dnn_base.c")
def c_headers(self):
return ['cudnn.h', 'cudnn_helper.h']
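The string ``"dnn_base.c"`` passed to ``COp.__init__`` points at the external C file shown near the end of this diff, which is split on ``#section`` markers (``support_code``, ``init_code``, and so on) and dispatched to the matching ``c_*`` methods. A minimal sketch of that splitting step, using a hypothetical ``split_sections`` helper rather than Theano's actual loader:

```python
import re

def split_sections(src):
    """Split an external C file into {section_name: code} on '#section'
    markers -- a sketch of how a COp-style loader could route chunks of
    dnn_base.c to c_support_code(), c_init_code(), etc. (hypothetical
    helper, not Theano's real implementation)."""
    sections = {}
    name = None
    for line in src.splitlines():
        m = re.match(r'\s*#section\s+(\w+)\s*$', line)
        if m:
            name = m.group(1)
            sections.setdefault(name, [])
        elif name is not None:
            # Lines before the first marker are ignored, like a preamble.
            sections[name].append(line)
    return {k: '\n'.join(v) for k, v in sections.items()}
```

For example, feeding it a file containing ``#section support_code`` followed by ``#section init_code`` yields one string per section name.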
......@@ -101,11 +123,6 @@ class DnnBase(GpuOp):
def c_libraries(self):
return ['cudnn']
def c_support_code(self):
return """
cudnnHandle_t _handle = NULL;
"""
def c_init_code(self):
if PY3:
error_out = "NULL"
......@@ -121,6 +138,52 @@ if ((err = cudnnCreate(&_handle)) != CUDNN_STATUS_SUCCESS) {
}""" % (error_out,)]
class DnnVersion(GpuOp):
def c_compiler(self):
return NVCC_compiler
def c_headers(self):
return ['cudnn.h']
def c_libraries(self):
return ['cudnn']
def make_node(self):
return Apply(self, [], [Generic()()])
def c_code(self, node, name, inputs, outputs, sub):
o = outputs[0]
return """
#if defined(CUDNN_VERSION)
%(o)s = PyTuple_Pack(2, PyInt_FromLong(CUDNN_VERSION), PyInt_FromLong(cudnnGetVersion()));
#else
%(o)s = PyInt_FromLong(-1);
#endif
""" % locals()
def do_constant_folding(self, node):
# Needed as we do not want to cache this information.
return False
def c_code_cache_version(self):
# Not needed, but make it clear that we do not want to cache this.
return None
def version():
"""
Return the current cuDNN version we compile with.
This returns both the header version and the version of the library we link with, so a mismatch between the two can be detected.
"""
if version.v is None:
f = theano.function([], DnnVersion()(),
theano.Mode(optimizer=None))
version.v = f()
return version.v
version.v = None
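``version()`` memoizes its result on a function attribute so the compiled check runs only once per process. The same pattern in isolation, with a stand-in ``compute`` callable instead of the compiled ``DnnVersion`` op:

```python
def version(compute):
    """Sketch of the caching convention used by version() above: the
    result is stored on a function attribute so the (expensive) compiled
    check runs only once."""
    if version.v is None:
        version.v = compute()
    return version.v
version.v = None
```

A second call returns the cached tuple without invoking ``compute`` again.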
class GpuDnnConvDesc(GpuOp):
"""This Op builds a convolution descriptor for use in the other
convolution operations.
......@@ -216,6 +279,15 @@ class GpuDnnConvDesc(GpuOp):
PyErr_SetString(PyExc_ValueError, "bad border mode");
%(fail)s
}
#if defined(CUDNN_VERSION) && CUDNN_VERSION >= 20
err = cudnnSetConvolution2dDescriptor(
%(desc)s,
pad_h%(name)s,
pad_w%(name)s,
%(subsx)d, %(subsy)d, 1, 1,
%(conv_flag)s
);
#else
err = cudnnSetConvolutionDescriptorEx(
%(desc)s,
*(npy_int64 *)PyArray_GETPTR1(%(img_shape)s, 0),
......@@ -230,7 +302,7 @@ class GpuDnnConvDesc(GpuOp):
%(subsx)d, %(subsy)d, 1, 1,
%(conv_flag)s
);
#endif
if (err != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "could not set op descriptor: %%s",
cudnnGetErrorString(err));
......@@ -243,153 +315,10 @@ class GpuDnnConvDesc(GpuOp):
pad_h_spec=pad_h_spec, pad_w_spec=pad_w_spec)
def c_code_cache_version(self):
return (2,)
class GpuDnnConvBase(DnnBase):
__props__ = ()
def c_support_code_struct(self, node, name):
return """
cudnnTensor4dDescriptor_t input%(name)s;
cudnnTensor4dDescriptor_t output%(name)s;
cudnnFilterDescriptor_t kerns%(name)s;
""" % dict(name=name)
def c_init_code_struct(self, node, name, sub):
return """
cudnnStatus_t err%(name)s;
input%(name)s = NULL;
output%(name)s = NULL;
kerns%(name)s = NULL;
if ((err%(name)s = cudnnCreateTensor4dDescriptor(&input%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate tensor4d descriptor "
"(inp): %%s", cudnnGetErrorString(err%(name)s));
%(fail)s
}
if ((err%(name)s = cudnnCreateTensor4dDescriptor(&output%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate tensor4d descriptor "
"(out): %%s", cudnnGetErrorString(err%(name)s));
%(fail)s
}
if ((err%(name)s = cudnnCreateFilterDescriptor(&kerns%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate filter descriptor: %%s",
cudnnGetErrorString(err%(name)s));
%(fail)s
}
""" % dict(name=name, fail=sub['fail'])
def c_cleanup_code_struct(self, node, name):
return """
if (input%(name)s != NULL) {cudnnDestroyTensor4dDescriptor(input%(name)s);}
if (output%(name)s != NULL) {cudnnDestroyTensor4dDescriptor(output%(name)s);}
if (kerns%(name)s != NULL) {cudnnDestroyFilterDescriptor(kerns%(name)s);}
""" % dict(name=name)
def c_set_filter(self, var, desc, err, fail):
return """
%(err)s = cudnnSetFilterDescriptor(
%(desc)s, CUDNN_DATA_FLOAT,
CudaNdarray_HOST_DIMS(%(var)s)[0],
CudaNdarray_HOST_DIMS(%(var)s)[1],
CudaNdarray_HOST_DIMS(%(var)s)[2],
CudaNdarray_HOST_DIMS(%(var)s)[3]
);
if (%(err)s != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "could not set filter descriptor: %%s",
cudnnGetErrorString(%(err)s));
%(fail)s
}
""" % dict(var=var, desc=desc, err=err, fail=fail)
def c_set_tensor4d(self, *arg):
return c_set_tensor4d(*arg)
def c_code(self, node, name, inputs, outputs, sub):
desc = inputs[2]
out, = outputs
checks = []
for v in inputs[:2]:
checks.append("""
if (!CudaNdarray_is_c_contiguous(%s)) {
PyErr_SetString(PyExc_ValueError, "Only contiguous inputs are supported.");
%s
}
""" % (v, sub['fail']))
sets = []
for p, v, d in zip(inputs[:2], self.conv_inputs, self.conv_types[:2]):
sets.append(getattr(self, 'c_set_'+d)(p, v + name,
'err' + name, sub['fail']))
set_out = getattr(self, 'c_set_' + self.conv_types[2])(
out, self.conv_output + name, 'err' + name,
sub['fail'])
return """
cudnnStatus_t err%(name)s;
%(checks)s
%(sets)s
{
int out_dims[4];
err%(name)s = cudnnGetOutputTensor4dDim(
%(desc)s, %(path)s,
&out_dims[0], &out_dims[1],
&out_dims[2], &out_dims[3]
);
if (err%(name)s != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "could not get output sizes: %%s",
cudnnGetErrorString(err%(name)s));
%(fail)s
}
// workaround for cudnn R1 bug
if (%(path)s == CUDNN_CONVOLUTION_WEIGHT_GRAD &&
(out_dims[0] != CudaNdarray_HOST_DIMS(%(input2)s)[1] ||
out_dims[1] != CudaNdarray_HOST_DIMS(%(input1)s)[1])) {
out_dims[0] = CudaNdarray_HOST_DIMS(%(input2)s)[1];
out_dims[1] = CudaNdarray_HOST_DIMS(%(input1)s)[1];
// This is a horrible hack that is unfortunately necessary
int *dd = (int *)%(desc)s;
out_dims[2] = dd[5];
out_dims[3] = dd[6];
}
if (CudaNdarray_prep_output(&%(out)s, 4, out_dims) != 0) {
%(fail)s
}
}
%(set_out)s
err%(name)s = %(method)s(
_handle,
%(input1_desc)s, CudaNdarray_DEV_DATA(%(input1)s),
%(input2_desc)s, CudaNdarray_DEV_DATA(%(input2)s),
%(desc)s,
%(output_desc)s, CudaNdarray_DEV_DATA(%(out)s),
CUDNN_RESULT_NO_ACCUMULATE
);
if (err%(name)s != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "error doing operation: %%s",
cudnnGetErrorString(err%(name)s));
%(fail)s
}
""" % dict(out=out, desc=desc, fail=sub['fail'],
name=name, checks='\n'.join(checks), sets='\n'.join(sets),
set_out=set_out, input1=inputs[0], input2=inputs[1],
input1_desc=self.conv_inputs[0]+name,
input2_desc=self.conv_inputs[1]+name,
output_desc=self.conv_output+name,
method=self.conv_op, path=self.path_flag)
def c_code_cache_version(self):
return (8,)
return (2, version())
class GpuDnnConv(GpuDnnConvBase):
class GpuDnnConv(DnnBase, COp):
"""
The forward convolution.
......@@ -398,11 +327,11 @@ class GpuDnnConv(GpuDnnConvBase):
:param descr: the convolution descriptor
"""
conv_inputs = 'input', 'kerns'
conv_output = 'output'
conv_types = 'tensor4d', 'filter', 'tensor4d'
conv_op = 'cudnnConvolutionForward'
path_flag = 'CUDNN_CONVOLUTION_FWD'
__props__ = ()
def __init__(self):
COp.__init__(self, ["dnn_base.c", "dnn_conv_base.c", "dnn_fwd.c"],
"APPLY_SPECIFIC(conv_fwd)")
def make_node(self, img, kern, desc):
img = as_cuda_ndarray_variable(img)
......@@ -428,8 +357,10 @@ class GpuDnnConv(GpuDnnConvBase):
top = gpu_contiguous(top)
d_img = GpuDnnConvGradI()(kerns, top, desc)
d_kerns = GpuDnnConvGradW()(img, top, desc)
d_img = GpuDnnConvGradI()(kerns, top, desc,
img.shape[2], img.shape[3])
d_kerns = GpuDnnConvGradW()(img, top, desc,
kerns.shape[2], kerns.shape[3])
return d_img, d_kerns, theano.gradient.DisconnectedType()()
......@@ -438,7 +369,7 @@ class GpuDnnConv(GpuDnnConvBase):
return [[1], [1], [0]]
class GpuDnnConvGradW(GpuDnnConvBase):
class GpuDnnConvGradW(DnnBase, COp):
"""
The convolution gradient with respect to the weights.
......@@ -447,29 +378,30 @@ class GpuDnnConvGradW(GpuDnnConvBase):
:param descr: the convolution descriptor
"""
__props__ = ()
conv_inputs = 'input', 'output',
conv_output = 'kerns'
conv_types = 'tensor4d', 'tensor4d', 'filter'
path_flag = 'CUDNN_CONVOLUTION_WEIGHT_GRAD'
conv_op = 'cudnnConvolutionBackwardFilter'
def __init__(self):
COp.__init__(self, ["dnn_base.c", "dnn_conv_base.c", "dnn_gw.c"],
"APPLY_SPECIFIC(conv_gw)")
def grad(self, inp, grads):
img, top, desc = inp
img, top, desc, h, w = inp
kerns, = grads
kerns = gpu_contiguous(kerns)
d_img = GpuDnnConvGradI()(kerns, top, desc)
d_img = GpuDnnConvGradI()(kerns, top, desc,
img.shape[2], img.shape[3])
d_top = GpuDnnConv()(img, kerns, desc)
return d_img, d_top, theano.gradient.DisconnectedType()()
return (d_img, d_top, DisconnectedType()(), DisconnectedType()(),
DisconnectedType()())
def connection_pattern(self, node):
# not connected to desc
return [[1], [1], [0]]
# not connected to desc, h, w
return [[1], [1], [0], [0], [0]]
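``connection_pattern`` returns one row per input, each row holding one 0/1 flag per output; the two new rows of zeros declare the ``h`` and ``w`` shape inputs disconnected from the gradient. A plain-Python sketch (hypothetical helper, not Theano machinery) that validates such a pattern and lists the disconnected inputs:

```python
def check_connection_pattern(pattern, n_inputs, n_outputs):
    # One row per input; each row has one 0/1 flag per output.
    assert len(pattern) == n_inputs
    for row in pattern:
        assert len(row) == n_outputs
        assert all(flag in (0, 1) for flag in row)
    # Indices of inputs disconnected from every output.
    return [i for i, row in enumerate(pattern) if row == [0] * n_outputs]

# GpuDnnConvGradW: inputs (img, topgrad, desc, h, w), one output (kerns).
disconnected = check_connection_pattern([[1], [1], [0], [0], [0]], 5, 1)
```

Here ``disconnected`` picks out ``desc``, ``h`` and ``w``, matching the ``DisconnectedType`` returns in ``grad``.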
def make_node(self, img, topgrad, desc):
def make_node(self, img, topgrad, desc, h, w):
img = as_cuda_ndarray_variable(img)
topgrad = as_cuda_ndarray_variable(topgrad)
if img.type.ndim != 4:
......@@ -481,14 +413,18 @@ class GpuDnnConvGradW(GpuDnnConvBase):
or desc.type.ctype != 'cudnnConvolutionDescriptor_t':
raise TypeError('desc must be cudnnConvolutionDescriptor_t')
h = as_scalar(h)
w = as_scalar(w)
broadcastable = [topgrad.type.broadcastable[1],
img.type.broadcastable[1],
False, False]
return Apply(self, [img, topgrad, desc],
return Apply(self, [img, topgrad, desc, h, w],
[CudaNdarrayType(broadcastable)()])
class GpuDnnConvGradI(GpuDnnConvBase):
class GpuDnnConvGradI(DnnBase, COp):
"""
The convolution gradient with respect to the inputs.
......@@ -497,29 +433,29 @@ class GpuDnnConvGradI(GpuDnnConvBase):
:param descr: the convolution descriptor
"""
__props__ = ()
conv_inputs = 'kerns', 'output',
conv_output = 'input'
conv_types = 'filter', 'tensor4d', 'tensor4d'
path_flag = 'CUDNN_CONVOLUTION_DATA_GRAD'
conv_op = 'cudnnConvolutionBackwardData'
def __init__(self):
COp.__init__(self, ["dnn_base.c", "dnn_conv_base.c", "dnn_gi.c"],
"APPLY_SPECIFIC(conv_gi)")
def grad(self, inp, grads):
kerns, top, desc = inp
kerns, top, desc, h, w = inp
img, = grads
img = gpu_contiguous(img)
d_kerns = GpuDnnConvGradW()(img, top, desc)
d_kerns = GpuDnnConvGradW()(img, top, desc,
kerns.shape[2], kerns.shape[3])
d_top = GpuDnnConv()(img, kerns, desc)
return d_kerns, d_top, theano.gradient.DisconnectedType()()
return (d_kerns, d_top, DisconnectedType()(), DisconnectedType()(),
DisconnectedType()())
def connection_pattern(self, node):
# not connected to desc
return [[1], [1], [0]]
# not connected to desc, h, w
return [[1], [1], [0], [0], [0]]
def make_node(self, kern, topgrad, desc):
def make_node(self, kern, topgrad, desc, h, w):
kern = as_cuda_ndarray_variable(kern)
topgrad = as_cuda_ndarray_variable(topgrad)
if kern.type.ndim != 4:
......@@ -531,10 +467,14 @@ class GpuDnnConvGradI(GpuDnnConvBase):
or desc.type.ctype != 'cudnnConvolutionDescriptor_t':
raise TypeError('desc must be cudnnConvolutionDescriptor_t')
h = as_scalar(h)
w = as_scalar(w)
broadcastable = [topgrad.type.broadcastable[0],
kern.type.broadcastable[1],
False, False]
return Apply(self, [kern, topgrad, desc],
return Apply(self, [kern, topgrad, desc, h, w],
[CudaNdarrayType(broadcastable)()])
......@@ -581,7 +521,7 @@ def dnn_conv(img, kerns, border_mode='valid', subsample=(1, 1),
img.shape[3] - kerns.shape[3] + 1)
desc = GpuDnnConvDesc(border_mode='valid', subsample=(1, 1),
conv_mode='cross')(img.shape, shape)
conv = GpuDnnConvGradW()(img, kerns, desc)
conv = GpuDnnConvGradW()(img, kerns, desc, shape[2], shape[3])
return as_cuda_ndarray_variable(conv.dimshuffle(1, 0, 2, 3))
elif (border_mode == 'full' and subsample == (1, 1) and
......@@ -597,7 +537,7 @@ def dnn_conv(img, kerns, border_mode='valid', subsample=(1, 1),
img.shape[3] + kerns.shape[3] - 1)
desc = GpuDnnConvDesc(border_mode='valid', subsample=(1, 1),
conv_mode=conv_mode)(shape, kerns.shape)
return GpuDnnConvGradI()(kerns, img, desc)
return GpuDnnConvGradI()(kerns, img, desc, shape[2], shape[3])
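This ``full`` border mode is expressed as the data gradient of a valid convolution whose input has the enlarged spatial shape ``i + k - 1``. The shape arithmetic as a standalone sketch:

```python
def full_conv_output_shape(img_shape, kern_shape):
    """Spatial output shape of a full-mode 2D convolution: i + k - 1
    per spatial axis (sketch of the shape computation above)."""
    b, c, ih, iw = img_shape
    nf, _, kh, kw = kern_shape
    return (b, nf, ih + kh - 1, iw + kw - 1)
```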
# Standard case: We use GpuDnnConv with suitable padding.
img = gpu_contiguous(img)
......@@ -664,14 +604,22 @@ class GpuDnnPoolDesc(GpuOp):
"descriptor: %%s", cudnnGetErrorString(err));
%(fail)s
}
#ifndef CUDNN_VERSION
err = cudnnSetPoolingDescriptor(
%(desc)s,
%(mode_flag)s,
%(wsX)d, %(wsY)d,
%(stridex)d, %(stridey)d
);
#else
err = cudnnSetPooling2dDescriptor(
%(desc)s,
%(mode_flag)s,
%(wsX)d, %(wsY)d,
0, 0,
%(stridex)d, %(stridey)d
);
#endif
if (err != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "could not set op descriptor: %%s",
cudnnGetErrorString(err));
......@@ -683,7 +631,7 @@ class GpuDnnPoolDesc(GpuOp):
stridey=self.stride[1])
def c_code_cache_version(self):
return (1,)
return (1, version())
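The pooling descriptor fixes the window, padding (v2 only; ``0, 0`` above) and stride, which together determine the output spatial size by the usual formula. A sketch, assuming floor division as cuDNN uses for pooling:

```python
def pool_output_dim(size, window, stride, pad=0):
    # floor((size + 2*pad - window) / stride) + 1
    return (size + 2 * pad - window) // stride + 1
```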
class GpuDnnPool(DnnBase):
......@@ -709,8 +657,8 @@ class GpuDnnPool(DnnBase):
def c_support_code_struct(self, node, name):
return """
cudnnTensor4dDescriptor_t input%(name)s;
cudnnTensor4dDescriptor_t output%(name)s;
cudnnTensorDescriptor_t input%(name)s;
cudnnTensorDescriptor_t output%(name)s;
""" % dict(name=name)
def c_init_code_struct(self, node, name, sub):
......@@ -718,12 +666,12 @@ cudnnTensor4dDescriptor_t output%(name)s;
cudnnStatus_t err%(name)s;
input%(name)s = NULL;
output%(name)s = NULL;
if ((err%(name)s = cudnnCreateTensor4dDescriptor(&input%(name)s)) != CUDNN_STATUS_SUCCESS) {
if ((err%(name)s = cudnnCreateTensorDescriptor(&input%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate tensor4d descriptor "
"(inp): %%s", cudnnGetErrorString(err%(name)s));
%(fail)s
}
if ((err%(name)s = cudnnCreateTensor4dDescriptor(&output%(name)s)) != CUDNN_STATUS_SUCCESS) {
if ((err%(name)s = cudnnCreateTensorDescriptor(&output%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate tensor4d descriptor "
"(out): %%s", cudnnGetErrorString(err%(name)s));
%(fail)s
......@@ -732,8 +680,8 @@ if ((err%(name)s = cudnnCreateTensor4dDescriptor(&output%(name)s)) != CUDNN_STAT
def c_cleanup_code_struct(self, node, name):
return """
if (input%(name)s != NULL) { cudnnDestroyTensor4dDescriptor(input%(name)s); }
if (output%(name)s != NULL) { cudnnDestroyTensor4dDescriptor(output%(name)s); }
if (input%(name)s != NULL) { cudnnDestroyTensorDescriptor(input%(name)s); }
if (output%(name)s != NULL) { cudnnDestroyTensorDescriptor(output%(name)s); }
""" % dict(name=name)
def c_code(self, node, name, inputs, outputs, sub):
......@@ -759,9 +707,19 @@ if (!CudaNdarray_is_c_contiguous(%(input)s)) {
%(set_in)s
cudnnPoolingMode_t mode;
int wsX, wsY, strideX, strideY;
err%(name)s = cudnnGetPoolingDescriptor(%(desc)s, &mode, &wsX, &wsY, &strideX, &strideY);
int wsX, wsY, vpad, hpad, strideX, strideY;
#ifndef CUDNN_VERSION
err%(name)s = cudnnGetPoolingDescriptor(
%(desc)s, &mode,
&wsX, &wsY,
&strideX, &strideY);
#else
err%(name)s = cudnnGetPooling2dDescriptor(
%(desc)s, &mode,
&wsX, &wsY,
&vpad, &hpad,
&strideX, &strideY);
#endif
if (err%(name)s != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError,
......@@ -781,13 +739,27 @@ if (CudaNdarray_prep_output(&%(out)s, 4, %(out)s_dims) != 0)
}
%(set_out)s
#ifndef CUDNN_VERSION
err%(name)s = cudnnPoolingForward(
_handle,
%(desc)s,
%(input_desc)s, CudaNdarray_DEV_DATA(%(input)s),
%(output_desc)s, CudaNdarray_DEV_DATA(%(out)s)
);
#else
{
const float alpha = 1;
const float beta = 0;
err%(name)s = cudnnPoolingForward(
_handle,
%(desc)s,
&alpha,
%(input_desc)s, CudaNdarray_DEV_DATA(%(input)s),
&beta,
%(output_desc)s, CudaNdarray_DEV_DATA(%(out)s)
);
}
#endif
if (err%(name)s != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError,
"GpuDnnPool: error doing cudnnPoolingForward operation: %%s",
......@@ -817,7 +789,7 @@ if (err%(name)s != CUDNN_STATUS_SUCCESS) {
return [[1], [0]]
def c_code_cache_version(self):
return (4,)
return (4, version())
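cuDNN v2 threads ``alpha``/``beta`` scaling factors through its compute calls, blending as ``dest = alpha * op(...) + beta * dest``; with ``alpha = 1`` and ``beta = 0``, as used throughout this diff, the destination is simply overwritten. The convention as a sketch:

```python
def apply_with_blend(op_result, dest, alpha=1.0, beta=0.0):
    # cuDNN v2 convention: dest = alpha * op_result + beta * dest
    return [alpha * r + beta * d for r, d in zip(op_result, dest)]
```

With the default ``alpha=1, beta=0`` the previous contents of ``dest`` are ignored, which is why the v1-to-v2 shims above assert those exact values.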
class GpuDnnPoolGrad(DnnBase):
......@@ -853,10 +825,10 @@ class GpuDnnPoolGrad(DnnBase):
def c_support_code_struct(self, node, name):
return """
cudnnTensor4dDescriptor_t input%(name)s;
cudnnTensor4dDescriptor_t input_grad%(name)s;
cudnnTensor4dDescriptor_t output%(name)s;
cudnnTensor4dDescriptor_t output_grad%(name)s;
cudnnTensorDescriptor_t input%(name)s;
cudnnTensorDescriptor_t input_grad%(name)s;
cudnnTensorDescriptor_t output%(name)s;
cudnnTensorDescriptor_t output_grad%(name)s;
""" % dict(name=name)
def c_init_code_struct(self, node, name, sub):
......@@ -866,25 +838,25 @@ input%(name)s = NULL;
input_grad%(name)s = NULL;
output%(name)s = NULL;
output_grad%(name)s = NULL;
if ((err%(name)s = cudnnCreateTensor4dDescriptor(&input%(name)s)) != CUDNN_STATUS_SUCCESS) {
if ((err%(name)s = cudnnCreateTensorDescriptor(&input%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError,
"GpuDnnPoolGrad: could not allocate tensor4d descriptor "
"(input): %%s", cudnnGetErrorString(err%(name)s));
%(fail)s
}
if ((err%(name)s = cudnnCreateTensor4dDescriptor(&input_grad%(name)s)) != CUDNN_STATUS_SUCCESS) {
if ((err%(name)s = cudnnCreateTensorDescriptor(&input_grad%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError,
"GpuDnnPoolGrad: could not allocate tensor4d descriptor "
"(input_grad): %%s", cudnnGetErrorString(err%(name)s));
%(fail)s
}
if ((err%(name)s = cudnnCreateTensor4dDescriptor(&output%(name)s)) != CUDNN_STATUS_SUCCESS) {
if ((err%(name)s = cudnnCreateTensorDescriptor(&output%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError,
"GpuDnnPoolGrad: could not allocate tensor4d descriptor "
"(output): %%s", cudnnGetErrorString(err%(name)s));
%(fail)s
}
if ((err%(name)s = cudnnCreateTensor4dDescriptor(&output_grad%(name)s)) != CUDNN_STATUS_SUCCESS) {
if ((err%(name)s = cudnnCreateTensorDescriptor(&output_grad%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError,
"GpuDnnPoolGrad: could not allocate tensor4d descriptor "
"(output_grad): %%s", cudnnGetErrorString(err%(name)s));
......@@ -894,10 +866,10 @@ if ((err%(name)s = cudnnCreateTensor4dDescriptor(&output_grad%(name)s)) != CUDNN
def c_cleanup_code_struct(self, node, name):
return """
if (input%(name)s != NULL) { cudnnDestroyTensor4dDescriptor(input%(name)s); }
if (input_grad%(name)s != NULL) { cudnnDestroyTensor4dDescriptor(input_grad%(name)s); }
if (output%(name)s != NULL) { cudnnDestroyTensor4dDescriptor(output%(name)s); }
if (output_grad%(name)s != NULL) { cudnnDestroyTensor4dDescriptor(output_grad%(name)s); }
if (input%(name)s != NULL) { cudnnDestroyTensorDescriptor(input%(name)s); }
if (input_grad%(name)s != NULL) { cudnnDestroyTensorDescriptor(input_grad%(name)s); }
if (output%(name)s != NULL) { cudnnDestroyTensorDescriptor(output%(name)s); }
if (output_grad%(name)s != NULL) { cudnnDestroyTensorDescriptor(output_grad%(name)s); }
""" % dict(name=name)
def c_code(self, node, name, inputs, outputs, sub):
......@@ -949,7 +921,7 @@ if (CudaNdarray_prep_output(&%(output_grad)s, 4,
}
%(set_out)s
#ifndef CUDNN_VERSION
err%(name)s = cudnnPoolingBackward(
_handle,
%(desc)s,
......@@ -958,6 +930,22 @@ _handle,
%(output_desc)s, CudaNdarray_DEV_DATA(%(output)s),
%(output_grad_desc)s, CudaNdarray_DEV_DATA(%(output_grad)s)
);
#else
{
const float alpha = 1;
const float beta = 0;
err%(name)s = cudnnPoolingBackward(
_handle,
%(desc)s,
&alpha,
%(input_desc)s, CudaNdarray_DEV_DATA(%(input)s),
%(input_grad_desc)s, CudaNdarray_DEV_DATA(%(input_grad)s),
%(output_desc)s, CudaNdarray_DEV_DATA(%(output)s),
&beta,
%(output_grad_desc)s, CudaNdarray_DEV_DATA(%(output_grad)s)
);
}
#endif
if (err%(name)s != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError,
"GpuDnnPoolGrad: error doing operation: %%s",
......@@ -974,7 +962,7 @@ if (err%(name)s != CUDNN_STATUS_SUCCESS) {
output_grad_desc="output_grad"+name)
def c_code_cache_version(self):
return (4,)
return (4, version())
def dnn_pool(img, ws, stride=(1, 1), mode='max'):
......@@ -1015,6 +1003,7 @@ class GpuDnnSoftmaxBase(DnnBase):
def __init__(self, tensor_format, algo, mode):
assert(tensor_format in ('bc01', 'b01c'))
DnnBase.__init__(self)
self.tensor_format = tensor_format
assert(algo in ('fast', 'accurate'))
......@@ -1029,14 +1018,14 @@ class GpuDnnSoftmaxBase(DnnBase):
def _define_tensor4d_desc(self, name, id):
return """
cudnnTensor4dDescriptor_t %(id)s_%(name)s;
cudnnTensorDescriptor_t %(id)s_%(name)s;
""" % dict(name=name, id=id)
def _init_tensor4d_desc(self, name, id, fail):
return """
%(id)s_%(name)s = NULL;
if ((err%(name)s = cudnnCreateTensor4dDescriptor(&%(id)s_%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate tensor4d descriptor "
if ((err%(name)s = cudnnCreateTensorDescriptor(&%(id)s_%(name)s)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate tensor descriptor "
": %%s", cudnnGetErrorString(err%(name)s));
%(fail)s
}
......@@ -1045,7 +1034,7 @@ if ((err%(name)s = cudnnCreateTensor4dDescriptor(&%(id)s_%(name)s)) != CUDNN_STA
def _clean_tensor4d_desc(self, name, id):
return """
if(%(id)s_%(name)s!= NULL)
cudnnDestroyTensor4dDescriptor(%(id)s_%(name)s);
cudnnDestroyTensorDescriptor(%(id)s_%(name)s);
""" % dict(name=name, id=id)
def c_support_code_struct(self, node, name):
......@@ -1102,8 +1091,7 @@ if (%(algo)d == 1)
cudnnSoftmaxMode_t mode%(name)s = CUDNN_SOFTMAX_MODE_CHANNEL;
if (%(mode)d == 1)
mode%(name)s = CUDNN_SOFTMAX_MODE_INSTANCE;
""" % dict(name=name,
tensor_format=tensor_format, mode=mode, algo=algo)
""" % dict(name=name, tensor_format=tensor_format, mode=mode, algo=algo)
# Validate the input and build the input variables.
for input_idx, input_name in enumerate(self.softmax_inputs):
......@@ -1134,7 +1122,7 @@ if (CudaNdarray_prep_output(&%(outs)s, 4, CudaNdarray_HOST_DIMS(%(ins)s)) != 0)
return result
def c_code_cache_version(self):
return (0, 6)
return (0, 6, version())
def method(self):
raise NotImplementedError('GpuDnnSoftmaxBase::method')
......@@ -1150,15 +1138,33 @@ class GpuDnnSoftmax(GpuDnnSoftmaxBase):
def method(self):
return """
#ifndef CUDNN_VERSION
err%(name)s = cudnnSoftmaxForward(
_handle,
algo%(name)s,
mode%(name)s,
softmax_input_%(name)s,
CudaNdarray_DEV_DATA(%(ins)s),
softmax_output_%(name)s,
CudaNdarray_DEV_DATA(%(outs)s)
);
#else
{
const float alpha = 1.;
const float beta = 0.;
err%(name)s = cudnnSoftmaxForward(
_handle,
algo%(name)s,
mode%(name)s,
(void*) &alpha,
softmax_input_%(name)s,
CudaNdarray_DEV_DATA(%(ins)s),
(void*) &beta,
softmax_output_%(name)s,
CudaNdarray_DEV_DATA(%(outs)s)
);
}
#endif
"""
def grad(self, inp, grads):
......@@ -1184,6 +1190,7 @@ class GpuDnnSoftmaxGrad(GpuDnnSoftmaxBase):
def method(self):
return """
#ifndef CUDNN_VERSION
err%(name)s = cudnnSoftmaxBackward(
_handle,
algo%(name)s,
......@@ -1195,7 +1202,26 @@ err%(name)s = cudnnSoftmaxBackward(
softmax_output_%(name)s,
CudaNdarray_DEV_DATA(%(outs)s)
);
"""
#else
{
const float alpha = 1.;
const float beta = 0.;
err%(name)s = cudnnSoftmaxBackward(
_handle,
algo%(name)s,
mode%(name)s,
(void*) &alpha,
%(name1)s_%(name)s,
CudaNdarray_DEV_DATA(%(ins1)s),
%(name0)s_%(name)s,
CudaNdarray_DEV_DATA(%(ins0)s),
(void*) &beta,
softmax_output_%(name)s,
CudaNdarray_DEV_DATA(%(outs)s)
);
}
#endif
"""
# Indentation for history
......
#section support_code
static cudnnHandle_t _handle = NULL;
static int
c_set_tensor4d(CudaNdarray *var, cudnnTensorDescriptor_t desc) {
cudnnStatus_t err = cudnnSetTensor4dDescriptorEx(
desc, CUDNN_DATA_FLOAT,
CudaNdarray_HOST_DIMS(var)[0],
CudaNdarray_HOST_DIMS(var)[1],
CudaNdarray_HOST_DIMS(var)[2],
CudaNdarray_HOST_DIMS(var)[3],
CudaNdarray_HOST_STRIDES(var)[0]?CudaNdarray_HOST_STRIDES(var)[0]:CudaNdarray_HOST_DIMS(var)[2]*CudaNdarray_HOST_DIMS(var)[3]*CudaNdarray_HOST_DIMS(var)[1],
CudaNdarray_HOST_STRIDES(var)[1]?CudaNdarray_HOST_STRIDES(var)[1]:CudaNdarray_HOST_DIMS(var)[2]*CudaNdarray_HOST_DIMS(var)[3],
CudaNdarray_HOST_STRIDES(var)[2]?CudaNdarray_HOST_STRIDES(var)[2]:CudaNdarray_HOST_DIMS(var)[3],
CudaNdarray_HOST_STRIDES(var)[3]?CudaNdarray_HOST_STRIDES(var)[3]:1
);
if (err != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError,
"Could not set tensor4d descriptor: %s. "
"shapes=%d %d %d %d strides=%d %d %d %d",
cudnnGetErrorString(err),
CudaNdarray_HOST_DIMS(var)[0],
CudaNdarray_HOST_DIMS(var)[1],
CudaNdarray_HOST_DIMS(var)[2],
CudaNdarray_HOST_DIMS(var)[3],
CudaNdarray_HOST_STRIDES(var)[0]?CudaNdarray_HOST_STRIDES(var)[0]:CudaNdarray_HOST_DIMS(var)[2]*CudaNdarray_HOST_DIMS(var)[3]*CudaNdarray_HOST_DIMS(var)[1],
CudaNdarray_HOST_STRIDES(var)[1]?CudaNdarray_HOST_STRIDES(var)[1]:CudaNdarray_HOST_DIMS(var)[2]*CudaNdarray_HOST_DIMS(var)[3],
CudaNdarray_HOST_STRIDES(var)[2]?CudaNdarray_HOST_STRIDES(var)[2]:CudaNdarray_HOST_DIMS(var)[3],
CudaNdarray_HOST_STRIDES(var)[3]?CudaNdarray_HOST_STRIDES(var)[3]:1
);
return -1;
}
return 0;
}
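``c_set_tensor4d`` substitutes a C-contiguous default whenever a stride is zero (CudaNdarray's marker for a broadcastable dimension). The same fallback, in element units, as a plain-Python sketch:

```python
def fixup_strides(dims, strides):
    """Replace zero strides with C-contiguous defaults (in elements),
    mirroring the `stride ? stride : default` ternaries above."""
    n, c, h, w = dims
    defaults = (c * h * w, h * w, w, 1)
    return tuple(s if s else d for s, d in zip(strides, defaults))
```

Non-zero strides pass through untouched, so only broadcastable axes get the contiguous default.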
static int
c_set_filter(CudaNdarray *var, cudnnFilterDescriptor_t desc) {
if (!CudaNdarray_is_c_contiguous(var)) {
PyErr_SetString(PyExc_ValueError,
"Only contiguous filters (kernels) are supported.");
return -1;
}
cudnnStatus_t err = cudnnSetFilter4dDescriptor(
desc, CUDNN_DATA_FLOAT,
CudaNdarray_HOST_DIMS(var)[0],
CudaNdarray_HOST_DIMS(var)[1],
CudaNdarray_HOST_DIMS(var)[2],
CudaNdarray_HOST_DIMS(var)[3]
);
if (err != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError,
"Could not set filter descriptor: %s."
" dims= %d %d %d %d",
cudnnGetErrorString(err),
CudaNdarray_HOST_DIMS(var)[0],
CudaNdarray_HOST_DIMS(var)[1],
CudaNdarray_HOST_DIMS(var)[2],
CudaNdarray_HOST_DIMS(var)[3]);
return -1;
}
return 0;
}
#section init_code
{
cudnnStatus_t err;
if ((err = cudnnCreate(&_handle)) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "could not create cuDNN handle: %s",
cudnnGetErrorString(err));
#if PYTHON_MAJOR_VERSION >= 3
return NULL;
#else
return;
#endif
}
}
#section support_code_struct
cudnnTensorDescriptor_t APPLY_SPECIFIC(input);
cudnnTensorDescriptor_t APPLY_SPECIFIC(output);
cudnnFilterDescriptor_t APPLY_SPECIFIC(kerns);
#section init_code_struct
cudnnStatus_t APPLY_SPECIFIC(err);
APPLY_SPECIFIC(input) = NULL;
APPLY_SPECIFIC(output) = NULL;
APPLY_SPECIFIC(kerns) = NULL;
if ((APPLY_SPECIFIC(err) = cudnnCreateTensorDescriptor(&APPLY_SPECIFIC(input))) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate tensor descriptor "
"(inp): %s", cudnnGetErrorString(APPLY_SPECIFIC(err)));
FAIL;
}
if ((APPLY_SPECIFIC(err) = cudnnCreateTensorDescriptor(&APPLY_SPECIFIC(output))) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate tensor descriptor "
"(out): %s", cudnnGetErrorString(APPLY_SPECIFIC(err)));
FAIL;
}
if ((APPLY_SPECIFIC(err) = cudnnCreateFilterDescriptor(&APPLY_SPECIFIC(kerns))) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_MemoryError, "could not allocate filter descriptor: %s",
cudnnGetErrorString(APPLY_SPECIFIC(err)));
FAIL;
}
#section cleanup_code_struct
if (APPLY_SPECIFIC(input) != NULL)
cudnnDestroyTensorDescriptor(APPLY_SPECIFIC(input));
if (APPLY_SPECIFIC(output) != NULL)
cudnnDestroyTensorDescriptor(APPLY_SPECIFIC(output));
if (APPLY_SPECIFIC(kerns) != NULL)
cudnnDestroyFilterDescriptor(APPLY_SPECIFIC(kerns));
#section support_code_struct
int
APPLY_SPECIFIC(conv_fwd)(CudaNdarray *input, CudaNdarray *kerns,
cudnnConvolutionDescriptor_t desc,
CudaNdarray **output) {
cudnnStatus_t err = CUDNN_STATUS_SUCCESS;
if (c_set_tensor4d(input, APPLY_SPECIFIC(input)) == -1)
return 1;
if (c_set_filter(kerns, APPLY_SPECIFIC(kerns)) == -1)
return 1;
{
int out_dims[4];
err = cudnnGetConvolution2dForwardOutputDim(
desc,
APPLY_SPECIFIC(input),
APPLY_SPECIFIC(kerns),
&out_dims[0], &out_dims[1], &out_dims[2], &out_dims[3]);
if (err != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError,
"GpuDnnConv: error while computing the output shape: %s",
cudnnGetErrorString(err));
return 1;
}
if (CudaNdarray_prep_output(output, 4, out_dims) != 0) {
return 1;
}
}
if (c_set_tensor4d(*output, APPLY_SPECIFIC(output)) == -1)
return 1;
{
cudnnConvolutionFwdAlgo_t algo;
err = cudnnGetConvolutionForwardAlgorithm(
_handle,
APPLY_SPECIFIC(input),
APPLY_SPECIFIC(kerns),
desc,
APPLY_SPECIFIC(output),
CUDNN_CONVOLUTION_FWD_NO_WORKSPACE, // TODO: add op param
0,
&algo);
if (err != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError,
"GpuDnnConv: Couldn't select convolution algorithm: %s",
cudnnGetErrorString(err));
return 1;
}
const float alpha = 1;
const float beta = 0;
err = cudnnConvolutionForward(
_handle,
(void *)&alpha,
APPLY_SPECIFIC(input), CudaNdarray_DEV_DATA(input),
APPLY_SPECIFIC(kerns), CudaNdarray_DEV_DATA(kerns),
desc,
algo,
NULL, 0,
(void *)&beta,
APPLY_SPECIFIC(output), CudaNdarray_DEV_DATA(*output));
}
if (err != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "GpuDnnConv: error doing operation: %s",
cudnnGetErrorString(err));
return 1;
}
return 0;
}
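``cudnnGetConvolution2dForwardOutputDim`` derives the output shape from the input tensor, filter and convolution descriptor; for a pad/stride descriptor it reduces to the standard formula, sketched here as a hedged reimplementation rather than the cuDNN call itself:

```python
def conv_fwd_output_shape(img_shape, kern_shape, pad=(0, 0), subsample=(1, 1)):
    """Forward convolution output shape: (i + 2*pad - k) // stride + 1
    per spatial axis; batch comes from the image, channels from the filters."""
    b, c, ih, iw = img_shape
    nf, _, kh, kw = kern_shape
    oh = (ih + 2 * pad[0] - kh) // subsample[0] + 1
    ow = (iw + 2 * pad[1] - kw) // subsample[1] + 1
    return (b, nf, oh, ow)
```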
#section support_code_struct
int
APPLY_SPECIFIC(conv_gi)(CudaNdarray *kerns, CudaNdarray *output,
cudnnConvolutionDescriptor_t desc,
int h, int w,
CudaNdarray **input) {
cudnnStatus_t err = CUDNN_STATUS_SUCCESS;
if (c_set_tensor4d(output, APPLY_SPECIFIC(output)) == -1)
return 1;
if (c_set_filter(kerns, APPLY_SPECIFIC(kerns)) == -1)
return 1;
{
int out_dims[4];
out_dims[0] = CudaNdarray_HOST_DIMS(output)[0];
out_dims[1] = CudaNdarray_HOST_DIMS(kerns)[1];
out_dims[2] = h;
out_dims[3] = w;
if (CudaNdarray_prep_output(input, 4, out_dims) != 0) {
return 1;
}
}
if (c_set_tensor4d(*input, APPLY_SPECIFIC(input)) == -1)
return 1;
{
const float alpha = 1;
const float beta = 0;
err = cudnnConvolutionBackwardData(
_handle,
(void *)&alpha,
APPLY_SPECIFIC(kerns), CudaNdarray_DEV_DATA(kerns),
APPLY_SPECIFIC(output), CudaNdarray_DEV_DATA(output),
desc,
(void *)&beta,
APPLY_SPECIFIC(input), CudaNdarray_DEV_DATA(*input));
}
if (err != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "GpuDnnConvGradI: error doing operation: %s",
cudnnGetErrorString(err));
return 1;
}
return 0;
}
#section support_code_struct
int
APPLY_SPECIFIC(conv_gw)(CudaNdarray *input, CudaNdarray *output,
cudnnConvolutionDescriptor_t desc,
int h, int w,
CudaNdarray **kerns) {
cudnnStatus_t err = CUDNN_STATUS_SUCCESS;
if (c_set_tensor4d(input, APPLY_SPECIFIC(input)) == -1)
return 1;
if (c_set_tensor4d(output, APPLY_SPECIFIC(output)) == -1)
return 1;
{
int out_dims[4];
out_dims[0] = CudaNdarray_HOST_DIMS(output)[1];
out_dims[1] = CudaNdarray_HOST_DIMS(input)[1];
out_dims[2] = h;
out_dims[3] = w;
if (CudaNdarray_prep_output(kerns, 4, out_dims) != 0) {
return 1;
}
}
if (c_set_filter(*kerns, APPLY_SPECIFIC(kerns)) == -1)
return 1;
{
const float alpha = 1;
const float beta = 0;
err = cudnnConvolutionBackwardFilter(
_handle,
(void *)&alpha,
APPLY_SPECIFIC(input), CudaNdarray_DEV_DATA(input),
APPLY_SPECIFIC(output), CudaNdarray_DEV_DATA(output),
desc,
(void *)&beta,
APPLY_SPECIFIC(kerns), CudaNdarray_DEV_DATA(*kerns));
}
if (err != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "GpuDnnConvGradW: error doing operation: %s",
cudnnGetErrorString(err));
return 1;
}
return 0;
}
......@@ -27,7 +27,7 @@ from theano.sandbox import cuda
if cuda.cuda_available == False:
raise SkipTest('Optional package cuda disabled')
from theano.sandbox.cuda.dnn import GpuDnnConv, GpuDnnConvBase, dnn_conv
from theano.sandbox.cuda.dnn import GpuDnnConv, DnnBase, dnn_conv
# Needed as the gpu conv ops don't have a perform implementation.
if theano.config.mode == 'FAST_COMPILE':
......@@ -596,7 +596,7 @@ def test_gemm_valid():
def test_dnn_valid():
if not cuda.dnn.dnn_available():
raise SkipTest(cuda.dnn.dnn_available.msg)
for t in _test_valid(GpuDnnConvBase, mode=theano_mode.including("cudnn")):
for t in _test_valid(DnnBase, mode=theano_mode.including("cudnn")):
yield t
......@@ -710,7 +710,7 @@ def test_gemm_full():
def test_dnn_full():
if not cuda.dnn.dnn_available():
raise SkipTest(cuda.dnn.dnn_available.msg)
for t in _test_full(GpuDnnConvBase, mode=theano_mode.including("cudnn")):
for t in _test_full(DnnBase, mode=theano_mode.including("cudnn")):
yield t
......@@ -762,13 +762,13 @@ def test_gemm_subsample():
def test_dnn_subsample():
if not cuda.dnn.dnn_available():
raise SkipTest(cuda.dnn.dnn_available.msg)
for t in _test_subsample(GpuDnnConvBase, theano_mode.including('cudnn')):
for t in _test_subsample(DnnBase, theano_mode.including('cudnn')):
yield t
class TestConv2DGPU(unittest.TestCase):
conv_ops = (cuda.blas.GpuConv,
cuda.dnn.GpuDnnConvBase,
cuda.dnn.DnnBase,
cuda.blas.BaseGpuCorrMM)
def test_logical_shapes(self):
......
......@@ -192,3 +192,9 @@ def test_dnn_tag():
assert cuda.dnn.dnn_available()
assert any([isinstance(n.op, cuda.dnn.GpuDnnPool)
for n in f.maker.fgraph.toposort()])
def test_version():
if not cuda.dnn.dnn_available():
raise SkipTest(cuda.dnn.dnn_available.msg)
assert isinstance(cuda.dnn.version(), (int, tuple))
......@@ -82,6 +82,7 @@ get_scalar_type.cache = {}
def as_scalar(x, name=None):
from ..tensor import TensorType, scalar_from_tensor
if isinstance(x, gof.Apply):
if len(x.outputs) != 1:
raise ValueError("It is ambiguous which output of a multi-output"
......@@ -89,9 +90,12 @@ def as_scalar(x, name=None):
else:
x = x.outputs[0]
if isinstance(x, Variable):
if not isinstance(x.type, Scalar):
if isinstance(x.type, Scalar):
return x
elif isinstance(x.type, TensorType) and x.ndim == 0:
return scalar_from_tensor(x)
else:
raise TypeError("Variable type field must be a Scalar.", x, x.type)
return x
try:
return constant(x)
except TypeError:
......
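The `as_scalar` change in the hunk above adds a branch so that, besides true scalar variables, 0-d tensor variables are also accepted and converted via `scalar_from_tensor`. A minimal stand-in showing the same dispatch (the `ScalarType`, `TensorType`, and `Variable` classes here are dummies for illustration, not Theano's real classes):

```python
class ScalarType:
    pass

class TensorType:
    def __init__(self, ndim):
        self.ndim = ndim

class Variable:
    def __init__(self, type_):
        self.type = type_

def scalar_from_tensor(x):
    # Stand-in for Theano's conversion op: wrap a 0-d tensor as a scalar.
    return Variable(ScalarType())

def as_scalar(x):
    if isinstance(x, Variable):
        if isinstance(x.type, ScalarType):
            # Already a scalar variable: pass it through unchanged.
            return x
        elif isinstance(x.type, TensorType) and x.type.ndim == 0:
            # New branch from the diff: 0-d tensors become scalars.
            return scalar_from_tensor(x)
        else:
            raise TypeError("Variable type field must be a Scalar.", x, x.type)
    return x
```

The point of the rewrite in the diff is that the old code rejected 0-d tensors outright, while the new code converts them, so callers no longer need to do that conversion themselves.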