Merge pull request #2179 from carriepl/gof_COp

Adding new superclass for C ops

b6407cec · Frédéric Bastien · da527a0d · 1d603462 · b6407cec · b6407cec
--- a/doc/extending/other_ops.txt
+++ b/doc/extending/other_ops.txt
@@ -242,3 +242,41 @@ Numba Ops
 Want C speed without writing C code for your new Op? You can use Numba
 to generate the C code for you! Here is an `example
 Op <https://gist.github.com/nouiz/5492778#file-theano_op-py>`_ doing that.
+.. _alternate_theano_types:
+Alternate Theano Types
+======================
+Most ops in Theano are used to manipulate tensors. However, Theano also
+supports many other variable types. The supported types are listed below,
+along with pointers to the relevant documentation.
+*       :class:`TensorType <tensor.TensorType>` : Theano type that represents
+        a multidimensional array containing elements that all have the same
+        type. Variables of this Theano type are represented in C as objects of
+        class
+        `PyArrayObject <http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html#PyArrayObject>`_.
+*       :ref:`TypedList <libdoc_typed_list>` : Theano type that represents a
+        typed list (a list where every element in the list has the same Theano
+        type). Variables of this Theano type are represented in C as objects
+        of class `PyListObject <https://docs.python.org/2/c-api/list.html>`_.
+*       :ref:`Scalar <libdoc_scalar>` : Theano type that represents a C
+        primitive type. The C type associated with this Theano type is the
+        represented C primitive itself.
+*       :ref:`SparseType <sparse_ops>` : Theano type used to represent sparse
+        tensors. There is no equivalent C type for this Theano Type but you
+        can split a sparse variable into its parts as TensorVariables. Those
+        can then be used as inputs to an op with C code.
+*       :class:`Generic <theano.gof.type.Generic>` : Theano type that
+        represents a simple Python Object. Variables of this Theano type are
+        represented in C as objects of class `PyObject
+        <https://docs.python.org/2/c-api/structures.html#c.PyObject>`_.
+*       :class:`CDataType <theano.gof.type.CDataType>` :  Theano type that
+        represents a C data type. The C type associated with this Theano type
+        depends on the data being represented.
--- a/doc/tutorial/extending_theano_c.txt
+++ b/doc/tutorial/extending_theano_c.txt
@@ -383,6 +383,15 @@ commonly used.
    while ``c_support_code()`` is for support code that is not specific to
    each apply.
+    Both ``c_support_code()`` and ``c_support_code_apply()`` are necessary
+    because a Theano op can be used more than once in a given Theano
+    function. For example, an op that adds two matrices could be used at some
+    point in the Theano function to add matrices of integers and, at another
+    point, to add matrices of doubles. Because the dtype of the inputs and
+    outputs can change between different applies of the op, any support code
+    that relies on a certain dtype is specific to a given apply of the op and
+    should therefore be defined in ``c_support_code_apply()``.
 .. method:: c_code_cache_version()
    Returns a tuple of integers representing the version of the C code in this
@@ -664,3 +673,290 @@ C code.
            """
            return c_code % locals()
+Alternate way of defining C Ops
+===============================
+The two previous examples have covered the standard way of implementing C Ops
+in Theano by inheriting from the class :class:`Op`. This process is mostly
+simple, but it still involves defining many methods and mixing Python and C
+code in the same file, which tends to make the result less
+readable.
+To help with this, Theano defines a class, ``COp``, from which new C ops
+can inherit. The class ``COp`` aims to simplify the process of implementing
+C ops by doing the following :
+*       It allows you to define the C implementation of your op in a distinct
+        C code file. This makes it easier to keep your Python and C code
+        readable and well indented.
+*       It automatically handles the methods :meth:`Op.c_code()`,
+        :meth:`Op.c_support_code()`, :meth:`Op.c_support_code_apply()` and
+        :meth:`Op.c_code_cache_version()` based on the provided external C
+        implementation.
+To illustrate how much simpler the class ``COp`` makes the process of defining
+a new op with a C implementation, let's revisit the second example of this
+tutorial, the ``VectorTimesVector`` op. In that example, we implemented an op
+to perform the task of element-wise vector-vector multiplication. The two
+following blocks of code illustrate what the op would look like if it were
+implemented using the ``COp`` class.
+The new op is defined inside a Python file with the following code :
+.. code-block:: python
+    import theano
+    from theano import gof
+    class VectorTimesVector(gof.COp):
+        __props__ = ()
+        func_file = "./vectorTimesVector.c"
+        func_name = "APPLY_SPECIFIC(vector_times_vector)"
+        def __init__(self):
+            super(VectorTimesVector, self).__init__(self.func_file,
+                                                    self.func_name)
+        def make_node(self, x, y):
+            # Validate the inputs' type
+            if x.type.ndim != 1:
+                raise TypeError('x must be a 1-d vector')
+            if y.type.ndim != 1:
+                raise TypeError('y must be a 1-d vector')
+            # Create an output vector with the upcast dtype of x and y
+            output_var = theano.tensor.TensorType(
+                            dtype=theano.scalar.upcast(x.dtype, y.dtype),
+                            broadcastable=[False])()
+            return gof.Apply(self, [x, y], [output_var])
+And the following is the C implementation of the op, defined in an external
+C file named vectorTimesVector.c :
+.. code-block:: c
+    THEANO_SUPPORT_CODE_SECTION
+    // Support code function
+    bool vector_same_shape(PyArrayObject* arr1, PyArrayObject* arr2)
+    {
+        return (PyArray_DIMS(arr1)[0] == PyArray_DIMS(arr2)[0]);
+    }
+    THEANO_APPLY_CODE_SECTION
+    // Apply-specific support function
+    void APPLY_SPECIFIC(vector_elemwise_mult)(
+        DTYPE_INPUT_0* x_ptr, int x_str,
+        DTYPE_INPUT_1* y_ptr, int y_str,
+        DTYPE_OUTPUT_0* z_ptr, int z_str, int nbElements)
+    {
+        for (int i=0; i < nbElements; i++){
+            z_ptr[i * z_str] = x_ptr[i * x_str] * y_ptr[i * y_str];
+        }
+    }
+    // Apply-specific main function
+    int APPLY_SPECIFIC(vector_times_vector)(PyArrayObject* input0,
+                                            PyArrayObject* input1,
+                                            PyArrayObject** output0)
+    {
+        // Validate that the inputs have the same shape
+        if ( !vector_same_shape(input0, input1))
+        {
+            PyErr_Format(PyExc_ValueError, "Shape mismatch : "
+                        "input0.shape[0] and input1.shape[0] should "
+                        "match but input0.shape[0] == %ld and "
+                        "input1.shape[0] == %ld", (long)PyArray_DIMS(input0)[0],
+                        (long)PyArray_DIMS(input1)[0]);
+            return 1;
+        }
+        // Validate that the output storage exists and has the same
+        // shape as input0.
+        if (NULL == *output0 || !(vector_same_shape(input0, *output0)))
+        {
+            /* Reference received to invalid output variable.
+            Decrease received reference's ref count and allocate new
+            output variable */
+            Py_XDECREF(*output0);
+            *output0 = (PyArrayObject*)PyArray_EMPTY(1,
+                                                    PyArray_DIMS(input0),
+                                                    TYPENUM_OUTPUT_0,
+                                                    0);
+            if (!*output0) {
+                PyErr_Format(PyExc_ValueError,
+                            "Could not allocate output storage");
+                return 1;
+            }
+        }
+        // Perform the actual vector-vector multiplication
+        APPLY_SPECIFIC(vector_elemwise_mult)(
+                                (DTYPE_INPUT_0*)PyArray_DATA(input0),
+                                PyArray_STRIDES(input0)[0] / ITEMSIZE_INPUT_0,
+                                (DTYPE_INPUT_1*)PyArray_DATA(input1),
+                                PyArray_STRIDES(input1)[0] / ITEMSIZE_INPUT_1,
+                                (DTYPE_OUTPUT_0*)PyArray_DATA(*output0),
+                                PyArray_STRIDES(*output0)[0] / ITEMSIZE_OUTPUT_0,
+                                PyArray_DIMS(input0)[0]);
+        return 0;
+    }
+As you can see from this example, the Python and C implementations are nicely
+decoupled, which makes them much more readable than when they were intertwined
+in the same file and the C code contained string formatting markers.
+Now that we have motivated the COp class, we can have a more precise look at
+what it does for us. For this, we go through the various elements that make up
+this new version of the VectorTimesVector op :
+*       Parent class : instead of inheriting from the class :class:`Op`,
+        VectorTimesVector inherits from the class ``COp``.
+*       Constructor : in our new op, the ``__init__()`` method has an important
+        use: it informs the constructor of the ``COp`` class of the location,
+        on the filesystem, of the C implementation of this op. To do this, it
+        gives the path of the file containing the C code as well as the name of
+        the function, in that file, that should be called to perform the
+        computation.
+*       ``make_node()`` : the ``make_node()`` method is absolutely identical to
+        the one in our old example. Using the ``COp`` class doesn't change
+        anything here.
+*       External C code : the external C code performs the computation
+        associated with the op. It contains, at the very least, a 'main' function
+        having the same name as provided to the constructor of the Python class
+        ``COp``. Writing this C code involves a few subtleties which deserve their
+        own respective sections.
+Main function
+-------------
+The external C implementation must implement a main function whose name
+is passed by the op to the ``__init__()`` method of the ``COp`` class. This
+main C function must respect the following constraints :
+*       It must return an int. The value of that int indicates whether the
+        op could perform its task or not. A value of 0 indicates success while
+        any non-zero value will interrupt the execution of the Theano function.
+        Before returning a non-zero integer, the main function should call the
+        function ``PyErr_Format()`` to set up a Python exception.
+*       It must receive one pointer for each input to the op followed by one
+        pointer to a pointer for each output of the op.
+For example, the main C function of an op that takes two scalars as inputs and
+returns both their sum and the difference between them would have four
+parameters (two for the op's inputs and two for its outputs) and its
+signature would look something like this :
+.. code-block:: c
+    int sumAndDiffOfScalars(PyArrayObject* in0, PyArrayObject* in1,
+                            PyArrayObject** out0, PyArrayObject** out1)
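To see how such a main function ends up being called, the following Python sketch mirrors the argument formatting done by ``format_c_function_args()`` and ``c_code()`` in this pull request: inputs are passed as they are and each output is passed by address. The helper name ``format_call`` and the C variable names are illustrative assumptions, not part of Theano.

```python
def format_call(func_name, inputs, outputs):
    """Build the C statement that calls an op's main function.

    Inputs are passed directly; outputs are passed by address so the
    C function can (re)allocate them.
    """
    args = list(inputs) + ["&%s" % o for o in outputs]
    return "int result = %s(%s);" % (func_name, ", ".join(args))


# Hypothetical C variable names for two inputs and two outputs
call = format_call("sumAndDiffOfScalars", ["in0", "in1"], ["out0", "out1"])
print(call)
# prints: int result = sumAndDiffOfScalars(in0, in1, &out0, &out1);
```

A non-zero ``result`` is what triggers the failure code inserted by ``c_code()``.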
+Macros
+------
+The ``COp`` class defines a number of macros that you can use in your C
+implementation to make it simpler and more generic.
+For every input array 'i' (indexed from 0) of the op, the following macros are
+defined:
+*       ``DTYPE_INPUT_{i}`` : NumPy dtype of the data in the array.
+        This is the variable type corresponding to the NumPy dtype, not the
+        string representation of the NumPy dtype. For instance, if the op's
+        first input is a float32 ndarray, then the macro ``DTYPE_INPUT_0``
+        corresponds to ``npy_float32`` and can directly be used to declare a
+        new variable of the same dtype as the data in the array :
+        .. code-block:: c
+            DTYPE_INPUT_0 myVar = someValue;
+*       ``TYPENUM_INPUT_{i}`` : NumPy type number (typenum) of the data in the
+        array.
+*       ``ITEMSIZE_INPUT_{i}`` : Size, in bytes, of the elements in the array.
+In the same way, the macros ``DTYPE_OUTPUT_{i}``, ``ITEMSIZE_OUTPUT_{i}`` and
+``TYPENUM_OUTPUT_{i}``  are defined for every output 'i' of the op.
+The ``COp`` class also defines the macro ``APPLY_SPECIFIC(str)`` which will
+automatically append the name of the :ref:`Apply` node that applies the Op at
+the end of the provided ``str``. The use of this macro is discussed below.
+You should be aware, however, that these macros are apply-specific. As such,
+any function that uses them is considered to contain apply-specific code.
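The per-variable macros can be pictured as pairs of ``#define`` / ``#undef`` lines wrapped around the apply-specific code. The sketch below is a simplified, dependency-free approximation: it uses ``ctypes`` and a hand-written dtype table (both assumptions made for illustration) where ``COp.get_c_macros()`` queries NumPy, and it leaves out the ``TYPENUM_*`` macros because those values come from NumPy.

```python
import ctypes

# Hypothetical dtype table; the real implementation asks NumPy instead.
CTYPES_BY_DTYPE = {
    "float32": ctypes.c_float,
    "float64": ctypes.c_double,
    "int32": ctypes.c_int32,
    "int64": ctypes.c_int64,
}


def sketch_macros(input_dtypes, output_dtypes):
    """Return (#define lines, #undef lines) for the given dtypes."""
    names = (["INPUT_%i" % i for i in range(len(input_dtypes))] +
             ["OUTPUT_%i" % i for i in range(len(output_dtypes))])
    dtypes = list(input_dtypes) + list(output_dtypes)
    defines, undefs = [], []
    for name, dtype in zip(names, dtypes):
        itemsize = ctypes.sizeof(CTYPES_BY_DTYPE[dtype])
        defines.append("#define DTYPE_%s npy_%s" % (name, dtype))
        defines.append("#define ITEMSIZE_%s %i" % (name, itemsize))
        undefs.append("#undef DTYPE_%s" % name)
        undefs.append("#undef ITEMSIZE_%s" % name)
    return defines, undefs


defines, undefs = sketch_macros(["float32", "float64"], ["float64"])
print("\n".join(defines + undefs))
```

Because the definitions depend on the dtypes of one particular apply, they are undefined again after the apply-specific code so that the next apply can redefine them.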
+Support code
+------------
+The file whose name is provided to the ``COp`` class is not constrained to
+contain only one function. It can in fact contain many functions, with every
+function but the main one acting as support code.
+When we defined the VectorTimesVector op without using the ``COp`` class, we
+had to make a distinction between two types of support code : the support
+code that was apply-specific and the support code that wasn't.
+The apply-specific code was defined in the ``c_support_code_apply()`` method
+and the elements defined in that code (global variables and functions) had to
+include the name of the Apply node in their own names to avoid conflicts
+between the different versions of the apply-specific code. The code that
+wasn't apply-specific was simply defined in the ``c_support_code()`` method.
+When using the ``COp`` class, we still have to make the distinction between
+apply-specific and apply-agnostic support code but we express it differently
+in the code since it is all defined in the same external C file.
+These two types of support code should each be defined in their own section of
+the file, like in the example above. These sections should be delimited by the
+markers ``THEANO_SUPPORT_CODE_SECTION`` (to be put on its own line, at the
+beginning of the apply-agnostic support code section) and
+``THEANO_APPLY_CODE_SECTION`` (to be put on its own line at the beginning of
+the apply-specific code section). Moreover, just like in the previous examples
+of this tutorial, apply-specific functions and global variables need to
+include the name of the :ref:`Apply` node in their names. To achieve this,
+the macro ``APPLY_SPECIFIC(str)`` should be used when defining those elements
+as well as when referring to them. In the above example, this macro is used
+when defining the functions ``vector_elemwise_mult()`` and
+``vector_times_vector()`` as well as when calling function
+``vector_elemwise_mult()`` from inside ``vector_times_vector()``.
+.. note::
+    The macro ``APPLY_SPECIFIC(str)`` should only ever be used for
+    apply-specific code. It should not be used for apply-agnostic code.
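The effect of ``APPLY_SPECIFIC(str)`` can be imitated in a few lines of Python. This is only a model of the token pasting performed by the C preprocessor (``str##_<name>``); the node names used here are made up for illustration, since Theano chooses the real ones.

```python
def apply_specific(identifier, node_name):
    """Mimic the C macro ``#define APPLY_SPECIFIC(str) str##_<name>``."""
    return "%s_%s" % (identifier, node_name)


# Hypothetical Apply node names for two uses of the same op
for node_name in ("node_0", "node_1"):
    print(apply_specific("vector_elemwise_mult", node_name))
```

Each apply gets its own copy of the function under a distinct name, which is why two applies with different dtypes do not clash when compiled together.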
+The rules for knowing if a piece of code should be put in the apply-agnostic
+or the apply-specific support code section of the file are simple. If it uses
+any of the macros defined by the class ``COp`` then it is apply-specific and
+goes in the corresponding section. If it calls any apply-specific code then
+it is apply-specific. Otherwise, it is apply-agnostic and goes in the
+apply-agnostic support code section.
+In the above example, the function ``vector_same_shape()`` is apply-agnostic
+because it uses none of the macros defined by the class ``COp`` and it doesn't
+rely on any apply-specific code. The function ``vector_elemwise_mult()`` is
+apply-specific because it uses the macros defined by ``COp``. Finally, the
+function ``vector_times_vector()`` is apply-specific because it uses those
+same macros and also because it calls ``vector_elemwise_mult()`` which is an
+apply-specific function.
+Final Note
+==========
+This tutorial focuses on providing C implementations to ops that manipulate
+Theano tensors. For more information about other Theano types, you can refer
+to the section :ref:`Alternate Theano Types <alternate_theano_types>`.
--- a/theano/gof/__init__.py
+++ b/theano/gof/__init__.py
@@ -55,7 +55,7 @@ from theano.gof.link import \
    Container, Linker, LocalLinker, PerformLinker, WrapLinker, WrapLinkerMany
 from theano.gof.op import \
-    Op, OpenMPOp, PureOp, ops_with_inner_function
+    Op, OpenMPOp, PureOp, COp, ops_with_inner_function
 from theano.gof.opt import (
    Optimizer,

--- a/theano/gof/op.py
+++ b/theano/gof/op.py
@@ -13,6 +13,8 @@ __contact__   = "theano-dev <theano-dev@googlegroups.com>"
 __docformat__ = "restructuredtext en"
 import logging
+import numpy
+import os
 import sys
 import warnings
@@ -974,3 +976,177 @@ int main( int argc, const char* argv[] )
        self.update_self_openmp()
        return super(OpenMPOp, self).make_thunk(node, storage_map,
                                                compute_map, no_recycling)
+class COp(Op):
+    """ Class to allow an op to have an external C implementation.
+    An op can use this class by inheriting from it and calling its
+    __init__() method, providing it with a path to an external file containing
+    the C implementation and the name of the function, in that file, to call
+    to perform the computations for the op.
+    """
+    def __init__(self, func_file, func_name):
+        self.func_file = func_file
+        self.func_name = func_name
+        # Define the markers that can be used to delimit sections in the
+        # external C code
+        self.support_code_marker = "THEANO_SUPPORT_CODE_SECTION"
+        self.apply_code_marker = "THEANO_APPLY_CODE_SECTION"
+        self.c_code_markers = [self.support_code_marker,
+                               self.apply_code_marker]
+        # Load the external C code
+        with open(self.func_file, "r") as f:
+            self.func_code = f.read()
+        # Separate the contents of the file in sections and validate that at
+        # least one of the necessary code sections has been defined
+        self.code_sections = self.parse_external_c_code(self.func_code)
+        if not any(marker in self.code_sections
+                   for marker in self.c_code_markers):
+            raise RuntimeError("The provided C implementation does not "
+                               "define a support code section or a support "
+                               "code apply section.")
+    def parse_external_c_code(self, code):
+        # Obtain the positions of the C code markers used in the C code
+        positions = [(code.index(marker), marker)
+                     for marker in self.c_code_markers if marker in code]
+        # Go over the markers in their order of occurrence and extract
+        # the C code they concern
+        positions.sort()
+        code_sections = {}
+        for i in range(len(positions)):
+            marker_start, marker = positions[i]
+            if i < len(positions) - 1:
+                # This is not the last section in the code : extract the code
+                # between the beginning of the current marker and the
+                # beginning of the next one.
+                next_marker_start = positions[i+1][0]
+                section = code[marker_start: next_marker_start]
+            else:
+                # This is the last section in the code : extract the remaining
+                # C code
+                section = code[marker_start:]
+            cleaned_section = section.replace(marker, "")
+            code_sections[marker] = cleaned_section
+        return code_sections
+    def c_code_cache_version(self):
+        return (hash(self.func_code),)
+    def c_support_code(self):
+        if self.support_code_marker in self.code_sections:
+            return self.code_sections[self.support_code_marker]
+        else:
+            raise utils.MethodNotDefined("c_support_code",
+                type(self), self.__class__.__name__)
+    def c_support_code_apply(self, node, name):
+        if self.apply_code_marker in self.code_sections:
+            apply_code = self.code_sections[self.apply_code_marker]
+            if hasattr(self, 'check_inputs') and self.check_inputs == False:
+                return apply_code
+            else:
+                define_macros, undef_macros = self.get_c_macros(node, name)
+                return os.linesep.join([define_macros, apply_code,
+                                        undef_macros])
+        else:
+            raise utils.MethodNotDefined("c_support_code_apply",
+                type(self), self.__class__.__name__)
+    def format_c_function_args(self, inp, out):
+        # Generate a string containing the arguments sent to the external C
+        # function. The argstring will be of format :
+        # "input0, input1, input2, &output0, &output1"
+        return ", ".join(list(inp) + ["&%s" % o for o in out])
+    def get_c_macros(self, node, name):
+        define_template = "#define %s %s" + os.linesep
+        undef_template = "#undef %s" + os.linesep
+        define_macros = ""
+        undef_macros = ""
+        # Extract the various properties of the input and output variables
+        variables = node.inputs + node.outputs
+        variable_names = (["INPUT_%i" % i for i in range(len(node.inputs))] +
+                          ["OUTPUT_%i" % i for i in range(len(node.outputs))])
+        variable_dtypes_names = [v.dtype for v in variables]
+        variable_dtypes = [numpy.dtype(d) for d in variable_dtypes_names]
+        variable_typenums = [d.num for d in variable_dtypes]
+        variable_itemsizes = [d.itemsize for d in variable_dtypes]
+        # Generate dtype macros
+        for i in range(len(variables)):
+            macro_name = "DTYPE_" + variable_names[i]
+            macro_value = "npy_" + variable_dtypes_names[i]
+            define_macros += define_template % (macro_name, macro_value)
+            undef_macros += undef_template % macro_name
+        # Generate typenum macros
+        for i in range(len(variables)):
+            macro_name = "TYPENUM_" + variable_names[i]
+            macro_value = variable_typenums[i]
+            define_macros += define_template % (macro_name, macro_value)
+            undef_macros += undef_template % macro_name
+        # Generate itemsize macros
+        for i in range(len(variables)):
+            macro_name = "ITEMSIZE_" + variable_names[i]
+            macro_value = variable_itemsizes[i]
+            define_macros += define_template % (macro_name, macro_value)
+            undef_macros += undef_template % macro_name
+        # Generate a macro to mark code as being apply-specific
+        define_macros += define_template % ("APPLY_SPECIFIC(str)",
+                                            "str##_%s" % name)
+        undef_macros += undef_template % "APPLY_SPECIFIC"
+        return define_macros, undef_macros
+    def c_code(self, node, name, inp, out, sub):
+        func_name = self.func_name
+        func_args = self.format_c_function_args(inp, out)
+        fail = sub['fail']
+        # Generate the code to define/undefine the C macros
+        define_macros, undef_macros = self.get_c_macros(node, name)
+        # Generate the C code
+        c_code = """
+        %(define_macros)s
+        {
+            int result = %(func_name)s(%(func_args)s);
+            if (result != 0)
+            {
+                %(fail)s;
+            }
+        }
+        %(undef_macros)s
+        """ % locals()
+        return c_code
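The section splitting performed by ``parse_external_c_code()`` can be tried outside of Theano with a standalone function. The sketch below follows the same logic as the method in this diff; the function name ``split_sections`` and the sample C text are illustrative assumptions.

```python
SUPPORT_MARKER = "THEANO_SUPPORT_CODE_SECTION"
APPLY_MARKER = "THEANO_APPLY_CODE_SECTION"


def split_sections(code, markers=(SUPPORT_MARKER, APPLY_MARKER)):
    """Map each marker found in `code` to the text that follows it."""
    # Sort the markers by the position of their first occurrence
    positions = sorted((code.index(m), m) for m in markers if m in code)
    sections = {}
    for i, (start, marker) in enumerate(positions):
        # A section runs up to the next marker, or to the end of the file
        end = positions[i + 1][0] if i + 1 < len(positions) else len(code)
        sections[marker] = code[start:end].replace(marker, "")
    return sections


sample = (
    "THEANO_SUPPORT_CODE_SECTION\n"
    "int helper(void) { return 0; }\n"
    "THEANO_APPLY_CODE_SECTION\n"
    "int APPLY_SPECIFIC(run)(void) { return helper(); }\n"
)
sections = split_sections(sample)
print(sorted(sections))
```

Note that, with this logic, any text placed before the first marker is not assigned to a section, and a file without markers yields an empty dictionary.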