Added more complex example on a C op

98dea78f · Pierre Luc Carrier · b1fc6111 · 98dea78f
--- a/doc/tutorial/extending_theano_c.txt
+++ b/doc/tutorial/extending_theano_c.txt
@@ -310,9 +310,10 @@ class Op that are related to the C implementation. Of particular interest are:
        :meth:`Op.c_no_compile_args` to specify requirements regarding how
        the op's C code should be compiled.

-This section describes the methods :meth:`Op.c_code`, :meth:`Op.c_support_code` and
-:meth:`Op.c_code_cache_version` because they are the ones that are most commonly
-used.
+This section describes the methods :meth:`Op.c_code`,
+:meth:`Op.c_support_code`, :meth:`Op.c_support_code_apply` and
+:meth:`Op.c_code_cache_version` because they are the ones that are most
+commonly used.

 .. method:: c_code(node, name, input_names, output_names, sub)

@@ -333,8 +334,17 @@ used.

    Finally, ``sub`` is a dictionary of extra parameters to the c_code
    method. Among other things, it contains ``sub['fail']`` which is a string
-    of C code that you should execute (after ensuring that a Python exception
-    is set) if your C code needs to raise an exception.
+    of C code that you should include in your C code (after ensuring that a
+    Python exception is set) if it needs to raise an exception. For example:
+
+    .. code-block:: python
+
+        c_code = """
+            PyErr_Format(PyExc_ValueError, "X does not have the right value");
+            %(fail)s;
+        """ % {'fail' : sub['fail']}
+
+    This raises a Python ``ValueError`` with the specified message.

    :note:
        Your C code should not return the output of the computation but
@@ -343,9 +353,19 @@ used.

 .. method:: c_support_code()

-    Returns a string containing the support C code for this op. This code
+    Returns a string containing some support C code for this op. This code
+    will be included at the global scope level and can be used to define
+    functions and structs that will be used by every apply of this op.
+
+.. method:: c_support_code_apply(node, name)
+
+    Returns a string containing some support C code for this op. This code
    will be included at the global scope level and can be used to define
-    functions and structs that will be used by the op's main C code.
+    functions and structs that will be used by this op. Unlike
+    ``c_support_code()``, whose code is shared by every apply of the op, the
+    code returned by ``c_support_code_apply()`` is generated once per apply
+    node and may therefore depend on that node, for instance on the dtypes of
+    its inputs.

 .. method:: c_code_cache_version()

@@ -367,11 +387,13 @@ used.


 Simple C Op example
-=====================
+===================

-In this section, we put together every concept that was covered in this
+In this section, we put together the concepts that were covered in this
 tutorial to generate an op which multiplies every element in a vector
-by a scalar and returns the resulting vector.
+by a scalar and returns the resulting vector. The example is intentionally
+simple, so it does not use the methods ``c_support_code()`` and
+``c_support_code_apply()``, which it does not require.

 In the C code below notice how the reference count on the output variable is
 managed. Also take note of how the new variables required for the op's
@@ -393,10 +415,6 @@ need to validate that the output storage has been allocated and has the same
 shape as our vector input. If it is not the case, we allocate a new output
 storage with the right shape and number of dimensions.

-:note:
-    Given the simple nature of this op, there was no need to use the
-    ``c_support_code()`` function.
-
 .. code-block:: python

    import numpy
@@ -429,6 +447,9 @@ storage with the right shape and number of dimensions.
            x, y = inp
            z, = out

+            # Extract the dtypes of the input and output storage so that
+            # pointers of the matching types can be declared in the C
+            # code.
            dtype_x = node.inputs[0].dtype
            dtype_y = node.inputs[1].dtype
            dtype_z = node.outputs[0].dtype
@@ -481,3 +502,156 @@ storage with the right shape and number of dimensions.
            """

            return c_code % locals()
+
+
+More complex C Op example
+=========================
+
+This section introduces a new example, slightly more complex than the previous
+one: an op that performs an element-wise multiplication of two vectors. This
+new example differs from the previous one in its use of the methods
+``c_support_code()`` and ``c_support_code_apply()`` (it does not *need* them
+but uses them to illustrate how they work) and in its capacity to support
+inputs of different dtypes.
+
+Recall that the method ``c_support_code()`` is meant to produce code that will
+be used for every apply of the op. This means that the C code in this
+method must be valid in every setting your op supports. If the op is meant
+to support inputs of various dtypes, the C code in this method should be
+generic enough to work with every supported dtype. If the op operates on
+inputs that can be vectors or matrices, the C code in this method should
+be able to accommodate both kinds of inputs.
+
+In our example, the method ``c_support_code()`` is used to declare a C
+function to validate that two vectors have the same shape. Because our
+op only supports vectors as inputs, this function is allowed to rely
+on its inputs being vectors. However, our op should support multiple
+dtypes so this function cannot rely on a specific dtype in its inputs.
+
+The method ``c_support_code_apply()``, on the other hand, is allowed
+to depend on the inputs to the op because it is apply-specific. Therefore, we
+use it to define a function to perform the multiplication between two vectors.
+Variables or functions defined in the method ``c_support_code_apply()`` will
+be included at the global scope for every apply of the op. Because of this,
+the names of those variables and functions should include the ``name``
+passed to the method, as in the example. Otherwise, using the op twice in the
+same graph will cause conflicts because some elements will be declared twice.
+
+The last interesting difference occurs in the ``c_code()`` method. Because the
+dtype of the output is variable and not guaranteed to be the same as any of
+the inputs (because of the upcast in the method ``make_node()``), the typenum
+of the output has to be obtained in the Python code and then included in the
+C code.
+
+.. code-block:: python
+
+    import numpy
+    import theano
+    from theano import gof
+
+    class VectorTimesVector(gof.Op):
+        __props__ = ()
+
+        def __init__(self, **kwargs):
+            gof.Op.__init__(self, **kwargs)
+
+        def make_node(self, x, y):
+            # Validate the inputs' type
+            if x.type.ndim != 1:
+                raise TypeError('x must be a 1-d vector')
+            if y.type.ndim != 1:
+                raise TypeError('y must be a 1-d vector')
+
+            # Create an output variable whose dtype is the upcast of the
+            # dtypes of x and y
+            output_var = theano.tensor.TensorType(
+                            dtype=theano.scalar.upcast(x.dtype, y.dtype),
+                            broadcastable=[False])()
+
+            return gof.Apply(self, [x, y], [output_var])
+
+        def c_code_cache_version(self):
+            return (1, 0, 1)
+
+        def c_support_code(self):
+            c_support_code = """
+            bool vector_same_shape(PyArrayObject* arr1,
+                PyArrayObject* arr2)
+            {
+                return (PyArray_DIMS(arr1)[0] == PyArray_DIMS(arr2)[0]);
+            }
+            """
+
+            return c_support_code
+
+        def c_support_code_apply(self, node, name):
+            dtype_x = node.inputs[0].dtype
+            dtype_y = node.inputs[1].dtype
+            dtype_z = node.outputs[0].dtype
+
+            c_support_code = """
+            void vector_elemwise_mult_%(name)s(npy_%(dtype_x)s* x_ptr,
+                int x_str, npy_%(dtype_y)s* y_ptr, int y_str,
+                npy_%(dtype_z)s* z_ptr, int z_str, int nbElements)
+            {
+                for (int i=0; i < nbElements; i++){
+                    z_ptr[i * z_str] = x_ptr[i * x_str] * y_ptr[i * y_str];
+                }
+            }
+            """
+
+            return c_support_code % locals()
+
+        def c_code(self, node, name, inp, out, sub):
+            x, y = inp
+            z, = out
+
+            dtype_x = node.inputs[0].dtype
+            dtype_y = node.inputs[1].dtype
+            dtype_z = node.outputs[0].dtype
+
+            itemsize_x = numpy.dtype(dtype_x).itemsize
+            itemsize_y = numpy.dtype(dtype_y).itemsize
+            itemsize_z = numpy.dtype(dtype_z).itemsize
+
+            typenum_z = numpy.dtype(dtype_z).num
+
+            fail = sub['fail']
+
+            c_code = """
+            // Validate that the inputs have the same shape
+            if ( !vector_same_shape(%(x)s, %(y)s))
+            {
+                PyErr_Format(PyExc_ValueError, "x.shape[0] != y.shape[0]");
+                %(fail)s;
+            }
+
+            // Validate that the output storage exists and has the same
+            // dimension as x.
+            if (NULL == %(z)s || !(vector_same_shape(%(x)s, %(z)s)))
+            {
+                /* Reference received to invalid output variable.
+                Decrease received reference's ref count and allocate new
+                output variable */
+                Py_XDECREF(%(z)s);
+                %(z)s = (PyArrayObject*)PyArray_EMPTY(1,
+                                                    PyArray_DIMS(%(x)s),
+                                                    %(typenum_z)s,
+                                                    0);
+
+                if (!%(z)s) {
+                    %(fail)s;
+                }
+            }
+
+            // Perform the vector elemwise multiplication
+            vector_elemwise_mult_%(name)s(
+                                    (npy_%(dtype_x)s*)PyArray_DATA(%(x)s),
+                                    PyArray_STRIDES(%(x)s)[0] / %(itemsize_x)s,
+                                    (npy_%(dtype_y)s*)PyArray_DATA(%(y)s),
+                                    PyArray_STRIDES(%(y)s)[0] / %(itemsize_y)s,
+                                    (npy_%(dtype_z)s*)PyArray_DATA(%(z)s),
+                                    PyArray_STRIDES(%(z)s)[0] / %(itemsize_z)s,
+                                    PyArray_DIMS(%(x)s)[0]);
+            """
+
+            return c_code % locals()
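
As a rough illustration of why the support function's name embeds ``%(name)s``, the sketch below renders the same C template for two apply nodes using plain Python string formatting. It does not import Theano: the helper ``render_support_code`` and the node names ``node_0`` and ``node_1`` are hypothetical stand-ins for the values Theano would supply.

```python
# Hypothetical stand-in for the template used in c_support_code_apply().
# Only plain string formatting is exercised here; Theano is not needed.
SUPPORT_TEMPLATE = """
void vector_elemwise_mult_%(name)s(npy_%(dtype_x)s* x_ptr, int x_str,
    npy_%(dtype_y)s* y_ptr, int y_str,
    npy_%(dtype_z)s* z_ptr, int z_str, int nbElements)
{
    for (int i = 0; i < nbElements; i++) {
        z_ptr[i * z_str] = x_ptr[i * x_str] * y_ptr[i * y_str];
    }
}
"""


def render_support_code(name, dtype_x, dtype_y, dtype_z):
    """Fill the C template for one apply node.

    `name` plays the role of the `name` argument passed to
    c_support_code_apply(); the values used below are made up.
    """
    return SUPPORT_TEMPLATE % {
        "name": name,
        "dtype_x": dtype_x,
        "dtype_y": dtype_y,
        "dtype_z": dtype_z,
    }


# Two uses of the op in one graph, with different input dtypes.
first = render_support_code("node_0", "float32", "float32", "float32")
second = render_support_code("node_1", "int8", "float64", "float64")

# Each rendering defines a differently named C function, so both
# definitions can be placed in the same generated source.
module_source = first + second
```

Without ``%(name)s`` in the template, the two renderings would define functions with the same name, which is the kind of conflict the tutorial text warns about.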