done with half of the part of advanced tutorial's first example that deals with C code

04e168d0 · Olivier Breuleux · 5786d6bf
--- a/doc/advanced/compilation.txt
+++ b/doc/advanced/compilation.txt
@@ -4,3 +4,15 @@
 =======================
 Compilation and Linking
 =======================
+.. index::
+   single: Linker
+.. _linker:
+Linker
+======
+WRITEME
--- a/doc/tutorials/advanced/ex1/ctype.txt
+++ b/doc/tutorials/advanced/ex1/ctype.txt
@@ -3,6 +3,441 @@
 Implementing double in C
 ========================
+The previous two sections described how to define a double :ref:`type`
+and arithmetic operations on that Type, but all of them were
+implemented in pure Python. In this section we will see how to define
+the double type in such a way that it can be used by operations
+implemented in C (which we will define in the section after that).
+How does it work?
+=================
+In order to be C-compatible, a Type must provide a C interface to the
+Python data that satisfy the constraints it puts forward. In other
+words, it must define two pieces of C code. The first converts a
+Python reference into some type suitable for manipulation in C. The
+second converts the C structure in which the C implementation of an
+operation stores its results into a reference to an object that can be
+used from Python and is a valid value for the Type.
+For example, in the current example, we have a Type which represents a
+Python float. First, we will choose a corresponding C type. The
+natural choice would be the primitive ``double`` type. Then, we need
+to write code that will take a ``PyObject*``, check that it is a
+Python ``float`` and extract its value as a ``double``. Finally, we
+need to write code that will take a C ``double`` and will build a
+``PyObject*`` of Python type ``float`` that we can work with from
+Python. We will be using CPython and thus special care must be given
+to making sure reference counts are updated properly!
+The C code we will write makes use of CPython's C API which you can
+find here_.
+.. _here: http://docs.python.org/c-api/index.html
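+As a rough illustration of that round trip, the sketch below calls the
+two CPython C API conversion functions from Python through
+``ctypes.pythonapi``. It assumes the CPython interpreter.
+``round_trip`` is a hypothetical helper name, and ``isinstance``
+stands in for ``PyFloat_Check``, which is a C macro and cannot be
+called this way.

```python
import ctypes

# Describe the C signatures of the two CPython C API functions we call.
as_double = ctypes.pythonapi.PyFloat_AsDouble
as_double.argtypes = [ctypes.py_object]
as_double.restype = ctypes.c_double

from_double = ctypes.pythonapi.PyFloat_FromDouble
from_double.argtypes = [ctypes.c_double]
from_double.restype = ctypes.py_object  # ctypes takes ownership of the new reference


def round_trip(obj):
    """Hypothetical helper: PyObject* -> C double -> PyObject*."""
    if not isinstance(obj, float):       # stand-in for PyFloat_Check
        raise TypeError("expected a float")
    c_value = as_double(obj)             # extract the C double
    return from_double(c_value)          # build a new Python float


result = round_trip(2.5)
```

+In the generated C code the reference counting is done by hand; here
+``ctypes`` does it on our behalf.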
+What needs to be defined
+========================
+In order to be C-compatible, a Type must define several additional
+methods, which all start with the ``c_`` prefix. The complete list can
+be found in the documentation for :ref:`type`. Here, we'll focus on
+the most important ones:
+- **c_declare(name, sub)**
+  - This must return C code which declares variables. These variables
+    will be available to operations defined in C. You may also write
+    typedefs.
+- **c_init(name, sub)**
+  - This must return C code which initializes the variables declared
+    in c_declare. Either this or c_extract will be called.
+- **c_extract(name, sub)**
+  - This must return C code which takes a reference to a Python object
+    and initializes the variables declared in c_declare to match the
+    Python object's data. Either this or c_init will be called.
+- **c_sync(name, sub)**
+  - When the computations are done, transfer the results from the C
+    structure we put them in to the destination Python object. This
+    will only be called for the outputs.
+- **c_cleanup(name, sub)**
+  - When we are done using the data, clean up whatever we allocated
+    and decrease the appropriate reference counts.
+- **c_compile_args(), c_headers(), c_libraries(), c_support_code()**
+  - Allows you to specify headers, libraries, special g++ arguments or
+    helper functions/structs that the type needs. See :ref:`type`.
+Each of the first five functions takes two arguments, ``name`` and
+``sub``, which must be used to parameterize the C code they return.
+``name`` is a
+string which is chosen by the compiler to represent a :ref:`result` of
+the Type in such a way that there are no name conflicts between
+different pieces of data. Therefore, all variables declared in
+``c_declare`` should have a name which includes ``name``. Furthermore,
+the name of the variable containing a pointer to the Python object
+associated with the Result is ``py_<name>``.
+``sub``, on the other hand, is a dictionary containing bits of C code
+suitable for use in certain situations. For instance, ``sub['fail']``
+contains code that should be inserted wherever an error is identified.
+The example code in the next section should help you understand how
+everything plays out.
+.. warning::
+   If some error condition occurs and you want to fail and/or raise an
+   Exception, you must use the ``fail`` code contained in
+   ``sub['fail']`` (there is an example in the definition of c_extract
+   below). You must *NOT* use the ``return`` statement anywhere, ever,
+   nor ``break`` outside of your own loops or ``goto`` to strange
+   places or anything like that. Failure to comply with this
+   restriction could lead to erratic behavior, segfaults and/or memory
+   leaks because Theano defines its own cleanup system and assumes
+   that you are not meddling with it. Furthermore, advanced operations
+   or types might do code transformations on your code such as
+   inserting it in a loop - in that case they can call your code
+   generating methods with custom failure code that takes into account
+   what they are doing!
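+To see what ``name`` and ``sub`` do in practice, here is a small
+sketch that fills a C template using plain Python string formatting.
+The values ``"V0"`` and ``"goto fail_V0;"`` are made up for the
+example (the real name and failure code are chosen by Theano), and
+``render_extract`` is a hypothetical helper, not part of Theano.

```python
def render_extract(name, sub):
    """Fill a C template by substituting name and the failure code."""
    template = """
    if (!PyFloat_Check(py_%(name)s)) {
        PyErr_SetString(PyExc_TypeError, "expected a float");
        %(fail)s
    }
    %(name)s = PyFloat_AsDouble(py_%(name)s);
    """
    # dict(sub, name=name) copies sub and adds the 'name' key; keys the
    # template does not mention are simply ignored by the % operator.
    return template % dict(sub, name=name)


# Made-up values: the real name and failure code are chosen by Theano.
code = render_extract("V0", {"fail": "goto fail_V0;", "unused": "..."})
print(code)
```

+Because the failure code is substituted as text, whoever calls the
+method decides what failing means, which is why a hard-coded
+``return`` or ``goto`` would bypass it.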
+Defining the methods
+====================
+**c_declare**
+.. code-block:: python
+    def c_declare(name, sub):
+        return """
+        double %(name)s;
+        """ % dict(name = name)
+    double.c_declare = c_declare
+Very straightforward. All we need to do is write C code to declare a
+double. That double will be named whatever is passed to our function
+in the "name" argument. That will usually be some mangled name like
+"V0", "V2" or "V92" depending on how many nodes there are in the
+computation graph and what rank the current node has. This function
+will be called for all Results whose type is ``double``.
+You can declare as many variables as you want there and you can also
+do typedefs. Make sure that the name of each variable contains the
+``name`` argument in order to avoid name collisions (collisions *will*
+happen if you don't parameterize the variable names as indicated
+here). Also note that you cannot declare a variable called
+``py_<name>`` or ``storage_<name>`` because Theano already defines
+them.
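+The collision problem can be seen without compiling anything. The
+sketch below renders declarations for three made-up mangled names,
+once with a parameterized template and once with a hypothetical
+mistaken version (``c_declare_fixed``) that ignores ``name``.

```python
def c_declare(name, sub):
    # One C double per Result, named after `name`.
    return "double %(name)s;" % dict(name=name)


def c_declare_fixed(name, sub):
    # Hypothetical mistake: the variable name ignores `name`.
    return "double value;"


names = ["V0", "V2", "V92"]  # made-up mangled names
good = [c_declare(n, {}) for n in names]
bad = [c_declare_fixed(n, {}) for n in names]

print(good)            # three distinct declarations
print(len(set(bad)))   # 1: every Result would declare the same C variable
```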
+What you declare there is basically the C interface you are giving to
+your Type. If you wish people to develop operations that make use of
+it, it's best to publish it somewhere.
+**c_init**
+.. code-block:: python
+    def c_init(self, name, sub):
+        return """
+        %(name)s = 0.0;
+        """ % dict(name = name)
+Still straightforward. This function simply has to initialize the
+double we declared previously to a suitable value. This is useful if
+we want to avoid dealing with garbage values, especially if our data
+type is a pointer. This is not going to be called for all Results with
+the ``double`` type. Indeed, if a Result is an input which we pass
+from Python, we will want to extract that input from a Python object,
+so it is the c_extract method that will be called instead of c_init.
+You therefore cannot assume, when writing c_extract, that the
+initialization has been done (in fact you can assume that it *hasn't*
+been done).
+``c_init`` will typically be called on output Results, but in general
+you should only assume that either c_init or c_extract has been
+called, without knowing for sure which of the two.
+**c_extract**
+.. code-block:: python
+    def c_extract(self, name, sub):
+        return """
+        if (!PyFloat_Check(py_%(name)s)) {
+            PyErr_SetString(PyExc_TypeError, "expected a float");
+            %(fail)s
+        }
+        %(name)s = PyFloat_AsDouble(py_%(name)s);
+        """ % dict(name = name, fail = sub['fail'])
+This method is slightly more sophisticated. What happens here is that
+we have a reference to a Python object which Theano has placed in
+``py_%(name)s``, where ``%(name)s`` is replaced by the name passed in
+the ``name`` argument. This special variable is declared by Theano as
+``PyObject* py_%(name)s`` where ``PyObject*`` is a pointer to a Python
+object as defined by CPython's C API. This is the reference that
+corresponds, on the Python side of things, to a Result with the
+``double`` type. It is what the end user will give and what he or she
+expects to get back.
+In this example, the user will give a Python ``float``. The first
+thing we should do is verify that what we got is indeed a Python
+``float``. The ``PyFloat_Check`` function is provided by CPython's C
+API and does this for us. If the check fails, we set an exception and
+then we insert code for failure. The code for failure is in
+``sub["fail"]`` and it basically does a ``goto`` to cleanup code.
+If the check passes then we convert the Python float into a double
+using the PyFloat_AsDouble function (yet again provided by CPython's C
+API) and we put it in our double variable that we declared previously.
+**c_sync**
+.. code-block:: python
+    def c_sync(self, name, sub):
+        return """
+        Py_XDECREF(py_%(name)s);
+        py_%(name)s = PyFloat_FromDouble(%(name)s);
+        if (!py_%(name)s) {
+            printf("PyFloat_FromDouble failed on: %%f\\n", %(name)s);
+            Py_XINCREF(Py_None);
+            py_%(name)s = Py_None;
+        }
+        """ % dict(name = name)
+This function is probably the trickiest. What happens here is that we
+have computed some operation on doubles and we have put the result
+into the double variable ``%(name)s``. Now, we need to put this data
+into a Python object that we can manipulate on the Python side of
+things. This Python object must be put into the ``py_%(name)s``
+variable which Theano recognizes (this is the same pointer we get in
+c_extract).
+Now, that pointer is already a pointer to a valid Python object
+(unless you or a careless implementer did terribly wrong things with
+it). If we want to point to another object, we need to tell Python
+that we don't need the old one anymore, meaning that we need to
+*decrease the previous object's reference count*. The first line,
+``Py_XDECREF(py_%(name)s)`` does exactly this. If it is forgotten,
+Python will not be able to reclaim the data even if it is not used
+anymore and there will be memory leaks! This is especially important
+if the data you work on is large.
+Now that we have decreased the reference count, we call
+``PyFloat_FromDouble`` on our double variable in order to convert it
+to a Python ``float``. This returns a new reference which we assign to
+``py_%(name)s``. From there Theano will do the rest and the end user
+will happily see a Python ``float`` come out of their computations.
+The rest of the code is not absolutely necessary and it is basically
+"good practice". PyFloat_FromDouble can return NULL on failure. NULL
+is a pretty bad reference to have and neither Python nor Theano like
+it. If this happens we change the NULL pointer (which will cause us
+problems) to a pointer to None (which is *not* a NULL pointer). Since
+None is an object like the others we need to increase its reference
+count before we can set a new pointer to it. This situation is
+unlikely to ever happen, but if it ever does, better safe than sorry.
+.. warning::
+   I said this already but it really needs to be emphasized that if
+   you are going to change the ``py_%(name)s`` pointer to point to a
+   new reference, you *must* decrease the reference count of whatever
+   it was pointing to before you do the change. This applies only if
+   you change the pointer; if you are not going to change the pointer,
+   do *NOT* decrease its reference count!
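+The same bookkeeping can be observed from the Python side. The sketch
+below assumes CPython and uses ``sys.getrefcount`` to show that
+binding a second name to an object adds one reference and that
+dropping the name gives it back, which is what ``Py_INCREF`` and
+``Py_DECREF`` do explicitly in C. ``refcount_delta`` is a hypothetical
+helper name.

```python
import sys


def refcount_delta():
    """Return how the reference count changes when a second name is bound."""
    obj = object()
    before = sys.getrefcount(obj)
    alias = obj                      # a second reference, like Py_INCREF
    with_alias = sys.getrefcount(obj)
    del alias                        # give the reference back, like Py_DECREF
    after = sys.getrefcount(obj)
    return with_alias - before, after - before


delta_up, delta_back = refcount_delta()
```

+Only the differences are meaningful here, since the absolute value
+reported by ``sys.getrefcount`` includes temporary references.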
+**c_cleanup**
+.. code-block:: python
+    def c_cleanup(self, name, sub):
+        return ""
+We actually have nothing to do here. We declared a double on the stack
+so the C language will reclaim it for us when its scope ends. We
+didn't malloc() anything so there's nothing to free(). Furthermore,
+the ``py_%(name)s`` pointer hasn't changed so we don't need to do
+anything with it. Therefore, we have nothing to clean up. Sweet!
+There are however two important things to keep in mind:
+First, note that ``c_sync`` and ``c_cleanup`` might be called in
+sequence, so they need to play nice together. In particular, let's say
+that you allocate memory in ``c_init`` or ``c_extract`` for some
+reason. You might want to either embed what you allocated in some
+Python object in ``c_sync`` or to free it in ``c_cleanup``. If you do
+the former, you don't want to free the allocated storage, so you should
+set the pointer to it to NULL so that ``c_cleanup`` does not mistakenly
+free it. Another option is to declare a variable in c_declare that
+you set to true in c_sync to notify c_cleanup that c_sync was called.
+Second, whenever you use ``%(fail)s`` in c_extract or in the code of an
+:ref:`operation <op>` you can count on c_cleanup being called right
+after that. Therefore, it's important to make sure that c_cleanup
+doesn't depend on any code placed after a reference to ``%(fail)s``,
+since that code will not have run. Furthermore, because of the way
+Theano blocks code together, only the variables declared in c_declare
+will be visible in c_cleanup!
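+As a sketch of the flag approach, consider a hypothetical Type whose C
+representation is a heap-allocated buffer (this is not the ``double``
+Type of this tutorial). The ``%(name)s_synced`` variable and the
+placeholder comment in ``c_sync`` are assumptions made for the
+example. A matching ``c_extract`` would also have to reset the flag,
+since either it or ``c_init`` is called.

```python
def c_declare(name, sub):
    # The flag lives next to the data so that c_cleanup can see it.
    return """
    double* %(name)s;
    int %(name)s_synced;
    """ % dict(name=name)


def c_init(name, sub):
    return """
    %(name)s = NULL;
    %(name)s_synced = 0;
    """ % dict(name=name)


def c_sync(name, sub):
    # Placeholder comment: how ownership is handed to Python is type-specific.
    return """
    /* ... hand the buffer over to py_%(name)s here ... */
    %(name)s_synced = 1;
    """ % dict(name=name)


def c_cleanup(name, sub):
    # Free the buffer only if c_sync did not give it away.
    return """
    if (!%(name)s_synced) {
        free(%(name)s);
    }
    """ % dict(name=name)


code = "".join(f("V0", {}) for f in (c_declare, c_init, c_sync, c_cleanup))
```

+Both variables are declared in ``c_declare``, so they remain visible
+in ``c_cleanup`` as required above.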
+What the generated C will look like
+===================================
+``c_init`` and ``c_extract`` will only be called if there is a Python
+object on which we want to apply computations using C
+code. Conversely, ``c_sync`` will only be called if we want to
+communicate the values we have computed to Python and ``c_cleanup``
+will only be called when we don't need to process the data with C
+anymore. In other words, the use of these functions for a given Result
+depends on the relationship between Python and C with respect to
+that Result. For instance, imagine you define the following function
+and call it:
+.. code-block:: python
+   x, y, z = double('x'), double('y'), double('z')
+   a = add(x, y)
+   b = mul(a, z)
+   f = function([x, y, z], b)
+   f(1.0, 2.0, 3.0)
+Using the CLinker, the code that will be produced will look roughly
+like this:
+.. code-block:: c
+   // BEGIN defined by Theano
+   PyObject* py_x = ...;
+   PyObject* py_y = ...;
+   PyObject* py_z = ...;
+   PyObject* py_a = ...; // note: this reference won't actually be used for anything
+   PyObject* py_b = ...;
+   // END defined by Theano
+   {
+     double x; //c_declare for x
+     x = ...; //c_extract for x
+     {
+       double y; //c_declare for y
+       y = ...; //c_extract for y
+       {
+         double z; //c_declare for z
+         z = ...; //c_extract for z
+         {
+           double a; //c_declare for a
+           a = 0; //c_init for a
+           {
+             double b; //c_declare for b
+             b = 0; //c_init for b
+             {
+               a = x + y; //c_code for add
+               {
+                 b = a * z; //c_code for mul
+               labelmul:
+                 //c_cleanup for mul
+               }
+             labeladd:
+               //c_cleanup for add
+             }
+           labelb:
+             py_b = ...; //c_sync for b
+             //c_cleanup for b
+           }
+         labela:
+           //c_cleanup for a
+         }
+       labelz:
+         //c_cleanup for z
+       }
+     labely:
+       //c_cleanup for y
+     }
+   labelx:
+     //c_cleanup for x
+   }
+It's not very good looking, but it gives you an idea of how things
+work (note that the variable names won't be x, y, z, etc. - they will
+get a unique mangled name). The ``fail`` code runs a goto to the
+appropriate label in order to run all cleanup that needs to be
+done. Note which variables get extracted (the three inputs x, y and
+z), which ones only get initialized (the temporary variable a and the
+output b) and which one is synced (the final output b).
+The C code above is a single C block for the whole graph. Depending on
+which :ref:`linker` is used to process the computation graph, it is
+possible that one such block is generated for each operation and that
+we transit through Python after each operation. In that situation,
+``a`` would be synced by the addition block and extracted by the
+multiplication block.
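+The nesting pattern itself is easy to reproduce. The following is a
+toy stand-in, not the CLinker's actual code: ``nest_blocks`` is a
+hypothetical function that wraps each step's setup and cleanup around
+all the later steps, so that cleanups appear in reverse order, each
+under its own label.

```python
def nest_blocks(steps, indent=0):
    """Toy stand-in for the nesting shown above (not CLinker's real code).

    steps is a list of (label, setup, cleanup) strings; each step's block
    encloses all later steps, so cleanups run in reverse order.
    """
    if not steps:
        return []
    (label, setup, cleanup), rest = steps[0], steps[1:]
    pad = "  " * indent
    lines = [pad + "{", pad + "  " + setup]
    lines += nest_blocks(rest, indent + 1)
    lines += [pad + "label%s:" % label, pad + "  " + cleanup, pad + "}"]
    return lines


lines = nest_blocks([
    ("x", "double x; /* c_declare, c_extract */", "/* c_cleanup for x */"),
    ("y", "double y; /* c_declare, c_extract */", "/* c_cleanup for y */"),
])
print("\n".join(lines))
```

+A ``goto`` to ``labely`` from anywhere inside the inner block would
+run the cleanup for ``y`` and then fall through to the cleanup for
+``x``, which is the behavior the ``fail`` code relies on.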
+Final version
+=============
+.. code-block:: python
+   class Double(gof.Type):
+       def filter(self, x, strict=False):
+           if strict and not isinstance(x, float):
+               raise TypeError('Expected a float!')
+           return float(x)
+       def values_eq_approx(self, x, y, tolerance=1e-4):
+           # Equal values (including x == y == 0) are trivially close.
+           if x == y:
+               return True
+           return abs(x - y) / (abs(x) + abs(y)) < tolerance
+       def c_declare(self, name, sub):
+           return """
+           double %(name)s;
+           """ % dict(name = name)
+       def c_init(self, name, sub):
+           return """
+           %(name)s = 0.0;
+           """ % dict(name = name)
+       def c_extract(self, name, sub):
+           return """
+           if (!PyFloat_Check(py_%(name)s)) {
+               PyErr_SetString(PyExc_TypeError, "expected a float");
+               %(fail)s
+           }
+           %(name)s = PyFloat_AsDouble(py_%(name)s);
+           """ % dict(sub, name = name)
+       def c_sync(self, name, sub):
+           return """
+           Py_XDECREF(py_%(name)s);
+           py_%(name)s = PyFloat_FromDouble(%(name)s);
+           if (!py_%(name)s) {
+               printf("PyFloat_FromDouble failed on: %%f\\n", %(name)s);
+               Py_XINCREF(Py_None);
+               py_%(name)s = Py_None;
+           }
+           """ % dict(name = name)
+       def c_cleanup(self, name, sub):
+           return ""
+   double = Double()
 **Next:** `Implementing the arithmetic Ops in C`_

--- a/doc/tutorials/advanced/ex1/op.txt
+++ b/doc/tutorials/advanced/ex1/op.txt
@@ -245,11 +245,9 @@ operators (well, pending revision of this tutorial, I guess):
   class BinaryDoubleOp(gof.Op):
-       def __init__(self, name, fn, gradfnx, gradfny):
+       def __init__(self, name, fn):
           self.name = name
           self.fn = fn
-           self.gradfnx = gradfnx
-           self.gradfny = gradfny
       def make_node(self, x, y):
           if isinstance(x, (int, float)):

--- a/theano/tensor/basic.py
+++ b/theano/tensor/basic.py
@@ -319,6 +319,7 @@ class Tensor(Type):
        return """
        Py_XDECREF(py_%(name)s);
        if (!%(name)s) {
+            Py_XINCREF(Py_None);
            py_%(name)s = Py_None;
        }
        else if ((void*)py_%(name)s != (void*)%(name)s) {