Final conv arithmetic tutorial review

f27c11e2 · Francesco Visin · d75cf06c · f27c11e2
--- a/doc/tutorial/conv_arithmetic.txt
+++ b/doc/tutorial/conv_arithmetic.txt
@@ -80,9 +80,8 @@ Here is an example of a discrete convolution:
 .. figure:: conv_arithmetic_figures/numerical_no_padding_no_strides.gif
    :figclass: align-center
-The light blue grid is called the *input feature map*. (An example of this is
+The light blue grid is called the *input feature map*. A *kernel* (shaded area)
-what was referred to earlier as *channels* for images and sound clips.) A
+of value
-*kernel* (shaded area) of value
 .. math::
@@ -94,11 +93,14 @@ what was referred to earlier as *channels* for images and sound clips.) A
 slides across the input feature map. At each location, the product between each
 element of the kernel and the input element it overlaps is computed and the
-results are summed up to obtain the output in the current location. The
+results are summed up to obtain the output in the current location. The final
-procedure can be repeated using different kernels to form as many output feature
+output of this procedure is a matrix called the *output feature map* (in green).
-maps as desired. The final outputs of this procedure are called *output feature
-maps*. To keep the drawing simple, a single input feature map is represented,
+This procedure can be repeated using different kernels to form as many output
-but it is not uncommon to have multiple feature maps stacked one onto another.
+feature maps (a.k.a. *output channels*) as desired. Note also that, to keep the
+drawing simple, a single input feature map is represented, but it is not
+uncommon to have multiple feature maps stacked one onto another (an example of
+this is what was referred to earlier as *channels* for images and sound clips).
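The sliding-and-summing procedure described here can be sketched in NumPy as follows. This is a minimal illustration, assuming a single 2-D input feature map, no zero padding and unit strides; the function name ``conv2d_single`` is hypothetical and not part of Theano or of the tutorial's own code. The kernel is applied as-is (not flipped), matching the element-by-element description in the paragraph.

```python
import numpy as np


def conv2d_single(input_map: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Discrete 2-D convolution of one input feature map (no padding, unit strides)."""
    i1, i2 = input_map.shape
    k1, k2 = kernel.shape
    # One output element per position where the kernel fits entirely.
    out = np.zeros((i1 - k1 + 1, i2 - k2 + 1),
                   dtype=np.result_type(input_map, kernel))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Multiply each kernel element by the input element it overlaps,
            # then sum the products to get the output at this location.
            out[r, c] = np.sum(input_map[r:r + k1, c:c + k2] * kernel)
    return out


x = np.arange(25).reshape(5, 5)
w = np.array([[0, 1, 2], [2, 2, 0], [0, 1, 2]])
y = conv2d_single(x, w)  # y.shape == (3, 3); y[0, 0] == 62
```

Repeating the call with different kernels yields one output feature map per kernel.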
 .. note::
@@ -109,15 +111,18 @@ but it is not uncommon to have multiple feature maps stacked one onto another.
    used in this tutorial.
 If there are multiple input and output feature maps, the collection of kernels
-form a 4D array (``num_kernels, num_input_channels, filter_rows,
+forms a 4D array (``output_channels, input_channels, filter_rows,
 filter_columns``). For each output channel, each input channel is convolved with
-a distinct kernel and the resulting set of feature maps is summed elementwise
+a distinct 2-D slice of the kernel array and the resulting set of feature maps is summed
-to produce the corresponding output feature map.
+elementwise to produce the corresponding output feature map. The output of the
+convolution is therefore the set of output feature maps, one for each output
+channel.
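A hedged NumPy sketch of the multi-channel case, assuming the kernel array is laid out as ``(output_channels, input_channels, filter_rows, filter_columns)`` and the input as ``(input_channels, rows, columns)`` with no batch axis, no padding and unit strides. ``conv2d_multi`` is a hypothetical helper written with plain loops for readability rather than speed.

```python
import numpy as np


def conv2d_multi(inputs: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Multi-channel 2-D convolution (no padding, unit strides).

    inputs:  (input_channels, i1, i2)
    kernels: (output_channels, input_channels, k1, k2)
    returns: (output_channels, i1 - k1 + 1, i2 - k2 + 1)
    """
    c_in, i1, i2 = inputs.shape
    c_out, c_in_k, k1, k2 = kernels.shape
    assert c_in == c_in_k, "kernel and input disagree on input_channels"
    out = np.zeros((c_out, i1 - k1 + 1, i2 - k2 + 1))
    for o in range(c_out):
        for c in range(c_in):
            # Convolve input channel c with its own 2-D slice kernels[o, c]
            # and accumulate, so per-channel maps are summed elementwise.
            for r in range(out.shape[1]):
                for q in range(out.shape[2]):
                    patch = inputs[c, r:r + k1, q:q + k2]
                    out[o, r, q] += np.sum(patch * kernels[o, c])
    return out


images = np.random.default_rng(0).standard_normal((3, 5, 5))
kernels = np.ones((2, 3, 3, 3))
result = conv2d_multi(images, kernels)  # result.shape == (2, 3, 3)
```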
-The convolution depicted above is an instance of a 2-D convolution, but it can
+The convolution depicted above is an instance of a 2-D convolution, but can be
-be generalized to N-D convolutions. For instance, in a 3-D convolution, the
+generalized to N-D convolutions. For instance, in a 3-D convolution, the kernel
-kernel would be a *cuboid* and would slide across the height, width and depth
+would be a *cuboid* and would slide across the height, width and depth of the
-of the input feature map.
+input feature map.
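The same sliding-window idea carries over to any number of dimensions. The sketch below assumes NumPy 1.20 or later (for ``numpy.lib.stride_tricks.sliding_window_view``), no padding and unit strides; ``conv_nd`` is a hypothetical name. For a 3-D input and a cuboid kernel it slides along height, width and depth at once.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_nd(input_map: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """N-D convolution (no padding, unit strides), N = kernel.ndim."""
    assert input_map.ndim == kernel.ndim
    # windows has shape (o_1, ..., o_N, k_1, ..., k_N): one kernel-sized
    # patch for every output location.
    windows = sliding_window_view(input_map, kernel.shape)
    # Contract the N patch axes against the kernel's N axes.
    return np.tensordot(windows, kernel, axes=kernel.ndim)


volume = np.ones((4, 5, 6))
cuboid = np.ones((2, 3, 4))
res = conv_nd(volume, cuboid)  # res.shape == (3, 3, 3)
```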
 The collection of kernels defining a discrete convolution has a shape
 corresponding to some permutation of :math:`(n, m, k_1, \ldots, k_N)`, where
@@ -256,7 +261,7 @@ relationship:
            input, filters, input_shape=(b, c2, i1, i2), filter_shape=(c1, c2, k1, k2),
            border_mode=(p1, p2), subsample=(1, 1))
        # output.shape[2] == (i1 - k1) + 2 * p1 + 1
-        # output.shape[3] == (i2 - k2) + 2 * p1 + 1
+        # output.shape[3] == (i2 - k2) + 2 * p2 + 1
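The corrected comment computes each spatial dimension with its own padding (``p1`` for rows, ``p2`` for columns). As a cross-check that does not need Theano, the following NumPy sketch zero-pads each axis independently and compares the resulting shape with :math:`o = (i - k) + 2p + 1`. ``padded_output_size`` and ``conv2d_padded`` are hypothetical helpers, and unit strides are assumed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def padded_output_size(i: int, k: int, p: int) -> int:
    # Unit strides with zero padding p: o = (i - k) + 2p + 1
    return (i - k) + 2 * p + 1


def conv2d_padded(input_map, kernel, p1, p2):
    # Pad p1 zero rows and p2 zero columns on each side, then slide the kernel.
    padded = np.pad(input_map, ((p1, p1), (p2, p2)))
    windows = sliding_window_view(padded, kernel.shape)
    return np.einsum("abij,ij->ab", windows, kernel)


# Different p1 and p2 make a p1/p2 mix-up visible in the output shape.
y = conv2d_padded(np.ones((5, 7)), np.ones((4, 3)), p1=2, p2=1)
# y.shape == (padded_output_size(5, 4, 2), padded_output_size(7, 3, 1)) == (6, 7)
```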
 Here is an example for :math:`i = 5`, :math:`k = 4` and :math:`p = 2`:
@@ -646,6 +651,8 @@ It is indeed the case, as shown in here for :math:`i = 5`, :math:`k = 4` and
 Formally, the following relationship applies for zero padded convolutions:
+.. _Relationship8:
 .. admonition:: Relationship 8
    A convolution described by :math:`s = 1`, :math:`k` and :math:`p` has an
@@ -773,6 +780,8 @@ For the moment, it will be assumed that the convolution is non-padded (:math:`p
 = 0`) and that its input size :math:`i` is such that :math:`i - k` is a multiple
 of :math:`s`. In that case, the following relationship holds:
+.. _Relationship11:
 .. admonition:: Relationship 11
    A convolution described by :math:`p = 0`, :math:`k` and :math:`s` and whose
@@ -801,7 +810,8 @@ Zero padding, non-unit strides, transposed
 When the convolution's input size :math:`i` is such that :math:`i + 2p - k` is a
-multiple of :math:`s`, the analysis can extended to the zero padded case by
-combining Relationship 8 and Relationship 11:
+multiple of :math:`s`, the analysis can be extended to the zero padded case by
+combining :ref:`Relationship 8 <Relationship8>` and
+:ref:`Relationship 11 <Relationship11>`:
 .. admonition:: Relationship 12
@@ -859,7 +869,7 @@ between the :math:`s` different cases that all lead to the same :math:`i'`:
        o_prime2 = s2 * (output.shape[3] - 1) + a2 + k2 - 2 * p2
        input = theano.tensor.nnet.conv2d_grad_wrt_inputs(
            output, filters, input_shape=(b, c1, o_prime1, o_prime2),
-            filter_shape=(c1, c2, k, k), border_mode=(p1, p2),
+            filter_shape=(c1, c2, k1, k2), border_mode=(p1, p2),
            subsample=(s1, s2))
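To see why the extra term ``a`` is needed, here is a 1-D NumPy sketch of a strided, zero-padded convolution and its transpose, the latter implemented by scattering each input element through the kernel and then cropping the padding. It is an illustration only, not how ``conv2d_grad_wrt_inputs`` is implemented; it assumes :math:`a = (i + 2p - k) \bmod s`, and the helper names ``conv1d`` and ``conv1d_transpose`` are hypothetical. The same arithmetic applies independently to each axis, which is why the snippet computes ``o_prime1`` and ``o_prime2`` separately.

```python
import numpy as np


def conv1d(x, w, s, p):
    """Direct 1-D convolution with zero padding p and stride s."""
    xp = np.pad(x, p)  # p zeros on each side
    k = len(w)
    o = (len(xp) - k) // s + 1
    return np.array([np.dot(xp[j * s:j * s + k], w) for j in range(o)])


def conv1d_transpose(y, w, s, p, a):
    """Transpose of conv1d: maps a size-o signal back to size i.

    The output length is s * (o - 1) + a + k - 2 * p.
    """
    k = len(w)
    full = np.zeros(s * (len(y) - 1) + k + a)
    for j, v in enumerate(y):
        # Scatter each element through the kernel, s positions apart.
        full[j * s:j * s + k] += v * w
    # The `a` trailing zeros stand for padded-input positions the strided
    # direct convolution never visited; now drop the p-wide padding border.
    return full[p:len(full) - p]


i, k, s, p = 6, 3, 2, 1
a = (i + 2 * p - k) % s  # assumed definition of a; here a == 1
x = np.random.default_rng(0).standard_normal(i)
w = np.ones(k)
y = conv1d(x, w, s, p)                    # len(y) == 3
x_back = conv1d_transpose(y, w, s, p, a)  # len(x_back) == 6 == i
```

With these values the direct convolution produces :math:`o = 3` outputs, so :math:`o' = 2(3 - 1) + 1 + 3 - 2 = 6` recovers the original size; dropping ``a`` would give 5.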
 Here is an example for :math:`i = 6`, :math:`k = 3`, :math:`s = 2` and :math:`p