Commit 993bd3cd authored by affanv14

add tutorial doc for separable convolutions

Parent 99cb1b4e
@@ -957,6 +957,7 @@ the final output. A few examples of works using grouped convolutions are `Krizhe
A special case of grouped convolutions is when :math:`n` equals the number of input
channels. This is called depth-wise (or channel-wise) convolution.
Depth-wise convolutions also form part of separable convolutions.
An example using grouped convolutions would be:
@@ -977,6 +978,49 @@ An example using grouped convolutions would be:
"Aggregated Residual Transformations for Deep Neural Networks".
arXiv preprint arXiv:1611.05431 (2016).
Separable Convolutions
----------------------
Separable convolutions consist of two consecutive convolution operations.
The first is a depth-wise convolution, which performs a convolution separately for
each channel of the input. The output of this operation is then given as input
to a point-wise convolution, which mixes the channels to give the final output.
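The two stages can be sketched in plain NumPy. This is an illustrative sketch, not the Theano API: it assumes 'valid' padding, unit stride, no dilation, and uses cross-correlation as most frameworks do; all names below are hypothetical.

```python
import numpy as np

def separable_conv2d_sketch(x, dw, pw):
    # x:  (c1, i1, i2) one input image with c1 channels
    # dw: (c2, k1, k2) depth-wise filters; c2 must be a multiple of c1,
    #     i.e. c2 // c1 filters per input channel
    # pw: (c3, c2) point-wise (1x1) filter weights
    c1, i1, i2 = x.shape
    c2, k1, k2 = dw.shape
    mult = c2 // c1                       # filters per input channel
    o1, o2 = i1 - k1 + 1, i2 - k2 + 1     # 'valid' output size
    inter = np.zeros((c2, o1, o2))
    # depth-wise stage: each input channel is convolved separately
    # with its own kernels
    for f in range(c2):
        ch = f // mult                    # the input channel this filter reads
        for y in range(o1):
            for z in range(o2):
                inter[f, y, z] = np.sum(x[ch, y:y+k1, z:z+k2] * dw[f])
    # point-wise stage: 1x1 convolution mixing the c2 intermediate channels
    return np.einsum('fc,cyz->fyz', pw, inter)

out = separable_conv2d_sketch(np.random.rand(3, 8, 8),
                              np.random.rand(6, 3, 3),
                              np.random.rand(4, 6))
print(out.shape)  # (4, 6, 6)
```

Note that the depth-wise stage never mixes channels; all cross-channel interaction happens in the cheap 1x1 point-wise stage, which is where the parameter savings of separable convolutions come from.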
As we can see from this diagram, modified from `Vanhoucke(2014)`_ [#]_, the depth-wise
convolution is performed with :math:`c2` single-channel depth-wise filters: each of
the :math:`n` input channels is convolved separately with its own kernels,
contributing :math:`c2 / n` channels to the intermediate output, where :math:`n` is
the number of input channels, for a total of :math:`c2` intermediate channels.
The point-wise convolution then applies :math:`c3` 1x1 filters that mix the channels
of the intermediate output to give the final output.
.. image:: conv_arithmetic_figures/sep2D.jpg
   :align: center
A separable convolution is used as follows:
.. code-block:: python

    output = theano.tensor.nnet.separable_conv2d(
        input, depthwise_filters, pointwise_filters, num_channels=c1,
        input_shape=(b, c1, i1, i2), depthwise_filter_shape=(c2, 1, k1, k2),
        pointwise_filter_shape=(c3, c2, 1, 1), border_mode=(p1, p2),
        subsample=(s1, s2), filter_dilation=(d1, d2))

    # output.shape[0] == b
    # output.shape[1] == c3
    # output.shape[2] == (i1 + 2 * p1 - k1 - (k1 - 1) * (d1 - 1)) // s1 + 1
    # output.shape[3] == (i2 + 2 * p2 - k2 - (k2 - 1) * (d2 - 1)) // s2 + 1
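The spatial output size above follows the usual convolution arithmetic: a kernel of size :math:`k` dilated by :math:`d` covers :math:`k + (k - 1)(d - 1)` input positions. A quick sanity check of the formula, as a small helper with hypothetical example values:

```python
def conv_out_size(i, k, p, s, d):
    # A dilated kernel of size k covers k + (k - 1) * (d - 1) input positions,
    # so the output size is floor((i + 2p - effective_k) / s) + 1.
    return (i + 2 * p - k - (k - 1) * (d - 1)) // s + 1

# e.g. a 32x32 input, 3x3 kernel, padding 1, stride 2, no dilation
print(conv_out_size(32, 3, 1, 2, 1))  # 16
```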
.. _Vanhoucke(2014):
   http://scholar.google.co.in/scholar_url?url=http://vincent.vanhoucke.com/
   publications/vanhoucke-iclr14.pdf&hl=en&sa=X&scisig=AAGBfm0x0bgnudAqSVgZ
   ALfu8uPjYOIWwQ&nossl=1&oi=scholarr&ved=0ahUKEwjLreLjr_DVAhULwI8KHWmHAM8QgAMIJigAMAA
.. [#] Vincent Vanhoucke. "Learning Visual Representations at Scale",
   International Conference on Learning Representations (2014).
Quick reference
===============