forgot modes.txt

2dc97b65 · Razvan Pascanu · 6586354a · 2dc97b65
--- a/doc/tutorial/modes.txt
+++ b/doc/tutorial/modes.txt
+
+.. _using_modes:
+
+===============================
+Using different compiling modes
+===============================
+
+Mode
+====
+
+Everytime :ref:`theano.function <libdoc_compile_function>` is called
+the symbolic relationships between the input and output Theano *variables* 
+are optimized and compiled. The way this compilation occurs 
+is controlled by the value of the ``mode`` parameter.
+
+Theano defines the following modes by name:
+
+- ``FAST_COMPILE``: Apply just a few optimizations, but use C op implementations where possible.
+- ``FAST_RUN``: Apply all optimizations, and use C op implementations where possible.
+- ``DEBUG_MODE``: Verify the correctness of all optimizations, and compare C and python 
+    implementations. This mode can take much longer than the other modes, 
+    but can identify many kinds of problems.
+
+The default mode is typically ``FAST_RUN``, but it can be controlled via
+the environment variable ``THEANO_DEFAULT_MODE``, which can in turn be
+overridden by setting `theano.compile.mode.default_mode` directly,
+which can in turn be overridden by passing the keyword argument to
+:ref:`theano.function <libdoc_compile_function>`.
+
+
+
+.. _using_debugmode:
+
+Using DebugMode
+===============
+
+
+While normally you should use the ``FAST_RUN`` or ``FAST_COMPILE`` mode, 
+it is useful at first to run your code using the DebugMode  
+(available via ``mode='DEBUG_MODE'``). The DebugMode is designed to 
+do several self-checks and assertations that can help to diagnose 
+possible programming errors that can lead to incorect output. Note that 
+``DEBUG_MODE`` is much slower then ``FAST_RUN`` or ``FAST_COMPILE`` so 
+use it only during development, not when you luch 1000 process on a 
+cluster.
+
+
+DebugMode is used as follows:
+
+.. code-block:: python
+
+    x = theano.dvector('x')
+
+    f = theano.function(x, 10*x, mode='DEBUG_MODE')
+
+    f(5) 
+    f(0) 
+    f(7) 
+
+
+If any problem is detected, DebugMode will raise an exception according to
+what went wrong, either at call time (e.g. ``f(5)``) or compile time (e.g
+``f = theano.function(x, 10*x, mode='DEBUG_MODE')``). These exceptions
+should *not* be ignored; talk to your local Theano guru or email the
+users list if you cannot make the exception go away.
+
+Some kinds of errors can only be detected for certain input value combinations.
+In the example above, there is no way to guarantee that a future call to say,
+``f(-1)`` won't cause a problem.  DebugMode is not a silver bullet.
+
+If you instantiate DebugMode using the constructor ``compile.DebugMode``
+rather than the keyword ``DEBUG_MODE`` you can configure its behaviour via
+constructor arguments.  See :ref:`DebugMode <compile_debugMode>` for details.
+
+The keyword version of DebugMode (which you get by using ``mode='DEBUG_MODE``)
+is quite strict, and can raise several different Exception types. For a
+list of possible exeption go here.
+
+
+.. _using_profilemode:
+
+ProfileMode
+===========
+
+Beside checking for errors, another important task is to profile your 
+code. For this Theano uses a special mode called ProfileMode which has 
+to be passed as an argument to :ref:`theano.function <libdoc_compile_function>`. Using the ProfileMode is a three-step process.
+
+
+Creating a ProfileMode Instance
+-------------------------------
+
+First create a ProfileMode instance. 
+
+>>> from theano import ProfileMode
+>>> profmode = theano.ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())
+
+The ProfileMode constructor takes as input an optimizer and a
+linker. Which optimizer and linker to use will depend on the
+application. For example, a user wanting to profile the Python
+implementation only, should use the gof.PerformLinker (or "py" for
+short). On the other hand, a user wanting to profile his graph using C
+implementations wherever possible should use the ``gof.OpWiseCLinker``
+(or "c|py"). For testing the speed of your code we would recommend 
+using the 'fast_run' optimizer and ``gof.OpWiseCLinker`` linker.
+
+Compiling your Graph with ProfileMode
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Once the ProfileMode instance is created, simply compile your graph as you
+would normally, by specifying the mode parameter.
+
+>>> # with functions
+>>> f = theano.function([input1,input2],[output1], mode=profmode)
+>>> # with modules
+>>> m = theano.Module()
+>>> minst = m.make(mode=profmode)
+
+Retrieving Timing Information
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Once your graph is compiled, simply run the program or operation you wish to
+profile, then call ``profmode.print_summary()``. This will provide you with
+the desired timing information, indicating where your graph is spending most
+of its time.
+
+This is best shown through an example.
+Lets use the example of logistic
+regression.  (Code for this example is in the file
+``benchmark/regression/regression.py``.) 
+
+Compiling the module with ProfileMode and calling ``profmode.print_summary()``
+generates the following output:
+
+.. code-block:: python
+    
+    """
+    ProfileMode.print_summary()
+    ---------------------------
+
+    local_time 0.0749197006226 (Time spent running thunks)
+    Apply-wise summary: <fraction of local_time spent at this position> (<Apply position>, <Apply Op name>)
+            0.069   15      _dot22
+            0.064   1       _dot22
+            0.053   0       InplaceDimShuffle{x,0}
+            0.049   2       InplaceDimShuffle{1,0}
+            0.049   10      mul
+            0.049   6       Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
+            0.049   3       InplaceDimShuffle{x}
+            0.049   4       InplaceDimShuffle{x,x}
+            0.048   14      Sum{0}
+            0.047   7       sub
+            0.046   17      mul
+            0.045   9       sqr
+            0.045   8       Elemwise{sub}
+            0.045   16      Sum
+            0.044   18      mul
+       ... (remaining 6 Apply instances account for 0.25 of the runtime)
+    Op-wise summary: <fraction of local_time spent on this kind of Op> <Op name>
+            0.139   * mul
+            0.134   * _dot22
+            0.092   * sub
+            0.085   * Elemwise{Sub{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1779f10>}}[(0, 0)]
+            0.053   * InplaceDimShuffle{x,0}
+            0.049   * InplaceDimShuffle{1,0}
+            0.049   * Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
+            0.049   * InplaceDimShuffle{x}
+            0.049   * InplaceDimShuffle{x,x}
+            0.048   * Sum{0}
+            0.045   * sqr
+            0.045   * Sum
+            0.043   * Sum{1}
+            0.042   * Elemwise{Mul{output_types_preference=<theano.scalar.basic.transfer_type object at 0x17a0f50>}}[(0, 1)]
+            0.041   * Elemwise{Add{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1736a50>}}[(0, 0)]
+            0.039   * Elemwise{Second{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1736d90>}}[(0, 1)]
+       ... (remaining 0 Ops account for 0.00 of the runtime)
+    (*) Op is running a c implementation
+
+    """
+
+
+The summary has two components to it. In the first section called the
+Apply-wise summary, timing information is provided for the worst
+offending Apply nodes. This corresponds to individual Op applications
+within your graph which take the longest to execute (so if you use
+``dot`` twice, you will see two entries there). In the second portion,
+the Op-wise summary, the execution time of all Apply nodes executing
+the same Op are grouped together and the total execution time per Op
+is shown (so if you use ``dot`` twice, you will see only one entry
+there corresponding to the sum of the time spent in each of them).
+
+Note that the ProfileMode also shows which Ops were running a c
+implementation.
+