Tutorial about NaNs

5a7d9fa4 · hantek · 2907f95a · 5a7d9fa4 · 5a7d9fa4
--- a/doc/library/compile/nanguardmode.txt
+++ b/doc/library/compile/nanguardmode.txt
+.. _nanguardmode:
+===================
+:mod:`nanguardmode`
+===================
+.. module:: nanguardmode
+   :platform: Unix, Windows
+   :synopsis: defines NanGuardMode
+.. moduleauthor:: LISA
+Guide
+=====
+The NanGuardMode aims to prevent the model from outputting NaNs or Infs. It has
+a number of self-checks, which can help to find out which apply node is
+generating the incorrect outputs.
--- a/doc/tutorial/nan_tutorial.txt
+++ b/doc/tutorial/nan_tutorial.txt
+.. _nan_tutorial:
+=================
+Dealing with NaNs
+=================
+Having a model yield NaNs or Infs is quite common if some of the tiny
+components in your model are not set correctly. NaNs are hard to deal with
+because sometimes they are caused by a bug or error in the code, sometimes by
+the numerical stability of your computing system, and sometimes by the
+algorithm itself. Here we outline common, basic issues that cause a model to
+yield NaNs, and provide tools to diagnose them.
+Check Hyperparameters and Weight Initialization
+-----------------------------------------------
+Most frequently, the cause is that some of the hyperparameters, especially
+learning rates, are set incorrectly. A high learning rate can blow up your whole
+model into NaN outputs even within one epoch of training. So the first and
+easiest thing to try is lowering it. Keep halving your learning rate until you
+start to get reasonable output values.
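+The halving strategy can be sketched in plain Python. This is only an
+illustration, not Theano code: the toy loss ``w * w``, the helper names
+``train`` and ``find_stable_learning_rate``, and the starting rate of 6.0 are
+all assumptions made for the example.

```python
import math


def train(learning_rate, steps=2000):
    """Run gradient descent on the toy loss w * w and return the final loss."""
    w = 1.0
    for _ in range(steps):
        grad = 2.0 * w                  # derivative of w * w
        w = w - learning_rate * grad
    return w * w                        # inf or nan if training diverged


def find_stable_learning_rate(learning_rate, max_halvings=20):
    """Halve the learning rate until the final loss is finite."""
    for _ in range(max_halvings):
        loss = train(learning_rate)
        if math.isfinite(loss):
            return learning_rate, loss
        learning_rate /= 2.0
    raise RuntimeError("no stable learning rate found")


lr, loss = find_stable_learning_rate(6.0)
print(lr, loss)
```

+In a real model, ``train`` would be a short training run, and a finite loss is
+only a first filter: the values should also look reasonable.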
+Other hyperparameters may also play a role. For example, do your training
+algorithms involve regularization terms? If so, are their corresponding
+penalties set reasonably? Search a wider hyperparameter space with a few (one or
+two) training epochs each to see if the NaNs disappear.
+Some models can be very sensitive to the initialization of weight vectors. If
+those weights are not initialized in a proper range, then it is not surprising
+that the model ends up yielding NaNs.
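+As a rough illustration of this sensitivity, the sketch below pushes a vector
+through a toy stack of random linear layers. Everything in it is an assumption
+made for the example: the helper ``forward``, the width and depth, and the
+``1 / sqrt(width)`` scaling, which is a common heuristic rather than a rule
+from this tutorial.

```python
import math
import random


def forward(weight_std, width=8, depth=300, seed=0):
    """Push a vector of ones through `depth` random linear layers."""
    rng = random.Random(seed)
    x = [1.0] * width
    for _ in range(depth):
        # Each output unit is a weighted sum of the previous layer's values.
        x = [sum(rng.gauss(0.0, weight_std) * xj for xj in x)
             for _ in range(width)]
    return x


large_init = forward(weight_std=10.0)
scaled_init = forward(weight_std=1.0 / math.sqrt(8))

# Report whether every activation is still a finite number.
print(all(math.isfinite(v) for v in large_init))
print(all(math.isfinite(v) for v in scaled_init))
```

+With the large standard deviation the activations grow at every layer until
+they overflow, while the scaled initialization keeps them finite.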
+Run in DebugMode
+-----------------
+If adjusting hyperparameters doesn't work for you, you can still get help from
+Theano's DebugMode. Run your code in DebugMode with the flag
+``DebugMode.check_py=False``. This will give you a clue about which op is causing
+the problem, and then you can inspect that op in more detail.
+Theano's MonitorMode can also help. It can be used to step through the execution
+of a function. You can inspect the inputs and outputs of each node being
+executed when the function is called. For details on how to use it, see
+:ref:`faq_monitormode`.
+Numerical Stability
+-------------------
+After you have located the op that causes the problem, it may turn out that the
+NaNs yielded by that op are related to numerical issues. For example,
+``1 / log(p(x) + 1)`` may result in Infs or NaNs for nodes that have learned to
+yield a low probability p(x) for some input x, because ``log(p(x) + 1)`` then
+rounds to zero.
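+The rounding problem behind this example can be reproduced with the Python
+standard library alone. The value ``1e-20`` is an arbitrary stand-in for a very
+low probability, and ``math.log1p`` is shown as one possible remedy, not as the
+fix prescribed by this tutorial.

```python
import math

p = 1e-20                       # stand-in for a very low probability p(x)

naive = math.log(p + 1.0)       # p + 1.0 rounds to exactly 1.0, so this is 0.0
stable = math.log1p(p)          # evaluates log(1 + p) without forming 1 + p

print(naive, stable)
print(1.0 / stable)             # finite, whereas 1.0 / naive divides by zero
```

+Plain Python raises ``ZeroDivisionError`` for ``1.0 / naive``, whereas array
+libraries typically return Inf, which then turns into NaN further down the
+graph.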
+Algorithm Related
+-----------------
+The hardest case is when, after tracing back through all of the steps above,
+it turns out that nothing is wrong. If you unfortunately reach this point, there
+is a high chance that something is wrong in your algorithm. Go back to the
+mathematics and check that everything is derived correctly.
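+One way to check a derivation is to compare the hand-derived gradient against a
+finite-difference estimate. The sketch below assumes a toy softplus loss. The
+names ``loss``, ``grad``, ``numerical_grad`` and ``relative_error`` are
+hypothetical helpers, not part of Theano.

```python
import math


def loss(w):
    """Toy loss: softplus, log(1 + exp(w))."""
    return math.log1p(math.exp(w))


def grad(w):
    """Hand-derived gradient of the toy loss: the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-w))


def numerical_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)


def relative_error(a, b):
    """Relative difference between two scalars, guarded against division by zero."""
    return abs(a - b) / max(abs(a), abs(b), 1e-12)


w = 0.5
err = relative_error(grad(w), numerical_grad(loss, w))
print(err)
```

+A tiny relative error suggests the derivation is right at that point. An error
+far above the finite-difference noise points to a mistake in the mathematics.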