Many doc syntax fixes to remove warnings during doc generation.

875c5037 · Frederic · 31b55f92
--- a/doc/internal/dev_start_guide.txt
+++ b/doc/internal/dev_start_guide.txt
@@ -56,6 +56,23 @@ then go to your fork's github page on the github website, select your feature
 branch and hit the "Pull Request" button in the top right corner.
 If you don't get any feedback, bug us on the theano-dev mailing list.
+History not clean
+-----------------
+In some cases, your feature branch may contain commits that are not
+needed in the final pull request. There is a `page
+<http://sandofsky.com/blog/git-workflow.html>`_ that discusses
+this. In summary:
+* Commits to the trunk should be a lot cleaner than commits to your
+  feature branch, not just for ease of reviewing but also because
+  intermediate commits can break tools such as blame and bisect.
+* ``git merge --squash`` combines all of the commits from your feature branch into a single change that you then commit once.
+* There are other tools that are useful if your branch is too big for one squash.
 Details about ``PYTHONPATH``
 ----------------------------

--- a/doc/sandbox/elemwise_compiler.txt
+++ b/doc/sandbox/elemwise_compiler.txt
@@ -19,7 +19,9 @@ The broadcast mode serves to calculate the rank of the corresponding output and
  * {{{(accumulate, Accumulator)}}}
    * output.rank = min(input.rank)
    * for the inputs of greater rank, we use Accumulator (sum, product, etc.) to accumulate over the first dimensions
      * e.g. {{{if Accumulator == sum, order == c, x.rank == 2, y.rank == 1 and z = f(x, y) then z[i] = f(sum_j(x[i, j]), y[i])}}}
    * if {{{order == f}}} ([3, 5], [5]) => [5] or ([7, 8, 9], [8, 9]) => [8, 9]
    * if {{{order == c}}} ([3, 5], [3]) => [3] or ([7, 8, 9], [7, 8]) => [7, 8]
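As a rough illustration of the accumulate mode described above, here is a minimal pure-Python sketch of the rank-2/rank-1 case with order == c and Accumulator == sum. The helper name `accumulate_elemwise` is hypothetical and not part of the proposal, and nested lists stand in for arrays.

```python
def accumulate_elemwise(f, x, y, accumulator=sum):
    """Apply f elementwise after reducing the extra dimension of x.

    x is a rank-2 nested list of shape [n, m]; y is a rank-1 list of
    shape [n]. With order == c the leading dimensions are matched, so
    the trailing dimension of x is accumulated:
        z[i] = f(accumulator(x[i]), y[i])
    """
    if len(x) != len(y):
        raise ValueError("leading dimensions must match for order == c")
    return [f(accumulator(row), y_i) for row, y_i in zip(x, y)]


x = [[1, 2, 3], [4, 5, 6]]  # shape [2, 3]
y = [10, 20]                # shape [2]

# The output rank is min(input ranks) == 1, so z has shape [2].
z = accumulate_elemwise(lambda a, b: a + b, x, y)
print(z)
```

The output has the rank of the lowest-rank input, which matches the `([3, 5], [3]) => [3]` shape rule given for order == c.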
@@ -27,6 +29,7 @@ The broadcast mode serves to calculate the rank of the corresponding output and
 These modes do not cover all cases of broadcasting, but I believe they cover enough. Other cases of broadcasting can be emulated with proper transposition and/or slicing.
 * Could you give some examples of what kinds of broadcasting are and are not covered by your proposed implementation?
  * For rank <= 2, I think only operations of the form {{{add(ones(3,1), ones(1,3))}}} are missing. I actually didn't think of that one before now.
  * In general, it only handles f(shape(head, ...), shape(head, ...), ...) and f(shape(..., tail), shape(..., tail), ...)
  * Maybe I could add a general case later... the thing is that I think the ones I am considering here are easier to streamline.
@@ -71,8 +74,11 @@ An Optimizer should look at the operations in the graph and figure out whether t
 The input ranks become the output ranks and gradients of the same rank as the outputs are added to the input list. If an output was given mode {{{broadcast}}}, then all inputs used to calculate it had to be broadcasted to that shape, so we must sum over the broadcasted dimensions on the gradient. The mode that we give to those inputs is therefore {{{(accumulate, sum)}}}. Inversely, if an output was given mode {{{(accumulate, sum)}}}, then all inputs used to calculate it had to be summed over those dimensions. Therefore, we give them mode {{{broadcast}}} in grad. Accumulators other than sum might prove more difficult. For example, the ith gradient for product is grad*product/x_i. I am not sure how to handle that automatically.
 * I don't exactly follow this paragraph, but I think I catch the general idea and it seems to me like it will work very well.
  * In a nutshell for {{{broadcast}}} I calculate the gradient as normal assuming the shape is broadcasted and then I sum over what I had to broadcast.
 * Could you explain why the accumulator gradient (e.g. product) can be trickier?
  * I thought about it and I figured that the general case is {{{g_accum[N-i+1], g_m[i] = grad_fn(accum[i-1], m[i], g_accum[N-i])}}} where {{{g_accum}}} is the accumulated gradient wrt the accumulator {{{accum}}}. It can be short-circuited in sum and product's case: for sum, grad_fn is the identity on its last argument so {{{g_m[i] == g_accum[i] == g_accum[0] == g_z for all i}}}. In product's case, {{{accum[i-1] == product(m[1:i-1]) and g_accum[N-i] == g_z * product(m[i+1:N])}}}, multiply them together and you obtain {{{g_z * product(m)/m[i]}}} where obviously we only need to compute {{{product(m)}}} once. It's worth handling those two special cases; for the general case, I don't know.
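The two short-circuited cases discussed above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `sum_grad` and `product_grad` are hypothetical, `m` is a flat list of the accumulated values, and the product shortcut assumes that no element of `m` is zero.

```python
import functools
import operator


def sum_grad(m, g_z=1.0):
    """Gradient of z = sum(m): every m[i] receives g_z unchanged."""
    return [g_z for _ in m]


def product_grad(m, g_z=1.0):
    """Gradient of z = product(m) using g_m[i] = g_z * product(m) / m[i].

    product(m) is computed only once. Assumes no element of m is zero.
    """
    total = functools.reduce(operator.mul, m, 1.0)
    return [g_z * total / m_i for m_i in m]


m = [2.0, 3.0, 4.0]
print(sum_grad(m))
print(product_grad(m))
```

Dividing by `m[i]` is what makes the shortcut cheap, and it is also why a zero entry would need separate handling.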
--- a/doc/sandbox/randomnumbers.txt
+++ b/doc/sandbox/randomnumbers.txt
--- a/doc/tutorial/conditions.txt
+++ b/doc/tutorial/conditions.txt
@@ -53,6 +53,7 @@ In this example, the IfElse Op spends less time (about half) than Switch
 since it computes only one variable instead of both.
 .. code-block:: python
  >>> python ifelse_switch.py
  time spent evaluating both values 0.6700 sec
  time spent evaluating one value 0.3500 sec
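The reason for the timing difference can be illustrated without Theano. The following is a minimal pure-Python sketch, not Theano's API: `eager_switch`, `lazy_ifelse`, and `expensive` are hypothetical names, and the thunks (zero-argument lambdas) stand in for the lazily evaluated branches.

```python
calls = []


def expensive(name, value):
    """Stand-in for a costly computation; records that it ran."""
    calls.append(name)
    return value


def eager_switch(cond, a, b):
    """Both branch values are computed before the choice is made."""
    return a if cond else b


def lazy_ifelse(cond, a_thunk, b_thunk):
    """Only the selected branch is evaluated."""
    return a_thunk() if cond else b_thunk()


eager_result = eager_switch(True, expensive("a", 1), expensive("b", 2))
eager_calls = list(calls)

calls.clear()
lazy_result = lazy_ifelse(
    True, lambda: expensive("a", 1), lambda: expensive("b", 2)
)
lazy_calls = list(calls)

print(eager_calls, lazy_calls)
```

Both versions return the same value, but the eager one runs both computations while the lazy one runs only the branch it needs.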