Commit 875c5037 authored by Frederic

Many doc syntax fixes to remove warnings during doc generation.

Parent 31b55f92
@@ -56,6 +56,23 @@ then go to your fork's github page on the github website, select your feature
branch and hit the "Pull Request" button in the top right corner.
If you don't get any feedback, bug us on the theano-dev mailing list.
History not clean
-----------------
In some cases you may have commits in your feature branch that
are not needed in the final pull request. There is a `page
<http://sandofsky.com/blog/git-workflow.html>`_ that talks about
this. In summary:
* Commits to the trunk should be a lot cleaner than commits to your
  feature branch; not just for ease of reviewing, but also because
  intermediate commits can break blame (the bisecting tool).
* `git merge --squash` will put all of the commits from your feature branch into one commit.
* There are other tools that are useful if your branch is too big for one squash.
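The squash workflow above can be sketched end-to-end in a throwaway repository; the branch names, file names, and user settings below are hypothetical, chosen only so the commands are runnable:

```shell
# Sketch of `git merge --squash` in a temporary repository (all names hypothetical).
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git checkout -qb trunk
git config user.email dev@example.com && git config user.name dev
echo base > f.txt && git add f.txt && git commit -qm "base"

# feature branch with two intermediate ("dirty") commits
git checkout -qb my-feature
echo one >> f.txt && git commit -qam "wip 1"
echo two >> f.txt && git commit -qam "wip 2"

# squash them into a single clean commit on trunk
git checkout -q trunk
git merge --squash my-feature > /dev/null
git commit -qm "my feature, squashed"
git rev-list --count trunk   # trunk now has 2 commits: base + the squashed feature
```

The intermediate "wip" commits never reach trunk's history, so bisect and blame only ever see the one clean commit.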
Details about ``PYTHONPATH``
----------------------------
......
@@ -19,7 +19,9 @@ The broadcast mode serves to calculate the rank of the corresponding output and
* {{{(accumulate, Accumulator)}}}
* output.rank = min(input.rank)
* for the inputs of greater rank, we use Accumulator (sum, product, etc.) to accumulate over the first dimensions
* e.g. {{{if Accumulator == sum, order == c, x.rank == 2, y.rank == 1 and z = f(x, y) then z[i] = f(sum_j(x[i, j]), y[i])}}}
* if {{{order == f}}} ([3, 5], [5]) => [5] or ([7, 8, 9], [8, 9]) => [8, 9]
* if {{{order == c}}} ([3, 5], [3]) => [3] or ([7, 8, 9], [7, 8]) => [7, 8]
@@ -27,6 +29,7 @@ The broadcast mode serves to calculate the rank of the corresponding output and
This does not cover all cases of broadcasting, but I believe they cover enough. Other cases of broadcasting can be emulated with proper transposition and/or slicing.
* Could you give some examples of what kinds of broadcasting are and are not covered by your proposed implementation?
* For rank <= 2, I think only operations of the form {{{add(ones(3,1), ones(1,3)))}}} are missing. I actually didn't think of that one before now.
* In general, it only handles f(shape(head, ...), shape(head, ...), ...) and f(shape(..., tail), shape(..., tail), ...)
* Maybe I could add a general case later... the thing is that I think the ones I am considering here are easier to streamline.
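As a concrete illustration of the {{{(accumulate, sum)}}} mode, here is a small NumPy sketch; the helper name `accum_apply` and the assumption that `x` has the greater rank are mine, not part of the proposal:

```python
import numpy as np

def accum_apply(f, acc, order, x, y):
    # Sketch of the (accumulate, Accumulator) mode, assuming x.ndim >= y.ndim:
    # reduce x's extra dimensions with `acc` (trailing dims for order == 'c',
    # leading dims for order == 'f'), then apply f to the rank-matched operands.
    extra = x.ndim - y.ndim
    axes = tuple(range(y.ndim, x.ndim)) if order == 'c' else tuple(range(extra))
    return f(acc(x, axis=axes), y)

x = np.arange(6.0).reshape(3, 2)            # rank 2
y = np.ones(3)                              # rank 1
z = accum_apply(np.add, np.sum, 'c', x, y)  # z[i] = sum_j(x[i, j]) + y[i]
print(z.tolist())                           # [2.0, 6.0, 10.0]
```

With `order == 'f'` the same helper reduces the leading axis instead, matching the ([3, 5], [5]) => [5] case above.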
@@ -71,8 +74,11 @@ An Optimizer should look at the operations in the graph and figure out whether t
The input ranks become the output ranks and gradients of the same rank as the outputs are added to the input list. If an output was given mode {{{broadcast}}}, then all inputs used to calculate it had to be broadcasted to that shape, so we must sum over the broadcasted dimensions on the gradient. The mode that we give to those inputs is therefore {{{(accumulate, sum)}}}. Inversely, if an output was given mode {{{(accumulate, sum)}}}, then all inputs used to calculate it had to be summed over those dimensions. Therefore, we give them mode {{{broadcast}}} in grad. Other accumulators than sum might prove more difficult. For example, the ith gradient for product is grad*product/x_i. Not sure how to handle that automatically.
* I don't exactly follow this paragraph, but I think I catch the general idea and it seems to me like it will work very well.
* In a nutshell for {{{broadcast}}} I calculate the gradient as normal assuming the shape is broadcasted and then I sum over what I had to broadcast.
* Could you explain why the accumulator gradient (e.g. product) can be trickier?
* I thought about it and I figured that the general case is {{{g_accum[N-i+1], g_m[i] = grad_fn(accum[i-1], m[i], g_accum[N-i])}}} where {{{g_accum}}} is the accumulated gradient wrt the accumulator {{{accum}}}. It can be short-circuited in sum and product's case: for sum, grad_fn is the identity on its last argument so {{{g_m[i] == g_accum[i] == g_accum[0] == g_z for all i}}}. In product's case, {{{accum[i-1] == product(m[1:i-1]) and g_accum[N-i] == g_z * product(m[i+1:N])}}}, multiply them together and you obtain {{{g_z * product(m)/m[i]}}} where obviously we only need to compute {{{product(m)}}} once. It's worth handling those two special cases, for the general case I don't know.
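The short-circuited product case can be checked numerically with plain Python; the helper names below are hypothetical:

```python
from functools import reduce
import operator

def product(m):
    return reduce(operator.mul, m, 1.0)

def grad_product(m, g_z=1.0):
    # The ith gradient of z = product(m) is g_z * product(m) / m[i];
    # product(m) only needs to be computed once, as noted above.
    p = product(m)
    return [g_z * p / mi for mi in m]

m = [2.0, 3.0, 4.0]
g = grad_product(m)          # [12.0, 8.0, 6.0]

# finite-difference check of the first component
eps = 1e-6
fd = (product([m[0] + eps, m[1], m[2]]) - product(m)) / eps
assert abs(fd - g[0]) < 1e-3
```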
@@ -37,61 +37,61 @@ So the proposal is to provide the missing functionality (the last three requirem
== Syntax ==
.. code-block:: python

  # create a random generator, providing a default seed to condition how RandomOp instances are produced.
  r = MetaRandom(metaseed=872364)

  # create a different random generator
  rr = MetaRandom(metaseed=99)

  # create an Op to produce a stream of random numbers.
  # This generates random numbers uniformly between 0.0 and 1.0 excluded
  # u will remember that it was made from r.
  u = r.uniform(shape=(3,4,5), low=0.0, high=1.0)

  # create a second Op for more random numbers
  # v will remember that it was made from r.
  v = r.uniform(shape=(8,), low=-1.0, high=0.0)

  # create a third Op with a different underlying random state
  # w will remember that it was made from rr.
  w = rr.uniform(shape=(), low=-10., high=10.)

  # compile a function to draw random numbers
  # note: un-named state inputs will be added automatically.
  # note: it is not necessary to draw samples for u, even though
  # u was created by r before v.
  fn_v = compile.function([], [v])

  # this prints some representation of v's rng in fn_v.
  # The .rng property works for Result instances produced by MetaRandom.
  print fn_v.state[v.rng]

  # compile a function to draw each of u, v, w
  # note: un-named state inputs will be added automatically
  # note: This function (especially its internal state) is independent from fn_v.
  fn_uvw = compile.function([], [u,v,w])

  # N.B. The random number streams of fn_v and fn_uvw are independent.
  assert fn_v.state[v.rng] != fn_uvw.state[v.rng]

  fn_v()   # returns random numbers A (according to metaseed 872364)
  fn_v()   # returns different random numbers B

  # note that v's stream here is identical to the one in fn_v()
  fn_uvw() # returns random numbers C, A, E

  # explicitly re-seed v's random stream in fn_v
  r.seed(fn_v, 872364)
  fn_v()   # returns random numbers A (as above)
  fn_v()   # returns random numbers B (as above)

  # re-seed w's random stream in fn_uvw, but not u's or v's
  rr.seed(fn_uvw, 99)
  fn_uvw() # returns random numbers D, B, E
== {{{MetaRandom}}} ==
@@ -106,30 +106,31 @@ The use of multiple {{{MetaRandom}}} objects in a single function is mostly for
The typical case is that only one (global) {{{MetaRandom}}} object is used to produce all the random streams in a function, so seeding (once) will reset the entire function.
.. code-block:: python

  class MetaRandom(obj):
      def __init__(self, metaseed=<N>): ... # new functions will be initialized so that seed(fn, <N>) has no effect on output.

      def __contains__(self, Result): ... # True if Result was returned by a call to self.<distribution>
      def results(self): ...              # Iterate over returned Result instances in creation order.

      def seed(self, fn, bits): ...       # See below.
      def getstate(self, fn): ...         # See below.
      def setstate(self, fn, state): ...  # See below.

      def uniform(...): ...   # return a Result of an Apply of a RandomOp.
                              # The return value is also stored internally for __contains__ and results().
      def normal(...): ...
      def bernoulli(...): ...
      ...
=== {{{MetaRandom.getstate}}} ===

.. code-block:: python

  def getstate(self, fn): ...

''return''::
  list, set, dict, instance... something to store the random number generators associated with every one of {{{self}}}'s members in {{{fn}}}
@@ -137,9 +138,10 @@ def getstate(self, fn): ...
Re-install the random number generators in {{{rstates}}} to the {{{randomobj}}} members in {{{fn}}}
.. code-block:: python

  def setstate(self, fn, rstates): ....

''fn''::
  a CompileFunction instance, generally with some Apply instances inside that are members of {{{self}}}.
''rstates''::
@@ -150,9 +152,10 @@ def setstate(self, fn, rstates): ....
=== {{{MetaRandom.seed}}} ===

.. code-block:: python

  def seed(self, fn, bits): ....

''fn''::
  a CompileFunction instance, generally with some Apply instances inside that are members of {{{self}}}.
''bits''::
@@ -164,75 +167,73 @@ Set the states of self's members in fn in a deterministic way based on bits.
Each member of self should generate independent samples after this call.

Seed is like a dynamically-computed setstate. If the user runs
.. code-block:: python

  r.seed(fn, 99)
  state_99 = r.getstate(fn)

then any time afterward both {{{r.setstate(fn, state_99)}}} and {{{r.seed(fn, 99)}}} will put {{{fn}}} into the same state.
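This contract can be sketched with Python's standard `random` module, which is not the proposed MetaRandom API but obeys the same seed/getstate/setstate relationship:

```python
import random

rng = random.Random()
rng.seed(99)
state_99 = rng.getstate()   # analogous to r.getstate(fn) right after r.seed(fn, 99)
a = rng.random()

rng.seed(99)                # re-seeding reproduces the stream...
assert rng.random() == a

rng.setstate(state_99)      # ...and restoring the saved state does too
assert rng.random() == a
```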
= Potential Other syntax =

.. code-block:: python

  # create a random state
  r = RandomState(name = 'r')

  # create a different random state
  rr = RandomState(name = 'rr')

  # create an Op to produce a stream of random numbers.
  # That stream is a function of r's seed.
  # This generates random numbers uniformly between 0.0 and 1.0 excluded
  u = r.uniform(shape=(3,4,5), low=0.0, high=1.0)

  # create a second Op for more random numbers
  # This stream is seeded using a different function of r's seed.
  # u and v should be independent
  v = r.uniform(shape=(8,), low=-1.0, high=0.0)

  # create a third Op with a different underlying random state
  w = rr.uniform(shape=(), low=-10., high=10.)

  # compile a function to draw random numbers
  # note: it is not necessary to draw samples for u.
  # we provide the seed for the RandomState r in the inputs list as a "Type 4" input
  fn_v = compile.function([(r, 872364)], [v])

  # compile a function to draw each of u, v, w
  # we provide the seeds for the RandomStates r and rr in the inputs list as "Type 4" inputs
  # note: the random state for r here is seeded independently from the one in fn_v, which means
  # random number generation of fn_v and fn_uvw will not interfere. Since the seed is the
  # same, it means they will produce the same sequence of tensors for the output v.
  fn_uvw = compile.function([(r, 872364), (rr, 99)], [u,v,w])

  fn_v() # returns random numbers A
  fn_v() # returns different random numbers B

  # note that v's stream here is identical to the one in fn_v()
  fn_uvw() # returns random numbers C, A, E

  # re-seed v's random stream in fn_v
  fn_v.r = 872364

  ### Is this state readable? What should we do here:
  print fn_v.r

  fn_v() # returns random numbers A

  ### Is this state well-defined?
  ### Does there even exist a number such that fn_v.r = N would have no effect on the rng states?
  print fn_v.r
  fn_v() # returns random numbers B

  # re-seed w's random stream, but not u's or v's
  fn_uvw.rr = 99
  fn_uvw() # returns random numbers D, B, E
@@ -53,6 +53,7 @@ In this example, the IfElse Op spends less time (about half) than Switch
since it computes only one variable instead of both.

.. code-block:: python

  >>> python ifelse_switch.py
  time spent evaluating both values 0.6700 sec
  time spent evaluating one value 0.3500 sec
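The roughly 2x timing difference follows from lazy versus eager branch evaluation; a pure-Python analogy (not the Theano API, all names hypothetical) makes the operation counts explicit:

```python
calls = {"n": 0}

def expensive(v):
    calls["n"] += 1        # count how many branch evaluations happen
    return v * v

def switch_like(cond, a, b):
    va, vb = expensive(a), expensive(b)            # Switch: both branches computed
    return va if cond else vb

def ifelse_like(cond, a, b):
    return expensive(a) if cond else expensive(b)  # IfElse: only the taken branch

switch_like(True, 3, 4)
n_switch = calls["n"]      # 2 evaluations
calls["n"] = 0
ifelse_like(True, 3, 4)
n_ifelse = calls["n"]      # 1 evaluation
```

With equally expensive branches, evaluating one instead of two halves the work, matching the timings above.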
......