@@ -19,7 +19,9 @@ The broadcast mode serves to calculate the rank of the corresponding output and
* {{{(accumulate, Accumulator)}}}
* output.rank = min(input.rank)
 * for the inputs of greater rank, we use Accumulator (sum, product, etc.) to accumulate over the extra dimensions: the leading dimensions if {{{order == f}}}, the trailing dimensions if {{{order == c}}}
* e.g. {{{if Accumulator == sum, order == c, x.rank == 2, y.rank == 1 and z = f(x, y) then z[i] = f(sum_j(x[i, j]), y[i])}}}
* if {{{order == f}}} ([3, 5], [5]) => [5] or ([7, 8, 9], [8, 9]) => [8, 9]
* if {{{order == c}}} ([3, 5], [3]) => [3] or ([7, 8, 9], [7, 8]) => [7, 8]
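As a sketch of these semantics with NumPy arrays (the {{{accumulate_inputs}}} helper and its signature are illustrative, not part of the proposal):

```python
import numpy as np

def accumulate_inputs(inputs, order="c", accumulator=np.sum):
    # output rank is the minimum rank among the inputs
    out_rank = min(x.ndim for x in inputs)
    reduced = []
    for x in inputs:
        extra = x.ndim - out_rank
        if order == "c":
            # align leading dims, accumulate over the trailing extras
            axes = tuple(range(out_rank, x.ndim))
        else:  # order == "f"
            # align trailing dims, accumulate over the leading extras
            axes = tuple(range(extra))
        reduced.append(accumulator(x, axis=axes) if extra else x)
    return reduced

# order == c: ([3, 5], [3]) => [3]; x is summed over j, as in z[i] = f(sum_j(x[i, j]), y[i])
a, b = accumulate_inputs([np.ones((3, 5)), np.ones((3,))], order="c")
# order == f: ([7, 8, 9], [8, 9]) => [8, 9]
p, q = accumulate_inputs([np.ones((7, 8, 9)), np.ones((8, 9))], order="f")
```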
...
...
@@ -27,6 +29,7 @@ The broadcast mode serves to calculate the rank of the corresponding output and
This does not cover all cases of broadcasting, but I believe it covers enough. Other cases of broadcasting can be emulated with proper transposition and/or slicing.
* Could you give some examples of what kinds of broadcasting are and are not covered by your proposed implementation?
 * For rank <= 2, I think only operations of the form {{{add(ones(3,1), ones(1,3))}}} are missing. I actually didn't think of that one before now.
* In general, it only handles f(shape(head, ...), shape(head, ...), ...) and f(shape(..., tail), shape(..., tail), ...)
* Maybe I could add a general case later... the thing is that I think the ones I am considering here are easier to streamline.
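For illustration, here are the two situations in NumPy (which handles the unit-dimension case natively; it is used here only to show the shapes involved):

```python
import numpy as np

# Covered: all inputs share a common head or a common tail of their shapes,
# i.e. f(shape(head, ...), shape(head, ...)) or f(shape(..., tail), shape(..., tail)).
tail_aligned = np.ones((7, 8, 9)) + np.ones((8, 9))   # -> shape (7, 8, 9)

# Not covered: broadcasting against interior unit dimensions, as in
# add(ones(3,1), ones(1,3)).  It can be emulated with explicit tiling:
a = np.tile(np.ones((3, 1)), (1, 3))   # -> shape (3, 3)
b = np.tile(np.ones((1, 3)), (3, 1))   # -> shape (3, 3)
c = a + b                              # every entry is 2.0
```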
...
...
@@ -71,8 +74,11 @@ An Optimizer should look at the operations in the graph and figure out whether t
The input ranks become the output ranks, and gradients of the same rank as the outputs are added to the input list. If an output was given mode {{{broadcast}}}, then all inputs used to calculate it had to be broadcasted to that shape, so we must sum over the broadcasted dimensions of the gradient. The mode that we give to those inputs is therefore {{{(accumulate, sum)}}}. Conversely, if an output was given mode {{{(accumulate, sum)}}}, then all inputs used to calculate it had to be summed over those dimensions, so we give them mode {{{broadcast}}} in grad. Accumulators other than sum might prove more difficult: for example, the ith gradient for product is grad*product/x_i. I am not sure how to handle that automatically.
* I don't exactly follow this paragraph, but I think I catch the general idea and it seems to me like it will work very well.
* In a nutshell for {{{broadcast}}} I calculate the gradient as normal assuming the shape is broadcasted and then I sum over what I had to broadcast.
* Could you explain why the accumulator gradient (e.g. product) can be trickier?
   * I thought about it and I figured that the general case is {{{g_accum[N-i+1], g_m[i] = grad_fn(accum[i-1], m[i], g_accum[N-i])}}} where {{{g_accum}}} is the accumulated gradient with respect to the accumulator {{{accum}}}. It can be short-circuited in sum's and product's cases: for sum, grad_fn is the identity on its last argument, so {{{g_m[i] == g_accum[i] == g_accum[0] == g_z for all i}}}. In product's case, {{{accum[i-1] == product(m[1:i-1]) and g_accum[N-i] == g_z * product(m[i+1:N])}}}; multiply them together and you obtain {{{g_z * product(m)/m[i]}}}, where obviously we only need to compute {{{product(m)}}} once. It's worth handling those two special cases; for the general case I don't know.
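A minimal sketch of that general recurrence (the names {{{accum_grads}}}, {{{op}}} and {{{grad_fn}}} are illustrative), checked against the product short-circuit:

```python
def accum_grads(m, g_z, op, grad_fn, init):
    """Backward pass through a left-fold accumulation.

    grad_fn(prev_accum, x, g_out) returns (g_prev_accum, g_x)."""
    n = len(m)
    # forward: accum[i] is the fold of m[:i]
    accum = [init]
    for x in m:
        accum.append(op(accum[-1], x))
    # backward: walk the fold in reverse, threading the accumulator gradient
    g_m = [None] * n
    g_acc = g_z
    for i in reversed(range(n)):
        g_acc, g_m[i] = grad_fn(accum[i], m[i], g_acc)
    return g_m

# product: z = product(m); for a*x the gradients wrt (a, x) are (g*x, g*a)
m = [2.0, 3.0, 4.0]
g = accum_grads(m, 1.0,
                lambda a, x: a * x,
                lambda a, x, g_out: (g_out * x, g_out * a),
                1.0)
# matches the short-circuit g_z * product(m) / m[i] = [12.0, 8.0, 6.0]
```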
@@ -37,61 +37,61 @@ So the proposal is to provide the missing functionality (the last three requirem
== Syntax ==
{{{
#!python
# create a random generator, providing a default seed to condition how RandomOp instances are produced.
r = MetaRandom(metaseed=872364)

# create a different random generator
rr = MetaRandom(metaseed=99)

# create an Op to produce a stream of random numbers.
# This generates random numbers uniformly between 0.0 and 1.0 excluded
# u will remember that it was made from r.
u = r.uniform(shape=(3,4,5), low=0.0, high=1.0)

# create a second Op for more random numbers
# v will remember that it was made from r.
v = r.uniform(shape=(8,), low=-1.0, high=0.0)

# create a third Op with a different underlying random state
# w will remember that it was made from rr.
w = rr.uniform(shape=(), low=-10., high=10.)

# compile a function to draw random numbers
# note: un-named state inputs will be added automatically.
# note: it is not necessary to draw samples for u, even though
# u was created by r before v.
fn_v = compile.function([], [v])

# this prints some representation of v's rng in fn_v.
# The .rng property works for Result instances produced by MetaRandom.
print fn_v.state[v.rng]

# compile a function to draw each of u, v, w
# note: un-named state inputs will be added automatically
# note: This function (especially its internal state) is independent from fn_v.
fn_uvw = compile.function([], [u,v,w])

# N.B. The random number streams of fn_v and fn_uvw are independent.
assert fn_v.state[v.rng] != fn_uvw.state[v.rng]

fn_v() # returns random numbers A (according to metaseed 872364)
fn_v() # returns different random numbers B

# note that v's stream here is identical to the one in fn_v()
fn_uvw() # returns random numbers C, A, E

# explicitly re-seed v's random stream in fn_v
r.seed(fn_v, 872364)
fn_v() # returns random numbers A (as above)
fn_v() # returns random numbers B (as above)

# re-seed w's random stream in fn_uvw, but not u's or v's
rr.seed(fn_uvw, 99)
fn_uvw() # returns random numbers D, B, E
}}}
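The semantics above are a proposal; as a rough analogy with an existing API, NumPy's {{{RandomState}}} can show the intended independence and re-seeding behaviour (mapping one state object per compiled function is an assumption of this sketch, not the proposed implementation):

```python
import numpy as np

metaseed = 872364
# each "compiled function" owns an independent state seeded from the metaseed
fn_v_state = np.random.RandomState(metaseed)
fn_uvw_state = np.random.RandomState(metaseed)

A = fn_v_state.uniform(low=-1.0, high=0.0, size=(8,))
B = fn_v_state.uniform(low=-1.0, high=0.0, size=(8,))

# fn_uvw's copy of the stream starts from the same seed, so its first
# draw reproduces A even though fn_v has already advanced past it
assert (fn_uvw_state.uniform(low=-1.0, high=0.0, size=(8,)) == A).all()

# re-seeding resets the stream, like r.seed(fn_v, 872364) above
fn_v_state = np.random.RandomState(metaseed)
assert (fn_v_state.uniform(low=-1.0, high=0.0, size=(8,)) == A).all()
```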
== {{{MetaRandom}}} ==
...
...
@@ -106,8 +106,9 @@ The use of multiple {{{MetaRandom}}} objects in a single function is mostly for
The typical case is that only one (global) {{{MetaRandom}}} object is used to produce all the random streams in a function, so seeding (once) will reset the entire function.
{{{
#!python
class MetaRandom(object):
def __init__(self, metaseed=<N>): ... # new functions will be initialized so that seed(fn, <N>) has no effect on output.
def __contains__(self, Result): ... # True if Result was returned by a call to self.<distribution>
...
...
@@ -122,14 +123,14 @@ class MetaRandom(obj):
def normal(...): ...
def bernoulli(...): ...
...
}}}
=== {{{MetaRandom.getstate}}} ===
{{{
def getstate(self, fn): ...
}}}
''return''::
list, set, dict, instance... something to store the random number generators associated with every one of {{{self}}}'s members in {{{fn}}}
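A sketch of the intended use (not runnable as-is; {{{r}}} and {{{fn_v}}} are the objects from the syntax example above, and the pairing with {{{setstate}}} is described in the next section):

{{{
#!python
# snapshot the rng state of fn_v, draw, then restore and reproduce the draw
saved = r.getstate(fn_v)
fn_v()                  # advances the state
r.setstate(fn_v, saved)
fn_v()                  # draws the same numbers again: the state was restored
}}}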
...
...
@@ -137,9 +138,10 @@ def getstate(self, fn): ...
Re-install the random number generators in {{{rstates}}} into the {{{randomobj}}} members in {{{fn}}}
{{{
def setstate(self, fn, rstates): ...
}}}
''fn''::
a CompileFunction instance, generally with some Apply instances inside that are members of {{{self}}}.