Commit 839eda94 authored by Razvan Pascanu

changes into the tutorials

Parent 0fc7c8ca
...@@ -8,8 +8,8 @@ Baby steps - Adding two numbers together

Adding two scalars
==================

So, to get us started with Theano and get a feel of what we're working with,
let's make a simple function: add two numbers together. Here is how you do
it:

>>> x = T.dscalar('x')
...@@ -26,7 +26,7 @@ array(28.4)

Let's break this down into several steps. The first step is to define
two symbols representing the quantities that you want
to add. Note that from now on, we will use the term :term:`Variable`
to mean "symbol" (in other words, ``x``, ``y``, ``z`` are all Variable
objects). The output of the function ``f`` is a ``numpy.ndarray``
...@@ -36,7 +36,6 @@ If you are following along and typing into an interpreter, you may have

noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, ``f`` was being compiled into C code.
-------------------------------------------
...@@ -64,8 +63,7 @@ TensorType(float64, scalar)

>>> x.type == T.dscalar
True

You can learn more about the structures in Theano in :ref:`graphstructures`.
By calling ``T.dscalar`` with a string argument, you create a
:term:`Variable` representing a floating-point scalar quantity with the
...
...@@ -138,6 +138,9 @@ with respect to the second. In this way, Theano can be used for

.. note::

   The second argument of ``T.grad`` can be a list, in which case it
   will return a list of gradients, one per element.
   The variable returned by ``T.grad`` has the same dimensions as the
   second argument. This is exactly like the first derivative if the
   first argument is a scalar or a tensor of size 1, but not if it is
...
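The shape claim above can be illustrated with a plain-NumPy finite-difference check (this is not Theano's machinery; ``num_grad`` is a hypothetical helper written for this sketch):

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    # central finite differences: one perturbation per entry of x
    g = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

x = np.array([1.0, 2.0, 3.0])
g = num_grad(lambda v: (v ** 2).sum(), x)   # d/dx sum(x^2) = 2x
print(g.shape)  # (3,) -- same dimensions as x, the variable differentiated against
```

As promised, the gradient has the same shape as the quantity it is taken with respect to.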
...@@ -10,7 +10,7 @@ Let's start an interactive session and import Theano.

>>> from theano import *

Many of the symbols you will need are in the ``tensor`` subpackage
of Theano. Let's import that subpackage under a handy name. I like
``T`` (and many tutorials use this convention).

>>> import theano.tensor as T
...
...@@ -8,10 +8,9 @@ NumPy refresher

Here are some quick guides to NumPy:

* `Numpy quick guide for Matlab users <http://www.scipy.org/NumPy_for_Matlab_Users>`__
* `Numpy User Guide <http://docs.scipy.org/doc/numpy/user/index.html>`__
* `More detailed Numpy tutorial <http://www.scipy.org/Tentative_NumPy_Tutorial>`__

.. [TODO: More doc, e.g. see _test_tensor.py]
...@@ -20,8 +19,10 @@ Matrix conventions for machine learning

Rows are horizontal and columns are vertical.
Every row is an example. Therefore, inputs[10,5] is a matrix of 10 examples
where each example has dimension 5. If this were the input of a
neural network, then the weights from the input to the first hidden
layer would form a matrix of size (5, #hid).
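A quick NumPy sketch of this convention (the concrete sizes here, 10 examples, 5 features, 7 hidden units, are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.standard_normal((10, 5))   # 10 examples, 5 features per example
weights = rng.standard_normal((5, 7))   # input-to-hidden weights, #hid = 7

hidden = inputs @ weights               # one row of hidden activations per example
print(hidden.shape)                     # (10, 7)
```

Each row of ``hidden`` is the hidden-layer activation for the corresponding example row of ``inputs``.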
If I have an array:

...@@ -43,3 +44,22 @@ To access the entry in the 3rd row (row #2) and the 1st column (column #0):

To remember this, keep in mind that we read left-to-right, top-to-bottom,
so each thing that is contiguous is a row. That is, there are 3 rows
and 2 columns.
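Concretely, with a 3-rows-by-2-columns array like the one described above:

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4],
                [5, 6]])    # 3 rows, 2 columns

print(arr[2, 0])  # entry in the 3rd row (row #2), 1st column (column #0): 5
```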
Broadcasting
============

Numpy does :term:`broadcasting` of numpy arrays of different shapes during
arithmetic operations. What this means in general is that the smaller
array is *broadcast* across the larger array so that they have
compatible shapes. The example below shows an instance of
*broadcasting*:

>>> a = numpy.asarray([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([2., 4., 6.])

The smaller array ``b`` in this case is *broadcast* to the same size
as ``a`` during the multiplication. This trick is often useful in
simplifying how expressions are written. More details about *broadcasting*
can be found in the `NumPy user guide <http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html>`__.
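Broadcasting also works between arrays of different rank; a small made-up example, where a 1-D row is stretched across each row of a 2-D array:

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)
v = np.array([10.0, 20.0, 30.0])  # shape (3,)

result = m + v   # v is broadcast along the first axis of m
print(result)
# [[11. 22. 33.]
#  [14. 25. 36.]]
```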
"""Provide Scan and related functions """Provide Scan and related functions
Scanning a function over sequential input(s) producing sequential output(s). Scanning a function over sequential input(s) producing sequential output(s).
Scanning is a general form of recurrence, which can be used for looping. Scanning is a general form of recurrence, which can be used for looping.
The idea is that you 'scan' a function along some input sequence, producing an output at each The idea is that you 'scan' a function along some input sequence, producing
time-step that can be seen (but not modified) by the function at the next time-step. an output at each time-step that can be seen (but not modified) by the
(Technically, the function can see the previous K time-steps.) function at the next time-step. (Technically, the function can see the
previous K time-steps.)
So for example, ``sum()`` could be computed by scanning the ``z+x_i`` function over a list, So for example, ``sum()`` could be computed by scanning the ``z+x_i``
given an initial state of ``z=0``. function over a list, given an initial state of ``z=0``.
Special cases: Special cases:
- A ``reduce()`` operation can be performed by returning only the last output of a scan. - A ``reduce()`` operation can be performed by returning only the last
output of a scan.
- A ``map()`` operation can be performed by applying a function that ignores each previous - A ``map()`` operation can be performed by applying a function that
output. ignores each previous output.
Often a for loop can be expressed as a scan() operation, and scan is the closest that theano Often a for loop can be expressed as a scan() operation, and scan is the
comes to looping. closest that theano comes to looping.
This module provides scanning functionality with the `Scan` Op. This module provides scanning functionality with the `Scan` Op.
""" """
__docformat__ = 'restructuredtext en'

import logging

import numpy
import theano
from theano.tensor import opt
from theano import gof
from theano.compile import optdb

# Logging helpers for sending warnings or info
_logger = logging.getLogger('theano.scan')

def warning(*msg):
    _logger.warning('WARNING theano.scan: ' + ' '.join(msg))

def info(*msg):
    _logger.info('INFO theano.scan: ' + ' '.join(msg))
# Hashing a list; the lists used by scan are lists of numbers, therefore a
# list can be hashed by xor-ing all of its elements
def hash_list(vals):
    hash_value = 0
    for v in vals:
        hash_value ^= v
    return hash_value

# Hashing a dictionary; the dictionaries used by scan have numbers as keys
# and either numbers or lists of numbers as values
def hash_dict(dictionary):
    hash_value = 0
    for k, v in dictionary.iteritems():
        # hash key
        hash_value ^= k
        if type(v) in (list, tuple):
            hash_value ^= hash_list(v)
        else:
            hash_value ^= v
    return hash_value
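The xor-hashing scheme above, in a quick Python 3 rendition for illustration (the module itself targets Python 2, hence ``iteritems``; the ``3``-suffixed names are invented here):

```python
def hash_list3(vals):
    # xor all the numbers in the list together
    h = 0
    for v in vals:
        h ^= v
    return h

def hash_dict3(d):
    # xor keys and (possibly list-valued) values together
    h = 0
    for k, v in d.items():
        h ^= k
        h ^= hash_list3(v) if isinstance(v, (list, tuple)) else v
    return h

print(hash_list3([1, 2, 3]))          # 1 ^ 2 ^ 3 = 0
print(hash_dict3({0: [0], 1: [-1]}))  # keys and tap values xor-ed together
```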
def scan(fn, sequences, non_sequences, seed_values, inplace_map={},
         sequences_taps={}, outputs_taps={},
         len_=theano.tensor.zero(),  # named len_ to avoid shadowing the builtin
         force_gradient=False,
         truncate_gradient=-1, go_backwards=False, mode='FAST_RUN'):
    '''Create a more intuitive interface to the scan op.

    This function first creates a scan op object, and afterwards applies it
    to the input data. The scan operation iterates over X sequences producing
    Y outputs. The function that is applied recursively may consult several
    previous outputs from the past as well as past and future values of the
    input. You can see it as having the inputs:

        X sequence inputs x_1, x_2, ... x_X
        Y seeds/initial values (u_1, u_2, ... u_Y) for the outputs
        W non-sequence inputs w_1, w_2, ... w_W

    Outputs:

        Y sequence outputs y_1, y_2, ... y_Y

    Each output y_j is computed one time step at a time according to the
    formula:

    .. code-block:: python

        (y_1[t], y_2[t], ..., y_Y[t]) = fn(
            x_1[t-K_1], ..., x_1[t], x_1[t+1], ..., x_1[t+L_1],  # x_1 past and future values
            x_2[t-K_2], ..., x_2[t], x_2[t+1], ..., x_2[t+L_2],  # x_2 past and future values
            ...,                                                 # ...
            y_1[t-1], y_1[t-2], ..., y_1[t-T_1],                 # past values of y_1
            y_2[t-1], y_2[t-2], ..., y_2[t-T_2],                 # past values of y_2
            ...,
            w_1, w_2, ..., w_W)                                  # 'timeless' inputs
    :param fn: a lambda expression or a function that, given a list of
        symbolic inputs, returns the update list and the symbolic output
        list of the function that shall be applied recursively.

    :param sequences: list of sequences over which the scan op should
        iterate; the sequences' length should also cover the past and future
        taps; for example, if a sequence uses the past tap -3 and the future
        tap +4, the total length should be n+7, where the first 3 values of
        the sequence are those corresponding to -3, -2, -1 and the last 4
        values correspond to n+1, n+2, n+3 and n+4

    :param non_sequences: list of inputs over which scan should not iterate

    :param seed_values: seeds (initial values) of the outputs; if past taps
        are used, the seeds should contain enough values to cover those past
        values; note that index 0 of a seed belongs to the largest past tap

    :param inplace_map: a dictionary telling which output should be computed
        in place of which input sequence; the input sequence has to be of
        the same shape as the output

    :param sequences_taps: a dictionary telling for each sequence what past
        and future taps it should use; past taps should be negative, future
        taps positive; by default 0 (the current value) is added to this
        dictionary if nothing is provided

    :param outputs_taps: a dictionary telling for each output what past taps
        it should use (negative values); by default -1 is added to this
        dictionary if nothing is provided

    :param len_: a value (or theano scalar) describing for how many steps
        scan should iterate; 0 means that it should iterate over the entire
        length of the input sequence(s)

    :param force_gradient: a flag telling the scan op that the gradient can
        be computed even though inplace operations or updates are used - use
        this at your own risk

    :param truncate_gradient: tells for how many steps scan should go back
        in time in the backward pass of backpropagation through time

    :param go_backwards: a flag indicating if scan should iterate from the
        end of the sequence to the beginning (if true) or from 0 to the end

    :param mode: indicates the mode that should be used to compile the
        function that will be applied recursively
    '''
    # check if inputs are just single variables instead of lists
    if not (type(sequences) in (list, tuple)):
        seqs = [sequences]
    else:
        seqs = sequences

    if not (type(seed_values) in (list, tuple)):
        seeds = [seed_values]
    else:
        seeds = seed_values

    if not (type(non_sequences) in (list, tuple)):
        non_seqs = [non_sequences]
    else:
        non_seqs = non_sequences
    # compute number of sequences and number of seeds
    n_seqs = len(seqs)

    # see if there are outputs that do not feed anything back to the
    # function applied recursively
    outs_tapkeys = outputs_taps.keys()
    outs_tapkeys.sort()
    for k in outs_tapkeys:
        if outputs_taps[k] == []:
            # add empty lists where you have outputs that do not have past
            # values
            seeds = seeds[:k] + [[]] + seeds[k:]

    n_seeds = len(seeds)

    # update sequences_taps[idx] to contain [0] if it is not defined
    for i in xrange(n_seqs):
        if not sequences_taps.has_key(i):
            sequences_taps.update({i: [0]})
        # if the input sequence is not actually used by the recursive
        # function
        elif sequences_taps[i] == []:
            del sequences_taps[i]
        elif not (type(sequences_taps[i]) in (list, tuple)):
            sequences_taps[i] = [sequences_taps[i]]

    # update outputs_taps[idx] to contain [-1] if it is not defined
    for i in xrange(n_seeds):
        if not outputs_taps.has_key(i):
            outputs_taps.update({i: [-1]})
        # if the output is not actually fed back into the recursive
        # function
        elif outputs_taps[i] == []:
            del outputs_taps[i]
        elif not (type(outputs_taps[i]) in (list, tuple)):
            outputs_taps[i] = [outputs_taps[i]]

    # create theano inputs for the recursive function
    args = []
    for (i, seq) in enumerate(seqs):
        if sequences_taps.has_key(i):
            for k in xrange(len(sequences_taps[i])):
                args += [seq[0].type()]
    for (i, seed) in enumerate(seeds):
        if outputs_taps.has_key(i):
            for k in xrange(len(outputs_taps[i])):
                args += [seed[0].type()]
    args += non_seqs
    next_outs, updates = fn(*args)

    # Create the Scan op object
    local_op = Scan((args, next_outs, updates), n_seqs, n_seeds, inplace_map,
                    sequences_taps, outputs_taps, force_gradient,
                    truncate_gradient, go_backwards, mode)

    # Call the object on the input sequences, seeds, and non-sequences
    return local_op(*([theano.tensor.as_tensor(len_)]
                      + seqs + seeds + non_seqs))
''' The class implementing the scan op.

This is the actual class. I would not recommend using it directly unless
you really know what you are doing.
'''
class Scan(theano.Op):
    def __init__(self, (inputs, outputs, updates), n_seqs, n_seeds,
                 inplace_map={}, seqs_taps={}, outs_taps={},
                 force_gradient=False, truncate_gradient=-1,
                 go_backwards=False, mode='FAST_RUN', inplace=False):
        '''
        :param inputs: list of symbolic inputs of the function that will
            be applied recursively

        :param outputs: list of symbolic outputs for the function applied
            recursively

        :param updates: list of updates for the function applied recursively

        :param n_seqs: number of sequences in the input over which scan
            needs to iterate

        :param n_seeds: number of outputs (same as the number of seeds)

        :param inplace_map: dictionary describing which output should be
            computed in place of which input

        :param seqs_taps: dictionary describing which past and future taps
            of the input sequences are used by the recursive function

        :param outs_taps: dictionary describing which past taps of the
            outputs the recursive function is using

        :param force_gradient: a flag indicating if the gradient is still
            computable even though inplace operations or updates are used

        :param truncate_gradient: if different from -1, it tells after how
            many steps to stop in the backward pass of BPTT

        :param mode: the mode used to compile the recursive function

        :param inplace: set by the optimizer that enables the inplace
            computation
        '''
        # check inplace map
        for _out, _in in inplace_map.iteritems():
            if _out > n_seeds:
                raise ValueError(('Inplace map refers to a non-existing '
                    'output %d') % _out)
            if _in > n_seqs:
                raise ValueError(('Inplace map refers to a non-existing '
                    'input sequence %d') % _in)
            if (_in >= 0) and (min(seqs_taps[_in]) < 0):
                raise ValueError(('Input sequence %d uses past values that '
                    'will be overwritten by the inplace operation') % _in)
        # check sequences' past taps
        for k, v in seqs_taps.iteritems():
            if k > n_seqs:
                raise ValueError(('Sequences past taps dictionary refers to '
                    'a non-existing sequence %d') % k)

        # check outputs' past taps
        for k, v in outs_taps.iteritems():
            if k > n_seeds:
                raise ValueError(('Outputs past taps dictionary refers to '
                    'a non-existing output %d') % k)
            if max(v) > -1:
                raise ValueError(('Can not require future value %d of '
                    'output %d') % (max(v), k))
        self.destroy_map = {}
        if inplace:
            self.destroy_map = inplace_map

        self.seqs_taps = seqs_taps
        self.outs_taps = outs_taps
        self.n_seqs = n_seqs
        self.n_seeds = n_seeds
        self.n_args = n_seqs + n_seeds + 1
        self.inplace_map = inplace_map
        self.inplace = inplace
        self.inputs = inputs
        self.outputs = outputs
        self.updates = updates
        self.force_gradient = force_gradient
        self.truncate_gradient = truncate_gradient
        self.go_backwards = go_backwards

        # compile the function that will be applied at every time step
        self.fn = theano.function(inputs, outputs,
                                  updates=updates, mode=mode)
        # build the gradient of the recursive function symbolically
        g_y = [outputs[0].type()]
        g_args = theano.tensor.grad(outputs[0], inputs, g_cost=g_y[-1])
        # for all outputs compute gradients and then sum them up
        for y in outputs[1:]:
            g_y += [y.type()]
            g_args_y = theano.tensor.grad(y, inputs, g_cost=g_y[-1])
            for i in xrange(len(g_args)):
                g_args[i] += g_args_y[i]

        self.g_ins = g_y + inputs
        self.g_outs = g_args
    def make_node(self, *inputs):
        n_args = len(inputs)
        if n_args < self.n_args:
            err = 'There should be at least ' + str(self.n_args) + ' arguments'
            raise ValueError(err)

        # Create list of output datatypes
        out_types = []
        for i in xrange(self.n_seqs + 1, self.n_seqs + self.n_seeds + 1):
            out_types += [theano.tensor.Tensor(dtype=inputs[i].dtype,
                broadcastable=(False,) + inputs[i].broadcastable[1:])()]
        return theano.Apply(self, inputs, out_types)
    def __eq__(self, other):
        rval = type(self) == type(other)
        if rval:
            rval = (self.inputs == other.inputs) and \
                   (self.outputs == other.outputs) and \
                   (self.updates == other.updates) and \
                   (self.g_ins == other.g_ins) and \
                   (self.g_outs == other.g_outs) and \
                   (self.seqs_taps == other.seqs_taps) and \
                   (self.outs_taps == other.outs_taps) and \
                   (self.inplace_map == other.inplace_map) and \
                   (self.n_seqs == other.n_seqs) and \
                   (self.inplace == other.inplace) and \
                   (self.go_backwards == other.go_backwards) and \
                   (self.truncate_gradient == other.truncate_gradient) and \
                   (self.force_gradient == other.force_gradient) and \
                   (self.n_seeds == other.n_seeds) and \
                   (self.n_args == other.n_args)
        return rval
    def __hash__(self):
        return hash(type(self)) ^ \
               hash(self.n_seqs) ^ \
               hash(self.n_seeds) ^ \
               hash(self.force_gradient) ^ \
               hash(self.inplace) ^ \
               hash(self.go_backwards) ^ \
               hash(self.truncate_gradient) ^ \
               hash(self.n_args) ^ \
               hash_list(self.outputs) ^ \
               hash_list(self.inputs) ^ \
               hash_list(self.g_ins) ^ \
               hash_list(self.g_outs) ^ \
               hash_dict(self.seqs_taps) ^ \
               hash_dict(self.outs_taps) ^ \
               hash_dict(self.inplace_map) ^ \
               hash_dict(self.updates)
    def perform(self, node, args, outs):
        n_steps = 0
        if (self.n_seqs == 0) and (args[0] == 0):
            raise ValueError('Scan does not know over how many steps it '
                'should iterate! No input sequence or number of steps to '
                'iterate given!')
        if (args[0] != 0):
            n_steps = args[0]
        for i in xrange(self.n_seqs):
            if self.seqs_taps.has_key(i):
                # compute the actual length of the sequence (we need to see
                # what past taps this sequence has, and leave room for them)
                seq_len = args[i + 1].shape[0] + min(self.seqs_taps[i])
                if max(self.seqs_taps[i]) > 0:
                    # using future values, so need to end the sequence earlier
                    seq_len -= max(self.seqs_taps[i])
                if n_steps == 0:
                    # length of the sequence, leaving room for the taps
                    n_steps = seq_len
                if seq_len != n_steps:
                    warning(('Input sequence %d has a shorter length than '
                        'the expected number of steps %d') % (i, n_steps))
                    n_steps = min(seq_len, n_steps)

        # check if we deal with an inplace operation
        inplace_map = self.inplace_map
        if not self.inplace:  # if it was not optimized to work inplace
            inplace_map = {}

        # check lengths of seeds
        for i in xrange(self.n_seqs + 1, self.n_seqs + self.n_seeds + 1):
            if self.outs_taps.has_key(i - self.n_seqs - 1):
                req_size = abs(min(self.outs_taps[i - self.n_seqs - 1])) - 1
                if args[i].shape[0] < req_size:
                    warning(('Initial state for output %d has fewer values '
                        'than required by the maximal past value %d. Scan '
                        'will use 0s for missing values')
                        % (i - self.n_seqs - 1, req_size))

        self.n_steps = n_steps
        y = self.scan(self.fn, args[1:], self.n_seqs, self.n_seeds,
                      self.seqs_taps, self.outs_taps, n_steps,
                      self.go_backwards, inplace_map)

        # write to storage
        for i in xrange(self.n_seeds):
            outs[i][0] = y[i]
def perform(self,node,args, outs):
# find number of timesteps, note that a precondition is to have
# atleast one input to iterate over
n_steps = len(args[0])
# check if we deal with a inplace operation def scan(fn, args, n_seqs, n_seeds, seqs_taps, outs_taps, n_steps,
n_inplace = self.n_inplace go_backwards, inplace_map):
n_inplace_ignore = self.n_inplace_ignore y = []
if not self.inplace: #if it was not optimized to work inplace for i in xrange(self.n_seeds):
n_inplace = 0 if inplace_map.has_key(i) and (inplace_map[i] >= 0):
y += [args[inplace_map[i]]]
else:
y_shape = (n_steps,)+args[i+self.n_seqs].shape[1:]
y += [numpy.empty(y_shape,
dtype=args[i+self.n_seqs].dtype)]
#iterate
if go_backwards:
the_range = xrange(n_steps-1,-1,-1)
else:
the_range = xrange(n_steps)
seqs_mins = {}
for j in xrange(self.n_seqs):
if seqs_taps.has_key(j):
seqs_mins.update({j: min(seqs_taps[j])})
# check lengths of inputs outs_mins = {}
for i in xrange(self.n_ins): seed_size = {}
if args[i].shape[0] != n_steps: for j in xrange(self.n_seeds):
raise ValueError('All inputs should have n_steps length!') if outs_taps.has_key(j):
outs_mins.update({j: min(outs_taps[j])})
seed_size.update({j: args[n_seqs+j].shape[0]})
        for i in the_range:
            fn_args = []
            # sequences over which scan iterates
            for j in xrange(self.n_seqs):
                if seqs_taps.has_key(j):
                    ls_taps = seqs_taps[j]
                    min_tap = seqs_mins[j]
                    for tap_value in ls_taps:
                        k = i - min_tap + tap_value
                        fn_args += [args[j][k]]
            # seeds or past values of outputs
            for j in xrange(self.n_seeds):
                if outs_taps.has_key(j):
                    ls_taps = outs_taps[j]
                    min_tap = outs_mins[j]
                    seed_sz = seed_size[j]
                    for tap_value in ls_taps:
                        if i + tap_value < 0:
                            k = i + seed_sz + tap_value
                            if k < 0:
                                # past value not provided; warn and use 0s
                                fn_args += [numpy.zeros(args[n_seqs+j][0].shape)]
                                warning('Past value %d for output %d not given '
                                        'in seeds' % (tap_value, j))
                            else:
                                fn_args += [args[n_seqs+j][k]]
                        else:
                            fn_args += [y[j][i + tap_value]]
            # get the non-iterable parameters
            fn_args += list(args[(self.n_seqs+self.n_seeds):])
            # compute output
            something = fn(*fn_args)
            # update outputs
            for j in xrange(self.n_seeds):
                y[j][i] = something[j]
        return y
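The index arithmetic above (`k = i - min_tap + tap_value`) assumes each tapped array carries `-min(taps)` extra leading entries of history, so that tap 0 at step 0 lands just past the history. A minimal plain-Python sketch of this lookup (`tap_slice` is an illustrative name, not part of the Theano code):

```python
def tap_slice(seq, i, taps):
    """Values of `seq` needed at step i for the given non-positive taps,
    assuming `seq` was padded with -min(taps) leading history entries."""
    min_tap = min(taps)
    # k = i - min_tap + tap_value: tap 0 of step 0 maps to index -min_tap
    return [seq[i - min_tap + t] for t in taps]

# sequence with 2 steps of history prepended: history = [10, 20]
seq = [10, 20, 1, 2, 3]
print(tap_slice(seq, 0, [-2, 0]))  # step 0 sees [10, 1]
```

With taps `[-2, 0]`, step 1 analogously sees `[20, 2]`: the current value plus the value two steps back.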
    def grad(self, args, g_outs):
        if (not self.force_gradient) and \
           ((self.updates.keys() != []) or (self.inplace_map.keys() != [])):
            warning('Can not compute gradients if inplace or updates '
                    'are used. Use force_gradient if you know for sure '
                    'that the gradient can be computed automatically.')
            return [None for i in args]
        else:
            # forward pass
            y = self(*args)
            if not (type(y) in (list, tuple)):
                y = [y]
            # backwards pass
            for i in xrange(len(y)):
                if g_outs[i] == None:
                    g_outs[i] = theano.tensor.zeros_like(y[i])
            g_args = [self.n_steps] + g_outs + y
            # check if go_backwards is true
            if self.go_backwards:
                for seq in args[1:self.n_seqs]:
                    g_args += [seq[::-1]]
            else:
                g_args += args[1:self.n_seqs]
            g_args += args[1+self.n_seqs:]
            g_scan = ScanGrad((self.g_ins, self.g_outs), self.n_seqs,
                              self.n_seeds, self.seqs_taps, self.outs_taps,
                              self.truncate_gradient)
            return g_scan(*g_args)
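`grad` builds the input list for `ScanGrad` purely by concatenation: the step count first, then the gradients from above, the forward outputs, the (possibly reversed) sequences, and the remaining arguments. A small sketch of that layout using placeholder lists instead of Theano variables (`build_grad_args` is an illustrative name):

```python
def build_grad_args(n_steps, g_outs, y, seqs, non_seqs, go_backwards=False):
    # mirror of: g_args = [self.n_steps] + g_outs + y + seqs + non_seqs
    if go_backwards:
        seqs = [s[::-1] for s in seqs]  # iterate sequences in reverse
    return [n_steps] + g_outs + y + seqs + non_seqs

args = build_grad_args(3, ['g0'], ['y0'], [[1, 2, 3]], ['w'],
                       go_backwards=True)
print(args)  # [3, 'g0', 'y0', [3, 2, 1], 'w']
```

Because the layout is positional, the gradient op below can recover each group of inputs by slicing with `n_outs` and `n_seqs` alone.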
@gof.local_optimizer([None])
def scan_make_inplace(node):
    op = node.op
    if isinstance(op, Scan) and (not op.inplace) \
            and (op.inplace_map.keys() != []):
        return Scan((op.inputs, op.outputs, op.updates), op.n_seqs,
                    op.n_seeds, op.inplace_map, op.seqs_taps, op.outs_taps,
                    op.force_gradient, op.truncate_gradient,
                    op.go_backwards, inplace=True
                    ).make_node(*node.inputs).outputs
    return False

optdb.register('scan_make_inplace', opt.in2out(scan_make_inplace,
               ignore_newtrees=True), 75, 'fast_run', 'inplace')
@@ -428,144 +587,160 @@
class ScanGrad(theano.Op):
    """Gradient Op for Scan"""
    def __init__(self, (g_ins, g_outs), n_seqs, n_outs,
                 seqs_taps={}, outs_taps={}, truncate_gradient=-1):
        self.grad_fn = theano.function(g_ins, g_outs)
        self.inputs = g_ins
        self.outputs = g_outs
        self.n_seqs = n_seqs
        self.n_outs = n_outs
        self.seqs_taps = seqs_taps
        self.outs_taps = outs_taps
        self.truncate_gradient = truncate_gradient
        self.destroy_map = {}

    def __eq__(self, other):
        rval = type(self) == type(other)
        if rval:
            rval = (self.inputs == other.inputs) and \
                   (self.outputs == other.outputs) and \
                   (self.n_seqs == other.n_seqs) and \
                   (self.n_outs == other.n_outs) and \
                   (self.truncate_gradient == other.truncate_gradient) and \
                   (self.seqs_taps == other.seqs_taps) and \
                   (self.outs_taps == other.outs_taps)
        return rval

    def __hash__(self):
        return hash(type(self)) ^ \
               hash(self.n_seqs) ^ \
               hash(self.n_outs) ^ \
               hash(self.truncate_gradient) ^ \
               hash_list(self.inputs) ^ \
               hash_list(self.outputs) ^ \
               hash_dict(self.seqs_taps) ^ \
               hash_dict(self.outs_taps)
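The `hash_list` and `hash_dict` helpers that `__hash__` relies on are not defined in this hunk. One plausible sketch, hypothetical but matching the XOR-folding style of the surrounding code (note that XOR folding is order-insensitive, which is fine for hashing as long as `__eq__` stays the authority on equality):

```python
def hash_list(l):
    # XOR-fold the element hashes together
    h = 0
    for v in l:
        h ^= hash(v)
    return h

def hash_dict(d):
    # fold key and value hashes; values may be lists of taps
    h = 0
    for k, v in d.items():
        h ^= hash(k)
        h ^= hash_list(v) if isinstance(v, (list, tuple)) else hash(v)
    return h
```

Two dicts mapping the same keys to permutations of the same taps hash equal here; that only risks extra hash collisions, never incorrect inequality.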
    def make_node(self, *args):
        # input of the gradient op :
        # | n_steps | g_outs | y      | seqs   | outs   | non_seqs |
        # | 1       | n_outs | n_outs | n_seqs | n_outs | unknown  |
        # return
        # | grad of seqs | grad of outs | grad of non_seqs |
        # | n_seqs       | n_outs       | unknown          |
        return theano.Apply(self, list(args),
                            [i.type() for i in args[1+2*self.n_outs:]])
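`ScanGrad` receives one flat list of inputs: the step count, then the gradients from above, the forward outputs, the sequences, the initial states (seeds), and the remaining arguments. Slicing that list back apart, as `perform` does, can be sketched with plain lists (indices assume the step count sits at position 0; names are illustrative):

```python
def split_grad_inputs(args, n_outs, n_seqs):
    # | n_steps | g_outs | y | seqs | seeds | non_seqs |
    g_outs = args[1:1 + n_outs]
    y = args[1 + n_outs:1 + 2 * n_outs]
    rest = args[1 + 2 * n_outs:]
    seqs = rest[:n_seqs]
    seeds = rest[n_seqs:n_seqs + n_outs]
    non_seqs = rest[n_seqs + n_outs:]
    return g_outs, y, seqs, seeds, non_seqs

print(split_grad_inputs(['n', 'g1', 'g2', 'y1', 'y2', 's1', 'sd1', 'sd2', 'w'],
                        n_outs=2, n_seqs=1))
```

Only `n_outs` and `n_seqs` are needed to recover every group; the tail of non-sequence arguments can stay unknown in length, which is why `make_node` types the outputs from `args[1+2*self.n_outs:]`.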
    def perform(self, node, args, storage):
        # get scan inputs
        n_steps = args[0]
        inputs = args[2*self.n_outs+1:]
        seqs = inputs[:self.n_seqs]
        seeds = inputs[self.n_seqs:self.n_seqs+self.n_outs]
        non_seqs = inputs[self.n_seqs+self.n_outs:]
        # generate space for the gradients
        g_seqs = [numpy.zeros_like(k) for k in seqs]
        g_seeds = [numpy.zeros_like(k) for k in seeds]
        g_non_seqs = [numpy.zeros_like(k) for k in non_seqs]
        # get gradient from above
        g_outs = args[1:1+self.n_outs]
        # get the output of the scan operation
        outs = args[1+self.n_outs:1+2*self.n_outs]
        # go back through time to 0, or stop early after
        # n_steps - truncate_gradient when the gradient is truncated
        lower_limit = n_steps - self.truncate_gradient
        if lower_limit > n_steps-1 or lower_limit < -1:
            the_range = xrange(n_steps-1, -1, -1)
        else:
            the_range = xrange(n_steps-1, lower_limit, -1)
        seqs_mins = {}
        for j in xrange(self.n_seqs):
            if self.seqs_taps.has_key(j):
                seqs_mins.update({j: min(self.seqs_taps[j])})
        outs_mins = {}
        seed_size = {}
        for j in xrange(self.n_outs):
            if self.outs_taps.has_key(j):
                outs_mins.update({j: min(self.outs_taps[j])})
                seed_size.update({j: g_seeds[j].shape[0]})
        for i in the_range:
            # time slice of inputs
            _ins = []
            for j in xrange(self.n_seqs):
                if self.seqs_taps.has_key(j):
                    ls_taps = self.seqs_taps[j]
                    min_tap = seqs_mins[j]
                    for tap_value in ls_taps:
                        k = i - min_tap + tap_value
                        _ins += [seqs[j][k]]
            # time slice of outputs + taps
            _outs = []
            for j in xrange(self.n_outs):
                if self.outs_taps.has_key(j):
                    ls_taps = self.outs_taps[j]
                    min_tap = outs_mins[j]
                    seed_sz = seed_size[j]
                    for tap_value in ls_taps:
                        if i + tap_value < 0:
                            k = i + seed_sz + tap_value
                            if k < 0:
                                # past value not provided; warn and use 0
                                _outs += [numpy.zeros(seeds[j][0].shape)]
                                warning('Past value %d for output %d not given'
                                        % (tap_value, j))
                            else:
                                _outs += [seeds[j][k]]
                        else:
                            _outs += [outs[j][i + tap_value]]
            g_out = [arg[i] for arg in g_outs]
            grad_args = g_out + _ins + _outs + non_seqs
            grads = self.grad_fn(*grad_args)
            # accumulate gradients for the sequences
            pos = 0
            for j in xrange(self.n_seqs):
                if self.seqs_taps.has_key(j):
                    ls_taps = self.seqs_taps[j]
                    min_tap = seqs_mins[j]
                    for tap_value in ls_taps:
                        k = i - min_tap + tap_value
                        g_seqs[j][k] += grads[pos]
                        pos += 1
            # accumulate gradients for the outputs
            for j in xrange(self.n_outs):
                if self.outs_taps.has_key(j):
                    ls_taps = self.outs_taps[j]
                    min_tap = outs_mins[j]
                    seed_sz = seed_size[j]
                    for tap_value in ls_taps:
                        if i + tap_value < 0:
                            k = i + seed_sz + tap_value
                            if k >= 0:
                                g_seeds[j][k] += grads[pos]
                            pos += 1
            for j in xrange(len(g_non_seqs)):
                g_non_seqs[j] += grads[j+pos]
        # return the gradient
        for i, v in enumerate(g_seqs + g_seeds + g_non_seqs):
            storage[i][0] = v

@gof.local_optimizer([None])
def grad_scan_make_inplace(node):
op = node.op
if isinstance(op, ScanGrad) and (not op.inplace):
return ScanGrad(op.grad_fn, op.n_ins, op.n_outs, op.taps,
inplace=True).make_node(*node.inputs).outputs
return False
optdb.register('grad_scan_make_inplace', opt.in2out(grad_scan_make_inplace,\
ignore_newtrees=True), 75, 'fast_run', 'inplace')
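With `truncate_gradient` set, the backward loop in `ScanGrad.perform` stops early instead of unrolling all the way to step 0. The range selection can be sketched in plain Python (`backward_range` is an illustrative name; it replicates the `lower_limit` logic above, including its off-by-one flavor, rather than defining official behavior):

```python
def backward_range(n_steps, truncate_gradient=-1):
    # go from n_steps-1 down to 0, unless truncation gives a
    # valid cutoff strictly inside [0, n_steps-1]
    lower_limit = n_steps - truncate_gradient
    if lower_limit > n_steps - 1 or lower_limit < -1:
        return list(range(n_steps - 1, -1, -1))
    return list(range(n_steps - 1, lower_limit, -1))

print(backward_range(5))     # [4, 3, 2, 1, 0]  (default: full BPTT)
print(backward_range(5, 2))  # [4]              (truncated)
```

The default `truncate_gradient=-1` makes `lower_limit = n_steps + 1`, which falls outside the valid window, so the full range is used; an overly large truncation value likewise falls back to full backpropagation through time.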
@@ -7,8 +7,6 @@ import random
import numpy.random
from theano.tests import unittest_tools as utt

def verify_grad(op, pt, n_tests=2, rng=None, eps=None, tol=None,
                mode=None, cast_to_output_type=False):
    pt = [numpy.array(p) for p in pt]

@@ -75,455 +73,21 @@
class T_Scan(unittest.TestCase):
def setUp(self):
utt.seed_rng()
x_1 = theano.tensor.dscalar('x_1')
self.my_f = theano.function([x_1],[x_1]) #dummy function
# Naming convention :
# u_1,u_2,.. -> inputs, arrays to iterate over
# x_1,x_2,.. -> outputs at t-1 that are required in the recurrent
# computation
# iu_1,iu_2,.. -> inplace inputs, inputs that are being replaced by
# outputs during computation
# du_1,du_2,.. -> dummy inputs used to do inplace computation, they
# are not passed to my_f
# ix_1,ix_2,.. -> inplace outputs at t-1
# x_1_next,.. -> outputs at t
# ix_1_next,.. -> inplace outputs at time t
# w_1,w_2,.. -> weights, parameters over which scan does not iterate
# my_f -> compiled function that will be applied recurrently
# my_op -> operator class
# final_f -> compiled function that applies the Scan operation
# out_1,.. -> outputs of the Scan operation
###################################################################
def test_numberOfIterableInputs(self):
def t1():
my_op = Scan.compiled(self.my_f,-1,1)
def t2():
my_op = Scan.compiled(self.my_f,0,1)
self.failUnlessRaises(ValueError,t1)
self.failUnlessRaises(ValueError,t2)
###################################################################
def test_numberOfOutputs(self):
def t1():
my_op = Scan.compiled(self.my_f,1,-1)
def t2():
my_op = Scan.compiled(self.my_f,1,0)
self.failUnlessRaises(ValueError,t1)
self.failUnlessRaises(ValueError,t2)
#####################################################################
def test_numberOfInplaceOutputs(self):
def t1():
my_op =Scan.compiled(self.my_f,1,1,n_inplace = -1)
def t2():
my_op =Scan.compiled(self.my_f,1,1,n_inplace = 2)
def t3():
my_op =Scan.compiled(self.my_f,2,1,n_inplace=2)
def t4():
my_op =Scan.compiled(self.my_f,1,2,n_inplace=2)
def t5():
my_op =Scan.compiled(self.my_f,1,1,n_inplace=1,n_inplace_ignore=2)
self.failUnlessRaises(ValueError,t1)
self.failUnlessRaises(ValueError,t2)
self.failUnlessRaises(ValueError,t3)
self.failUnlessRaises(ValueError,t4)
self.failUnlessRaises(ValueError,t5)
#####################################################################
def test_taps(self):
def t1():
my_op = Scan.compiled(self.my_f,1,1, taps={2:[3]})
def t2():
my_op = Scan.compiled(self.my_f,1,2, taps={0:[0]})
def t3():
my_op = Scan.compiled(self.my_f,1,2, taps={0:[1]})
self.failUnlessRaises(ValueError,t1)
self.failUnlessRaises(ValueError,t2)
self.failUnlessRaises(ValueError,t3)
#####################################################################
def test_makeNode(self):
def t1():
######### Test inputs of different lengths
# define the function that is applied recurrently
u_1 = theano.tensor.dscalar('u_1')
u_2 = theano.tensor.dscalar('u_2')
x_1 = theano.tensor.dscalar('x_1')
x_1_next = u_1+u_2*x_1
my_f = theano.function([u_1,u_2,x_1],[x_1_next])
# define the function that applies the scan operation
my_op = Scan.compiled(my_f,2,1)
u_1 = theano.tensor.dvector('u_1')
u_2 = theano.tensor.dvector('u_2')
x_1 = theano.tensor.dvector('x_1')
x_1_next = my_op(u_1,u_2,x_1)
final_f = theano.function([u_1,u_2,x_1],[x_1_next])
# test the function final_f
u_1 = numpy.random.rand(3)
u_2 = numpy.random.rand(2)
x_1 = [numpy.random.rand()]
out = final_f(u_1,u_2,x_1)
def t2():
######### Test function does not return correct number of outputs
# define the function that is applied recurrently
u_1 = theano.tensor.dscalar('u_1')
x_1 = theano.tensor.dscalar('x_1')
x_1_next = u_1 * x_1
my_f = theano.function([u_1,x_1],[x_1_next])
# define the function that applies the scan operation
my_op = Scan.compiled(my_f,1,2)
u_1 = theano.tensor.dvector('u_1')
x_1 = theano.tensor.dvector('x_1')
x_2 = theano.tensor.dvector('x_2')
x_1_next,x_2_next = my_op(u_1,x_1,x_2)
final_f = theano.function([u_1,x_1,x_2],[x_1_next,x_2_next])
#generate data
u_1 = numpy.random.rand(3)
x_1 = [numpy.random.rand()]
x_2 = [numpy.random.rand()]
out_1,out_2 = final_f(u_1,x_1,x_2)
self.failUnlessRaises(ValueError,t1)
self.failUnlessRaises(TypeError,t2)
#####################################################################
def test_generator(self):
# compile my_f
u_1 = theano.tensor.dscalar('u_1') # dummy input,
# required if no inplace is used!
x_1 = theano.tensor.dscalar('x_1')
w_1 = theano.tensor.dscalar('w_1')
x_1_next = x_1*w_1
my_f = theano.function([u_1,x_1,w_1],[x_1_next])
# create operation
my_op = Scan.compiled(my_f,1,1)
u_1 = theano.tensor.dvector('u_1') # dummy input, there is no
#inplace, so output will not be put in place of this u_1!
x_1 = theano.tensor.dvector('x_1')
w_1 = theano.tensor.dscalar('w_1')
x_1_next = my_op(u_1,x_1,w_1)
final_f = theano.function([u_1,x_1,w_1],[x_1_next])
#generate data
x_1 = numpy.ndarray(3) # dummy input; only determines for how many
# time steps to run recursively
out_1 = final_f(x_1,[2],2)
self.failUnless(numpy.all(out_1 == numpy.asarray([4,8,16])))
#####################################################################
def test_generator_inplace_no_ignore(self):
# compile my_f
u_1 = theano.tensor.dscalar('u_1')
x_1 = theano.tensor.dscalar('x_1')
w_1 = theano.tensor.dscalar('w_1')
x_1_next = x_1*w_1
my_f = theano.function([u_1,x_1,w_1],[x_1_next])
# create operation
my_op = Scan.compiled(my_f,1,1,n_inplace=1)
iu_1 = theano.tensor.dvector('iu_1')
ix_1 = theano.tensor.dvector('ix_1')
w_1 = theano.tensor.dscalar('w_1')
ix_1_next= my_op(iu_1,ix_1,w_1)
final_f = theano.function([theano.In(iu_1, mutable=True),ix_1,w_1],
[ix_1_next], mode='FAST_RUN')
#generate data
iu_1 = numpy.ndarray(3)
out_1 = final_f(iu_1,[2],2)
# not concretely implemented yet ..
self.failUnless(numpy.all(out_1 == numpy.asarray([4,8,16])))
self.failUnless(numpy.all(out_1 == iu_1))
#####################################################################
def test_generator_inplace_no_ignore_2states(self):
# compile my_f
u_1 = theano.tensor.dscalar('u_1')
u_2 = theano.tensor.dscalar('u_2')
x_1 = theano.tensor.dscalar('x_1')
x_2 = theano.tensor.dscalar('x_2')
w_1 = theano.tensor.dscalar('w_1')
x_1_next = x_1*w_1
x_2_next = x_2*w_1
my_f = theano.function([u_1,u_2,x_1,x_2,w_1],[x_1_next,x_2_next])
# create operation
my_op = Scan.compiled(my_f,2,2,n_inplace=2)
iu_1 = theano.tensor.dvector('iu_1')
iu_2 = theano.tensor.dvector('iu_2')
ix_1 = theano.tensor.dvector('ix_1')
ix_2 = theano.tensor.dvector('ix_2')
w_1 = theano.tensor.dscalar('w_1')
ix_1_next,ix_2_next= my_op(iu_1,iu_2,ix_1,ix_2,w_1)
final_f = theano.function([theano.In(iu_1, mutable=True),
theano.In(iu_2, mutable=True),ix_1,ix_2,
w_1],[ix_1_next,ix_2_next], mode='FAST_RUN')
#generate data
iu_1 = numpy.ndarray(3)
iu_2 = numpy.ndarray(3)
out_1,out_2 = final_f(iu_1,iu_2,[2],[1],2)
# not concretely implemented yet ..
self.failUnless(numpy.all(out_1 == numpy.asarray([4,8,16])))
self.failUnless(numpy.all(out_1 == iu_1))
self.failUnless(numpy.all(out_2 == numpy.asarray([2,4,8])))
self.failUnless(numpy.all(out_2 == iu_2))
#######################################################################
def test_generator_inplace(self):
#compile my_f
u_1 = theano.tensor.dscalar('u_1')
x_1 = theano.tensor.dscalar('x_1')
x_2 = theano.tensor.dscalar('x_2')
x_1_next = u_1 + x_1
x_2_next = x_1 * x_2
my_f = theano.function([u_1,x_1,x_2],[x_1_next,x_2_next])
# create operation
my_op = Scan.compiled(my_f,2,2,n_inplace=2,n_inplace_ignore=1)
du_1 = theano.tensor.dvector('du_1')
iu_1 = theano.tensor.dvector('iu_1')
ix_1 = theano.tensor.dvector('ix_1')
ix_2 = theano.tensor.dvector('ix_2')
ix_1_next,ix_2_next = my_op(du_1,iu_1,ix_1,ix_2)
final_f=theano.function([theano.In(du_1, mutable = True),
theano.In(iu_1, mutable = True),
ix_1,ix_2],[ix_1_next,ix_2_next],mode='FAST_RUN')
# generate data
du_1 = numpy.asarray([0.,0.,0.])
iu_1 = numpy.asarray([1.,1.,1.])
ix_1 = [1]
ix_2 = [1]
out_1,out_2 = final_f(du_1,iu_1,ix_1,ix_2)
self.failUnless(numpy.all(out_1 == numpy.asarray([2,3,4])))
self.failUnless(numpy.all(out_2 == numpy.asarray([1,2,6])))
self.failUnless(numpy.all(out_1 == du_1))
self.failUnless(numpy.all(out_2 == iu_1))
#####################################################################
def test_iterateOnlyOverX(self):
u_1 = theano.tensor.dscalar('u_1')
x_1 = theano.tensor.dscalar('x_1')
x_1_next = u_1*x_1
my_f = theano.function([u_1,x_1],[x_1_next])
my_op = Scan.compiled(my_f,1,1)
u_1 = theano.tensor.dvector('u_1')
x_1 = theano.tensor.dvector('x_1')
x_1_next = my_op(u_1,x_1)
final_f = theano.function([x_1,u_1],[x_1_next])
u_1 = numpy.asarray([2,2,2])
out_1 = final_f([2], u_1)
self.failUnless(numpy.all(out_1==numpy.asarray([4,8,16])))
#####################################################################
def test_iterateOverSeveralInputs(self):
u_1 = theano.tensor.dscalar('u_1') # input 1
u_2 = theano.tensor.dscalar('u_2') # input 2
x_1 = theano.tensor.dscalar('x_1') # output
x_1_next = (u_1+u_2)*x_1
my_f = theano.function([u_1,u_2,x_1],[x_1_next])
my_op = Scan.compiled(my_f,2,1)
u_1 = theano.tensor.dvector('u_1')
u_2 = theano.tensor.dvector('u_2')
x_1 = theano.tensor.dvector('x_1')
x_1_next = my_op(u_1,u_2,x_1)
final_f = theano.function([u_1,u_2,x_1],[x_1_next])
u_1 = numpy.asarray([1,1,1])
u_2 = numpy.asarray([1,1,1])
x_1 = [2]
out_1 = final_f(u_1,u_2,x_1)
self.failUnless(numpy.all(out_1==numpy.asarray([4,8,16])))
#####################################################################
def test_iterateOverSeveralInputsSeveralInplace(self):
iu_1 = theano.tensor.dscalar('iu_1')
u_1 = theano.tensor.dscalar('u_1')
u_2 = theano.tensor.dscalar('u_2')
u_3 = theano.tensor.dscalar('u_3')
u_4 = theano.tensor.dscalar('u_4')
ix_1 = theano.tensor.dscalar('ix_1')
ix_2 = theano.tensor.dscalar('ix_2')
x_1 = theano.tensor.dscalar('x_1')
w_1 = theano.tensor.dscalar('w_1')
ix_1_next = u_3 + u_4
ix_2_next = ix_1 + ix_2
x_1_next = x_1 + u_3 + u_4 + ix_1 + ix_2
my_f = theano.function([iu_1,u_1,u_2,u_3,u_4,ix_1,ix_2,x_1,w_1],\
[ix_1_next,ix_2_next, x_1_next])
my_op = Scan.compiled(my_f,6,3, n_inplace=2,\
n_inplace_ignore=1)
du_1 = theano.tensor.dvector('du_1')
iu_1 = theano.tensor.dvector('iu_1')
u_1 = theano.tensor.dvector('u_1')
u_2 = theano.tensor.dvector('u_2')
u_3 = theano.tensor.dvector('u_3')
u_4 = theano.tensor.dvector('u_4')
x_1 = theano.tensor.dvector('x_1')
ix_1 = theano.tensor.dvector('ix_1')
ix_2 = theano.tensor.dvector('ix_2')
w_1 = theano.tensor.dscalar('w_1')
[ix_1_next,ix_2_next,x_1_next]= \
my_op(du_1,iu_1,u_1,u_2,u_3,u_4,x_1,ix_1,ix_2,w_1)
final_f=theano.function([theano.In(du_1, mutable = True),
theano.In(iu_1, mutable = True),
u_1,u_2,u_3,u_4,ix_1,ix_2,x_1,w_1],
[ix_1_next,ix_2_next,
x_1_next],mode='FAST_RUN')
#generate data
du_1 = numpy.asarray([0.,0.,0.])
iu_1 = numpy.asarray([0.,1.,2.])
u_1 = numpy.asarray([1.,2.,3.])
u_2 = numpy.asarray([1.,1.,1.])
u_3 = numpy.asarray([2.,2.,2.])
u_4 = numpy.asarray([3.,2.,1.])
x_1 = [1.]
ix_1 = [1.]
ix_2 = [1.]
w_1 = 2.
out_1,out_2,out_3 = final_f(du_1,iu_1,u_1,u_2,u_3,u_4,\
ix_1,ix_2,x_1,w_1)
self.failUnless(numpy.all(out_3 == numpy.asarray([8.,19.,33.])))
self.failUnless(numpy.all(out_1 == numpy.asarray([5.,4.,3.])))
self.failUnless(numpy.all(out_2 == numpy.asarray([2.,7.,11.])))
self.failUnless(numpy.all(out_1 == du_1))
self.failUnless(numpy.all(out_2 == iu_1))
#####################################################################
def test_computeInPlaceArguments(self):
u_1 = theano.tensor.dscalar('u_1')
x_1 = theano.tensor.dscalar('x_1')
w_1 = theano.tensor.dscalar('w_1')
x_1_next = u_1*w_1+x_1
my_f = theano.function([u_1,x_1,theano.In(w_1,update=w_1*2)],
[x_1_next])
my_op = Scan.compiled(my_f,1,1)
u_1 = theano.tensor.dvector('u_1')
x_1 = theano.tensor.dvector('x_1')
w_1 = theano.tensor.dscalar('w_1')
x_1_next = my_op(u_1,x_1,w_1)
final_f = theano.function([u_1,x_1,w_1], [x_1_next])
u_1 = [1.,1.,1.]
x_1 = [1.]
w_1 = 1.
out_1 = final_f(u_1,x_1,w_1)
self.failUnless(numpy.all(out_1 == numpy.asarray([2,4,8])))
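The expected `[2, 4, 8]` in test_computeInPlaceArguments relies on `theano.In(w_1, update=w_1*2)` doubling the shared weight after every call, so the effective recurrence is x_t = u_t * w_t + x_{t-1} with w doubling each step. A plain-Python check of that arithmetic (function name is illustrative):

```python
def run_doubling_weight(u, x0, w0):
    x, w, out = x0, w0, []
    for u_t in u:
        x = u_t * w + x   # same update rule as my_f above
        w = w * 2         # mirrors theano.In(w_1, update=w_1*2)
        out.append(x)
    return out

print(run_doubling_weight([1., 1., 1.], 1., 1.))  # [2.0, 4.0, 8.0]
```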
#####################################################################
def test_timeTaps(self):
u_1 = theano.tensor.dscalar('u_1')
x_1 = theano.tensor.dscalar('x_1')
x_1_t2 = theano.tensor.dscalar('x_1_t2')
x_1_t4 = theano.tensor.dscalar('x_1_t4')
x_1_next = u_1+x_1+x_1_t2+x_1_t4
my_f = theano.function([u_1,x_1,x_1_t2,x_1_t4],[x_1_next])
my_op = Scan.compiled(my_f,1,1,taps={0:[2,4]})
u_1 = theano.tensor.dvector('u_1')
x_1 = theano.tensor.dvector('x_1')
x_1_next = my_op(u_1,x_1)
final_f = theano.function([u_1,x_1],[x_1_next])
u_1 = [1.,1.,1.,1.,1.]
x_1 = [1.,2.,3.,4.]
out_1 = final_f(u_1,x_1)
self.failUnless(numpy.all(out_1==numpy.asarray([9.,16.,29.,50.,89.])))
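The expected values in test_timeTaps follow from the recurrence x_t = u_t + x_{t-1} + x_{t-2} + x_{t-4}, with the four initial-state values seeding the history. A plain-Python check of the asserted `[9., 16., 29., 50., 89.]` (function name is illustrative):

```python
def taps_recurrence(u, init):
    hist = list(init)  # x_{-4}..x_{-1}, e.g. [1., 2., 3., 4.]
    out = []
    for u_t in u:
        x = u_t + hist[-1] + hist[-2] + hist[-4]
        hist.append(x)
        out.append(x)
    return out

print(taps_recurrence([1.]*5, [1., 2., 3., 4.]))
# [9.0, 16.0, 29.0, 50.0, 89.0]
```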
#####################################################################
def test_constructFunction(self):
u_1 = theano.tensor.dscalar('u_1')
x_1 = theano.tensor.dscalar('x_1')
x_1_next = u_1 + x_1
my_op = Scan.symbolic(([u_1,x_1],x_1_next),1,1)
u_1 = theano.tensor.dvector('u_1')
x_1 = theano.tensor.dvector('x_1')
x_1_next = my_op(u_1,x_1)
final_f = theano.function([u_1,x_1],[x_1_next])
u_1 = [1.,1.,1.]
x_1 = [1.]
out_1 = final_f(u_1,x_1)
self.failUnless(numpy.all(out_1==numpy.asarray([2.,3.,4.])))
######################################################################
def test_gradOneInputOneOutput(self):
u_1 = theano.tensor.dscalar('u_1')
x_1 = theano.tensor.dscalar('x_1')
x_1_next = u_1*x_1
my_op = Scan.symbolic( ([u_1,x_1],x_1_next), 1,1)
u_1 = [1.,2.,3.]
x_1 = [1.]
verify_grad( my_op , [u_1,x_1] )
#######################################################################
def test_gradManyInputsManyOutputs(self):
u_1 = theano.tensor.dscalar('u_1')
u_2 = theano.tensor.dscalar('u_2')
x_1 = theano.tensor.dscalar('x_1')
x_2 = theano.tensor.dscalar('x_2')
x_1_next = x_1*u_1+x_2
x_2_next = x_2*u_2+x_1
my_op = Scan.symbolic( ([u_1,u_2,x_1,x_2],
[x_1_next,x_2_next]),
2,2)
u_1 = [1.,.2,3.]
u_2 = [1.5,1.25,.35]
x_1 = [.5]
x_2 = [.65]
verify_grad(my_op, [u_1,u_2,x_1,x_2])
######################################################################
def test_gradTimeTaps(self):
u_1 = theano.tensor.dscalar('u_1')
x_1 = theano.tensor.dscalar('x_1')
x_1_t_2 = theano.tensor.dscalar('x_1_t_2')
x_1_next = x_1_t_2*x_1*u_1
my_op = Scan.symbolic( ([u_1,x_1,x_1_t_2],
[x_1_next]),
1,1,taps={0:[2]})
u_1 = [1.,2.,3.,4.]
x_1 = [2.,3.]
verify_grad(my_op, [u_1,x_1])
#######################################################################
def test_gradManyInputsManyOutputsTimeTaps(self):
u_1 = theano.tensor.dscalar('u_1')
u_2 = theano.tensor.dscalar('u_2')
x_1 = theano.tensor.dscalar('x_1')
x_1_2 = theano.tensor.dscalar('x_1_2')
x_2 = theano.tensor.dscalar('x_2')
x_2_2 = theano.tensor.dscalar('x_2_2')
x_1_n = x_1*x_2_2 + u_1*x_1_2
x_2_n = x_2*x_1_2 + u_2*x_2_2
my_op = Scan.symbolic(([u_1,u_2,x_1,x_1_2,
x_2,x_2_2],[x_1_n,
x_2_n]),2,2,taps=
{0:[2],1:[2]})
u_1 = [1.,2.,3.,4.]
u_2 = [3.,2.,4.,1.]
x_1 = [0.1,0.2]
x_2 = [1.5,3.5]
verify_grad(my_op, [u_1,u_2,x_1,x_2])

# Naming convention :
# u_1,u_2,.. -> sequences
# s_1,s_2,.. -> initial states
# w_1,w_2,.. -> non-sequences
###################################

class T_Scan(unittest.TestCase):
    def setUp(self):
        utt.seed_rng()

    def test_one(self):
        pass

if __name__ == '__main__':
    unittest.main()