Commit 52057806 authored by Yann N. Dauphin

merge

@@ -20,7 +20,9 @@ since 2007. But it is also approachable enough to be used in the classroom
 News
 ====
-* Theano 0.6rc3 was released. Everybody is encouraged to update.
+* Ian Goodfellow did a `12h class with exercises on Theano <https://github.com/goodfeli/theano_exercises>`_.
+* Theano 0.6 was released. Everybody is encouraged to update.
 * New technical report on Theano: `Theano: new features and speed improvements <http://arxiv.org/abs/1211.5590>`_.
   However, please keep citing the other paper below in scientific work involving Theano.
......
@@ -3,8 +3,8 @@
 Easy Installation of an optimized Theano on Ubuntu
 ==================================================
-These instructions were tested on Ubuntu 11.04, 11.10 and 12.04. You can
-probably do something similar on older releases.
+These instructions were tested on Ubuntu 11.04, 11.10, 12.04, 12.10, 13.04
+and 13.10. You can probably do something similar on older releases.
 .. note::
@@ -49,7 +49,7 @@ probably do something similar on older computer.
 Installation steps
 ~~~~~~~~~~~~~~~~~~
-Ubuntu 11.10/12.04/12.10/13.04:
+Ubuntu 11.10/12.04/12.10/13.04/13.10:
 1) ``sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git``
 2) ``sudo pip install Theano``
@@ -236,15 +236,4 @@ Test GPU configuration
 Ubuntu 12.10: default gcc version 4.7.2. gcc 4.4.7, 4.5.4 and 4.6.3 available.
+Ubuntu 13.10: default gcc version 4.8.1. gcc 4.4.7, 4.6.4 and 4.7.3 available.
@@ -607,6 +607,27 @@ dimensions, see :meth:`_tensor_py_operators.dimshuffle`.
 have shape (2, 60).
.. function:: tile(x, reps, ndim=None)
Construct an array by repeating the input `x` according to the `reps`
pattern.

Tiles its input according to `reps`. The length of `reps` is the
number of dimensions of `x`, and it contains the number of times to
tile `x` in each dimension.
:see: `numpy.tile
<http://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html>`_
documentation for examples.
:see: :func:`theano.tensor.extra_ops.repeat
<theano.tensor.extra_ops.repeat>`
:note: Currently, `reps` must be a constant, `x.ndim` and
`len(reps)` must be equal and, if specified, `ndim` must be
equal to both.
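The ``reps`` semantics mirror ``numpy.tile``, so the behaviour described above can be sketched in pure numpy (runnable without Theano):

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
# reps has one entry per dimension of x: tile 2x along rows, 3x along columns
tiled = np.tile(x, (2, 3))
print(tiled.shape)  # (4, 6)
```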
Creating Tensor
===============
@@ -1542,6 +1563,86 @@ Gradient / Differentiation
 :rtype: variable or list of variables (matching `wrt`)
 :returns: gradients of the cost with respect to each of the `wrt` terms
.. function:: subgraph_grad(wrt, end, start=None, cost=None, details=False)
With respect to `wrt`, computes gradients of cost and/or from existing
`start` gradients, up to the `end` variables of a symbolic digraph.
In other words, computes gradients for a subgraph of the
symbolic theano function. Ignores all disconnected inputs.
This can be useful when one needs to perform the gradient descent
iteratively (e.g. one layer at a time in an MLP), or when a particular
operation is not differentiable in theano (e.g. stochastic sampling
from a multinomial). In the latter case, the gradient of the
non-differentiable process could be approximated by a user-defined
formula, which could be calculated using the gradients of a cost
with respect to samples (0s and 1s). These gradients are obtained
by performing a subgraph_grad from the `cost` or previously known gradients
(`start`) up to the outputs of the stochastic process (`end`).
A dictionary mapping gradients obtained from the user-defined
differentiation of the process, to variables, could then be fed into
another subgraph_grad as `start` with any other `cost` (e.g. weight decay).
In an MLP, we could use subgraph_grad to iteratively backpropagate:
>>> import numpy as np
>>> import theano
>>> x, t = theano.tensor.fvector('x'), theano.tensor.fvector('t')
>>> w1 = theano.shared(np.random.randn(3,4))
>>> w2 = theano.shared(np.random.randn(4,2))
>>> a1 = theano.tensor.tanh(theano.tensor.dot(x,w1))
>>> a2 = theano.tensor.tanh(theano.tensor.dot(a1,w2))
>>> cost2 = theano.tensor.sqr(a2 - t).sum()
>>> cost2 += theano.tensor.sqr(w2.sum())
>>> cost1 = theano.tensor.sqr(w1.sum())
>>> params = [[w2],[w1]]
>>> costs = [cost2,cost1]
>>> grad_ends = [[a1], [x]]
>>> next_grad = None
>>> param_grads = []
>>> for i in xrange(2):
...     param_grad, next_grad = theano.subgraph_grad(
...         wrt=params[i], end=grad_ends[i],
...         start=next_grad, cost=costs[i]
...     )
...     next_grad = dict(zip(grad_ends[i], next_grad))
...     param_grads.extend(param_grad)
:type wrt: list of variables
:param wrt: Gradients are computed with respect to `wrt`.
:type end: list of variables
:param end: Theano variables at which to end gradient descent (they are
    considered constant in theano.grad). For convenience, the gradients
    with respect to these variables are also returned.
:type start: dictionary of variables
:param start: If not None, a dictionary mapping variables to their
    gradients. This is useful when the gradient of some variables is
    known. These are used to compute the gradients backwards up to the
    variables in `end` (they are used as known_grads in theano.grad).
:type cost: scalar (0-dimensional) variable
:param cost: Additional costs for which to compute the gradients. For
    example, these could be weight decay, an l1 constraint, MSE, NLL,
    etc. May optionally be None if `start` is provided.

    .. warning:: If the gradient of `cost` with respect to any of the
        `start` variables is already part of the `start` dictionary,
        then it may be counted twice with respect to `wrt` and `end`.

:type details: bool
:param details: When True, additionally returns the lists of gradients
    from `start` and of `cost`, respectively, with respect to `wrt`
    (not `end`).
:rtype: tuple of 2 or 4 lists of variables
:return: Lists of gradients with respect to `wrt` and `end`,
    respectively.
.. _R_op_list:
......
@@ -24,6 +24,246 @@ Scan
 The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
**Scan Example: Computing tanh(x(t).dot(W) + b) elementwise**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# defining the tensor variables
X = T.matrix("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
results, updates = theano.scan(lambda v:T.tanh(T.dot(v,W)+b_sym), sequences=X)
compute_elementwise = theano.function(inputs = [X, W, b_sym], outputs=[results])
# test values
x = np.eye(2)
w = np.ones((2,2))
b = np.ones((2))
b[1] = 2
print compute_elementwise(x, w, b)[0]
# comparison with numpy
print np.tanh(x.dot(w) + b)
**Scan Example: Computing the sequence x(t) = tanh(x(t-1).dot(W) + y(t).dot(U) + p(T-t).dot(V))**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variables
X = T.vector("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
U = T.matrix("U")
Y = T.matrix("Y")
V = T.matrix("V")
P = T.matrix("P")
results, updates = theano.scan(
    lambda y, p, x_tm1: T.tanh(T.dot(x_tm1, W) + T.dot(y, U) + T.dot(p, V)),
    sequences=[Y, P[::-1]], outputs_info=[X])
compute_seq = theano.function(inputs = [X, W, Y, U, P, V], outputs=[results])
# test values
x = np.zeros((2))
x[1] = 1
w = np.ones((2,2))
y = np.ones((5,2))
y[0,:] = -3
u = np.ones((2,2))
p = np.ones((5,2))
p[0,:] = 3
v = np.ones((2,2))
print compute_seq(x,w,y,u,p,v)[0]
# comparison with numpy
x_res = np.zeros((5,2))
x_res[0] = np.tanh(x.dot(w) + y[0].dot(u) + p[4].dot(v))
for i in range(1,5):
    x_res[i] = np.tanh(x_res[i-1].dot(w) + y[i].dot(u) + p[4-i].dot(v))
print x_res
**Scan Example: Computing norms of lines of X**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variable
X = T.matrix("X")
results, updates = theano.scan(lambda x_i:T.sqrt((x_i**2).sum()), sequences=[X])
compute_norm_lines = theano.function(inputs = [X], outputs=[results])
# test value
x = np.diag(np.arange(1,6),1)
print compute_norm_lines(x)[0]
# comparison with numpy
print np.sqrt((x**2).sum(1))
**Scan Example: Computing norms of columns of X**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variable
X = T.matrix("X")
results, updates = theano.scan(lambda x_i:T.sqrt((x_i**2).sum()), sequences=[X.T])
compute_norm_cols = theano.function(inputs = [X], outputs=[results])
# test value
x = np.diag(np.arange(1,6),1)
print compute_norm_cols(x)[0]
# comparison with numpy
print np.sqrt((x**2).sum(0))
**Scan Example: Computing trace of X**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
floatX = "float32"
# define tensor variable
X = T.matrix("X")
results, updates = theano.scan(lambda i, j, t_f:T.cast(X[i,j]+t_f, floatX), \
sequences=[T.arange(X.shape[0]), T.arange(X.shape[1])], \
outputs_info=np.asarray(0., dtype=floatX))
result = results[-1]
compute_trace = theano.function(inputs = [X], outputs=[result])
# test value
x = np.eye(5)
x[0] = np.arange(5)
print compute_trace(x)[0]
# comparison with numpy
print np.diagonal(x).sum()
**Scan Example: Computing the sequence x(t) = x(t-2).dot(U) + x(t-1).dot(V) + tanh(x(t-1).dot(W) + b)**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variables
X = T.matrix("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
U = T.matrix("U")
V = T.matrix("V")
n_sym = T.iscalar("n_sym")
results, updates = theano.scan(lambda x_tm2,x_tm1:T.dot(x_tm2,U) + T.dot(x_tm1,V) \
+ T.tanh(T.dot(x_tm1,W) + b_sym), \
n_steps=n_sym, outputs_info=[dict(initial = X, taps = [-2,-1])])
compute_seq2 = theano.function(inputs = [X, U, V, W, b_sym, n_sym], outputs=[results])
# test values
x = np.zeros((2,2)) # the initial value must be able to return x[-2]
x[1,1] = 1
w = 0.5*np.ones((2,2))
u = 0.5*(np.ones((2,2))-np.eye(2))
v = 0.5*np.ones((2,2))
n = 10
b = np.ones((2))
print compute_seq2(x,u,v,w,b,n)
# comparison with numpy
x_res = np.zeros((10,2))
x_res[0] = x[0].dot(u) + x[1].dot(v) + np.tanh(x[1].dot(w) + b)
x_res[1] = x[1].dot(u) + x_res[0].dot(v) + np.tanh(x_res[0].dot(w) + b)
x_res[2] = x_res[0].dot(u) + x_res[1].dot(v) \
    + np.tanh(x_res[1].dot(w) + b)
for i in range(2,10):
    x_res[i] = (x_res[i-2].dot(u) + x_res[i-1].dot(v) \
        + np.tanh(x_res[i-1].dot(w) + b))
print x_res
**Scan Example: Computing the Jacobian of y = tanh(v.dot(A)) wrt x**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variables
v = T.vector()
A = T.matrix()
y = T.tanh(T.dot(v,A))
results, updates = theano.scan(lambda i:T.grad(y[i], v), sequences = [T.arange(y.shape[0])])
compute_jac_t = theano.function([A,v], [results], allow_input_downcast = True) # shape (d_out, d_in)
# test values
x = np.eye(5)[0]
w = np.eye(5,3)
w[2] = np.ones((3))
print compute_jac_t(w,x)[0]
# compare with numpy
print ((1 - np.tanh(x.dot(w))**2)*w).T
Note that we need to iterate over the indices of ``y`` and not over the elements of ``y``. The reason is that scan creates a placeholder variable for its internal function, and this placeholder variable does not have the same dependencies as the variables that will replace it.
**Scan Example: Accumulate number of loop during a scan**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define shared variables
k = theano.shared(0)
n_sym = T.iscalar("n_sym")
results, updates = theano.scan(lambda:{k:(k+1)}, n_steps=n_sym)
accumulator = theano.function([n_sym], [], updates=updates, allow_input_downcast = True)
k.get_value()
accumulator(5)
k.get_value()
**Scan Example: Computing tanh(v.dot(W) + b)*d where b is binomial**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variables
X = T.matrix("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
# define shared random stream
trng = T.shared_randomstreams.RandomStreams(1234)
d=trng.binomial(size=W[1].shape)
results, updates = theano.scan(lambda v:T.tanh(T.dot(v,W)+b_sym)*d, sequences=X)
compute_with_bnoise = theano.function(inputs = [X, W, b_sym], outputs=[results], \
updates=updates, allow_input_downcast = True)
x = np.eye(10,2)
w = np.ones((2,2))
b = np.ones((2))
print compute_with_bnoise(x, w, b)
Note that if you want to use a random variable ``d`` that will not be updated through scan loops, you should pass this variable as a ``non_sequences`` argument.
**Scan Example: Computing pow(A,k)**
.. code-block:: python
......
@@ -79,7 +79,7 @@ from theano.updates import Updates, OrderedUpdates
 #we don't import by default as we don't want to force having scipy installed.
 #import sparse
-from theano.gradient import Rop, Lop, grad
+from theano.gradient import Rop, Lop, grad, subgraph_grad
 if config.device.startswith('gpu') or config.init_gpu_device.startswith('gpu'):
     import theano.sandbox.cuda
@@ -1077,6 +1077,7 @@ class FunctionMaker(object):
         self.mode = mode
         self.accept_inplace = accept_inplace
         self.function_builder = function_builder
+        self.on_unused_input = on_unused_input  # Used only for the pickling
         self.required = [(i.value is None) for i in self.inputs]
         self.refeed = [
@@ -1215,6 +1216,7 @@ def _pickle_FunctionMaker(self):
         accept_inplace=self.accept_inplace,
         function_builder=self.function_builder,
         profile=self.profile,
+        on_unused_input=self.on_unused_input,
     )
     return (_constructor_FunctionMaker, (kwargs,))
......
@@ -507,13 +507,22 @@ class ProfileStats(object):
         print >> file, header_str
-        atimes = [(
-            t * 100 / local_time,
-            t,
-            a,
-            a.fgraph.toposort().index(a),
-            self.apply_callcount[a])
-            for a, t in self.apply_time.items()]
+        topos = {}  # Only do the topo once per fct.
+        atimes = []
+        for a, t in self.apply_time.items():
+            if a.fgraph not in topos:
+                topo = a.fgraph.toposort()
+                topos[a.fgraph] = topo
+            else:
+                topo = topos[a.fgraph]
+            atimes.append((
+                t * 100 / local_time,
+                t,
+                a,
+                topo.index(a),
+                self.apply_callcount[a]))
+        del topos
         atimes.sort()
         atimes.reverse()
         tot = 0
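The hunk above is a plain memoization pattern: compute ``toposort()`` once per fgraph and reuse it for every apply node. A self-contained sketch of the same idea, using a stand-in graph class (not Theano's):

```python
class FakeGraph(object):
    """Stand-in for an fgraph whose toposort() is expensive."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.sort_calls = 0

    def toposort(self):
        self.sort_calls += 1
        return list(self.nodes)

g = FakeGraph(['add', 'mul', 'sum'])
applies = ['mul', 'sum', 'add']  # pretend each apply node belongs to g

topos = {}  # one cached toposort per graph
positions = []
for a in applies:
    if g not in topos:
        topos[g] = g.toposort()
    positions.append(topos[g].index(a))

print(positions)     # [1, 2, 0]
print(g.sort_calls)  # 1: toposort ran once instead of once per apply node
```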
......
@@ -117,19 +117,10 @@ AddConfigVar('mode',
 enum = EnumStr("g++", "")
 # Test whether or not g++ is present: disable C code if it is not.
-# Using the dummy file descriptor below is a workaround for a crash experienced
-# in an unusual Python 2.4.4 Windows environment with the default stdin=None.
-dummy_stdin = open(os.devnull)
 try:
-    try:
-        rc = call_subprocess_Popen(['g++', '-v'], stdout=subprocess.PIPE,
-                                   stderr=subprocess.PIPE,
-                                   stdin=dummy_stdin).wait()
-    except OSError:
-        rc = 1
-finally:
-    dummy_stdin.close()
-    del dummy_stdin
+    rc = call_subprocess_Popen(['g++', '-v'])
+except OSError:
+    rc = 1
 if rc == 0:
     # Keep the default linker the same as the one for the mode FAST_RUN
     AddConfigVar('linker',
......
@@ -57,7 +57,10 @@ from theano.gof.link import \
 from theano.gof.op import \
     Op, OpenMPOp, PureOp, ops_with_inner_function
-from theano.gof.opt import (Optimizer, optimizer, SeqOptimizer,
+from theano.gof.opt import (
+    Optimizer,
+    optimizer, inplace_optimizer,
+    SeqOptimizer,
     MergeOptimizer, MergeOptMerge,
     LocalOptimizer, local_optimizer, LocalOptGroup,
     OpSub, OpRemove, PatternSub,
......
@@ -29,7 +29,8 @@ from theano.compat.six import b, BytesIO, StringIO
 from theano.gof.utils import flatten
 from theano.configparser import config
 from theano.gof.cc import hash_from_code
-from theano.misc.windows import call_subprocess_Popen
+from theano.misc.windows import (subprocess_Popen, call_subprocess_Popen,
+                                 output_subprocess_Popen)
 # we will abuse the lockfile mechanism when reading and writing the registry
 from theano.gof import compilelock
@@ -1438,8 +1439,12 @@ def get_gcc_shared_library_arg():
 def std_include_dirs():
-    return (numpy.distutils.misc_util.get_numpy_include_dirs()
-            + [distutils.sysconfig.get_python_inc()])
+    numpy_inc_dirs = numpy.distutils.misc_util.get_numpy_include_dirs()
+    py_inc = distutils.sysconfig.get_python_inc()
+    py_plat_spec_inc = distutils.sysconfig.get_python_inc(plat_specific=True)
+    python_inc_dirs = ([py_inc] if py_inc == py_plat_spec_inc
+                       else [py_inc, py_plat_spec_inc])
+    return numpy_inc_dirs + python_inc_dirs

 def std_lib_dirs_and_libs():
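The ``std_include_dirs`` change adds the platform-specific Python include directory only when it differs from the generic one. ``distutils`` is deprecated on modern Pythons; the same logic expressed with the stdlib ``sysconfig`` module (an analogue for illustration, not what Theano used) looks like:

```python
import sysconfig

py_inc = sysconfig.get_path('include')           # generic Python headers
py_plat_inc = sysconfig.get_path('platinclude')  # platform-specific headers

# Keep both only when they differ, mirroring the diff above.
python_inc_dirs = ([py_inc] if py_inc == py_plat_inc
                   else [py_inc, py_plat_inc])
print(python_inc_dirs)
```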
@@ -1512,11 +1517,8 @@ def gcc_llvm():
         pass
     p = None
     try:
-        p = call_subprocess_Popen(['g++', '--version'],
-                                  stdout=subprocess.PIPE,
-                                  stderr=subprocess.PIPE)
-        p.wait()
-        output = p.stdout.read() + p.stderr.read()
+        p_out = output_subprocess_Popen(['g++', '--version'])
+        output = p_out[0] + p_out[1]
     except OSError:
         # Typically means g++ cannot be found.
         # So it is not an llvm compiler.
@@ -1569,11 +1571,11 @@ class GCC_compiler(object):
         GCC_compiler.march_flags = []

         def get_lines(cmd, parse=True):
-            p = call_subprocess_Popen(cmd,
-                                      stdout=subprocess.PIPE,
-                                      stderr=subprocess.PIPE,
-                                      stdin=subprocess.PIPE,
-                                      shell=True)
+            p = subprocess_Popen(cmd,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.PIPE,
+                                 stdin=subprocess.PIPE,
+                                 shell=True)
             # For mingw64 with GCC >= 4.7, passing os.devnull
             # as stdin (which is the default) results in the process
             # waiting forever without returning. For that reason,
@@ -1713,7 +1715,7 @@ class GCC_compiler(object):
                 continue
             mj, mn, patch = [int(vp) for vp in version]
             if (((mj, mn) == (4, 6) and patch < 4) or
-                ((mj, mn) == (4, 7) and patch < 3) or
+                ((mj, mn) == (4, 7) and patch <= 3) or
                 ((mj, mn) == (4, 8) and patch < 1)):
                 new_flags[i] = p.rstrip('-avx')
@@ -1811,21 +1813,15 @@ class GCC_compiler(object):
             os.write(fd, src_code)
             os.close(fd)
             fd = None
-            proc = call_subprocess_Popen(
-                ['g++', path, '-o', exe_path] + flags,
-                stdout=subprocess.PIPE,
-                stderr=subprocess.PIPE)
-            proc.wait()
-            if proc.returncode != 0:
+            p_ret = call_subprocess_Popen(
+                ['g++', path, '-o', exe_path] + flags)
+            if p_ret != 0:
                 compilation_ok = False
             elif try_run:
                 # Try to execute the program
                 try:
-                    proc = call_subprocess_Popen([exe_path],
-                                                 stdout=subprocess.PIPE,
-                                                 stderr=subprocess.PIPE)
-                    proc.wait()
-                    run_ok = (proc.returncode == 0)
+                    p_ret = call_subprocess_Popen([exe_path])
+                    run_ok = (p_ret == 0)
                 finally:
                     os.remove(exe_path)
         finally:
@@ -1958,14 +1954,14 @@ class GCC_compiler(object):
             print >> sys.stderr, ' '.join(cmd)
         try:
-            p = call_subprocess_Popen(cmd, stderr=subprocess.PIPE)
-            compile_stderr = decode(p.communicate()[1])
+            p_out = output_subprocess_Popen(cmd)
+            compile_stderr = decode(p_out[1])
         except Exception:
             # An exception can occur e.g. if `g++` is not found.
             print_command_line_error()
             raise
-        status = p.returncode
+        status = p_out[2]
         if status:
             print '==============================='
......
@@ -16,27 +16,17 @@ import numpy
 import theano
 from theano.configparser import config, AddConfigVar, ConfigParam, StrParam
 from theano.gof.utils import flatten
-from theano.misc.windows import call_subprocess_Popen
+from theano.misc.windows import output_subprocess_Popen
 _logger = logging.getLogger("theano.gof.compiledir")
-# Using the dummy file descriptors below is a workaround for a crash
-# experienced in an unusual Python 2.4.4 Windows environment with the default
-# None values.
-dummy_err = open(os.devnull, 'w')
-p = None
 try:
-    p = call_subprocess_Popen(['g++', '-dumpversion'],
-                              stdout=subprocess.PIPE,
-                              stderr=dummy_err.fileno())
-    p.wait()
-    gcc_version_str = p.stdout.readline().strip().decode()
+    p_out = output_subprocess_Popen(['g++', '-dumpversion'])
+    gcc_version_str = p_out[0].strip().decode()
 except OSError:
     # Typically means gcc cannot be found.
     gcc_version_str = 'GCC_NOT_FOUND'
-del p
-del dummy_err

 def local_bitwidth():
......
@@ -165,8 +165,12 @@ def lock(tmp_dir, timeout=120, min_wait=5, max_wait=10, verbosity=1):
     my_pid = os.getpid()
     no_display = (verbosity == 0)
-    # Acquire lock.
     nb_error = 0
+    # Number of times we have slept without an error. Used to skip the
+    # message on the first wait, so it is displayed less frequently
+    # (and generates fewer notification emails about it).
+    nb_wait = 0
+    # Acquire lock.
     while True:
         try:
             last_owner = 'no_owner'
@@ -214,7 +218,7 @@ def lock(tmp_dir, timeout=120, min_wait=5, max_wait=10, verbosity=1):
                 last_owner = read_owner
                 time_start = time.time()
                 no_display = (verbosity == 0)
-            if not no_display:
+            if not no_display and nb_wait > 0:
                 if read_owner == 'failure':
                     msg = 'unknown process'
                 else:
@@ -225,6 +229,7 @@ def lock(tmp_dir, timeout=120, min_wait=5, max_wait=10, verbosity=1):
                     tmp_dir)
                 if verbosity <= 1:
                     no_display = True
+            nb_wait += 1
             time.sleep(random.uniform(min_wait, max_wait))
         try:
......
Diff collapsed.
@@ -179,23 +179,33 @@ class Query(object):
 class EquilibriumDB(DB):
-    """ A set of potential optimizations which should be applied in an
+    """A set of potential optimizations which should be applied in an
     arbitrary order until equilibrium is reached.

     Canonicalize, Stabilize, and Specialize are all equilibrium optimizations.

+    :param ignore_newtrees: If False, also apply local optimizations to the
+        new nodes introduced while applying local optimizations. This can
+        mean fewer fgraph iterations, but it is not necessarily faster
+        globally.

     .. note::
-        We can put LocalOptimizer and Optimizer as EquilibriumOptimizer
-        suppor both.
+        We can put both LocalOptimizer and Optimizer here, as
+        EquilibriumOptimizer supports both.
     """
+    def __init__(self, ignore_newtrees=True):
+        super(EquilibriumDB, self).__init__()
+        self.ignore_newtrees = ignore_newtrees

     def query(self, *tags, **kwtags):
         opts = super(EquilibriumDB, self).query(*tags, **kwtags)
-        return opt.EquilibriumOptimizer(opts,
-            max_use_ratio=config.optdb.max_use_ratio,
-            failure_callback=opt.NavigatorOptimizer.warn_inplace)
+        return opt.EquilibriumOptimizer(
+            opts,
+            max_use_ratio=config.optdb.max_use_ratio,
+            ignore_newtrees=self.ignore_newtrees,
+            failure_callback=opt.NavigatorOptimizer.warn_inplace)

 class SequenceDB(DB):
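An EquilibriumOptimizer keeps applying its rewrites until a full pass changes nothing. A toy, Theano-free sketch of that fixed-point loop (the string rewrite rules here are purely illustrative), with a `max_use_ratio`-style guard against non-terminating rule sets:

```python
def run_to_equilibrium(expr, rules, max_iters=100):
    """Apply rewrite rules until a full pass changes nothing."""
    for _ in range(max_iters):
        changed = False
        for rule in rules:
            new = rule(expr)
            if new is not None and new != expr:
                expr = new
                changed = True
        if not changed:
            return expr  # equilibrium reached
    raise RuntimeError("no equilibrium within max_iters")

# toy rewrites on strings: drop "*1" and "+0" factors
rules = [
    lambda e: e.replace("*1", "") if "*1" in e else None,
    lambda e: e.replace("+0", "") if "+0" in e else None,
]
print(run_to_equilibrium("a*1+0*1", rules))  # "a"
```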
......
...@@ -544,6 +544,109 @@ def grad(cost, wrt, consider_constant=None, ...@@ -544,6 +544,109 @@ def grad(cost, wrt, consider_constant=None,
rval, = rval rval, = rval
return rval return rval
def subgraph_grad(wrt, end, start=None, cost=None, details=False):
'''
With respect to `wrt`, computes gradients of cost and/or from existing
`start` gradients, up to the `end` variables of a symbolic digraph.
In other words, computes gradients for a subgraph of the
symbolic theano function. Ignores all disconnected inputs.
This can be useful when one needs to perform the gradient descent
iteratively (e.g. one layer at a time in an MLP), or when a particular
operation is not differentiable in theano (e.g. stochastic sampling
from a multinomial). In the latter case, the gradient of the
non-differentiable process could be approximated by a user-defined
formula, which could be calculated using the gradients of a cost
with respect to samples (0s and 1s). These gradients are obtained
by performing a subgraph_grad from the `cost` or previously known gradients
(`start`) up to the outputs of the stochastic process (`end`).
A dictionary mapping gradients obtained from the user-defined
differentiation of the process, to variables, could then be fed into
another subgraph_grad as `start` with any other `cost` (e.g. weight decay).
:type wrt: list of variables
:param wrt: Gradients are computed with respect to `wrt`.
:type end: list of variables
:param end: Theano variables at which to end gradient descent (they are
    considered constant in theano.grad). For convenience, the gradients
    with respect to these variables are also returned.
:type start: dictionary of variables
:param start: If not None, a dictionary mapping variables to their
    gradients. This is useful when the gradient of some variables is
    known. These are used to compute the gradients backwards up to the
    variables in `end` (they are used as known_grads in theano.grad).
:type cost: scalar (0-dimensional) variable
:param cost: Additional costs for which to compute the gradients. For
    example, these could be weight decay, an l1 constraint, MSE, NLL,
    etc. May optionally be None if `start` is provided.

    .. warning:: If the gradient of `cost` with respect to any of the
        `start` variables is already part of the `start` dictionary,
        then it may be counted twice with respect to `wrt` and `end`.

:type details: bool
:param details: When True, additionally returns the lists of gradients
    from `start` and of `cost`, respectively, with respect to `wrt`
    (not `end`).
:rtype: tuple of 2 or 4 lists of variables
:return: Lists of gradients with respect to `wrt` and `end`,
    respectively.
'''
assert ((cost is not None) or (start is not None))
assert isinstance(end, list)
assert isinstance(wrt, list)
if start is not None:
assert isinstance(start, dict)
params = list(set(wrt + end))
start_grads = None
cost_grads = None
if start is not None:
start_grads = list(
theano.grad(
cost=None, wrt=params, known_grads=start,
consider_constant=end,
disconnected_inputs='ignore'
)
)
if cost is not None:
cost_grads = list(
theano.grad(
cost=cost, wrt=params,
consider_constant=end,
disconnected_inputs='ignore'
)
)
grads = None
if start is None:
grads = cost_grads
else:
grads = start_grads
if cost_grads is not None:
for i in range(len(grads)):
grads[i] += cost_grads[i]
pgrads = OrderedDict(zip(params, grads))
# separate wrt from end grads:
wrt_grads = list(pgrads[k] for k in wrt)
end_grads = list(pgrads[k] for k in end)
if details:
return wrt_grads, end_grads, start_grads, cost_grads
return wrt_grads, end_grads
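The two-stage chain rule that this function implements can be checked numerically. A scalar sketch in plain Python (not Theano): first take the gradient of the cost with respect to an intermediate variable (the `end` role), then feed it back as the known `start` gradient to reach an earlier parameter:

```python
import math

x, w1, w2 = 0.5, 1.5, -2.0
h = math.tanh(w1 * x)          # intermediate variable (plays the 'end' role)
cost = (w2 * h) ** 2

# stage 1: gradients of cost wrt w2 (a 'wrt' term) and wrt h (an 'end' term)
dcost_dout = 2 * w2 * h
g_w2 = dcost_dout * h
g_h = dcost_dout * w2          # the gradient returned for `end`

# stage 2: g_h becomes the known 'start' gradient for the earlier subgraph
g_w1 = g_h * (1.0 - h ** 2) * x

# check against differentiating the whole expression in one shot
g_w1_direct = 2 * (w2 * h) * w2 * (1.0 - h ** 2) * x
print(abs(g_w1 - g_w1_direct) < 1e-12)  # True
```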
def _node_to_pattern(node):
    """ given an apply node, obtain its connection pattern
......
@@ -203,6 +203,7 @@ if __name__ == "__main__":
 cuda version 5.5 5.0 4.2 4.1 4.0 3.2 3.0 # note
 gpu
+K6000/NOECC 0.06s
 K20m/ECC 0.07s
 K20/NOECC 0.07s
 M2090 0.19s
......
@@ -2,9 +2,11 @@ import os
 import subprocess

-def call_subprocess_Popen(command, **params):
+def subprocess_Popen(command, **params):
     """
-    Utility function to work around windows behavior that open windows
+    Utility function to work around windows behavior that open windows.

+    :see: call_subprocess_Popen and output_subprocess_Popen
     """
     startupinfo = None
     if os.name == 'nt':
@@ -36,3 +38,40 @@ def call_subprocess_Popen(command, **params):
     if stdin is not None:
         del stdin
     return proc
def call_subprocess_Popen(command, **params):
    """
    Calls subprocess_Popen and discards the output, returning only the
    exit code.
    """
    if 'stdout' in params or 'stderr' in params:
        raise TypeError("don't use stderr or stdout with call_subprocess_Popen")
    null = open(os.devnull, 'wb')
    # stdin to devnull is a workaround for a crash in a weird Windows
    # environment where sys.stdin was None
    params.setdefault('stdin', null)
    params['stdout'] = null
    params['stderr'] = null
    p = subprocess_Popen(command, **params)
    p.wait()
    return p.returncode
def output_subprocess_Popen(command, **params):
"""
Calls subprocess_Popen, returning the output, error and exit code
in a tuple.
"""
if 'stdout' in params or 'stderr' in params:
raise TypeError("don't use stderr or stdout with output_subprocess_Popen")
# stdin to devnull is a workaround for a crash in a weird Windows
    # environment where sys.stdin was None
    if 'stdin' not in params:
        null = open(os.devnull, 'wb')
        params['stdin'] = null
params['stdout'] = subprocess.PIPE
params['stderr'] = subprocess.PIPE
p = subprocess_Popen(command, **params)
# we need to use communicate to make sure we don't deadlock around
    # the stdout/stderr pipes.
out = p.communicate()
return out + (p.returncode,)
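In short, `subprocess_Popen` is the raw wrapper, `call_subprocess_Popen` discards output and returns only the exit code, and `output_subprocess_Popen` captures everything. A minimal stand-alone sketch of the capturing variant (simplified; not Theano's exact implementation, and without the Windows `startupinfo` handling):

```python
import os
import subprocess
import sys

def output_subprocess_Popen(command, **params):
    # Simplified stand-in for the helper above: feed stdin from devnull
    # (the sys.stdin-is-None workaround), capture stdout/stderr, and
    # return (out, err, returncode).
    with open(os.devnull, 'rb') as null:
        params.setdefault('stdin', null)
        params['stdout'] = subprocess.PIPE
        params['stderr'] = subprocess.PIPE
        p = subprocess.Popen(command, **params)
        out, err = p.communicate()  # communicate() avoids pipe deadlocks
    return out, err, p.returncode

out, err, code = output_subprocess_Popen([sys.executable, '-c', "print('hi')"])
assert code == 0 and out.strip() == b'hi'
```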
@@ -296,38 +296,15 @@ class GpuDimShuffle(GpuOp):
    def __init__(self, input_broadcastable, new_order):
        input_broadcastable = tuple(input_broadcastable)
        self.input_broadcastable = input_broadcastable
+       new_order = tuple(new_order)
        self.new_order = new_order
-       # list of dimensions of the input to drop
-       self.drop = []
-       # this maps i before dropping dimensions to j after dropping
-       # dimensions so self.shuffle can be set properly later on
-       i2j = {}
-       j = 0
        for i, b in enumerate(input_broadcastable):
            if i not in new_order:
-               # we want to drop this dimension because it's not a
-               # value in new_order
-               if b == 1:  # 1 aka True
-                   self.drop.append(i)
-               else:
+               if not b:
                    # we cannot drop non-broadcastable dimensions
                    raise ValueError("You cannot drop a non-broadcastable"
                                     " dimension.",
                                     (input_broadcastable, new_order))
-           else:
-               i2j[i] = j
-               j += 1
-       # transposition of non-broadcastable dimensions. This is how
-       # the dimensions will be permuted, without accounting for the
-       # extra 'x' broadcastable dimensions to insert.
-       self.shuffle = [i2j[x] for x in new_order if x != 'x']
-       # list of dimensions of the output that are broadcastable and
-       # were not in the original input
-       self.augment = [i for i, x in enumerate(new_order) if x == 'x']
        self.view_map = {0: [0]}
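The bookkeeping this hunk deletes from `GpuDimShuffle.__init__` (`drop`, `shuffle`, `augment`) is worth seeing in isolation. A plain-Python re-derivation of the same pattern computation (illustrative only, not Theano's API):

```python
def dimshuffle_pattern(input_broadcastable, new_order):
    # Which input dims are dropped, how the survivors are permuted, and
    # where new broadcastable ('x') dims are inserted.
    drop = []
    for i, b in enumerate(input_broadcastable):
        if i not in new_order:
            if not b:
                # we cannot drop non-broadcastable dimensions
                raise ValueError("You cannot drop a non-broadcastable"
                                 " dimension.")
            drop.append(i)
    kept = [i for i in range(len(input_broadcastable)) if i not in drop]
    i2j = {i: j for j, i in enumerate(kept)}
    shuffle = [i2j[x] for x in new_order if x != 'x']
    augment = [i for i, x in enumerate(new_order) if x == 'x']
    return drop, shuffle, augment

# Drop the broadcastable leading dim of a (1, n) row vector:
assert dimshuffle_pattern((True, False), (1,)) == ([0], [0], [])
# Transpose and prepend a fresh broadcastable dim:
assert dimshuffle_pattern((True, False), ('x', 1, 0)) == ([], [1, 0], [0])
```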
@@ -481,8 +458,6 @@ class GpuDimShuffle(GpuOp):
        print self
        print "IN BROAD", self.input_broadcastable
        print "NEW ORDER", self.new_order
-       print "SHUFFLE", self.shuffle
-       print "AUGMENT", self.augment
        print '------------'
        print ''
        print sio.getvalue()
@@ -1198,7 +1173,11 @@ class GpuCAReduce(GpuOp):
                n_threads.z += 1;
            else
                break;
-       }""" % locals()
+       }
+       //Maximum for Fermi GPUs on that dimension.
+       n_threads.z = std::min(n_threads.z, (unsigned)64);
+       """ % locals()

        if len(self.reduce_mask) == 2:
            threads_y = ''
@@ -1509,6 +1488,8 @@ class GpuCAReduce(GpuOp):
            n_threads.z += 1;
        }
        n_threads.z -= 1;
+       //Maximum for Fermi GPUs on that dimension.
+       n_threads.z = std::min(n_threads.z, (unsigned)64);
        dim3 n_blocks(1,1,1);
        %(makecall)s
@@ -1605,7 +1586,7 @@ class GpuCAReduce(GpuOp):
        """ % locals()

    def c_code_cache_version_apply(self, node):
-       version = [8]  # the version corresponding to the c code in this Op
+       version = [9]  # the version corresponding to the c code in this Op
        # now we insert versions for the ops on which we depend...
        scalar_node = Apply(self.scalar_op,
@@ -3192,13 +3173,27 @@ class GpuAlloc(GpuOp):
                # If the output is a constant, it will have to be deepcopied
                # each time the function is called. So we do not fold.
                return False
-           elif (not isinstance(client[0], basestring)
-                 and isinstance(client[0].op, (
-                     tensor.IncSubtensor,
-                     tensor.AdvancedIncSubtensor1,
-                     GpuIncSubtensor,
-                     GpuAdvancedIncSubtensor1
-                     ))):
+           elif (# The following ops work inplace on their input id 0.
+                 client[1] == 0 and
+                 isinstance(client[0].op, (
+                     # Ops that will work inplace on the Alloc. So if they
+                     # get constant_folded, they would copy the
+                     # constant and this is less efficient.
+                     # Not doing the constant folding could also lower
+                     # the peak memory usage, as the "constant" won't
+                     # always exist.
+                     #theano.tensor.subtensor.AdvancedIncSubtensor,
+                     GpuIncSubtensor,
+                     GpuAdvancedIncSubtensor1,
+                     theano.sandbox.cuda.blas.GpuGemm,
+                     theano.sandbox.cuda.blas.GpuGemv,
+                     theano.sandbox.cuda.blas.GpuGer,
+                     ))):
+               return False
+           # If the client is a transfer, we don't want to fold. We
+           # let the moving opt finish before deciding what to do.
+           elif isinstance(client[0].op, HostFromGpu):
                return False
            return True
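The folding rule above boils down to: keep the Alloc symbolic whenever any client would deep-copy it (a function output), overwrite it in place (input 0 of an inplace op), or transfer it off the GPU. A toy sketch of that decision, with plain strings standing in for the real Theano op classes (names are illustrative only):

```python
# Ops assumed (for this sketch) to overwrite their input 0 in place.
INPLACE_ON_INPUT0 = {"IncSubtensor", "Gemm", "Gemv", "Ger"}

def should_constant_fold(clients):
    """clients: list of (op_name_or_'output', input_index) pairs."""
    for op, idx in clients:
        if op == "output":
            return False  # would be deepcopied on every function call
        if idx == 0 and op in INPLACE_ON_INPUT0:
            return False  # the client overwrites its input 0 in place
        if op == "HostFromGpu":
            return False  # let the transfer-moving opts run first
    return True

assert should_constant_fold([("Elemwise", 1)])
assert not should_constant_fold([("Gemm", 0)])
assert should_constant_fold([("Gemm", 1)])  # read-only use is fine
```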
......
@@ -5093,7 +5093,7 @@ int fprint_CudaNdarray(FILE * fd, const CudaNdarray *self)
int CudaNdarray_prep_output(CudaNdarray ** arr, int nd,
-                           const int * dims)
+                           const int * dims, int fortran)
{
    bool allocated = false;
    if (*arr == NULL)
@@ -5105,7 +5105,7 @@ int CudaNdarray_prep_output(CudaNdarray ** arr, int nd,
        allocated = true;
    }
-   if (CudaNdarray_alloc_contiguous(*arr, nd, dims))
+   if (CudaNdarray_alloc_contiguous(*arr, nd, dims, fortran))
    {
        if (allocated)
        {
......
@@ -160,6 +160,12 @@ CudaNdarray_CheckExact(const PyObject * ob);
DllExport bool
CudaNdarray_is_c_contiguous(const CudaNdarray * self);

+/**
+ * Return true for a F-contiguous CudaNdarray, else false
+ */
+DllExport bool
+CudaNdarray_is_f_contiguous(const CudaNdarray * self);

/****
 * Returns the number of elements necessary in host_structure and dev_structure for a given number of dimensions.
 */
@@ -326,10 +332,13 @@ CudaNdarray_set_nd(CudaNdarray * self, const int nd)
 * Allocate storage space for a tensor of rank 'nd' and given dimensions.
 * (No-op if self already has a contiguous tensor of the right dimensions)
 *
+ * If fortran is non-zero, a fortran order is made, otherwise it is a c order.
+ *
 * Note: CudaNdarray_alloc_contiguous is templated to work for both int dimensions and npy_intp dimensions
 */
template<typename inttype>
-static int CudaNdarray_alloc_contiguous(CudaNdarray *self, const int nd, const inttype * dim)
+static int CudaNdarray_alloc_contiguous(CudaNdarray *self, const int nd,
+                                        const inttype * dim, int fortran=0)
{
    // allocate an empty ndarray with c_contiguous access
    // return 0 on success
@@ -342,11 +351,23 @@ static int CudaNdarray_alloc_contiguous(CudaNdarray *self, const int nd, const i
    {
        return -1;
    }
-   for (int i = nd-1; i >= 0; --i)
+   if (fortran)
+   {
+       for (int i = 0; i < nd; i++)
+       {
+           CudaNdarray_set_stride(self, i, (dim[i] == 1) ? 0 : size);
+           CudaNdarray_set_dim(self, i, dim[i]);
+           size = size * dim[i];
+       }
+   }
+   else
    {
-       CudaNdarray_set_stride(self, i, (dim[i] == 1) ? 0 : size);
-       CudaNdarray_set_dim(self, i, dim[i]);
-       size = size * dim[i];
+       for (int i = nd-1; i >= 0; --i)
+       {
+           CudaNdarray_set_stride(self, i, (dim[i] == 1) ? 0 : size);
+           CudaNdarray_set_dim(self, i, dim[i]);
+           size = size * dim[i];
+       }
    }
    // If the allocated buffer is already of the right size, we don't need to
@@ -497,6 +518,27 @@ CudaNdarray_is_c_contiguous(const CudaNdarray * self)
    return c_contiguous;
}
/**
* True iff the strides look like [1, dim[0], dim[0]*dim[1], ...]
*/
DllExport inline bool ALWAYS_INLINE
CudaNdarray_is_f_contiguous(const CudaNdarray * self)
{
bool f_contiguous = true;
int size = 1;
for (int i = 0; (i < self->nd) && f_contiguous; i++)
{
if (CudaNdarray_HOST_DIMS(self)[i] == 1)
continue;
if (CudaNdarray_HOST_STRIDES(self)[i] != size)
{
f_contiguous = false;
}
size = size * CudaNdarray_HOST_DIMS(self)[i];
}
return f_contiguous;
}
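The stride layouts built by `CudaNdarray_alloc_contiguous` and checked by `CudaNdarray_is_f_contiguous` mirror NumPy's C/F orders, with the twist that size-1 (broadcastable) dimensions get stride 0 and are skipped by the contiguity check. A plain-Python sketch of both routines (strides in element counts, not bytes):

```python
def alloc_strides(dims, fortran=False):
    # Mirrors CudaNdarray_alloc_contiguous: C order walks dims right-to-left,
    # F order left-to-right; size-1 dims get stride 0.
    strides = [0] * len(dims)
    size = 1
    order = range(len(dims)) if fortran else reversed(range(len(dims)))
    for i in order:
        strides[i] = 0 if dims[i] == 1 else size
        size *= dims[i]
    return strides

def is_f_contiguous(dims, strides):
    # Mirrors CudaNdarray_is_f_contiguous: strides must look like
    # [1, dims[0], dims[0]*dims[1], ...], ignoring size-1 dims.
    size = 1
    for d, s in zip(dims, strides):
        if d == 1:
            continue
        if s != size:
            return False
        size *= d
    return True

assert alloc_strides((2, 3, 4)) == [12, 4, 1]               # C order
assert alloc_strides((2, 3, 4), fortran=True) == [1, 2, 6]  # F order
assert is_f_contiguous((2, 3, 4), [1, 2, 6])
assert not is_f_contiguous((2, 3, 4), [12, 4, 1])
```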
DllExport PyObject * CudaNdarray_IS_C_Contiguous(CudaNdarray * self);

DllExport int CudaNdarray_gemm(float alpha, const CudaNdarray * A, const CudaNdarray * B, float beta, CudaNdarray * C);
@@ -525,8 +567,9 @@ DllExport int CudaNdarray_inplace_elemwise(PyObject* py_self, PyObject * py_othe
// *arr may initially be NULL, a pointer to an ndarray of the wrong size,
// or a pointer to an ndarray of the right size. In the last case it will
// not change.
+// If fortran is non-zero, a fortran order is expected/created
DllExport int CudaNdarray_prep_output(CudaNdarray ** arr, int nd,
-                                     const int * dims);
+                                     const int * dims, int fortran = 0);

DllExport inline const char* ALWAYS_INLINE cublasGetErrorString(cublasStatus err){
    if(CUBLAS_STATUS_SUCCESS == err)
......
@@ -16,7 +16,7 @@ from theano.gof.cmodule import (std_libs, std_lib_dirs,
                                std_include_dirs, dlimport,
                                get_lib_extension)
from theano.gof.python25 import any
-from theano.misc.windows import call_subprocess_Popen
+from theano.misc.windows import output_subprocess_Popen

_logger = logging.getLogger("theano.sandbox.cuda.nvcc_compiler")
_logger.setLevel(logging.WARN)
@@ -98,12 +98,8 @@ nvcc_version = None
def is_nvcc_available():
    """Return True iff the nvcc compiler is found."""
    def set_version():
-       p = call_subprocess_Popen([nvcc_path, '--version'],
-                                 stdout=subprocess.PIPE,
-                                 stderr=subprocess.PIPE)
-       p.wait()
-       ver_line = decode(p.stdout.readlines()[-1])
+       p_out = output_subprocess_Popen([nvcc_path, '--version'])
+       ver_line = decode(p_out[0]).strip().split('\n')[-1]
        build, version = ver_line.split(',')[1].strip().split()
        assert build == 'release'
......
@@ -109,11 +109,13 @@ def test_careduce():
            ((4100,4,3),[1,2]),((5,4100,3),[1,2]),((5,4,4100),[1,2]),#011
            #((4100,4,3),[0,2]),((5,4100,3),[0,2]),((5,4,4100),[0,2]),#101 ##not implemented
            ((4100,4,3),[0,1,2]),((5,4100,3),[0,1,2]),((5,4,4100),[0,1,2]),#111
+           ((65,4,3),[0,1,2]),((5,65,3),[0,1,2]),((5,4,65),[0,1,2]),#111
            ((4100,4,3,2),[2,3]),((4,4100,3,2),[2,3]),((4,3,4100,2),[2,3]),((4,3,2,4100),[2,3]),#0011
            ((4100,4,3,2),[1,3]),((4,4100,3,2),[1,3]),((4,3,4100,2),[1,3]),((4,3,2,4100),[1,3]),#0101
            ((4100,4,3,2),[0,2,3]),((4,4100,3,2),[0,2,3]),((4,3,4100,2),[0,2,3]),#((4,3,2,4100),[0,2,3]),#1011
            ((4100,4,3,2),[1,2,3]),((4,4100,3,2),[1,2,3]),((4,3,4100,2),[1,2,3]),((4,3,2,4100),[1,2,3]),#0111
+           ((65,4,3,2),[1,2,3]),((4,65,3,2),[1,2,3]),((4,3,65,2),[1,2,3]),((4,3,2,65),[1,2,3]),#0111
            ((4100,2,3,4),[0,1,2,3]),((2,4100,3,4),[0,1,2,3]),((2,3,4100,4),[0,1,2,3]),((2,3,4,4100),[0,1,2,3]),((128,1,3,3),[0,1,2,3]),#1111
......
+import operator
import sys

import numpy
@@ -213,20 +214,29 @@ def test_huge_elemwise_fusion():
    """
    shape = (2, 3, 4, 5, 6)
    ttype = tensor.tensor(dtype='float32', broadcastable=(False,) * len(shape))
-   vars = [tensor.tanh(ttype) for x in range(7)]
-   f = pfunc(vars, [vars[0] - vars[1] - vars[2] - vars[3] - vars[4] -
-                    vars[5] - vars[6]], mode=mode_with_gpu)
+   gpu_ptr_size = theano.sandbox.cuda.opt.get_device_type_sizes()['gpu_ptr_size']
+   if gpu_ptr_size == 8:
+       nb_in = 7
+       len_topo = 10
+   elif gpu_ptr_size == 4:
+       nb_in = 8
+       len_topo = 11
+   else:
+       raise Exception("Unexpected value for gpu_ptr_size", gpu_ptr_size)
+   vars = [tensor.tanh(ttype) for x in range(nb_in)]
+   f = pfunc(vars, [reduce(operator.sub, vars)], mode=mode_with_gpu)
    topo = f.maker.fgraph.toposort()
    #theano.printing.debugprint(f)
    #for i, node in enumerate(topo):
    #    print >> sys.stdout, i, node
-   assert len(topo) == 10
+   assert len(topo) == len_topo
    assert sum([isinstance(node.op, cuda.GpuElemwise) for node in topo]) == 2
-   assert isinstance(topo[7].op.scalar_op, theano.scalar.basic.Sub)
-   assert isinstance(topo[8].op.scalar_op, theano.scalar.basic.Composite)
+   assert isinstance(topo[-3].op.scalar_op, theano.scalar.basic.Sub)
+   assert isinstance(topo[-2].op.scalar_op, theano.scalar.basic.Composite)
    #let debugmode catch errors
    gen = lambda: theano._asarray(numpy.random.rand(*shape), dtype='float32')
-   f(gen(), gen(), gen(), gen(), gen(), gen(), gen())
+   f(*[gen() for i in range(nb_in)])

    # Test the case where we can't put the computation on the gpu! There are
    # too many dimensions in the input to have 2 inputs to the op!
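The `reduce(operator.sub, vars)` rewrite above replaces the hand-written chain `vars[0] - vars[1] - ... - vars[6]` with a left fold over however many inputs the test builds; a quick stand-alone illustration:

```python
import operator
from functools import reduce  # a builtin in Python 2; imported on Python 3

vals = [100, 1, 2, 3]
# Left fold: ((100 - 1) - 2) - 3
assert reduce(operator.sub, vals) == 94
# A single element is returned unchanged.
assert reduce(operator.sub, [7]) == 7
```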
......
@@ -3,12 +3,12 @@ import os
import numpy

import theano
-from theano import Op, Type, Apply, Variable, Constant
+from theano import Op, Apply
from theano import tensor, scalar, config
from theano.scalar import Scalar
from theano.tensor.basic import Alloc
-from theano.gof.python25 import all, any
+from theano.gof.python25 import any
from theano.gof.utils import MethodNotDefined
from theano.compat import PY3
@@ -257,7 +257,7 @@ class GpuFromHost(Op):
    def R_op(self, inputs, eval_points):
        ev, = eval_points
-       if isintance(ev, GpuArrayType):
+       if isinstance(ev, GpuArrayType):
            return [host_from_gpu(ev)]
        else:
            return ev
@@ -317,7 +317,7 @@ class GpuFromCuda(Op):
    def R_op(self, inputs, eval_points):
        ev, = eval_points
-       if isintance(ev, GpuArrayType):
+       if isinstance(ev, GpuArrayType):
            return [cuda_from_gpu(ev)]
        else:
            return ev
@@ -651,6 +651,36 @@ class GpuAlloc(HideC, Alloc):
    def c_code_cache_version(self):
        return (2,)

+   def do_constant_folding(self, node):
+       for client in node.outputs[0].clients:
+           if client[0] == 'output':
+               # If the output is a constant, it will have to be deepcopied
+               # each time the function is called. So we do not fold.
+               return False
+           elif (# The following ops work inplace on their input id 0.
+                 client[1] == 0 and
+                 isinstance(client[0].op, (
+                     # Ops that will work inplace on the Alloc. So if they
+                     # get constant_folded, they would copy the
+                     # constant and this is less efficient.
+                     # Not doing the constant folding could also lower
+                     # the peak memory usage, as the "constant" won't
+                     # always exist.
+                     #theano.tensor.subtensor.AdvancedIncSubtensor,
+                     theano.sandbox.gpuarray.subtensor.GpuIncSubtensor,
+                     #theano.sandbox.gpuarray.subtensor.GpuAdvancedIncSubtensor1,
+                     theano.sandbox.gpuarray.blas.GpuGemm,
+                     theano.sandbox.gpuarray.blas.GpuGemv,
+                     #theano.sandbox.gpuarray.blas.GpuGer,  # not yet implemented
+                     ))):
+               return False
+           # If the client is a transfer, we don't want to fold. We
+           # let the moving opt finish before deciding what to do.
+           elif isinstance(client[0].op, HostFromGpu):
+               return False
+       return True

gpu_alloc = GpuAlloc()
......
@@ -200,13 +200,13 @@ from theano.gof import local_optimizer, LocalOptGroup
from theano.tensor.opt import in2out

-@local_optimizer([gpugemv_no_inplace])
+@local_optimizer([gpugemv_no_inplace], inplace=True)
def local_inplace_gpuagemv(node):
    if node.op == gpugemv_no_inplace:
        return [gpugemv_inplace(*node.inputs)]

-@local_optimizer([gpugemm_no_inplace])
+@local_optimizer([gpugemm_no_inplace], inplace=True)
def local_inplace_gpuagemm(node):
    if node.op == gpugemm_no_inplace:
        return [gpugemm_inplace(*node.inputs)]
......
@@ -1281,7 +1281,10 @@ class GpuCAReduceCuda(HideC, CAReduce):
                n_threads.z += 1;
            else
                break;
-       }""" % locals()
+       }
+       //Maximum for Fermi GPUs on that dimension.
+       n_threads.z = std::min(n_threads.z, (unsigned)64);
+       """ % locals()

        if len(self.reduce_mask) == 2:
            threads_y = ''
@@ -1601,6 +1604,8 @@ class GpuCAReduceCuda(HideC, CAReduce):
            n_threads.z += 1;
        }
        n_threads.z -= 1;
+       //Maximum for Fermi GPUs on that dimension.
+       n_threads.z = std::min(n_threads.z, (unsigned)64);
        dim3 n_blocks(1,1,1);
        %(makecall)s
@@ -1697,7 +1702,7 @@ class GpuCAReduceCuda(HideC, CAReduce):
        """ % locals()

    def c_code_cache_version_apply(self, node):
-       version = [8]  # the version corresponding to the c code in this Op
+       version = [9]  # the version corresponding to the c code in this Op
        # now we insert versions for the ops on which we depend...
        scalar_node = Apply(self.scalar_op,
......
@@ -341,17 +341,20 @@ def local_gpua_crossentropysoftmaxargmax1hotwithbias(node):
@op_lifter([tensor.nnet.CrossentropySoftmax1HotWithBiasDx])
def local_gpua_crossentropysoftmax1hotwithbiasdx(node):
    return GpuCrossentropySoftmax1HotWithBiasDx()

@register_opt()
@op_lifter([tensor.nnet.Softmax])
def local_gpua_softmax(node):
    return GpuSoftmax()

@register_opt()
@op_lifter([tensor.nnet.SoftmaxWithBias])
def local_gpua_softmaxwithbias(node):
    return GpuSoftmaxWithBias()

@register_opt()
@op_lifter([gpu_from_host, ConvOp])
def local_gpu_conv(node):
......
@@ -32,11 +32,13 @@ if not theano.sandbox.gpuarray.pygpu_activated:
from theano.sandbox.gpuarray.type import (GpuArrayType,
                                          gpuarray_shared_constructor)
-from theano.sandbox.gpuarray.basic_ops import (host_from_gpu, gpu_from_host,
-                                               gpu_alloc, gpu_from_cuda,
-                                               cuda_from_gpu, HostFromGpu,
-                                               GpuFromHost, GpuReshape,
-                                               GpuEye)
+from theano.sandbox.gpuarray.basic_ops import (
+    host_from_gpu, gpu_from_host,
+    gpu_alloc, GpuAlloc,
+    gpu_from_cuda,
+    cuda_from_gpu, HostFromGpu,
+    GpuFromHost, GpuReshape,
+    GpuEye)

from theano.tests import unittest_tools as utt
utt.seed_rng()
@@ -290,6 +292,13 @@ GpuAllocTester = makeTester(
)

class TestAlloc(theano.tensor.tests.test_basic.TestAlloc):
    dtype = "float32"
    mode = mode_with_gpu
    shared = staticmethod(gpuarray_shared_constructor)
    allocs = [GpuAlloc, GpuAlloc, T.Alloc]

def test_shape():
    x = GpuArrayType(dtype='float32', broadcastable=[False, False, False])()
    v = gpuarray.zeros((3, 4, 5), dtype='float32')
......
-import unittest

from theano import scalar, gof
-from theano.gof import FunctionGraph
from theano.gof.python25 import all, any
-from theano.tests.unittest_tools import SkipTest

from theano.tensor.tests.test_elemwise import (test_Broadcast, test_DimShuffle,
                                               test_CAReduce)
@@ -126,11 +122,13 @@ class test_GpuCAReduceCuda(test_GpuCAReduceCPY):
            ((4100,4,3),[1,2]),((5,4100,3),[1,2]),((5,4,4100),[1,2]),#011
            #((4100,4,3),[0,2]),((5,4100,3),[0,2]),((5,4,4100),[0,2]),#101 ##not implemented
            ((4100,4,3),[0,1,2]),((5,4100,3),[0,1,2]),((5,4,4100),[0,1,2]),#111
+           ((65,4,3),[0,1,2]),((5,65,3),[0,1,2]),((5,4,65),[0,1,2]),#111
            ((4100,4,3,2),[2,3]),((4,4100,3,2),[2,3]),((4,3,4100,2),[2,3]),((4,3,2,4100),[2,3]),#0011
            ((4100,4,3,2),[1,3]),((4,4100,3,2),[1,3]),((4,3,4100,2),[1,3]),((4,3,2,4100),[1,3]),#0101
            ((4100,4,3,2),[0,2,3]),((4,4100,3,2),[0,2,3]),((4,3,4100,2),[0,2,3]),#((4,3,2,4100),[0,2,3]),#1011
            ((4100,4,3,2),[1,2,3]),((4,4100,3,2),[1,2,3]),((4,3,4100,2),[1,2,3]),((4,3,2,4100),[1,2,3]),#0111
+           ((65,4,3,2),[1,2,3]),((4,65,3,2),[1,2,3]),((4,3,65,2),[1,2,3]),((4,3,2,65),[1,2,3]),#0111
            ((4100,2,3,4),[0,1,2,3]),((2,4100,3,4),[0,1,2,3]),((2,3,4100,4),[0,1,2,3]),((2,3,4,4100),[0,1,2,3]),((128,1,3,3),[0,1,2,3]),#1111
            #test pattern implemented by reshape
......
@@ -26,4 +26,6 @@ class G_subtensor(T_subtensor):
                dtype='float32',
                ignore_topo=(HostFromGpu, GpuFromHost,
                             DeepCopyOp))
+       # GPU opt can't run in fast_compile only.
+       self.fast_compile = False
        assert self.sub == GpuSubtensor
@@ -26,8 +26,10 @@ if cuda_available:
    from theano.sandbox.cuda import (CudaNdarrayType,
                                     float32_shared_constructor)

def matVecModM(A, s, m):
-   return numpy.int32(numpy.sum((numpy.int64(A)*s) % m, 1) % m)
+   assert A.dtype == 'int64'
+   return numpy.int32(numpy.sum((A*s) % m, 1) % m)

def multMatVect(v, A, m1, B, m2):
@@ -142,24 +144,30 @@ MASK2 = numpy.int32(65535)  #2^16 - 1
MULT2 = numpy.int32(21069)
NORM = 4.656612873077392578125e-10;  #1./2^31

-A1p0 = numpy.asarray([[0, 4194304, 129], [1, 0, 0], [0, 1, 0]])
-A2p0 = numpy.asarray([[32768, 0, 32769], [1, 0, 0], [0, 1, 0]])
+#A1p0 = numpy.asarray([[0, 4194304, 129], [1, 0, 0], [0, 1, 0]],
+#                     dtype='int64')
+#A2p0 = numpy.asarray([[32768, 0, 32769], [1, 0, 0], [0, 1, 0]],
+#                     dtype='int64')
A1p72 = numpy.asarray([[1516919229, 758510237, 499121365],
                       [1884998244, 1516919229, 335398200],
-                      [601897748, 1884998244, 358115744]])
+                      [601897748, 1884998244, 358115744]],
+                     dtype='int64')
A2p72 = numpy.asarray([[1228857673, 1496414766, 954677935],
                       [1133297478, 1407477216, 1496414766],
-                      [2002613992, 1639496704, 1407477216]])
+                      [2002613992, 1639496704, 1407477216]],
+                     dtype='int64')
A1p134 = numpy.asarray(
    [[1702500920, 1849582496, 1656874625],
     [828554832, 1702500920, 1512419905],
-    [1143731069, 828554832, 102237247]])
+    [1143731069, 828554832, 102237247]],
+   dtype='int64')
A2p134 = numpy.asarray(
    [[796789021, 1464208080, 607337906],
     [1241679051, 1431130166, 1464208080],
-    [1401213391, 1178684362, 1431130166]])
+    [1401213391, 1178684362, 1431130166]],
+   dtype='int64')

np_int32_vals = [numpy.int32(i) for i in (0, 7, 9, 15, 16, 22, 24)]
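`matVecModM` is the modular matrix-vector product at the heart of the MRG random-stream recurrence; the int64 dtype asserted above keeps the intermediate products from overflowing int32. A stand-alone sketch (helper renamed, and using `.astype` rather than the `numpy.int32(...)` call in the source):

```python
import numpy

def mat_vec_mod_m(A, s, m):
    # Row-wise dot product of A with s, with all arithmetic done mod m
    # in int64 so the intermediate products cannot overflow.
    assert A.dtype == 'int64'
    return (numpy.sum((A * s) % m, axis=1) % m).astype(numpy.int32)

A = numpy.asarray([[2, 0], [1, 3]], dtype='int64')
s = numpy.asarray([4, 5], dtype='int64')
# rows: (2*4 + 0*5) % 7 = 1, (1*4 + 3*5) % 7 = 5
assert mat_vec_mod_m(A, s, 7).tolist() == [1, 5]
```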
......
@@ -909,7 +909,22 @@ class UnaryScalarOp(ScalarOp):
                node.inputs[0].type != node.outputs[0].type):
            raise theano.gof.utils.MethodNotDefined()

-       dtype = node.inputs[0].dtype
+       dtype = node.inputs[0].type.dtype_specs()[1]
+       fct_call = self.c_code_contiguous_raw(dtype, 'n', 'x', 'z')
+       return """
+       {
+       npy_intp n = PyArray_SIZE(%(z)s);
+       %(dtype)s * x = (%(dtype)s*) PyArray_DATA(%(x)s);
+       %(dtype)s * z = (%(dtype)s*) PyArray_DATA(%(z)s);
+       %(fct_call)s;
+       }
+       """ % locals()
+
+   def c_code_contiguous_raw(self, dtype, n, i, o):
+       if not config.lib.amdlibm:
+           raise theano.gof.utils.MethodNotDefined()
+       if dtype.startswith('npy_'):
+           dtype = dtype[4:]
        if dtype == 'float32' and self.amd_float32 is not None:
            dtype = 'float'
            fct = self.amd_float32
@@ -918,12 +933,7 @@ class UnaryScalarOp(ScalarOp):
            fct = self.amd_float64
        else:
            raise theano.gof.utils.MethodNotDefined()
-       return """
-       npy_intp n = PyArray_SIZE(%(z)s);
-       %(dtype)s * x = (%(dtype)s*) PyArray_DATA(%(x)s);
-       %(dtype)s * z = (%(dtype)s*) PyArray_DATA(%(z)s);
-       %(fct)s(n, x, z);
-       """ % locals()
+       return "%(fct)s(%(n)s, %(i)s, %(o)s)" % locals()
class BinaryScalarOp(ScalarOp):
@@ -2964,7 +2974,40 @@ class Composite(ScalarOp):
        # We need to clone the graph as sometimes its nodes already
        # contain a reference to an fgraph. As we want the Composite
        # to be picklable, we can't have a reference to fgraph.
-       inputs, outputs = gof.graph.clone(inputs, outputs)
+
+       # Also, if there is a Composite in the inner graph, we want to
+       # remove it. In that case, we do a more complicated clone
+       # that will flatten the Composite. We don't need to do this
+       # recursively, as the way the fusion optimizer works, we have
+       # only 1 new Composite each time at the output.
+       if len(outputs) > 1 or not any([isinstance(var.owner.op, Composite)
+                                       for var in outputs]):
+           # No inner Composite
+           inputs, outputs = gof.graph.clone(inputs, outputs)
+       else:
+           # Inner Composite that we need to flatten
+           assert len(outputs) == 1
+           # 1. Create a new graph from inputs up to the
+           # Composite
+           res = theano.compile.rebuild_collect_shared(
+               inputs=inputs,
+               outputs=outputs[0].owner.inputs,
+               copy_inputs_over=False)  # Clone also the inputs
+           # 2. We continue this partial clone with the graph in
+           # the inner Composite
+           res2 = theano.compile.rebuild_collect_shared(
+               inputs=outputs[0].owner.op.inputs,
+               outputs=outputs[0].owner.op.outputs,
+               replace=dict(zip(outputs[0].owner.op.inputs, res[1]))
+           )
+           assert len(res2[1]) == len(outputs)
+           assert len(res[0]) == len(inputs)
+           assert res[0] != inputs
+           inputs, outputs = res[0], res2[1]
+           # The next assert is commented out just for speed.
+           #assert not any([isinstance(node.op, Composite) for node in
+           #                theano.gof.graph.ops(inputs, outputs)])
        self.inputs = copy(inputs)
        self.outputs = copy(outputs)
        self.inputs_type = tuple([input.type for input in inputs])
......
...@@ -68,19 +68,17 @@ class test_composite(unittest.TestCase): ...@@ -68,19 +68,17 @@ class test_composite(unittest.TestCase):
fn = gof.DualLinker().accept(g).make_function() fn = gof.DualLinker().accept(g).make_function()
assert fn(1.0, 2.0) == 1.5 assert fn(1.0, 2.0) == 1.5
# def test_sin(self):
# x = inputs()
# e = sin(x)
# C = Composite([x], [e])
# c = C.make_node(x)
# # print c.c_code(['x'], ['z'], dict(id = 0))
# g = FunctionGraph([x], [c.out])
# fn = gof.DualLinker().accept(g).make_function()
# assert fn(0) == 0
# assert fn(3.14159265358/2) == 1
# assert fn(3.14159265358) == 0
def test_flatten(self):
#Test that we flatten multiple Composite.
x, y, z = inputs()
C = Composite([x, y], [x + y])
CC = Composite([x, y], [C(x * y, y)])
assert not isinstance(CC.outputs[0].owner.op, Composite)
# Test with multiple outputs
CC = Composite([x, y, z], [C(x * y, y), C(x * z, y)])
#We don't flatten that case.
assert isinstance(CC.outputs[0].owner.op, Composite)
# WRITEME: Test for sin, pow, and other scalar ops.
def test_with_constants(self):
x, y, z = inputs()
......
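The `test_flatten` change above exercises the Composite-flattening clone described in the first hunk: an inner Composite at the output of the graph is inlined so only one flat Composite remains. As a rough illustration of the idea (a toy model, not Theano's API — `Node`, `Composite`, `substitute`, and `flatten` here are all hypothetical names):

```python
# Toy model of the flattening idea: a "composite" holds an inner
# expression graph, and a composite nested at the root of another one
# is inlined into a single flat graph (one level at a time, as in the
# fusion optimizer described above).
class Node:
    def __init__(self, op, inputs):
        self.op = op          # e.g. 'add', 'mul', or a Composite
        self.inputs = inputs  # child Nodes or placeholder names (str)

class Composite:
    def __init__(self, inputs, output):
        self.inputs = inputs  # placeholder names, e.g. ['x', 'y']
        self.output = output  # root Node of the inner graph

def substitute(node, mapping):
    """Clone `node`, replacing placeholder names via `mapping`."""
    if isinstance(node, str):
        return mapping[node]
    return Node(node.op, [substitute(i, mapping) for i in node.inputs])

def flatten(outer):
    """If the outer graph's root op is itself a Composite, inline it."""
    root = outer.output
    if isinstance(root.op, Composite):
        inner = root.op
        mapping = dict(zip(inner.inputs, root.inputs))
        return Composite(outer.inputs, substitute(inner.output, mapping))
    return outer

# C computes x + y; CC wraps C around (x * y, y), like the test above.
C = Composite(['x', 'y'], Node('add', ['x', 'y']))
CC = Composite(['x', 'y'], Node(C, [Node('mul', ['x', 'y']), 'y']))
flat = flatten(CC)
assert not isinstance(flat.output.op, Composite)  # inner Composite inlined
assert flat.output.op == 'add'
```

As in the real optimizer, only the root is flattened here; a graph with multiple Composite outputs would be left alone, which is what the second half of `test_flatten` asserts.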
Diff collapsed.
...@@ -173,12 +173,9 @@ SOMEPATH/Canopy_64bit/User/lib/python2.7/site-packages/numpy/distutils/system_in
warnings.warn('Specified path %s is invalid.' % d)
"""
#I'm not able to remove all printed stuff
with_context = warnings.catch_warnings(record=True)
with_context.__enter__()
try:
blas_info = numpy.distutils.system_info.get_info("blas_opt")
finally:
with_context.__exit__(None, None, None)
with warnings.catch_warnings(record=True):
numpy.distutils.system_info.system_info.verbosity = 0
blas_info = numpy.distutils.system_info.get_info("blas_opt")
# If we are in an EPD installation, mkl is available
if "EPD" in sys.version:
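The `blas_opt` hunk above replaces hand-driven `__enter__`/`__exit__` calls with a plain `with` statement. A minimal standard-library sketch showing the two forms are equivalent for recording warnings (`noisy` is a made-up stand-in for the numpy call):

```python
import warnings

def noisy():
    # Stand-in for numpy.distutils emitting a warning during probing.
    warnings.warn("Specified path /some/dir is invalid.")
    return 42

# Old style: drive the context manager by hand, with try/finally.
ctx = warnings.catch_warnings(record=True)
log_manual = ctx.__enter__()
try:
    warnings.simplefilter("always")  # make sure the warning is recorded
    result_manual = noisy()
finally:
    ctx.__exit__(None, None, None)

# New style: the with statement does the same enter/exit bookkeeping.
with warnings.catch_warnings(record=True) as log_with:
    warnings.simplefilter("always")
    result_with = noisy()

assert result_manual == result_with == 42
assert len(log_manual) == len(log_with) == 1
```

The `with` form is shorter and cannot forget the `__exit__` call on an exception path, which is exactly what the try/finally in the old code was guarding against.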
...@@ -1193,32 +1190,31 @@ def _beta_L_plus_alpha_M(beta, L, alpha, M, recurse_flip=True):
# it also might be the case that there is a dimshuffle between the +
# and the dot22. local_dot_to_dot22 in particular will put in such things.
if M.owner and isinstance(M.owner.op, T.DimShuffle):
MM = M.owner.inputs[0]
if tuple(M.owner.op.new_order) == (0,):
# it is making a column MM into a vector
if MM.owner and MM.owner.op == _dot22:
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle(0, 'x'),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle(0)]
return rval, MM
if tuple(M.owner.op.new_order) == (1,):
# it is making a row MM into a vector
if MM.owner and MM.owner.op == _dot22:
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle('x', 0),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle(1)]
return rval, MM
if tuple(M.owner.op.new_order) == ():
# it is making a row MM into a vector
if MM.owner and MM.owner.op == _dot22:
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle('x', 'x'),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle()]
return rval, MM
if (M.owner and isinstance(M.owner.op, T.DimShuffle) and
M.owner.inputs[0].owner and
isinstance(M.owner.inputs[0].owner.op, Dot22)):
MM = M.owner.inputs[0]
if M.owner.op.new_order == (0,):
# it is making a column MM into a vector
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle(0, 'x'),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle(0)]
return rval, MM
if M.owner.op.new_order == (1,):
# it is making a row MM into a vector
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle('x', 0),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle(1)]
return rval, MM
if len(M.owner.op.new_order) == 0:
# it is making a row MM into a vector
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle('x', 'x'),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle()]
return rval, MM
# this is False'd out because of inadequate testing.
# TODO see ticket #237
...@@ -1382,29 +1378,31 @@ def _gemm_from_factored_list(lst):
"""Returns None, or a list to replace node.outputs
"""
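The three `new_order` branches above correspond to dropping broadcastable axes of a matrix-shaped gemm result: `(0,)` turns a column into a vector, `(1,)` turns a row into a vector, and `()` turns a 1x1 matrix into a scalar. In numpy terms (an analogy only — numpy has no `dimshuffle`):

```python
import numpy as np

col = np.arange(3.0).reshape(3, 1)   # column matrix, shape (3, 1)
row = np.arange(3.0).reshape(1, 3)   # row matrix, shape (1, 3)
one = np.array([[7.0]])              # 1x1 matrix, shape (1, 1)

# dimshuffle (0,): keep axis 0, drop the length-1 axis 1 -> vector
vec_from_col = col.reshape(3)
# dimshuffle (1,): keep axis 1, drop the length-1 axis 0 -> vector
vec_from_row = row.reshape(3)
# dimshuffle (): drop both length-1 axes -> 0-d scalar
scalar = one.reshape(())

assert vec_from_col.shape == (3,)
assert vec_from_row.shape == (3,)
assert scalar.shape == ()
```

The optimizer goes the other way: it computes the gemm on the 2-d operands and then applies the matching `dimshuffle` to restore the vector or scalar shape the original expression had.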
# Make every pair in list have matching dtypes
# sM can be a tuple of 2 elements or a theano variable.
# We should not use __len__ as theano variables don't support
# it. I don't want to change this to isinstance(sM, tuple)
# as I'm not able to make a test that triggers this case.
def is_pair(sM):
try:
s, M = sM
return True
except Exception:
return False
lst2 = []
# Remove the tuple that can't be cast correctly.
# This can happen when we try to cast a complex to a real
for sM in lst:
if is_pair(sM):
# Make every pair in list have matching dtypes
# sM can be a tuple of 2 elements or a theano variable.
if isinstance(sM, tuple):
sm0, sm1 = sM
sm0 = T.as_tensor_variable(sm0)
if theano.scalar.upcast(sm0.dtype, sm1.dtype) == sm1.dtype:
lst2.append((T.cast(sm0, sm1.dtype), sM[1]))
lst = lst2
def item_to_var(t):
try:
s, M = t
except Exception:
return t
if s == 1:
return M
if s == -1:
return -M
return s * M
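`item_to_var` above folds a `(scale, value)` pair into a single value, using try/except unpacking because a Theano variable cannot be unpacked into two items (the same duck-typed trick `is_pair` uses). A plain-Python demonstration of the same function on ordinary numbers:

```python
def item_to_var(t):
    """Turn a (scale, value) pair into a plain value; pass others through."""
    try:
        s, M = t
    except Exception:
        return t          # not a pair: already a bare value
    if s == 1:
        return M          # scale 1 folds away
    if s == -1:
        return -M         # scale -1 becomes a negation
    return s * M          # general scale multiplies

assert item_to_var((1, 5.0)) == 5.0
assert item_to_var((-1, 5.0)) == -5.0
assert item_to_var((3, 2.0)) == 6.0
assert item_to_var(4.0) == 4.0   # non-pair passes through unchanged
```

Avoiding the `*M` multiplication for scales 1 and -1 matters in the real optimizer: it keeps the graph free of useless `mul` nodes that would block further gemm fusion.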
# Try every pair in the sM_list, trying to turn it into a gemm operation
for i in xrange(len(lst) - 1):
s_i, M_i = lst[i]
...@@ -1421,16 +1419,6 @@ def _gemm_from_factored_list(lst):
s_j, M_j)
#print 'GOT IT', gemm_of_sM_list
if gemm_of_sM_list:
def item_to_var(t):
try:
s, M = t
except Exception:
return t
if s == 1:
return M
if s == -1:
return -M
return s * M
assert len(gemm_of_sM_list) == 1
add_inputs = [item_to_var(input)
...@@ -1715,20 +1703,19 @@ def local_dot_to_dot22(node):
_logger.info('Not optimizing dot with inputs %s %s %s %s',
x, y, x.type, y.type)
@local_optimizer([gemm_no_inplace])
@local_optimizer([gemm_no_inplace], inplace=True)
def local_inplace_gemm(node):
if node.op == gemm_no_inplace:
return [gemm_inplace(*node.inputs)]
@local_optimizer([gemv_no_inplace])
@local_optimizer([gemv_no_inplace], inplace=True)
def local_inplace_gemv(node):
if node.op == gemv_no_inplace:
return [gemv_inplace(*node.inputs)]
@local_optimizer([ger])
@local_optimizer([ger], inplace=True)
def local_inplace_ger(node):
if node.op == ger:
return [ger_destructive(*node.inputs)]
......
...@@ -774,8 +774,7 @@ class Elemwise(OpenMPOp):
super(Elemwise, self).perform(node, inputs, output_storage)
maxsize = max(len(input.shape) for input in inputs)
for dims in izip(*[([(1, True)] * (maxsize - len(input.shape))
+ zip(input.shape, sinput.type.broadcastable))
for dims in izip(*[zip(input.shape, sinput.type.broadcastable)
for input, sinput in zip(inputs, node.inputs)]):
if max(d for d, b in dims) != 1 and (1, False) in dims:
# yes there may be more compact ways to write this code,
...@@ -808,34 +807,36 @@ class Elemwise(OpenMPOp):
out_shape.append(max(values))
out_shape = tuple(out_shape)
if not self.inplace_pattern:
for output, storage in izip(node.outputs, output_storage):
odat = storage[0]
if odat is not None:
if odat.shape != out_shape:
# It is unsafe to try to resize odat,
# we have to allocate output storage.
odat = None
if odat is None:
odat = numpy.ndarray(out_shape, dtype=output.type.dtype)
storage[0] = odat
else:
for i, (output, storage) in enumerate(
izip(node.outputs, output_storage)):
#i is an output idx
if i in self.inplace_pattern:
odat = inputs[self.inplace_pattern[i]]
else:
odat = storage[0]
if odat is not None:
if odat.shape != out_shape:
# It is unsafe to try to resize odat,
# we have to allocate output storage.
odat = None
if odat is None:
odat = numpy.ndarray(out_shape,
dtype=output.type.dtype)
storage[0] = odat
# Commented as we don't reuse outputs now.
#
# if not self.inplace_pattern:
# for output, storage in izip(node.outputs, output_storage):
# odat = storage[0]
# if odat is not None:
# if odat.shape != out_shape:
# # It is unsafe to try to resize odat,
# # we have to allocate output storage.
# odat = None
# if odat is None:
# odat = numpy.ndarray(out_shape, dtype=output.type.dtype)
# storage[0] = odat
# else:
# for i, (output, storage) in enumerate(
# izip(node.outputs, output_storage)):
# #i is an output idx
# if i in self.inplace_pattern:
# odat = inputs[self.inplace_pattern[i]]
# else:
# odat = storage[0]
# if odat is not None:
# if odat.shape != out_shape:
# # It is unsafe to try to resize odat,
# # we have to allocate output storage.
# odat = None
# if odat is None:
# odat = numpy.ndarray(out_shape,
# dtype=output.type.dtype)
# storage[0] = odat
ufunc_args = inputs # + output_storage
if self.nfunc and len(inputs) == self.nfunc_spec[1]:
...@@ -860,26 +861,25 @@ class Elemwise(OpenMPOp):
if nout == 1:
variables = [variables]
for variable, storage, nout in izip(variables, output_storage,
node.outputs):
if str(getattr(variable, "dtype", "")) == 'object':
# Since numpy 1.6, function created with numpy.frompyfunc
# always return an ndarray with dtype object
variable = numpy.asarray(variable, dtype=nout.dtype)
# The storage has been resized earlier.
if hasattr(variable, 'shape'):
assert storage[0].shape == variable.shape
else:
# If variable has not shape, then it is a scalar.
assert numpy.prod(storage[0].shape) == 1
storage[0][...] = variable
assert str(storage[0].dtype) != 'object'
# the following should be used instead of the previous loop,
# unfortunately it tends to segfault
# self.ufunc(*(ufunc_args+[s[0] for s in output_storage]))
i = 0
for variable, storage, nout in izip(variables, output_storage,
node.outputs):
if getattr(variable, "dtype", "") == 'object':
# Since numpy 1.6, function created with numpy.frompyfunc
# always return an ndarray with dtype object
variable = numpy.asarray(variable, dtype=nout.dtype)
if i in self.inplace_pattern:
odat = inputs[self.inplace_pattern[i]]
odat[...] = variable
storage[0] = odat
# Sometimes NumPy return a Python type.
elif not isinstance(variable, numpy.ndarray):
variable = numpy.asarray(variable, nout.dtype)
storage[0] = variable
else:
storage[0] = variable
i += 1
def infer_shape(self, node, i_shapes):
rval = []
......
...@@ -571,6 +571,8 @@ def repeat(x, repeats, axis=None):
:param axis: int, optional.
:see: :func:`tensor.tile <tensor.tile>`
.. versionadded:: 0.6
"""
return RepeatOp(axis=axis)(x, repeats)
......
...@@ -95,7 +95,7 @@ class SoftmaxWithBias(gof.Op):
return ['<iostream>', '<cmath>']
@staticmethod
def c_code_template():
def c_code_template(dtype):
# this implementation was lifted from
# /u/bergstrj/cvs/bergstrj/src/feb07/nn.cxx
...@@ -107,6 +107,10 @@ class SoftmaxWithBias(gof.Op):
#TODO: use this to accept float32 and int32: node.inputs[0].type.dtype_specs()[1]
init_decl = """
npy_intp* Nx = PyArray_DIMS(%(x)s);
npy_intp Sx = 0;
npy_intp Sb = 0;
npy_intp Ssm = 0;
if (PyArray_NDIM(%(x)s) != 2)
{
...@@ -151,6 +155,10 @@ class SoftmaxWithBias(gof.Op):
%(fail)s
}
}
Sx = PyArray_STRIDES(%(x)s)[1]/sizeof(dtype_%(x)s);
Sb = PyArray_STRIDES(%(b)s)[0]/sizeof(dtype_%(b)s);
Ssm = PyArray_STRIDES(%(sm)s)[1]/sizeof(dtype_%(sm)s);
""" """
begin_row_loop = """ begin_row_loop = """
...@@ -163,9 +171,7 @@ class SoftmaxWithBias(gof.Op):
const dtype_%(x)s* __restrict__ x_i = (dtype_%(x)s*)(PyArray_BYTES(%(x)s) + PyArray_STRIDES(%(x)s)[0] * i);
const dtype_%(b)s* __restrict__ b_i = (dtype_%(b)s*)(PyArray_BYTES(%(b)s));
dtype_%(sm)s* __restrict__ sm_i = (dtype_%(sm)s*)(PyArray_BYTES(%(sm)s) + PyArray_STRIDES(%(sm)s)[0] * i);
"""
inside_row_loop = """
npy_intp Sx = PyArray_STRIDES(%(x)s)[1]/sizeof(dtype_%(x)s);
npy_intp Sb = PyArray_STRIDES(%(b)s)[0]/sizeof(dtype_%(b)s);
npy_intp Ssm = PyArray_STRIDES(%(sm)s)[1]/sizeof(dtype_%(sm)s);
...@@ -182,6 +188,9 @@ class SoftmaxWithBias(gof.Op):
row_max = (row_ij > row_max) ? row_ij : row_max;
}
"""
inside_row_loop = """
for (j = 0; j < Nx[1]; ++j)
{
dtype_%(sm)s row_ij = x_i[j * Sx] + b_i[j * Sb];
...@@ -201,6 +210,42 @@ class SoftmaxWithBias(gof.Op):
"""
# Get the vectorized version of exp if it exists
try:
vec_exp = theano.scalar.exp.c_code_contiguous_raw(dtype,
"Nx[1]", "sm_i", "sm_i")
inside_row_loop_contig = """
for (j = 0; j < Nx[1]; ++j)
{
dtype_%%(sm)s row_ij = x_i[j * Sx] + b_i[j * Sb];
//std::cout << "2 " << j << " " << row_ij << " " << row_max << "\\n";
dtype_%%(sm)s sm_ij = row_ij - row_max;
//std::cout << "3 " << j << " " << sm_ij << "\\n";
sm_i[j * Ssm] = sm_ij;
}
%(vec_exp)s;
for (j = 0; j < Nx[1]; ++j)
{
sum += sm_i[j * Ssm];
}
//cblas_dscal(x.N, 1.0 / sum, &mat_at(s,i,0), s.n);
double sum_inv = 1.0 / sum;
for (j = 0; j < Nx[1]; ++j)
{
sm_i[j * Ssm] *= sum_inv;
}
""" % locals()
inside_row_loop = """
if(Ssm == 1){
%(inside_row_loop_contig)s
}else{
%(inside_row_loop)s
}
""" % locals()
except theano.gof.utils.MethodNotDefined:
pass
end_row_loop = """
}
"""
...@@ -210,12 +255,13 @@ class SoftmaxWithBias(gof.Op):
def c_code(self, node, name, inp, out, sub):
x, b = inp
sm, = out
code_template = ''.join(self.c_code_template())
code_template = ''.join(self.c_code_template(
node.inputs[0].type.dtype_specs()[1]))
return code_template % dict(locals(), **sub)
@staticmethod
def c_code_cache_version():
return (6,)
return (8,)
softmax_with_bias = SoftmaxWithBias()
...@@ -384,7 +430,7 @@ class Softmax(gof.Op):
return ['<iostream>', '<cmath>']
@staticmethod
def c_code_template():
def c_code_template(dtype):
# this implementation was lifted from
# /u/bergstrj/cvs/bergstrj/src/feb07/nn.cxx
...@@ -396,6 +442,8 @@ class Softmax(gof.Op):
#TODO: use this to accept float32 and int32: node.inputs[0].type.dtype_specs()[1]
init_decl = """
npy_intp* Nx = PyArray_DIMS(%(x)s);
npy_intp Sx1 = 0;
npy_intp Ssm1 = 0;
if (PyArray_NDIM(%(x)s) != 2)
{
...@@ -413,7 +461,7 @@ class Softmax(gof.Op):
|| (PyArray_DIMS(%(sm)s)[0] != PyArray_DIMS(%(x)s)[0])
|| (PyArray_DIMS(%(sm)s)[1] != PyArray_DIMS(%(x)s)[1]))
{
if (NULL != %(sm)s) Py_XDECREF(%(sm)s);
Py_XDECREF(%(sm)s);
%(sm)s = (PyArrayObject*)PyArray_SimpleNew(2, PyArray_DIMS(%(x)s),
type_num_%(x)s);
if(!%(sm)s) {
...@@ -422,6 +470,8 @@ class Softmax(gof.Op):
%(fail)s
}
}
Sx1 = PyArray_STRIDES(%(x)s)[1]/sizeof(dtype_%(x)s);
Ssm1 = PyArray_STRIDES(%(sm)s)[1]/sizeof(dtype_%(sm)s);
""" """
begin_row_loop = """ begin_row_loop = """
...@@ -433,11 +483,6 @@ class Softmax(gof.Op): ...@@ -433,11 +483,6 @@ class Softmax(gof.Op):
const dtype_%(x)s* __restrict__ x_i = (dtype_%(x)s*)(PyArray_BYTES(%(x)s) + PyArray_STRIDES(%(x)s)[0] * i); const dtype_%(x)s* __restrict__ x_i = (dtype_%(x)s*)(PyArray_BYTES(%(x)s) + PyArray_STRIDES(%(x)s)[0] * i);
dtype_%(sm) s* __restrict__ sm_i = (dtype_%(sm)s*)(PyArray_BYTES(%(sm)s) + PyArray_STRIDES(%(sm)s)[0] * i); dtype_%(sm) s* __restrict__ sm_i = (dtype_%(sm)s*)(PyArray_BYTES(%(sm)s) + PyArray_STRIDES(%(sm)s)[0] * i);
"""
inside_row_loop = """
npy_intp Sx = PyArray_STRIDES(%(x)s)[1]/sizeof(dtype_%(x)s);
npy_intp Ssm = PyArray_STRIDES(%(sm)s)[1]/sizeof(dtype_%(sm)s);
size_t row_max_j=0;
dtype_%(sm)s row_max = x_i[0];
...@@ -445,46 +490,82 @@ class Softmax(gof.Op):
// Get the maximum value of the row
for (j = 1; j < Nx[1]; ++j)
{
dtype_%(sm)s row_ij = x_i[j * Sx] ;
dtype_%(sm)s row_ij = x_i[j * Sx1] ;
//std::cout << "1 " << row_ij << "\\n";
row_max_j = (row_ij > row_max) ? j : row_max_j;
row_max = (row_ij > row_max) ? row_ij : row_max;
}
"""
inside_row_loop = """
for (j = 0; j < Nx[1]; ++j)
{
dtype_%(sm)s row_ij = x_i[j * Sx] ;
dtype_%(sm)s row_ij = x_i[j * Sx1] ;
//std::cout << "2 " << j << " " << row_ij << " " << row_max << "\\n";
dtype_%(sm)s sm_ij = exp(row_ij - row_max);
//std::cout << "3 " << j << " " << sm_ij << "\\n";
sum += sm_ij;
sm_i[j * Ssm] = sm_ij;
sm_i[j * Ssm1] = sm_ij;
}
//cblas_dscal(x.N, 1.0 / sum, &mat_at(s,i,0), s.n);
double sum_inv = 1.0 / sum;
for (j = 0; j < Nx[1]; ++j)
{
sm_i[j * Ssm] *= sum_inv;
sm_i[j * Ssm1] *= sum_inv;
}
"""
# Get the vectorized version of exp if it exists
try:
vec_exp = theano.scalar.exp.c_code_contiguous_raw(dtype,
"Nx[1]", "sm_i", "sm_i")
inside_row_loop_contig = """
for (j = 0; j < Nx[1]; ++j)
{
sm_i[j * Ssm1] = x_i[j * Sx1] - row_max;
}
%(vec_exp)s;
for (j = 0; j < Nx[1]; ++j)
{
sum += sm_i[j * Ssm1];
}
//cblas_dscal(x.N, 1.0 / sum, &mat_at(s,i,0), s.n);
double sum_inv = 1.0 / sum;
for (j = 0; j < Nx[1]; ++j)
{
sm_i[j * Ssm1] *= sum_inv;
}
""" % locals()
inside_row_loop = """
if(Ssm1 == 1){
%(inside_row_loop_contig)s
}else{
%(inside_row_loop)s
}
""" % locals()
except theano.gof.utils.MethodNotDefined:
pass
end_row_loop = """
}
"""
return (init_decl, begin_row_loop, inside_row_loop, end_row_loop)
def c_code(self, node, name, inp, out, sub):
x, = inp
sm, = out
code_template = ''.join(self.c_code_template())
code_template = ''.join(self.c_code_template(
node.inputs[0].type.dtype_specs()[1]))
return code_template % dict(locals(), **sub)
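The `c_code` methods above assemble the op's C source by joining `%`-style templates and filling the named placeholders from `dict(locals(), **sub)`. A minimal sketch of that mechanism (the variable names `input0`/`output0` are made up; the real names are supplied by Theano's compiler):

```python
# Two template fragments with named %(...)s placeholders, like
# init_decl / inside_row_loop in the op above.
init_decl = """
npy_intp* Nx = PyArray_DIMS(%(x)s);
"""
loop = """
for (j = 0; j < Nx[1]; ++j) { /* fill %(sm)s from %(x)s */ }
"""

x = "input0"    # hypothetical C variable names
sm = "output0"
sub = {"fail": "goto fail_label;"}  # extra substitutions, as in `sub`

# Join the fragments, then substitute locals plus `sub` in one pass.
code_template = ''.join((init_decl, loop))
code = code_template % dict(locals(), **sub)

assert "PyArray_DIMS(input0)" in code
assert "output0" in code
```

Splitting the template into named pieces is what lets `CrossentropySoftmaxArgmax1HotWithBias` below reuse `SoftmaxWithBias.c_code_template()` and splice its own code between the fragments.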
@staticmethod
def c_code_cache_version():
return (1,)
return (3,)
softmax = Softmax()
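The C templates above implement the usual numerically stable softmax: subtract the row maximum, exponentiate, sum, then scale by the inverse sum. The same computation in numpy form, mirroring the `row_max` / `exp` / `sum_inv` steps:

```python
import numpy as np

def softmax_rows(x):
    """Row-wise softmax with the max subtracted first, as in the C loops."""
    row_max = x.max(axis=1, keepdims=True)   # the row_max scan
    e = np.exp(x - row_max)                  # exp of shifted values
    return e / e.sum(axis=1, keepdims=True)  # multiply by sum_inv

x = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1000.0, 1000.0]])  # naive exp(1000) would overflow
sm = softmax_rows(x)

assert np.allclose(sm.sum(axis=1), 1.0)      # each row is a distribution
assert np.allclose(sm[1], [1.0 / 3] * 3)     # equal inputs -> uniform row
assert np.isfinite(sm).all()                 # no overflow thanks to the shift
```

Subtracting the row maximum changes nothing mathematically (it cancels in the normalization) but keeps every `exp` argument at or below zero, which is why the C code bothers with the extra max-finding pass.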
...@@ -863,7 +944,7 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
return ['<iostream>', '<cmath>']
@staticmethod
def c_code_template():
def c_code_template(dtype):
# this implementation was lifted from
# /u/bergstrj/cvs/bergstrj/src/feb07/nn.cxx
...@@ -874,7 +955,7 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
#TODO: use this to accept float32 and int32: node.inputs[0].type.dtype_specs()[1]
(init_decl, begin_row_loop, inside_row_loop, end_row_loop) = \
SoftmaxWithBias.c_code_template()
SoftmaxWithBias.c_code_template(dtype)
return (init_decl,
"""
if (PyArray_NDIM(%(y_idx)s) != 1)
...@@ -947,7 +1028,8 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
nll, sm, am = out
y_idx_type = node.inputs[2].type.dtype_specs()[1]
am_type = y_idx_type
code_template = ''.join(self.c_code_template())
dtype = node.inputs[0].type.dtype_specs()[1]
code_template = ''.join(self.c_code_template(dtype))
return code_template % dict(locals(), **sub)
......
Diff collapsed.
...@@ -1928,7 +1928,8 @@ class TestAlloc(unittest.TestCase):
#AdvancedIncSubtensor1
(some_matrix[arange(60)], 2),
#AdvancedIncSubtensor
(some_matrix[idx, idx], 1)]):
(some_matrix[idx, idx], 1)
]):
derp = sum(dot(subtensor, variables))
fobj = theano.function([some_vector], derp, mode=self.mode)
...@@ -1936,14 +1937,18 @@ class TestAlloc(unittest.TestCase):
fgrad = theano.function([some_vector], grad_derp,
mode=self.mode)
topo_obj = fobj.maker.fgraph.toposort()
#<= is needed as the GPU currently doesn't implement
#AdvancedIncSubtensor. Once it does, this can be
#replaced with ==.
assert numpy.sum([isinstance(node.op, alloc)
for node in topo_obj]) == 0
for node in topo_obj]) <= 1
topo_grad = fgrad.maker.fgraph.toposort()
#print subtensor
#theano.printing.debugprint(fgrad)
assert numpy.sum([isinstance(node.op, alloc)
for node in topo_grad]) == n_alloc
for node in topo_grad]) == n_alloc, (
alloc, subtensor, n_alloc, topo_grad)
fobj(test_params)
fgrad(test_params)
...@@ -6736,6 +6741,17 @@ class TestTensorInstanceMethods(unittest.TestCase):
# Test equivalent advanced indexing
assert_array_equal(X[:,indices].eval({X: x}), x[:,indices])
def test_cumsum(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.cumsum().eval({X: x}), x.cumsum())
def test_cumprod(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.cumprod().eval({X: x}), x.cumprod())
def test_norm():
x = theano.tensor.vector('x')
n = x.norm(2)
......
...@@ -1091,7 +1091,7 @@ class TestGemv(TestCase, unittest_tools.TestOptimizationMixin):
# Assert that the dot was optimized somehow
self.assertFunctionContains0(f, T.dot)
self.assertFunctionContains1(f, Gemv(False))
self.assertFunctionContains1(f, Gemv(True))
# Assert they produce the same output
assert numpy.allclose(f(), numpy.dot(v.get_value(), w.get_value()))
......
...@@ -164,7 +164,8 @@ class TensorType(Type):
" Theano C code does not support that.",
msg,
"object shape", data.shape,
"object strides", data.strides)
"object strides", data.strides,
"object dtype", data.dtype)
i = 0
for b in self.broadcastable:
......
...@@ -11,6 +11,7 @@ from theano.tensor.utils import hash_from_ndarray
from theano.tensor.type import TensorType
class AsTensorError(TypeError):
"""Raised when as_tensor_variable isn't able to create a
TensorVariable.
...@@ -509,13 +510,11 @@ class _tensor_py_operators:
def sort(self, axis=-1, kind='quicksort', order=None):
"""See `theano.tensor.sort`"""
from theano.tensor.sort import sort
return sort(self, axis, kind, order)
return theano.tensor.sort(self, axis, kind, order)
def argsort(self, axis=-1, kind='quicksort', order=None):
"""See `theano.tensor.argsort`"""
from theano.tensor.sort import argsort
return argsort(self, axis, kind, order)
return theano.tensor.argsort(self, axis, kind, order)
def clip(self, a_min, a_max):
"Clip (limit) the values in an array."
...@@ -529,16 +528,14 @@ class _tensor_py_operators:
def repeat(self, repeats, axis=None):
"""See `theano.tensor.repeat`"""
from theano.tensor.extra_ops import repeat
return repeat(self, repeats, axis)
return theano.tensor.extra_ops.repeat(self, repeats, axis)
def round(self, mode="half_away_from_zero"):
"""See `theano.tensor.round`"""
return theano.tensor.basic.round(self, mode)
def trace(self):
from theano.sandbox.linalg import trace
return trace(self)
return theano.sandbox.linalg.trace(self)
# TO TRUMP NUMPY OPERATORS
__array_priority__ = 1000
...@@ -549,6 +546,12 @@ class _tensor_py_operators:
def zeros_like(model, dtype=None):
return theano.tensor.basic.zeros_like(model, dtype=dtype)
def cumsum(self, axis=None):
return theano.tensor.extra_ops.cumsum(self, axis)
def cumprod(self, axis=None):
return theano.tensor.extra_ops.cumprod(self, axis)
class TensorVariable(_tensor_py_operators, Variable):
"""Subclass to add the tensor operators to the basic `Variable` class."""
......
...@@ -62,7 +62,7 @@ import sys
import time
import theano
from theano.misc.windows import call_subprocess_Popen
from theano.misc.windows import output_subprocess_Popen
def main(stdout=None, stderr=None, argv=None, theano_nose=None,
...@@ -271,19 +271,17 @@ def run(stdout, stderr, argv, theano_nose, batch_size, time_profile,
time.ctime(), test_id, data["ids"][test_id]))
f_rawlog.flush()
proc = call_subprocess_Popen(
p_out = output_subprocess_Popen(
([python, theano_nose, '-v', '--with-id']
+ [str(test_id)] + argv +
['--disabdocstring']),
['--disabdocstring']))
# the previous option calls a custom Nosetests plugin
# precluding automatic substitution of the doc string for
# the test name in display
# (see class 'DisabDocString' in file theano-nose)
stderr=subprocess.PIPE,
stdout=dummy_out.fileno())
# recovering and processing data from pipe
err = proc.stderr.read()
err = p_out[1]
# print the raw log
f_rawlog.write(err)
f_rawlog.flush()
......
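The change above swaps `call_subprocess_Popen` plus manual pipe reads for `output_subprocess_Popen`, which returns the captured output directly. With plain `subprocess`, the same effect looks roughly like this (a sketch, not the helper's actual implementation):

```python
import subprocess
import sys

def run_and_capture(cmd):
    """Run cmd, wait for it, and return (stdout, stderr) as bytes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    # communicate() reads both pipes concurrently; reading just one pipe
    # (as the old proc.stderr.read() did) can deadlock if the child fills
    # the other pipe's buffer.
    out, err = proc.communicate()
    return out, err

out, err = run_and_capture([sys.executable, '-c',
                            'import sys; sys.stderr.write("test log")'])
assert err == b"test log"
assert out == b""
```

Returning the finished output instead of a live `Popen` handle also guarantees the child has been waited on, so the batch runner cannot accumulate zombie test processes.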
...@@ -554,6 +554,52 @@ def test_disconnected_cost_grad():
except theano.gradient.DisconnectedInputError:
return
raise AssertionError("A disconnected gradient has been ignored.")
def test_subgraph_grad():
    # Tests that the grad method with no known_grads
    # matches what happens if you use successive subgraph_grads
    x = theano.tensor.fvector('x')
    t = theano.tensor.fvector('t')
    w1 = theano.shared(np.random.randn(3, 4))
    w2 = theano.shared(np.random.randn(4, 2))
    a1 = theano.tensor.tanh(theano.tensor.dot(x, w1))
    a2 = theano.tensor.tanh(theano.tensor.dot(a1, w2))
    cost2 = theano.tensor.sqr(a2 - t).sum()
    cost2 += theano.tensor.sqr(w2.sum())
    cost1 = theano.tensor.sqr(w1.sum())

    params = [[w2], [w1]]
    costs = [cost2, cost1]
    grad_ends = [[a1], [x]]

    inputs = [t, x]
    rng = np.random.RandomState([2012, 11, 15])
    values = [rng.randn(2), rng.randn(3)]
    values = [np.cast[ipt.dtype](value) for ipt, value in zip(inputs, values)]

    wrt = [w2, w1]
    cost = cost2 + cost1
    true_grads = theano.grad(cost, wrt)
    true_grads = theano.function(inputs, true_grads)
    true_grads = true_grads(*values)

    from theano.gof.python25 import OrderedDict
    next_grad = None
    param_grads = []
    for i in xrange(2):
        param_grad, next_grad = theano.subgraph_grad(
            wrt=params[i], end=grad_ends[i],
            start=next_grad, cost=costs[i]
        )
        next_grad = OrderedDict(zip(grad_ends[i], next_grad))
        param_grads.extend(param_grad)

    pgrads = theano.function(inputs, param_grads)
    pgrads = pgrads(*values)

    for true_grad, pgrad in zip(true_grads, pgrads):
        assert np.sum(np.abs(true_grad - pgrad)) < 0.00001
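The test above checks that gradients computed subgraph-by-subgraph agree with a single end-to-end `theano.grad` call. The underlying identity is just the chain rule applied in stages across a cut point; a plain NumPy sketch with scalar weights (the names here are illustrative, not from the test):

```python
import numpy as np

# Forward pass: x -> a1 = tanh(x * w1) -> cost = (a1 * w2) ** 2
x, w1, w2 = 0.5, 1.5, -2.0
a1 = np.tanh(x * w1)
cost = (a1 * w2) ** 2

# Stage 1: gradients of cost w.r.t. w2 and w.r.t. the cut point a1.
d_a1 = 2 * (a1 * w2) * w2        # dcost/da1, handed to the next stage
d_w2 = 2 * (a1 * w2) * a1        # dcost/dw2

# Stage 2: propagate d_a1 through the first subgraph to reach w1.
d_w1 = d_a1 * (1 - a1 ** 2) * x  # dcost/dw1 via the chain rule

# End-to-end check by finite differences on w1.
eps = 1e-6
num_w1 = ((np.tanh(x * (w1 + eps)) * w2) ** 2 - cost) / eps
assert abs(d_w1 - num_w1) < 1e-4
```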
class TestConsiderConstant(unittest.TestCase):
...
...@@ -1136,3 +1136,214 @@ class T_graphstructures(unittest.TestCase):
         assert e.owner.inputs[1] is mul_variable
         assert e.owner.inputs[1].owner.inputs[0] is y
         assert e.owner.inputs[1].owner.inputs[1] is z
class T_scan(unittest.TestCase):
    ## All tests here belong to
    ## http://deeplearning.net/software/theano/tutorial/loop.html
    ## Theano/doc/tutorial/loop.txt
    ## If you change anything here, update the tutorial as well!

    def test_elemwise(self):
        # defining the tensor variables
        X = T.matrix("X")
        W = T.matrix("W")
        b_sym = T.vector("b_sym")

        results, updates = theano.scan(lambda v: T.tanh(T.dot(v, W) + b_sym),
                                       sequences=X)
        compute_elementwise = theano.function(inputs=[X, W, b_sym],
                                              outputs=[results])

        # test values
        x = numpy.eye(2)
        w = numpy.ones((2, 2))
        b = numpy.ones((2,))
        b[1] = 2
        print "Scan results:", compute_elementwise(x, w, b)[0]

        # comparison with numpy
        print "Numpy results:", numpy.tanh(x.dot(w) + b)
    def test_sequence(self):
        # define tensor variables
        X = T.vector("X")
        W = T.matrix("W")
        b_sym = T.vector("b_sym")
        U = T.matrix("U")
        Y = T.matrix("Y")
        V = T.matrix("V")
        P = T.matrix("P")

        results, updates = theano.scan(
            lambda y, p, x_tm1: T.tanh(T.dot(x_tm1, W) +
                                       T.dot(y, U) + T.dot(p, V)),
            sequences=[Y, P[::-1]], outputs_info=[X])
        compute_seq = theano.function(inputs=[X, W, Y, U, P, V],
                                      outputs=[results])

        # test values
        x = numpy.zeros((2,))
        x[1] = 1
        w = numpy.ones((2, 2))
        y = numpy.ones((5, 2))
        y[0, :] = -3
        u = numpy.ones((2, 2))
        p = numpy.ones((5, 2))
        p[0, :] = 3
        v = numpy.ones((2, 2))
        print "Scan results:", compute_seq(x, w, y, u, p, v)[0]

        # comparison with numpy
        x_res = numpy.zeros((5, 2))
        x_res[0] = numpy.tanh(x.dot(w) + y[0].dot(u) + p[4].dot(v))
        for i in range(1, 5):
            x_res[i] = numpy.tanh(x_res[i - 1].dot(w)
                                  + y[i].dot(u) + p[4 - i].dot(v))
        print "Numpy results:", x_res
    def test_norm(self):
        # define tensor variable
        X = T.matrix("X")
        results, updates = theano.scan(lambda x_i: T.sqrt((x_i ** 2).sum()),
                                       sequences=[X])
        compute_norm_lines = theano.function(inputs=[X], outputs=[results])

        results, updates = theano.scan(lambda x_i: T.sqrt((x_i ** 2).sum()),
                                       sequences=[X.T])
        compute_norm_cols = theano.function(inputs=[X], outputs=[results])

        # test value
        x = numpy.diag(numpy.arange(1, 6), 1)
        print "Scan results:", compute_norm_lines(x)[0], \
            compute_norm_cols(x)[0]

        # comparison with numpy
        print "Numpy results:", numpy.sqrt((x ** 2).sum(1)), \
            numpy.sqrt((x ** 2).sum(0))
    def test_trace(self):
        # define tensor variable
        X = T.matrix("X")
        results, updates = theano.scan(
            lambda i, j, t_f: T.cast(X[i, j] + t_f, theano.config.floatX),
            sequences=[T.arange(X.shape[0]), T.arange(X.shape[1])],
            outputs_info=numpy.asarray(0., dtype=theano.config.floatX))
        result = results[-1]
        compute_trace = theano.function(inputs=[X], outputs=[result])

        # test value
        x = numpy.eye(5)
        x[0] = numpy.arange(5)
        print "Scan results:", compute_trace(x)[0]

        # comparison with numpy
        print "Numpy results:", numpy.diagonal(x).sum()
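The trace test works because `scan`, like `zip`, walks the index sequences in lockstep and stops at the shorter one, so for a rectangular matrix it would sum `min(m, n)` diagonal entries. The same accumulation in plain Python/NumPy:

```python
import numpy as np

# Same matrix as in the test: the identity with its first row replaced.
x = np.eye(5)
x[0] = np.arange(5)

# Walk the diagonal by zipping row and column indices, as the scan does;
# zip stops at the shorter sequence, so this also works when m != n.
trace = sum(x[i, j] for i, j in zip(range(x.shape[0]), range(x.shape[1])))
assert trace == np.diagonal(x).sum()
```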
    def test_taps(self):
        # define tensor variables
        X = T.matrix("X")
        W = T.matrix("W")
        b_sym = T.vector("b_sym")
        U = T.matrix("U")
        V = T.matrix("V")
        n_sym = T.iscalar("n_sym")

        results, updates = theano.scan(
            lambda x_tm2, x_tm1: T.dot(x_tm2, U) + T.dot(x_tm1, V)
                                 + T.tanh(T.dot(x_tm1, W) + b_sym),
            n_steps=n_sym,
            outputs_info=[dict(initial=X, taps=[-2, -1])])
        compute_seq2 = theano.function(inputs=[X, U, V, W, b_sym, n_sym],
                                       outputs=[results])

        # test values
        x = numpy.zeros((2, 2))
        # the initial value must be able to return x[-2]
        x[1, 1] = 1
        w = 0.5 * numpy.ones((2, 2))
        u = 0.5 * (numpy.ones((2, 2)) - numpy.eye(2))
        v = 0.5 * numpy.ones((2, 2))
        n = 10
        b = numpy.ones((2,))
        print "Scan results:", compute_seq2(x, u, v, w, b, n)

        # comparison with numpy
        x_res = numpy.zeros((10, 2))
        x_res[0] = x[0].dot(u) + x[1].dot(v) + numpy.tanh(x[1].dot(w) + b)
        x_res[1] = x[1].dot(u) + x_res[0].dot(v) \
            + numpy.tanh(x_res[0].dot(w) + b)
        for i in range(2, 10):
            x_res[i] = (x_res[i - 2].dot(u) + x_res[i - 1].dot(v)
                        + numpy.tanh(x_res[i - 1].dot(w) + b))
        print "Numpy results:", x_res
    def test_jacobian(self):
        # define tensor variables
        v = T.vector()
        A = T.matrix()
        y = T.tanh(T.dot(v, A))
        results, updates = theano.scan(lambda i: T.grad(y[i], v),
                                       sequences=[T.arange(y.shape[0])])
        compute_jac_t = theano.function([A, v], [results],
                                        allow_input_downcast=True)  # shape (d_out, d_in)

        # test values
        x = numpy.eye(5)[0]
        w = numpy.eye(5, 3)
        w[2] = numpy.ones((3,))
        print "Scan results:", compute_jac_t(w, x)[0]

        # compare with numpy
        print "Numpy results:", ((1 - numpy.tanh(x.dot(w)) ** 2) * w).T
    def test_accumulator(self):
        # define shared variables
        k = theano.shared(0)
        n_sym = T.iscalar("n_sym")
        results, updates = theano.scan(lambda: {k: (k + 1)}, n_steps=n_sym)
        accumulator = theano.function([n_sym], [], updates=updates,
                                      allow_input_downcast=True)

        print "Before 5 steps:", k.get_value()
        accumulator(5)
        print "After 5 steps:", k.get_value()
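The accumulator test relies on Theano's update semantics: `scan` returns an `updates` dictionary mapping the shared variable `k` to `k + 1`, and compiling with `updates=updates` makes every call advance `k` once per scan step. A plain-Python sketch of that state-carrying behavior (this `Shared` class is a hypothetical stand-in, not Theano's implementation):

```python
class Shared(object):
    """Hypothetical stand-in for a Theano shared variable:
    state that persists across compiled-function calls."""
    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value

def make_accumulator(k):
    # Mirrors theano.function([n_sym], [], updates=updates):
    # each call applies the k -> k + 1 update once per scan step.
    def accumulator(n):
        for _ in range(n):
            k.set_value(k.get_value() + 1)
    return accumulator

k = Shared(0)
accumulator = make_accumulator(k)
accumulator(5)
assert k.get_value() == 5
```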
    def test_random(self):
        # define tensor variables
        X = T.matrix("X")
        W = T.matrix("W")
        b_sym = T.vector("b_sym")

        # define shared random stream
        trng = T.shared_randomstreams.RandomStreams(1234)
        d = trng.binomial(size=W[1].shape)

        results, updates = theano.scan(lambda v: T.tanh(T.dot(v, W)
                                                        + b_sym) * d,
                                       sequences=X)
        compute_with_bnoise = theano.function(inputs=[X, W, b_sym],
                                              outputs=[results],
                                              updates=updates,
                                              allow_input_downcast=True)
        x = numpy.eye(10, 2)
        w = numpy.ones((2, 2))
        b = numpy.ones((2,))
        print compute_with_bnoise(x, w, b)