Commit 52057806 authored by Yann N. Dauphin

merge

......@@ -20,7 +20,9 @@ since 2007. But it is also approachable enough to be used in the classroom
News
====
* Theano 0.6rc3 was released. Everybody is encouraged to update.
* Ian Goodfellow did a `12h class with exercises on Theano <https://github.com/goodfeli/theano_exercises>`_.
* Theano 0.6 was released. Everybody is encouraged to update.
* New technical report on Theano: `Theano: new features and speed improvements <http://arxiv.org/abs/1211.5590>`_.
However, please keep citing the other paper below in scientific work involving Theano.
......
......@@ -3,8 +3,8 @@
Easy Installation of an optimized Theano on Ubuntu
==================================================
These instructions were tested on Ubuntu 11.04, 11.10 and 12.04. You can
probably do something similar on older releases.
These instructions were tested on Ubuntu 11.04, 11.10, 12.04, 12.10, 13.04
and 13.10. You can probably do something similar on older releases.
.. note::
......@@ -49,7 +49,7 @@ probably do something similar on older computer.
Installation steps
~~~~~~~~~~~~~~~~~~
Ubuntu 11.10/12.04/12.10/13.04:
Ubuntu 11.10/12.04/12.10/13.04/13.10:
1) ``sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git``
2) ``sudo pip install Theano``
......@@ -236,15 +236,4 @@ Test GPU configuration
Ubuntu 12.10: default gcc version 4.7.2. gcc 4.4.7, 4.5.4 and 4.6.3 available.
Ubuntu 13.10: default gcc version 4.8.1. gcc 4.4.7, 4.6.4 and 4.7.3 available.
......@@ -607,6 +607,27 @@ dimensions, see :meth:`_tensor_py_operators.dimshuffle`.
have shape (2, 60).
.. function:: tile(x, reps, ndim=None)
Construct an array by repeating the input `x` according to `reps`
pattern.
Tiles its input according to `reps`. The length of `reps` is the
number of dimensions of `x` and contains the number of times to
tile `x` in each dimension.
:see: `numpy.tile
<http://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html>`_
documentation for examples.
:see: :func:`theano.tensor.extra_ops.repeat
<theano.tensor.extra_ops.repeat>`
:note: Currently, `reps` must be a constant, `x.ndim` and
`len(reps)` must be equal and, if specified, `ndim` must be
equal to both.
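Since the docstring defers to numpy.tile for examples, here is a quick illustration of the `reps` pattern described above, using numpy directly (whose semantics the symbolic op follows):

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# reps gives the number of repetitions along each dimension of x;
# here len(reps) == x.ndim == 2, matching the constraint in the note above.
tiled = np.tile(x, (2, 3))

print(tiled.shape)  # x.shape scaled by reps: (2*2, 2*3) == (4, 6)
```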
Creating Tensor
===============
......@@ -1542,6 +1563,86 @@ Gradient / Differentiation
:rtype: variable or list of variables (matching `wrt`)
:returns: gradients of the cost with respect to each of the `wrt` terms
.. function:: subgraph_grad(wrt, end, start=None, cost=None, details=False)
With respect to `wrt`, computes gradients of cost and/or from existing
`start` gradients, up to the `end` variables of a symbolic digraph.
In other words, computes gradients for a subgraph of the
symbolic theano function. Ignores all disconnected inputs.
This can be useful when one needs to perform the gradient descent
iteratively (e.g. one layer at a time in an MLP), or when a particular
operation is not differentiable in theano (e.g. stochastic sampling
from a multinomial). In the latter case, the gradient of the
non-differentiable process could be approximated by a user-defined
formula, which could be calculated using the gradients of a cost
with respect to samples (0s and 1s). These gradients are obtained
by performing a subgraph_grad from the `cost` or previously known gradients
(`start`) up to the outputs of the stochastic process (`end`).
A dictionary mapping gradients obtained from the user-defined
differentiation of the process, to variables, could then be fed into
another subgraph_grad as `start` with any other `cost` (e.g. weight decay).
In an MLP, we could use subgraph_grad to iteratively backpropagate:
>>> import numpy as np
>>> x, t = theano.tensor.fvector('x'), theano.tensor.fvector('t')
>>> w1 = theano.shared(np.random.randn(3,4))
>>> w2 = theano.shared(np.random.randn(4,2))
>>> a1 = theano.tensor.tanh(theano.tensor.dot(x,w1))
>>> a2 = theano.tensor.tanh(theano.tensor.dot(a1,w2))
>>> cost2 = theano.tensor.sqr(a2 - t).sum()
>>> cost2 += theano.tensor.sqr(w2.sum())
>>> cost1 = theano.tensor.sqr(w1.sum())
>>> params = [[w2],[w1]]
>>> costs = [cost2,cost1]
>>> grad_ends = [[a1], [x]]
>>> next_grad = None
>>> param_grads = []
>>> for i in xrange(2):
...     param_grad, next_grad = theano.subgraph_grad(
...         wrt=params[i], end=grad_ends[i],
...         start=next_grad, cost=costs[i]
...     )
...     next_grad = dict(zip(grad_ends[i], next_grad))
...     param_grads.extend(param_grad)
:type wrt: list of Variables
:param wrt: Gradients are computed with respect to `wrt`.
:type end: list of Variables
:param end: Theano variables at which to end gradient descent
(they are considered constant in theano.grad).
For convenience, the gradients with respect to these variables
are also returned.
:type start: dictionary of Variables
:param start: If not None, a dictionary mapping variables to
their gradients. This is useful when the gradient on some
variables are known. These are used to compute the gradients
backwards up to the variables in `end`
(they are used as known_grad in theano.grad).
:type cost: Scalar (0-dimensional) Variable.
:param cost:
Additional costs for which to compute the gradients.
For example, these could be weight decay, an l1 constraint,
MSE, NLL, etc. May optionally be None if start is provided.
Warning: If the gradients of `cost` with respect to any
of the `start` variables are already part of the `start`
dictionary, then it may be counted twice with respect to `wrt`
and `end`.
:type details: bool.
:param details: When True, additionally returns the
list of gradients from `start` and of `cost`, respectively,
with respect to `wrt` (not `end`).
:rtype: Tuple of 2 or 4 Lists of Variables
:return: Returns lists of gradients with respect to `wrt` and `end`,
respectively.
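The two-stage use described above (compute gradients up to `end`, then continue from known `start` gradients) can be checked numerically. A minimal numpy sketch of the same chain rule, verified with finite differences; the names here are illustrative, not part of the API:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(3)
w1 = rng.randn(3, 4)
w2 = rng.randn(4)

def forward(w1):
    a1 = np.tanh(x.dot(w1))     # the `end` variable of the subgraph
    return a1.dot(w2)           # scalar cost

# Stage 1: gradient of the cost wrt the intermediate variable a1.
a1 = np.tanh(x.dot(w1))
g_a1 = w2                       # d cost / d a1
# Stage 2: continue from the known gradient g_a1 down to w1, which is
# what passing `start` to subgraph_grad does symbolically.
g_w1 = np.outer(x, g_a1 * (1.0 - a1 ** 2))

# Finite-difference check of the chained gradient.
eps = 1e-6
num = np.zeros_like(w1)
for i in range(w1.shape[0]):
    for j in range(w1.shape[1]):
        wp = w1.copy(); wp[i, j] += eps
        wm = w1.copy(); wm[i, j] -= eps
        num[i, j] = (forward(wp) - forward(wm)) / (2 * eps)

print(np.allclose(g_w1, num, atol=1e-5))
```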
.. _R_op_list:
......
......@@ -24,6 +24,246 @@ Scan
The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
**Scan Example: Computing tanh(x(t).dot(W) + b) elementwise**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# defining the tensor variables
X = T.matrix("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
results, updates = theano.scan(lambda v:T.tanh(T.dot(v,W)+b_sym), sequences=X)
compute_elementwise = theano.function(inputs = [X, W, b_sym], outputs=[results])
# test values
x = np.eye(2)
w = np.ones((2,2))
b = np.ones((2))
b[1] = 2
print compute_elementwise(x, w, b)[0]
# comparison with numpy
print np.tanh(x.dot(w) + b)
**Scan Example: Computing the sequence x(t) = tanh(x(t-1).dot(W) + y(t).dot(U) + p(T-t).dot(V))**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variables
X = T.vector("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
U = T.matrix("U")
Y = T.matrix("Y")
V = T.matrix("V")
P = T.matrix("P")
results, updates = theano.scan(lambda y, p, x_tm1: T.tanh(T.dot(x_tm1, W) + T.dot(y, U) + T.dot(p, V)),
sequences=[Y, P[::-1]], outputs_info=[X])
compute_seq = theano.function(inputs = [X, W, Y, U, P, V], outputs=[results])
# test values
x = np.zeros((2))
x[1] = 1
w = np.ones((2,2))
y = np.ones((5,2))
y[0,:] = -3
u = np.ones((2,2))
p = np.ones((5,2))
p[0,:] = 3
v = np.ones((2,2))
print compute_seq(x,w,y,u,p,v)[0]
# comparison with numpy
x_res = np.zeros((5,2))
x_res[0] = np.tanh(x.dot(w) + y[0].dot(u) + p[4].dot(v))
for i in range(1,5):
x_res[i] = np.tanh(x_res[i-1].dot(w) + y[i].dot(u) + p[4-i].dot(v))
**Scan Example: Computing norms of rows of X**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variable
X = T.matrix("X")
results, updates = theano.scan(lambda x_i:T.sqrt((x_i**2).sum()), sequences=[X])
compute_norm_lines = theano.function(inputs = [X], outputs=[results])
# test value
x = np.diag(np.arange(1,6),1)
print compute_norm_lines(x)[0]
# comparison with numpy
print np.sqrt((x**2).sum(1))
**Scan Example: Computing norms of columns of X**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variable
X = T.matrix("X")
results, updates = theano.scan(lambda x_i:T.sqrt((x_i**2).sum()), sequences=[X.T])
compute_norm_cols = theano.function(inputs = [X], outputs=[results])
# test value
x = np.diag(np.arange(1,6),1)
print compute_norm_cols(x)[0]
# comparison with numpy
print np.sqrt((x**2).sum(0))
**Scan Example: Computing trace of X**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
floatX = "float32"
# define tensor variable
X = T.matrix("X")
results, updates = theano.scan(lambda i, j, t_f:T.cast(X[i,j]+t_f, floatX), \
sequences=[T.arange(X.shape[0]), T.arange(X.shape[1])], \
outputs_info=np.asarray(0., dtype=floatX))
result = results[-1]
compute_trace = theano.function(inputs = [X], outputs=[result])
# test value
x = np.eye(5)
x[0] = np.arange(5)
print compute_trace(x)[0]
# comparison with numpy
print np.diagonal(x).sum()
**Scan Example: Computing the sequence x(t) = x(t-2).dot(U) + x(t-1).dot(V) + tanh(x(t-1).dot(W) + b)**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variables
X = T.matrix("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
U = T.matrix("U")
V = T.matrix("V")
n_sym = T.iscalar("n_sym")
results, updates = theano.scan(lambda x_tm2,x_tm1:T.dot(x_tm2,U) + T.dot(x_tm1,V) \
+ T.tanh(T.dot(x_tm1,W) + b_sym), \
n_steps=n_sym, outputs_info=[dict(initial = X, taps = [-2,-1])])
compute_seq2 = theano.function(inputs = [X, U, V, W, b_sym, n_sym], outputs=[results])
# test values
x = np.zeros((2,2)) # the initial value must be able to return x[-2]
x[1,1] = 1
w = 0.5*np.ones((2,2))
u = 0.5*(np.ones((2,2))-np.eye(2))
v = 0.5*np.ones((2,2))
n = 10
b = np.ones((2))
print compute_seq2(x,u,v,w,b,n)
# comparison with numpy
x_res = np.zeros((10,2))
x_res[0] = x[0].dot(u) + x[1].dot(v) + np.tanh(x[1].dot(w) + b)
x_res[1] = x[1].dot(u) + x_res[0].dot(v) + np.tanh(x_res[0].dot(w) + b)
x_res[2] = x_res[0].dot(u) + x_res[1].dot(v) \
+ np.tanh(x_res[1].dot(w) + b)
for i in range(2,10):
x_res[i] = (x_res[i-2].dot(u) + x_res[i-1].dot(v) \
+ np.tanh(x_res[i-1].dot(w) + b))
**Scan Example: Computing the Jacobian of y = tanh(v.dot(A)) wrt x**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variables
v = T.vector()
A = T.matrix()
y = T.tanh(T.dot(v,A))
results, updates = theano.scan(lambda i:T.grad(y[i], v), sequences = [T.arange(y.shape[0])])
compute_jac_t = theano.function([A,v], [results], allow_input_downcast = True) # shape (d_out, d_in)
# test values
x = np.eye(5)[0]
w = np.eye(5,3)
w[2] = np.ones((3))
print compute_jac_t(w,x)[0]
# compare with numpy
print ((1 - np.tanh(x.dot(w))**2)*w).T
Note that we need to iterate over the indices of ``y`` and not over its elements. The reason is that scan creates a placeholder variable for its internal function, and this placeholder variable does not have the same dependencies as the variables that will replace it.
**Scan Example: Accumulate the number of loop iterations during a scan**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define shared variables
k = theano.shared(0)
n_sym = T.iscalar("n_sym")
results, updates = theano.scan(lambda:{k:(k+1)}, n_steps=n_sym)
accumulator = theano.function([n_sym], [], updates=updates, allow_input_downcast = True)
k.get_value()
accumulator(5)
k.get_value()
**Scan Example: Computing tanh(v.dot(W) + b)*d where d is binomial**
.. code-block:: python
import theano
import theano.tensor as T
import numpy as np
# define tensor variables
X = T.matrix("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
# define shared random stream
trng = T.shared_randomstreams.RandomStreams(1234)
d=trng.binomial(size=W[1].shape)
results, updates = theano.scan(lambda v:T.tanh(T.dot(v,W)+b_sym)*d, sequences=X)
compute_with_bnoise = theano.function(inputs = [X, W, b_sym], outputs=[results], \
updates=updates, allow_input_downcast = True)
x = np.eye(10,2)
w = np.ones((2,2))
b = np.ones((2))
print compute_with_bnoise(x, w, b)
Note that if you want to use a random variable ``d`` that will not be updated through scan loops, you should pass this variable as a ``non_sequences`` argument.
**Scan Example: Computing pow(A,k)**
.. code-block:: python
......
......@@ -79,7 +79,7 @@ from theano.updates import Updates, OrderedUpdates
#we don't import by default as we don't want to force having scipy installed.
#import sparse
from theano.gradient import Rop, Lop, grad
from theano.gradient import Rop, Lop, grad, subgraph_grad
if config.device.startswith('gpu') or config.init_gpu_device.startswith('gpu'):
import theano.sandbox.cuda
......
......@@ -1077,6 +1077,7 @@ class FunctionMaker(object):
self.mode = mode
self.accept_inplace = accept_inplace
self.function_builder = function_builder
self.on_unused_input = on_unused_input # Used only for the pickling
self.required = [(i.value is None) for i in self.inputs]
self.refeed = [
......@@ -1215,6 +1216,7 @@ def _pickle_FunctionMaker(self):
accept_inplace=self.accept_inplace,
function_builder=self.function_builder,
profile=self.profile,
on_unused_input=self.on_unused_input,
)
return (_constructor_FunctionMaker, (kwargs,))
......
......@@ -507,13 +507,22 @@ class ProfileStats(object):
print >> file, header_str
atimes = [(
topos = {} # Only do the topo once per fct.
atimes = []
for a, t in self.apply_time.items():
if a.fgraph not in topos:
topo = a.fgraph.toposort()
topos[a.fgraph] = topo
else:
topo = topos[a.fgraph]
atimes.append((
t * 100 / local_time,
t,
a,
a.fgraph.toposort().index(a),
self.apply_callcount[a])
for a, t in self.apply_time.items()]
topo.index(a),
self.apply_callcount[a]))
del topos
atimes.sort()
atimes.reverse()
tot = 0
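The change above caches the result of `toposort()` per `fgraph`, so the expensive ordering is computed once per graph instead of once per apply node. A stripped-down sketch of that memoization pattern, with hypothetical stand-ins for fgraph and toposort:

```python
# Memoize an expensive per-graph computation so it runs once per graph,
# not once per node. `toposort` and the graphs here are hypothetical
# stand-ins for a.fgraph.toposort() in the profiler code above.
calls = []

def toposort(graph):
    calls.append(graph)       # record each (expensive) invocation
    return sorted(graph)      # pretend sorting is the expensive ordering

nodes = [("g1", 3), ("g1", 1), ("g2", 2), ("g1", 2)]
graphs = {"g1": [3, 1, 2], "g2": [2]}

topos = {}                    # one cached ordering per distinct graph
positions = []
for gname, node in nodes:
    if gname not in topos:
        topos[gname] = toposort(graphs[gname])
    positions.append(topos[gname].index(node))

print(positions)   # index of each node in its graph's ordering
print(len(calls))  # toposort ran once per graph, not once per node
```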
......
......@@ -117,19 +117,10 @@ AddConfigVar('mode',
enum = EnumStr("g++", "")
# Test whether or not g++ is present: disable C code if it is not.
# Using the dummy file descriptor below is a workaround for a crash experienced
# in an unusual Python 2.4.4 Windows environment with the default stdin=None.
dummy_stdin = open(os.devnull)
try:
try:
rc = call_subprocess_Popen(['g++', '-v'], stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
stdin=dummy_stdin).wait()
except OSError:
rc = 1
finally:
dummy_stdin.close()
del dummy_stdin
rc = call_subprocess_Popen(['g++', '-v'])
except OSError:
rc = 1
if rc == 0:
# Keep the default linker the same as the one for the mode FAST_RUN
AddConfigVar('linker',
......
......@@ -57,7 +57,10 @@ from theano.gof.link import \
from theano.gof.op import \
Op, OpenMPOp, PureOp, ops_with_inner_function
from theano.gof.opt import (Optimizer, optimizer, SeqOptimizer,
from theano.gof.opt import (
Optimizer,
optimizer, inplace_optimizer,
SeqOptimizer,
MergeOptimizer, MergeOptMerge,
LocalOptimizer, local_optimizer, LocalOptGroup,
OpSub, OpRemove, PatternSub,
......
......@@ -29,7 +29,8 @@ from theano.compat.six import b, BytesIO, StringIO
from theano.gof.utils import flatten
from theano.configparser import config
from theano.gof.cc import hash_from_code
from theano.misc.windows import call_subprocess_Popen
from theano.misc.windows import (subprocess_Popen, call_subprocess_Popen,
output_subprocess_Popen)
# we will abuse the lockfile mechanism when reading and writing the registry
from theano.gof import compilelock
......@@ -1438,8 +1439,12 @@ def get_gcc_shared_library_arg():
def std_include_dirs():
return (numpy.distutils.misc_util.get_numpy_include_dirs()
+ [distutils.sysconfig.get_python_inc()])
numpy_inc_dirs = numpy.distutils.misc_util.get_numpy_include_dirs()
py_inc = distutils.sysconfig.get_python_inc()
py_plat_spec_inc = distutils.sysconfig.get_python_inc(plat_specific=True)
python_inc_dirs = ([py_inc] if py_inc == py_plat_spec_inc
else [py_inc, py_plat_spec_inc])
return numpy_inc_dirs + python_inc_dirs
def std_lib_dirs_and_libs():
......@@ -1512,11 +1517,8 @@ def gcc_llvm():
pass
p = None
try:
p = call_subprocess_Popen(['g++', '--version'],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
p.wait()
output = p.stdout.read() + p.stderr.read()
p_out = output_subprocess_Popen(['g++', '--version'])
output = p_out[0] + p_out[1]
except OSError:
# Typically means g++ cannot be found.
# So it is not an llvm compiler.
......@@ -1569,11 +1571,11 @@ class GCC_compiler(object):
GCC_compiler.march_flags = []
def get_lines(cmd, parse=True):
p = call_subprocess_Popen(cmd,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
stdin=subprocess.PIPE,
shell=True)
p = subprocess_Popen(cmd,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
stdin=subprocess.PIPE,
shell=True)
# For mingw64 with GCC >= 4.7, passing os.devnull
# as stdin (which is the default) results in the process
# waiting forever without returning. For that reason,
......@@ -1713,7 +1715,7 @@ class GCC_compiler(object):
continue
mj, mn, patch = [int(vp) for vp in version]
if (((mj, mn) == (4, 6) and patch < 4) or
((mj, mn) == (4, 7) and patch < 3) or
((mj, mn) == (4, 7) and patch <= 3) or
((mj, mn) == (4, 8) and patch < 1)):
new_flags[i] = p.rstrip('-avx')
......@@ -1811,21 +1813,15 @@ class GCC_compiler(object):
os.write(fd, src_code)
os.close(fd)
fd = None
proc = call_subprocess_Popen(
['g++', path, '-o', exe_path] + flags,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
proc.wait()
if proc.returncode != 0:
p_ret = call_subprocess_Popen(
['g++', path, '-o', exe_path] + flags)
if p_ret != 0:
compilation_ok = False
elif try_run:
# Try to execute the program
try:
proc = call_subprocess_Popen([exe_path],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
proc.wait()
run_ok = (proc.returncode == 0)
p_ret = call_subprocess_Popen([exe_path])
run_ok = (p_ret == 0)
finally:
os.remove(exe_path)
finally:
......@@ -1958,14 +1954,14 @@ class GCC_compiler(object):
print >> sys.stderr, ' '.join(cmd)
try:
p = call_subprocess_Popen(cmd, stderr=subprocess.PIPE)
compile_stderr = decode(p.communicate()[1])
p_out = output_subprocess_Popen(cmd)
compile_stderr = decode(p_out[1])
except Exception:
# An exception can occur e.g. if `g++` is not found.
print_command_line_error()
raise
status = p.returncode
status = p_out[2]
if status:
print '==============================='
......
......@@ -16,27 +16,17 @@ import numpy
import theano
from theano.configparser import config, AddConfigVar, ConfigParam, StrParam
from theano.gof.utils import flatten
from theano.misc.windows import call_subprocess_Popen
from theano.misc.windows import output_subprocess_Popen
_logger = logging.getLogger("theano.gof.compiledir")
# Using the dummy file descriptors below is a workaround for a crash
# experienced in an unusual Python 2.4.4 Windows environment with the default
# None values.
dummy_err = open(os.devnull, 'w')
p = None
try:
p = call_subprocess_Popen(['g++', '-dumpversion'],
stdout=subprocess.PIPE,
stderr=dummy_err.fileno())
p.wait()
gcc_version_str = p.stdout.readline().strip().decode()
p_out = output_subprocess_Popen(['g++', '-dumpversion'])
gcc_version_str = p_out[0].strip().decode()
except OSError:
# Typically means gcc cannot be found.
gcc_version_str = 'GCC_NOT_FOUND'
del p
del dummy_err
def local_bitwidth():
......
......@@ -165,8 +165,12 @@ def lock(tmp_dir, timeout=120, min_wait=5, max_wait=10, verbosity=1):
my_pid = os.getpid()
no_display = (verbosity == 0)
# Acquire lock.
nb_error = 0
# The number of times we have waited without error.
# Used to skip the message the first time and to display it less
# frequently afterwards, so we don't get as much email about it!
nb_wait = 0
# Acquire lock.
while True:
try:
last_owner = 'no_owner'
......@@ -214,7 +218,7 @@ def lock(tmp_dir, timeout=120, min_wait=5, max_wait=10, verbosity=1):
last_owner = read_owner
time_start = time.time()
no_display = (verbosity == 0)
if not no_display:
if not no_display and nb_wait > 0:
if read_owner == 'failure':
msg = 'unknown process'
else:
......@@ -225,6 +229,7 @@ def lock(tmp_dir, timeout=120, min_wait=5, max_wait=10, verbosity=1):
tmp_dir)
if verbosity <= 1:
no_display = True
nb_wait += 1
time.sleep(random.uniform(min_wait, max_wait))
try:
......
Diff is collapsed.
......@@ -179,23 +179,33 @@ class Query(object):
class EquilibriumDB(DB):
""" A set of potential optimizations which should be applied in an
"""A set of potential optimizations which should be applied in an
arbitrary order until equilibrium is reached.
Canonicalize, Stabilize, and Specialize are all equilibrium optimizations.
:param ignore_newtrees: If False, we will apply local optimizations on new
nodes introduced during local optimization application. This
could result in fewer fgraph iterations, but that does not mean it
will be faster globally.
.. note::
We can register both LocalOptimizer and Optimizer instances, as
EquilibriumOptimizer supports both.
"""
def __init__(self, ignore_newtrees=True):
super(EquilibriumDB, self).__init__()
self.ignore_newtrees = ignore_newtrees
def query(self, *tags, **kwtags):
opts = super(EquilibriumDB, self).query(*tags, **kwtags)
return opt.EquilibriumOptimizer(opts,
max_use_ratio=config.optdb.max_use_ratio,
failure_callback=opt.NavigatorOptimizer.warn_inplace)
return opt.EquilibriumOptimizer(
opts,
max_use_ratio=config.optdb.max_use_ratio,
ignore_newtrees=self.ignore_newtrees,
failure_callback=opt.NavigatorOptimizer.warn_inplace)
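An EquilibriumOptimizer, as the docstring says, applies its optimizations in arbitrary order until the graph stops changing. A toy sketch of such a fixed-point loop over string rewrites (not Theano's actual machinery):

```python
# Apply a set of rewrite rules repeatedly until equilibrium: a full
# pass that changes nothing means a fixed point was reached.
rules = [
    lambda s: s.replace("x*1", "x"),     # canonicalize: drop identity
    lambda s: s.replace("x+x", "2*x"),   # specialize: fold duplicate add
]

def to_equilibrium(expr, rules, max_passes=100):
    for _ in range(max_passes):
        new = expr
        for rule in rules:
            new = rule(new)
        if new == expr:          # no rule fired: equilibrium reached
            return expr
        expr = new
    # A bound on passes plays the role of max_use_ratio above.
    raise RuntimeError("no equilibrium reached")

print(to_equilibrium("(x*1+x*1)", rules))  # → (2*x)
```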
class SequenceDB(DB):
......
......@@ -544,6 +544,109 @@ def grad(cost, wrt, consider_constant=None,
rval, = rval
return rval
def subgraph_grad(wrt, end, start=None, cost=None, details=False):
'''
With respect to `wrt`, computes gradients of cost and/or from existing
`start` gradients, up to the `end` variables of a symbolic digraph.
In other words, computes gradients for a subgraph of the
symbolic theano function. Ignores all disconnected inputs.
This can be useful when one needs to perform the gradient descent
iteratively (e.g. one layer at a time in an MLP), or when a particular
operation is not differentiable in theano (e.g. stochastic sampling
from a multinomial). In the latter case, the gradient of the
non-differentiable process could be approximated by a user-defined
formula, which could be calculated using the gradients of a cost
with respect to samples (0s and 1s). These gradients are obtained
by performing a subgraph_grad from the `cost` or previously known gradients
(`start`) up to the outputs of the stochastic process (`end`).
A dictionary mapping gradients obtained from the user-defined
differentiation of the process, to variables, could then be fed into
another subgraph_grad as `start` with any other `cost` (e.g. weight decay).
:type wrt: list of Variables
:param wrt: Gradients are computed with respect to `wrt`.
:type end: list of Variables
:param end: Theano variables at which to end gradient descent
(they are considered constant in theano.grad).
For convenience, the gradients with respect to these variables
are also returned.
:type start: dictionary of Variables
:param start: If not None, a dictionary mapping variables to
their gradients. This is useful when the gradient on some
variables are known. These are used to compute the gradients
backwards up to the variables in `end`
(they are used as known_grad in theano.grad).
:type cost: Scalar (0-dimensional) Variable.
:param cost:
Additional costs for which to compute the gradients.
For example, these could be weight decay, an l1 constraint,
MSE, NLL, etc. May optionally be None if start is provided.
Warning: If the gradients of `cost` with respect to any
of the `start` variables are already part of the `start`
dictionary, then it may be counted twice with respect to `wrt`
and `end`.
:type details: bool.
:param details: When True, additionally returns the
list of gradients from `start` and of `cost`, respectively,
with respect to `wrt` (not `end`).
:rtype: Tuple of 2 or 4 Lists of Variables
:return: Returns lists of gradients with respect to `wrt` and `end`,
respectively.
'''
assert ((cost is not None) or (start is not None))
assert isinstance(end, list)
assert isinstance(wrt, list)
if start is not None:
assert isinstance(start, dict)
params = list(set(wrt + end))
start_grads = None
cost_grads = None
if start is not None:
start_grads = list(
theano.grad(
cost=None, wrt=params, known_grads=start,
consider_constant=end,
disconnected_inputs='ignore'
)
)
if cost is not None:
cost_grads = list(
theano.grad(
cost=cost, wrt=params,
consider_constant=end,
disconnected_inputs='ignore'
)
)
grads = None
if start is None:
grads = cost_grads
else:
grads = start_grads
if cost_grads is not None:
for i in range(len(grads)):
grads[i] += cost_grads[i]
pgrads = OrderedDict(zip(params, grads))
# separate wrt from end grads:
wrt_grads = list(pgrads[k] for k in wrt)
end_grads = list(pgrads[k] for k in end)
if details:
return wrt_grads, end_grads, start_grads, cost_grads
return wrt_grads, end_grads
def _node_to_pattern(node):
""" given an apply node, obtain its connection pattern
......
......@@ -203,6 +203,7 @@ if __name__ == "__main__":
cuda version 5.5 5.0 4.2 4.1 4.0 3.2 3.0 # note
gpu
K6000/NOECC 0.06s
K20m/ECC 0.07s
K20/NOECC 0.07s
M2090 0.19s
......
......@@ -2,9 +2,11 @@ import os
import subprocess
def call_subprocess_Popen(command, **params):
def subprocess_Popen(command, **params):
"""
Utility function to work around windows behavior that open windows
Utility function to work around the Windows behavior of opening a
console window when spawning a subprocess.
:see: call_subprocess_Popen and output_subprocess_Popen
"""
startupinfo = None
if os.name == 'nt':
......@@ -36,3 +38,40 @@ def call_subprocess_Popen(command, **params):
if stdin is not None:
del stdin
return proc
def call_subprocess_Popen(command, **params):
"""
Calls subprocess_Popen and discards the output, returning only the
exit code.
"""
if 'stdout' in params or 'stderr' in params:
raise TypeError("don't use stderr or stdout with call_subprocess_Popen")
null = open(os.devnull, 'wb')
# stdin to devnull is a workaround for a crash in a weird Windows
# environment where sys.stdin was None
params.setdefault('stdin', null)
params['stdout'] = null
params['stderr'] = null
p = subprocess_Popen(command, **params)
p.wait()
return p.returncode
def output_subprocess_Popen(command, **params):
"""
Calls subprocess_Popen, returning the output, error and exit code
in a tuple.
"""
if 'stdout' in params or 'stderr' in params:
raise TypeError("don't use stderr or stdout with output_subprocess_Popen")
# stdin to devnull is a workaround for a crash in a weird Windows
# environment where sys.stdin was None
if 'stdin' not in params:  # hasattr on a dict never sees string keys
null = open(os.devnull, 'wb')
params['stdin'] = null
params['stdout'] = subprocess.PIPE
params['stderr'] = subprocess.PIPE
p = subprocess_Popen(command, **params)
# we need to use communicate to make sure we don't deadlock around
# the stdout/stderr pipes.
out = p.communicate()
return out + (p.returncode,)
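The two wrappers above split the common patterns: discard the output and return the exit code, or capture stdout, stderr and the exit code via communicate(). A self-contained sketch of the capture pattern with plain subprocess (omitting the Windows startupinfo handling that subprocess_Popen adds):

```python
import os
import subprocess
import sys

def output_popen(command):
    """Return (stdout, stderr, returncode). communicate() reads both
    pipes concurrently, so neither pipe can fill up and deadlock."""
    with open(os.devnull, 'rb') as null:
        p = subprocess.Popen(command,
                             stdin=null,  # avoid inheriting a broken stdin
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
        out, err = p.communicate()
    return out, err, p.returncode

# sys.executable keeps the example portable across platforms.
out, err, rc = output_popen([sys.executable, '-c', 'print("ok")'])
print(rc)                      # 0 on success
print(out.strip().decode())    # ok
```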
......@@ -296,38 +296,15 @@ class GpuDimShuffle(GpuOp):
def __init__(self, input_broadcastable, new_order):
input_broadcastable = tuple(input_broadcastable)
self.input_broadcastable = input_broadcastable
new_order = tuple(new_order)
self.new_order = new_order
# list of dimensions of the input to drop
self.drop = []
# this maps i before dropping dimensions to j after dropping
# dimensions so self.shuffle can be set properly later on
i2j = {}
j = 0
for i, b in enumerate(input_broadcastable):
if i not in new_order:
# we want to drop this dimension because it's not a
# value in new_order
if b == 1: # 1 aka True
self.drop.append(i)
else:
if not b:
# we cannot drop non-broadcastable dimensions
raise ValueError("You cannot drop a non-broadcastable"
" dimension.",
(input_broadcastable, new_order))
else:
i2j[i] = j
j += 1
# transposition of non-broadcastable dimensions This is how
# the dimensions will be permuted, without accounting for the
# extra 'x' broadcastable dimensions to insert.
self.shuffle = [i2j[x] for x in new_order if x != 'x']
# list of dimensions of the output that are broadcastable and
# were not in the original input
self.augment = [i for i, x in enumerate(new_order) if x == 'x']
self.view_map = {0: [0]}
......@@ -481,8 +458,6 @@ class GpuDimShuffle(GpuOp):
print self
print "IN BROAD", self.input_broadcastable
print "NEW ORDER", self.new_order
print "SHUFFLE", self.shuffle
print "AUGMENT", self.augment
print '------------'
print ''
print sio.getvalue()
......@@ -1198,7 +1173,11 @@ class GpuCAReduce(GpuOp):
n_threads.z += 1;
else
break;
}""" % locals()
}
//Maximum for Fermi GPUs on that dimension.
n_threads.z = std::min(n_threads.z, (unsigned)64);
""" % locals()
if len(self.reduce_mask) == 2:
threads_y = ''
......@@ -1509,6 +1488,8 @@ class GpuCAReduce(GpuOp):
n_threads.z += 1;
}
n_threads.z -= 1;
//Maximum for Fermi GPUs on that dimension.
n_threads.z = std::min(n_threads.z, (unsigned)64);
dim3 n_blocks(1,1,1);
%(makecall)s
......@@ -1605,7 +1586,7 @@ class GpuCAReduce(GpuOp):
""" % locals()
def c_code_cache_version_apply(self, node):
version = [8] # the version corresponding to the c code in this Op
version = [9] # the version corresponding to the c code in this Op
# now we insert versions for the ops on which we depend...
scalar_node = Apply(self.scalar_op,
......@@ -3192,13 +3173,27 @@ class GpuAlloc(GpuOp):
# If the output is a constant, it will have to be deepcopied
# each time the function is called. So we do not fold.
return False
elif (not isinstance(client[0], basestring)
and isinstance(client[0].op, (
tensor.IncSubtensor,
tensor.AdvancedIncSubtensor1,
GpuIncSubtensor,
GpuAdvancedIncSubtensor1
))):
elif (#The following ops work inplace on their input with index 0.
client[1] == 0 and
isinstance(client[0].op, (
#Ops that will work inplace on the Alloc. So if they
#get constant_folded, they would copy the
#constant and this is less efficient.
#Not doing the constant folding could also lower
#the peak memory usage, as the "constant" won't
#always exist.
#theano.tensor.subtensor.AdvancedIncSubtensor,
GpuIncSubtensor,
GpuAdvancedIncSubtensor1,
theano.sandbox.cuda.blas.GpuGemm,
theano.sandbox.cuda.blas.GpuGemv,
theano.sandbox.cuda.blas.GpuGer,
))):
return False
#If the client is a transfer, we don't want to fold. We
#let the moving opt finish before deciding what to do.
elif isinstance(client[0].op, HostFromGpu):
return False
return True
......
......@@ -5093,7 +5093,7 @@ int fprint_CudaNdarray(FILE * fd, const CudaNdarray *self)
int CudaNdarray_prep_output(CudaNdarray ** arr, int nd,
const int * dims)
const int * dims, int fortran)
{
bool allocated = false;
if (*arr == NULL)
......@@ -5105,7 +5105,7 @@ int CudaNdarray_prep_output(CudaNdarray ** arr, int nd,
allocated = true;
}
if (CudaNdarray_alloc_contiguous(*arr, nd, dims))
if (CudaNdarray_alloc_contiguous(*arr, nd, dims, fortran))
{
if (allocated)
{
......
......@@ -160,6 +160,12 @@ CudaNdarray_CheckExact(const PyObject * ob);
DllExport bool
CudaNdarray_is_c_contiguous(const CudaNdarray * self);
/**
* Return true for a F-contiguous CudaNdarray, else false
*/
DllExport bool
CudaNdarray_is_f_contiguous(const CudaNdarray * self);
/****
* Returns the number of elements necessary in host_structure and dev_structure for a given number of dimensions.
*/
......@@ -326,10 +332,13 @@ CudaNdarray_set_nd(CudaNdarray * self, const int nd)
* Allocate storage space for a tensor of rank 'nd' and given dimensions.
* (No-op if self already has a contiguous tensor of the right dimensions)
*
* If fortran is non-zero, a Fortran-order array is made; otherwise C order is used.
*
* Note: CudaNdarray_alloc_contiguous is templated to work for both int dimensions and npy_intp dimensions
*/
template<typename inttype>
static int CudaNdarray_alloc_contiguous(CudaNdarray *self, const int nd, const inttype * dim)
static int CudaNdarray_alloc_contiguous(CudaNdarray *self, const int nd,
const inttype * dim, int fortran=0)
{
// allocate an empty ndarray with c_contiguous access
// return 0 on success
......@@ -342,11 +351,23 @@ static int CudaNdarray_alloc_contiguous(CudaNdarray *self, const int nd, const i
{
return -1;
}
for (int i = nd-1; i >= 0; --i)
if (fortran)
{
for (int i = 0; i < nd; i++)
{
CudaNdarray_set_stride(self, i, (dim[i] == 1) ? 0 : size);
CudaNdarray_set_dim(self, i, dim[i]);
size = size * dim[i];
}
}
else
{
CudaNdarray_set_stride(self, i, (dim[i] == 1) ? 0 : size);
CudaNdarray_set_dim(self, i, dim[i]);
size = size * dim[i];
for (int i = nd-1; i >= 0; --i)
{
CudaNdarray_set_stride(self, i, (dim[i] == 1) ? 0 : size);
CudaNdarray_set_dim(self, i, dim[i]);
size = size * dim[i];
}
}
// If the allocated buffer is already of the right size, we don't need to
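The stride computation added above walks dimensions last-to-first for C order and first-to-last for Fortran order, giving broadcastable size-1 dimensions a stride of 0. A Python sketch of that logic (strides in elements, not bytes):

```python
def contiguous_strides(dims, fortran=False):
    # Mirrors CudaNdarray_alloc_contiguous: a dimension of size 1
    # gets stride 0 (broadcastable); otherwise the running size.
    strides = [0] * len(dims)
    size = 1
    order = range(len(dims)) if fortran else reversed(range(len(dims)))
    for i in order:
        strides[i] = 0 if dims[i] == 1 else size
        size *= dims[i]
    return strides
```

For dims (5, 4, 3) this gives [12, 3, 1] in C order and [1, 5, 20] in Fortran order.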
......@@ -497,6 +518,27 @@ CudaNdarray_is_c_contiguous(const CudaNdarray * self)
return c_contiguous;
}
/**
* True iff the strides look like [1, dim[0], dim[0]*dim[1], ...]
*/
DllExport inline bool ALWAYS_INLINE
CudaNdarray_is_f_contiguous(const CudaNdarray * self)
{
bool f_contiguous = true;
int size = 1;
for (int i = 0; (i < self->nd) && f_contiguous; i++)
{
if (CudaNdarray_HOST_DIMS(self)[i] == 1)
continue;
if (CudaNdarray_HOST_STRIDES(self)[i] != size)
{
f_contiguous = false;
}
size = size * CudaNdarray_HOST_DIMS(self)[i];
}
return f_contiguous;
}
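A Python sketch of the F-contiguity test above; as in the C version, size-1 dimensions are skipped because their stride is irrelevant:

```python
def is_f_contiguous(dims, strides):
    # True iff strides (in elements) look like
    # [1, dims[0], dims[0]*dims[1], ...], ignoring size-1 dims.
    size = 1
    for d, s in zip(dims, strides):
        if d == 1:
            continue
        if s != size:
            return False
        size *= d
    return True
```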
DllExport PyObject * CudaNdarray_IS_C_Contiguous(CudaNdarray * self);
DllExport int CudaNdarray_gemm(float alpha, const CudaNdarray * A, const CudaNdarray * B, float beta, CudaNdarray * C);
......@@ -525,8 +567,9 @@ DllExport int CudaNdarray_inplace_elemwise(PyObject* py_self, PyObject * py_othe
// *arr may initially be NULL, a pointer to an ndarray of the wrong size,
// or a pointer to an ndarray of the right size. In the last case it will
// not change.
// If fortran is non-zero, a fortran order is expected/created
DllExport int CudaNdarray_prep_output(CudaNdarray ** arr, int nd,
const int * dims);
const int * dims, int fortran = 0);
DllExport inline const char* ALWAYS_INLINE cublasGetErrorString(cublasStatus err){
if(CUBLAS_STATUS_SUCCESS == err)
......
......@@ -16,7 +16,7 @@ from theano.gof.cmodule import (std_libs, std_lib_dirs,
std_include_dirs, dlimport,
get_lib_extension)
from theano.gof.python25 import any
from theano.misc.windows import call_subprocess_Popen
from theano.misc.windows import output_subprocess_Popen
_logger = logging.getLogger("theano.sandbox.cuda.nvcc_compiler")
_logger.setLevel(logging.WARN)
......@@ -98,12 +98,8 @@ nvcc_version = None
def is_nvcc_available():
"""Return True iff the nvcc compiler is found."""
def set_version():
p = call_subprocess_Popen([nvcc_path, '--version'],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
p.wait()
ver_line = decode(p.stdout.readlines()[-1])
p_out = output_subprocess_Popen([nvcc_path, '--version'])
ver_line = decode(p_out[0]).strip().split('\n')[-1]
build, version = ver_line.split(',')[1].strip().split()
assert build == 'release'
......
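The new `output_subprocess_Popen` path parses the last line of `nvcc --version` output. A sketch of that parsing (the sample output string below is hypothetical, patterned after typical nvcc output):

```python
def parse_nvcc_release(output):
    # Take the last line of `nvcc --version` and pull out the
    # 'release X.Y' field, as set_version does above.
    ver_line = output.strip().split('\n')[-1]
    build, version = ver_line.split(',')[1].strip().split()
    assert build == 'release'
    return version

# Hypothetical nvcc output:
sample = ("nvcc: NVIDIA (R) Cuda compiler driver\n"
          "Cuda compilation tools, release 5.5, V5.5.0")
```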
......@@ -109,11 +109,13 @@ def test_careduce():
((4100,4,3),[1,2]),((5,4100,3),[1,2]),((5,4,4100),[1,2]),#011
#((4100,4,3),[0,2]),((5,4100,3),[0,2]),((5,4,4100),[0,2]),#101 ##not implemented
((4100,4,3),[0,1,2]),((5,4100,3),[0,1,2]),((5,4,4100),[0,1,2]),#111
((65,4,3),[0,1,2]),((5,65,3),[0,1,2]),((5,4,65),[0,1,2]),#111
((4100,4,3,2),[2,3]),((4,4100,3,2),[2,3]),((4,3,4100,2),[2,3]),((4,3,2,4100),[2,3]),#0011
((4100,4,3,2),[1,3]),((4,4100,3,2),[1,3]),((4,3,4100,2),[1,3]),((4,3,2,4100),[1,3]),#0101
((4100,4,3,2),[0,2,3]),((4,4100,3,2),[0,2,3]),((4,3,4100,2),[0,2,3]),#((4,3,2,4100),[0,2,3]),#1011
((4100,4,3,2),[1,2,3]),((4,4100,3,2),[1,2,3]),((4,3,4100,2),[1,2,3]),((4,3,2,4100),[1,2,3]),#0111
((65,4,3,2),[1,2,3]),((4,65,3,2),[1,2,3]),((4,3,65,2),[1,2,3]),((4,3,2,65),[1,2,3]),#0111
((4100,2,3,4),[0,1,2,3]),((2,4100,3,4),[0,1,2,3]),((2,3,4100,4),[0,1,2,3]),((2,3,4,4100),[0,1,2,3]),((128,1,3,3), [0,1,2,3]),#1111
......
import operator
import sys
import numpy
......@@ -213,20 +214,29 @@ def test_huge_elemwise_fusion():
"""
shape = (2, 3, 4, 5, 6)
ttype = tensor.tensor(dtype='float32', broadcastable=(False,) * len(shape))
vars = [tensor.tanh(ttype) for x in range(7)]
f = pfunc(vars, [vars[0] - vars[1] - vars[2] - vars[3] - vars[4] -
vars[5] - vars[6]], mode=mode_with_gpu)
gpu_ptr_size = theano.sandbox.cuda.opt.get_device_type_sizes()['gpu_ptr_size']
if gpu_ptr_size == 8:
nb_in = 7
len_topo = 10
elif gpu_ptr_size == 4:
nb_in = 8
len_topo = 11
else:
raise Exception("Unexpected value for gpu_ptr_size", gpu_ptr_size)
vars = [tensor.tanh(ttype) for x in range(nb_in)]
f = pfunc(vars, [reduce(operator.sub, vars)], mode=mode_with_gpu)
topo = f.maker.fgraph.toposort()
#theano.printing.debugprint(f)
#for i, node in enumerate(topo):
# print >> sys.stdout, i, node
assert len(topo) == 10
assert len(topo) == len_topo
assert sum([isinstance(node.op, cuda.GpuElemwise) for node in topo]) == 2
assert isinstance(topo[7].op.scalar_op, theano.scalar.basic.Sub)
assert isinstance(topo[8].op.scalar_op, theano.scalar.basic.Composite)
assert isinstance(topo[-3].op.scalar_op, theano.scalar.basic.Sub)
assert isinstance(topo[-2].op.scalar_op, theano.scalar.basic.Composite)
#let debugmode catch errors
gen = lambda: theano._asarray(numpy.random.rand(*shape), dtype='float32')
f(gen(), gen(), gen(), gen(), gen(), gen(), gen())
f(*[gen() for i in range(nb_in)])
    # Test the case where we can't put the computation on the GPU: there
    # are too many input dimensions for the op to accept 2 inputs.
......
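The refactor above replaces the hard-coded chain of subtractions with `reduce(operator.sub, vars)`, which left-folds subtraction over however many inputs `nb_in` requires:

```python
import operator
from functools import reduce  # a builtin in the Python 2 code above

# reduce(operator.sub, [a, b, c, d]) computes ((a - b) - c) - d,
# matching the old explicit vars[0] - vars[1] - ... expression.
assert reduce(operator.sub, [10, 1, 2, 3]) == 4
```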
......@@ -3,12 +3,12 @@ import os
import numpy
import theano
from theano import Op, Type, Apply, Variable, Constant
from theano import Op, Apply
from theano import tensor, scalar, config
from theano.scalar import Scalar
from theano.tensor.basic import Alloc
from theano.gof.python25 import all, any
from theano.gof.python25 import any
from theano.gof.utils import MethodNotDefined
from theano.compat import PY3
......@@ -257,7 +257,7 @@ class GpuFromHost(Op):
def R_op(self, inputs, eval_points):
ev, = eval_points
if isintance(ev, GpuArrayType):
if isinstance(ev, GpuArrayType):
return [host_from_gpu(ev)]
else:
return ev
......@@ -317,7 +317,7 @@ class GpuFromCuda(Op):
def R_op(self, inputs, eval_points):
ev, = eval_points
if isintance(ev, GpuArrayType):
if isinstance(ev, GpuArrayType):
return [cuda_from_gpu(ev)]
else:
return ev
......@@ -651,6 +651,36 @@ class GpuAlloc(HideC, Alloc):
def c_code_cache_version(self):
return (2,)
def do_constant_folding(self, node):
for client in node.outputs[0].clients:
if client[0] == 'output':
# If the output is a constant, it will have to be deepcopied
# each time the function is called. So we do not fold.
return False
elif (#The following ops work inplace of their input id 0.
client[1] == 0 and
isinstance(client[0].op, (
#Ops that will work inplace on the Alloc. So if they
#get constant_folded, they would copy the
                        #constant and this is less efficient.
#Not doing the constant folding could also lower
                        #the peak memory usage, as the "constant" won't
                        #always exist.
#theano.tensor.subtensor.AdvancedIncSubtensor,
theano.sandbox.gpuarray.subtensor.GpuIncSubtensor,
#theano.sandbox.gpuarray.subtensor.GpuAdvancedIncSubtensor1,
theano.sandbox.gpuarray.blas.GpuGemm,
theano.sandbox.gpuarray.blas.GpuGemv,
#theano.sandbox.gpuarray.blas.GpuGer, Not Yet implemented
))):
return False
            #If the client is a transfer, we don't want to fold. We
#let the moving opt finish before deciding what to do.
elif isinstance(client[0].op, HostFromGpu):
return False
return True
gpu_alloc = GpuAlloc()
......
......@@ -200,13 +200,13 @@ from theano.gof import local_optimizer, LocalOptGroup
from theano.tensor.opt import in2out
@local_optimizer([gpugemv_no_inplace])
@local_optimizer([gpugemv_no_inplace], inplace=True)
def local_inplace_gpuagemv(node):
if node.op == gpugemv_no_inplace:
return [gpugemv_inplace(*node.inputs)]
@local_optimizer([gpugemm_no_inplace])
@local_optimizer([gpugemm_no_inplace], inplace=True)
def local_inplace_gpuagemm(node):
if node.op == gpugemm_no_inplace:
return [gpugemm_inplace(*node.inputs)]
......
......@@ -1281,7 +1281,10 @@ class GpuCAReduceCuda(HideC, CAReduce):
n_threads.z += 1;
else
break;
}""" % locals()
}
            //Maximum for Fermi GPUs on that dimension.
n_threads.z = std::min(n_threads.z, (unsigned)64);
""" % locals()
if len(self.reduce_mask) == 2:
threads_y = ''
......@@ -1601,6 +1604,8 @@ class GpuCAReduceCuda(HideC, CAReduce):
n_threads.z += 1;
}
n_threads.z -= 1;
                //Maximum for Fermi GPUs on that dimension.
n_threads.z = std::min(n_threads.z, (unsigned)64);
dim3 n_blocks(1,1,1);
%(makecall)s
......@@ -1697,7 +1702,7 @@ class GpuCAReduceCuda(HideC, CAReduce):
""" % locals()
def c_code_cache_version_apply(self, node):
version = [8] # the version corresponding to the c code in this Op
version = [9] # the version corresponding to the c code in this Op
# now we insert versions for the ops on which we depend...
scalar_node = Apply(self.scalar_op,
......
......@@ -341,17 +341,20 @@ def local_gpua_crossentropysoftmaxargmax1hotwithbias(node):
@op_lifter([tensor.nnet.CrossentropySoftmax1HotWithBiasDx])
def local_gpua_crossentropysoftmax1hotwithbiasdx(node):
return GpuCrossentropySoftmax1HotWithBiasDx()
@register_opt()
@op_lifter([tensor.nnet.Softmax])
def local_gpua_softmax(node):
return GpuSoftmax()
@register_opt()
@op_lifter([tensor.nnet.SoftmaxWithBias])
def local_gpua_softmaxwithbias(node):
return GpuSoftmaxWithBias()
@register_opt()
@op_lifter([gpu_from_host, ConvOp])
def local_gpu_conv(node):
......
......@@ -32,11 +32,13 @@ if not theano.sandbox.gpuarray.pygpu_activated:
from theano.sandbox.gpuarray.type import (GpuArrayType,
gpuarray_shared_constructor)
from theano.sandbox.gpuarray.basic_ops import (host_from_gpu, gpu_from_host,
gpu_alloc, gpu_from_cuda,
cuda_from_gpu, HostFromGpu,
GpuFromHost, GpuReshape,
GpuEye)
from theano.sandbox.gpuarray.basic_ops import (
host_from_gpu, gpu_from_host,
gpu_alloc, GpuAlloc,
gpu_from_cuda,
cuda_from_gpu, HostFromGpu,
GpuFromHost, GpuReshape,
GpuEye)
from theano.tests import unittest_tools as utt
utt.seed_rng()
......@@ -290,6 +292,13 @@ GpuAllocTester = makeTester(
)
class TestAlloc(theano.tensor.tests.test_basic.TestAlloc):
dtype = "float32"
mode = mode_with_gpu
shared = staticmethod(gpuarray_shared_constructor)
allocs = [GpuAlloc, GpuAlloc, T.Alloc]
def test_shape():
x = GpuArrayType(dtype='float32', broadcastable=[False, False, False])()
v = gpuarray.zeros((3, 4, 5), dtype='float32')
......
import unittest
from theano import scalar, gof
from theano.gof import FunctionGraph
from theano.gof.python25 import all, any
from theano.tests.unittest_tools import SkipTest
from theano.tensor.tests.test_elemwise import (test_Broadcast, test_DimShuffle,
test_CAReduce)
......@@ -126,11 +122,13 @@ class test_GpuCAReduceCuda(test_GpuCAReduceCPY):
((4100,4,3),[1,2]),((5,4100,3),[1,2]),((5,4,4100),[1,2]),#011
#((4100,4,3),[0,2]),((5,4100,3),[0,2]),((5,4,4100),[0,2]),#101 ##not implemented
((4100,4,3),[0,1,2]),((5,4100,3),[0,1,2]),((5,4,4100),[0,1,2]),#111
((65,4,3),[0,1,2]),((5,65,3),[0,1,2]),((5,4,65),[0,1,2]),#111
((4100,4,3,2),[2,3]),((4,4100,3,2),[2,3]),((4,3,4100,2),[2,3]),((4,3,2,4100),[2,3]),#0011
((4100,4,3,2),[1,3]),((4,4100,3,2),[1,3]),((4,3,4100,2),[1,3]),((4,3,2,4100),[1,3]),#0101
((4100,4,3,2),[0,2,3]),((4,4100,3,2),[0,2,3]),((4,3,4100,2),[0,2,3]),#((4,3,2,4100),[0,2,3]),#1011
((4100,4,3,2),[1,2,3]),((4,4100,3,2),[1,2,3]),((4,3,4100,2),[1,2,3]),((4,3,2,4100),[1,2,3]),#0111
((65,4,3,2),[1,2,3]),((4,65,3,2),[1,2,3]),((4,3,65,2),[1,2,3]),((4,3,2,65),[1,2,3]),#0111
((4100,2,3,4),[0,1,2,3]),((2,4100,3,4),[0,1,2,3]),((2,3,4100,4),[0,1,2,3]),((2,3,4,4100),[0,1,2,3]),((128,1,3,3), [0,1,2,3]),#1111
#test pattern implemented by reshape
......
......@@ -26,4 +26,6 @@ class G_subtensor(T_subtensor):
dtype='float32',
ignore_topo=(HostFromGpu, GpuFromHost,
DeepCopyOp))
        # The GPU optimizations don't run in fast_compile mode.
self.fast_compile = False
assert self.sub == GpuSubtensor
......@@ -26,8 +26,10 @@ if cuda_available:
from theano.sandbox.cuda import (CudaNdarrayType,
float32_shared_constructor)
def matVecModM(A, s, m):
return numpy.int32(numpy.sum((numpy.int64(A)*s) % m, 1) % m)
assert A.dtype == 'int64'
return numpy.int32(numpy.sum((A*s) % m, 1) % m)
def multMatVect(v, A, m1, B, m2):
......@@ -142,24 +144,30 @@ MASK2 = numpy.int32(65535) #2^16 - 1
MULT2 = numpy.int32(21069)
NORM = 4.656612873077392578125e-10; #1./2^31
A1p0 = numpy.asarray([[0, 4194304, 129], [1, 0, 0], [0, 1, 0]])
A2p0 = numpy.asarray([[32768, 0, 32769], [1, 0, 0], [0, 1, 0]])
#A1p0 = numpy.asarray([[0, 4194304, 129], [1, 0, 0], [0, 1, 0]],
# dtype='int64')
#A2p0 = numpy.asarray([[32768, 0, 32769], [1, 0, 0], [0, 1, 0]],
# dtype='int64')
A1p72 = numpy.asarray([[1516919229, 758510237, 499121365],
[1884998244, 1516919229, 335398200],
[601897748, 1884998244, 358115744]])
[601897748, 1884998244, 358115744]],
dtype='int64')
A2p72 = numpy.asarray([[1228857673, 1496414766, 954677935],
[1133297478, 1407477216, 1496414766],
[2002613992, 1639496704, 1407477216]])
[2002613992, 1639496704, 1407477216]],
dtype='int64')
A1p134 = numpy.asarray(
[[1702500920, 1849582496, 1656874625],
[828554832, 1702500920, 1512419905],
[1143731069, 828554832, 102237247]])
[1143731069, 828554832, 102237247]],
dtype='int64')
A2p134 = numpy.asarray(
[[796789021, 1464208080, 607337906],
[1241679051, 1431130166, 1464208080],
[1401213391, 1178684362, 1431130166]])
[1401213391, 1178684362, 1431130166]],
dtype='int64')
np_int32_vals = [numpy.int32(i) for i in (0, 7, 9, 15, 16, 22, 24)]
......
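The tightened `matVecModM` above now requires an int64 matrix so the products cannot overflow 32 bits before the reduction mod m. A NumPy sketch of the same computation:

```python
import numpy as np

def mat_vec_mod_m(A, s, m):
    # Matrix-vector product modulo m; A must already be int64 so
    # A * s doesn't overflow before the elementwise mod.
    assert A.dtype == np.int64
    return (np.sum((A * s) % m, axis=1) % m).astype(np.int32)
```

With A the identity matrix this just reduces each entry of s modulo m.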
......@@ -909,7 +909,22 @@ class UnaryScalarOp(ScalarOp):
node.inputs[0].type != node.outputs[0].type):
raise theano.gof.utils.MethodNotDefined()
dtype = node.inputs[0].dtype
dtype = node.inputs[0].type.dtype_specs()[1]
fct_call = self.c_code_contiguous_raw(dtype, 'n', 'x', 'z')
return """
{
npy_intp n = PyArray_SIZE(%(z)s);
%(dtype)s * x = (%(dtype)s*) PyArray_DATA(%(x)s);
%(dtype)s * z = (%(dtype)s*) PyArray_DATA(%(z)s);
%(fct_call)s;
}
""" % locals()
def c_code_contiguous_raw(self, dtype, n, i, o):
if not config.lib.amdlibm:
raise theano.gof.utils.MethodNotDefined()
if dtype.startswith('npy_'):
dtype = dtype[4:]
if dtype == 'float32' and self.amd_float32 is not None:
dtype = 'float'
fct = self.amd_float32
......@@ -918,12 +933,7 @@ class UnaryScalarOp(ScalarOp):
fct = self.amd_float64
else:
raise theano.gof.utils.MethodNotDefined()
return """
npy_intp n = PyArray_SIZE(%(z)s);
%(dtype)s * x = (%(dtype)s*) PyArray_DATA(%(x)s);
%(dtype)s * z = (%(dtype)s*) PyArray_DATA(%(z)s);
%(fct)s(n, x, z);
""" % locals()
return "%(fct)s(%(n)s, %(i)s, %(o)s)" % locals()
class BinaryScalarOp(ScalarOp):
......@@ -2964,7 +2974,40 @@ class Composite(ScalarOp):
# We need to clone the graph as sometimes its nodes already
# contain a reference to an fgraph. As we want the Composite
# to be pickable, we can't have reference to fgraph.
inputs, outputs = gof.graph.clone(inputs, outputs)
# Also, if there is Composite in the inner graph, we want to
# remove them. In that case, we do a more complicated clone
# that will flatten Composite. We don't need to do this
        # recursively, as the way the fusion optimizer works, we have
        # only 1 new Composite at the output each time.
if len(outputs) > 1 or not any([isinstance(var.owner.op, Composite)
for var in outputs]):
# No inner Composite
inputs, outputs = gof.graph.clone(inputs, outputs)
else:
# Inner Composite that we need to flatten
assert len(outputs) == 1
# 1. Create a new graph from inputs up to the
# Composite
res = theano.compile.rebuild_collect_shared(
inputs=inputs,
outputs=outputs[0].owner.inputs,
copy_inputs_over=False) # Clone also the inputs
# 2. We continue this partial clone with the graph in
# the inner Composite
res2 = theano.compile.rebuild_collect_shared(
inputs=outputs[0].owner.op.inputs,
outputs=outputs[0].owner.op.outputs,
replace=dict(zip(outputs[0].owner.op.inputs, res[1]))
)
assert len(res2[1]) == len(outputs)
assert len(res[0]) == len(inputs)
assert res[0] != inputs
inputs, outputs = res[0], res2[1]
        # Next assert is commented out only for speed
#assert not any([isinstance(node.op, Composite) for node in
# theano.gof.graph.ops(inputs, outputs)])
self.inputs = copy(inputs)
self.outputs = copy(outputs)
self.inputs_type = tuple([input.type for input in inputs])
......
......@@ -68,19 +68,17 @@ class test_composite(unittest.TestCase):
fn = gof.DualLinker().accept(g).make_function()
assert fn(1.0, 2.0) == 1.5
# def test_sin(self):
# x = inputs()
# e = sin(x)
# C = Composite([x], [e])
# c = C.make_node(x)
# # print c.c_code(['x'], ['z'], dict(id = 0))
# g = FunctionGraph([x], [c.out])
# fn = gof.DualLinker().accept(g).make_function()
# assert fn(0) == 0
# assert fn(3.14159265358/2) == 1
# assert fn(3.14159265358) == 0
# WRITEME: Test for sin, pow, and other scalar ops.
def test_flatten(self):
        #Test that we flatten nested Composites.
x, y, z = inputs()
C = Composite([x, y], [x + y])
CC = Composite([x, y], [C(x * y, y)])
assert not isinstance(CC.outputs[0].owner.op, Composite)
# Test with multiple outputs
CC = Composite([x, y, z], [C(x * y, y), C(x * z, y)])
#We don't flatten that case.
assert isinstance(CC.outputs[0].owner.op, Composite)
def test_with_constants(self):
x, y, z = inputs()
......
......@@ -173,12 +173,9 @@ SOMEPATH/Canopy_64bit/User/lib/python2.7/site-packages/numpy/distutils/system_in
warnings.warn('Specified path %s is invalid.' % d)
"""
#I'm not able to remove all printed stuff
with_context = warnings.catch_warnings(record=True)
with_context.__enter__()
try:
with warnings.catch_warnings(record=True):
numpy.distutils.system_info.system_info.verbosity = 0
blas_info = numpy.distutils.system_info.get_info("blas_opt")
finally:
with_context.__exit__(None, None, None)
# If we are in a EPD installation, mkl is available
if "EPD" in sys.version:
......@@ -1193,32 +1190,31 @@ def _beta_L_plus_alpha_M(beta, L, alpha, M, recurse_flip=True):
# it also might be the case that there is a dimshuffle between the +
# and the dot22. local_dot_to_dot22 in particular will put in such things.
if M.owner and isinstance(M.owner.op, T.DimShuffle):
if (M.owner and isinstance(M.owner.op, T.DimShuffle) and
M.owner.inputs[0].owner and
isinstance(M.owner.inputs[0].owner.op, Dot22)):
MM = M.owner.inputs[0]
if tuple(M.owner.op.new_order) == (0,):
if M.owner.op.new_order == (0,):
# it is making a column MM into a vector
if MM.owner and MM.owner.op == _dot22:
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle(0, 'x'),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle(0)]
return rval, MM
if tuple(M.owner.op.new_order) == (1,):
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle(0, 'x'),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle(0)]
return rval, MM
if M.owner.op.new_order == (1,):
# it is making a row MM into a vector
if MM.owner and MM.owner.op == _dot22:
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle('x', 0),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle(1)]
return rval, MM
if tuple(M.owner.op.new_order) == ():
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle('x', 0),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle(1)]
return rval, MM
if len(M.owner.op.new_order) == 0:
            # it is making a 1x1 matrix MM into a scalar
if MM.owner and MM.owner.op == _dot22:
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle('x', 'x'),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle()]
return rval, MM
MMl, MMr = MM.owner.inputs
g = gemm_no_inplace(L.dimshuffle('x', 'x'),
alpha, MMl, MMr, beta)
rval = [g.dimshuffle()]
return rval, MM
# this is False'd out because of inadequate testing.
# TODO see ticket #237
......@@ -1382,29 +1378,31 @@ def _gemm_from_factored_list(lst):
"""Returns None, or a list to replace node.outputs
"""
# Make every pair in list have matching dtypes
# sM can be a tuple of 2 elements or a theano variable.
# We should not use __len__ as theano variables don't support
# it. I don't want to change this to isinstance(sM, tuple)
# as I'm not able to make a test that triggers this case.
def is_pair(sM):
try:
s, M = sM
return True
except Exception:
return False
lst2 = []
    # Remove the tuples that can't be cast correctly.
# This can happen when we try to cast a complex to a real
for sM in lst:
if is_pair(sM):
# Make every pair in list have matching dtypes
# sM can be a tuple of 2 elements or a theano variable.
if isinstance(sM, tuple):
sm0, sm1 = sM
sm0 = T.as_tensor_variable(sm0)
if theano.scalar.upcast(sm0.dtype, sm1.dtype) == sm1.dtype:
lst2.append((T.cast(sm0, sm1.dtype), sM[1]))
lst = lst2
def item_to_var(t):
try:
s, M = t
except Exception:
return t
if s == 1:
return M
if s == -1:
return -M
return s * M
# Try every pair in the sM_list, trying to turn it into a gemm operation
for i in xrange(len(lst) - 1):
s_i, M_i = lst[i]
......@@ -1421,16 +1419,6 @@ def _gemm_from_factored_list(lst):
s_j, M_j)
#print 'GOT IT', gemm_of_sM_list
if gemm_of_sM_list:
def item_to_var(t):
try:
s, M = t
except Exception:
return t
if s == 1:
return M
if s == -1:
return -M
return s * M
assert len(gemm_of_sM_list) == 1
add_inputs = [item_to_var(input)
......@@ -1715,20 +1703,19 @@ def local_dot_to_dot22(node):
_logger.info('Not optimizing dot with inputs %s %s %s %s',
x, y, x.type, y.type)
@local_optimizer([gemm_no_inplace])
@local_optimizer([gemm_no_inplace], inplace=True)
def local_inplace_gemm(node):
if node.op == gemm_no_inplace:
return [gemm_inplace(*node.inputs)]
@local_optimizer([gemv_no_inplace])
@local_optimizer([gemv_no_inplace], inplace=True)
def local_inplace_gemv(node):
if node.op == gemv_no_inplace:
return [gemv_inplace(*node.inputs)]
@local_optimizer([ger])
@local_optimizer([ger], inplace=True)
def local_inplace_ger(node):
if node.op == ger:
return [ger_destructive(*node.inputs)]
......
......@@ -774,8 +774,7 @@ class Elemwise(OpenMPOp):
super(Elemwise, self).perform(node, inputs, output_storage)
maxsize = max(len(input.shape) for input in inputs)
for dims in izip(*[([(1, True)] * (maxsize - len(input.shape))
+ zip(input.shape, sinput.type.broadcastable))
for dims in izip(*[zip(input.shape, sinput.type.broadcastable)
for input, sinput in zip(inputs, node.inputs)]):
if max(d for d, b in dims) != 1 and (1, False) in dims:
# yes there may be more compact ways to write this code,
......@@ -808,34 +807,36 @@ class Elemwise(OpenMPOp):
out_shape.append(max(values))
out_shape = tuple(out_shape)
if not self.inplace_pattern:
for output, storage in izip(node.outputs, output_storage):
odat = storage[0]
if odat is not None:
if odat.shape != out_shape:
# It is unsafe to try to resize odat,
# we have to allocate output storage.
odat = None
if odat is None:
odat = numpy.ndarray(out_shape, dtype=output.type.dtype)
storage[0] = odat
else:
for i, (output, storage) in enumerate(
izip(node.outputs, output_storage)):
#i is an output idx
if i in self.inplace_pattern:
odat = inputs[self.inplace_pattern[i]]
else:
odat = storage[0]
if odat is not None:
if odat.shape != out_shape:
# It is unsafe to try to resize odat,
# we have to allocate output storage.
odat = None
if odat is None:
odat = numpy.ndarray(out_shape,
dtype=output.type.dtype)
storage[0] = odat
# Commented as we don't reuse outputs now.
#
# if not self.inplace_pattern:
# for output, storage in izip(node.outputs, output_storage):
# odat = storage[0]
# if odat is not None:
# if odat.shape != out_shape:
# # It is unsafe to try to resize odat,
# # we have to allocate output storage.
# odat = None
# if odat is None:
# odat = numpy.ndarray(out_shape, dtype=output.type.dtype)
# storage[0] = odat
# else:
# for i, (output, storage) in enumerate(
# izip(node.outputs, output_storage)):
# #i is an output idx
# if i in self.inplace_pattern:
# odat = inputs[self.inplace_pattern[i]]
# else:
# odat = storage[0]
# if odat is not None:
# if odat.shape != out_shape:
# # It is unsafe to try to resize odat,
# # we have to allocate output storage.
# odat = None
# if odat is None:
# odat = numpy.ndarray(out_shape,
# dtype=output.type.dtype)
# storage[0] = odat
ufunc_args = inputs # + output_storage
if self.nfunc and len(inputs) == self.nfunc_spec[1]:
......@@ -860,26 +861,25 @@ class Elemwise(OpenMPOp):
if nout == 1:
variables = [variables]
i = 0
for variable, storage, nout in izip(variables, output_storage,
node.outputs):
if str(getattr(variable, "dtype", "")) == 'object':
if getattr(variable, "dtype", "") == 'object':
                # Since numpy 1.6, functions created with numpy.frompyfunc
                # always return an ndarray with dtype object
variable = numpy.asarray(variable, dtype=nout.dtype)
# The storage has been resized earlier.
if hasattr(variable, 'shape'):
assert storage[0].shape == variable.shape
if i in self.inplace_pattern:
odat = inputs[self.inplace_pattern[i]]
odat[...] = variable
storage[0] = odat
            # Sometimes NumPy returns a Python type.
elif not isinstance(variable, numpy.ndarray):
variable = numpy.asarray(variable, nout.dtype)
storage[0] = variable
else:
                # If variable has no shape, then it is a scalar.
assert numpy.prod(storage[0].shape) == 1
storage[0][...] = variable
assert str(storage[0].dtype) != 'object'
# the following should be used instead of the previous loop,
# unfortunately it tends to segfault
# self.ufunc(*(ufunc_args+[s[0] for s in output_storage]))
storage[0] = variable
i += 1
def infer_shape(self, node, i_shapes):
rval = []
......
......@@ -571,6 +571,8 @@ def repeat(x, repeats, axis=None):
:param axis: int, optional.
:see: :func:`tensor.tile <tensor.tile>`
.. versionadded:: 0.6
"""
return RepeatOp(axis=axis)(x, repeats)
......
......@@ -95,7 +95,7 @@ class SoftmaxWithBias(gof.Op):
return ['<iostream>', '<cmath>']
@staticmethod
def c_code_template():
def c_code_template(dtype):
# this implementation was lifted from
# /u/bergstrj/cvs/bergstrj/src/feb07/nn.cxx
......@@ -107,6 +107,10 @@ class SoftmaxWithBias(gof.Op):
#TODO: use this to accept float32 and int32: node.inputs[0].type.dtype_specs()[1]
init_decl = """
npy_intp* Nx = PyArray_DIMS(%(x)s);
npy_intp Sx = 0;
npy_intp Sb = 0;
npy_intp Ssm = 0;
if (PyArray_NDIM(%(x)s) != 2)
{
......@@ -151,6 +155,10 @@ class SoftmaxWithBias(gof.Op):
%(fail)s
}
}
Sx = PyArray_STRIDES(%(x)s)[1]/sizeof(dtype_%(x)s);
Sb = PyArray_STRIDES(%(b)s)[0]/sizeof(dtype_%(b)s);
Ssm = PyArray_STRIDES(%(sm)s)[1]/sizeof(dtype_%(sm)s);
"""
begin_row_loop = """
......@@ -163,9 +171,7 @@ class SoftmaxWithBias(gof.Op):
const dtype_%(x)s* __restrict__ x_i = (dtype_%(x)s*)(PyArray_BYTES(%(x)s) + PyArray_STRIDES(%(x)s)[0] * i);
const dtype_%(b)s* __restrict__ b_i = (dtype_%(b)s*)(PyArray_BYTES(%(b)s));
dtype_%(sm) s* __restrict__ sm_i = (dtype_%(sm)s*)(PyArray_BYTES(%(sm)s) + PyArray_STRIDES(%(sm)s)[0] * i);
"""
inside_row_loop = """
npy_intp Sx = PyArray_STRIDES(%(x)s)[1]/sizeof(dtype_%(x)s);
npy_intp Sb = PyArray_STRIDES(%(b)s)[0]/sizeof(dtype_%(b)s);
npy_intp Ssm = PyArray_STRIDES(%(sm)s)[1]/sizeof(dtype_%(sm)s);
......@@ -182,6 +188,9 @@ class SoftmaxWithBias(gof.Op):
row_max = (row_ij > row_max) ? row_ij : row_max;
}
"""
inside_row_loop = """
for (j = 0; j < Nx[1]; ++j)
{
dtype_%(sm)s row_ij = x_i[j * Sx] + b_i[j * Sb];
......@@ -201,6 +210,42 @@ class SoftmaxWithBias(gof.Op):
"""
        # Get the vectorized version of exp if it exists
try:
vec_exp = theano.scalar.exp.c_code_contiguous_raw(dtype,
"Nx[1]", "sm_i", "sm_i")
inside_row_loop_contig = """
for (j = 0; j < Nx[1]; ++j)
{
dtype_%%(sm)s row_ij = x_i[j * Sx] + b_i[j * Sb];
//std::cout << "2 " << j << " " << row_ij << " " << row_max << "\\n";
dtype_%%(sm)s sm_ij = row_ij - row_max;
//std::cout << "3 " << j << " " << sm_ij << "\\n";
sm_i[j * Ssm] = sm_ij;
}
%(vec_exp)s;
for (j = 0; j < Nx[1]; ++j)
{
sum += sm_i[j * Ssm];
}
//cblas_dscal(x.N, 1.0 / sum, &mat_at(s,i,0), s.n);
double sum_inv = 1.0 / sum;
for (j = 0; j < Nx[1]; ++j)
{
sm_i[j * Ssm] *= sum_inv;
}
""" % locals()
inside_row_loop = """
if(Ssm == 1){
%(inside_row_loop_contig)s
}else{
%(inside_row_loop)s
}
""" % locals()
except theano.gof.utils.MethodNotDefined:
pass
end_row_loop = """
}
"""
......@@ -210,12 +255,13 @@ class SoftmaxWithBias(gof.Op):
def c_code(self, node, name, inp, out, sub):
x, b = inp
sm, = out
code_template = ''.join(self.c_code_template())
code_template = ''.join(self.c_code_template(
node.inputs[0].type.dtype_specs()[1]))
return code_template % dict(locals(), **sub)
@staticmethod
def c_code_cache_version():
return (6,)
return (8,)
softmax_with_bias = SoftmaxWithBias()
......@@ -384,7 +430,7 @@ class Softmax(gof.Op):
return ['<iostream>', '<cmath>']
@staticmethod
def c_code_template():
def c_code_template(dtype):
# this implementation was lifted from
# /u/bergstrj/cvs/bergstrj/src/feb07/nn.cxx
......@@ -396,6 +442,8 @@ class Softmax(gof.Op):
#TODO: use this to accept float32 and int32: node.inputs[0].type.dtype_specs()[1]
init_decl = """
npy_intp* Nx = PyArray_DIMS(%(x)s);
npy_intp Sx1 = 0;
npy_intp Ssm1 = 0;
if (PyArray_NDIM(%(x)s) != 2)
{
......@@ -413,7 +461,7 @@ class Softmax(gof.Op):
|| (PyArray_DIMS(%(sm)s)[0] != PyArray_DIMS(%(x)s)[0])
|| (PyArray_DIMS(%(sm)s)[1] != PyArray_DIMS(%(x)s)[1]))
{
if (NULL != %(sm)s) Py_XDECREF(%(sm)s);
Py_XDECREF(%(sm)s);
%(sm)s = (PyArrayObject*)PyArray_SimpleNew(2, PyArray_DIMS(%(x)s),
type_num_%(x)s);
if(!%(sm)s) {
......@@ -422,6 +470,8 @@ class Softmax(gof.Op):
%(fail)s
}
}
Sx1 = PyArray_STRIDES(%(x)s)[1]/sizeof(dtype_%(x)s);
Ssm1 = PyArray_STRIDES(%(sm)s)[1]/sizeof(dtype_%(sm)s);
"""
begin_row_loop = """
......@@ -433,11 +483,6 @@ class Softmax(gof.Op):
const dtype_%(x)s* __restrict__ x_i = (dtype_%(x)s*)(PyArray_BYTES(%(x)s) + PyArray_STRIDES(%(x)s)[0] * i);
dtype_%(sm) s* __restrict__ sm_i = (dtype_%(sm)s*)(PyArray_BYTES(%(sm)s) + PyArray_STRIDES(%(sm)s)[0] * i);
"""
inside_row_loop = """
npy_intp Sx = PyArray_STRIDES(%(x)s)[1]/sizeof(dtype_%(x)s);
npy_intp Ssm = PyArray_STRIDES(%(sm)s)[1]/sizeof(dtype_%(sm)s);
size_t row_max_j=0;
dtype_%(sm)s row_max = x_i[0];
......@@ -445,46 +490,82 @@ class Softmax(gof.Op):
// Get the maximum value of the row
for (j = 1; j < Nx[1]; ++j)
{
dtype_%(sm)s row_ij = x_i[j * Sx] ;
dtype_%(sm)s row_ij = x_i[j * Sx1] ;
//std::cout << "1 " << row_ij << "\\n";
row_max_j = (row_ij > row_max) ? j : row_max_j;
row_max = (row_ij > row_max) ? row_ij : row_max;
}
"""
inside_row_loop = """
for (j = 0; j < Nx[1]; ++j)
{
dtype_%(sm)s row_ij = x_i[j * Sx] ;
dtype_%(sm)s row_ij = x_i[j * Sx1] ;
//std::cout << "2 " << j << " " << row_ij << " " << row_max << "\\n";
dtype_%(sm)s sm_ij = exp(row_ij - row_max);
//std::cout << "3 " << j << " " << sm_ij << "\\n";
sum += sm_ij;
sm_i[j * Ssm] = sm_ij;
sm_i[j * Ssm1] = sm_ij;
}
//cblas_dscal(x.N, 1.0 / sum, &mat_at(s,i,0), s.n);
double sum_inv = 1.0 / sum;
for (j = 0; j < Nx[1]; ++j)
{
sm_i[j * Ssm] *= sum_inv;
sm_i[j * Ssm1] *= sum_inv;
}
"""
        # Get the vectorized version of exp if it exists
try:
vec_exp = theano.scalar.exp.c_code_contiguous_raw(dtype,
"Nx[1]", "sm_i", "sm_i")
inside_row_loop_contig = """
for (j = 0; j < Nx[1]; ++j)
{
sm_i[j * Ssm1] = x_i[j * Sx1] - row_max;
}
%(vec_exp)s;
for (j = 0; j < Nx[1]; ++j)
{
sum += sm_i[j * Ssm1];
}
//cblas_dscal(x.N, 1.0 / sum, &mat_at(s,i,0), s.n);
double sum_inv = 1.0 / sum;
for (j = 0; j < Nx[1]; ++j)
{
sm_i[j * Ssm1] *= sum_inv;
}
""" % locals()
inside_row_loop = """
if(Ssm1 == 1){
%(inside_row_loop_contig)s
}else{
%(inside_row_loop)s
}
""" % locals()
except theano.gof.utils.MethodNotDefined:
pass
end_row_loop = """
}
"""
return (init_decl, begin_row_loop, inside_row_loop, end_row_loop)
def c_code(self, node, name, inp, out, sub):
x, = inp
sm, = out
code_template = ''.join(self.c_code_template())
code_template = ''.join(self.c_code_template(
node.inputs[0].type.dtype_specs()[1]))
return code_template % dict(locals(), **sub)
@staticmethod
def c_code_cache_version():
return (1,)
return (3,)
softmax = Softmax()
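Both row loops in the C templates above implement the standard numerically stable softmax: shift each row by its maximum, exponentiate, then normalize by the row sum. The contiguous fast path only changes how the exp is computed (one vectorized call over the shifted row). A NumPy sketch of the same computation:

```python
import numpy as np

def softmax_rows(x):
    # Row-wise softmax as in the C template: subtract the row max for
    # numerical stability, exponentiate, then divide by the row sum.
    sm = x - x.max(axis=1, keepdims=True)
    np.exp(sm, out=sm)
    sm /= sm.sum(axis=1, keepdims=True)
    return sm
```

Shifting by the row max leaves the result unchanged mathematically but keeps exp from overflowing on large inputs.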
......@@ -863,7 +944,7 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
return ['<iostream>', '<cmath>']
@staticmethod
def c_code_template():
def c_code_template(dtype):
# this implementation was lifted from
# /u/bergstrj/cvs/bergstrj/src/feb07/nn.cxx
......@@ -874,7 +955,7 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
#TODO: use this to accept float32 and int32: node.inputs[0].type.dtype_specs()[1]
(init_decl, begin_row_loop, inside_row_loop, end_row_loop) = \
SoftmaxWithBias.c_code_template()
SoftmaxWithBias.c_code_template(dtype)
return (init_decl,
"""
if (PyArray_NDIM(%(y_idx)s) != 1)
......@@ -947,7 +1028,8 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
nll, sm, am = out
y_idx_type = node.inputs[2].type.dtype_specs()[1]
am_type = y_idx_type
code_template = ''.join(self.c_code_template())
dtype = node.inputs[0].type.dtype_specs()[1]
code_template = ''.join(self.c_code_template(dtype))
return code_template % dict(locals(), **sub)
......
......@@ -1928,7 +1928,8 @@ class TestAlloc(unittest.TestCase):
#AdvancedIncSubtensor1
(some_matrix[arange(60)], 2),
#AdvancedIncSubtensor
(some_matrix[idx, idx], 1)]):
(some_matrix[idx, idx], 1)
]):
derp = sum(dot(subtensor, variables))
fobj = theano.function([some_vector], derp, mode=self.mode)
......@@ -1936,14 +1937,18 @@ class TestAlloc(unittest.TestCase):
fgrad = theano.function([some_vector], grad_derp,
mode=self.mode)
topo_obj = fobj.maker.fgraph.toposort()
#<= is needed because the GPU does not yet implement
#AdvancedIncSubtensor. Once it does, this can be
#replaced with ==.
assert numpy.sum([isinstance(node.op, alloc)
for node in topo_obj]) == 0
for node in topo_obj]) <= 1
topo_grad = fgrad.maker.fgraph.toposort()
#print subtensor
#theano.printing.debugprint(fgrad)
assert numpy.sum([isinstance(node.op, alloc)
for node in topo_grad]) == n_alloc
for node in topo_grad]) == n_alloc, (
alloc, subtensor, n_alloc, topo_grad)
fobj(test_params)
fgrad(test_params)
......@@ -6736,6 +6741,17 @@ class TestTensorInstanceMethods(unittest.TestCase):
# Test equivalent advanced indexing
assert_array_equal(X[:,indices].eval({X: x}), x[:,indices])
def test_cumsum(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.cumsum().eval({X: x}), x.cumsum())
def test_cumprod(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.cumprod().eval({X: x}), x.cumprod())
def test_norm():
x = theano.tensor.vector('x')
n = x.norm(2)
......
......@@ -1091,7 +1091,7 @@ class TestGemv(TestCase, unittest_tools.TestOptimizationMixin):
# Assert that the dot was optimized somehow
self.assertFunctionContains0(f, T.dot)
self.assertFunctionContains1(f, Gemv(False))
self.assertFunctionContains1(f, Gemv(True))
# Assert they produce the same output
assert numpy.allclose(f(), numpy.dot(v.get_value(), w.get_value()))
......
......@@ -164,7 +164,8 @@ class TensorType(Type):
" Theano C code does not support that.",
msg,
"object shape", data.shape,
"object strides", data.strides)
"object strides", data.strides,
"object dtype", data.dtype)
i = 0
for b in self.broadcastable:
......
......@@ -11,6 +11,7 @@ from theano.tensor.utils import hash_from_ndarray
from theano.tensor.type import TensorType
class AsTensorError(TypeError):
"""Raised when as_tensor_variable isn't able to create a
TensorVariable.
......@@ -509,13 +510,11 @@ class _tensor_py_operators:
def sort(self, axis=-1, kind='quicksort', order=None):
"""See `theano.tensor.sort`"""
from theano.tensor.sort import sort
return sort(self, axis, kind, order)
return theano.tensor.sort(self, axis, kind, order)
def argsort(self, axis=-1, kind='quicksort', order=None):
"""See `theano.tensor.argsort`"""
from theano.tensor.sort import argsort
return argsort(self, axis, kind, order)
return theano.tensor.argsort(self, axis, kind, order)
def clip(self, a_min, a_max):
"Clip (limit) the values in an array."
......@@ -529,16 +528,14 @@ class _tensor_py_operators:
def repeat(self, repeats, axis=None):
"""See `theano.tensor.repeat`"""
from theano.tensor.extra_ops import repeat
return repeat(self, repeats, axis)
return theano.tensor.extra_ops.repeat(self, repeats, axis)
def round(self, mode="half_away_from_zero"):
"""See `theano.tensor.round`"""
return theano.tensor.basic.round(self, mode)
def trace(self):
from theano.sandbox.linalg import trace
return trace(self)
return theano.sandbox.linalg.trace(self)
# TO TRUMP NUMPY OPERATORS
__array_priority__ = 1000
......@@ -549,6 +546,12 @@ class _tensor_py_operators:
def zeros_like(model, dtype=None):
return theano.tensor.basic.zeros_like(model, dtype=dtype)
def cumsum(self, axis=None):
return theano.tensor.extra_ops.cumsum(self, axis)
def cumprod(self, axis=None):
return theano.tensor.extra_ops.cumprod(self, axis)
class TensorVariable(_tensor_py_operators, Variable):
"""Subclass to add the tensor operators to the basic `Variable` class."""
......
......@@ -62,7 +62,7 @@ import sys
import time
import theano
from theano.misc.windows import call_subprocess_Popen
from theano.misc.windows import output_subprocess_Popen
def main(stdout=None, stderr=None, argv=None, theano_nose=None,
......@@ -271,19 +271,17 @@ def run(stdout, stderr, argv, theano_nose, batch_size, time_profile,
time.ctime(), test_id, data["ids"][test_id]))
f_rawlog.flush()
proc = call_subprocess_Popen(
p_out = output_subprocess_Popen(
([python, theano_nose, '-v', '--with-id']
+ [str(test_id)] + argv +
['--disabdocstring']),
+ [str(test_id)] + argv +
['--disabdocstring']))
# the previous option calls a custom Nosetests plugin
# precluding automatic substitution of the docstring for the
# test name in display
# (see class 'DisabDocString' in file theano-nose)
stderr=subprocess.PIPE,
stdout=dummy_out.fileno())
# recovering and processing data from pipe
err = proc.stderr.read()
err = p_out[1]
# print the raw log
f_rawlog.write(err)
f_rawlog.flush()
......
......@@ -554,6 +554,52 @@ def test_disconnected_cost_grad():
except theano.gradient.DisconnectedInputError:
return
raise AssertionError("A disconnected gradient has been ignored.")
def test_subgraph_grad():
# Tests that the grad method with no known_grads
# matches what happens if you use successive subgraph_grads
x = theano.tensor.fvector('x')
t = theano.tensor.fvector('t')
w1 = theano.shared(np.random.randn(3,4))
w2 = theano.shared(np.random.randn(4,2))
a1 = theano.tensor.tanh(theano.tensor.dot(x,w1))
a2 = theano.tensor.tanh(theano.tensor.dot(a1,w2))
cost2 = theano.tensor.sqr(a2 - t).sum()
cost2 += theano.tensor.sqr(w2.sum())
cost1 = theano.tensor.sqr(w1.sum())
params = [[w2],[w1]]
costs = [cost2,cost1]
grad_ends = [[a1], [x]]
inputs = [t, x]
rng = np.random.RandomState([2012, 11, 15])
values = [rng.randn(2), rng.randn(3)]
values = [np.cast[ipt.dtype](value) for ipt, value in zip(inputs, values)]
wrt = [w2, w1]
cost = cost2 + cost1
true_grads = theano.grad(cost, wrt)
true_grads = theano.function(inputs, true_grads)
true_grads = true_grads(*values)
from theano.gof.python25 import OrderedDict
next_grad = None
param_grads = []
for i in xrange(2):
param_grad, next_grad = theano.subgraph_grad(
wrt=params[i], end=grad_ends[i],
start=next_grad, cost=costs[i]
)
next_grad = OrderedDict(zip(grad_ends[i], next_grad))
param_grads.extend(param_grad)
pgrads = theano.function(inputs, param_grads)
pgrads = pgrads(*values)
for true_grad, pgrad in zip(true_grads, pgrads):
assert(np.sum(np.abs(true_grad - pgrad)) < 0.00001)
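The staged chain rule that `subgraph_grad` exercises above, computing the gradient up to an intermediate variable first, then propagating it the rest of the way, can be sketched in plain NumPy. This is a hypothetical two-stage example for illustration only, not Theano's API:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(3)
w1 = rng.randn(3, 4)
w2 = rng.randn(4)

def full_cost(w1m):
    # The whole graph in one expression: cost = tanh(x @ w1) @ w2
    return np.tanh(x @ w1m) @ w2

# Stage 1: gradient of the cost w.r.t. the intermediate a1 only.
a1 = np.tanh(x @ w1)
g_a1 = w2  # d(a1 @ w2) / d a1

# Stage 2: propagate g_a1 through tanh(x @ w1) down to w1.
g_w1 = np.outer(x, g_a1 * (1.0 - a1 ** 2))

# Check against a numerical gradient of the full graph, as the
# test above checks subgraph_grad against theano.grad.
eps = 1e-6
num = np.zeros_like(w1)
for i in range(w1.shape[0]):
    for j in range(w1.shape[1]):
        wp, wm = w1.copy(), w1.copy()
        wp[i, j] += eps
        wm[i, j] -= eps
        num[i, j] = (full_cost(wp) - full_cost(wm)) / (2 * eps)

assert np.allclose(g_w1, num, atol=1e-5)
```

The staged result matches the one-shot gradient because backpropagation is just the chain rule applied one subgraph at a time.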
class TestConsiderConstant(unittest.TestCase):
......
......@@ -1136,3 +1136,214 @@ class T_graphstructures(unittest.TestCase):
assert e.owner.inputs[1] is mul_variable
assert e.owner.inputs[1].owner.inputs[0] is y
assert e.owner.inputs[1].owner.inputs[1] is z
class T_scan(unittest.TestCase):
## All tests here belong to
## http://deeplearning.net/software/theano/tutorial/loop.html
## Theano/doc/tutorial/loop.txt
## Any change you make here must also be made in the tutorial!
def test_elemwise(self):
# defining the tensor variables
X = T.matrix("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
results, updates = theano.scan(lambda v:T.tanh(T.dot(v,W)+b_sym), \
sequences=X)
compute_elementwise = theano.function(inputs = [X, W, b_sym], \
outputs=[results])
# test values
x = numpy.eye(2)
w = numpy.ones((2,2))
b = numpy.ones((2))
b[1] = 2
print "Scan results:", compute_elementwise(x, w, b)[0]
# comparison with numpy
print "Numpy results:", numpy.tanh(x.dot(w) + b)
def test_sequence(self):
# define tensor variables
X = T.vector("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
U = T.matrix("U")
Y = T.matrix("Y")
V = T.matrix("V")
P = T.matrix("P")
results, updates = theano.scan(lambda \
y,p,x_tm1:T.tanh(T.dot(x_tm1,W) + \
T.dot(y,U)+T.dot(p,V)), \
sequences=[Y,P[::-1]], outputs_info=[X])
compute_seq = theano.function(inputs = [X, W, Y, U, P, V], \
outputs=[results])
# test values
x = numpy.zeros((2))
x[1] = 1
w = numpy.ones((2,2))
y = numpy.ones((5,2))
y[0,:] = -3
u = numpy.ones((2,2))
p = numpy.ones((5,2))
p[0,:] = 3
v = numpy.ones((2,2))
print "Scan results", compute_seq(x,w,y,u,p,v)[0]
# comparison with numpy
x_res = numpy.zeros((5,2))
x_res[0] = numpy.tanh(x.dot(w) + y[0].dot(u) + p[4].dot(v))
for i in range(1,5):
x_res[i] = numpy.tanh(x_res[i-1].dot(w) \
+ y[i].dot(u) + p[4-i].dot(v))
print "Numpy results:", x_res
def test_norm(self):
# define tensor variable
X = T.matrix("X")
results, updates = theano.scan(lambda x_i:T.sqrt((x_i**2).sum()), \
sequences=[X])
compute_norm_lines = theano.function(inputs = [X], outputs=[results])
results, updates = theano.scan(lambda x_i:T.sqrt((x_i**2).sum()), \
sequences=[X.T])
compute_norm_cols = theano.function(inputs = [X], outputs=[results])
# test value
x = numpy.diag(numpy.arange(1,6),1)
print "Scan results:", compute_norm_lines(x)[0], \
compute_norm_cols(x)[0]
# comparison with numpy
print "Numpy results:", numpy.sqrt((x**2).sum(1)), \
numpy.sqrt((x**2).sum(0))
def test_trace(self):
# define tensor variable
X = T.matrix("X")
results, updates = theano.scan(lambda i, j, t_f:T.cast(X[i,j] + \
t_f, theano.config.floatX), \
sequences=[T.arange(X.shape[0]), \
T.arange(X.shape[1])], \
outputs_info=numpy.asarray(0., \
dtype=theano.config.floatX))
result = results[-1]
compute_trace = theano.function(inputs = [X], outputs=[result])
# test value
x = numpy.eye(5)
x[0] = numpy.arange(5)
print "Scan results:", compute_trace(x)[0]
# comparison with numpy
print "Numpy results:", numpy.diagonal(x).sum()
def test_taps(self):
# define tensor variables
X = T.matrix("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
U = T.matrix("U")
V = T.matrix("V")
n_sym = T.iscalar("n_sym")
results, updates = theano.scan(lambda x_tm2,x_tm1:T.dot(x_tm2,U) \
+ T.dot(x_tm1,V) + T.tanh(T.dot(x_tm1,W) + b_sym), \
n_steps=n_sym, \
outputs_info=[dict(initial = X, taps = [-2,-1])])
compute_seq2 = theano.function(inputs = [X, U, V, W, b_sym, \
n_sym], outputs=[results])
# test values
x = numpy.zeros((2,2))
# the initial value must provide the two required past steps, x[-2] and x[-1]
x[1,1] = 1
w = 0.5*numpy.ones((2,2))
u = 0.5*(numpy.ones((2,2))-numpy.eye(2))
v = 0.5*numpy.ones((2,2))
n = 10
b = numpy.ones((2))
print "Scan results:", compute_seq2(x,u,v,w,b,n)
# comparison with numpy
x_res = numpy.zeros((10,2))
x_res[0] = x[0].dot(u) + x[1].dot(v) + numpy.tanh(x[1].dot(w) + b)
x_res[1] = x[1].dot(u) + x_res[0].dot(v) \
+ numpy.tanh(x_res[0].dot(w) + b)
x_res[2] = x_res[0].dot(u) + x_res[1].dot(v) \
+ numpy.tanh(x_res[1].dot(w) + b)
for i in range(2,10):
x_res[i] = (x_res[i-2].dot(u) + x_res[i-1].dot(v) \
+ numpy.tanh(x_res[i-1].dot(w) + b))
print "Numpy results:", x_res
def test_jacobian(self):
# define tensor variables
v = T.vector()
A = T.matrix()
y = T.tanh(T.dot(v,A))
results, updates = theano.scan(lambda i:T.grad(y[i], v), \
sequences = [T.arange(y.shape[0])])
compute_jac_t = theano.function([A,v], [results], \
allow_input_downcast = True) # shape (d_out, d_in)
# test values
x = numpy.eye(5)[0]
w = numpy.eye(5,3)
w[2] = numpy.ones((3))
print "Scan results:", compute_jac_t(w,x)[0]
# compare with numpy
print "Numpy results:", ((1 - numpy.tanh(x.dot(w))**2)*w).T
def test_accumulator(self):
# define shared variables
k = theano.shared(0)
n_sym = T.iscalar("n_sym")
results, updates = theano.scan(lambda:{k:(k+1)}, n_steps=n_sym)
accumulator = theano.function([n_sym], [], updates=updates, \
allow_input_downcast = True)
print "Before 5 steps:", k.get_value()
accumulator(5)
print "After 5 steps:", k.get_value()
def test_random(self):
# define tensor variables
X = T.matrix("X")
W = T.matrix("W")
b_sym = T.vector("b_sym")
# define shared random stream
trng = T.shared_randomstreams.RandomStreams(1234)
d=trng.binomial(size=W[1].shape)
results, updates = theano.scan(lambda v:T.tanh(T.dot(v,W) \
+ b_sym)*d, sequences=X)
compute_with_bnoise = theano.function(inputs = [X, W, b_sym], \
outputs=[results], \
updates=updates, \
allow_input_downcast = True)
x = numpy.eye(10,2)
w = numpy.ones((2,2))
b = numpy.ones((2))
print compute_with_bnoise(x, w, b)