Commit 6026b300 authored by Olivier Delalleau

Merged

@@ -2,7 +2,7 @@ Trunk since last release
------
* Sparse types are now supported by the Shape op, and the ShapeFeature optimizer works correctly with them.
* Fuse GpuElemwise more often (previously, when there were too many inputs, fusing all of them would exceed the 256-byte limit on parameters passed to a GPU function).
* Speed up gemv by working around scipy's gemv slowness when the matrix is in C order (the default).

Theano 0.3 (2010-11-23)
-----------------------
......
@@ -42,7 +42,8 @@ to be installed:

A `BLAS`_ installation (with Level 3 functionality)
   Including the development headers (``-dev``, ``-devel``, depending on
   your Linux distribution). Mac OS X comes with the `Accelerate
   framework`_ built in, and various options exist for Windows (see
   below).

.. _BLAS: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
.. _Accelerate framework: http://developer.apple.com/performance/accelerateframework.html
@@ -380,8 +381,8 @@ that fail on your platform (use the ``theano-users@googlegroups.com`` mailing li
but note that you must first register to it, by going to `theano-users`_).

Windows V1 (Installing from Scratch)
------------------------------------

- Install `Python(x,y) <http://www.pythonxy.com>`_ in a directory without blank
  spaces in the name (in particular not into ``C:\Program Files``).
@@ -437,25 +438,25 @@ Windows V1 (bigger install, but simpler instructions + tentative GPU instruction

      print theano.config.blas.ldflags

  This should print the same content as in your config file, i.e. nothing
  (if your config file was not read properly, it would print ``-lblas``, and
  trying to compile any Theano function would result in a compilation error
  due to the system being unable to find ``blas.dll``).
Windows: Using a Faster BLAS
----------------------------

If you want a faster and/or multithreaded BLAS library, you can
compile GotoBLAS2 (ATLAS may work too, but was not tested, and is
usually reported to be slower and more difficult to compile, especially
on Windows).

GotoBLAS2 can be downloaded
`here <http://www.tacc.utexas.edu/tacc-projects/gotoblas2/downloads>`_
after registering on the website (we tested v1.13).
To compile it, you will also need to install MSYS and Perl,
as described below.
The GotoBLAS makefiles actually expect a full UNIX environment (like
Cygwin), but the BLAS compilation seems to work with only MSYS and Perl
(the LAPACK compilation fails, but Theano does not need it).

a) Download the mingw-get command-line installer from the
   `MinGW files <http://sourceforge.net/projects/mingw/>`_ (click
@@ -479,12 +480,12 @@ Windows V1.5 (optional follow-up to V1 instructions)

      /postinstall/pi.sh

   It will ask for your MinGW installation directory (e.g.
   ``c:/pythonxy/mingw``).

e) Download `ActivePerl <http://www.activestate.com/activeperl/downloads>`_ and
   install it (other Perl interpreters should also work).

f) Unpack GotoBLAS2, either using `7-zip <http://www.7-zip.org/>`_ or in
   MSYS with:

   .. code-block:: bash
@@ -500,47 +501,61 @@ Windows V1.5 (optional follow-up to V1 instructions)

      quickbuild.win32 1>log.txt 2>err.txt
   Compilation should take a few minutes. Afterwards, you will probably
   find many error messages in err.txt, but there should be an ``exports``
   folder containing, in particular, ``libgoto2.dll``.

i) Copy ``libgoto2.dll`` from the ``exports`` folder to ``pythonxy\mingw\bin``
   and ``pythonxy\mingw\lib``.

j) Modify your .theanorc (or .theanorc.txt) with ``ldflags = -lgoto2``.
   This setting can also be changed in Python for testing purposes (in which
   case it will only remain in effect for the duration of your Python session):

   .. code-block:: python

      theano.config.blas.ldflags = "-lgoto2"
k) To test the BLAS performance, you can run the script
   ``theano/misc/check_blas.py``.
   Note that you may control the number of threads used by GotoBLAS2 with
   the ``GOTO_NUM_THREADS`` environment variable (the default behavior is
   to use all available cores).

   Here are some performance results on an Intel Core2 Duo 1.86 GHz,
   compared to using NumPy's BLAS or the un-optimized standard BLAS
   (compiled manually from its source code):

   * GotoBLAS2 (2 threads): 16s
   * NumPy (1 thread): 48s
   * Standard BLAS (un-optimized, 1 thread): 166s

   Conclusions:

   * The un-optimized standard BLAS is very slow and should not be used.
   * The Windows binaries of NumPy were compiled with ATLAS and are surprisingly fast.
   * GotoBLAS2 is even faster, in particular if you can use multiple cores.
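A rough sketch of this kind of timing from Python (this is not the ``check_blas.py`` script itself; the matrix size and loop count are arbitrary choices for illustration — the real script uses its own, larger settings):

```python
import time

import numpy

# Time repeated matrix-matrix products through whatever BLAS
# NumPy is linked against. Size and repetition count are arbitrary.
n = 500
A = numpy.random.rand(n, n)
B = numpy.random.rand(n, n)

start = time.time()
for _ in range(10):
    C = A.dot(B)
elapsed = time.time() - start
print("10 matrix products of size %d took %.3f seconds" % (n, elapsed))
```

Comparing this number before and after switching the BLAS library gives a quick, if crude, indication of the speedup.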
Windows: Using the GPU
----------------------

Please note that these are tentative instructions (we have not yet been able to
get the GPU to work under Windows with Theano).
Please report your own successes / failures on the
`theano-users <http://groups.google.com/group/theano-users>`_ mailing list.

These are instructions for the 32-bit version of Python (the one that comes
with Python(x,y) is 32-bit).

Blanks and non-ASCII characters are not always supported in paths. Python
supports them, but nvcc (at least version 3.1) does not.
If your ``USERPROFILE`` directory (the one you get into when you run ``cmd``)
contains such characters, you must edit your Theano configuration file to
use a compilation directory located somewhere else:

.. code-block:: cfg

    [global]
    base_compiledir=path_to_a_directory_without_such_characters

You also need to add the following lines to the configuration file:

.. code-block:: cfg
@@ -578,8 +593,10 @@ Windows V1.5 (optional follow-up to V1 instructions)

run the program nosetests inside the Theano repository.
nosetests is installed by Python(x,y).

Windows V2: Installing Python Components Individually
-----------------------------------------------------

DISCLAIMER: These are old installation instructions (to be revised).

Running Theano under Windows is currently achieved by using the `MinGW
<http://www.mingw.org>`__ port of the GCC compiler.
......
@@ -46,6 +46,9 @@ class Images2Neibs(Op):

        return Apply(self, [ten4, neib_shape, neib_step], [T.matrix(dtype=ten4.type.dtype)])

    def grad(self, (x, neib_shape, neib_step), (gz,)):
        return [neibs2images(gz, neib_shape, x.shape), None, None]

    def c_code_cache_version(self):
        return (3,)
@@ -211,36 +214,16 @@ class Images2Neibs(Op):

def images2neibs(ten4, neib_shape, neib_step=None, mode='valid'):
    return Images2Neibs(mode)(ten4, neib_shape, neib_step)

def neibs2images(neibs, neib_shape, original_shape):
    """
    Inverse of images2neibs.

    neibs : matrix like the one obtained by images2neibs
    neib_shape : neib_shape that was used in images2neibs
    original_shape : original shape of the 4d tensor given to images2neibs

    Return a 4d tensor of shape `original_shape`.
    """
    neibs = T.as_tensor_variable(neibs)
    neib_shape = T.as_tensor_variable(neib_shape)
    original_shape = T.as_tensor_variable(original_shape)
......
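For context, here is a plain-NumPy sketch of what ``images2neibs`` computes in 'valid' mode with non-overlapping 2x2 patches (shapes chosen arbitrarily; this mirrors, but is not, the Op's implementation):

```python
import numpy

# A 4d "images" tensor (batch, channels, rows, cols).
images = numpy.arange(16).reshape(1, 1, 4, 4)

# Split the 4x4 grid into 2x2 blocks, then flatten each block into one
# row, mimicking images2neibs(images, (2, 2)) in 'valid' mode.
patches = (images.reshape(1, 1, 2, 2, 2, 2)
                 .transpose(0, 1, 2, 4, 3, 5)
                 .reshape(-1, 4))

print(patches[0])  # the top-left 2x2 block, flattened
```

``neibs2images`` performs the inverse mapping, rebuilding the 4d tensor from the matrix of patches.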
@@ -7,6 +7,8 @@ from neighbours import images2neibs, neibs2images, Images2Neibs, GpuImages2Neibs

from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda
from theano.tests import unittest_tools

if theano.config.mode=='FAST_COMPILE':
    mode_with_gpu = theano.compile.mode.get_mode('FAST_RUN').including('gpu')
    mode_without_gpu = theano.compile.mode.get_mode('FAST_RUN').excluding('gpu')

@@ -328,8 +330,65 @@ def speed_neibs_wrap_centered():

    for i in range(1000):
        f()
def test_neibs_grad():
    shape = (2,3,4,4)
    images = T.shared(numpy.arange(numpy.prod(shape), dtype='float32').reshape(shape))
    cost = T.sum(T.sqr(images2neibs(images, (2,2))), axis=[0,1])
    grad = T.grad(cost, images)
    f = theano.function([], [cost, grad], mode=mode_without_gpu)
    got = f()
    should_get = [numpy.asarray(290320.0, dtype=numpy.float32),
                  numpy.asarray([[[[  0.,   2.,   4.,   6.],
                                   [  8.,  10.,  12.,  14.],
                                   [ 16.,  18.,  20.,  22.],
                                   [ 24.,  26.,  28.,  30.]],
                                  [[ 32.,  34.,  36.,  38.],
                                   [ 40.,  42.,  44.,  46.],
                                   [ 48.,  50.,  52.,  54.],
                                   [ 56.,  58.,  60.,  62.]],
                                  [[ 64.,  66.,  68.,  70.],
                                   [ 72.,  74.,  76.,  78.],
                                   [ 80.,  82.,  84.,  86.],
                                   [ 88.,  90.,  92.,  94.]]],
                                 [[[ 96.,  98., 100., 102.],
                                   [104., 106., 108., 110.],
                                   [112., 114., 116., 118.],
                                   [120., 122., 124., 126.]],
                                  [[128., 130., 132., 134.],
                                   [136., 138., 140., 142.],
                                   [144., 146., 148., 150.],
                                   [152., 154., 156., 158.]],
                                  [[160., 162., 164., 166.],
                                   [168., 170., 172., 174.],
                                   [176., 178., 180., 182.],
                                   [184., 186., 188., 190.]]]], dtype=numpy.float32)]
    assert numpy.allclose(got[0], should_get[0])
    assert numpy.allclose(got[1], should_get[1])
def test_neibs_grad_verify_grad():
    shape = (2,3,4,4)
    images = T.dtensor4()
    images_val = numpy.arange(numpy.prod(shape), dtype='float32').reshape(shape)

    def fn(images):
        return T.sum(T.sqr(images2neibs(images, (2,2))), axis=[0,1])
    unittest_tools.verify_grad(fn, [images_val])

if __name__ == '__main__':
    #test_neibs_gpu()
    #test_neibs()
    test_neibs_grad_verify_grad()
@@ -85,10 +85,15 @@ class Gemv(Op):

    def perform(self, node, inputs, out_storage):
        y, alpha, A, x, beta = inputs
        if _have_fblas:
            if not self.inplace:
                y = y.copy()
            gemv = _blas_gemv_fns[y.dtype]
            # Here we assume that A is in C order. If we do not explicitly
            # pass it in Fortran order, scipy 0.7.2 seems to create a copy
            # in Fortran order instead of just reshaping it and using the
            # trans flag.
            # If A is already in Fortran order, making it C order and using
            # the trans flag does not seem to cause a slowdown.
            #out_storage[0][0] = gemv(alpha, A, x, beta, y, overwrite_y=self.inplace)
            out_storage[0][0] = gemv(alpha, A.T, x, beta, y, overwrite_y=self.inplace, trans=True)
        else:
            out_storage[0][0] = numpy.asarray(
                beta * y + alpha * numpy.dot(A, x)
......
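The workaround above hinges on memory-layout facts that can be checked directly in plain NumPy (a sketch only, not the scipy fblas call itself): transposing a C-ordered matrix yields a Fortran-ordered view with no copy, and the trans flag then undoes the transpose mathematically.

```python
import numpy

# A C-contiguous matrix (NumPy's default layout).
A = numpy.arange(6, dtype=numpy.float64).reshape(2, 3)
assert A.flags['C_CONTIGUOUS']

# A.T is a Fortran-ordered *view*: no data is copied, which is what
# lets the gemv call above avoid the expensive layout-conversion copy.
assert A.T.flags['F_CONTIGUOUS']

# Asking BLAS for trans(A.T) . x is mathematically the same as A . x.
x = numpy.ones(3)
assert numpy.allclose(A.T.T.dot(x), numpy.dot(A, x))
```

So passing ``A.T`` with ``trans=True`` computes the same product while handing scipy a Fortran-ordered array, which is the fast path.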
@@ -1155,8 +1155,40 @@ class Prod(CAReduce):

    def grad(self, (x, ), (gz, )):
        if x.dtype[0:3] in ('int','uin'):
            return [None]

        prod_out = self(x)
        gz = as_tensor_variable(gz)
        axis = self.axis
        if axis is None:
            axis = range(x.type.ndim)
        if axis == ():
            return gz,
        new_dims = []
        i = 0
        for j, _ in enumerate(x.type.broadcastable):
            if j in axis:
                new_dims.append('x')
            else:
                new_dims.append(i)
                i += 1
        # fill a matrix with the same shape as x by broadcasting
        # values taken from gz, which has the same shape as the output
        # of prod().
        gz_filled_x = Elemwise(scalar.second)(x,
                DimShuffle(gz.type.broadcastable, new_dims)(gz))
        # do the same with the output of prod, by broadcasting along
        # the axes where the product was taken
        prod_out_filled_x = Elemwise(scalar.second)(x,
                DimShuffle(prod_out.type.broadcastable,
                           new_dims)(prod_out))
        return [theano.tensor.mul(gz_filled_x,
                theano.tensor.true_div(prod_out_filled_x, x))]
        #else:
        #    raise NotImplementedError('Will be implemented shortly')
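The graph built above implements the identity d prod(x)/dx_i = prod(x) / x_i, broadcast back to the shape of x (valid when no entry of x is zero). A quick NumPy sanity check of that identity, taking the incoming gradient gz to be 1, as for the gradient of ``prod(x).sum()``:

```python
import numpy

x = numpy.array([[1., 2., 3.],
                 [4., 5., 6.]])
p = x.prod(axis=0)

# Same recipe as the Op: broadcast prod(x) back over the reduced axis,
# then divide elementwise by x (assumes no zeros in x).
grad = p[None, :] / x

# Compare one entry against a finite-difference estimate.
eps = 1e-6
x_pert = x.copy()
x_pert[1, 0] += eps
numeric = (x_pert.prod(axis=0)[0] - p[0]) / eps
assert abs(numeric - grad[1, 0]) < 1e-3
```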
    def __str__(self):
        if self.axis is None:
......
@@ -459,9 +459,32 @@ class ShapeFeature(object):

    to promise that inputs will have a certain shape (or even to have certain shapes in
    certain dimensions). We can't automatically infer the shape of a shared variable,
    as it can change shape during execution by default.
    (NOT IMPLEMENTED YET, BUT IS IN TRAC)

    Using Shape information in Optimizations
    ========================================

    To use this shape information in OPTIMIZATIONS, use the ``shape_of`` dictionary.
    For example:

    .. code-block:: python

        try:
            shape_of = node.env.shape_feature.shape_of
        except AttributeError:
            # This can happen when the compilation mode doesn't include the ShapeFeature.
            return

        shape_of_output_zero = shape_of[node.outputs[0]]

    The ``shape_of_output_zero`` variable will contain a tuple, whose elements are
    either integers or symbolic integers.

    TODO: check whether the symbols are necessarily non-constant, or whether integer
    literals are sometimes Theano constants. That would be confusing.
    """
    def shape_i(self, i):

    def op_deco(r):
......
@@ -687,27 +687,55 @@ def test_dot_mv():

def test_gemv1():
    ''' test vector1+dot(matrix,vector2) '''
    v1 = theano.shared(numpy.array(numpy.random.rand(2), dtype='float32'))
    v2_orig = numpy.array(numpy.random.rand(2), dtype='float32')
    v2 = theano.shared(v2_orig)
    m = theano.shared(numpy.array(numpy.random.rand(2,2), dtype='float32'))

    f = theano.function([], v2+theano.dot(m,v1), mode=mode_blas_opt)

    # Assert they produce the same output
    assert numpy.allclose(f(), numpy.dot(m.value,v1.value)+v2_orig)
    topo = f.maker.env.toposort()
    assert len(topo)==1
    assert isinstance(topo[0].op, Gemv)
    assert topo[0].op.inplace==False

    # test the inplace version
    f = theano.function([], [], updates={v2:v2+theano.dot(m,v1)},
                        mode=mode_blas_opt)

    # Assert they produce the same output
    f()
    assert numpy.allclose(v2.value, numpy.dot(m.value,v1.value)+v2_orig)
    topo = f.maker.env.toposort()
    assert len(topo)==1
    assert isinstance(topo[0].op, Gemv)
    assert topo[0].op.inplace==True
def test_gemv2():
    ''' test vector1+dot(vector2,matrix) '''
    v1 = theano.shared(numpy.array(numpy.random.rand(2), dtype='float32'))
    v2_orig = numpy.array(numpy.random.rand(2), dtype='float32')
    v2 = theano.shared(v2_orig)
    m = theano.shared(numpy.array(numpy.random.rand(2,2), dtype='float32'))

    f = theano.function([], v2+theano.dot(v1,m), mode=mode_blas_opt)

    # Assert they produce the same output
    assert numpy.allclose(f(), numpy.dot(v1.value,m.value)+v2.value)
    topo = f.maker.env.toposort()
    assert sum(isinstance(node.op, Gemv) for node in topo)==1
    assert topo[-1].op.inplace==False

    # test the inplace version
    f = theano.function([], [], updates={v2:v2+theano.dot(v1,m)},
                        mode=mode_blas_opt)

    # Assert they produce the same output
    f()
    assert numpy.allclose(v2.value, numpy.dot(v1.value, m.value)+v2_orig)
    topo = f.maker.env.toposort()
    assert sum(isinstance(node.op, Gemv) for node in topo)==1
    assert topo[0].op.inplace==True
@@ -254,5 +254,52 @@ class test_CAReduce(unittest.TestCase):

        #self.with_linker(gof.CLinker(), and_)

class test_Prod(unittest.TestCase):
    def setUp(self):
        unittest_tools.seed_rng()

    def test_prod_grad(self):
        x_val = numpy.asarray([[1,2,3],[4,5,6],[7,8,9]], dtype='float32')
        x = theano.tensor.dmatrix()
        p = Prod(axis=0)(x)

        # sanity check
        fn = theano.function([x], [p])
        assert numpy.allclose(fn(x_val), numpy.array([28., 80., 162.]))

        # very basic case for the product; no broadcasting in x
        g = theano.tensor.grad(p.sum(), x)
        g_fn = theano.function([x], g)
        assert numpy.allclose(g_fn(x_val),
                numpy.asarray([[28.,40.,54.],[7.,16.,27.],[4.,10.,18.]]))

        # now with some transposition of the input
        x_bc = x.dimshuffle(1, 0)
        p_bc = Prod(axis=0)(x_bc)
        p_bc_sum = p_bc.sum()
        g_bc = theano.tensor.grad(p_bc_sum, x)
        g_fn_bc = theano.function([x], [p_bc, g_bc])
        p_bc_ret, g_bc_ret = g_fn_bc(x_val)
        assert numpy.allclose(p_bc_ret, numpy.array([6., 120., 504.]))
        assert numpy.allclose(g_bc_ret,
                numpy.asarray([[6.,3.,2.],[30.,24.,20.],[72.,63.,56.]]))

    def test_verify_grad(self):
        x_val = numpy.asarray([[1,2,3],[4,5,6],[7,8,9]], dtype='float32')
        x = theano.tensor.dmatrix()

        # now with verify_grad
        unittest_tools.verify_grad(Prod(axis=0), [x_val])

        # second time, with some added complexity
        # (verify_grad takes the sum of the matrix anyway)
        def fn(x2):
            return theano.tensor.sqr(Prod(axis=0)(x2))
        unittest_tools.verify_grad(fn, [x_val])

if __name__ == '__main__':
    unittest.main()
    #suite = unittest.TestSuite([test_Prod('test_prod_grad')])
    #unittest.TextTestRunner().run(suite)