Commit a3282db2 authored by abalkin

Merge remote-tracking branch 'upstream/master' into take-op-c-code-clean

# Prevent git from showing duplicate names with commands like "git shortlog"
# See the manpage of git-shortlog for details.
# The syntax is:
# Name that should be used <email that should be used> Bad name <bad email>
#
# You can skip Bad name if it is the same as the one that should be used, and is unique.
#
# This file is up-to-date if the command git log --format="%aN <%aE>" | sort -u
# gives no duplicates.
<abergeron@gmail.com> <anakha@kami.(none)>
David Warde-Farley <wardefar@iro.umontreal.ca> David Warde-Farley <dwf@cs.toronto.edu>
David Warde-Farley <wardefar@iro.umontreal.ca> David Warde Farley <dwf@cs.toronto.edu>
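The comments above describe an up-to-date check. A small Python sketch of the same idea (a hypothetical helper, not part of git): group the unique `Name <email>` entries from `git log --format="%aN <%aE>"` by name, and flag names that still map to several emails as candidates for a .mailmap entry.

```python
def mailmap_candidates(log_lines):
    """Group unique 'Name <email>' entries by name; a name mapped to
    several emails is a candidate for a .mailmap entry."""
    by_name = {}
    for entry in sorted(set(log_lines)):
        name, _, email = entry.partition(' <')
        by_name.setdefault(name, []).append(email.rstrip('>'))
    return {n: emails for n, emails in by_name.items() if len(emails) > 1}

# Illustrative input, mimicking `git log --format="%aN <%aE>"` output.
log = [
    "David Warde-Farley <wardefar@iro.umontreal.ca>",
    "David Warde-Farley <dwf@cs.toronto.edu>",
    "abalkin <abergeron@gmail.com>",
]
dups = mailmap_candidates(log)
```

Note this only catches the same name used with different emails; different-name aliases (like "David Warde Farley") still need a manual .mailmap line.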
......
...@@ -4,6 +4,131 @@
Release Notes
=============
Theano in the development version since 0.6rc2
==============================================
up to merged PR gh-1220
Highlights:
* Speed-ups.
* Crash fixes.
* A few small interface changes.
* GPU memory leak fix.
* A few corner-case fixes with no user-visible impact.
* More Theano determinism.
* tensor.{dot,tensordot} more complete/faster/more GPU friendly.
  * tensor.tensordot now supports Rop/Lop
  * tensor.dot supports n-dimensional inputs, as in NumPy
* Support for more NumPy syntax:
  * Add theano.tensor.take()
  * Add a_tensor_variable.{sort,dot,std,argmin,argmax,argsort,clip,conj,conjugate,repeat,round,trace,real,imag,take}
Committers for this rc2 only:
Bug fix:
* Fix memory leak on the GPU in some corner cases with the Theano flag `allow_gc=False`. (Frederic B., reported by Jonas Gehring)
* Fix copy of random state between graphs. (Guillaume D.)
  http://deeplearning.net/software/theano/tutorial/examples.html#copying-random-state-between-theano-graphs
* Fix wrong dtype in sandbox.linalg.ExtractDiag with shape of 0. (Frederic B., reported by abalkin)
* Correctly support arrays with more than 2*10e32 elements in AdvancedSubtensor1. (Abalkin)
* Fix wrong broadcast dimensions of the output of the Repeat op. (Abalkin)
  We were using the input's broadcasting pattern in some cases when we should not have.
* Fix theano.sandbox.linalg.eigh grad, which did not always return the right dtype. (Frederic B., Olivier D.)
New Features:
* More Theano determinism (Ian G., Olivier D., Pascal L.)
  * Add and use a new class OrderedSet.
  * Modify theano.grad to be deterministic.
  * Warn when using a dict as the updates argument to theano.compile.function, since this makes the returned function non-deterministic.
  * The Updates class was not appropriate for representing updates because it is non-deterministic; it was replaced by the OrderedUpdates class.
* Implemented GpuContiguous.grad. (Ian G.)
* tensor.tensordot now supports Rop/Lop. (Jeremiah Lowin)
  This removes the TensorDot and TensorDotGrad classes; the Dot/Elemwise ops are used instead.
* tensor.dot supports n-dimensional inputs, as in NumPy. (Jeremiah Lowin)
  Works on the GPU too.
* The Theano flag `nvcc.flags` now accepts `-ftz=true`, `--prec-div=false` and `--prec-sqrt=false` as values. (Frederic B.)
  To enable all of them, use the Theano flag `nvcc.flags=--use_fast_math`.
* New op theano.sparse.ConstructSparseFromList. (Rami Al-Rfou', Vivek Kulkarni)
* Make Theano work with Anaconda on Windows. (Pascal L.)
* Add tensor_var.diagonal and theano.tensor.{diag,diagonal}. (abalkin)
* AdvancedSubtensor1 can now have a sparse gradient. (Rami Al-Rfou', Vivek Kulkarni)
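The determinism items above hinge on replacing plain dicts with ordered containers for updates. A minimal sketch of the idea in plain Python (no Theano; the update expressions are illustrative placeholders):

```python
from collections import OrderedDict

# An updates mapping whose iteration order is part of the contract:
# compiling the same graph twice must visit the updates in the same
# order, which a plain dict did not guarantee in the Python versions
# Theano supported at the time.
updates = OrderedDict()
updates['w'] = 'w - lr * gw'  # illustrative placeholder expression
updates['b'] = 'b - lr * gb'  # illustrative placeholder expression

# Iteration order is the insertion order, on every run.
assert list(updates) == ['w', 'b']
```

This is why theano.function warns on a plain dict and why Updates was replaced by OrderedUpdates.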
Interface Deprecation (a warning is printed):
* theano.misc.strutil.renderString -> render_string (Ian G.)
* A warning is printed when a dictionary is used in certain places, as this makes Theano non-deterministic.
Interface Change:
* Raise an error when theano.shared is called with a Theano variable. (Frederic B.)
* Don't print warnings about pre-0.5 Theano bugs by default. (Frederic B.)
* Theano functions now always have a `name` field, defaulting to None. (Frederic B.)
* A Theano function's fct.fgraph has a copy of the function's name field. (Ian G.)
  This is needed so the fgraph knows it.
* The grad method, when asked to raise an error if there is no path between the variables, did not always do so; it returned the mathematically correct answer, 0. (Ian G.)
* get_constant_value() was renamed get_scalar_constant_value() and raises a new exception, tensor.basic.NotScalarConstantError. (Ian G.)
* theano.function raises an error when trying to replace inputs with the given parameter. (Olivier D.)
  This was doing nothing; the error message tells the user what they probably want to do.
New Interface (reuse existing functionality):
* tensor_var.sort() as a shortcut for theano.tensor.sort. (Jeremiah Lowin)
  We were already doing this for argsort.
* Add theano.tensor.take() and a_tensor_var.take() to support NumPy syntax. (abalkin)
* Add a_tensor_variable.{dot,std,argmin,argmax,argsort,clip,conj,conjugate,repeat,round,trace,real,imag}. (abalkin)
New debug feature:
* DebugMode prints more info when there is an error. (Frederic B.)
* Better profiling of test time with `theano-nose --time-profile`. (Frederic B.)
* Detection of infinite loops in the global optimizer. (Pascal L.)
* DebugMode.check_preallocated_output now also works on Theano function outputs. (Pascal L.)
Speed-ups:
* c_code for SpecifyShape op. (Frederic B.)
* The cross-entropy optimization now works when specify_shape is used. (Pascal L.)
* The Scan optimizations ScanSaveMem and PushOutDot1 are applied more frequently. (Razvan P., reported by Abalkin)
  A skipped-optimization warning was printed.
* dot(vector, vector) is now faster with some BLAS implementations. (Eric Hunsberger)
  OpenBLAS and others did not call {s,d}dot internally when we called {s,d}gemv; MKL did.
* Compilation speed-up: take the compiledir lock only for ops that generate c_code. (Frederic B)
* More scan optimizations. (Razvan P.)
  * Optimizations to make RNNs fast in Theano.
  * Optimize some cases of dot by moving them outside of Scan.
  * Move some sequences outside of scan too.
  * Merge more scan inputs, mostly a byproduct of other Scan optimizations.
* c_code for theano.sparse.AddSD. (Rami Al-Rfou', Vivek Kulkarni)
Crash Fixes:
* Fix crash related to dimshuffle. (abalkin)
* Fix crash at compilation. (Olivier D.)
* Fix openmp detection. (Pascal L.)
  It resulted in a crash with EPD on Windows.
* Fix for the new BLAS interface in SciPy. (Olivier D.)
  This fixes a crash with some development versions of SciPy.
* GpuSum works with bigger shapes when summing over the first dimension of a 3d tensor. (Frederic B., reported by Chris Currivan)
* Windows compilation crash fix. (Frederic B.)
* Make CrossentropySoftmax1HotWithBiasDx and CrossentropySoftmaxArgmax1HotWithBias support uint* dtype. (Frederic B., reported by Mark Fenner)
* Fix GpuSoftmax and GpuSoftmaxWithBias crash on GTX285. (Frederic B.)
* Fix crash due to a race condition when importing theano. (Ian G.)
* Fix crash from path problem with `theano-nose --batch`. (Abalkin)
* Fix crash with tensor.roll(Var, iscalar). (Frederic B., reported by Jeremiah Lowin)
* Fix compilation crash with llvm on Mac. (Abalkin)
* Fix the grad of Scan, which wrongly reported that there is no connection between cost and parameters. (Razvan P.)
* The infer-shape mechanism now forces broadcasted dimensions to have a shape known to be equivalent to one during compilation.
  Sometimes we were not able to know this before run time, and it resulted in a crash. (Frederic B.)
* Fix compilation problems on GPU on Windows. (Frederic B.)
Theoretical bugfix (a bug that won't happen with current Theano code, but that could have affected you if you messed with the internals):
* GpuContiguous now checks the preallocated output's strides before using it. (Pascal L.)
Others:
* Fix race condition when determining if g++ is available. (Abalkin)
* Documentation improvements. (Many people including David W-F, abalkin, Amir Elaguizy, Olivier D., Frederic B.)
* The current GPU back-end has a new function CudaNdarray_prep_output(CudaNdarray ** arr, int nd, const int * dims). (Ian G)
=============
Release Notes
=============
Theano 0.6rc2 (November 21st, 2012)
===================================
......
...@@ -105,7 +105,7 @@ Brian Vandenberg emailed `installation instructions on Gentoo
<http://groups.google.com/d/msg/theano-dev/-8WCMn2FMR0/bJPasoZXaqoJ>`_,
focusing on how to install the appropriate dependencies.
Nicolas Pinto provides `ebuild scripts <https://github.com/npinto/sekyfsr-gentoo-overlay/tree/master/sci-libs/Theano>`_.
Alternative installation on Mandriva 2010.2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...@@ -657,9 +657,9 @@ Theano dependencies is easy, but be aware that it will take a long time
Homebrew
~~~~~~~~
There are some `instructions
<https://github.com/samueljohn/homebrew-python>`__ by Samuel John on how to install
Theano dependencies with Homebrew instead of MacPort.
.. _gpu_macos:
......
...@@ -39,7 +39,7 @@ probably do something similar on older computer.
Installation steps
~~~~~~~~~~~~~~~~~~
Ubuntu 11.10/12.04/12.10:
1) ``sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git``
2) ``sudo pip install Theano``
...@@ -70,7 +70,7 @@ Theano/BLAS speed test:
.. code-block:: bash

    python `python -c "import os, theano; print os.path.dirname(theano.__file__)"`/misc/check_blas.py

This will print a table with different versions of BLAS/numbers of
threads on multiple CPUs and GPUs. It will also print some Theano/NumPy
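The backtick substitution in the shell command above just locates the installed package directory from its `__file__` attribute. The same idea in plain Python, using the stdlib `logging` package as a stand-in since `theano` may not be installed:

```python
import os
import logging  # stand-in for theano; any importable package works

# Directory containing the package's files; the shell command above
# appends misc/check_blas.py to the equivalent path for theano.
pkg_dir = os.path.dirname(logging.__file__)
assert os.path.isdir(pkg_dir)
```

This avoids hard-coding a site-packages path like `/usr/lib/python2.*/site-packages`, which breaks when the package is installed elsewhere.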
...@@ -163,6 +163,8 @@ Test GPU configuration
Ubuntu 12.04 LTS: default gcc version 4.6.3. gcc 4.4.7 and 4.5.3 available.
Ubuntu 12.10: default gcc version 4.7.2. gcc 4.4.7, 4.5.4 and 4.6.3 available.
......
...@@ -1229,6 +1229,7 @@ Linear Algebra
If an integer i, it is converted to an array containing
the last i dimensions of the first tensor and the first
i dimensions of the second tensor:

    axes = [range(a.ndim - i, a.ndim), range(i)]

If an array, its two elements must contain compatible axes
...@@ -1251,6 +1252,8 @@ Linear Algebra
are compatible. The resulting tensor will have shape (2, 5, 6) -- the
dimensions that are not being summed:

.. code-block:: python

    a = np.random.random((2,3,4))
    b = np.random.random((5,6,4,3))
...@@ -1284,6 +1287,8 @@ Linear Algebra
In an extreme case, no axes may be specified. The resulting tensor
will have shape equal to the concatenation of the shapes of a and b:

.. code-block:: python

    c = np.tensordot(a, b, 0)
    print(a.shape)  # (2,3,4)
    print(b.shape)  # (5,6,4,3)
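The axis bookkeeping described above can be checked without NumPy. This hypothetical helper computes only the output shape implied by tensordot's `axes` argument, under the rules the docs state (summed axes must match; the output is the non-summed dimensions of `a` followed by those of `b`):

```python
def tensordot_shape(a_shape, b_shape, axes):
    # Resolve the axes argument as described in the docs above.
    if isinstance(axes, int):
        a_axes = list(range(len(a_shape) - axes, len(a_shape)))
        b_axes = list(range(axes))
    else:
        a_axes, b_axes = list(axes[0]), list(axes[1])
    for i, j in zip(a_axes, b_axes):
        assert a_shape[i] == b_shape[j], "summed-over axes must match"
    # Output shape: non-summed dimensions of a, then those of b.
    out = [d for i, d in enumerate(a_shape) if i not in a_axes]
    out += [d for j, d in enumerate(b_shape) if j not in b_axes]
    return tuple(out)

print(tensordot_shape((2, 3, 4), (5, 6, 4, 3), [[1, 2], [3, 2]]))  # (2, 5, 6)
print(tensordot_shape((2, 3, 4), (5, 6, 4, 3), 0))  # (2, 3, 4, 5, 6, 4, 3)
```

The first call reproduces the (2, 5, 6) example above; the second reproduces the no-axes "extreme case".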
......
...@@ -7,8 +7,11 @@
.. note::

    Two similar implementations exist for conv2d:
    :func:`signal.conv2d <theano.tensor.signal.conv.conv2d>` and
    :func:`nnet.conv2d <theano.tensor.nnet.conv.conv2d>`.
    The former implements a traditional
    2D convolution, while the latter implements the convolutional layers
    present in convolutional neural networks (where filters are 3D and pool
    over several input channels).
......
...@@ -74,11 +74,11 @@ cross-entropy (note that this assumes that x will contain values between 0 and
.. code-block:: python

    x, y, b = T.dvectors('x', 'y', 'b')
    W = T.dmatrix('W')
    h = T.nnet.sigmoid(T.dot(W, x) + b)
    x_recons = T.nnet.sigmoid(T.dot(V, h) + c)
    recon_cost = T.nnet.binary_crossentropy(x_recons, x).mean()
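For reference, the elementwise binary cross-entropy -(t*log(o) + (1 - t)*log(1 - o)) used above can be sketched in plain Python (a stand-in for illustration, not Theano's implementation):

```python
import math

def binary_crossentropy(output, target):
    # Elementwise -(t*log(o) + (1 - t)*log(1 - o)); outputs in (0, 1).
    return [-(t * math.log(o) + (1 - t) * math.log(1 - o))
            for o, t in zip(output, target)]

# Confident, correct predictions give a small cost.
costs = binary_crossentropy([0.9, 0.1], [1.0, 0.0])
mean_cost = sum(costs) / len(costs)  # mirrors the .mean() above
```

Both elements here cost exactly -log(0.9), since each prediction assigns probability 0.9 to the correct target.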
.. function:: categorical_crossentropy(coding_dist,true_dist)

...@@ -87,7 +87,7 @@ cross-entropy (note that this assumes that x will contain values between 0 and
    needed to identify an event from a set of possibilities, if a coding scheme is used based
    on a given probability distribution q, rather than the "true" distribution p. Mathematically, this
    function computes :math:`H(p,q) = - \sum_x p(x) \log(q(x))`, where
    p=true_dist and q=coding_dist.

    :Parameters:
...@@ -108,6 +108,6 @@ cross-entropy (note that this assumes that x will contain values between 0 and
.. code-block:: python

    y = T.nnet.softmax(T.dot(W, x) + b)
    cost = T.nnet.categorical_crossentropy(y, o)
    # o is either the above-mentioned 1-of-N vector or 2D tensor
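The formula :math:`H(p,q) = - \sum_x p(x) \log(q(x))` with p = true_dist and q = coding_dist can be sketched in plain Python for a single distribution pair (an illustration, not Theano's implementation):

```python
import math

def categorical_crossentropy(coding_dist, true_dist):
    # H(p, q) = -sum_x p(x) * log(q(x)), with p = true_dist, q = coding_dist.
    # Terms with p(x) == 0 contribute nothing, so skip them.
    return -sum(p * math.log(q)
                for q, p in zip(coding_dist, true_dist) if p > 0)

# With a 1-of-N (one-hot) true distribution, the cost reduces to the
# negative log-probability the coding distribution assigns to the
# correct class: here, -log(0.7).
cost = categorical_crossentropy([0.2, 0.7, 0.1], [0.0, 1.0, 0.0])
```

This also shows why the p/q roles matter: the log is taken of the model's coding distribution, weighted by the true one — the fix applied in this diff.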
...@@ -7,8 +7,11 @@
.. note::

    Two similar implementations exist for conv2d:
    :func:`signal.conv2d <theano.tensor.signal.conv.conv2d>` and
    :func:`nnet.conv2d <theano.tensor.nnet.conv.conv2d>`.
    The former implements a traditional
    2D convolution, while the latter implements the convolutional layers
    present in convolutional neural networks (where filters are 3D and pool
    over several input channels).
......
...@@ -284,13 +284,13 @@ Tips for Improving Performance on GPU
Check the line similar to *Spent Xs(X%) in cpu op, Xs(X%) in gpu op and Xs(X%) in transfer op*.
This can tell you if not enough of your graph is on the GPU or if there
is too much memory transfer.
* Use nvcc options. nvcc supports these options to speed up some
  computations: `-ftz=true` to `flush denormal values to
  zeros <https://developer.nvidia.com/content/cuda-pro-tip-flush-denormals-confidence>`_,
  and the `--prec-div=false` and `--prec-sqrt=false` options to speed up
  division and square root operations by being less precise. You can
  enable all of them with the `nvcc.flags=--use_fast_math` Theano
  flag, or you can enable them individually as in this example:
  `nvcc.flags=-ftz=true --prec-div=false`.

.. _gpu_async:
......
...@@ -5,7 +5,7 @@
"""
__docformat__ = "restructuredtext en"

import copy, sys, copy_reg, gc
from itertools import izip
from StringIO import StringIO
......
...@@ -3,13 +3,10 @@
__docformat__ = "restructuredtext en"

import logging
import sys
import traceback

_logger = logging.getLogger('theano.compile.function')

from io import In
from function_module import orig_function
from profiling import ProfileStats
from pfunc import pfunc

from numpy import any  # to work in python 2.4
import warnings
...@@ -164,8 +161,9 @@ def function(inputs, outputs=None, mode=None, updates=None, givens=None,
    if updates is None:
        updates = []

    if (isinstance(updates, dict) and
            not isinstance(updates, gof.python25.OrderedDict) and
            len(updates) > 1):
        warnings.warn(
            "The parameter 'updates' of theano.function()"
            " expects an OrderedDict,"
...@@ -186,8 +184,8 @@ def function(inputs, outputs=None, mode=None, updates=None, givens=None,
    # compute some features of the arguments:
    uses_In = any([isinstance(i, In) for i in inputs])  # N.B. the square brackets are necessary
    uses_tuple = any([isinstance(i, (list, tuple)) for i in inputs])  # N.B. the square brackets are necessary
    uses_updates = bool(updates)
    uses_givens = bool(givens)

    # See if we have any mutable / borrow inputs
    check_for_aliased_inputs = False
...@@ -201,7 +199,9 @@ def function(inputs, outputs=None, mode=None, updates=None, givens=None,
    if profile:
        raise NotImplementedError('profiling not supported in old-style function')
    if uses_updates or uses_givens:
        raise NotImplementedError(
            "In() instances and tuple inputs trigger the old "
            "semantics, which disallow using updates and givens")
    fn = orig_function(inputs, outputs,
                       mode=mode,
                       accept_inplace=accept_inplace, name=name)
......
...@@ -9,7 +9,7 @@ from theano import config
from theano.compile import orig_function, In, Out
from theano.compile import UnusedInputError
from theano.compile.sharedvalue import SharedVariable, shared
from theano.gof import Variable, Constant
from theano.gof.python25 import any
import logging
...@@ -233,8 +233,8 @@ def rebuild_collect_shared(outputs,
            cloned_outputs.append(Out(cloned_v, borrow=v.borrow))
        else:
            raise TypeError('Outputs must be theano Variable or '
                            'Out instances. Received ' + str(v)
                            + ' of type ' + str(type(v)))
        #computed_list.append(cloned_v)
    else:
        if isinstance(outputs, Variable):
...@@ -278,7 +278,8 @@ class Param(object):
    def __init__(self, variable, default=None, name=None, mutable=False,
                 strict=False, allow_downcast=None, implicit=None, borrow=None):
        """
        :param variable: A variable in an expression graph to use as a
            compiled-function parameter

        :param default: The default value to use at call-time (can also be a Container where
            the function will find a value at call-time.)
...@@ -290,10 +291,11 @@ class Param(object):
        :param borrow: Whether the function is allowed to alias some output to
            this input. Using None (default) means we re-use the same value as the
            `mutable` flag.
            False: do not permit any output to be aliased to the input

        :param strict: False -> function arguments may be copied or cast to match the
            type required by the parameter `variable`.
            True -> function arguments must exactly match the type
            required by `variable`.

        :param allow_downcast: Only applies if `strict` is False.
...@@ -452,6 +454,27 @@ def pfunc(params, outputs=None, mode=None, updates=None, givens=None,
                "provided for it being ignored. Please do not duplicate "
                "variables in the inputs list." % (v, i, dup_v_i)))
    # Check that we are not using `givens` to replace input variables, because
    # this typically does nothing, contrary to what one may expect.
    in_var_set = set(in_variables)
    try:
        givens_pairs = givens.items()
    except AttributeError:
        givens_pairs = givens
    for x, y in givens_pairs:
        if x in in_var_set:
            raise RuntimeError(
                'You are trying to replace variable \'%s\' through the '
                '`givens` parameter, but this variable is an input to your '
                'function. Replacing inputs is currently forbidden because it '
                'has no effect. One way to modify an input `x` to a function '
                'evaluating f(x) is to define a new input `y` and use '
                '`theano.function([y], f(x), givens={x: g(y)})`. Another '
                'solution consists in using `theano.clone`, e.g. like this: '
                '`theano.function([x], '
                'theano.clone(f(x), replace={x: g(x)}))`.'
                % x)
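The try/except on `.items()` above is simply how the code accepts either a dict or a list of pairs for `givens`. Sketched standalone (the helper name is hypothetical):

```python
def as_replacement_pairs(givens):
    # Accept a mapping or an iterable of (variable, replacement) pairs,
    # mirroring the try/except normalization in the diff above.
    try:
        return list(givens.items())  # dict-like input
    except AttributeError:
        return list(givens)          # already a sequence of pairs

print(as_replacement_pairs({'x': 'y'}))   # [('x', 'y')]
print(as_replacement_pairs([('x', 'y')])) # [('x', 'y')]
```

Duck-typing on `.items()` keeps the check working for OrderedDict and plain pair lists alike, without an isinstance ladder.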
    output_vars = rebuild_collect_shared(outputs,
                                         in_variables,
                                         replace=givens,
......
...@@ -386,6 +386,14 @@ class T_function(unittest.TestCase):
        self.assertRaises(UnusedInputError, function, [m, mt], mt*2)
        f = function([m, mt], mt*2, on_unused_input='ignore')
    def test_givens_input_var(self):
        """
        Ensure an error is raised when trying to replace an input variable.
        """
        x = T.scalar('x')
        y = x * 2
        self.assertRaises(RuntimeError, function, [x], y, givens={x: x + 1})
class T_picklefunction(unittest.TestCase):
...@@ -680,6 +688,18 @@ class SomethingToPickle(object):
        self.f2 = function([x, In(a, value=1.0, name='a'), In(s, value=self.f1.container[s], update=s+a*x, mutable=True)], s+a*x)
def test_empty_givens_updates():
    """
    Regression test for bug fixed in 8625e03.
    """
    # Empty givens / updates dictionaries were not properly detected before,
    # triggering useless crashes at compile time.
    x = T.scalar()
    y = x * 2
    function([theano.In(x)], y, givens={})
    function([theano.In(x)], y, updates={})
if __name__ == '__main__':
    if 1:
......
...@@ -420,6 +420,11 @@ else:
                 " want theano to use.")
    default_openmp = count > 1
# Disable it by default for now, as currently only ConvOp supports it,
# and it causes a slowdown by default since we do not disable it for
# too-small convolutions.
default_openmp = False
AddConfigVar('openmp',
             "Allow (or not) parallel computation on the CPU with OpenMP. "
             "This is the default value used when creating an Op that "
......
import cPickle, logging

_logger = logging.getLogger("theano.gof.callcache")
......
...@@ -892,8 +892,8 @@ class ModuleCache(object):
            key_data = None
            # We have never seen this key before.
            # We acquire the lock later only if we were able to
            # generate C code. Otherwise, we would take the lock for ops
            # that have only a perform().
            lock_taken = False

            # This try/finally block ensures that the lock is released once we
...@@ -920,11 +920,14 @@ class ModuleCache(object):
                src_code = compile_steps.next()
                module_hash = get_module_hash(src_code, key)

                # The op has c_code, so take the lock.
                compilelock.get_lock()
                lock_taken = True
                if not os.path.exists(location):
                    # Temporary fix; we should make sure it does not
                    # get deleted by the clear*() functions.
                    os.makedirs(location)
                if module_hash in self.module_hash_to_key_data:
                    _logger.debug("Duplicated module! Will re-use the "
...@@ -1469,7 +1472,7 @@ class GCC_compiler(object):
            #cxxflags.append("-D NPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION")
            numpy_ver = [int(n) for n in numpy.__version__.split('.')[:2]]

            # numpy 1.7 deprecated the following macro, but the new one
            # did not exist in the past.
            if bool(numpy_ver < [1, 7]):
                cxxflags.append("-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY")
...@@ -1483,7 +1486,7 @@ class GCC_compiler(object):
    @staticmethod
    def compile_str(module_name, src_code, location=None,
                    include_dirs=None, lib_dirs=None, libs=None,
                    preargs=None, py_module=True):
        """
        :param module_name: string (this has been embedded in the src_code
...@@ -1503,7 +1506,11 @@ class GCC_compiler(object):
        :param preargs: a list of extra compiler arguments

        :param py_module: if False, compile to a shared library, but do not
            import it as a Python module.

        :returns: dynamically-imported python module of the compiled code,
            unless py_module is False, in which case None is returned.
        """
        #TODO: Do not do the dlimport in this function
...@@ -1628,6 +1635,7 @@ class GCC_compiler(object):
            # Print errors just below the command line.
            print compile_stderr

        if py_module:
            # touch the __init__ file
            file(os.path.join(location, "__init__.py"), 'w').close()
            return dlimport(lib_filename)
......
...@@ -42,7 +42,7 @@ compiledir_format_dict = {"platform": platform.platform(),
                          "numpy_version": numpy.__version__,
                          "gxx_version": gcc_version_str.replace(" ", "_"),
                          }
compiledir_format_keys = ", ".join(sorted(compiledir_format_dict.keys()))
default_compiledir_format = \
    "compiledir_%(platform)s-%(processor)s-%(python_version)s"
......
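Sorting the keys ties in with the "More Theano determinism" theme of this release: the advertised key list becomes independent of dict insertion order (and of hash randomization across runs). A quick illustration:

```python
# Two dicts with the same keys inserted in different orders
d1 = {"platform": 1, "processor": 2, "python_version": 3}
d2 = {"python_version": 3, "platform": 1, "processor": 2}

# sorted() yields one canonical ordering, independent of insertion order
assert ", ".join(sorted(d1.keys())) == ", ".join(sorted(d2.keys()))
assert ", ".join(sorted(d1.keys())) == "platform, processor, python_version"
```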
@@ -2,7 +2,6 @@
 # same compilation directory (which can cause crashes).
 from theano import config
-import compiledir
 import os, random, time, atexit
 import socket  # only used for gethostname()
 import logging
...
@@ -2,12 +2,6 @@
 Classes and functions for validating graphs that contain view
 and inplace operations.
 """
-import sys
-if sys.version_info[:2] >= (2,5):
-    from collections import defaultdict
-# otherwise it's implemented in python25.py
 import theano
 import toolbox
 import graph
...
...@@ -12,7 +12,6 @@ from python25 import all ...@@ -12,7 +12,6 @@ from python25 import all
from theano import config from theano import config
import warnings import warnings
NullType = None NullType = None
import theano
from python25 import OrderedDict from python25 import OrderedDict
from theano.misc.ordered_set import OrderedSet from theano.misc.ordered_set import OrderedSet
......
@@ -764,6 +764,7 @@ class OpenMPOp(Op):
         self.openmp = openmp

     def c_compile_args(self):
+        self.update_self_openmp()
         if self.openmp:
             return ['-fopenmp']
         return []
@@ -808,7 +809,10 @@ class OpenMPOp(Op):
                 return False
         return default_openmp

-    def make_thunk(self, node, storage_map, compute_map, no_recycling):
+    def update_self_openmp(self):
+        """
+        Make sure self.openmp is not True if there is no support in gxx.
+        """
         if self.openmp:
             if OpenMPOp.gxx_support_openmp is None:
                 OpenMPOp.gxx_support_openmp = OpenMPOp.test_gxx_support()
@@ -819,9 +823,13 @@ class OpenMPOp(Op):
                     " know this happen with some version of the EPD mingw"
                     " compiler. We disable openmp everywhere in Theano."
                     " To remove this warning set the theano flags `openmp`"
-                    " to False.")
+                    " to False.",
+                    stacklevel=3)
             if OpenMPOp.gxx_support_openmp is False:
                 self.openmp = False
                 theano.config.openmp = False

+    def make_thunk(self, node, storage_map, compute_map, no_recycling):
+        self.update_self_openmp()
         return super(OpenMPOp, self).make_thunk(node, storage_map,
                                                 compute_map, no_recycling)
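The refactoring above factors the "probe gxx once, cache the answer at class level, then downgrade `self.openmp`" logic into `update_self_openmp` so both `c_compile_args` and `make_thunk` can run it. A self-contained sketch of that memoized capability check, with a stand-in probe (`test_gxx_support` here always reports no support, which the real op determines by compiling a tiny OpenMP program):

```python
import warnings

class OpenMPOpSketch(object):
    # Class-level cache: None = not yet probed, True/False afterwards.
    # Stand-in for OpenMPOp.gxx_support_openmp in the diff above.
    gxx_support_openmp = None

    def __init__(self, openmp=True):
        self.openmp = openmp

    @staticmethod
    def test_gxx_support():
        # Hypothetical probe result; the real op compiles a test program.
        return False

    def update_self_openmp(self):
        """Make sure self.openmp is not True if there is no gxx support."""
        if self.openmp:
            cls = OpenMPOpSketch
            if cls.gxx_support_openmp is None:
                # Probe only once per process, then reuse the cached answer
                cls.gxx_support_openmp = cls.test_gxx_support()
                if cls.gxx_support_openmp is False:
                    warnings.warn("Your gxx does not support OpenMP; "
                                  "disabling it everywhere.", stacklevel=3)
            if cls.gxx_support_openmp is False:
                self.openmp = False

    def c_compile_args(self):
        # Run the check before emitting flags, as in the diff
        self.update_self_openmp()
        return ['-fopenmp'] if self.openmp else []

op = OpenMPOpSketch(openmp=True)
assert op.c_compile_args() == []                    # probe said "no support"
assert OpenMPOpSketch.gxx_support_openmp is False   # cached for later ops
```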
...@@ -2,11 +2,9 @@ ...@@ -2,11 +2,9 @@
__docformat__ = "restructuredtext en" __docformat__ = "restructuredtext en"
import copy
import utils import utils
from utils import MethodNotDefined, object2 from utils import MethodNotDefined, object2
import graph import graph
from theano import config
######## ########
# Type # # Type #
......
@@ -3,7 +3,7 @@
 # import variable
 from theano import config
-import re, os, traceback
+import re, traceback

 def add_tag_trace(thing):
     """Add tag.trace to a node or variable.
...
@@ -21,10 +21,8 @@ from itertools import izip

 from theano import gof
 from theano.gof import Variable
 from theano.gof.python25 import OrderedDict
-from theano.gof.python25 import all
-import theano.gof.utils
 from theano.gof.null_type import NullType
-from theano.printing import min_informative_str

 # we can't do "import theano.tensor"
 # tensor depends on theano.compile
 # theano.compile depends on theano.gradient (this file)
...
@@ -194,41 +194,28 @@ if __name__ == "__main__":
             goto2 1.13/16   3.16s

         Test time in float32
-        (cuda version 3.2RC and up have a faster gemm on the Fermi/GTX[45]??)
-        gpu/cuda version
-        M2050(Amazon)/5.0  0.25s
-        GTX680/4.2         0.154s
-        GTX580/4.2         0.164s
-        GTX480/4.2         0.192s
-        GTX470/4.2         0.238s
-        C2075/4.2          0.25s
-        GTX285/4.2         0.452s  #cuda 3.0 seam faster? driver version?
-        GT520/4.2          2.68s
-        GTX560/4.2         0.30s
-        GTX460/4.0         0.45s
-        GTX580/3.2         0.203s
-        GTX680/3.2         0.218s
-        GTX480/3.2         0.237s
-        GTX470/3.2         0.297s
-        GTX285/3.2         0.452s  #cuda 3.0 seam faster? driver version?
-        GTX480/3.0         0.27s
-        M2070/4.1          0.27s
-        GTX470/3.2         0.29s
-        M2070/3.2          0.32s
-        GTX470/3.0         0.34s
-        GTX285/3.0         0.40s
-        C1060/3.2          0.46s
-        GTX550Ti/4.0       0.57s
-        520/3.2            3.06s
-        520M/3.2           3.19s  with bumblebee on Ubuntu 12.04
-        GT220/3.2RC        3.80s
-        GT210/4.0          6.35s
-        8500GT/3.0         10.68s
+        cuda version   5.0     4.2     4.1     4.0     3.2     3.0    # note
+        gpu
+        M2070                  0.25s   0.27s           0.32s
+        M2050(Amazon)  0.25s
+        C2075                  0.25s
+        C1060                                          0.46s
+        GTX680                 0.154s                  0.218s
+        GTX580                 0.164s                  0.203s
+        GTX480                 0.192s                  0.237s  0.27s
+        GTX470                 0.238s                  0.297s  0.34s
+        GTX660                 0.24s
+        GTX560                 0.30s
+        GTX460                 0.37s           0.45s
+        GTX285                 0.452s          0.452s          0.40s  # cuda 3.0 seem faster? driver version?
+        GTX550Ti                               0.57s
+        GT520                  2.68s                   3.06s
+        520M                                           3.19s  # with bumblebee on Ubuntu 12.04
+        GT220                                          3.80s
+        GT210                          6.35s
+        8500GT                                                 10.68s
         """
     t, impl = execute(not options.print_only, not options.quiet,
...
-def renderString(string, dict):
+import warnings
+
+
+def render_string(string, sub):
+    """
+    string: a string, containing formatting instructions
+    sub: a dictionary containing keys and values to substitute for
+         them.
+
+    returns: string % sub
+
+    The only difference between this function and the % operator
+    is that it raises an exception with a more informative error
+    message than the % operator does.
+    """
     try:
-        finalCode = string % dict
+        finalCode = string % sub
     except Exception , E:
-        #print 'could not render C code due to exception with message "'+str(E)+'", trying to find out why...'
+        # If unable to render the string, render longer and longer
+        # initial substrings until we find the minimal initial substring
+        # that causes an error
         i = 0
         while i <= len(string):
             try:
-                finalCode = string[0:i] % dict
+                finalCode = string[0:i] % sub
             except Exception, F:
                 if str(F) == str(E):
                     raise Exception(string[0:i]+"<<<< caused exception "+str(F))
             i+=1
         assert False
     return finalCode
-#

+
+def renderString(string, dict):
+    warnings.warn("renderString is deprecated. It is now called render_string",
+                  stacklevel=2)
+    return render_string(string, dict)
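The minimal-prefix search that `render_string` performs can be sketched standalone (a simplified re-implementation, not the Theano function itself): on a formatting failure it re-renders ever longer prefixes of the template until it reproduces the same error, so the offending `%` directive is pinpointed in the exception message.

```python
def render_string_sketch(template, sub):
    """Like `template % sub`, but on failure point at the directive
    that caused the error by finding the shortest failing prefix."""
    try:
        return template % sub
    except Exception as e:
        for i in range(len(template) + 1):
            try:
                template[0:i] % sub
            except Exception as f:
                # The first prefix reproducing the original error ends at
                # the offending directive
                if str(f) == str(e):
                    raise Exception(template[0:i]
                                    + "<<<< caused exception " + str(f))
        raise AssertionError("could not reproduce the formatting error")

assert render_string_sketch("%(a)s + %(b)s", {"a": 1, "b": 2}) == "1 + 2"
try:
    render_string_sketch("%(a)s + %(missing)s", {"a": 1})
except Exception as e:
    assert "<<<<" in str(e)  # error message points at the bad directive
```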
 def pretty_format(string):
     lines = string.split('\n')
@@ -34,11 +53,8 @@ def pretty_format(string):
     rval = '\n'.join(lines)
     return rval
-#

 def strip_leading_white_space(line):
     while len(line) >0 and (line[0]==' ' or line[0]=='\t'):
         line = line[1:]
-#
     return line
-#
...@@ -13,5 +13,9 @@ def call_subprocess_Popen(command, **params): ...@@ -13,5 +13,9 @@ def call_subprocess_Popen(command, **params):
startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
except AttributeError: except AttributeError:
startupinfo.dwFlags |= subprocess._subprocess.STARTF_USESHOWWINDOW startupinfo.dwFlags |= subprocess._subprocess.STARTF_USESHOWWINDOW
# Under Windows 7 64-bits, Anaconda's g++ is not found unless
# specifying "shell=True".
params['shell'] = True
proc = subprocess.Popen(command, startupinfo=startupinfo, **params) proc = subprocess.Popen(command, startupinfo=startupinfo, **params)
return proc return proc
@@ -220,7 +220,7 @@ if(!work_complete){
 }}}}}}} //extra scope so error handler jumps don't cross declarations
 ///////////// < /code generated by GpuConv3D >
 """
-        return strutil.renderString(codeSource,locals())
+        return strutil.render_string(codeSource,locals())

     def c_support_code_apply(self, node, nodename):
         # This code is not sensitive to the ignore_border flag.
@@ -279,7 +279,7 @@ conv_rows_stack( float* img, float* kern, float* bias, float* out,
 """
-        return codeSource#renderString(codeSource,locals())
+        return codeSource

 gpu_convd = GpuConv3D()
...
@@ -336,7 +336,7 @@ convgrad_rows_stack( float* img, float* dCdH, float* dCdW,
           dCdW[j,z,k,l,m] += dCdH[i,j,p,q,r] * V[i,z,dr*p+k,dc*q+l,dt*r+m]
 */
 """
-        return codeSource#renderString(codeSource,locals())
+        return codeSource

 gpu_conv_grad3d = GpuConvGrad3D()
...
@@ -263,7 +263,7 @@ if(!work_complete){
 }}}}}} // for fail
 ///////////// < /code generated by GpuConvTransp3D >
 """
-        return strutil.renderString(codeSource,locals())
+        return strutil.render_string(codeSource,locals())

     def c_support_code_apply(self, node, nodename):
         # This code is not sensitive to the ignore_border flag.
...
@@ -218,7 +218,7 @@ if cuda_available:
         atexit.register(gpu_shutdown)
 except EnvironmentError, e:
     cuda_available = False
-    cuda_initialization_error_message = e.message
+    cuda_initialization_error_message = " ".join(e.args)

 class GpuOp(theano.gof.Op):
...
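Switching from `e.message` to `" ".join(e.args)` is future-proofing: `message` was deprecated in Python 2.6 and removed in Python 3, while `args` always holds everything passed to the exception constructor. A quick illustration (assuming string arguments, as in the CUDA initialization error above):

```python
e = EnvironmentError("CUDA is installed, but device is unavailable")
# e.message existed in Python 2 only (deprecated since 2.6); e.args is
# portable and holds everything passed to the constructor.
assert " ".join(e.args) == "CUDA is installed, but device is unavailable"

# Joining also covers exceptions raised with several string arguments
e2 = EnvironmentError("nvcc not found", "check your PATH")
assert " ".join(e2.args) == "nvcc not found check your PATH"
```

Note that `" ".join` assumes every element of `args` is a string; an `EnvironmentError` built with an integer errno would need `str()` applied first.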
...@@ -13,15 +13,20 @@ scal = scalar # somewhere scalar gets reassigned to be a function ...@@ -13,15 +13,20 @@ scal = scalar # somewhere scalar gets reassigned to be a function
from theano.gof.python25 import all, any from theano.gof.python25 import all, any
from theano.sandbox.cuda import GpuOp, device_properties try:
# We must be able to import this file to create the full doc when nvcc
# is not available
from theano.sandbox.cuda import filter as type_support_filter
from theano.sandbox.cuda import device_properties
import cuda_ndarray
except ImportError:
pass
from theano.sandbox.cuda import GpuOp
from theano.sandbox.cuda.type import CudaNdarrayType from theano.sandbox.cuda.type import CudaNdarrayType
from theano.sandbox.cuda import filter as type_support_filter
from theano.sandbox.cuda.elemwise import NaiveAlgo from theano.sandbox.cuda.elemwise import NaiveAlgo
import cuda_ndarray
_logger_name = 'theano.sandbox.cuda.basic_ops' _logger_name = 'theano.sandbox.cuda.basic_ops'
_logger = logging.getLogger(_logger_name) _logger = logging.getLogger(_logger_name)
_logger.setLevel(logging.INFO) _logger.setLevel(logging.INFO)
@@ -2267,9 +2272,17 @@ class GpuSubtensor(GpuOp, tensor.Subtensor):
                               set_dim='CudaNdarray_set_dim',
                               set_stride='CudaNdarray_set_stride',
                               update_flags="", strides_mul=4)

+        finish_view = ""
+        # For broadcasted dimensions, set the stride to 0. We can't do this
+        # only for dimensions flagged as broadcastable, as it can also
+        # happen for dimensions of size 0 that are rebroadcasted later.
+        for idx in range(node.outputs[0].ndim):
+            finish_view += """
+            if(CudaNdarray_HOST_DIMS(xview)[%(idx)s]==1)
+                CudaNdarray_set_stride(xview, %(idx)s, 0);
+            """ % locals()
-        finish_view = """
+        finish_view += """
         //Set the base only now
         if(CudaNdarray_set_device_data(xview, CudaNdarray_DEV_DATA(xview),
@@ -2287,6 +2300,13 @@ class GpuSubtensor(GpuOp, tensor.Subtensor):
         return build_view + "{" + get_xview + "}" + finish_view

+    def c_code_cache_version(self):
+        hv = self.helper_c_code_cache_version()
+        # If `helper_c_code_cache_version` is not versioned we do not want
+        # to have a versioned version of this op's C code.
+        if len(hv) == 0:
+            return ()
+        return (3, hv)

 class GpuAdvancedSubtensor1(tensor.AdvancedSubtensor1, GpuOp):
     """
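The zero-stride trick used above is the standard way broadcasting is implemented over strided storage: with a stride of 0, every index along that dimension maps to the same element, so a size-1 dimension can be read as if it had any size. A tiny pure-Python strided view (a sketch, not `CudaNdarray`) makes this concrete:

```python
class StridedView(object):
    """Minimal strided view over a flat buffer (illustration only)."""
    def __init__(self, data, dims, strides):
        self.data, self.dims, self.strides = data, dims, strides

    def __getitem__(self, idx):
        # Element address = sum of index*stride over all dimensions
        offset = sum(i * s for i, s in zip(idx, self.strides))
        return self.data[offset]

buf = [10.0, 20.0, 30.0]
# View the 3-element buffer as a (4, 3) array by giving the first
# dimension a stride of 0: every row index maps to the same data,
# i.e. the single row is broadcasted 4 times without copying.
v = StridedView(buf, dims=(4, 3), strides=(0, 1))
assert [v[(r, 1)] for r in range(4)] == [20.0] * 4
assert v[(3, 2)] == 30.0
```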
class GpuAdvancedSubtensor1(tensor.AdvancedSubtensor1, GpuOp): class GpuAdvancedSubtensor1(tensor.AdvancedSubtensor1, GpuOp):
""" """
...@@ -2455,7 +2475,7 @@ class GpuIncSubtensor(tensor.IncSubtensor, GpuOp): ...@@ -2455,7 +2475,7 @@ class GpuIncSubtensor(tensor.IncSubtensor, GpuOp):
:return: C code expression to make a copy of x :return: C code expression to make a copy of x
Base class uses PyArrayObject *, subclasses may override for Base class uses `PyArrayObject *`, subclasses may override for
different types of arrays. different types of arrays.
""" """
return """(CudaNdarray*) CudaNdarray_Copy(%(x)s)""" % locals() return """(CudaNdarray*) CudaNdarray_Copy(%(x)s)""" % locals()
......
...@@ -53,7 +53,13 @@ struct table_struct{ ...@@ -53,7 +53,13 @@ struct table_struct{
}; };
table_struct _alloc_size_table[TABLE_SIZE]; table_struct _alloc_size_table[TABLE_SIZE];
#endif #endif
void * device_malloc(size_t size) void * device_malloc(size_t size)
{
return device_malloc(size, VERBOSE_DEVICE_MALLOC);
}
void * device_malloc(size_t size, int verbose)
{ {
void * rval=NULL; void * rval=NULL;
cudaError_t err = cudaMalloc(&rval, size); cudaError_t err = cudaMalloc(&rval, size);
...@@ -64,11 +70,14 @@ void * device_malloc(size_t size) ...@@ -64,11 +70,14 @@ void * device_malloc(size_t size)
// it returns something else I still don't see why we should ignore // it returns something else I still don't see why we should ignore
// it. All we want to do here is reset the flag. // it. All we want to do here is reset the flag.
cudaGetLastError(); cudaGetLastError();
if (verbose)
{
#if COMPUTE_GPU_MEM_USED #if COMPUTE_GPU_MEM_USED
fprintf(stderr, "Error allocating %li bytes of device memory (%s). new total bytes allocated: %d\n", (long)size, cudaGetErrorString(err),_allocated_size); fprintf(stderr, "Error allocating %li bytes of device memory (%s). new total bytes allocated: %d\n", (long)size, cudaGetErrorString(err),_allocated_size);
#else #else
fprintf(stderr, "Error allocating %li bytes of device memory (%s).\n", (long)size, cudaGetErrorString(err)); fprintf(stderr, "Error allocating %li bytes of device memory (%s).\n", (long)size, cudaGetErrorString(err));
#endif #endif
}
PyErr_Format(PyExc_MemoryError, PyErr_Format(PyExc_MemoryError,
"Error allocating %li bytes of device memory (%s).", (long)size, cudaGetErrorString(err)); "Error allocating %li bytes of device memory (%s).", (long)size, cudaGetErrorString(err));
return NULL; return NULL;
......
...@@ -42,6 +42,9 @@ typedef float real; ...@@ -42,6 +42,9 @@ typedef float real;
#define SHARED_SIZE (16*1024) #define SHARED_SIZE (16*1024)
#endif #endif
#define VERBOSE_DEVICE_MALLOC 1
#define NO_VERBOSE_DEVICE_MALLOC 0
/** /**
* Allocation and freeing of device memory should go through these functions so that the lib can track memory usage. * Allocation and freeing of device memory should go through these functions so that the lib can track memory usage.
* *
...@@ -49,6 +52,7 @@ typedef float real; ...@@ -49,6 +52,7 @@ typedef float real;
* device_free will return nonzero on failure (after setting the python error message) * device_free will return nonzero on failure (after setting the python error message)
*/ */
DllExport void * device_malloc(size_t size); DllExport void * device_malloc(size_t size);
DllExport void * device_malloc(size_t size, int verbose);
DllExport int device_free(void * ptr); DllExport int device_free(void * ptr);
template <typename T> template <typename T>
...@@ -162,7 +166,8 @@ CudaNdarray_set_dim(CudaNdarray * self, int idx, int d) ...@@ -162,7 +166,8 @@ CudaNdarray_set_dim(CudaNdarray * self, int idx, int d)
{ {
if ((idx >= self->nd) || (idx < 0) || (d < 0)) if ((idx >= self->nd) || (idx < 0) || (d < 0))
{ {
fprintf(stderr, "WARNING: probably bad CudaNdarray_set_dim arguments: %i %i\n", idx, d); fprintf(stderr, "WARNING: probably bad CudaNdarray_set_dim arguments: self->ndim=%i, idx=%i stride=%i\n",
self->nd, idx, d);
} }
if (d != self->host_structure[idx]) if (d != self->host_structure[idx])
......
 # This is work in progress
+import theano
 from theano import Op, Apply
+import theano.tensor as T
 from theano.gof import local_optimizer
 from theano.sandbox.cuda import cuda_available, GpuOp
...
...@@ -164,7 +164,7 @@ class NVCC_compiler(object): ...@@ -164,7 +164,7 @@ class NVCC_compiler(object):
def compile_str( def compile_str(
module_name, src_code, module_name, src_code,
location=None, include_dirs=[], lib_dirs=[], libs=[], preargs=[], location=None, include_dirs=[], lib_dirs=[], libs=[], preargs=[],
rpaths=rpath_defaults): rpaths=rpath_defaults, py_module=True):
""":param module_name: string (this has been embedded in the src_code """:param module_name: string (this has been embedded in the src_code
:param src_code: a complete c or c++ source listing for the module :param src_code: a complete c or c++ source listing for the module
:param location: a pre-existing filesystem directory where the :param location: a pre-existing filesystem directory where the
...@@ -178,8 +178,11 @@ class NVCC_compiler(object): ...@@ -178,8 +178,11 @@ class NVCC_compiler(object):
:param preargs: a list of extra compiler arguments :param preargs: a list of extra compiler arguments
:param rpaths: list of rpaths to use with Xlinker. :param rpaths: list of rpaths to use with Xlinker.
Defaults to `rpath_defaults`. Defaults to `rpath_defaults`.
:param py_module: if False, compile to a shared library, but
do not import as a Python module.
:returns: dynamically-imported python module of the compiled code. :returns: dynamically-imported python module of the compiled code.
(unless py_module is False, in that case returns None.)
:note 1: On Windows 7 with nvcc 3.1 we need to compile in the :note 1: On Windows 7 with nvcc 3.1 we need to compile in the
real directory Otherwise nvcc never finish. real directory Otherwise nvcc never finish.
...@@ -393,6 +396,7 @@ class NVCC_compiler(object): ...@@ -393,6 +396,7 @@ class NVCC_compiler(object):
# this doesn't happen to my knowledge # this doesn't happen to my knowledge
print >> sys.stderr, "DEBUG: nvcc STDOUT", nvcc_stdout print >> sys.stderr, "DEBUG: nvcc STDOUT", nvcc_stdout
if py_module:
#touch the __init__ file #touch the __init__ file
file(os.path.join(location, "__init__.py"), 'w').close() file(os.path.join(location, "__init__.py"), 'w').close()
return dlimport(lib_filename) return dlimport(lib_filename)
......
...@@ -288,7 +288,9 @@ class CudaNdarrayType(Type): ...@@ -288,7 +288,9 @@ class CudaNdarrayType(Type):
//std::cerr << "c_extract " << %(name)s << '\\n'; //std::cerr << "c_extract " << %(name)s << '\\n';
if (%(name)s->nd != %(nd)s) if (%(name)s->nd != %(nd)s)
{ {
PyErr_Format(PyExc_RuntimeError, "Some CudaNdarray has rank %%i, it was supposed to have rank %(nd)s", %(name)s->nd); PyErr_Format(PyExc_RuntimeError,
"c_extract: Some CudaNdarray has rank %%i, it was supposed to have rank %(nd)s",
%(name)s->nd);
%(name)s = NULL; %(name)s = NULL;
%(fail)s; %(fail)s;
} }
...@@ -299,7 +301,9 @@ class CudaNdarrayType(Type): ...@@ -299,7 +301,9 @@ class CudaNdarrayType(Type):
print >> sio, """ print >> sio, """
if (CudaNdarray_HOST_DIMS(%(name)s)[%(i)s] != 1) if (CudaNdarray_HOST_DIMS(%(name)s)[%(i)s] != 1)
{ {
PyErr_Format(PyExc_RuntimeError, "Some CudaNdarray has dim %%i on broadcastable dimension %%i", CudaNdarray_HOST_DIMS(%(name)s)[%(i)s], %(i)s); PyErr_Format(PyExc_RuntimeError,
"c_extract: Some CudaNdarray has dim %%i on broadcastable dimension %%i",
CudaNdarray_HOST_DIMS(%(name)s)[%(i)s], %(i)s);
%(name)s = NULL; %(name)s = NULL;
%(fail)s; %(fail)s;
} }
...@@ -309,7 +313,9 @@ class CudaNdarrayType(Type): ...@@ -309,7 +313,9 @@ class CudaNdarrayType(Type):
if (CudaNdarray_HOST_STRIDES(%(name)s)[%(i)s]) if (CudaNdarray_HOST_STRIDES(%(name)s)[%(i)s])
{ {
//std::cerr << "c_extract bad stride detected...\\n"; //std::cerr << "c_extract bad stride detected...\\n";
PyErr_Format(PyExc_RuntimeError, "Some CudaNdarray has a nonzero stride %%i on a broadcastable dimension %%i", CudaNdarray_HOST_STRIDES(%(name)s)[%(i)s], %(i)s); PyErr_Format(PyExc_RuntimeError,
"c_extract: Some CudaNdarray has a nonzero stride %%i on a broadcastable dimension %%i",
CudaNdarray_HOST_STRIDES(%(name)s)[%(i)s], %(i)s);
%(name)s = NULL; %(name)s = NULL;
%(fail)s; %(fail)s;
} }
......
+import numpy
 import theano
 from theano.gof import Op, Apply
 from theano import tensor
...
...@@ -12,7 +12,7 @@ from theano.tensor.opt import (register_stabilize, ...@@ -12,7 +12,7 @@ from theano.tensor.opt import (register_stabilize,
register_specialize, register_canonicalize) register_specialize, register_canonicalize)
from theano.gof import local_optimizer from theano.gof import local_optimizer
from theano.gof.opt import Optimizer from theano.gof.opt import Optimizer
from theano.gradient import grad_not_implemented, DisconnectedType from theano.gradient import DisconnectedType
try: try:
import scipy.linalg import scipy.linalg
...@@ -433,16 +433,14 @@ class CholeskyGrad(Op): ...@@ -433,16 +433,14 @@ class CholeskyGrad(Op):
return Apply(self, [x, l, dz], [x.type()]) return Apply(self, [x, l, dz], [x.type()])
def perform(self, node, inputs, outputs): def perform(self, node, inputs, outputs):
""" """Implements the "reverse-mode" gradient [1]_ for the
Implements the "reverse-mode" gradient for the Cholesky factorization Cholesky factorization of a positive-definite matrix.
of a positive-definite matrix.
References
----------
.. [1] S. P. Smith. "Differentiation of the Cholesky Algorithm". .. [1] S. P. Smith. "Differentiation of the Cholesky Algorithm".
Journal of Computational and Graphical Statistics, Journal of Computational and Graphical Statistics,
Vol. 4, No. 2 (Jun.,1995), pp. 134-147 Vol. 4, No. 2 (Jun.,1995), pp. 134-147
http://www.jstor.org/stable/1390762 http://www.jstor.org/stable/1390762
""" """
x = inputs[0] x = inputs[0]
L = inputs[1] L = inputs[1]
......
...@@ -12,27 +12,18 @@ __authors__ = ("Razvan Pascanu " ...@@ -12,27 +12,18 @@ __authors__ = ("Razvan Pascanu "
__copyright__ = "(c) 2010, Universite de Montreal" __copyright__ = "(c) 2010, Universite de Montreal"
__contact__ = "Razvan Pascanu <r.pascanu@gmail>" __contact__ = "Razvan Pascanu <r.pascanu@gmail>"
import itertools
import logging import logging
import time
from itertools import izip from itertools import izip
import numpy import numpy
import theano import theano
from theano.compile import function, Param, Out
from theano import compile from theano import compile
from theano import gradient
from theano.gof.python25 import any from theano.gof.python25 import any
from theano.gof import PureOp, Apply from theano.gof import PureOp, Apply
from theano import gof from theano import gof
from theano.tensor import TensorType from theano.tensor import TensorType
from theano import tensor
from theano.tensor.opt import Shape_i from theano.tensor.opt import Shape_i
#from theano.sandbox import cuda
from theano.compile.profiling import ScanProfileStats
import scan_utils
# Logging function for sending warning or info # Logging function for sending warning or info
_logger = logging.getLogger('theano.scan_module.scan_op') _logger = logging.getLogger('theano.scan_module.scan_op')
......
...@@ -561,6 +561,9 @@ class ScalarVariable(_scalar_py_operators, Variable): ...@@ -561,6 +561,9 @@ class ScalarVariable(_scalar_py_operators, Variable):
class ScalarConstant(_scalar_py_operators, Constant): class ScalarConstant(_scalar_py_operators, Constant):
pass pass
# Register ScalarConstant as the type of Constant corresponding to Scalar
Scalar.Constant = ScalarConstant
# Easy constructors # Easy constructors
......
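Attaching the constant class to the type (`Scalar.Constant = ScalarConstant`, and similarly `TensorType.Constant = TensorConstant` later in this commit) lets generic code build a constant of the right class from a type object alone, without importing the concrete class. A toy sketch of that registry pattern (stand-in classes, not the Theano ones):

```python
class Scalar(object):
    # Filled in below; generic code reads this attribute
    Constant = None

class ScalarConstant(object):
    def __init__(self, value):
        self.value = value

# Register ScalarConstant as the Constant class corresponding to Scalar
Scalar.Constant = ScalarConstant

def make_constant(type_cls, value):
    # Generic code: no direct reference to ScalarConstant needed
    return type_cls.Constant(value)

c = make_constant(Scalar, 3)
assert isinstance(c, ScalarConstant) and c.value == 3
```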
...@@ -22,7 +22,7 @@ __contact__ = "theano-dev <theano-dev@googlegroups.com>" ...@@ -22,7 +22,7 @@ __contact__ = "theano-dev <theano-dev@googlegroups.com>"
__docformat__ = "restructuredtext en" __docformat__ = "restructuredtext en"
import numpy import numpy
from theano.compile import shared_constructor, SharedVariable from theano.compile import SharedVariable
from basic import Scalar, _scalar_py_operators from basic import Scalar, _scalar_py_operators
class ScalarSharedVariable(_scalar_py_operators, SharedVariable): class ScalarSharedVariable(_scalar_py_operators, SharedVariable):
......
...@@ -520,7 +520,6 @@ def get_scalar_constant_value(v): ...@@ -520,7 +520,6 @@ def get_scalar_constant_value(v):
if isinstance(v, numpy.ndarray): if isinstance(v, numpy.ndarray):
return numpy_scalar(v) return numpy_scalar(v)
if isinstance(v, Constant): if isinstance(v, Constant):
if getattr(v.tag, 'unique_value', None) is not None: if getattr(v.tag, 'unique_value', None) is not None:
data = v.tag.unique_value data = v.tag.unique_value
...@@ -529,11 +528,9 @@ def get_scalar_constant_value(v): ...@@ -529,11 +528,9 @@ def get_scalar_constant_value(v):
return numpy_scalar(data) return numpy_scalar(data)
if v.owner: if v.owner:
if isinstance(v.owner.op, Alloc): if isinstance(v.owner.op, (Alloc, DimShuffle, Rebroadcast,
return get_scalar_constant_value(v.owner.inputs[0]) compile.ops.OutputGuard,
if isinstance(v.owner.op, DimShuffle): compile.DeepCopyOp)):
return get_scalar_constant_value(v.owner.inputs[0])
if isinstance(v.owner.op, Rebroadcast):
return get_scalar_constant_value(v.owner.inputs[0]) return get_scalar_constant_value(v.owner.inputs[0])
if isinstance(v.owner.op, Elemwise) and \ if isinstance(v.owner.op, Elemwise) and \
isinstance(v.owner.op.scalar_op, scal.Second): isinstance(v.owner.op.scalar_op, scal.Second):
...@@ -604,11 +601,33 @@ def get_scalar_constant_value(v): ...@@ -604,11 +601,33 @@ def get_scalar_constant_value(v):
# This is needed when we take the grad as the Shape op # This is needed when we take the grad as the Shape op
# are not already changed into MakeVector # are not already changed into MakeVector
if (v.owner.inputs[0].owner and owner = v.owner
isinstance(v.owner.inputs[0].owner.op, leftmost_parent = owner.inputs[0]
if (leftmost_parent.owner and
isinstance(leftmost_parent.owner.op,
theano.tensor.Shape)): theano.tensor.Shape)):
if v.owner.inputs[0].owner.inputs[0].type.broadcastable[ op = owner.op
v.owner.op.idx_list[0]]: idx_list = op.idx_list
idx = idx_list[0]
grandparent = leftmost_parent.owner.inputs[0]
gp_broadcastable = grandparent.type.broadcastable
ndim = grandparent.type.ndim
assert ndim == len(gp_broadcastable)
if not (idx < len(gp_broadcastable)):
msg = "get_scalar_constant_value detected " + \
"deterministic IndexError: x.shape[%d] " + \
"when x.ndim=%d." % (ndim, idx)
if config.exception_verbosity == 'high':
msg += 'x=%s' % min_informative_str(x)
else:
msg += 'x=%s' % str(x)
raise ValueError(msg)
if gp_broadcastable[idx]:
return numpy.asarray(1) return numpy.asarray(1)
raise NotScalarConstantError(v) raise NotScalarConstantError(v)
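Merging the separate `isinstance` checks into one tuple works because all of those ops (Alloc, DimShuffle, Rebroadcast, OutputGuard, DeepCopyOp) are pass-through for this purpose: the constant value is determined by their first input. A toy version of the recursive walk (stand-in classes, not the Theano graph types):

```python
class Node(object):
    def __init__(self, op, inputs):
        self.op, self.inputs = op, inputs

class PassThrough(object):
    # Stand-in for Alloc / DimShuffle / Rebroadcast / OutputGuard / DeepCopyOp
    pass

class Constant(object):
    def __init__(self, value):
        self.value = value

PASS_THROUGH_OPS = (PassThrough,)  # one tuple instead of repeated `if`s

def get_constant_value(v):
    if isinstance(v, Constant):
        return v.value
    if isinstance(v, Node) and isinstance(v.op, PASS_THROUGH_OPS):
        # These ops forward their first input, so recurse through it
        return get_constant_value(v.inputs[0])
    raise ValueError("not a scalar constant")

c = Constant(7)
wrapped = Node(PassThrough(), [Node(PassThrough(), [c])])
assert get_constant_value(wrapped) == 7  # folded through two wrappers
```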
...@@ -1986,6 +2005,13 @@ class TensorConstant(_tensor_py_operators, Constant): ...@@ -1986,6 +2005,13 @@ class TensorConstant(_tensor_py_operators, Constant):
def signature(self): def signature(self):
return TensorConstantSignature((self.type, self.data)) return TensorConstantSignature((self.type, self.data))
def equals(self, other):
# Override Contant.equals to allow to compare with numpy.ndarray
if isinstance(other, numpy.ndarray):
# Make a TensorConstant to be able to compare
other = constant(other)
return (isinstance(other, TensorConstant) and
self.signature() == other.signature())
TensorType.Constant = TensorConstant TensorType.Constant = TensorConstant
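The `equals` override is a coerce-then-compare pattern: wrap the raw value in a constant of the same class, then compare signatures. A self-contained sketch (the `TConst` class and tuple signature are stand-ins, not the Theano implementation):

```python
class TConst(object):
    """Sketch of TensorConstant.equals: coerce a raw value to a constant
    first, then compare signatures (here simply the wrapped value)."""
    def __init__(self, data):
        self.data = data

    def signature(self):
        return ("TConst", self.data)

    def equals(self, other):
        if not isinstance(other, TConst):
            # Like `constant(ndarray)` in the diff above
            other = TConst(other)
        return self.signature() == other.signature()

c = TConst(5)
assert c.equals(5)            # raw value is coerced before comparing
assert c.equals(TConst(5))
assert not c.equals(TConst(6))
```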
...@@ -3620,6 +3646,10 @@ def var(input, axis=None, keepdims=False): ...@@ -3620,6 +3646,10 @@ def var(input, axis=None, keepdims=False):
:param keepdims: If this is set to True, the axes which are reduced are :param keepdims: If this is set to True, the axes which are reduced are
left in the result as dimensions with size one. With this option, left in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original tensor. the result will broadcast correctly against the original tensor.
:note: It use the two-pass algorithm for more stable results.
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Two-pass_algorithm
It exist other implementation that are even more stable, but probably slower.
""" """
input_ndim = input.type.ndim input_ndim = input.type.ndim
...@@ -3655,6 +3685,10 @@ def std(input, axis=None, keepdims=False): ...@@ -3655,6 +3685,10 @@ def std(input, axis=None, keepdims=False):
With this option, With this option,
the result will broadcast correctly against the the result will broadcast correctly against the
original tensor. original tensor.
:note: It call var and var use the two-pass algorithm for more stable results.
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Two-pass_algorithm
It exist other implementation that are even more stable, but probably slower.
""" """
return sqrt(var(input=input, axis=axis, keepdims=keepdims)) return sqrt(var(input=input, axis=axis, keepdims=keepdims))
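The two-pass algorithm referenced in the notes above is easy to state in plain Python: one pass to compute the mean, a second pass to average the squared deviations. This avoids the catastrophic cancellation of the naive one-pass `E[x^2] - E[x]^2` formulation (a sketch of the algorithm, not Theano's symbolic implementation):

```python
def two_pass_variance(xs):
    """Population variance via the two-pass algorithm."""
    n = len(xs)
    mean = sum(xs) / float(n)                            # pass 1: mean
    return sum((x - mean) ** 2 for x in xs) / float(n)   # pass 2: spread

assert two_pass_variance([1.0, 2.0, 3.0, 4.0]) == 1.25
```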
...@@ -6510,12 +6544,12 @@ class AdvancedSubtensor1(Op): ...@@ -6510,12 +6544,12 @@ class AdvancedSubtensor1(Op):
else: else:
o = None o = None
# If i.dtype is more precise than numpy.intc (int32 on 32-bit machines, # If i.dtype is more precise than numpy.intp (int32 on 32-bit machines,
# int64 on 64-bit machines), numpy may raise the following error: # int64 on 64-bit machines), numpy may raise the following error:
# TypeError: array cannot be safely cast to required type. # TypeError: array cannot be safely cast to required type.
# Since we will probably not have an array with more than 2**31 items # Since we will probably not have an array with more than 2**31 items
# on a 32-bit arch, I suppose it is safe to cast i into intc. # on a 32-bit arch, I suppose it is safe to cast i into intp.
i = theano._asarray(i, dtype=numpy.intc) i = theano._asarray(i, dtype=numpy.intp)
out[0] = x.take(i, axis=0, out=o) out[0] = x.take(i, axis=0, out=o)
......
@@ -492,11 +492,19 @@ def gemv_c_code(aa, xx, yy, zz, alpha, beta, destructive, fail):
{
if (PyArray_DESCR(%(xx)s)->type_num == NPY_FLOAT)
{
//fprintf(stderr, "B %%i %%i %%i %%i\\n",
//        Nz0, Nz1, Sz0, Sz1);
float alpha = ((dtype_%(alpha)s*)PyArray_DATA(%(alpha)s))[0];
//fprintf(stderr, "alpha=%%f\\n", alpha);
//fprintf(stderr, "sx sy %%i %%i\\n", Sx, Sy);
// Check for vector-vector dot (Nx0 == 1). The code may work
// for Sx1 != 1 as well, but has not been tested for this case,
// so Sx1 == 1 is required for safety.
if (Nx0 == 1 && Sx1 == 1)
{
zz_data[0] = fbeta*zz_data[0] + alpha*sdot_(&Nx1,
(float*)(PyArray_DATA(%(xx)s)), &Sx1,
(float*)yy_data, &Sy);
}
else
{
sgemv_(&TRANS, &Nx1, &Nx0,
&alpha,
(float*)(PyArray_DATA(%(xx)s)), &Sx0,
@@ -504,9 +512,22 @@ def gemv_c_code(aa, xx, yy, zz, alpha, beta, destructive, fail):
&fbeta,
(float*)zz_data, &Sz);
}
}
else if (PyArray_DESCR(%(xx)s)->type_num == NPY_DOUBLE)
{
double alpha = ((dtype_%(alpha)s*)PyArray_DATA(%(alpha)s))[0];
// Check for vector-vector dot (Nx0 == 1). The code may work
// for Sx1 != 1 as well, but has not been tested for this case,
// so Sx1 == 1 is required for safety.
if (Nx0 == 1 && Sx1 == 1)
{
zz_data[0] = dbeta*zz_data[0] + alpha*ddot_(&Nx1,
(double*)(PyArray_DATA(%(xx)s)), &Sx1,
(double*)yy_data, &Sy);
}
else
{
dgemv_(&TRANS, &Nx1, &Nx0,
&alpha,
(double*)(PyArray_DATA(%(xx)s)), &Sx0,
@@ -514,6 +535,7 @@ def gemv_c_code(aa, xx, yy, zz, alpha, beta, destructive, fail):
&dbeta,
(double*)zz_data, &Sz);
}
}
else
{
PyErr_SetString(PyExc_AssertionError,
@@ -556,7 +578,7 @@ class CGemv(BaseBLAS, Gemv):
return code
def c_code_cache_version(self):
return (10,)
@local_optimizer([gemv_inplace, gemv_no_inplace])
......
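The special case being added is the degenerate GEMV where the matrix has a single row (`Nx0 == 1`): the product collapses to a scaled dot product, which BLAS `sdot_`/`ddot_` computes faster than `sgemv_`/`dgemv_`. A NumPy sketch of the equivalence the fast path relies on (names here are illustrative, not the generated C):

```python
import numpy

def gemv(alpha, A, y, beta, z):
    # General matrix-vector product: z <- beta*z + alpha*(A . y)
    return beta * z + alpha * A.dot(y)

rng = numpy.random.RandomState(0)
A = rng.rand(1, 5)       # single-row matrix: the Nx0 == 1 case
y = rng.rand(5)
z = numpy.array([0.5])

full = gemv(2.0, A, y, 0.3, z)
# Fast path: treat the lone row as a vector and use a plain dot.
fast = 0.3 * z[0] + 2.0 * numpy.dot(A[0], y)
print(numpy.allclose(full[0], fast))
```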
import theano
import numpy
import math
from theano import gof, tensor
from theano.sandbox.linalg.ops import diag
class Fourier(gof.Op):
......
from basic import _scal_elemwise #, _transpose_inplace
from theano import scalar as scal
import elemwise
from theano import printing
from theano.printing import pprint
from theano.gof.python25 import any
def _scal_inplace(symbol):
"""Replace a symbol definition with an elementwise version of the corresponding scalar Op"""
......
@@ -545,7 +545,7 @@ class Conv3D(theano.Op):
///////////// < /code generated by Conv3D >
"""
return strutil.render_string(codeSource, locals())
global conv3D
conv3D = Conv3D()
......
@@ -271,7 +271,7 @@ class ConvGrad3D(theano.Op):
///////////// < /code generated by ConvGradW3D >
"""
return strutil.render_string(codeSource, locals())
convGrad3D = ConvGrad3D()
......
@@ -324,7 +324,7 @@ class ConvTransp3D(theano.Op):
///////////// < /code generated by ConvTransp3D >
"""
return strutil.render_string(codeSource, locals())
convTransp3D = ConvTransp3D()
......
@@ -813,7 +813,21 @@ class ShapeFeature(object):
"for a variable with %d dimensions." % (
len(s), r.ndim))
shape_vars = []
for i in range(r.ndim):
if (hasattr(r.type, 'broadcastable') and
r.type.broadcastable[i]):
shape_vars.append(self.lscalar_one)
else:
shape_vars.append(self.unpack(s[i]))
assert all([not hasattr(r.type, "broadcastable") or
not r.type.broadcastable[i] or
# The two following comparisons are a speed optimization,
# but we never timed this speed optimization!
self.lscalar_one.equals(shape_vars[i]) or
self.lscalar_one.equals(
T.extract_constant(shape_vars[i]))
for i in range(r.ndim)])
self.shape_of[r] = tuple(shape_vars)
for sv in shape_vars:
self.shape_of_reverse_index.setdefault(sv, set()).add(r)
@@ -855,6 +869,15 @@ class ShapeFeature(object):
merged_shape.append(r_shape[i])
else:
merged_shape.append(other_shape[i])
assert all([(not hasattr(r.type, "broadcastable") or
not r.type.broadcastable[i] and
not other_r.type.broadcastable[i]) or
# The two following comparisons are a speed optimization,
# but we never timed this speed optimization!
self.lscalar_one.equals(merged_shape[i]) or
self.lscalar_one.equals(
T.extract_constant(merged_shape[i]))
for i in range(r.ndim)])
self.shape_of[r] = tuple(merged_shape)
for sv in self.shape_of[r]:
self.shape_of_reverse_index.setdefault(sv, set()).add(r)
@@ -871,6 +894,13 @@ class ShapeFeature(object):
new_shape.append(self.unpack(s_i))
else:
new_shape.append(s_j)
assert all([not hasattr(r.type, "broadcastable") or
not r.type.broadcastable[i] or
# The two following comparisons are a speed optimization,
# but we never timed this speed optimization!
self.lscalar_one.equals(new_shape[i]) or
self.lscalar_one.equals(T.extract_constant(new_shape[i]))
for i in range(r.ndim)])
self.shape_of[r] = tuple(new_shape)
for sv in self.shape_of[r]:
self.shape_of_reverse_index.setdefault(sv, set()).add(r)
......
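The intent of the ShapeFeature changes above is simple: for a broadcastable dimension the shape is known to be 1, so a constant (what `self.lscalar_one` stands for) can be recorded instead of a symbolic entry, and the new assertions check that invariant. A hedged pure-Python sketch of the selection logic (the function name and plain-string stand-ins for symbolic shapes are ours):

```python
def shape_vars_for(broadcastable, symbolic_shape):
    """For each dim, record the constant 1 if it is broadcastable,
    otherwise keep the (possibly symbolic) shape entry."""
    out = []
    for bcast, s_i in zip(broadcastable, symbolic_shape):
        out.append(1 if bcast else s_i)
    return out

# A (1, n, 1)-shaped tensor: dims 0 and 2 are broadcastable.
print(shape_vars_for((True, False, True), ('s0', 's1', 's2')))
```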
@@ -28,16 +28,10 @@ Also, we should make the fgraph refuse optimization that break the canonization
import logging
_logger = logging.getLogger('theano.tensor.opt')
import operator
import itertools
import sys
import theano
from theano import gof
from elemwise import CAReduce
import basic as T
from theano.gof.python25 import any, all
from theano.gof.opt import Optimizer
from theano.gof import InconsistencyError, toolbox
......
@@ -4,11 +4,8 @@ graphs.
__docformat__ = "restructuredtext en"
import copy
import sys
import numpy
from theano.gof import Container
from theano.compile.sharedvalue import (SharedVariable, shared_constructor,
shared)
import raw_random
......
@@ -5,11 +5,7 @@ generic 2D convolution.
__docformat__ = "restructuredtext en"
import numpy
import theano
import theano.tensor as tensor
import theano.tensor.nnet as nnet
from theano import gof, Op, tensor, config
from theano.tensor.nnet import conv
import logging
......
@@ -5456,8 +5456,9 @@ class test_tensordot(unittest.TestCase):
f1 = inplace_func([avec, bvec], c)
aval = rand(5)
bval = rand(5)
out0 = numpy.tensordot(aval, bval, axes)
out1 = f1(aval, bval)
self.assertTrue(numpy.allclose(out0, out1), (out0, out1))
utt.verify_grad(self.TensorDot(axes), [aval, bval])
# Test matrix-vector
......
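The test fix above swaps an exact `==` for `numpy.allclose` because the Theano/BLAS path and `numpy.tensordot` may round differently. A minimal demonstration of why exact float comparison is fragile:

```python
import numpy

b = numpy.array([0.1, 0.2, 0.3]).sum()
c = 0.6
# Exact equality can fail on rounding alone: b is 0.6 plus one ulp here.
print(b == c)                # may be False
print(numpy.allclose(b, c))  # True within default tolerances
```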
@@ -9,7 +9,7 @@ class T_load_tensor(unittest.TestCase):
def setUp(self):
self.data = numpy.arange(5, dtype=numpy.int32)
self.filename = os.path.join(
theano.config.compiledir,
"_test.npy")
numpy.save(self.filename, self.data)
@@ -52,5 +52,5 @@ class T_load_tensor(unittest.TestCase):
def tearDown(self):
os.remove(os.path.join(
theano.config.compiledir,
"_test.npy"))
@@ -2475,6 +2475,57 @@ class test_shapeoptimizer(unittest.TestCase):
assert len(topo) == 1
assert topo[0].op == deep_copy_op
@staticmethod
def max_pool_c01b(c01b, pool_shp, pool_stride, img_shp):
"""Like max_pool but with input using axes ('c', 0, 1, 'b')
(Alex Krizhevsky format)
pool_shp, pool_stride and img_shp are ints that represent
the same shape in x and y.
"""
mx = None
# Compute index in pooled space of last needed pool
# (needed = each input pixel must appear in at least one pool)
def last_pool(im_shp, p_shp, p_strd):
rval = int(numpy.ceil(float(im_shp - p_shp) / p_strd))
assert p_strd * rval + p_shp >= im_shp
assert p_strd * (rval - 1) + p_shp < im_shp
return rval
# Compute starting row of the last pool
last_pool_r = last_pool(img_shp, pool_shp, pool_stride) * pool_stride
# Compute number of rows needed in img for all indexes to work out
required_r = last_pool_r + pool_shp
last_pool_c = last_pool(img_shp, pool_shp, pool_stride) * pool_stride
required_c = last_pool_c + pool_shp
wide_infinity = T.alloc(-numpy.inf, c01b.shape[0],
required_r, required_c, c01b.shape[3])
c01b = T.set_subtensor(wide_infinity[:, 0:img_shp, 0:img_shp, :], c01b)
for row_within_pool in xrange(pool_shp):
row_stop = last_pool_r + row_within_pool + 1
for col_within_pool in xrange(pool_shp):
col_stop = last_pool_c + col_within_pool + 1
cur = c01b[:, row_within_pool:row_stop:pool_stride,
col_within_pool:col_stop:pool_stride, :]
if mx is None:
mx = cur
else:
mx = T.maximum(mx, cur)
return mx
def test_broadcasted_dims(self):
# This tests a case that caused a crash during optimization.
shp = (1, 1, 1, 1)
rng = numpy.random.RandomState(utt.fetch_seed())
a = shared(rng.rand(*shp).astype(config.floatX))
out = self.max_pool_c01b(a, 1, 1, 1)
f = theano.function([], out)
f()
def test_local_track_shape_i(self):
class IdentityNoShape(gof.Op):
'''Op that does not infer the output shape from the input one'''
......
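The `max_pool_c01b` helper in the test above pads the image with `-inf` so every pixel lands in at least one pool, then takes elementwise maxima over strided slices. The same idea in one dimension, as a self-contained NumPy sketch (the function name is ours):

```python
import numpy

def max_pool_1d(v, pool, stride):
    # Pad with -inf so every input element lands in at least one pool.
    n = len(v)
    last = int(numpy.ceil(float(n - pool) / stride)) * stride
    padded = numpy.full(last + pool, -numpy.inf)
    padded[:n] = v
    # Elementwise maximum over one strided slice per offset in the pool.
    mx = None
    for off in range(pool):
        cur = padded[off:last + off + 1:stride]
        mx = cur if mx is None else numpy.maximum(mx, cur)
    return mx

print(max_pool_1d(numpy.array([1.0, 3.0, 2.0, 5.0, 4.0]), 2, 2))
```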
import theano
import numpy
from elemwise import Elemwise
......
@@ -55,10 +55,12 @@ nosetests.
import cPickle
import datetime
import os
import subprocess
import sys
import time
import theano
from theano.misc.windows import call_subprocess_Popen
@@ -261,8 +263,8 @@ def run(stdout, stderr, argv, theano_nose, batch_size, time_profile,
n_tests + 1)):
# Print the test we will start in the raw log to help
# debug tests that are too long.
f_rawlog.write("\n%s Will run test #%d %s\n" % (
time.ctime(), test_id, data["ids"][test_id]))
f_rawlog.flush()
proc = call_subprocess_Popen(
......
@@ -64,7 +64,8 @@ class OrderedUpdates(OrderedDict):
# Warn about non-determinism.
warnings.warn('Updating an `OrderedUpdates` with a '
'non-ordered dictionary with 2+ elements could '
'make your code non-deterministic',
stacklevel=2)
for key, val in OrderedDict(other).iteritems():
if key in self:
if self[key] == val:
......
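The `stacklevel=2` addition makes the warning point at the code that called `update()` with a plain dict, rather than at the `warnings.warn` line itself. A small standalone sketch (the `update` function here is illustrative, not Theano's `OrderedUpdates.update`):

```python
import warnings

def update(other):
    if len(other) > 1 and not hasattr(other, 'move_to_end'):
        # stacklevel=2 attributes the warning to update()'s caller,
        # which is where the non-ordered dict was actually passed in.
        warnings.warn('non-ordered dictionary with 2+ elements could '
                      'make your code non-deterministic', stacklevel=2)

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter('always')
    update({'a': 1, 'b': 2})
print(len(w), w[0].category.__name__)
```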