Commit e8c50c78 authored by fsavard

merge

@@ -27,7 +27,7 @@ Theano (current directory) is the distribution directory.
* scalar depends upon core
* tensor depends upon scalar
* sparse depends upon tensor
* sandbox can depend on everything else
* Theano/examples are copies of the examples on the wiki
* Theano/benchmark, Theano/bin and Theano/examples are in the distribution,
  but not in the python package
......
@@ -99,7 +99,7 @@ The ``make_node`` method creates a node to be included in the expression graph.
It runs when we apply our Op (``fibby``) to Variable (``x``), as in ``fibby(tensor.vector())``.
When an Op has multiple inputs, their order in the inputs argument to ``Apply``
is important: Theano will call ``make_node(*inputs)`` to copy the graph,
so it is important not to change the semantics of the expression by changing the argument order.
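The contract above can be illustrated with a minimal, self-contained mock of the ``Apply``/``make_node`` machinery (``SubOp`` is a hypothetical op computing ``inputs[0] - inputs[1]``; none of these classes are the real Theano ones):

```python
# Minimal sketch of why input order in Apply matters: Theano copies a
# graph by calling make_node(*node.inputs), so the stored order fully
# determines what expression the copy denotes.

class Apply(object):
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = list(inputs)

class SubOp(object):
    def make_node(self, *inputs):
        # the order of `inputs` is preserved verbatim in the node
        return Apply(self, inputs)

def evaluate(node, env):
    # interpret the node as inputs[0] - inputs[1]
    a, b = [env[name] for name in node.inputs]
    return a - b

op = SubOp()
node = op.make_node('x', 'y')
copied = op.make_node(*node.inputs)          # faithful copy: same order
swapped = op.make_node(*node.inputs[::-1])   # a *different* expression

env = {'x': 5, 'y': 3}
print(evaluate(copied, env))   # 2
print(evaluate(swapped, env))  # -2
```

Swapping the argument order silently turns ``x - y`` into ``y - x``, which is exactly the semantic change the documentation warns about.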
......
@@ -138,7 +138,7 @@ following methods:
other criterion C with respect to the Op's input.
If the outputs of your op are :math:`[ f_1, ... f_n]`, then
``output_gradients`` is
:math:`[ grad_{f_1}(C), grad_{f_2}(C), ... , grad_{f_n}(C) ]`.
If the inputs of your op are :math:`[x_1, ..., x_m]`, then your Op.grad
should return :math:`[ grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C) ]`,
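A numeric sketch of this contract, using a hypothetical op :math:`f(x) = x^2` and a downstream cost :math:`C(f) = 3f` (the function names here are illustrative, not Theano API):

```python
# Op.grad receives output_gradients = [grad_f(C)] and must return
# [grad_x(C)] = [grad_f(C) * df/dx] by the chain rule.

def square_grad(inputs, output_gradients):
    (x,) = inputs
    (gz,) = output_gradients   # gz = dC/df
    return [gz * 2.0 * x]      # dC/dx = dC/df * 2x

x = 1.5
dC_df = 3.0                    # C = 3*f, so dC/df = 3
(dC_dx,) = square_grad([x], [dC_df])

# sanity check against a finite difference of C(x) = 3*x**2
eps = 1e-6
C = lambda t: 3.0 * t ** 2
fd = (C(x + eps) - C(x - eps)) / (2 * eps)
assert abs(dC_dx - fd) < 1e-4
print(dC_dx)  # 9.0
```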
......
@@ -14,7 +14,8 @@ Requirements
------------
In order to use Theano, the following libraries and software will need
to be installed (MacOS and Windows users should refer to platform-specific
instructions below for detailed installation steps):
Linux, Mac OS X or Windows operating system
We develop mainly on 64-bit Linux machines. 32-bit architectures are
@@ -394,7 +395,7 @@ Windows V1 (Installing from Scratch)
You can keep the default install options (except for the installation directory).
- Install Mercurial. You can download it
  `here <http://mercurial.selenic.com/downloads>`__. You may get either the command
  line Windows version or the TortoiseHG GUI version: it does not matter as
  far as installing Theano is concerned.
@@ -450,7 +451,7 @@ compile GotoBLAS2 (ATLAS may work too, but was not tested, and is
usually reported to be slower and more difficult to compile -- especially
on Windows).
GotoBLAS2 can be downloaded
`here <http://www.tacc.utexas.edu/tacc-projects/gotoblas2/downloads>`__
after registering on the website (we tested v1.13).
To compile it, you will also need to install MSYS and Perl,
as described below.
@@ -538,8 +539,7 @@ Windows: Using the GPU
Please note that these are tentative instructions (we have not yet been able to
get the GPU to work under Windows with Theano).
Please report your own successes / failures on the `theano-users`_ mailing list.
These are instructions for the 32-bit version of Python (the one that comes
with Python(x,y) is 32-bit).
@@ -555,14 +555,15 @@ use a compilation directory located somewhere else:
[global]
base_compiledir=path_to_a_directory_without_such_characters
You also need to add the following lines to the configuration file (make sure
this is the correct Python installation path):
.. code-block:: cfg
[cuda]
nvccflags=-LC:\Python26\libs
Then
1) Install CUDA driver (32-bit on 32-bit Windows, idem for 64-bit).
......
@@ -128,16 +128,26 @@ Config Attributes
Default 'Mode'
This sets the default compilation mode for theano functions. By default the
mode Mode is equivalent to FAST_RUN. See Config attribute linker and optimizer.
.. attribute:: config.lib.amdlibm
Bool value: either True or False
Default False
This makes the compilation use the
`amdlibm <http://developer.amd.com/cpu/libraries/libm/>`__
library, which is faster than the standard libm.
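Assuming the ``.theanorc`` layout used elsewhere in this guide (section ``[lib]``, attribute ``amdlibm`` -- this exact spelling is an assumption inferred from the attribute name ``config.lib.amdlibm``), enabling it would look like:

.. code-block:: cfg

    [lib]
    amdlibm = True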
.. attribute:: linker
String value: 'c|py', 'py', 'c', 'c|py_nogc', 'c&py'
Default: 'c|py'
When the mode is Mode, it sets the default linker used.
.. attribute:: optimizer
@@ -145,7 +155,7 @@ Config Attributes
Default: 'fast_run'
When the mode is Mode, it sets the default optimizer used.
.. attribute:: warn.ignore_bug_before
......
@@ -46,7 +46,7 @@ AddConfigVar('DebugMode.check_strides',
IntParam(1, lambda i: i in (0,1,2)))
AddConfigVar('DebugMode.warn_input_not_reused',
("Generate a warning when the destroy_map or view_map tells that an op works inplace, but the op did not reuse the input for its output."
),
BoolParam(True))
@@ -519,6 +519,18 @@ def _check_inputs(node, storage_map, r_vals, dr_vals, active_nodes, clobber_dr_v
if storage_map[node.outputs[oo]][0] is not storage_map[node.inputs[ii[0]]][0]:
    warning("input idx %d marked as destroyed was not changed for node '%s'"%(ii[0],str(node)))
if warn_input_not_reused:
    vmap = getattr(node.op, 'view_map', {})
    for oo, ii in vmap.iteritems():
        if hasattr(node.outputs[0].type, "may_share_memory"):
            if not node.outputs[0].type.may_share_memory(storage_map[node.outputs[oo]][0], storage_map[node.inputs[ii[0]]][0]):
                # when a subtensor returns a tensor of ndim == 0, numpy seems to return a copy.
                # when we have an empty ndarray (happens with the output guard) it is not the same. why?
                if storage_map[node.outputs[oo]][0].ndim > 0 and storage_map[node.outputs[oo]][0].size > 0:
                    warning("input idx %d marked as viewed but new memory allocated by node '%s'" % (ii[0], str(node)))
        elif storage_map[node.outputs[oo]][0] is not storage_map[node.inputs[ii[0]]][0]:
            warning("input idx %d marked as viewed but new memory allocated by node '%s'" % (ii[0], str(node)))
for r_idx, r in enumerate(node.inputs):
    if not r.type.values_eq(r_vals[r], storage_map[r][0]):
        # some input node 'r' got changed by running the node
......
@@ -14,6 +14,8 @@ import tokenize
import argparse
import reindent
SKIP_WHITESPACE_CHECK_FILENAME = ".hg/skip_whitespace_check"
def get_parse_error(code):
    """
    Checks code for ambiguous tabs or other basic parsing issues.
@@ -128,6 +130,20 @@ def save_diffs(diffs, filename):
diff_file.write(diff)
diff_file.close()
def should_skip_commit():
    if not os.path.exists(SKIP_WHITESPACE_CHECK_FILENAME):
        return False
    whitespace_check_file = open(SKIP_WHITESPACE_CHECK_FILENAME, "r")
    whitespace_check_changeset = whitespace_check_file.read()
    whitespace_check_file.close()
    return whitespace_check_changeset == parent_commit()

def save_skip_next_commit():
    whitespace_check_file = open(SKIP_WHITESPACE_CHECK_FILENAME, "w")
    whitespace_check_file.write(parent_commit())
    whitespace_check_file.close()
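The sentinel mechanism these two helpers implement can be sketched in a self-contained way (``parent_commit`` is stubbed out here with a fake changeset id; the real hook asks Mercurial):

```python
# The hook stores the parent commit id in a sentinel file after a failed
# check; the next run skips the check only if the sentinel still matches
# the current parent commit, and then deletes the sentinel.
import os
import tempfile

def parent_commit():
    return "deadbeef"  # stub; the real hook would query `hg parent`

sentinel = os.path.join(tempfile.mkdtemp(), "skip_whitespace_check")

def should_skip_commit():
    if not os.path.exists(sentinel):
        return False
    with open(sentinel) as f:
        return f.read() == parent_commit()

def save_skip_next_commit():
    with open(sentinel, "w") as f:
        f.write(parent_commit())

assert not should_skip_commit()   # no sentinel yet
save_skip_next_commit()           # a failed check writes the sentinel
assert should_skip_commit()       # next run on the same parent skips
os.remove(sentinel)               # ...and clears the sentinel again
assert not should_skip_commit()
```

Keying the sentinel on the parent changeset means a stale sentinel (left over after new commits) is automatically ignored rather than silently disabling the check forever.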
def main(argv=None):
    if argv is None:
        argv = sys.argv[1:]
@@ -145,12 +161,32 @@ def main(argv=None):
const=True,
help="only check indentation if the file was previously correctly indented (or is new)"
)
parser.add_argument("-s", "--skip-after-failure",
    action="store_const",
    default=False,
    const=True,
    help="when this pre-commit hook fails, don't run it on the next commit; "
         "this lets you check in your changes and then check in "
         "any necessary whitespace changes in the subsequent commit"
)
args = parser.parse_args(argv)
# -i and -s are incompatible; if you skip checking, you end up with a not-correctly-indented
# file, which -i then causes you to ignore!
if args.skip_after_failure and args.incremental:
    print >> sys.stderr, "*** check whitespace hook misconfigured! -i and -s are incompatible."
    return 1
if is_merge():
    # don't inspect merges: (a) they're complex and (b) they don't really introduce new code
    return 0
if args.skip_after_failure and should_skip_commit():
    # we're set up to skip this one, so skip it, but
    # first, make sure we don't skip the next one as well :)
    os.remove(SKIP_WHITESPACE_CHECK_FILENAME)
    return 0
block_commit = False
diffs = []
@@ -185,12 +221,15 @@ def main(argv=None):
save_diffs(diffs, diffs_filename)
print >> sys.stderr, "*** To fix all indentation issues, run: cd `hg root` && patch -p0 < %s" % diffs_filename
if block_commit:
    save_filename = ".hg/commit_message.saved"
    save_commit_message(save_filename)
    print >> sys.stderr, "*** Commit message saved to %s" % save_filename
    if args.skip_after_failure:
        save_skip_next_commit()
        print >> sys.stderr, "*** Next commit attempt will not be checked. To change this, rm %s" % SKIP_WHITESPACE_CHECK_FILENAME
return int(block_commit)
......
import atexit, gc, os, stat
from theano.compile import optdb
from theano import config
@@ -96,6 +96,9 @@ if cuda_available:
cuda_initialization_error_message = ""
# actively closing our gpu session presents segfault-on-exit on some systems
atexit.register(gpu_shutdown)
# do garbage collection before releasing the gpu to avoid releasing invalid pointers later
# note that atexit-registered calls are called in LIFO order
atexit.register(gc.collect)
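The comment above relies on a documented property of Python's ``atexit``: handlers run in last-in, first-out order. Registering ``gc.collect`` *after* ``gpu_shutdown`` therefore makes the collection run *before* the GPU is released. A quick self-contained demonstration (run in a subprocess, since handlers only fire at interpreter exit):

```python
# atexit handlers run LIFO: the handler registered last prints first.
import subprocess
import sys

script = (
    "import atexit\n"
    "atexit.register(lambda: print('shutdown'))  # registered first, runs last\n"
    "atexit.register(lambda: print('collect'))   # registered last, runs first\n"
)
out = subprocess.run([sys.executable, "-c", script],
                     capture_output=True, text=True).stdout
print(out.split())  # ['collect', 'shutdown']
```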
except EnvironmentError, e:
    cuda_available = False
    cuda_initialization_error_message = e.message
......
@@ -12,43 +12,12 @@
//If true, we fill with NAN allocated device memory.
#define ALLOC_MEMSET 0
#define DEBUG_GPU_CONTEXT_REFCOUNT 0
// g_gpu_context_refcount starts at one b/c the gpu context will be implicitly created
// on the first successful cuda call. the matching decref is in CudaNdarray_gpu_shutdown.
static int g_gpu_context_refcount = 1;
///////////////////////////
// cuda context management
///////////////////////////
void gpu_context_incref() {
g_gpu_context_refcount++;
#if DEBUG_GPU_CONTEXT_REFCOUNT
fprintf(stderr, "gpu_context_incref, to %d\n", g_gpu_context_refcount);
#endif
}
void gpu_context_decref() {
g_gpu_context_refcount--;
#if DEBUG_GPU_CONTEXT_REFCOUNT
fprintf(stderr, "gpu_context_decref, to %d\n", g_gpu_context_refcount);
#endif
if(g_gpu_context_refcount == 0) {
// we're now free to close the cuda context; if we don't explicitly
// exit our cuda context, some systems segfault on process exit
// for as-yet unknown reasons; see
// http://groups.google.com/group/theano-users/browse_thread/thread/c351846e5cebe35f
cudaThreadExit();
#if DEBUG_GPU_CONTEXT_REFCOUNT
fprintf(stderr, "gpu_context_decref at 0, calling cudaThreadExit\n");
#endif
}
}
/////////////////////////
// Alloc and Free
/////////////////////////
static int g_gpu_context_active = 0;
/**
 *
 * In the test program I'm using, the _outstanding_mallocs decreases with every call.
@@ -80,9 +49,6 @@ void * device_malloc(size_t size)
    return NULL;
}
_outstanding_mallocs[0] += (rval != NULL);
if(rval != NULL) {
gpu_context_incref(); // keep the gpu context around until we've free this memory
}
#if COMPUTE_GPU_MEM_USED
for(int i=0;i<TABLE_SIZE;i++){
    if(NULL==_alloc_size_table[i].ptr){
@@ -104,6 +70,10 @@ void * device_malloc(size_t size)
}
int device_free(void *ptr)
{
// if there is no gpu context, the call to cudaFree will fail; skip it entirely
if(!g_gpu_context_active) {
    return 0;
}
cudaError_t err = cudaFree(ptr);
if (cudaSuccess != err)
{
@@ -116,9 +86,6 @@ int device_free(void *ptr)
    return -1;
}
_outstanding_mallocs[0] -= (ptr != NULL);
if(ptr != NULL) {
gpu_context_decref();
}
#if COMPUTE_GPU_MEM_USED
int i=0;
for(;i<TABLE_SIZE;i++)
@@ -1883,6 +1850,11 @@ CudaNdarray_gpu_init(PyObject* _unused, PyObject* args)
    "Unable to get the number of gpus available: %s",
    cudaGetErrorString(cudaGetLastError()));
}
// as soon as the first successful call to a cuda* function is made, a
// gpu context has been created
g_gpu_context_active = 1;
if(deviceCount <= 0) {
    return PyErr_Format(PyExc_EnvironmentError,
        "Can't use the GPU, no devices support CUDA");
@@ -1926,7 +1898,8 @@ CudaNdarray_gpu_init(PyObject* _unused, PyObject* args)
PyObject *
CudaNdarray_gpu_shutdown(PyObject* _unused, PyObject* _unused_args) {
    cudaThreadExit();
    g_gpu_context_active = 0; // context has now been closed down
    Py_INCREF(Py_None);
    return Py_None;
}
......
@@ -213,7 +213,8 @@ class SparseType(gof.Type):
# a FAST_RUN computation..
return scipy.sparse.issparse(a) \
    and scipy.sparse.issparse(b) \
    and ((abs(a-b).sum() < (1e-6 * a.nnz))
         or (a.nnz == 0 and b.nnz == 0))  # in case a and b are empty
def values_eq(self, a, b):
    #WARNING: equality comparison of sparse matrices is not fast or easy
@@ -789,7 +790,11 @@ class StructuredDot(gof.Op):
dtype_out = scalar.upcast(a.type.dtype, b.type.dtype)
if b.type.ndim != 2:
    raise NotImplementedError('non-matrix b')
return gof.Apply(self, [a,b], [tensor.tensor(dtype_out, (False, b.type.broadcastable[1]))])
if _is_sparse_variable(b):
    return gof.Apply(self, [a,b], [SparseType(a.type.format, dtype_out)()])
else:
    return gof.Apply(self, [a,b], [tensor.tensor(dtype_out, (False, b.type.broadcastable[1]))])
def perform(self, node, (a,b), (out,)):
    if a.shape[1] != b.shape[0]:
@@ -797,6 +802,11 @@ class StructuredDot(gof.Op):
#variable = a.dot(b) # deprecated
variable = a * b
if isinstance(node.outputs[0].type, SparseType):
    assert _is_sparse(variable)
    out[0] = variable
    return
assert _is_dense(variable) # scipy 0.7 automatically converts to dense
# dot of an NxM sparse matrix, with a Mx1 dense matrix, returns vector not matrix
......
@@ -344,6 +344,28 @@ class test_structureddot(unittest.TestCase):
outvals = f(kernvals,imvals)
print outvals
def test_dot_sparse_sparse(self):
    # test dot with two sparse matrix inputs
    sparse_dtype = 'float64'
    for sparse_format in ['csc', 'csr']:
        a = SparseType(sparse_format, dtype=sparse_dtype)()
        b = SparseType(sparse_format, dtype=sparse_dtype)()
        d = theano.dot(a, b)
        f = theano.function([a, b], theano.Out(d, borrow=True))
        topo = f.maker.env.toposort()
        for M, N, K, nnz in [(4, 3, 2, 3),
                             (40, 30, 20, 3),
                             (40, 30, 20, 30),
                             (400, 3000, 200, 6000),
                             ]:
            if sparse_format == 'csc':
                spmat = sp.csc_matrix(random_lil((M, N), sparse_dtype, nnz))
                spmat2 = sp.csc_matrix(random_lil((N, K), sparse_dtype, nnz))
            elif sparse_format == 'csr':
                spmat = sp.csr_matrix(random_lil((M, N), sparse_dtype, nnz))
                spmat2 = sp.csr_matrix(random_lil((N, K), sparse_dtype, nnz))
            f(spmat, spmat2)
def test_csc_correct_output_faster_than_scipy(self):
    sparse_dtype = 'float64'
    dense_dtype = 'float64'
......
@@ -33,6 +33,9 @@ def _info(*msg):
def _warn(*msg):
    _logger.warn(' '.join(msg))
# Keep a reference to the builtin complex, as the name will be hidden later
python_complex = complex
def check_equal_numpy(x, y):
    """
    Returns True iff x and y are equal (checks the dtype and
@@ -367,8 +370,41 @@ def get_constant_value(v):
ret = [[None]]
v.owner.op.perform(v.owner, [const], ret)
return ret[0][0]
if isinstance(v.owner.op, Subtensor) and v.ndim == 0:
    if isinstance(v.owner.inputs[0], TensorConstant):
        return v.owner.inputs[0].data[v.owner.op.idx_list[0]]

    # Needed to make a better graph in this test:
    # theano/tensor/tests/test_sharedvar.py:test_shared_options.test_specify_shape_partial
    if (v.owner.inputs[0].owner and
            isinstance(v.owner.inputs[0].owner.op, Join) and
            # Ensure the Join is joining only scalar variables (so that
            # the constant value can be found at the same index as the one
            # used in the sub-tensor).
            all(var.ndim == 0 for var in v.owner.inputs[0].owner.inputs)):
        # The index list 'idx_list' should have length one
        # since joining scalar variables results in a 1D vector.
        assert len(v.owner.op.idx_list) == 1
        # Note the '+ 1' is because the first argument to Join is the
        # axis.
        ret = v.owner.inputs[0].owner.inputs[v.owner.op.idx_list[0] + 1]
        ret = get_constant_value(ret)
        # Join can implicitly cast its inputs in some cases.
        return theano._asarray(ret, dtype=v.type.dtype)
    if (v.owner.inputs[0].owner and
            isinstance(v.owner.inputs[0].owner.op,
                       theano.tensor.opt.MakeVector) and
            # MakeVector normally accepts only scalars as input.
            # We put this check in case that changes in the future.
            all(var.ndim == 0 for var in v.owner.inputs[0].owner.inputs)):
        # The index list 'idx_list' should have length one
        # since joining scalar variables results in a 1D vector.
        assert len(v.owner.op.idx_list) == 1
        ret = v.owner.inputs[0].owner.inputs[v.owner.op.idx_list[0]]
        ret = get_constant_value(ret)
        # MakeVector can implicitly cast its inputs in some cases.
        return theano._asarray(ret, dtype=v.type.dtype)
raise TypeError(v)
@@ -531,7 +567,11 @@ class TensorType(Type):
@staticmethod
def may_share_memory(a,b):
    # When this is called with `a` an ndarray and `b` a
    # sparse matrix, numpy.may_share_memory fails.
    if a.__class__ is b.__class__:
        return numpy.may_share_memory(a, b)
    else:
        return False
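For two ndarrays of the same class, ``numpy.may_share_memory`` behaves as the check above expects: it reports ``True`` for a view into the same buffer and ``False`` for independently allocated arrays. A quick illustration:

```python
# may_share_memory is only meaningful when both arguments are ndarrays;
# the class check above guards against mixed ndarray/sparse arguments.
import numpy

a = numpy.arange(10)
view = a[2:5]              # shares a's buffer
other = numpy.arange(10)   # freshly allocated

print(numpy.may_share_memory(a, view))   # True
print(numpy.may_share_memory(a, other))  # False
```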
@staticmethod
def values_eq(a, b):
@@ -1477,6 +1517,54 @@ shape = Shape()
_shape = shape #was used in the past, now use shape directly.
pprint.assign(_shape, printing.MemberPrinter('shape'))
class SpecifyShape(Op):
    """
    L{Op} that puts into the graph the shape provided by the user.

    If this op stays in the final graph, we assert the shape.
    For this, the output of this op must be used in the graph. This is not
    the case most of the time if we only take the shape of the output.
    Maybe there are other optimizations that will mess with this.

    @note: Maybe in the future we will never do the assert!
    @note: We currently don't support specifying partial shape information.
    """
    view_map = {0: [0]}

    def __hash__(self):
        return hash(type(self))

    def __eq__(self, other):
        return type(self) == type(other)

    def __str__(self):
        return self.__class__.__name__

    def make_node(self, x, shape):
        if not isinstance(x, Variable):
            x = as_tensor_variable(x)
        shape = as_tensor_variable(shape)
        return Apply(self, [x, shape], [x.type()])

    def perform(self, node, (x, shape), (out, )):
        assert numpy.all(x.shape == shape), ("got shape", x.shape,
                                             "expected", shape)
        out[0] = x

    def infer_shape(self, node, (xshape, sshape)):
        new_shape = []
        for dim in range(node.inputs[0].ndim):
            try:
                s = get_constant_value(node.inputs[1][dim])
                s = as_tensor_variable(s)
                new_shape.append(s)
            except TypeError, e:
                new_shape.append(node.inputs[1][dim])
        assert len(new_shape) == len(xshape)
        return [new_shape]

    def grad(self, (x,), (gz,)):
        return [gz]

specify_shape = SpecifyShape()
class MaxAndArgmax(Op):
    """Calculate the max and argmax over a given axis.
@@ -1620,10 +1708,10 @@ def min(x, axis='DEFAULT'):
    axis = 0
elif axis=='DEFAULT':
    axis = x.type.ndim - 1
    warnings.warn("The default axis of min will change! Now we return the min over the last dimensions. It will change to be the same as numpy: the min over all dimensions. To hide this warning and be compatible with the future behavior, set axis to -1 to have the current behavior. To have the future behavior, set axis to range(x.ndim), but this does not support the grad. To be able to get the grad, you must flatten the tensor before calling min().")
elif axis is None:
    axis = x.type.ndim - 1
    warnings.warn("The behavior of min when axis is None will change! Now we return the min over the last dimensions. It will change to the min over all dimensions as numpy. To hide this warning and be compatible with the future behavior, set axis to -1 to have the current behavior. To have the future behavior, set axis to range(x.ndim), but this does not support the grad. To be able to get the grad, you must flatten the tensor before calling min().")
str_x_type = str(x.dtype)
if str_x_type.startswith('float') or str_x_type.startswith('int'):
    return -max(-x, axis=axis)
@@ -2159,9 +2247,10 @@ def mean(input, axis = None, op = False):
@constructor
def var(input, axis = None):
    """Compute the variance along the given axis of a tensor `input`.

    :param axis: Compute the variance along this axis of the tensor.
                 None means all axes (like numpy).
    :type axis: None or int or (list of int) (see `Sum`)
    """
@@ -2195,6 +2284,16 @@ def var(input, axis = None):
#return the mean sqr
return mean(centered_input**2, axis)
@constructor
def std(input, axis=None):
    """Compute the standard deviation along the given axis of a tensor `input`.

    :param axis: Compute the standard deviation along this axis of the tensor.
                 None means all axes (like numpy).
    :type axis: None or int or (list of int) (see `Sum`)
    """
    return sqrt(var(input=input, axis=axis))
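The identity this helper relies on, std = sqrt(var), can be checked numerically in plain Python (using the population variance, i.e. mean of squared deviations, matching the ``mean(centered_input**2)`` definition above):

```python
# Verify std = sqrt(var) on a small list, without numpy or theano.
import math

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std(xs):
    return math.sqrt(var(xs))

data = [1.0, 2.0, 3.0, 4.0]
# mean = 2.5, var = (2.25 + 0.25 + 0.25 + 2.25) / 4 = 1.25
assert abs(var(data) - 1.25) < 1e-12
assert abs(std(data) - math.sqrt(1.25)) < 1e-12
print(std(data))
```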
if 0:
    ## COMMENTED OUT FEB 17 2010
    ## TODO (DOCUMENT AND WRITE TESTS) OR DELETE
@@ -3187,11 +3286,18 @@ def stack(*tensors):
    raise Exception('theano.tensor.stack(*tensors) must have at least one parameter')
# If all tensors are scalars of the same type, call make_vector.
# It makes the graph simpler, by not adding DimShuffles and Rebroadcasts
if isinstance(tensors[0], (numpy.number, float, int, python_complex)):
    tensors = list(tensors)
    tensors[0] = as_tensor_variable(tensors[0])
if numpy.all([isinstance(t, (numpy.number, float, int, python_complex))  # in case there is a direct int
              or (isinstance(t, Variable) and
                  isinstance(t.type, TensorType) and
                  t.ndim == 0 and
                  t.type.__class__ == tensors[0].type.__class__)
              for t in tensors]):
    tensors = map(as_tensor_variable, tensors)  # in case there is a direct int
    dtype = scal.upcast(*[i.dtype for i in tensors])
    return theano.tensor.opt.MakeVector(dtype)(*tensors)
return join(0, *[shape_padleft(t, 1) for t in tensors])

@constructor
@@ -3334,6 +3440,7 @@ class Reshape(Op):
    return '%s{%s}' %(self.__class__.__name__, self.ndim)

def make_node(self, x, shp):
    x = as_tensor_variable(x)
    shp_orig = shp
    shp = as_tensor_variable(shp, ndim=1)
    if not shp.dtype.startswith('int'):
        raise TypeError("Shape must be integers")
@@ -3342,7 +3449,16 @@ class Reshape(Op):
    bcast = [s==1 for s in shp.data]
    return gof.Apply(self, [x, shp], [tensor(x.type.dtype, bcast)])
else:
return gof.Apply(self, [x, shp], [tensor(x.type.dtype, [False]*self.ndim)]) bcasts = [False] * self.ndim
for index in xrange(self.ndim):
y = shp_orig[index]
# Try to see if we can infer that y has a constant value of 1.
# If so, that dimension should be broadcastable.
try:
bcasts[index] = (hasattr(y, 'get_constant_value') and y.get_constant_value() == 1)
except TypeError:
pass
return gof.Apply(self, [x, shp], [tensor(x.type.dtype, bcasts)])
def perform(self, node, (x, shp), (out,)): def perform(self, node, (x, shp), (out,)):
if (len(shp) != self.ndim): if (len(shp) != self.ndim):
raise ValueError('shape argument to Reshape.perform has incorrect length %i' raise ValueError('shape argument to Reshape.perform has incorrect length %i'
......
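The constant-1 inference in `make_node` can be illustrated standalone. `Const` and `Unknown` below are hypothetical stand-ins for graph nodes exposing `get_constant_value()`; they are illustrative names, not Theano classes:

```python
class Const(object):
    """Toy stand-in for a graph node whose value is a known constant."""
    def __init__(self, value):
        self.value = value
    def get_constant_value(self):
        return self.value

class Unknown(object):
    """A shape entry whose value is not known at graph-build time."""
    pass

def infer_bcast(shp_orig):
    # Only a provably constant 1 makes the output dimension broadcastable;
    # everything else is conservatively marked non-broadcastable.
    bcasts = [False] * len(shp_orig)
    for index, y in enumerate(shp_orig):
        bcasts[index] = (hasattr(y, 'get_constant_value')
                         and y.get_constant_value() == 1)
    return bcasts

print(infer_bcast([Const(1), Unknown(), Const(5)]))  # [True, False, False]
```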
@@ -4,8 +4,8 @@ import sys, traceback, logging, copy, os
import numpy
import numpy.distutils
from theano.configparser import config, AddConfigVar, StrParam
from theano.gof import (utils, Op, view_roots, PatternSub, DestroyHandler,
    SeqOptimizer, local_optimizer, Optimizer, LocalOptimizer, OpKeyOptimizer,
    InconsistencyError, toolbox, SequenceDB, EquilibriumOptimizer)
from theano.printing import pprint, FunctionPrinter, debugprint
from theano.compile.mode import optdb
@@ -17,7 +17,7 @@ import basic as T
from theano.tensor.tsor_apply import Apply
#NB: this clobbers the builtin 'compile' symbol
from theano import compile  #to register the optimizer built by this file
from theano.tensor.blas_headers import cblas_header_text, blas_header_text
@@ -108,11 +108,11 @@ def default_blas_ldflags():
        if all(not os.path.exists(dir) for dir in numpy.distutils.__config__.blas_opt_info['library_dirs']):
            return "-lblas"
        return ' '.join(
            #TODO: the Gemm op below should separate the -L and -l arguments into the two callbacks that CLinker uses for that stuff.
            # for now, we just pass the whole ldflags as the -l options part.
            ['-L%s' % l for l in numpy.distutils.__config__.blas_opt_info['library_dirs']] +
            ['-l%s' % l for l in numpy.distutils.__config__.blas_opt_info['libraries']])
            # ['-I%s' % l for l in numpy.distutils.__config__.blas_opt_info['include_dirs']])
    except KeyError:
        return "-lblas"
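As a rough illustration of how those `-L`/`-l` flags are assembled from numpy's BLAS discovery data: the dict below is made-up example data standing in for `numpy.distutils.__config__.blas_opt_info`, not anything actually reported on a given machine.

```python
# Hypothetical example of what numpy.distutils might report for an ATLAS install.
blas_opt_info = {
    'library_dirs': ['/usr/lib/atlas-base'],
    'libraries': ['ptf77blas', 'ptcblas', 'atlas'],
}

# Same construction as default_blas_ldflags: one -L per library dir,
# one -l per library, joined into a single ldflags string.
ldflags_str = ' '.join(
    ['-L%s' % d for d in blas_opt_info['library_dirs']] +
    ['-l%s' % l for l in blas_opt_info['libraries']])

print(ldflags_str)  # -L/usr/lib/atlas-base -lptf77blas -lptcblas -latlas
```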
@@ -124,7 +124,7 @@ AddConfigVar('blas.ldflags',
def ldflags(libs=True, flags=False, libs_dir=False, include_dir=False):
    """Return a list of libraries against which an Op's object file should be
    linked to benefit from a BLAS implementation.

    Default: ['blas'], but the configuration variable config.blas.ldflags overrides this.
    """
    rval = []
@@ -139,7 +139,7 @@ def ldflags(libs=True, flags=False, libs_dir=False, include_dir=False):
            found_dyn = True
    if not found_dyn and dirs:
        warning("We did not find a dynamic library in the library_dir of the library we use for BLAS. If you use ATLAS, make sure to compile it with dynamic libraries.")
    for t in config.blas.ldflags.split():
        try:
            t0, t1, t2 = t[0:3]
@@ -162,7 +162,7 @@ def ldflags(libs=True, flags=False, libs_dir=False, include_dir=False):
class GemmRelated(Op):
    """Base class for Gemm and Dot22.

    This class provides a kind of templated gemm Op.
    """
    def __eq__(self, other):
@@ -186,14 +186,14 @@ class GemmRelated(Op):
        """
        return blas_header_text() + mod_str

    def c_headers(self):
        # std.cout doesn't require the '%' symbol to print stuff...
        # so it works much better with python's string-substitution stuff.
        return ['<iostream>', '<time.h>', '<sys/time.h>']

    def c_libraries(self):
        return ldflags()

    # code_cache_version is built by subclasses from
    # build_gemm_version
    def c_compile_args(self):
@@ -201,10 +201,10 @@ class GemmRelated(Op):
    def c_lib_dirs(self):
        return ldflags(libs=False, libs_dir=True)

    def c_header_dirs(self):
        return ldflags(libs=False, include_dir=True)

    declare_NS = """
        int unit = 0;
@@ -231,15 +231,15 @@ class GemmRelated(Op):
        if (%(_zout)s->nd != 2) {PyErr_SetString(PyExc_NotImplementedError, "rank(z) != 2"); %(fail)s;}
        """

    check_xyz_double_or_float = """
        if ((%(_x)s->descr->type_num != PyArray_DOUBLE)
            && (%(_x)s->descr->type_num != PyArray_FLOAT))
        {PyErr_SetString(PyExc_NotImplementedError, "type(x) is not double or float"); %(fail)s;}

        if ((%(_y)s->descr->type_num != PyArray_DOUBLE)
            && (%(_y)s->descr->type_num != PyArray_FLOAT))
        {PyErr_SetString(PyExc_NotImplementedError, "type(y) is not double or float"); %(fail)s;}

        if ((%(_zout)s->descr->type_num != PyArray_DOUBLE)
            && (%(_zout)s->descr->type_num != PyArray_FLOAT))
        {PyErr_SetString(PyExc_NotImplementedError, "type(z) is not double or float"); %(fail)s;}
@@ -262,21 +262,21 @@ class GemmRelated(Op):
    check_dims_strides = """
        if (Nx[0] != Nz[0])
        {
            PyErr_Format(PyExc_ValueError,
                "Shape mismatch: x has %%ld rows but z has %%ld rows",
                (long int)Nx[0], (long int)Nz[0]);
            %(fail)s;
        }
        if (Nx[1] != Ny[0])
        {
            PyErr_Format(PyExc_ValueError,
                "Shape mismatch: x has %%ld cols but y has %%ld rows",
                (long int)Nx[1], (long int)Ny[0]);
            %(fail)s;
        }
        if (Ny[1] != Nz[1])
        {
            PyErr_Format(PyExc_ValueError,
                "Shape mismatch: y has %%ld cols but z has %%ld cols",
                (long int)Ny[1], (long int)Nz[1]);
            %(fail)s;
@@ -413,11 +413,11 @@ class Gemm(GemmRelated):
    When a and b are scalars and x, y, and z are matrices, then

        gemm(z, a, x, y, b)

    is similar to

        b*z + a*dot(x,y)

    The difference between the two is that the top form is destructive on z,
    whereas the bottom form is not.  Gemm works in-place on the storage
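The relation between the two forms can be acted out with numpy. This is a sketch of the semantics only, not of the BLAS-backed C implementation:

```python
import numpy as np

def gemm_sketch(z, a, x, y, b, inplace=False):
    """Compute b*z + a*dot(x, y). The inplace form overwrites z's storage
    (like gemm_inplace); the default leaves z untouched (like gemm_no_inplace)."""
    result = b * z + a * np.dot(x, y)
    if inplace:
        z[...] = result  # destructive on z
        return z
    return result

z = np.ones((2, 2))
x = np.eye(2)
y = 2 * np.eye(2)
out = gemm_sketch(z, a=0.5, x=x, y=y, b=3.0)
print(out)      # 3*1 + 0.5*2 on the diagonal -> [[4. 3.] [3. 4.]]
print(z[0, 0])  # 1.0 -- z unchanged in the non-destructive form
```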
@@ -450,7 +450,7 @@ class Gemm(GemmRelated):
    def __setstate__(self, dct):
        inplace = dct.get('inplace', True)
        if inplace:
            self.destroy_map = {0: [0]}
            self.setup_z_Nz_Sz = self.setup_z_Nz_Sz_inplace
        else:
            self.setup_z_Nz_Sz = self.setup_z_Nz_Sz_outplace
@@ -577,7 +577,7 @@ class Gemm(GemmRelated):
    case_float_ab_constants = """
        #define REAL float
        float a = (%(_a)s->descr->type_num == PyArray_FLOAT)
        ? (REAL)(((float*)%(_a)s->data)[0])
        : (REAL)(((double*)%(_a)s->data)[0]);
        float b = (%(_b)s->descr->type_num == PyArray_FLOAT) ?
@@ -587,7 +587,7 @@ class Gemm(GemmRelated):
        """
    case_double_ab_constants = """
        #define REAL double
        double a = (%(_a)s->descr->type_num == PyArray_FLOAT)
        ? (REAL)(((float*)%(_a)s->data)[0])
        : (REAL)(((double*)%(_a)s->data)[0]);
        double b = (%(_b)s->descr->type_num == PyArray_FLOAT) ?
@@ -618,14 +618,14 @@ pprint.assign(gemm_inplace, FunctionPrinter('gemm_inplace'))
pprint.assign(gemm_no_inplace, FunctionPrinter('gemm_no_inplace'))

def res_is_a(node, op, maxclients=None):
    if maxclients is not None:
        retval = (len(node.clients) <= maxclients)
    else:
        retval = True

    return node.owner \
            and node.owner.op == op \
            and retval

def _as_scalar(res):
@@ -654,7 +654,7 @@ def _is_real_matrix(res):
def _is_real_vector(res):
    return res.type.dtype in ('float32', 'float64') \
            and res.type.ndim == 1 \
            and res.type.broadcastable[0] == False
def _beta_L_plus_alpha_M(beta, L, alpha, M, recurse_flip=True):
    #print 'BETA L + ALPHA M', beta, L, alpha, M, recurse_flip
@@ -680,7 +680,7 @@ def _beta_L_plus_alpha_M(beta, L, alpha, M, recurse_flip = True):
            pass
    if Mr.ndim == 2:
        #print "RETURNING GEMV (case 2)"
        if Mr.dtype == Ml.dtype:
            rval = [gemv_no_inplace(L, alpha, Mr.T, Ml, beta)]
            assert L.type == rval[0].type, (L.type, rval[0].type)
        else:
@@ -700,7 +700,7 @@ def _beta_L_plus_alpha_M(beta, L, alpha, M, recurse_flip = True):
            pass
        return rval

    # this is False'd out because of inadequate testing.
    # TODO see ticket #237
    if False and res_is_a(M, gemm_no_inplace, 1):
        #EXPRESSION: (beta * L) + (alpha * (gemm_no_inplace(G, a, u, v, b)))
@@ -860,7 +860,7 @@ def _gemm_from_factored_list(lst):
                s_j, M_j = lst[j]
            except:
                continue

            #print 'TRYING', (s_i, M_i, s_j, M_j)
            gemm_of_sM_list = _beta_L_plus_alpha_M(s_i, M_i, s_j, M_j)
@@ -874,7 +874,7 @@ def _gemm_from_factored_list(lst):
                    return s*M
                assert len(gemm_of_sM_list) == 1
                add_inputs = [item_to_var(input)
                        for k, input in enumerate(lst) if k not in (i, j)]
                add_inputs.extend(gemm_of_sM_list)
                if len(add_inputs) > 1:
@@ -1050,7 +1050,7 @@ optdb.register('BlasOpt', blas_optdb, 1.7, 'fast_run')
# run before specialize (2.0) because specialize is basically a free-for-all that makes the
# graph crazy.
blas_optdb.register('local_dot_to_dot22',
        EquilibriumOptimizer([local_dot_to_dot22], max_use_ratio=5),
        0, 'fast_run')
blas_optdb.register('local_dot_to_gemm', GemmOptimizer(), 10, 'fast_run')
@@ -1058,9 +1058,9 @@ blas_optdb.register('local_dot_to_gemm', GemmOptimizer(), 10, 'fast_run')
# After destroyhandler is in but before we try to make elemwise things inplace
# Try to make gemm inplace
# Also, need to make the gemm optimisation (step 70) happen before the fusion of elemwise (step 71)
optdb.register('InplaceBlasOpt',
        EquilibriumOptimizer([local_inplace_gemm, local_inplace_gemv],
            failure_callback=EquilibriumOptimizer.warn_inplace,
            max_use_ratio=5),
        70.0, 'fast_run', 'inplace')

class Dot22Scalar(GemmRelated):
@@ -1103,7 +1103,7 @@ class Dot22Scalar(GemmRelated):
        """
    case_float_ab_constants = """
        #define REAL float
        float a = (%(_a)s->descr->type_num == PyArray_FLOAT)
        ? (REAL)(((float*)%(_a)s->data)[0])
        : (REAL)(((double*)%(_a)s->data)[0]);
        #undef REAL
@@ -1111,7 +1111,7 @@ class Dot22Scalar(GemmRelated):
        """
    case_double_ab_constants = """
        #define REAL double
        double a = (%(_a)s->descr->type_num == PyArray_FLOAT)
        ? (REAL)(((float*)%(_a)s->data)[0])
        : (REAL)(((double*)%(_a)s->data)[0]);
        #undef REAL
@@ -1138,7 +1138,7 @@ def local_dot22_to_dot22scalar(node):
    .. note:

        We execute this optimizer after the gemm optimizer. This gives
        priority to gemm, which yields a bigger speedup than this optimizer,
        while still allowing the gemm optimizer to ignore this op.

        TODO: support the case where we can reorder the mul to generate a
        dot22scalar, or fix the canonizer to merge them (one mul with
        multiple inputs).
    """
    if node.op != T.mul:
@@ -1154,7 +1154,7 @@ def local_dot22_to_dot22scalar(node):
        #no scalar in the inputs and no multiplication
        #if there was a multiplication we could reorder the graph by the associativity of multiplication.
        return False
    if not any(i_scalar):
        #maybe we can reorder the graph, as this mul has a mul in its inputs.
        #The canonizer should have merged those muls together.
@@ -1207,4 +1207,3 @@ from opt import register_specialize, register_canonicalize
def local_print_as_we_go_along(node):
    if node.op in (T.sub, T.add):
        debugprint(node)
@@ -240,10 +240,10 @@ class DimShuffle(Op):
        shape_statements = ['npy_intp dimensions[%i]' % nd_out]
        for i, o in enumerate(self.new_order):
            if o != 'x':
                shape_statements += [('dimensions['+str(i)+'] = %(basename)s->dimensions['+str(o)+']')]
            else:
                shape_statements += [('dimensions['+str(i)+'] = 1')]
        #backport
        #shape_statements += [('dimensions['+str(i)+'] = %(basename)s->dimensions['+str(o)+']')
        #                     if o != 'x' else
@@ -255,10 +255,10 @@ class DimShuffle(Op):
        #set the strides of the non-broadcasted dimensions
        for i, o in enumerate(self.new_order):
            if o != 'x':
                strides_statements += [('strides['+str(i)+'] = %(basename)s->strides['+str(o)+']')]
            else:
                strides_statements += [('strides['+str(i)+'] = 0')]
        #backport
        #strides_statements += [('strides['+str(i)+'] = %(basename)s->strides['+str(o)+']')
        #                       if o != 'x' else
@@ -276,7 +276,7 @@ class DimShuffle(Op):
        #    npy_intp* strides, void* data, int itemsize, int flags, PyObject* obj)
        #
        close_bracket = [
                #create a new array,
                ('%(res)s = (PyArrayObject*)PyArray_New(&PyArray_Type, '
                '' + str(nd_out) + ', dimensions, '
                'PyArray_TYPE(%(basename)s), strides, '
@@ -287,13 +287,13 @@ class DimShuffle(Op):
                #recalculate flags: CONTIGUOUS, FORTRAN, ALIGNED
                'PyArray_UpdateFlags(%(res)s, NPY_UPDATE_ALL)',
                #we are making a view in both inplace and non-inplace cases
                '%(res)s->base = (PyObject*)%(basename)s',
                '}']

        full_code = statements(check_input_nd
                + clear_output
                + get_base
                + shape_statements
                + strides_statements
                + close_bracket)
@@ -345,7 +345,7 @@ class DimShufflePrinter:
            raise TypeError("Can only print DimShuffle.")
        elif isinstance(r.owner.op, DimShuffle):
            ord = r.owner.op.new_order
            return self.__p(ord, pstate, r.owner.inputs[0])
        else:
            raise TypeError("Can only print DimShuffle.")
@@ -411,7 +411,7 @@ class Elemwise(Op):
        d.pop('__epydoc_asRoutine', None)
        d.pop('_hashval')
        return d
    def __setstate__(self, d):
        self.__dict__.update(d)
        if self.scalar_op.nin > 0:
@@ -441,7 +441,7 @@ class Elemwise(Op):
                else:
                    # TODO: use LComplete instead
                    args.append(DimShuffle(
                        input.type.broadcastable,
                        ['x']*difference + range(length),
                        inplace=True)(input))
            inputs = args
@@ -463,7 +463,7 @@ class Elemwise(Op):
                    raise ValueError("Operation cannot be done inplace on an input with broadcasted dimensions.")

        out_dtypes = [o.type.dtype for o in shadow.outputs]
        if any(inputs[i].type.dtype != out_dtypes[o] for o, i in inplace_pattern.items()):
            raise TypeError("Cannot do an inplace operation on incompatible data types.",
                    ([i.type.dtype for i in inputs], out_dtypes, inplace_pattern))

        outputs = [TensorType(dtype=dtype, broadcastable=broadcastable)() for dtype, broadcastable in zip(out_dtypes, out_broadcastables)]
        return Apply(self, inputs, outputs)
@@ -484,10 +484,10 @@ class Elemwise(Op):
        first_part = [k for k, v in items]
        second_part = []
        for k, v in items:
            if isinstance(v, (tuple, list)):
                second_part += [tuple(v)]
            else:
                second_part += [v]

        tuple_items = tuple(first_part + second_part)
        #backport
        #tuple_items = tuple([k for k,v in items] + [(tuple(v) if isinstance(v, (tuple, list)) else v) for k,v in items])
@@ -511,7 +511,7 @@ class Elemwise(Op):
    def grad(self, inputs, ograds):
        # Gradients (especially on the final costs) don't have to be symbolic
        ograds = map(as_tensor_variable, ograds)

        scalar_inputs = [Scalar(dtype=t.type.dtype)() for t in inputs]
        scalar_ograds = [Scalar(dtype=ograd.type.dtype)() for ograd in ograds]
        scalar_igrads = self.scalar_op.grad(scalar_inputs, scalar_ograds)
@@ -575,7 +575,7 @@ class Elemwise(Op):
                    msg2 = []
                    for d, b in zip(input.shape, sinput.type.broadcastable):
                        if b:
                            msg2 += ['*']
                        else:
                            msg2 += [str(d)]
                    msg.append('(%s)' % ", ".join(msg2))
@@ -616,7 +616,7 @@ class Elemwise(Op):
            # the first (faster) version leads to segfaults
            ufunc_args = inputs  # + output_storage
            ufunc = self.ufunc or numpy.frompyfunc(self.scalar_op.impl, len(inputs), self.scalar_op.nout)

            try:
                variables = ufunc(*ufunc_args)
            except Exception, e:
@@ -655,7 +655,7 @@ class Elemwise(Op):
            # b_dim might still be None, if every input's shape was unknown in dimension 'dim'
            oshp.append(b_dim)
            # TODO: it would be interesting to return the constraining information that if
            # one of the inputs' shape[dim] is known and another input's shape[dim] is not,
            # then we can assume that the other input's shape[dim] is the same as the
            # first.
        rval.append(tuple(oshp))
@@ -841,9 +841,9 @@ class CAReduce(Op):
    Examples:
     CAReduce(add)     -> sum
     CAReduce(mul)     -> product
     CAReduce(maximum) -> max
     CAReduce(or_)     -> any # not lazy
     CAReduce(and_)    -> all # not lazy

    In order to (eventually) optimize memory usage patterns,
    L{CAReduce} makes zero guarantees on the order in which it
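A toy model of that table: repeatedly apply a binary scalar op over the reduced elements, in no guaranteed order, which is safe as long as the op is commutative and associative. Plain Python, standing in for Theano's C implementation:

```python
import numpy as np
from functools import reduce

def careduce_sketch(scalar_op, x):
    """Pairwise-reduce all elements of x with a binary scalar op."""
    flat = list(np.asarray(x).ravel())
    return reduce(scalar_op, flat)

x = np.array([1, 2, 3, 4])
print(careduce_sketch(lambda a, b: a + b, x))               # 10, like sum
print(careduce_sketch(lambda a, b: a * b, x))               # 24, like product
print(careduce_sketch(max, x))                              # 4, like max
print(careduce_sketch(lambda a, b: bool(a) or bool(b), x))  # True, like any
```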
@@ -899,7 +899,7 @@ class CAReduce(Op):
            assert len(axis) == len(axis2)
            axis = tuple(axis2)
            op = self.__class__(self.scalar_op, axis)
        else:
            op = self
        output = TensorType(dtype=self._output_dtype(input.type.dtype),
                broadcastable=[x for i, x in enumerate(input.type.broadcastable) if i not in axis])()
@@ -910,7 +910,7 @@ class CAReduce(Op):
        d = copy(self.__dict__)
        d.pop('ufunc')
        return d

    def __setstate__(self, d):
        self.__dict__.update(d)
        self.ufunc = numpy.frompyfunc(self.scalar_op.impl, 2, 1)
...
@@ -317,8 +317,10 @@ class MakeVector(T.Op):
        inputs = map(T.as_tensor_variable, inputs)
        if not all(a.type == inputs[0].type for a in inputs) or (len(inputs) > 0 and inputs[0].dtype != self.dtype):
            dtype = theano.scalar.upcast(self.dtype, *[i.dtype for i in inputs])
            #upcast the inputs to the determined dtype, but don't downcast anything
            assert dtype == self.dtype, (
                "The upcast of the inputs to MakeVector should match the "
                "dtype given in __init__.")
            if not all(self.dtype == T.cast(i, dtype=dtype).dtype for i in inputs):
                raise TypeError("MakeVector.make_node expected inputs upcastable to %s. got %s" % (
                    self.dtype,
@@ -348,6 +350,9 @@ class MakeVector(T.Op):
        # assume that out has correct dtype. there is no cheap way to check
        out[0][...] = inputs

    def grad(self, inputs, output_gradients):
        return [output_gradients[0][i] for i in xrange(len(inputs))]

make_vector = MakeVector()

class MakeVectorPrinter:
...
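The `grad` method added above says that when scalars are stacked into a vector, the gradient with respect to each scalar input is simply the matching entry of the output gradient. A minimal pure-Python sketch of that rule:

```python
def make_vector_grad(inputs, output_gradient):
    """Distribute the output gradient entry-wise to the stacked inputs,
    mirroring MakeVector.grad above."""
    return [output_gradient[i] for i in range(len(inputs))]

# Three scalars stacked into a vector: each gets one entry of the
# vector's gradient back.
g_inputs = make_vector_grad(inputs=[7, 8, 9], output_gradient=[0.1, 0.2, 0.3])
print(g_inputs)  # [0.1, 0.2, 0.3]
```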
...@@ -1552,6 +1552,36 @@ class T_Join_and_Split(unittest.TestCase): ...@@ -1552,6 +1552,36 @@ class T_Join_and_Split(unittest.TestCase):
assert len([n for n in e if isinstance(n, Join)]) == 0 assert len([n for n in e if isinstance(n, Join)]) == 0
assert f.maker.env.outputs[0].dtype == config.floatX assert f.maker.env.outputs[0].dtype == config.floatX
def test_stack_scalar_make_vector_dtype(self):
'''Test that calling stack() on scalars instantiates MakeVector,
even when the scalars don't have the same dtype.'''
a = tensor.iscalar('a')
b = tensor.lscalar('b')
s = stack(a, b, a, b)
f = function([a,b], s)
val = f(1,2)
self.failUnless(numpy.all(val == [1,2,1,2]))
e = f.maker.env.toposort()
assert len([n for n in e if isinstance(n.op,opt.MakeVector)]) > 0
assert len([n for n in e if isinstance(n, Join)]) == 0
assert f.maker.env.outputs[0].dtype == 'int64'
def test_stack_scalar_make_vector_constant(self):
'''Test that calling stack() on scalars instantiates MakeVector,
even when the scalars are simple int types.'''
a = tensor.iscalar('a')
b = tensor.lscalar('b')
#test when the constant is the first element.
#The first element is used in a special way
s = stack(10,a,b, numpy.int8(3))
f = function([a,b], s)
val = f(1,2)
self.failUnless(numpy.all(val == [10,1,2,3]))
e = f.maker.env.toposort()
assert len([n for n in e if isinstance(n.op,opt.MakeVector)]) > 0
assert len([n for n in e if isinstance(n, Join)]) == 0
assert f.maker.env.outputs[0].dtype == 'int64'
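Both tests above depend on the stacked scalars being upcast to a common dtype (int64 here). NumPy's promotion rules show the same behaviour the assertions check:

```python
import numpy as np

# An int32 and an int64 scalar stacked together promote to int64,
# which is the dtype the tests above assert on the compiled output.
a = np.int32(1)
b = np.int64(2)
s = np.stack([a, b, a, b])
```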
def test_join_vector(self): def test_join_vector(self):
a = as_tensor_variable(numpy.array([1, 2, 3])) a = as_tensor_variable(numpy.array([1, 2, 3]))
b = as_tensor_variable(numpy.array([7, 8, 9])) b = as_tensor_variable(numpy.array([7, 8, 9]))
...@@ -3440,6 +3470,28 @@ def test_dimshuffle_duplicate(): ...@@ -3440,6 +3470,28 @@ def test_dimshuffle_duplicate():
assert success assert success
class T_get_constant_value(unittest.TestCase):
def test_get_constant_value(self):
a = tensor.stack(1,2,3)
assert get_constant_value(a[0])==1
assert get_constant_value(a[1])==2
assert get_constant_value(a[2])==3
b = tensor.iscalar()
a = tensor.stack(b,2,3)
self.assertRaises(TypeError, get_constant_value, a[0])
assert get_constant_value(a[1])==2
assert get_constant_value(a[2])==3
#For now, get_constant_value only looks through MakeVector and Join of scalars.
v = tensor.ivector()
a = tensor.stack(v,2,3)
self.assertRaises(TypeError, get_constant_value, a[0])
self.assertRaises(TypeError, get_constant_value, a[1])
self.assertRaises(TypeError, get_constant_value, a[2])
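What this test exercises is `get_constant_value` looking through an indexed MakeVector to the constant behind it, and raising `TypeError` when the element is not a compile-time constant. A toy sketch of that lookup (class names are illustrative, not Theano's):

```python
class Constant:
    def __init__(self, value):
        self.value = value

class MakeVector:
    def __init__(self, *elems):
        self.elems = elems

class Subscript:
    def __init__(self, vec, index):
        self.vec, self.index = vec, index

def get_constant_value(expr):
    # Return the constant behind `expr`, looking through MakeVector
    # indexing; raise TypeError if no constant can be recovered.
    if isinstance(expr, Constant):
        return expr.value
    if isinstance(expr, Subscript) and isinstance(expr.vec, MakeVector):
        return get_constant_value(expr.vec.elems[expr.index])
    raise TypeError("not a compile-time constant")

a = Subscript(MakeVector(Constant(1), Constant(2), Constant(3)), 1)
```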
if __name__ == '__main__': if __name__ == '__main__':
if 1: if 1:
unittest.main() unittest.main()
...@@ -3449,5 +3501,3 @@ if __name__ == '__main__': ...@@ -3449,5 +3501,3 @@ if __name__ == '__main__':
suite = unittest.TestLoader() suite = unittest.TestLoader()
suite = suite.loadTestsFromTestCase(testcase) suite = suite.loadTestsFromTestCase(testcase)
unittest.TextTestRunner(verbosity=2).run(suite) unittest.TextTestRunner(verbosity=2).run(suite)
...@@ -80,6 +80,49 @@ def makeSharedTester(shared_constructor_, ...@@ -80,6 +80,49 @@ def makeSharedTester(shared_constructor_,
else: else:
assert numpy.allclose(x_ref, total_func()) assert numpy.allclose(x_ref, total_func())
def test_shape(self):
dtype = self.dtype
if dtype is None:
dtype = theano.config.floatX
rng = numpy.random.RandomState([3,5,17])
x = numpy.asarray(rng.uniform(0,1,[2,4]),dtype=dtype)
x = self.cast_value(x)
x_ref = self.ref_fct(x)
x_shared = self.shared_constructor(x, borrow = False)
total = self.theano_fct(x_shared)
f = theano.function([],x_shared.shape)
topo = f.maker.env.toposort()
assert numpy.all(f()==(2,4))
if theano.config.mode!='FAST_COMPILE':
assert len(topo)==3
assert isinstance(topo[0].op,tensor.opt.Shape_i)
assert isinstance(topo[1].op,tensor.opt.Shape_i)
assert isinstance(topo[2].op,tensor.opt.MakeVector)
def test_shape_i(self):
dtype = self.dtype
if dtype is None:
dtype = theano.config.floatX
rng = numpy.random.RandomState([3,5,17])
x = numpy.asarray(rng.uniform(0,1,[2,4]),dtype=dtype)
x = self.cast_value(x)
x_ref = self.ref_fct(x)
x_shared = self.shared_constructor(x, borrow = False)
total = self.theano_fct(x_shared)
f = theano.function([],x_shared.shape[1])
topo = f.maker.env.toposort()
assert numpy.all(f()==(4))
if theano.config.mode!='FAST_COMPILE':
assert len(topo)==1
assert isinstance(topo[0].op,tensor.opt.Shape_i)
def test_return_internal_type(self): def test_return_internal_type(self):
dtype = self.dtype dtype = self.dtype
...@@ -191,6 +234,174 @@ def makeSharedTester(shared_constructor_, ...@@ -191,6 +234,174 @@ def makeSharedTester(shared_constructor_,
else: else:
assert numpy.allclose(x_ref, total_func()) assert numpy.allclose(x_ref, total_func())
def test_specify_shape(self):
dtype = self.dtype
if dtype is None:
dtype = theano.config.floatX
rng = numpy.random.RandomState([2,4,16])
x1_1 = numpy.asarray(rng.uniform(1,2,[4,2]),dtype=dtype)
x1_1 = self.cast_value(x1_1)
x1_2 = numpy.asarray(rng.uniform(1,2,[4,2]),dtype=dtype)
x1_2 = self.cast_value(x1_2)
x2 = numpy.asarray(rng.uniform(1,2,[4,3]),dtype=dtype)
x2 = self.cast_value(x2)
#Test that we can replace with values of the same shape
x1_shared = self.shared_constructor(x1_1)
x1_specify_shape = tensor.specify_shape(x1_shared,x1_1.shape)
x1_shared.set_value(x1_2)
assert numpy.allclose(self.ref_fct(x1_shared.value), self.ref_fct( x1_2))
shape_op_fct = theano.function([],x1_shared.shape)
topo = shape_op_fct.maker.env.toposort()
if theano.config.mode!='FAST_COMPILE':
assert len(topo)==3
assert isinstance(topo[0].op,tensor.opt.Shape_i)
assert isinstance(topo[1].op,tensor.opt.Shape_i)
assert isinstance(topo[2].op,tensor.opt.MakeVector)
#Test that we forward the input
specify_shape_fct = theano.function([],x1_specify_shape)
assert numpy.all(self.ref_fct(specify_shape_fct())==
self.ref_fct(x1_2))
topo_specify = specify_shape_fct.maker.env.toposort()
assert len(topo_specify)==2
#Test that we put the shape info into the graph
shape_constant_fct = theano.function([],x1_specify_shape.shape)
assert numpy.all(shape_constant_fct()==shape_op_fct())
topo_cst = shape_constant_fct.maker.env.toposort()
if theano.config.mode!='FAST_COMPILE':
assert len(topo_cst)==0
#Test that we can replace with values of a different shape,
# but that will raise an error in some cases, not all
x1_shared.set_value(x2)
self.assertRaises(AssertionError, specify_shape_fct)
#No assertion will be raised as the Op is removed from the graph
#when there is optimization
if theano.config.mode not in ['FAST_COMPILE','DebugMode','DEBUG_MODE']:
shape_constant_fct()
else:
self.assertRaises(AssertionError, shape_constant_fct)
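Conceptually, `specify_shape` is an identity that asserts its declared shape at run time, which is why the shape can be folded to a constant when the op is optimized away, and why the unoptimized function raises once the shared value no longer matches. A rough sketch with nested lists standing in for tensors (helper names hypothetical):

```python
def list_shape(v):
    # Shape of a nested-list "tensor" (illustrative helper).
    if isinstance(v, list):
        return (len(v),) + list_shape(v[0])
    return ()

def specify_shape(value, shape):
    # Identity that checks the declared shape at run time.
    assert list_shape(value) == tuple(shape)
    return value

x = [[1, 2], [3, 4], [5, 6], [7, 8]]      # shape (4, 2)
y = specify_shape(x, (4, 2))              # shape matches: value passes through
```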
def test_specify_shape_partial(self):
dtype = self.dtype
if dtype is None:
dtype = theano.config.floatX
rng = numpy.random.RandomState([2,4,16])
x1_1 = numpy.asarray(rng.uniform(1,2,[4,2]),dtype=dtype)
x1_1 = self.cast_value(x1_1)
x1_2 = numpy.asarray(rng.uniform(1,2,[4,2]),dtype=dtype)
x1_2 = self.cast_value(x1_2)
x2 = numpy.asarray(rng.uniform(1,2,[5,2]),dtype=dtype)
x2 = self.cast_value(x2)
#Test that we can replace with values of the same shape
x1_shared = self.shared_constructor(x1_1)
x1_specify_shape = tensor.specify_shape(x1_shared,
(tensor.as_tensor_variable(x1_1.shape[0]),
x1_shared.shape[1]))
x1_shared.set_value(x1_2)
assert numpy.allclose(self.ref_fct(x1_shared.value), self.ref_fct( x1_2))
shape_op_fct = theano.function([],x1_shared.shape)
topo = shape_op_fct.maker.env.toposort()
if theano.config.mode!='FAST_COMPILE':
assert len(topo)==3
assert isinstance(topo[0].op,tensor.opt.Shape_i)
assert isinstance(topo[1].op,tensor.opt.Shape_i)
assert isinstance(topo[2].op,tensor.opt.MakeVector)
#Test that we forward the input
specify_shape_fct = theano.function([],x1_specify_shape)
#theano.printing.debugprint(specify_shape_fct)
assert numpy.all(self.ref_fct(specify_shape_fct())
==self.ref_fct(x1_2))
topo_specify = specify_shape_fct.maker.env.toposort()
if theano.config.mode!='FAST_COMPILE':
assert len(topo_specify)==4
#Test that we put the shape info into the graph
shape_constant_fct = theano.function([],x1_specify_shape.shape)
#theano.printing.debugprint(shape_constant_fct)
assert numpy.all(shape_constant_fct()==shape_op_fct())
topo_cst = shape_constant_fct.maker.env.toposort()
if theano.config.mode!='FAST_COMPILE':
assert len(topo_cst)==2
#Test that we can replace with values of a different shape,
# but that will raise an error in some cases, not all
x1_shared.set_value(x2)
self.assertRaises(AssertionError, specify_shape_fct)
#No assertion will be raised as the Op is removed from the graph
if theano.config.mode not in ['FAST_COMPILE','DebugMode','DEBUG_MODE']:
shape_constant_fct()
else:
self.assertRaises(AssertionError, shape_constant_fct)
def test_specify_shape_inplace(self):
#test that specify_shape doesn't break inserting inplace ops
dtype = self.dtype
if dtype is None:
dtype = theano.config.floatX
rng = numpy.random.RandomState([2,4,16])
a = numpy.asarray(rng.uniform(1,2,[40,40]),dtype=dtype)
a = self.cast_value(a)
a_shared = self.shared_constructor(a)
b = numpy.asarray(rng.uniform(1,2,[40,40]),dtype=dtype)
b = self.cast_value(b)
b_shared = self.shared_constructor(b)
s = numpy.zeros((40,40),dtype=dtype)
s = self.cast_value(s)
s_shared = self.shared_constructor(s)
f = theano.function([],
updates={s_shared:theano.dot(a_shared,b_shared)
+s_shared})
topo=f.maker.env.toposort()
f()
#[Gemm{inplace}(<TensorType(float64, matrix)>, 0.01, <TensorType(float64, matrix)>, <TensorType(float64, matrix)>, 2e-06)]
#print topo
if theano.config.mode!='FAST_COMPILE':
assert sum([node.op.__class__.__name__ in ["Gemm","GpuGemm","StructuredDot"] for node in topo])==1
assert all(node.op == tensor.blas.gemm_inplace for node in topo if isinstance(node.op,tensor.blas.Gemm))
assert all(node.op.inplace for node in topo if node.op.__class__.__name__ == "GpuGemm")
#There is no inplace gemm for sparse
#assert all(node.op.inplace for node in topo if node.op.__class__.__name__ == "StructuredDot")
s_shared_specify = tensor.specify_shape(s_shared,s_shared.value.shape)
#now test with the specify shape op in the output
f = theano.function([], s_shared.shape,
updates={s_shared:theano.dot(a_shared,b_shared)
+s_shared_specify})
topo=f.maker.env.toposort()
#print topo
shp=f()
assert numpy.all(shp == (40,40))
if theano.config.mode!='FAST_COMPILE':
assert sum([node.op.__class__.__name__ in ["Gemm","GpuGemm","StructuredDot"] for node in topo])==1
assert all(node.op == tensor.blas.gemm_inplace for node in topo if isinstance(node.op,tensor.blas.Gemm))
assert all(node.op.inplace for node in topo if node.op.__class__.__name__ == "GpuGemm")
#now test with the specify shape op in the inputs and outputs
a_shared = tensor.specify_shape(a_shared,a_shared.value.shape)
b_shared = tensor.specify_shape(b_shared,b_shared.value.shape)
f = theano.function([], s_shared.shape,
updates={s_shared:theano.dot(a_shared,b_shared)
+s_shared_specify})
topo=f.maker.env.toposort()
#print topo
shp=f()
assert numpy.all(shp == (40,40))
if theano.config.mode!='FAST_COMPILE':
assert sum([node.op.__class__.__name__ in ["Gemm","GpuGemm","StructuredDot"] for node in topo])==1
assert all(node.op == tensor.blas.gemm_inplace for node in topo if isinstance(node.op,tensor.blas.Gemm))
assert all(node.op.inplace for node in topo if node.op.__class__.__name__ == "GpuGemm")
return SharedTester return SharedTester
test_shared_options=makeSharedTester(tensor.shared, 'float64', test_shared_options=makeSharedTester(tensor.shared, 'float64',
...@@ -199,4 +410,3 @@ test_shared_options=makeSharedTester(tensor.shared, 'float64', ...@@ -199,4 +410,3 @@ test_shared_options=makeSharedTester(tensor.shared, 'float64',
lambda a: isinstance(a,numpy.ndarray), lambda a: isinstance(a,numpy.ndarray),
theano.tensor.sum, theano.tensor.sum,
numpy.sum) numpy.sum)