Commit a69b6045 authored by Olivier Delalleau

Merged -- solved conflicts in .hgignore and doc/install.txt

......@@ -8,6 +8,14 @@ syntax: glob
*.so
*.sw?
*~
*.aux
*.log
*.nav
*.out
*.pdf
*.snm
*.toc
*.vrb
.noseids
Theano.egg-info
\#*\#
......
......@@ -5,6 +5,81 @@
Release Notes
=============
Theano 0.3.1 (2011-02-21)
=========================
Deprecation:
* The theano shared variable attribute `value` is deprecated; use `get_value()` or `set_value()` instead!
See http://deeplearning.net/software/theano/tutorial/aliasing.html
Bugs fixed:
* The random number generator in theano/sandbox/rng_mrg.py did not always return the same sequence of numbers on the CPU and GPU.
* In some cases, there was a (possibly large) fraction of non-random garbage in the returned sequence.
* In python mode (not the default mode), when the input of an elemwise operation was an empty ndarray, we were not returning an empty ndarray.
* Scan cached the number of steps. This caused no problem because each time you called scan, the number of steps would get refreshed.
The problem was when you called ScanGrad, which would use the cached number of steps without refreshing it.
To be affected by this bug, one would have to compile two graphs, one containing a Scan and the other the corresponding GradScan, and
call the first function to cache the number of steps, and then call the second function with a different number of steps.
* In GpuConv, errors in conv_patch_stack_reduce when the entire kernel doesn't fit into shared memory.
The error was not found before as the impact was less than the relative tolerance of 1e-3. Now the relative tolerance is 1e-5.
Crash fixed:
* Fixed a case where taking the gradient of a DimShuffle raised an exception that made Theano crash.
* Compilation crash for GpuElemwise with tensor with high number of dimensions (~6 or more).
* Disabled a C code generator that made gcc crash on complex types.
* Crash in optimization when an Op has no input.
* Output shape is now computed correctly for matrix-vector multiplication on GPU.
* In Scan, when using numbers as inputs, not symbolic variables.
* In GradScan, when there is only 1 input in the Scan.
* In GpuSum, bug in calculation of n_blocks for the 10 pattern. (Sum on the row of a matrix)
* Some segfault at exit with GPU code.
Optimization:
* New SpecifyShape op that allows passing more shape info in the graph.
* Speed up gemv by working around scipy's gemv slowness when the matrix is in C order (the default).
* Remove join of only 1 element.
* During optimization, consider one more case in get_constant_value.
GPU:
* cuda_shared.value = X now works inplace!
* cuda_shared_var.set_value(new_ndarray) will overwrite the old value inplace in the most common case.
* Allow creating a CudaNdarraySharedVariable from a CudaNdarray.
* New init_gpu_device theano flag.
* Fuse GpuElemwise more often (in the case where there are so many inputs that fusing them all would exceed the 256-byte limit on parameters to a GPU function).
* Fixed: a CPU join of only 1 element was not moved to the GPU.
New features:
* tensor.reshape now makes dimensions of length 1 broadcastable.
* tensor.prod now implements the gradient.
* DebugMode now warns if an Op declared itself as returning a view of the input but did not do so.
* This behaviour is a problem because it can block other Ops from being inplace on the same inputs, which can lower the reuse of memory.
* Sparse.structured_dot now works when both matrices are sparse.
* Sparse type is now supported by the shape op, and the ShapeFeature optimizer works correctly with them.
* New 3D convolution ops, with CPU and GPU implementations.
* New colors in pydotprint.
Documentation:
* Documented lib.amdlibm and (new) init_gpu_device config variables.
* A new page on the memory aliasing contract of Theano (it was written for 0.3, but an error hid it on the web page).
* Revision to the Windows installation instructions.
* The cuda documentation is now generated on the web server.
* Better documentation of .theanorc and its sections.
Unit tests:
* Stop usage of deprecated functions or syntax in the unit tests.
* Better testing of GPU convolution nets.
* Make more tests able to use different random seeds.
* Tests of sparse now use default mode, not a hard-coded one.
* Remove some tests of unimplemented features.
Other:
* The name of the compiledir now includes the Python version, to make life easier for people with many Python versions.
* Added theano.tensor.std as a shortcut to sqrt(var(input=input, axis=axis)).
* Whitespace, tabulation and indentation clean-up in the code.
* Better detection of memory sharing between variables.
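The ``theano.tensor.std`` shortcut mentioned above is just the square root of the variance. The same relation, in a plain-Python sketch (stdlib only; these helpers are illustrative, not Theano code):

```python
import math

def var(xs):
    # population variance: mean of squared deviations from the mean
    m = sum(xs) / float(len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std(xs):
    # theano.tensor.std is described as sqrt(var(input=input, axis=axis));
    # same identity here, for a flat list of numbers
    return math.sqrt(var(xs))
```

For example, ``std([0, 2])`` gives ``1.0``, since the variance of ``[0, 2]`` is ``1.0``.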
Theano 0.3 (2010-11-23)
=======================
......
......@@ -3,108 +3,71 @@ Modifications in the trunk since the last release
Partial list of what is in the trunk since the last release
--------------------------------------------------
Deprecation:
* tag.shape attribute deprecated (#633)
* FAST_RUN_NOGC mode deprecated
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
Bugs fixed:
* Bugfix in CudaNdarray.__iadd__: when it is not implemented, return the error.
* Typo fixed in tensor/opt.py
* THEANO_FLAGS='optimizer=None' now works as expected
* Fixed memory leak in error handling on GPU-to-host copy
* Fix relating specifically to Python 2.7 on Mac OS X
* infer_shape can now handle Python longs
* Fixed behaviour of pydotprint's max_label_size option
Crash fixed:
* Work around a bug in gcc 4.3.0 that makes the compilation of 2d convolution crash.
Optimization:
* Optimize 4 patterns of subtensor followed by subtensor.
* Gemm inplace optimization on the GPU re-enabled
GPU:
* Move to the GPU fused elemwise ops that have dtypes other than float32 in them
(except float64), if the inputs and outputs are float32.
* This allows moving elemwise comparisons to the GPU if we cast the result to
float32 afterwards.
* Implemented CudaNdarray.ndim to have the same interface as ndarray.
* Fixed slowdown caused by multiple chained views on CudaNdarray objects
* CudaNdarray_alloc_contiguous changed so as to never try to free
memory on a view: new "base" property
* Safer decref behaviour in CudaNdarray in case of failed allocations
* New GPU implementation of tensor.basic.outer
New features:
* ProfileMode
* profile the scan overhead
* simple hook system to add profilers
* reordered the output to be in order from more general to more specific
* var[vector of indices] now works (the grad works recursively, the direct grad
works inplace, and it works on the GPU)
* limitation: works only on the outermost dimension.
* test_value implementation to allow quick debugging at graph creation time
* cuda.root inferred if nvcc is on the path, otherwise defaults to
/usr/local/cuda
* Better graph printing for graphs involving a scan subgraph
Documentation:
* Better commenting of cuda_ndarray.cu
* Fixes in the scan documentation: add missing declarations/print statements
* Better error message on failed __getitem__
* Updated documentation on profile mode
Unit tests:
* More strict float comparison by default
* Reuse the subtensor tests of tensor for GPU tensors (more GPU tests)
* Tests that check for aliased function inputs and assure appropriate copying
(#374)
* Better test of copies in CudaNdarray
* New tests relating to the new base pointer requirements
Other:
* Correctly set the broadcast flag to True in the output var of a Reshape op
when we receive an int 1 in the new shape. (This may have fixed a bug.)
* pydotprint: high contrast mode is now the default
* More compact printing (ignore leading "Composite" in op names)
(To be continued...)
......@@ -6,7 +6,7 @@ from theano.gof.cc import get_module_cache
if len(sys.argv) == 1:
    print config.compiledir
elif sys.argv[1] in ('clear',):
    get_module_cache().clear(unversioned_min_age=-1, clear_base_files=True)
else:
    print 'command "%s" not recognized' % sys.argv[1]
    print 'Type "theano-cache" to print the cache location'
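A note on the membership test above: a subtle Python pitfall worth spelling out (plain Python, unrelated to Theano itself) is that parentheses alone do not make a tuple, so ``in ('clear')`` performs a substring test rather than tuple membership:

```python
# ('clear') is just a parenthesized string, so `in` performs a
# substring test; ('clear',) is a one-element tuple, so `in` performs
# membership testing.
assert ('clear') == 'clear'          # no tuple here
assert 'c' in ('clear')              # substring match: surprising but True
assert 'c' not in ('clear',)         # tuple membership: False for 'c'
assert 'clear' in ('clear',)         # the intended check
```

The trailing comma, not the parentheses, is what creates the tuple.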
......
......@@ -51,9 +51,9 @@ copyright = '2008--2011, LISA lab'
# other places throughout the built documents.
#
# The short X.Y version.
version = '0.4'
# The full version, including alpha/beta/rc tags.
release = '0.4.0rc3'
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
......
.. _developer:
==============================================
Theano Design and Implementation Documentation
==============================================
.. toctree::
......
......@@ -7,7 +7,7 @@ Tensor
This file describes the design of theano.tensor.
Elemwise grad and R_op
======================
Here's another straightforward example, though a bit more elaborate
than adding two numbers together. Let's say that you want to compute
......
all:
pdflatex presentation.tex
import numpy
import theano

class DoubleOp(theano.Op):
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

    def __str__(self):
        return self.__class__.__name__

    def make_node(self, x):
        x = theano.tensor.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])

    def perform(self, node, inputs, output_storage):
        x = inputs[0]
        z = output_storage[0]
        z[0] = x * 2

x = theano.tensor.matrix()
f = theano.function([x], DoubleOp()(x))
inp = numpy.random.rand(5, 5)
out = f(inp)
assert numpy.allclose(inp * 2, out)
print inp
print out
import numpy
import theano
import theano.tensor as T

rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
     rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
print "Initial model:"
print w.get_value(), b.get_value()

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))    # Probability of having a one
prediction = p_1 > 0.5                     # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1)  # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum()     # The cost to optimize
gw, gb = T.grad(cost, [w, b])

# Compile expressions to functions
train = theano.function(
    inputs=[x, y],
    outputs=[prediction, xent],
    updates={w: w - 0.1*gw, b: b - 0.1*gb},
    name="train")
predict = theano.function(inputs=[x], outputs=prediction,
                          name="predict")

for i in range(training_steps):
    pred, err = train(D[0], D[1])
print "Final model:"
print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])

# Print the graph used in the slides
theano.printing.pydotprint(predict,
                           outfile="pics/logreg_pydotprint_predic.png",
                           var_with_name_simple=True)
theano.printing.pydotprint_variables(prediction,
                                     outfile="pics/logreg_pydotprint_prediction.png",
                                     var_with_name_simple=True)
theano.printing.pydotprint(train,
                           outfile="pics/logreg_pydotprint_train.png",
                           var_with_name_simple=True)
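The probability and cross-entropy used in the example above can be written out in plain Python for a single training example (a stdlib sketch of the same formulas, not Theano code; the helper names are ours):

```python
import math

def p_1(z):
    # sigmoid: probability of the target being one, given pre-activation z
    return 1.0 / (1.0 + math.exp(-z))

def xent(y, p):
    # binary cross-entropy between target y (0 or 1) and predicted probability p
    return -y * math.log(p) - (1 - y) * math.log(1 - p)
```

For instance, ``p_1(0.0)`` is ``0.5``, and ``xent(1, 0.5)`` equals ``log(2)``, about ``0.693``.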
This diff is collapsed.
import numpy, theano
import theano.misc.pycuda_init
from pycuda.compiler import SourceModule
import theano.sandbox.cuda as cuda

class PyCUDADoubleOp(theano.Op):
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

    def __str__(self):
        return self.__class__.__name__

    def make_node(self, inp):
        inp = cuda.basic_ops.gpu_contiguous(
            cuda.basic_ops.as_cuda_ndarray_variable(inp))
        assert inp.dtype == "float32"
        return theano.Apply(self, [inp], [inp.type()])

    def make_thunk(self, node, storage_map, _, _2):
        mod = SourceModule("""
__global__ void my_fct(float * i0, float * o0, int size) {
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < size) {
        o0[i] = i0[i]*2;
    }
}""")
        pycuda_fct = mod.get_function("my_fct")
        inputs = [storage_map[v] for v in node.inputs]
        outputs = [storage_map[v] for v in node.outputs]

        def thunk():
            z = outputs[0]
            if z[0] is None or z[0].shape != inputs[0][0].shape:
                z[0] = cuda.CudaNdarray.zeros(inputs[0][0].shape)
            grid = (int(numpy.ceil(inputs[0][0].size / 512.)), 1)
            pycuda_fct(inputs[0][0], z[0], numpy.intc(inputs[0][0].size),
                       block=(512, 1, 1), grid=grid)
        return thunk

x = theano.tensor.fmatrix()
f = theano.function([x], PyCUDADoubleOp()(x))
xv = numpy.ones((4, 5), dtype="float32")
assert numpy.allclose(f(xv), xv*2)
print numpy.asarray(f(xv))
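The grid size in the thunk above is computed as ``ceil(size / 512.)``. The same computation in integer arithmetic, as a stand-alone sketch (the helper name is ours, not part of PyCUDA):

```python
def grid_1d(size, block=512):
    # number of thread blocks needed so that block * n_blocks >= size;
    # (size + block - 1) // block is the integer ceiling division,
    # equivalent to int(ceil(size / float(block)))
    return ((size + block - 1) // block, 1)
```

For a (4, 5) input, ``size`` is 20, so one block of 512 threads suffices: ``grid_1d(20)`` gives ``(1, 1)``, while ``grid_1d(513)`` gives ``(2, 1)``.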
import theano
a = theano.tensor.vector("a") # declare variable
b = a + a**10 # build symbolic expression
f = theano.function([a], b) # compile function
print f([0,1,2])
# prints `array([0,2,1026])`
theano.printing.pydotprint_variables(b, outfile="pics/f_unoptimized.png", var_with_name_simple=True)
theano.printing.pydotprint(f, outfile="pics/f_optimized.png", var_with_name_simple=True)
......@@ -557,7 +557,7 @@ used within a MinGW Shell (not available if you only installed Python(x,y)).
You do not need to do the following now, because it is not usually needed, but if
later on, when running Theano, you see an error message that looks like:
*error: 'assert' was not declared in this scope*
then you will have to add another section:
.. code-block:: cfg
......
......@@ -144,7 +144,7 @@ import theano and print the config variable, as in:
.. attribute:: floatX
String value: either 'float64' or 'float32'
Default: 'float64'
......@@ -152,6 +152,50 @@ import theano and print the config variable, as in:
and similar functions. It also sets the default theano bit width for
arguments passed as Python floating-point numbers.
.. attribute:: cast_policy
String value: either 'numpy+floatX' or 'custom'
Default: 'custom'
This specifies how data types are implicitly figured out in Theano, e.g. for
constants or in the results of arithmetic operations. The 'custom' value
corresponds to a set of custom rules originally used in
Theano (which can be partially customized, see e.g. the in-code help of
``tensor.NumpyAutocaster``), and will be deprecated in the future.
The 'numpy+floatX' setting attempts to mimic the numpy casting rules,
although it prefers to use float32 numbers instead of float64 when
``config.floatX`` is set to 'float32' and the user uses data that is not
explicitly typed as float64 (e.g. regular Python floats).
Note that 'numpy+floatX' is not currently behaving exactly as planned (it
is a work-in-progress), and thus you should consider it as experimental.
At the moment it behaves differently from numpy in the following
situations:
* Depending on the value of ``config.int_division``, the resulting type
of a division of integer types with the ``/`` operator may not match
that of numpy.
* On mixed scalar / array operations, numpy tries to prevent the scalar
from upcasting the array's type unless it is of a fundamentally
different type. Theano does not attempt to do the same at this point,
so you should be careful that scalars may upcast arrays when they
would not when using numpy. This behavior should change in the near
future.
.. attribute:: int_division
String value: either 'int', 'floatX' or 'raise'
Default: 'int'
Specifies what to do when one tries to compute ``x / y``, where both ``x`` and
``y`` are of integer types (possibly unsigned). 'int' means an integer is
returned (as in Python 2.X), but this behavior is deprecated. 'floatX'
returns a number of type given by ``config.floatX``. 'raise' is the safest
choice (and will become default in a future release of Theano) and raises
an error when one tries to do such an operation, enforcing the use of the
integer division operator (``//``) (if a float result is intended, either
cast one of the arguments to a float, or use ``x.__truediv__(y)``).
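The three behaviors can be sketched in plain Python (the helper below is illustrative only; it is not Theano's implementation, and plain ``float`` stands in for whatever ``config.floatX`` selects):

```python
def int_division(x, y, policy='int'):
    # 'int': floor division, as Python 2's `/` on integers (deprecated behavior)
    if policy == 'int':
        return x // y
    # 'floatX': produce a floating-point result
    if policy == 'floatX':
        return float(x) / y
    # 'raise': refuse implicit integer division; the caller must use `//`
    # or cast an argument to float explicitly
    if policy == 'raise':
        raise TypeError("integer division of %r / %r: use // or cast to float"
                        % (x, y))
    raise ValueError("unknown policy: %r" % policy)
```

For example, ``int_division(7, 2, 'int')`` gives ``3``, ``int_division(7, 2, 'floatX')`` gives ``3.5``, and the 'raise' policy refuses the operation outright.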
.. attribute:: mode
String value: 'Mode', 'ProfileMode', 'DebugMode', 'FAST_RUN', 'FAST_COMPILE'
......@@ -385,3 +429,23 @@ import theano and print the config variable, as in:
means using the default, defined by :attr:`config.numpy.seterr_all`.
This flag's value cannot be modified during the program execution.
.. attribute:: config.compute_test_value
String Value: ``'off'``, ``'ignore'``, ``'warn'``, ``'raise'``.
Default: ``'off'``
Setting this attribute to something other than ``'off'`` activates a
debugging mechanism, where Theano executes the graph on-the-fly, as it is
being built. This allows the user to spot errors early on (such as
dimension mis-match), **before** optimizations are applied.
Theano will execute the graph using the Constants and/or shared variables
provided by the user. Purely symbolic variables (e.g. x = T.dmatrix()) can be
augmented with test values, by writing to their ``'tag.test_value'``
attribute (e.g. x.tag.test_value = numpy.random.rand(5,4)).
``'warn'`` will result in a UserWarning being raised when some Op inputs
do not contain an appropriate test value. ``'raise'`` will instead raise
an Exception when a problem is encountered during this debugging phase.
......@@ -14,6 +14,7 @@
:maxdepth: 1
cuda/index
linalg
.. ../../../../theano/sandbox/linalg/ops.py
.. ../../../../theano/sandbox/linalg
.. _libdoc_linalg:
===================================================================
:mod:`sandbox.linalg` -- Linear Algebra Ops
===================================================================
.. module:: sandbox.linalg
:platform: Unix, Windows
:synopsis: Linear Algebra Ops
.. moduleauthor:: LISA
API
===
.. automodule:: theano.sandbox.linalg.ops
:members:
......@@ -728,11 +728,11 @@ row of a matrix x:
Index-assignment is *not* supported. If you want to do something like ``a[5]
= b`` or ``a[5]+=b``, see :func:`set_subtensor` and :func:`inc_subtensor` below.
.. autofunction:: theano.tensor.basic.set_subtensor
.. autofunction:: theano.tensor.basic.inc_subtensor
.. _tensor_operator_support:
......
......@@ -112,7 +112,7 @@ Misc
----
The sparse equivalent of dmatrix is csc_matrix and csr_matrix.
:api:`Dot` vs. :api:`StructuredDot`
----------------------------------------
Often when you use a sparse matrix it is because there is a meaning to the
......
......@@ -17,6 +17,132 @@ Isolating the problem/Testing Theano compiler
You can run your Theano function in DebugMode (:ref:`using_debugmode`). This tests the Theano optimizations and helps to find where NaN, inf and other problems come from.
Interactive Debugger
--------------------
As of v.0.4.0, Theano has a new mechanism by which graphs are executed
on-the-fly, before a theano.function is ever compiled. Since optimizations
haven't been applied at this stage, it is easier for the user to locate the
source of a bug. This functionality is enabled through the config flag
``theano.config.compute_test_value``. Its use is best shown through the
following example.
.. code-block:: python
# compute_test_value is 'off' by default, meaning this feature is inactive
theano.config.compute_test_value = 'off'
# configure shared variables
W1val = numpy.random.rand(2,10,10).astype(theano.config.floatX)
W1 = theano.shared(W1val, 'W1')
W2val = numpy.random.rand(15,20).astype(theano.config.floatX)
W2 = theano.shared(W2val, 'W2')
# input which will be of shape (5,10)
x = T.matrix('x')
# transform the shared variable in some way. Theano does not
# know off hand that the matrix func_of_W1 has shape (20,10)
func_of_W1 = W1.dimshuffle(2,0,1).flatten(2).T
# source of error: dot product of 5x10 with 20x10
h1 = T.dot(x,func_of_W1)
# do more stuff
h2 = T.dot(h1,W2.T)
# compile and call the actual function
f = theano.function([x], h2)
f(numpy.random.rand(5,10))
Running the above code generates the following error message:
.. code-block:: bash
Definition in:
File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 1102, in apply
lopt_change = self.process_node(env, node, lopt)
File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 882, in process_node
replacements = lopt.transform(node)
File "/u/desjagui/workspace/PYTHON/Theano/theano/tensor/blas.py", line 1030, in local_dot_to_dot22
return [_dot22(*node.inputs)]
File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/op.py", line 324, in __call__
self.add_tag_trace(node)
For the full definition stack trace set the Theano flags traceback.limit to -1
Traceback (most recent call last):
File "test.py", line 29, in <module>
f(numpy.random.rand(5,10))
File "/u/desjagui/workspace/PYTHON/theano/compile/function_module.py", line 596, in __call__
self.fn()
File "/u/desjagui/workspace/PYTHON/theano/gof/link.py", line 288, in streamline_default_f
raise_with_op(node)
File "/u/desjagui/workspace/PYTHON/theano/gof/link.py", line 284, in streamline_default_f
thunk()
File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/cc.py", line 1111, in execute
raise exc_type, exc_value, exc_trace
ValueError: ('Shape mismatch: x has 10 cols but y has 20 rows',
_dot22(x, <TensorType(float64, matrix)>), [_dot22.0],
_dot22(x, InplaceDimShuffle{1,0}.0), 'Sequence id of Apply node=4')
Needless to say the above is not very informative and does not provide much in
the way of guidance. However, by instrumenting the code ever so slightly, we
can get Theano to give us the exact source of the error.
.. code-block:: python
# enable on-the-fly graph computations
theano.config.compute_test_value = 'warn'
...
# input which will be of shape (5,10)
x = T.matrix('x')
# provide Theano with a default test-value
x.tag.test_value = numpy.random.rand(5,10)
In the above, we're tagging the symbolic matrix ``x`` with a special test
value. This allows Theano to evaluate symbolic expressions on-the-fly (by
calling the ``perform`` method of each Op), as they are being defined. Sources
of error can thus be identified with much more precision and much earlier in
the compilation pipeline. For example, running the above code yields the
following error message, which properly identifies line 23 as the culprit.
.. code-block:: bash
Traceback (most recent call last):
File "test2.py", line 23, in <module>
h1 = T.dot(x,func_of_W1)
File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/op.py", line 360, in __call__
node.op.perform(node, input_vals, output_storage)
File "/u/desjagui/workspace/PYTHON/Theano/theano/tensor/basic.py", line 4458, in perform
z[0] = numpy.asarray(numpy.dot(x, y))
ValueError: ('matrices are not aligned', (5, 10), (20, 10))
The compute_test_value mechanism works as follows:
* Theano Constants and SharedVariables are used as is; there is no need to instrument them.
* A Theano ``Variable`` (i.e. ``dmatrix``, ``vector``, etc.) should be
given a special test value through the attribute ``tag.test_value``.
* Theano automatically instruments intermediate results. As such, any quantity
derived from ``x`` will be given a `tag.test_value` automatically.
`compute_test_value` can take the following values:
* ``off``: default behavior. This debugging mechanism is inactive.
* ``raise``: compute test values on the fly. Any variable for which a test
value is required, but not provided by the user, is treated as an error. An
exception is raised accordingly.
* ``warn``: idem, but a warning is issued instead of an Exception.
* ``ignore``: silently ignore the computation of intermediate test values, if a
variable is missing a test value.
.. note::
This feature is currently not compatible with ``Scan``, nor with Ops
that do not implement a ``perform`` method.
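The spirit of this mechanism can be sketched without Theano. The toy classes below are ours, not the real API; they only illustrate how carrying a test value along with each variable lets shape errors surface at graph-construction time instead of at run time:

```python
class Sym(object):
    # toy symbolic variable; `shape` plays the role of tag.test_value
    def __init__(self, name, shape=None):
        self.name = name
        self.shape = shape

def dot(a, b, compute_test_value='raise'):
    # with the mechanism on, shapes are checked while the "graph" is being
    # built, so the error points at this call site, not at execution time
    if compute_test_value != 'off' and a.shape and b.shape:
        if a.shape[1] != b.shape[0]:
            raise ValueError('matrices are not aligned: %s %s'
                             % (a.shape, b.shape))
    out_shape = None
    if a.shape and b.shape:
        out_shape = (a.shape[0], b.shape[1])
    return Sym('dot(%s, %s)' % (a.name, b.name), out_shape)
```

Here ``dot(Sym('x', (5, 10)), Sym('w', (20, 10)))`` raises immediately, mirroring the ``(5, 10)`` vs ``(20, 10)`` mismatch in the traceback above, while a well-aligned pair builds a node whose test shape is propagated automatically.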
How do I print an intermediate value in a Function/Method?
----------------------------------------------------------
......
......@@ -46,9 +46,9 @@ AUTHOR = "LISA laboratory, University of Montreal"
AUTHOR_EMAIL = "theano-dev@googlegroups.com"
PLATFORMS = ["Windows", "Linux", "Solaris", "Mac OS-X", "Unix"]
MAJOR = 0
MINOR = 4
MICRO = 0
SUFFIX = "rc3" # Should be blank except for rc's, betas, etc.
ISRELEASED = False
VERSION = '%d.%d.%d%s' % (MAJOR, MINOR, MICRO, SUFFIX)
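With the new values above, the interpolation at the end produces the release-candidate string:

```python
# same format string as in setup.py, with the values from this diff
MAJOR, MINOR, MICRO, SUFFIX = 0, 4, 0, "rc3"
VERSION = '%d.%d.%d%s' % (MAJOR, MINOR, MICRO, SUFFIX)
print(VERSION)
```

which prints ``0.4.0rc3``; with ``SUFFIX = ""`` it would collapse to plain ``0.4.0``.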
......@@ -105,12 +105,13 @@ if not release:
a = open(filename, 'w')
try:
    a.write(cnt % {'version': VERSION,
                   'full_version': FULL_VERSION,
                   'hg_revision': HG_REVISION,
                   'isrelease': str(ISRELEASED)})
except Exception, e:
    print e
finally:
    a.close()
......
......@@ -937,9 +937,14 @@ class FunctionMaker(object):
optimizer, linker = mode.optimizer, copy.copy(mode.linker)
# optimize the env
compute_test_value_orig = theano.config.compute_test_value
try:
    theano.config.compute_test_value = "off"
    start_optimizer = time.time()
    optimizer(env)
    end_optimizer = time.time()
finally:
    theano.config.compute_test_value = compute_test_value_orig
mode.optimizer_time += end_optimizer - start_optimizer
_logger.debug('Optimizing took %f seconds' % (end_optimizer - start_optimizer))
......
......@@ -411,7 +411,10 @@ class ProfileMode(Mode):
apply_time, op_cimpl, message, outputs_size,
other_time)
if not outputs_size:
    print """\nProfile of Theano intermediate memory disabled.
To enable it, set the Theano flag ProfileMode.profile_memory to True."""
else:
    fct_memory = {}  # env -> dict(node -> (outputs size))
    var_mem = {}
    for node, val in outputs_size.items():
......@@ -421,6 +424,7 @@ class ProfileMode(Mode):
var_mem[out]=v
print
print "Profile of Theano functions memory:"
print "(This checks only the output of each apply node. It doesn't check the temporary memory used by the op in the apply node.)"
nb_skipped = 0
for env, nodes_mem in fct_memory.iteritems():
    size_sum = sum([sum(val) for key, val in nodes_mem.iteritems()])
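The summation in the loop above, reduced to its essentials (a sketch with made-up data; the helper name and dict interface are ours, not the profiler's actual structures):

```python
def total_output_bytes(nodes_mem):
    # nodes_mem maps each apply node to a tuple of its outputs' sizes;
    # the per-function total is the sum over all nodes and all outputs
    return sum(sum(sizes) for sizes in nodes_mem.values())
```

For example, two nodes with output sizes ``(4, 8)`` and ``(16,)`` give a total of 28 bytes.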
......
This diff is collapsed.
# For flag of bool type, we consider the string 'False','false' and '0' as False
# and the string 'True', 'true', '1' as true.
# We also accept the bool type as its corresponding value!
......@@ -7,6 +7,8 @@ import ConfigParser
import logging
import warnings
import theano
_logger = logging.getLogger('theano.config')
class TheanoConfigWarning(Warning):
......@@ -103,6 +105,21 @@ def _config_print(thing, buf):
print >> buf, " Value: ", cv.val
print >> buf, ""
def get_config_md5():
    """
    Return a string md5 of the current config options. It should be such that
    we can safely assume that two different config setups will lead to two
    different strings.

    We only take into account config options for which `in_c_key` is True.
    """
    all_opts = sorted([c for c in _config_var_list if c.in_c_key],
                      key=lambda cv: cv.fullname)
    return theano.gof.cc.hash_from_code('\n'.join(
        ['%s = %s' % (cv.fullname, cv.val) for cv in all_opts]))
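The idea behind ``get_config_md5`` can be reproduced with only the standard library. In this sketch, ``hashlib.md5`` stands in for ``theano.gof.cc.hash_from_code``, and the plain dict interface is our simplification (the real code filters on ``in_c_key``):

```python
import hashlib

def config_md5(options):
    # options: {fullname: value}; sorting by name makes the digest
    # independent of insertion order, as in the real implementation
    lines = ['%s = %s' % (name, options[name]) for name in sorted(options)]
    return hashlib.md5('\n'.join(lines).encode('utf8')).hexdigest()
```

Two configurations that differ in any option value hash differently, while the order in which options were set does not matter.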
class TheanoConfigParser(object):
#properties are installed by AddConfigVar
_i_am_a_config_class = True
......@@ -110,6 +127,7 @@ class TheanoConfigParser(object):
sio = StringIO.StringIO()
_config_print(self.__class__, sio)
return sio.getvalue()
# N.B. all instances of TheanoConfigParser give access to the same properties.
config = TheanoConfigParser()
......@@ -124,17 +142,27 @@ config = TheanoConfigParser()
# - The subtrees provide the same interface as the root
# - ConfigParser subclasses control get/set of config properties to guard against craziness.
def AddConfigVar(name, doc, configparam, root=config):
def AddConfigVar(name, doc, configparam, root=config, in_c_key=True):
"""Add a new variable to theano.config
:type name: string of the form "[section0.[section1.[etc]]].option"
:param name: the full name for this configuration variable.
:type doc: string
:param doc: What does this variable specify?
:type configparam: ConfigParam instance
:param configparam: an object for getting and setting this configuration parameter
:param configparam: an object for getting and setting this configuration parameter
:type root: object
:param root: used for recursive calls -- don't provide an argument for this parameter.
:param root: used for recursive calls -- do not provide an argument for this parameter.
:type in_c_key: boolean
:param in_c_key: If True, then whenever this config option changes, the
key associated to compiled C modules also changes, i.e. it may trigger a
compilation of these modules (this compilation will only be partial if it
turns out that the generated C code is unchanged). Set this option to False
only if you are confident this option should not affect C code compilation.
:returns: None
"""
......@@ -155,11 +183,13 @@ def AddConfigVar(name, doc, configparam, root=config):
newroot = getattr(root, sections[0])
if not getattr(newroot, '_i_am_a_config_class', False) or isinstance(newroot, type):
raise TypeError('Internal config nodes must be config class instances', newroot)
return AddConfigVar('.'.join(sections[1:]), doc, configparam, root=newroot)
return AddConfigVar('.'.join(sections[1:]), doc, configparam,
root=newroot, in_c_key=in_c_key)
else:
if hasattr(root, name):
raise AttributeError('This name is already taken', configparam.fullname)
configparam.doc = doc
configparam.in_c_key = in_c_key
configparam.__get__() # trigger a read of the value from config files and env vars
setattr(root.__class__, sections[0], configparam)
_config_var_list.append(configparam)
......@@ -171,12 +201,16 @@ class ConfigParam(object):
So the value should be the same during all the execution
"""
self.default = default
self.filter=filter
self.filter = filter
self.allow_override = allow_override
# N.B. --
# self.fullname # set by AddConfigVar
# self.doc # set by AddConfigVar
# Check that default is a valid value
if self.filter:
self.filter(self.default)
def __get__(self, *args):
#print "GETTING PARAM", self.fullname, self, args
if not hasattr(self, 'val'):
......@@ -203,6 +237,13 @@ class EnumStr(ConfigParam):
def __init__(self, default, *options, **kwargs):
self.default = default
self.all = (default,) + options
# All options should be strings
for val in self.all:
if not isinstance(val, str):
raise ValueError('Valid values for an EnumStr parameter '
'should be strings', val, type(val))
def filter(val):
if val in self.all:
return val
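The EnumStr validation above can be sketched as a small standalone class. `EnumStrSketch` is a hypothetical name; it only mirrors the behaviour described in this hunk (first value is the default, all values must be strings, and only listed values pass the filter):

```python
class EnumStrSketch(object):
    # Minimal sketch of the EnumStr behaviour above.
    def __init__(self, default, *options):
        self.all = (default,) + options
        # All options should be strings.
        for val in self.all:
            if not isinstance(val, str):
                raise ValueError('Valid values for an EnumStr parameter '
                                 'should be strings', val, type(val))
        self.val = default

    def filter(self, val):
        # Accept only one of the declared option strings.
        if val in self.all:
            return val
        raise ValueError('Invalid value %r (valid options: %r)'
                         % (val, self.all))
```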
......@@ -248,7 +289,7 @@ def BoolParam(default, is_valid=None, allow_override=True):
def is_valid_bool(s):
if s in ['False', 'false', '0', 'True', 'true', '1', False, True]:
return True
else:
else:
return False
if is_valid is None:
is_valid = is_valid_bool
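The boolean-flag convention described in the comment at the top of this file (accept the strings 'False'/'false'/'0' and 'True'/'true'/'1' as well as real bools) can be sketched as a conversion helper. `to_bool` is a hypothetical name, not a function from the Theano codebase:

```python
def to_bool(s):
    # Sketch of the boolean-flag convention above: map accepted spellings
    # to a Python bool, reject anything else.
    if s in ['False', 'false', '0', False]:
        return False
    if s in ['True', 'true', '1', True]:
        return True
    raise ValueError('Not a recognized boolean flag value', s)
```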
......
"""Apply subclass for use with Tensors that implement shape propagation via variable.tag.shape
This is not used much currently. It appears in some cases, but I'm not sure whether it works or whether it is used by default.
It could help the current system detect problems earlier, when constructing the graph instead of during optimization.
"""
import sys
from theano import gof
def ishape(v):
try:
return (True, v.tag.shape)
except AttributeError:
return (False, (None,)*v.type.ndim)
class Apply(gof.Apply):
def __init__(self, op, inputs, outputs):
super(Apply, self).__init__(op, inputs, outputs)
if not inputs:
return
# if any input has any shape info, then propagate it
try:
provided, ishapes = zip(*[ishape(i) for i in inputs])
except AttributeError:
# i.type.ndim didn't make sense for some i
return
if not any(provided):
# no input had a tag.shape
return
try:
infer_shape = op.infer_shape
except AttributeError:
# op has no infer_shape, that's fine
return
try:
oshapes = infer_shape(self, ishapes)
except NotImplementedError:
return
for o, oshp in zip(outputs, oshapes):
o.tag.shape = oshp
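The propagation logic above can be sketched without the graph machinery. Both names below are hypothetical: `dot_infer_shape` is an `infer_shape` in the style the code expects (input shapes in, output shapes out), and `propagate` mirrors the "only propagate when at least one input provided shape info" rule:

```python
def dot_infer_shape(ishapes):
    # Hypothetical infer_shape for a matrix product: output shape is
    # (rows of A, columns of B).
    (a_shape, b_shape) = ishapes
    return [(a_shape[0], b_shape[1])]

def propagate(ishapes_with_flags, infer_shape):
    # Mirror of the Apply logic above: each entry is (provided, shape);
    # skip propagation entirely when no input carried shape information.
    provided, ishapes = zip(*ishapes_with_flags)
    if not any(provided):
        return None
    return infer_shape(ishapes)
```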
......@@ -7,6 +7,7 @@ from copy import copy
import re #for set_compiledir
import os, sys, StringIO
if sys.version_info[:2] >= (2,5):
import hashlib
def hash_from_code(msg):
......@@ -16,6 +17,13 @@ else:
def hash_from_code(msg):
return md5.new(msg).hexdigest()
def hash_from_file(file_path):
"""Return the MD5 hash of a file."""
return hash_from_code(open(file_path, 'rb').read())
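The two helpers above can be sketched with `hashlib` alone (the `md5`-module fallback in the diff is only needed on Python < 2.5). The `_sketch` suffixes mark these as illustrative stand-ins, not the real functions; the file variant also closes its file handle explicitly, unlike the one-liner above:

```python
import hashlib

def hash_from_code_sketch(msg):
    # hashlib-based equivalent of hash_from_code above.
    if not isinstance(msg, bytes):
        msg = msg.encode('utf-8')
    return hashlib.md5(msg).hexdigest()

def hash_from_file_sketch(file_path):
    # Same idea as hash_from_file above, but closing the file explicitly.
    f = open(file_path, 'rb')
    try:
        return hash_from_code_sketch(f.read())
    finally:
        f.close()
```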
import theano
from theano.gof.python25 import all
from theano import config
......@@ -43,6 +51,7 @@ import cmodule
import logging
_logger=logging.getLogger("theano.gof.cc")
_logger.setLevel(logging.WARN)
def info(*args):
_logger.info(' '.join(str(a) for a in args))
def debug(*args):
......@@ -791,7 +800,7 @@ class CLinker(link.Linker):
The key returned by this function is of the form (version, signature)
The signature has the following form:
{{{
'CLinker.cmodule_key', compilation args, libraries,
'CLinker.cmodule_key', compilation args, libraries, config md5,
(op0, input_signature0, output_signature0),
(op1, input_signature1, output_signature1),
...
......@@ -858,10 +867,16 @@ class CLinker(link.Linker):
constant_ids = dict()
op_pos = {} # Apply -> topological position
# first we put the header, compile_args, library names into the signature
# First we put the header, compile_args, library names and config md5
# into the signature.
sig = ['CLinker.cmodule_key'] # will be cast to tuple on return
if compile_args is not None: sig.append(tuple(compile_args))
if libraries is not None: sig.append(tuple(libraries))
# IMPORTANT: The 'md5' prefix is used to isolate the compilation
# parameters from the rest of the key. If you want to add more key
# elements, they should be before this md5 hash if and only if they
# can lead to a different compiled file with the same source code.
sig.append('md5:' + theano.configparser.get_config_md5())
# technically this should only be appended for gcc-compiled Ops
# and the flags of other compilers should be inserted here... but it's not clear how to
......@@ -943,11 +958,30 @@ class CLinker(link.Linker):
def compile_cmodule(self, location=None):
"""
This method is a callback for `ModuleCache.module_from_key`.
Compile the module and return it.
"""
# Go through all steps of the compilation process.
for step_result in self.compile_cmodule_by_step(location=location):
pass
# And return the output of the last step, which should be the module
# itself.
return step_result
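The drain-the-generator pattern used by `compile_cmodule` above can be sketched in isolation. Both functions below are hypothetical placeholders: the generator first yields the "source code" and last yields the "module" (here plain strings), and the consumer keeps only the last step's output:

```python
def compile_by_step_sketch():
    # Sketch of the compile_cmodule_by_step protocol above: first yield
    # the source code, last yield the compiled module (placeholders here).
    src_code = '// generated C source'
    yield src_code
    module = 'compiled module for: ' + src_code
    yield module

def last_step(gen):
    # Mirror of compile_cmodule above: drain the generator and return
    # the output of its last step.
    for step_result in gen:
        pass
    return step_result
```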
def compile_cmodule_by_step(self, location=None):
"""
This method is a callback for `ModuleCache.module_from_key`.
It is a generator (thus the 'by step'), so that:
- it first yields the module's C code
- it last yields the module itself
- it may yield other intermediate outputs in-between if needed
in the future (but this is not currently the case)
"""
if location is None:
location = cmodule.dlimport_workdir(config.compiledir)
mod = self.build_dynamic_module()
src_code = mod.code()
yield src_code
get_lock()
try:
debug("LOCATION", location)
......@@ -955,7 +989,7 @@ class CLinker(link.Linker):
libs = self.libraries()
preargs = self.compile_args()
if c_compiler.__name__=='nvcc_module_compile_str' and config.lib.amdlibm:
#this lib don't work correctly with nvcc in device code.
# This lib does not work correctly with nvcc in device code.
if '<amdlibm.h>' in mod.includes:
mod.includes.remove('<amdlibm.h>')
if '-DREPLACE_WITH_AMDLIBM' in preargs:
......@@ -965,7 +999,7 @@ class CLinker(link.Linker):
try:
module = c_compiler(
module_name=mod.name,
src_code = mod.code(),
src_code=src_code,
location=location,
include_dirs=self.header_dirs(),
lib_dirs=self.lib_dirs(),
......@@ -977,8 +1011,7 @@ class CLinker(link.Linker):
finally:
release_lock()
return module
yield module
def build_dynamic_module(self):
"""Return a cmodule.DynamicModule instance full of the code for our env.
......@@ -1041,10 +1074,10 @@ class CLinker(link.Linker):
except KeyError:
key = None
if key is None:
#if we can't get a key, then forget the cache mechanism
# If we can't get a key, then forget the cache mechanism.
module = self.compile_cmodule()
else:
module = get_module_cache().module_from_key(key=key, fn=self.compile_cmodule, keep_lock=keep_lock)
module = get_module_cache().module_from_key(key=key, fn=self.compile_cmodule_by_step, keep_lock=keep_lock)
vars = self.inputs + self.outputs + self.orphans
# List of indices that should be ignored when passing the arguments
......@@ -1174,54 +1207,21 @@ class OpWiseCLinker(link.LocalLinker):
else:
post_thunk_old_storage = None
compute_map = {}
for k in storage_map:
compute_map[k] = [k.owner is None]
thunks = []
for node in order:
# Make sure we use the C version of the code whenever
# possible
node._op_use_c_code = True
thunks += [node.op.make_thunk(node,
storage_map,
compute_map,
no_recycling)]
for node_idx, node in enumerate(order):
node_input_storage = [storage_map[r] for r in node.inputs]
node_output_storage = [storage_map[r] for r in node.outputs]
debug('Compiling node %i of graph' % node_idx)
thunk = None
# If the op does not override the c_code method, we don't try
# to generate a cthunk! Otherwise we won't find it in the compilation cache
# and will try to compile it. That would take the lock even when we don't need it!
if node.op.c_code.im_func is not op.Op.c_code.im_func:
try:
e = Env(*graph.clone(node.inputs, node.outputs))
if self.allow_gc:
# if we allow garbage collection of intermediate nodes
# we must forbid this C implementation from caching its own
# reference to its output
node_no_recycling = e.outputs
else:
node_no_recycling = [r for r, r2 in zip(e.outputs, node.outputs) if r2 in no_recycling]
cl = CLinker().accept(e, node_no_recycling)
debug('Trying CLinker.make_thunk')
thunk, node_input_filters, node_output_filters = cl.make_thunk(
input_storage = node_input_storage,
output_storage = node_output_storage,
keep_lock=getattr(get_lock,"n_lock",0) != orig_n_lock)
assert callable(thunk)
thunk.inputs = node_input_storage
thunk.outputs = node_output_storage
thunks.append(thunk)
do_python_thunk = False
except (NotImplementedError, utils.MethodNotDefined):
thunk = None
if thunk is None:
if self.fallback_on_perform:
debug('Falling back on perform')
p = node.op.perform
# default arguments are stored in the closure of `thunk`
def thunk(p=p, i=node_input_storage, o=node_output_storage,n=node):
return p(n, [x[0] for x in i], o)
#thunk = lambda p = p, i = node_input_storage, o = node_output_storage, n = node: p(n, [x[0] for x in i], o)
thunk.inputs = node_input_storage
thunk.outputs = node_output_storage
thunk.perform = p
thunks.append(thunk)
else:
raise NotImplementedError("We were not able to use c_code or perform for this node", node)
if self.allow_gc:
post_thunk_old_storage.append([storage_map[input]
......
Diff collapsed.
......@@ -6,7 +6,6 @@ from type import Type
import sys, traceback
from copy import copy
from theano.gof.python25 import all
import numpy
__excepthook = sys.excepthook
def thunk_hook(type, value, trace):
......@@ -329,7 +328,7 @@ class LocalLinker(Linker):
# 3. output storage
# 4. thunks: list of nodes' functions in the order they will be run by the function in (1)
# 5. order: list of nodes, in the order they will be run by the function in (1)
raise MethodNotDefined("make_all", type(self), self.__class__.__name__)
raise utils.MethodNotDefined("make_all", type(self), self.__class__.__name__)
def gc_helper(node_list):
"""
......@@ -391,10 +390,23 @@ class PerformLinker(LocalLinker):
order = list(env.toposort())
no_recycling = self.no_recycling
thunks = []
input_storage, output_storage, storage_map = map_storage(env, order, input_storage, output_storage)
compute_map = {}
for k in storage_map:
compute_map[k] = [k.owner is None]
thunks = []
for node in order:
# Make sure we don't use the C version of the code, but rather
# only the Python version
node.op._op_use_c_code = False
thunks += [node.op.make_thunk(node,
storage_map,
compute_map,
no_recycling)]
computed, last_user = gc_helper(order)
if self.allow_gc:
post_thunk_old_storage = []
......@@ -402,18 +414,6 @@ class PerformLinker(LocalLinker):
post_thunk_old_storage = None
for node in order:
node_input_storage = tuple(storage_map[input] for input in node.inputs)
node_output_storage = tuple(storage_map[output] for output in node.outputs)
p = node.op.perform
# Thunk is meant to be called without arguments.
# The arguments are given in the lambda expression so that they are saved in the lambda expression.
# Using the closure in a simple way didn't work.
thunk = lambda p = p, i = node_input_storage, o = node_output_storage, n = node: p(n, [x[0] for x in i], o)
thunk.inputs = node_input_storage
thunk.outputs = node_output_storage
thunk.perform = p
thunks.append(thunk)
if self.allow_gc:
post_thunk_old_storage.append([storage_map[input]
for input in node.inputs
......
......@@ -2,16 +2,20 @@
The `Op` class is the base interface for all operations
compatible with `gof`'s :doc:`graph` routines.
"""
__docformat__ = "restructuredtext en"
from .. import config
from theano import config
import graph
import numpy
import utils
import warnings
import logging
from theano import config
from env import Env
import graph
import cc
class CLinkerObject(object):
......@@ -323,45 +327,64 @@ class PureOp(object):
"""
node = self.make_node(*inputs, **kwargs)
self.add_tag_trace(node)
if config.compute_test_value:
if config.compute_test_value != 'off':
# avoid circular import
from ..compile.sharedvalue import SharedVariable
from theano.compile.sharedvalue import SharedVariable
run_perform = True
# build test input-values
input_vals = []
for ins in inputs:
for i, ins in enumerate(node.inputs):
if isinstance(ins, graph.Constant):
input_vals.append(ins.value)
elif isinstance(ins,numpy.ndarray):
input_vals.append(ins)
elif isinstance(ins,SharedVariable):
input_vals.append(ins.get_value(borrow=True))
input_vals.append(ins.get_value(borrow=True, return_internal_type=True))
elif isinstance(ins,graph.Variable) and hasattr(ins.tag, 'test_value'):
input_vals.append(ins.tag.test_value)
# ensure that the test value is correct
input_vals.append(ins.type.filter(ins.tag.test_value))
else:
# no test-value was specified, act accordingly
if config.compute_test_value == 'warn':
raise Warning('Cannot compute test value: input %s of Op %s missing default value')
warnings.warn('Warning, Cannot compute test value: input %i (%s) of Op %s missing default value' % (i, ins, node), stacklevel=2)
run_perform = False
elif config.compute_test_value == 'err':
raise ValueError('Cannot compute test value: input %s of Op %s missing default value')
else:
elif config.compute_test_value == 'raise':
raise ValueError('Cannot compute test value: input %i (%s) of Op %s missing default value' % (i, ins, node))
elif config.compute_test_value == 'ignore':
# silently skip test
run_perform = False
else:
raise ValueError('%s is invalid for option config.compute_test_value' % config.compute_test_value)
# if all inputs have test-values, run the actual op
if run_perform:
# Original values should not be destroyed:
# copy the values of the inputs in destroy_map
destroyed_inputs_idx = []
if getattr(node.op, 'destroy_map', None):
for i_pos_list in node.op.destroy_map.itervalues():
destroyed_inputs_idx.extend(i_pos_list)
for i in destroyed_inputs_idx:
input_vals[i] = input_vals[i].copy()
# compute output value once with test inputs to validate graph
output_storage = [[None] * len(node.outputs)]
node.op.perform(node, input_vals, output_storage)
# add 'test_value' to output tags, so that downstream ops can use these
# numerical values as inputs to their perform method.
for (outval, node_output) in zip(output_storage, node.outputs):
node_output.tag.test_value = outval[0]
output_storage = [[None]] * len(node.outputs)
try:
node.op.perform(node, input_vals, output_storage)
# add 'test_value' to output tags, so that downstream ops can use these
# numerical values as inputs to their perform method.
for (outval, node_output) in zip(output_storage, node.outputs):
node_output.tag.test_value = outval[0]
except utils.MethodNotDefined, e:
# This case happens when the perform method is not defined
# for a certain Op.
#TODO: use the c_thunk?
if config.compute_test_value == 'warn':
warnings.warn('Warning, in compute_test_value: %s' % str(e), stacklevel=2)
elif config.compute_test_value == 'raise':
raise
if self.default_output is not None:
return node.outputs[self.default_output]
......@@ -405,4 +428,82 @@ class PureOp(object):
class Op(utils.object2, PureOp, CLinkerOp):
"""Convenience class to bundle `PureOp` and `CLinkerOp`"""
pass
def __new__(cls, *args, **kwargs):
# this function exists to silently and transparently ensure that all
# existing Ops get a _op_use_c_code attribute
obj = object.__new__(cls, *args, **kwargs)
if not hasattr(obj, '_op_use_c_code'):
obj._op_use_c_code = True
return obj
def __init__(self, use_c_code=True):
self._op_use_c_code = use_c_code
def make_thunk(self, node, storage_map, compute_map, no_recycling):
"""
:param node: something previously returned by self.make_node
:param storage_map: dict variable -> one-element-list where a computed
value for this variable may be found.
:param compute_map: dict variable -> one-element-list where a boolean
value will be found. The boolean indicates whether the
variable's storage_map container contains a valid value (True)
or if it has not been computed yet (False).
:param no_recycling: list of variables for which it is forbidden to
reuse memory allocated by a previous call.
:note: If the thunk consults the storage_map on every call, it is safe
for it to ignore the no_recycling argument, because elements of the
no_recycling list will have a value of None in the storage map. If
the thunk can potentially cache return values (like CLinker does),
then it must not do so for variables in the no_recycling list.
"""
logger = logging.getLogger('theano.Op')
node_input_storage = [storage_map[r] for r in node.inputs]
node_output_storage = [storage_map[r] for r in node.outputs]
node_input_compute = [compute_map[r] for r in node.inputs]
node_output_compute = [compute_map[r] for r in node.outputs]
#logger.debug('Compiling node %i of graph' % node_idx)
if self._op_use_c_code:
try:
e = Env(*graph.clone(node.inputs, node.outputs))
e_no_recycling = [new_o
for (new_o, old_o) in zip(e.outputs, node.outputs)
if old_o in no_recycling]
cl = cc.CLinker().accept(e,
no_recycling=e_no_recycling)
logger.debug('Trying CLinker.make_thunk')
fill_storage, node_input_filters, node_output_filters = cl.make_thunk(
input_storage = node_input_storage,
output_storage = node_output_storage)
def rval():
fill_storage()
for o in node.outputs:
compute_map[o][0] = True
rval.cthunk = fill_storage.cthunk
rval.inputs = node_input_storage
rval.outputs = node_output_storage
rval.lazy = False
return rval
except (NotImplementedError, utils.MethodNotDefined):
logger.debug('Falling back on perform')
# condition: either there was no c_code, or it failed
p = node.op.perform
# default arguments are stored in the closure of `rval`
def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
r = p(n, [x[0] for x in i], o)
for o in node.outputs:
compute_map[o][0] = True
return r
rval.inputs = node_input_storage
rval.outputs = node_output_storage
rval.perform = p
rval.lazy = False
return rval
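The perform-based fallback above can be sketched without the graph types. `make_perform_thunk` is a hypothetical helper: storage cells are one-element lists as described in the `make_thunk` docstring, and the outputs' compute flags flip to True after a run:

```python
def make_perform_thunk(perform, input_storage, output_storage, output_compute):
    # Sketch of the perform fallback above: read input cells, let
    # `perform` fill the output cells, then mark outputs as computed.
    def thunk():
        perform([cell[0] for cell in input_storage], output_storage)
        for flag in output_compute:
            flag[0] = True
    thunk.inputs = input_storage
    thunk.outputs = output_storage
    thunk.lazy = False
    return thunk
```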
......@@ -2,97 +2,193 @@ import numpy
import unittest
import theano
import warnings
from theano import config
from theano import tensor as T
from theano.tensor.basic import _allclose
from theano.scan_module import scan
class TestComputeTestValue(unittest.TestCase):
def test_variable_only(self):
theano.config.compute_test_value = True
orig_compute_test_value = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'raise'
x = T.matrix('x')
x.tag.test_value = numpy.random.rand(3,4)
y = T.matrix('y')
y.tag.test_value = numpy.random.rand(4,5)
# should work
z = T.dot(x,y)
x = T.matrix('x')
x.tag.test_value = numpy.random.rand(3,4).astype(config.floatX)
y = T.matrix('y')
y.tag.test_value = numpy.random.rand(4,5).astype(config.floatX)
# this test should fail
y.tag.test_value = numpy.random.rand(6,5)
self.assertRaises(ValueError, T.dot, x, y)
# should work
z = T.dot(x,y)
assert hasattr(z.tag, 'test_value')
f = theano.function([x,y], z)
assert _allclose(f(x.tag.test_value, y.tag.test_value),
z.tag.test_value)
def test_compute_flag(self):
# this test should fail
y.tag.test_value = numpy.random.rand(6,5).astype(config.floatX)
self.assertRaises(ValueError, T.dot, x, y)
finally:
theano.config.compute_test_value = orig_compute_test_value
x = T.matrix('x')
y = T.matrix('y')
y.tag.test_value = numpy.random.rand(4,5)
# should skip computation of test value
theano.config.compute_test_value = False
z = T.dot(x,y)
# should fail one or another when flag is set
theano.config.compute_test_value = 'warn'
self.assertRaises(Warning, T.dot, x, y)
theano.config.compute_test_value = 'err'
self.assertRaises(ValueError, T.dot, x, y)
def test_compute_flag(self):
orig_compute_test_value = theano.config.compute_test_value
try:
x = T.matrix('x')
y = T.matrix('y')
y.tag.test_value = numpy.random.rand(4,5).astype(config.floatX)
# should skip computation of test value
theano.config.compute_test_value = 'off'
z = T.dot(x,y)
assert not hasattr(z.tag, 'test_value')
# should fail when asked by user
theano.config.compute_test_value = 'raise'
self.assertRaises(ValueError, T.dot, x, y)
# test that a warning is raised if required
theano.config.compute_test_value = 'warn'
warnings.simplefilter('error', UserWarning)
self.assertRaises(UserWarning, T.dot, x, y)
finally:
theano.config.compute_test_value = orig_compute_test_value
warnings.resetwarnings()
def test_string_var(self):
theano.config.compute_test_value = True
x = T.matrix('x')
x.tag.test_value = numpy.random.rand(3,4)
y = T.matrix('y')
y.tag.test_value = numpy.random.rand(4,5)
z = theano.shared(numpy.random.rand(5,6))
# should work
out = T.dot(T.dot(x,y), z)
def f(x,y,z):
return T.dot(T.dot(x,y),z)
# this test should fail
z.set_value(numpy.random.rand(7,6))
self.assertRaises(ValueError, f, x, y, z)
orig_compute_test_value = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'raise'
x = T.matrix('x')
x.tag.test_value = numpy.random.rand(3,4).astype(config.floatX)
y = T.matrix('y')
y.tag.test_value = numpy.random.rand(4,5).astype(config.floatX)
z = theano.shared(numpy.random.rand(5,6).astype(config.floatX))
# should work
out = T.dot(T.dot(x,y), z)
assert hasattr(out.tag, 'test_value')
tf = theano.function([x,y], out)
assert _allclose(
tf(x.tag.test_value, y.tag.test_value),
out.tag.test_value)
def f(x,y,z):
return T.dot(T.dot(x,y),z)
# this test should fail
z.set_value(numpy.random.rand(7,6).astype(config.floatX))
self.assertRaises(ValueError, f, x, y, z)
finally:
theano.config.compute_test_value = orig_compute_test_value
def test_shared(self):
theano.config.compute_test_value = True
x = T.matrix('x')
x.tag.test_value = numpy.random.rand(3,4)
y = theano.shared(numpy.random.rand(4,6), 'y')
# should work
z = T.dot(x,y)
# this test should fail
y.set_value(numpy.random.rand(5,6))
self.assertRaises(ValueError, T.dot, x, y)
orig_compute_test_value = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'raise'
x = T.matrix('x')
x.tag.test_value = numpy.random.rand(3,4).astype(config.floatX)
y = theano.shared(numpy.random.rand(4,6).astype(config.floatX), 'y')
# should work
z = T.dot(x,y)
assert hasattr(z.tag, 'test_value')
f = theano.function([x], z)
assert _allclose(f(x.tag.test_value), z.tag.test_value)
# this test should fail
y.set_value(numpy.random.rand(5,6).astype(config.floatX))
self.assertRaises(ValueError, T.dot, x, y)
finally:
theano.config.compute_test_value = orig_compute_test_value
def test_ndarray(self):
theano.config.compute_test_value = True
orig_compute_test_value = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'raise'
x = numpy.random.rand(2,3)
y = theano.shared(numpy.random.rand(3,6), 'y')
# should work
z = T.dot(x,y)
x = numpy.random.rand(2,3).astype(config.floatX)
y = theano.shared(numpy.random.rand(3,6).astype(config.floatX), 'y')
# this test should fail
x = numpy.random.rand(2,4)
self.assertRaises(ValueError, T.dot, x, y)
# should work
z = T.dot(x,y)
assert hasattr(z.tag, 'test_value')
f = theano.function([], z)
assert _allclose(f(), z.tag.test_value)
def test_constant(self):
theano.config.compute_test_value = True
# this test should fail
x = numpy.random.rand(2,4).astype(config.floatX)
self.assertRaises(ValueError, T.dot, x, y)
finally:
theano.config.compute_test_value = orig_compute_test_value
x = T.constant(numpy.random.rand(2,3))
y = theano.shared(numpy.random.rand(3,6), 'y')
# should work
z = T.dot(x,y)
# this test should fail
x = T.constant(numpy.random.rand(2,4))
self.assertRaises(ValueError, T.dot, x, y)
def test_constant(self):
orig_compute_test_value = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'raise'
x = T.constant(numpy.random.rand(2,3), dtype=config.floatX)
y = theano.shared(numpy.random.rand(3,6).astype(config.floatX), 'y')
# should work
z = T.dot(x,y)
assert hasattr(z.tag, 'test_value')
f = theano.function([], z)
assert _allclose(f(), z.tag.test_value)
# this test should fail
x = T.constant(numpy.random.rand(2,4), dtype=config.floatX)
self.assertRaises(ValueError, T.dot, x, y)
finally:
theano.config.compute_test_value = orig_compute_test_value
def test_incorrect_type(self):
orig_compute_test_value = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'raise'
x = T.fmatrix('x')
# Incorrect dtype (float64) for test_value
x.tag.test_value = numpy.random.rand(3,4)
y = T.dmatrix('y')
y.tag.test_value = numpy.random.rand(4,5)
self.assertRaises(TypeError, T.dot, x, y)
finally:
theano.config.compute_test_value = orig_compute_test_value
def notest_scan(self):
"""
Do not run this test as the compute_test_value mechanism is known not to work with Scan.
TODO: fix scan to work with compute_test_value
"""
orig_compute_test_value = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'raise'
k = T.iscalar("k")
A = T.vector("A")
k.tag.test_value = 3
A.tag.test_value = numpy.random.rand(5)
def fx(prior_result, A):
return prior_result * A
# Symbolic description of the result
result, updates = theano.scan(fn=lambda prior_result, A: prior_result * A,
outputs_info=T.ones_like(A),
non_sequences=A,
n_steps=k)
# We only care about A**k, but scan has provided us with A**1 through A**k.
# Discard the values that we don't care about. Scan is smart enough to
# notice this and not waste memory saving them.
final_result = result[-1]
assert hasattr(final_result.tag, 'test_value')
finally:
theano.config.compute_test_value = orig_compute_test_value
......@@ -10,7 +10,10 @@ ls ${COMPILEDIR}|wc -l
FLAGS=warn.argmax_pushdown_bug=False,warn.gpusum_01_011_0111_bug=False,warn.sum_sum_bug=False,warn.sum_div_dimshuffle_bug=False,compiledir=${COMPILEDIR}
export PYTHONPATH=${ROOT_CWD}:$PYTHONPATH
cd ${ROOT_CWD}
cd ${ROOT_CWD}/Theano
hg summary
cd ..
echo "executing nosetests with mode=FAST_COMPILE"
THEANO_FLAGS=${FLAGS},mode=FAST_COMPILE ${NOSETESTS} Theano
echo "nb element in the compiledir:"
......
......@@ -106,20 +106,22 @@ class PycudaElemwiseSourceModuleOp(Op):
otype = CudaNdarrayType(broadcastable=[False]*_inputs[0].type.ndim)
assert self.nout == 1
# TODO: replace the scalar op with the proper c_code!
fct_name = "pycuda_elemwise_%s"%str(self.scalar_op)
out_node = Apply(self, _inputs, [otype() for o in xrange(self.nout)])
in_name = ["i"+str(id) for id in range(len(inputs))]
out_name = ["o"+str(id) for id in range(self.nout)]
c_code = self.scalar_op.c_code(out_node, "some_name", tuple([n+"[i]"for n in in_name]), tuple(n+"[i]"for n in out_name), {})
c_code_param = ", ".join([var.type.dtype_specs()[1]+" *"+name for var,name in zip(inputs,in_name) + zip(out_node.outputs,out_name)])
c_code_param = ", ".join([var.type.dtype_specs()[1]+" *"+name for var,name in zip(inputs,in_name) + zip(out_node.outputs,out_name)]+["int size"])
mod = SourceModule("""
#include<Python.h>
#include <numpy/arrayobject.h>
__global__ void %s(%s)
{
int i = threadIdx.x + threadIdx.y*blockDim.x;
%s
int i = (blockIdx.x+blockIdx.y*gridDim.x)*(blockDim.x*blockDim.y);
i += threadIdx.x + threadIdx.y*blockDim.x;
if(i<size){
%s
}
}
"""%(fct_name,c_code_param,c_code))
self.pycuda_fct = mod.get_function(fct_name)
......@@ -131,7 +133,16 @@ class PycudaElemwiseSourceModuleOp(Op):
z, = out
if z[0] is None or z[0].shape!=inputs[0].shape:
z[0] = theano.sandbox.cuda.CudaNdarray.zeros(inputs[0].shape)
self.pycuda_fct(inputs[0],inputs[1],z[0], block=(inputs[0].shape[0],inputs[0].shape[1],1))
if inputs[0].shape != inputs[1].shape:
raise TypeError("PycudaElemwiseSourceModuleOp: inputs don't have the same shape!")
if inputs[0].size > 512:
grid = (int(numpy.ceil(inputs[0].size / 512.)),1)
block = (512,1,1)
else:
grid = (1,1)
block = (inputs[0].shape[0],inputs[0].shape[1],1)
self.pycuda_fct(inputs[0], inputs[1], z[0], numpy.intc(inputs[1].size), block=block, grid=grid)
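The launch-configuration choice above can be sketched as a pure-Python helper. `choose_launch_config` is a hypothetical name: one 2-D block when the array fits into 512 threads, otherwise a 1-D grid of full 512-thread blocks, relying on the kernel's `if(i<size)` guard for the last partial block:

```python
import math

def choose_launch_config(shape, max_threads=512):
    # Sketch of the grid/block logic above for a 2-D array.
    size = shape[0] * shape[1]
    if size > max_threads:
        # 1-D grid of full blocks; the kernel guards against i >= size.
        grid = (int(math.ceil(size / float(max_threads))), 1)
        block = (max_threads, 1, 1)
    else:
        # The whole array fits in one 2-D block.
        grid = (1, 1)
        block = (shape[0], shape[1], 1)
    return grid, block
```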
class PycudaElemwiseKernelOp(Op):
......
......@@ -24,23 +24,27 @@ else:
mode_without_gpu = theano.compile.mode.get_default_mode().excluding('gpu')
def test_pycuda_elemwise_source_module():
x=T.fmatrix('x')
y=T.fmatrix('y')
f=theano.function([x,y],x*y, mode=mode_with_gpu)
print f.maker.env.toposort()
f2 = theano.function([x,y],x*y, mode=mode_with_gpu.including("local_pycuda_gpu_elemwise"))
print f2.maker.env.toposort()
for shape in [(5,5), (10,49), (50,49),(500,501),(5000,5001)]:
for op in [theano.scalar.basic.mul, theano.scalar.basic.add]:
x=T.fmatrix('x')
y=T.fmatrix('y')
pycuda_op = PycudaElemwiseSourceModuleOp(op)
elemwise_op = theano.tensor.Elemwise(op)
f=theano.function([x,y], elemwise_op(x,y), mode=mode_with_gpu)
f2 = theano.function([x,y], theano.sandbox.cuda.host_from_gpu(pycuda_op(x,y)))
f3 = theano.function([x,y], elemwise_op(x,y),
mode=mode_with_gpu.including("local_pycuda_gpu_elemwise"))
assert any([ isinstance(node.op, theano.sandbox.cuda.GpuElemwise) for node in f.maker.env.toposort()])
assert any([ isinstance(node.op, PycudaElemwiseSourceModuleOp) for node in f2.maker.env.toposort()])
assert any([ isinstance(node.op, theano.sandbox.cuda.GpuElemwise) for node in f.maker.env.toposort()])
assert any([ isinstance(node.op, PycudaElemwiseSourceModuleOp) for node in f2.maker.env.toposort()])
assert any([ isinstance(node.op, PycudaElemwiseSourceModuleOp) for node in f3.maker.env.toposort()])
val1 = numpy.asarray(numpy.random.rand(5,5), dtype='float32')
val2 = numpy.asarray(numpy.random.rand(5,5), dtype='float32')
#val1 = numpy.ones((5,5))
#val2 = numpy.arange(25).reshape(5,5)
assert (f(val1,val2) == f2(val1,val2)).all()
print f(val1,val2)
print f2(val1,val2)
val1 = numpy.asarray(numpy.random.rand(*shape), dtype='float32')
val2 = numpy.asarray(numpy.random.rand(*shape), dtype='float32')
assert (f(val1,val2) == f2(val1,val2)).all()
assert (f(val1,val2) == f3(val1,val2)).all()
#print f(val1,val2)
#print f2(val1,val2)
def test_pycuda_elemwise_kernel():
x=T.fmatrix('x')
......
......@@ -392,8 +392,10 @@ default_colorCodes = {'GpuFromHost' : 'red',
def pydotprint(fct, outfile=None,
compact=True, format='png', with_ids=False,
high_contrast=True, cond_highlight = None, colorCodes = None,
max_label_size=50, scan_graphs = False):
high_contrast=True, cond_highlight=None, colorCodes=None,
max_label_size=50, scan_graphs=False,
var_with_name_simple=False
):
"""
Print, to a file in PNG format, the graph of ops of a compiled Theano function.
......@@ -493,14 +495,20 @@ def pydotprint(fct, outfile=None,
return var_str[var]
if var.name is not None:
varstr = 'name='+var.name+" "+str(var.type)
if var_with_name_simple:
varstr = var.name
else:
varstr = 'name='+var.name+" "+str(var.type)
elif isinstance(var,gof.Constant):
dstr = 'val='+str(numpy.asarray(var.data))
if '\n' in dstr:
dstr = dstr[:dstr.index('\n')]
varstr = '%s [%s]'% (dstr, str(var.type))
varstr = '%s %s'% (dstr, str(var.type))
elif var in input_update and input_update[var].variable.name is not None:
varstr = input_update[var].variable.name+" "+str(var.type)
if var_with_name_simple:
varstr = input_update[var].variable.name
else:
varstr = input_update[var].variable.name+" "+str(var.type)
else:
#a var id is needed as otherwise var with the same type will be merged in the graph.
varstr = str(var.type)
......@@ -667,7 +675,8 @@ def pydotprint_variables(vars,
format='png',
depth=-1,
high_contrast=True, colorCodes=None,
max_label_size=50):
max_label_size=50,
var_with_name_simple=False):
''' Identical to pydotprint, except that it starts from a variable instead
of a compiled function. '''
......@@ -692,12 +701,15 @@ def pydotprint_variables(vars,
return var_str[var]
if var.name is not None:
varstr = 'name='+var.name
if var_with_name_simple:
varstr = var.name
else:
varstr = 'name='+var.name+" "+str(var.type)
elif isinstance(var,gof.Constant):
dstr = 'val='+str(var.data)
if '\n' in dstr:
dstr = dstr[:dstr.index('\n')]
varstr = '%s [%s]'% (dstr, str(var.type))
varstr = '%s %s'% (dstr, str(var.type))
else:
#a var id is needed as otherwise var with the same type will be merged in the graph.
varstr = str(var.type)
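The same labelling branch appears in both pydotprint and pydotprint_variables: with the new var_with_name_simple flag only the bare variable name is shown, otherwise the name is combined with the type, and unnamed variables fall back to the type string. A minimal pure-Python sketch of that rule (var_label is a hypothetical helper, not part of the patch):

```python
def var_label(name, type_str, simple=False):
    # Sketch of the pydotprint node-label rule added in this commit:
    # var_with_name_simple=True -> just the name; otherwise name + type.
    if name is not None:
        return name if simple else 'name=' + name + ' ' + type_str
    # unnamed variables are labelled by their type alone
    return type_str
```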
......
......@@ -154,8 +154,11 @@ outdated!""")
import cuda_ndarray
def use(device, force=False, default_to_move_computation_to_gpu = True,
move_shared_float32_to_gpu = True):
def use(device,
force=False,
default_to_move_computation_to_gpu=True,
move_shared_float32_to_gpu=True,
enable_cuda=True):
global cuda_enabled, cuda_initialization_error_message
if force and not cuda_available and device.startswith('gpu'):
raise EnvironmentError("You forced use of device %s, but CUDA initialization failed "
......@@ -191,7 +194,9 @@ def use(device, force=False, default_to_move_computation_to_gpu = True,
if move_shared_float32_to_gpu:
handle_shared_float32(True)
use.device_number = device
cuda_enabled = True
if enable_cuda:
cuda_enabled = True
print >> sys.stderr, "Using gpu device %d: %s" % (active_device_number(), active_device_name())
except (EnvironmentError, ValueError), e:
_logger.error(("ERROR: Not using GPU."
......@@ -251,4 +256,5 @@ elif config.init_gpu_device:
use(device=config.init_gpu_device,
force=config.force_device,
default_to_move_computation_to_gpu=False,
move_shared_float32_to_gpu=False)
move_shared_float32_to_gpu=False,
enable_cuda=False)
......@@ -2049,6 +2049,7 @@ CudaNdarray_gpu_shutdown(PyObject* _unused, PyObject* _unused_args) {
PyObject *
CudaNdarray_from_gpu_pointer(PyObject* _unused, PyObject* args)
{
int verbose = 0;
PyObject *gpu_ptr = NULL;
PyObject *shapes = NULL;
PyObject *strides = NULL;
......@@ -2062,7 +2063,7 @@ CudaNdarray_from_gpu_pointer(PyObject* _unused, PyObject* args)
if (! PyArg_ParseTuple(args, "OOOO", &gpu_ptr, &shapes, &strides, &base))
return NULL;
printf("In CudaNdarray_from_gpu_pointer\n");
if (verbose) printf("In CudaNdarray_from_gpu_pointer\n");
if (!PyLong_Check(gpu_ptr))
{
PyErr_Format(PyExc_Exception, "CudaNdarray_from_gpu_pointer: The gpu pointer is not a long");
......@@ -2133,7 +2134,7 @@ CudaNdarray_from_gpu_pointer(PyObject* _unused, PyObject* args)
Py_DECREF(dim_);
Py_DECREF(strd_);
}
printf("CudaNdarray_from_gpu_pointer normal return\n");
if (verbose) printf("CudaNdarray_from_gpu_pointer normal return\n");
return rval;
}
......@@ -2188,7 +2189,7 @@ CudaNdarray_Dot(PyObject* _unused, PyObject* args)
}
static PyObject *
filter(PyObject* __unsed_self, PyObject *args) // args = (data, broadcastable, strict)
filter(PyObject* __unsed_self, PyObject *args) // args = (data, broadcastable, strict, storage)
{
/*
* TODO: DOC what this function should do in the various cases of
......@@ -2282,10 +2283,10 @@ filter(PyObject* __unsed_self, PyObject *args) // args = (data, broadcastable, s
Py_DECREF(rval);
rval = NULL;
}
Py_DECREF(data);
Py_DECREF(py_data);
Py_DECREF(broadcastable);
}
Py_DECREF(data);
Py_DECREF(py_data);
Py_DECREF(broadcastable);
return (PyObject*)rval;
}
}
......@@ -2490,6 +2491,11 @@ CudaNdarray_new_nd(int nd)
return (PyObject *) rval;
}
/**
* Initialize 'self' as a view of 'base', with memory storage 'data'
*/
int CudaNdarray_set_device_data(CudaNdarray * self, float * data, PyObject * base)
{
if (self->data_allocated)
......@@ -2503,12 +2509,15 @@ int CudaNdarray_set_device_data(CudaNdarray * self, float * data, PyObject * bas
}
}
// Get the original base object (base.base.base...)
// TODO: check that base is indeed a CudaNdarray?
PyObject * orig_base = base;
while (((CudaNdarray*) orig_base)->base)
// base is not always a CudaNdarray. It can be a GpuArray from pycuda, ...
if (orig_base && CudaNdarray_Check(orig_base))
{
// base_base is itself a view
orig_base = ((CudaNdarray*) orig_base)->base;
while (((CudaNdarray*) orig_base)->base)
{
// base_base is itself a view
orig_base = ((CudaNdarray*) orig_base)->base;
}
}
//N.B. XDECREF and XINCREF are no-ops for NULL pointers
if (self->base != orig_base)
......
......@@ -26,7 +26,7 @@ typedef float real;
#endif
#ifndef SHARED_SIZE
#ifndef SHARED_SIZE
#define SHARED_SIZE (16*1024)
#endif
......@@ -48,10 +48,10 @@ static T ceil_intdiv(T a, T b)
/**
* struct CudaNdarray
*
* This is a Python type.
* This is a Python type.
*
*/
struct CudaNdarray
struct CudaNdarray
{
PyObject_HEAD
......@@ -65,40 +65,46 @@ struct CudaNdarray
/* Type-specific fields go here. */
//GpuTensorType::VoidTensor * vt;
int nd; //the number of dimensions of the tensor
// Client should access host_structure via CudaNdarray_HOST_DIMS / CudaNdarray_HOST_STRIDES macros
// Client should access host_structure via CudaNdarray_HOST_DIMS / CudaNdarray_HOST_STRIDES macros
int * host_structure; //dim0, dim1, ... stride0, stride1, ...
int data_allocated; //the number of bytes allocated for devdata
//device pointers (allocated by cudaMalloc)
int dev_structure_fresh;
//dev_structure should be accessed via macros, otherwise may not be synchronized
int * dev_structure; //dim0, dim1, ..., stride0, stride1, ...
//dev_structure should be accessed via macros, otherwise may not be synchronized
int * dev_structure; //dim0, dim1, ..., stride0, stride1, ...
real* devdata; //pointer to data element [0,..,0].
};
/*
* Return a CudaNdarray whose 'nd' dimensions are all 0.
*/
PyObject *
PyObject *
CudaNdarray_New(int nd=-1);
/**
* Return 1 for a CudaNdarray (or subclass), otherwise 0
*/
int
int
CudaNdarray_Check(const PyObject * ob);
/**
* Return 1 if ob is exactly a CudaNdarray (no subclass), otherwise 0
*/
int
int
CudaNdarray_CheckExact(const PyObject * ob);
/**
* Return true for a C-contiguous CudaNdarray, else false
*/
bool
CudaNdarray_is_c_contiguous(const CudaNdarray * self);
/****
* Returns the number of elements necessary in host_structure and dev_structure for a given number of dimensions.
*/
int
int
cnda_structure_size(int nd)
{
// dim0, dim1, ...
......@@ -107,23 +113,23 @@ cnda_structure_size(int nd)
return nd + nd + nd;
}
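cnda_structure_size reserves room for three int vectors per dimension (dims, strides, log2dims), and the HOST_* accessors below index into that one flat buffer at offsets 0, nd and 2*nd. The layout can be sketched in Python (function and variable names here are illustrative, not part of the C API):

```python
def structure_size(nd):
    # dim0..dimN, stride0..strideN, log2dim0..log2dimN
    return nd + nd + nd

def host_dims(buf, nd):
    return buf[0:nd]          # CudaNdarray_HOST_DIMS

def host_strides(buf, nd):
    return buf[nd:2 * nd]     # CudaNdarray_HOST_STRIDES

def host_log2dims(buf, nd):
    return buf[2 * nd:3 * nd] # CudaNdarray_HOST_LOG2DIMS

# example flat host_structure for a 2-d array of shape (5, 3)
buf = [5, 3, 12, 4, 3, 2]
```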
const int *
const int *
CudaNdarray_HOST_DIMS(const CudaNdarray * self)
{
return self->host_structure;
}
const int *
const int *
CudaNdarray_HOST_STRIDES(const CudaNdarray * self)
{
return self->host_structure + self->nd;
}
const int *
const int *
CudaNdarray_HOST_LOG2DIMS(const CudaNdarray * self)
{
return self->host_structure + 2*self->nd;
}
void
void
cnda_mark_dev_structure_dirty(CudaNdarray * self)
{
self->dev_structure_fresh = 0;
......@@ -190,7 +196,7 @@ CudaNdarray_Equal(CudaNdarray *cnda1, CudaNdarray *cnda2)
*
* Does not sync structure to host.
*/
void
void
CudaNdarray_set_dim(CudaNdarray * self, int idx, int d)
{
if ((idx >= self->nd) || (idx < 0) || (d < 0))
......@@ -206,7 +212,7 @@ CudaNdarray_set_dim(CudaNdarray * self, int idx, int d)
cnda_mark_dev_structure_dirty(self);
}
}
void
void
CudaNdarray_set_stride(CudaNdarray * self, int idx, int s)
{
if ((idx >= self->nd) || (idx < 0))
......@@ -225,7 +231,7 @@ CudaNdarray_set_stride(CudaNdarray * self, int idx, int s)
*
* This means: recalculate the log2dims and transfer structure to the card
*/
int
int
cnda_copy_structure_to_device(CudaNdarray * self)
{
cublasSetVector(cnda_structure_size(self->nd), sizeof(int), self->host_structure, 1, self->dev_structure, 1);
......@@ -239,7 +245,7 @@ cnda_copy_structure_to_device(CudaNdarray * self)
return 0;
}
const int *
const int *
CudaNdarray_DEV_DIMS(CudaNdarray * self)
{
if (!self->dev_structure_fresh)
......@@ -249,7 +255,7 @@ CudaNdarray_DEV_DIMS(CudaNdarray * self)
}
return self->dev_structure;
}
const int *
const int *
CudaNdarray_DEV_STRIDES(CudaNdarray * self)
{
if (!self->dev_structure_fresh)
......@@ -259,7 +265,7 @@ CudaNdarray_DEV_STRIDES(CudaNdarray * self)
}
return self->dev_structure + self->nd;
}
const int *
const int *
CudaNdarray_DEV_LOG2DIMS(CudaNdarray * self)
{
if (!self->dev_structure_fresh)
......@@ -269,7 +275,7 @@ CudaNdarray_DEV_LOG2DIMS(CudaNdarray * self)
}
return self->dev_structure + 2*self->nd;
}
float *
float *
CudaNdarray_DEV_DATA(const CudaNdarray * self)
{
return self->devdata;
......@@ -278,7 +284,7 @@ CudaNdarray_DEV_DATA(const CudaNdarray * self)
/**
* Return the number of elements in the ndarray (product of the dimensions)
*/
int
int
CudaNdarray_SIZE(const CudaNdarray *self)
{
if (self->nd == -1) return 0;
......@@ -289,7 +295,7 @@ CudaNdarray_SIZE(const CudaNdarray *self)
}
return size;
}
static PyObject *
static PyObject *
CudaNdarray_SIZE_Object(const CudaNdarray *self, void *closure)
{
return PyInt_FromLong(CudaNdarray_SIZE(self));
......@@ -320,7 +326,7 @@ int CudaNdarray_set_nd(CudaNdarray * self, const int nd)
}
self->dev_structure = NULL;
}
if (self->host_structure)
if (self->host_structure)
{
free(self->host_structure);
self->host_structure = NULL;
......@@ -386,29 +392,41 @@ int CudaNdarray_alloc_contiguous(CudaNdarray *self, const int nd, const inttype
size = size * dim[i];
}
if (self->data_allocated != size)
if (CudaNdarray_is_c_contiguous(self) && (self->data_allocated == size))
{
if (device_free(self->devdata))
{
// Does this ever happen?? Do we need to set data_allocated or devdata to 0?
return -1;
}
assert(size>0);
self->devdata = (float*)device_malloc(size*sizeof(real));
if (!self->devdata)
{
CudaNdarray_set_nd(self,-1);
self->data_allocated = 0;
self->devdata = 0;
return -1;
}
if (0)
fprintf(stderr,
"Allocated devdata %p (self=%p)\n",
self->devdata,
self);
self->data_allocated = size;
return 0;
}
// The structure of self will be reused with newly allocated memory.
// If self was a view, we should remove the reference to its base.
// (If base was already NULL, the following has no effect.)
Py_XDECREF(self->base);
self->base = NULL;
// If self is a view, do not try to free its memory
if (self->data_allocated && device_free(self->devdata))
{
self->devdata = NULL;
self->data_allocated = 0;
return -1;
}
assert(size>0);
self->devdata = (float*)device_malloc(size*sizeof(real));
if (!self->devdata)
{
CudaNdarray_set_nd(self,-1);
self->data_allocated = 0;
self->devdata = 0;
return -1;
}
if (0)
fprintf(stderr,
"Allocated devdata %p (self=%p)\n",
self->devdata,
self);
self->data_allocated = size;
return 0;
}
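The rewritten CudaNdarray_alloc_contiguous above first checks whether the current buffer can be reused (already C-contiguous and of the right size) and only frees and reallocates otherwise, also dropping the base reference so that a former view comes to own its fresh memory. The control flow can be sketched in Python (alloc/free stand in for device_malloc/device_free; the dict fields are illustrative):

```python
def alloc_contiguous(state, size, alloc, free):
    # Sketch of the reuse-or-reallocate policy in this hunk.
    # state holds 'data', 'allocated', 'contiguous' and 'base'.
    if state['contiguous'] and state['allocated'] == size:
        return 0  # structure reused with the existing memory
    # self will own newly allocated memory, so it is no longer a view
    state['base'] = None
    # only free memory that self itself allocated
    if state['allocated']:
        free(state['data'])
    state['data'] = alloc(size)
    state['allocated'] = size
    state['contiguous'] = True
    return 0
```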
......@@ -416,7 +434,7 @@ int CudaNdarray_alloc_contiguous(CudaNdarray *self, const int nd, const inttype
* Return a CudaNdarray whose 'nd' dimensions are set to dims, and allocated.
*/
template<typename inttype>
PyObject *
PyObject *
CudaNdarray_NewDims(int nd, const inttype * dims)
{
CudaNdarray * rval = (CudaNdarray*)CudaNdarray_New();
......@@ -440,7 +458,7 @@ CudaNdarray_NewDims(int nd, const inttype * dims)
int CudaNdarray_set_device_data(CudaNdarray * self, float * data, PyObject * base);
int CudaNdarray_set_device_data(CudaNdarray * self, float * data, CudaNdarray * base)
{
return CudaNdarray_set_device_data(self, data, (PyObject *) base);
return CudaNdarray_set_device_data(self, data, (PyObject *) base);
}
/**
......@@ -475,10 +493,10 @@ int CudaNdarray_CopyFromCudaNdarray(CudaNdarray * self, CudaNdarray * other, boo
/**
* Transfer the contents of CudaNdarray `self` to a new numpy ndarray.
*/
PyObject *
PyObject *
CudaNdarray_CreateArrayObj(CudaNdarray * self);
PyObject *
PyObject *
CudaNdarray_ZEROS(int n, int * dims);
/**
......@@ -499,7 +517,7 @@ int CudaNdarray_dimshuffle(CudaNdarray * self, unsigned int len, const int * pat
void fprint_CudaNdarray(FILE * fd, const CudaNdarray *self)
{
fprintf(fd, "CudaNdarray <%p, %p> nd=%i dev_structure_fresh=%d data_allocated=%d\n",
self, self->devdata, self->nd, self->dev_structure_fresh, self->data_allocated);
self, self->devdata, self->nd, self->dev_structure_fresh, self->data_allocated);
fprintf(fd, "\tHOST_DIMS: ");
for (int i = 0; i < self->nd; ++i)
{
......@@ -510,23 +528,23 @@ void fprint_CudaNdarray(FILE * fd, const CudaNdarray *self)
{
fprintf(fd, "%i\t", CudaNdarray_HOST_STRIDES(self)[i]);
}
int data=0;
fprintf(fd, "\n\tDEV_DIMS: ");
for (int i = 0; i < self->nd; ++i)
{
cublasGetVector(1, sizeof(int),
self->dev_structure+i, 1,
&data, 1);
fprintf(fd, "%i\t", data);
self->dev_structure+i, 1,
&data, 1);
fprintf(fd, "%i\t", data);
}
fprintf(fd, "\n\tDEV_STRIDES: ");
for (int i = 0; i < self->nd; ++i)
{
cublasGetVector(1, sizeof(int),
self->dev_structure + self->nd+i, 1,
&data, 1);
fprintf(fd, "%i \t", data);
self->dev_structure + self->nd+i, 1,
&data, 1);
fprintf(fd, "%i \t", data);
}
fprintf(fd, "\n");
}
......
......@@ -37,7 +37,7 @@ def get_str_list_logical_scalar(node, value_str='ii_i%i_value', data_str='ii_i%i
class NaiveAlgo(object):
verbose = 0 # 1, 2 or 3 for more verbose output.
cache_version = ()
cache_version = ('debug', 14, verbose)
cache_version = (14, verbose)
def __init__(self, scalar_op, sync=True, inplace_pattern={}):
"""
......@@ -56,7 +56,7 @@ class NaiveAlgo(object):
print >> sio, "// Input ", ipos, str(i.type)
for ipos, i in enumerate(node.outputs):
print >> sio, "// Output ", ipos, str(i.type)
print >> sio, "static __global__ void kernel_%s_%s_%s_%s(unsigned int numEls" %(self.scalar_op.__class__.__name__,nodename, id(self), nd)
print >> sio, "static __global__ void kernel_%s_%s_%s(unsigned int numEls" % (self.scalar_op.__class__.__name__,nodename, nd)
if (nd):
print >> sio, "\t,", ", ".join("const int dim%i" % i for i in xrange(nd))
#declare inputs
......@@ -159,10 +159,9 @@ class NaiveAlgo(object):
print >> sio, "// Input ", ipos, str(i.type)
for ipos, i in enumerate(node.outputs):
print >> sio, "// Output ", ipos, str(i.type)
print >> sio, "static __global__ void kernel_%s_%s_%s_%s(unsigned int numEls" %(
print >> sio, "static __global__ void kernel_%s_%s_%s(unsigned int numEls" %(
self.scalar_op.__class__.__name__,
nodename,
id(self),
'tiling%i'%nd)
if (nd):
print >> sio, "\t,", ", ".join("const int dim%i" % i for i in xrange(nd))
......@@ -262,10 +261,9 @@ class NaiveAlgo(object):
print >> sio, "// Input ", ipos, str(i.type)
for ipos, i in enumerate(node.outputs):
print >> sio, "// Output ", ipos, str(i.type)
print >> sio, "static __global__ void kernel_%s_%s_%s_%s(unsigned int numEls" %(
print >> sio, "static __global__ void kernel_%s_%s_%s(unsigned int numEls" %(
self.scalar_op.__class__.__name__,
nodename,
id(self),
'tiling%i_less_registers'%nd)
if (nd):
print >> sio, "\t,", ", ".join("const int dim%i" % i for i in xrange(nd))
......@@ -472,7 +470,6 @@ class NaiveAlgo(object):
nd = node.outputs[0].type.ndim
nb_inputs = len(node.inputs)
nb_outputs = len(node.outputs)
id_self = id(self)
d = dict()
#input_params and output_params go into the function declaration/definition
input_params = ", ".join("const float * i%i_data, const int * i%i_str"%(ipos, ipos)
......@@ -512,7 +509,7 @@ class NaiveAlgo(object):
""" %locals()
if self.verbose:
print >> sio, """
std::cerr << "calling kernel_%(scalar_op)s_%(nodename)s_%(id_self)s w numEls" << numEls << " dims"<< d << "\\n";
std::cerr << "calling kernel_%(scalar_op)s_%(nodename)s w numEls" << numEls << " dims"<< d << "\\n";
""" %locals()
print >> sio, 'std::cerr << ' + " << ' ' << ".join(['" "']+list("dims[%i]"%di
for di in xrange(nd)) + ["'\\n';"])
......@@ -693,7 +690,7 @@ nd_collapse_[i]=0;
print >> sio, 'std::cerr << " local_ostr %(ipos)s: " <<'%locals()+' << " " << '.join(["local_ostr[%(ipos)s][%(x)s]"%locals() for x in range(nd)])+'<<"\\n";'
def launch_Ccontiguous(nodename, id_self, scalar_op, sync=True):
def launch_Ccontiguous(nodename, scalar_op, sync=True):
kernel_call_args = ["numEls"]
for ipos in xrange(len(node.inputs)):
kernel_call_args.append("i%i_data"%ipos)
......@@ -736,7 +733,7 @@ nd_collapse_[i]=0;
else:
print >> sio, " return 0; " %locals()
def launch_General(nodename, id_self, scalar_op, force_nd, sync=True):
def launch_General(nodename, scalar_op, force_nd, sync=True):
# kernel_call_args are used to invoke the cuda kernel
local="local_"
kernel_call_args = ["numEls"]
......@@ -769,7 +766,7 @@ nd_collapse_[i]=0;
if (threads_per_block * n_blocks < numEls)
threads_per_block = std::min(numEls/n_blocks, (unsigned int)NUM_VECTOR_OP_THREADS_PER_BLOCK);
kernel_%(scalar_op)s_%(nodename)s_%(id_self)s_%(force_nd)s<<<n_blocks, threads_per_block>>>(%(kernel_call_args)s);
kernel_%(scalar_op)s_%(nodename)s_%(force_nd)s<<<n_blocks, threads_per_block>>>(%(kernel_call_args)s);
""" %locals()
if sync:
print >> sio, """
......@@ -791,11 +788,11 @@ nd_collapse_[i]=0;
print >> sio, "if(numEls==0) return 0;"
print >> sio, "switch (nd_collapse==0?0:min(%(nd)s,nd_collapse)) {"%locals()
print >> sio, "case 0: {"
launch_Ccontiguous(nodename, id_self, scalar_op, self.sync)
launch_Ccontiguous(nodename, scalar_op, self.sync)
print >> sio, " } break;"
for i in range(1, nd+1):
print >> sio, "case "+str(i)+": {"
launch_General(nodename, id_self, scalar_op, i, self.sync)
launch_General(nodename, scalar_op, i, self.sync)
print >> sio, " } break;"
print >> sio, "}"#end case
......
......@@ -553,7 +553,7 @@ def local_gpu_advanced_incsubtensor1(node):
gpu_from_host(y), *coords)]
# Should not execute for GpuAdvancedIncSubtensor1
if node.op.__class__ is tensor.AdvancedSubtensor1:
if node.op.__class__ is tensor.AdvancedSubtensor1 and node.inputs[0].dtype=="float32":
x, y = node.inputs[0:2]
coords = node.inputs[2:]
go_gpu = False
......@@ -585,7 +585,7 @@ def local_gpu_incsubtensor(node):
gpu_from_host(x),
gpu_from_host(y),
*coords)]
if type(node.op) == tensor.IncSubtensor:
if type(node.op) == tensor.IncSubtensor and node.inputs[0].dtype=="float32":
x, y = node.inputs[0:2]
assert isinstance(x.type, tensor.TensorType)
assert isinstance(y.type, tensor.TensorType)
......
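Both optimizers in this hunk gained the same guard: an IncSubtensor or AdvancedSubtensor1 node is only moved to the GPU when its input dtype is float32, since CudaNdarray stores float32 data only. The predicate amounts to (should_move_to_gpu is a hypothetical name for illustration):

```python
def should_move_to_gpu(op_matches, dtype):
    # CudaNdarray supports only float32; anything else stays on the CPU.
    return op_matches and dtype == "float32"
```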
......@@ -318,11 +318,11 @@ def test_elemwise3():
a = tcn.shared_constructor(theano._asarray(numpy.random.rand(*shape), dtype='float32'), 'a')
b = tensor.fvector()
print b.type
print tensor.constant(1).type
print (1 + b).type
print (1 + b**a).type
print tensor.exp((1 + b**a)).type
f = pfunc([b], [], updates=[(a, (a+b).dimshuffle([2,0,3,1]) * tensor.exp(1 +
fone = tensor.constant(1, dtype='float32')
print (fone + b).type
print (fone + b**a).type
print tensor.exp((fone + b**a)).type
f = pfunc([b], [], updates=[(a, (a+b).dimshuffle([2,0,3,1]) * tensor.exp(fone +
b**a).dimshuffle([2,0,3,1]))], mode=mode_with_gpu)
has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()):
......
......@@ -61,7 +61,7 @@ class Kouh2008(object):
dtype = x_list[0].dtype
n_terms = len(x_list)
def shared_uniform(low, high, size, name):
def shared_uniform(low, high, size, name):
return _shared_uniform(rng, low, high, size, dtype, name)
use_softmax_w = True
......@@ -86,7 +86,7 @@ class Kouh2008(object):
raise ValueError('exponent range must have low <= high')
p_unbounded = shared_uniform(low=-0.1, high=0.1, size=(n_out,), name='p')
q_unbounded = shared_uniform(low=-0.1, high=0.1, size=(n_out,), name='q')
q_unbounded = shared_uniform(low=-0.1, high=0.1, size=(n_out,), name='q')
r_unbounded = shared_uniform(low=-0.1, high=0.1, size=(n_out,), name='r')
k_unbounded = shared_uniform(low=-0.2, high=0.2, size=(n_out,), name='k') # biases
......@@ -122,7 +122,7 @@ class Kouh2008(object):
"""Return a KouhLayer instance with random parameters
The parameters are drawn on a range [typically] suitable for fine-tuning by gradient
descent.
descent.
:param input: a tensor of shape (n_examples, n_in)
......@@ -137,7 +137,7 @@ class Kouh2008(object):
many 'simple cell' responses.
:param eps: this amount is added to the softplus of filter responses as a baseline
firing rate (that prevents a subsequent error from ``pow(0, p)``)
firing rate (that prevents a subsequent error from ``pow(0, p)``)
:returns: KouhLayer instance with freshly-allocated random weights.
......@@ -149,7 +149,7 @@ class Kouh2008(object):
dtype = input.dtype
_logger.debug('dtype %s' % dtype)
def shared_uniform(low, high, size, name):
def shared_uniform(low, high, size, name):
return _shared_uniform(rng, low, high, size, dtype, name)
f_list = [shared_uniform(low=-2.0/numpy.sqrt(n_in), high=2.0/numpy.sqrt(n_in), size=(n_in, n_out), name='f_%i'%i)
......@@ -232,7 +232,7 @@ class Config(object):
if dtype2=='floatX':
import theano.config as c
dtype2 = c.config.get('scalar.floatX')
rng_seed = 23498
n_hid = 300
......@@ -273,7 +273,7 @@ if 0:
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.cuda_enabled == False:
if not cuda_ndarray.cuda_available:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda
theano.sandbox.cuda.use()
......
......@@ -5,7 +5,7 @@ import numpy
from nose.plugins.skip import SkipTest
from theano.compile.pfunc import pfunc
from theano import tensor
from theano import config, tensor
import theano
import theano.sandbox.cuda as cuda
......@@ -35,18 +35,28 @@ def test_no_shared_var_graph():
assert any(isinstance(x.op, cuda.HostFromGpu) for x in l)
def test_int_pow():
a = CudaNdarrayType([False])()
# This is to ensure that '4' does not upcast to float64.
if config.cast_policy == 'numpy+floatX':
floatX_backup = config.floatX
config.floatX = 'float32'
f = theano.function([a], (a*4).sum(), mode=mode_with_gpu)
try:
a = CudaNdarrayType([False])()
op_names = [n.op.__class__.__name__ for n in f.maker.env.toposort()]
assert op_names == ['GpuSum', 'GpuElemwise', 'HostFromGpu']
f = theano.function([a], (a*4).sum(), mode=mode_with_gpu)
f = theano.function([a], tensor.pow(a,4).sum(), mode=mode_with_gpu)
op_names = [n.op.__class__.__name__ for n in f.maker.env.toposort()]
assert op_names == ['GpuElemwise', 'GpuSum', 'HostFromGpu']
op_names = [n.op.__class__.__name__ for n in f.maker.env.toposort()]
assert op_names == ['GpuSum', 'GpuElemwise', 'HostFromGpu']
#theano.printing.debugprint(f)
f = theano.function([a], tensor.pow(a,4).sum(), mode=mode_with_gpu)
op_names = [n.op.__class__.__name__ for n in f.maker.env.toposort()]
assert op_names == ['GpuElemwise', 'GpuSum', 'HostFromGpu']
#theano.printing.debugprint(f)
finally:
if config.cast_policy == 'numpy+floatX':
config.floatX = floatX_backup
def test_gpualloc():
'''
......@@ -144,7 +154,8 @@ def test_opt_gpujoin_joinvectors_elemwise_then_minusone():
def test_print_op():
""" Test that print ops don't block gpu optimization"""
b = tensor.fmatrix()
f = theano.function([b],theano.printing.Print()(b)*2, mode=mode_with_gpu)
ftwo = tensor.constant(2, dtype='float32')
f = theano.function([b],theano.printing.Print()(b) * ftwo, mode=mode_with_gpu)
#theano.printing.debugprint(f)
#print f.maker.env.toposort()
#[GpuFromHost(<TensorType(float32, matrix)>), <theano.printing.Print object at 0x3581210>(GpuFromHost.0), GpuElemwise{mul}(CudaNdarray{[[ 2.]]}, <theano.printing.Print object at 0x3581210>.0), HostFromGpu(GpuElemwise{mul}.0)]
......
from ops import (cholesky, matrix_inverse, solve,
diag, extract_diag, alloc_diag,
det, PSD_hint,
trace, spectral_radius_bound)