Commit 7a0b8177 authored by James Bergstra

merge nc

@@ -5,6 +5,141 @@
Release Notes
=============
Theano 0.4.0 (2011-06-13)
=========================
Change in output memory storage for Ops:
If you implement custom Ops, with either a C or a Python implementation,
this will concern you.
The contract for memory storage of Ops has been changed. In particular,
it is no longer guaranteed that output memory buffers are either empty
or allocated by a previous execution of the same Op.
Right now, the situation is as follows:
* For Python implementations (perform), what is inside output_storage
may have been allocated from outside the perform() function, for
instance by another node (e.g., Scan) or by the Mode. In that case,
the memory can be assumed to be C-contiguous (for the moment).
* For C implementations (c_code), nothing has changed yet.
In a future version, the content of the output storage, for both the Python
and C versions, will either be NULL or come with the following guarantees:
* It will be a Python object of the appropriate Type (for instance, a
numpy.ndarray for a Tensor variable, or a CudaNdarray for a GPU variable)
* It will have the correct number of dimensions and the correct dtype
However, its shape and memory layout (strides) will not be guaranteed.
When that change is made, the config flag DebugMode.check_preallocated_output
will help you find implementations that are not up to date.
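For illustration, a perform() implementation that is robust under both the
current and the future contract could look like the following sketch (not
taken from Theano itself; the Op and its computation are hypothetical):

.. code-block:: python

    import numpy

    def perform(self, node, inputs, output_storage):
        x, = inputs
        z = output_storage[0][0]
        # Never assume the buffer was allocated by a previous run of this
        # Op: reuse it only if it has the right type, dtype and shape.
        if (z is None or not isinstance(z, numpy.ndarray) or
                z.dtype != x.dtype or z.shape != x.shape):
            z = numpy.empty_like(x)
            output_storage[0][0] = z
        z[...] = 2 * x  # hypothetical computation, written into the buffer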
Deprecation:
* The tag.shape attribute is deprecated (#633)
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
* Dividing integers with / is deprecated: use // for integer division, or
cast one of the integers to a float type if you want a float result (you may
also change this behavior with config.int_division); see the sketch below.
* Removed the (already deprecated) sandbox/compile module
* Removed the (already deprecated) incsubtensor and setsubtensor functions;
use inc_subtensor and set_subtensor instead (see the sketch below).
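As an illustration, the replacement idioms look roughly like this (a sketch
against the 0.4.0 API; the variables are made up):

.. code-block:: python

    import theano
    import theano.tensor as T

    i, j = T.iscalars('i', 'j')
    q = i // j                     # integer division; / on integers is deprecated
    r = T.cast(i, 'float64') / j   # cast first if a float result is wanted

    x = T.vector('x')
    y = T.set_subtensor(x[:2], 0)  # replaces the removed setsubtensor
    z = T.inc_subtensor(x[1:], 1)  # replaces the removed incsubtensor
    f = theano.function([x, i, j], [q, r, y, z])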
Bugs fixed:
* In CudaNdarray.__{iadd,idiv}__, return the error when the operation is not
implemented.
* THEANO_FLAGS='optimizer=None' now works as expected
* Fixed a memory leak in error handling on GPU-to-host copy
* Fixes specific to Python 2.7 on Mac OS X
* infer_shape can now handle Python longs
* Trying to compute x % y with one or more arguments being complex now
raises an error.
* The output of random samples computed with uniform(..., dtype=...) is
guaranteed to be of the specified dtype instead of potentially being of a
higher-precision dtype.
* The perform() method of DownsampleFactorMax did not give the right result
when reusing output storage. This happened only with the Theano flag
'linker=c|py_nogc', or when the mode was manually set to use that linker.
Crashes fixed:
* Worked around a bug in gcc 4.3.0 that made the compilation of 2d
convolution crash.
* Some optimizations crashed when the "ShapeOpt" optimization was disabled.
Optimization:
* Optimize all cases of a Subtensor followed by another Subtensor (they are
merged into a single Subtensor).
GPU:
* Fused elemwise operations that contain dtypes other than float32 (except
float64) are now moved to the GPU if their inputs and outputs are float32.
* This allows moving elemwise comparisons to the GPU when the result is
cast to float32 afterwards.
* Implemented CudaNdarray.ndim, matching the interface of numpy.ndarray.
* Fixed a slowdown caused by multiple chained views on CudaNdarray objects
* CudaNdarray_alloc_contiguous changed so as to never try to free
memory on a view: new "base" property
* Safer decref behaviour in CudaNdarray in case of failed allocations
* New GPU implementation of tensor.basic.outer
* Multinomial random variates now available on GPU
New features:
* ProfileMode
  * profiles the Scan overhead
  * simple hook system for adding profilers
  * output reordered from more general to more specific
* DebugMode now checks Ops with different patterns of preallocated memory,
configured by config.DebugMode.check_preallocated_output.
* var[vector of indices] now works (the gradient works recursively, the
direct gradient works inplace, and it works on the GPU)
  * limitation: works only on the outermost dimension.
* New way to test the graph as you build it, which makes it easy to find the
source of shape mismatch errors:
`http://deeplearning.net/software/theano/tutorial/debug_faq.html#interactive-debugger`__
* cuda.root inferred if nvcc is on the path, otherwise defaults to
/usr/local/cuda
* Better graph printing for graphs involving a scan subgraph
* Casting behavior can be controlled through config.cast_policy, a new
(experimental) mode.
* Smarter C module cache, avoiding erroneous usage of the wrong C
implementation when some options change, and avoiding recompiling the
same module multiple times in some situations.
* The "theano-cache clear" command now clears the cache more thoroughly.
* More extensive linear algebra ops (CPU only) that wrap scipy.linalg are
now available in the sandbox.
* CUDA devices 4 - 16 should now be available if present.
* infer_shape support for the View op, and better infer_shape support in Scan
* infer_shape supported in all cases of Subtensor
* tensor.grad now raises an error by default when computing the gradient
with respect to a node that is disconnected from the cost (not in the graph,
or with no continuous path from that node to the cost); see the sketch below.
* New tensor.isnan and isinf functions.
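A sketch of the new tensor.grad behaviour on a disconnected variable (the
variables are made up; the disconnected_inputs parameter is the one added to
the grad() signature elsewhere in this commit):

.. code-block:: python

    import theano.tensor as T

    x = T.scalar('x')
    y = T.scalar('y')  # y does not appear in the cost below
    cost = x ** 2
    # g = T.grad(cost, y)  # now raises an error by default
    # Pass disconnected_inputs='warn' or 'ignore' for the old behaviour:
    g = T.grad(cost, y, disconnected_inputs='ignore')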
Documentation:
* Better commenting of cuda_ndarray.cu
* Fixes in the Scan documentation: added missing declarations/print statements
* Better error message on failed __getitem__
* Updated documentation on ProfileMode
* Better documentation of testing on Windows
* Better documentation of the 'run_individual_tests' script
Unit tests:
* Stricter float comparison by default
* Reuse the tensor Subtensor tests for GPU tensors (more GPU tests)
* Tests that check for aliased function inputs and assure appropriate copying
(#374)
* Better tests of copies in CudaNdarray
* New tests relating to the new base pointer requirements
* Better scripts to run tests individually or in batches
* Some tests are now run whenever CUDA is available, not just when it has
been enabled beforehand
* Tests display fewer pointless warnings.
Other:
* Correctly set the broadcast flag to True in the output variable of
a Reshape op when an int 1 is given in the new shape.
* pydotprint: high contrast mode is now the default; option to print
more compact node names.
* pydotprint: now truncates labels that are too long.
* More compact printing (ignore leading "Composite" in op names)
Theano 0.3.1 (2011-02-21)
=========================
......
.. _advanced_theano:

***************
Advanced Theano
***************
Conditions
----------

**IfElse**

- Builds a condition over symbolic variables.
- The IfElse Op takes a boolean condition and two variables to compute as input.
- While the Switch Op evaluates both 'output' variables, the IfElse Op is lazy
  and only evaluates one variable, with respect to the condition.
**IfElse Example: Comparison with Switch**

.. code-block:: python

    import time

    import numpy
    import theano
    from theano import tensor as T
    from theano.lazycond import ifelse

    a, b = T.scalars('a', 'b')
    x, y = T.matrices('x', 'y')

    # Switch evaluates both expressions; the lazy ifelse evaluates only one.
    z_switch = T.switch(T.lt(a, b), T.mean(x), T.mean(y))
    z_lazy = ifelse(T.lt(a, b), T.mean(x), T.mean(y))

    f_switch = theano.function([a, b, x, y], z_switch,
                               mode=theano.Mode(linker='vm'))
    f_lazyifelse = theano.function([a, b, x, y], z_lazy,
                                   mode=theano.Mode(linker='vm'))

    val1 = 0.
    val2 = 1.
    big_mat1 = numpy.ones((10000, 1000))
    big_mat2 = numpy.ones((10000, 1000))
    n_times = 10

    tic = time.clock()
    for i in xrange(n_times):
        f_switch(val1, val2, big_mat1, big_mat2)
    print 'time spent evaluating both values %f sec' % (time.clock() - tic)

    tic = time.clock()
    for i in xrange(n_times):
        f_lazyifelse(val1, val2, big_mat1, big_mat2)
    print 'time spent evaluating one value %f sec' % (time.clock() - tic)
The IfElse Op spends less time (about half) than Switch, since it computes
only one variable instead of both::

    $ python ifelse_switch.py
    time spent evaluating both values 0.6700 sec
    time spent evaluating one value 0.3500 sec

Note that the IfElse condition is a boolean while the Switch condition is a
tensor, so Switch is more general.

It is actually important to use ``linker='vm'`` or ``linker='cvm'``;
otherwise IfElse will compute both variables and take the same computation
time as the Switch Op. The linker is not currently set to 'cvm' by default,
but it will be in the near future.
Loops
-----

**Scan**

- General form of **recurrence**, which can be used for looping.
- **Reduction** and **map** (loop over the leading dimensions) are special cases of Scan.
- You 'scan' a function along some input sequence, producing an output at each time-step.
- The function can see the **previous K time-steps** of your function.
- ``sum()`` could be computed by scanning the ``z + x(i)`` function over a list,
  given an initial state of ``z=0`` (see the sketch after this list).
- Often a for-loop can be expressed as a ``scan()`` operation, and ``scan`` is
  the closest that Theano comes to looping.
- Advantages of using ``scan`` over for loops:

  - The number of iterations is part of the symbolic graph.
  - Minimizes GPU transfers, if a GPU is involved.
  - Computes gradients through sequential steps.
  - Slightly faster than using a for loop in Python with a compiled Theano function.
  - Can lower the overall memory usage by detecting the actual amount of memory needed.
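For instance, the running-sum case from the list above might look like the
following sketch (not from the original tutorial; the variable names are made
up):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    X = T.vector("X")
    # z(i) = z(i-1) + X[i], with initial state z = 0
    results, updates = theano.scan(fn=lambda x_i, z: z + x_i,
                                   sequences=X,
                                   outputs_info=T.zeros_like(X[0]))
    total = results[-1]  # the last partial sum equals sum(X)
    f = theano.function([X], total, updates=updates)
    print f(numpy.arange(5, dtype=theano.config.floatX))  # prints 10.0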
**Scan Example: Computing pow(A,k)**

.. code-block:: python

    import theano
    import theano.tensor as T

    k = T.iscalar("k")
    A = T.vector("A")

    def inner_fct(prior_result, A):
        return prior_result * A

    # Symbolic description of the result
    result, updates = theano.scan(fn=inner_fct,
                                  outputs_info=T.ones_like(A),
                                  non_sequences=A, n_steps=k)

    # Scan has provided us with A**1 through A**k.  Keep only the last
    # value.  Scan notices this and does not waste memory saving them.
    final_result = result[-1]

    power = theano.function(inputs=[A, k], outputs=final_result,
                            updates=updates)

    print power(range(10), 2)
    # [ 0. 1. 4. 9. 16. 25. 36. 49. 64. 81.]
**Scan Example: Calculating a Polynomial**

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    coefficients = T.vector("coefficients")
    x = T.scalar("x")
    max_coefficients_supported = 10000

    # Generate the components of the polynomial
    full_range = T.arange(max_coefficients_supported)
    components, updates = theano.scan(fn=lambda coeff, power, free_var:
                                          coeff * (free_var ** power),
                                      outputs_info=None,
                                      sequences=[coefficients, full_range],
                                      non_sequences=x)
    polynomial = components.sum()
    calculate_polynomial = theano.function(inputs=[coefficients, x],
                                           outputs=polynomial)

    test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
    print calculate_polynomial(test_coeff, 3)
    # 19.0
Exercise 4
----------

- Run both examples.
- Modify and execute the polynomial example to have the reduction done by
  scan (one possible approach is sketched below).
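One possible approach to the second item (a sketch, not the official
solution): accumulate the partial sums inside ``scan`` through
``outputs_info`` and keep only the last one.

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    coefficients = T.dvector("coefficients")
    x = T.dscalar("x")
    full_range = T.arange(10000)

    # prior_sum accumulates coeff * x**power from one iteration to the next.
    partial_sums, updates = theano.scan(
        fn=lambda coeff, power, prior_sum, free_var:
            prior_sum + coeff * (free_var ** power),
        outputs_info=T.as_tensor_variable(numpy.float64(0)),
        sequences=[coefficients, full_range],
        non_sequences=x)
    polynomial = partial_sums[-1]

    calculate_polynomial = theano.function([coefficients, x], polynomial,
                                           updates=updates)
    print calculate_polynomial(numpy.array([1, 0, 2], dtype='float64'), 3)
    # 19.0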
Compilation pipeline
--------------------

-.. image:: pics/pipeline.png
+.. image:: ../hpcs2011_tutorial/pics/pipeline.png
    :width: 400 px
Inplace optimization
@@ -113,7 +252,7 @@ Theano output:
- Try the Theano flag floatX=float32
"""
-Exercise 4
+Exercise 5
-----------
- In the last exercises, do you see a speed up with the GPU?
@@ -167,19 +306,19 @@ Elemwise{Composite{neg,{sub,{{scalar_sigmoid,GT},neg}}}} [@183160204] '' 2
>>> theano.printing.pydotprint_variables(prediction)

-.. image:: pics/logreg_pydotprint_prediction.png
+.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_prediction.png
    :width: 800 px

All pydotprint* variants require graphviz and pydot.

>>> theano.printing.pydotprint(predict)

-.. image:: pics/logreg_pydotprint_predic.png
+.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_predic.png
    :width: 800 px

>>> theano.printing.pydotprint(train)  # This is a small train example!

-.. image:: pics/logreg_pydotprint_train.png
+.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_train.png
    :width: 1500 px
@@ -206,85 +345,6 @@ Debugging
- Few optimizations
- Run Python code (better error messages and can be debugged interactively in the Python debugger)
(The "Loops" section, its two Scan examples, and the old "Exercise 5" were
removed from this position; identical content now appears earlier in the
file, as shown above.)
Known limitations
-----------------
@@ -304,5 +364,3 @@ Known limitations
- Disabling a few optimizations can speed up compilation
- Usually too many nodes indicates a problem with the graph
-- Lazy evaluation in a branch (We will try to merge this summer)
@@ -13,13 +13,13 @@ on the afternoons of Aug 2, 3, 5, and 6 (but not Aug 4th).
Day 1
-----

* Show of hands - what is your background?
* Python & Numpy in a nutshell
* Theano basics
* Quick tour through Deep Learning Tutorials (think about projects)

.. :
  day 1:
@@ -38,25 +38,25 @@ Day 1
Day 2
-----

* Loop/Condition in Theano (10-20m)
* Propose/discuss projects
* Form groups and start projects!

Day 3
-----

* Advanced Theano (30 minutes)
* Debugging, profiling, compilation pipeline
* Projects / General hacking / code-sprinting.

Day 4
-----

* *You choose* (we can split the group)
* Extending Theano
@@ -64,9 +64,5 @@ Day 4
* How to use pycuda code in Theano
* Projects / General hacking / code-sprinting.
Note - the schedule here is a guideline.
We can adapt it in response to developments in the hands-on work.
The point is for you to learn something about the practice of machine
learning.
@@ -14,7 +14,7 @@ Theano graphs
Inputs and Outputs are lists of Theano variables

-.. image:: pics/apply_node.png
+.. image:: ../hpcs2011_tutorial/pics/apply_node.png
    :width: 500 px

Op contract
......
@@ -21,9 +21,10 @@ What does it do?
It complements the Python numeric/scientific software stack (e.g. numpy, scipy,
scikits, matplotlib, PIL.)

-Design and feature set has been driven by research in the machine learning group at the University of
-Montreal (Yoshua Bengio, Pascal Vincent, Douglas Eck).
-Result: a very good library for doing research in deep
+Design and feature set has been driven by machine learning research
+at the University of
+Montreal (groups of Yoshua Bengio, Pascal Vincent, Douglas Eck).
+The result is a very good library for doing research in deep
learning and neural network training, and a flexible framework for
many other models and algorithms in machine learning more generally.
@@ -53,7 +54,7 @@ calculations on other data structures.
Contents
--------

-The structured part of the course will be a walk-through of the following
+The structured part of these lab sessions will be a walk-through of the following
material. Interleaved with this structured part will be blocks of time for
individual or group work. The idea is that you can try out Theano and get help
from gurus on hand if you get stuck.
......
@@ -51,9 +51,9 @@ copyright = '2008--2011, LISA lab'
# other places throughout the built documents.
#
# The short X.Y version.
-version = '0.4'
+version = '0.4.1'
# The full version, including alpha/beta/rc tags.
-release = '0.4.0'
+release = '0.4.1rc1'

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
......
@@ -51,7 +51,7 @@ You will need to commit the previous changes, tag the resulting version, and
push that into the original repository. The syntax is something like the
following::

-    hg commit -m"modifications for 0.X release" setup.py doc/conf.py NEWS.txt
+    hg commit -m"modifications for 0.X release" setup.py doc/conf.py NEWS.txt HISTORY.txt theano/configdefaults.py doc/library/config.txt
    hg tag 0.X
    hg push
......
@@ -245,7 +245,7 @@ import theano and print the config variable, as in:
.. attribute:: config.warn.ignore_bug_before

-    String value: 'None', 'all', '0.3', '0.4'
+    String value: 'None', 'all', '0.3', '0.4', '0.4.1'

    Default: 'None'
......
@@ -98,7 +98,7 @@ In order to compute the Jacobian of some function ``y`` with respect to some
parameter ``x`` we need to use ``scan``. What we do is loop over the
entries in ``y`` and compute the gradient of ``y[i]`` with respect to ``x``.

-.. node::
+.. note::

    ``scan`` is a generic op in Theano that allows writing in a symbolic
    manner all kinds of recurrent equations. While in principle, creating
......
@@ -32,6 +32,7 @@ you out.
   symbolic_graphs
   modes
   aliasing
+   loop
   using_gpu
   pycuda
   shape_info
......
.. _tutloop:

====
Loop
====

You can use :ref:`Scan <lib_scan>` to do all types of loops in Theano. For
now, all of its documentation is in the library reference.
@@ -47,8 +47,8 @@ AUTHOR_EMAIL = "theano-dev@googlegroups.com"
PLATFORMS = ["Windows", "Linux", "Solaris", "Mac OS-X", "Unix"]
MAJOR = 0
MINOR = 4
-MICRO = 0
-SUFFIX = ""  # Should be blank except for rc's, betas, etc.
+MICRO = 1
+SUFFIX = "rc1"  # Should be blank except for rc's, betas, etc.
ISRELEASED = False
VERSION = '%d.%d.%d%s' % (MAJOR, MINOR, MICRO, SUFFIX)
......
@@ -814,7 +814,7 @@ def insert_deepcopy(env, wrapped_inputs, wrapped_outputs):
    assert len(wrapped_inputs) == len(env.inputs)
    assert len(wrapped_outputs) == len(env.outputs)
    reason = "insert_deepcopy"
    updated_env_inputs = [env_i for i, env_i in zip(wrapped_inputs, env.inputs) if getattr(i, 'update', False)]

    # We can't use env.inputs as this doesn't include Constant Values.
@@ -830,9 +830,11 @@ def insert_deepcopy(env, wrapped_inputs, wrapped_outputs):
            # and not(wrapped_outputs[i].borrow and wrapped_outputs[j].borrow):
            if env.outputs[j] in views_of_output_i:
                if wrapped_outputs[i].borrow and wrapped_outputs[j].borrow:
-                    env.change_input('output', i, view_op(env.outputs[i]))
+                    env.change_input('output', i, view_op(env.outputs[i]),
+                                     reason=reason)
                else:
-                    env.change_input('output', i, deep_copy_op(env.outputs[i]))
+                    env.change_input('output', i, deep_copy_op(env.outputs[i]),
+                                     reason=reason)
                copied = True
                break
@@ -850,16 +852,20 @@ def insert_deepcopy(env, wrapped_inputs, wrapped_outputs):
            if input_j in env.inputs:
                j = env.inputs.index(input_j)
                if wrapped_outputs[i].borrow and wrapped_inputs[j].borrow:
-                    env.change_input('output', i, view_op(env.outputs[i]))
+                    env.change_input('output', i, view_op(env.outputs[i]),
+                                     reason="insert_deepcopy")
                    break
                else:
-                    env.change_input('output', i, deep_copy_op(env.outputs[i]))
+                    env.change_input('output', i, deep_copy_op(env.outputs[i]),
+                                     reason="insert_deepcopy")
                    break
            elif wrapped_outputs[i].borrow:
-                env.change_input('output', i, view_op(env.outputs[i]))
+                env.change_input('output', i, view_op(env.outputs[i]),
+                                 reason="insert_deepcopy")
                break
            else:
-                env.change_input('output', i, deep_copy_op(env.outputs[i]))
+                env.change_input('output', i, deep_copy_op(env.outputs[i]),
+                                 reason="insert_deepcopy")
                break

NODEFAULT = ['NODEFAULT']
......
@@ -223,7 +223,7 @@ AddConfigVar('numpy.seterr_invalid',
###
AddConfigVar('warn.ignore_bug_before',
        "If 'None', we warn about all Theano bugs found by default. If 'all', we don't warn about Theano bugs found by default. If a version, we print only the warnings relative to Theano bugs found after that version. Warning for specific bugs can be configured with specific [warn] flags.",
-        EnumStr('None', 'all', '0.3', '0.4', allow_override=False),
+        EnumStr('None', 'all', '0.3', '0.4', '0.4.1', allow_override=False),
        in_c_key=False)

default_0_3 = True
......
@@ -283,7 +283,10 @@ class TestComputeTestValue(unittest.TestCase):
                             n_steps=k)
            assert False
        except ValueError, e:
-            assert e.message.startswith("shape mismatch")
+            # The first message is for numpy before 1.6.
+            # The second is a new message in numpy 1.6.
+            assert (e.message.startswith("shape mismatch") or
+                    e.message.startswith("operands could not be broadcast together with shapes"))
        finally:
            theano.config.compute_test_value = orig_compute_test_value
......
@@ -84,11 +84,6 @@ class IfElseIfElseIf(PureOp):
class NotImplementedOp(PureOp):
    class E(Exception): pass
-    def __eq__(self, other):
-        return type(self) == type(other)
-    def __hash__(self):
-        return hash(type(self))
    def make_node(self, x):
        return Apply(self, [x], [x.type()])
    def make_thunk(self, node, storage_map, compute_map, no_recycling):
......
@@ -65,7 +65,9 @@ def execute(execute=True, verbose=True):
    t1=time.time()
    if verbose and execute:
        print
-        print 'this execution time took %.2fs'%(t1-t0)
+        print 'This execution time took %.2fs'%(t1-t0)
+        print
+        print 'Try to run this script a few times. Experience shows that the first run is not as fast as following calls. The difference is not big, but consistent.'
    return t1-t0
@@ -139,6 +141,8 @@ if __name__ == "__main__":
    M2070/3.2    0.32s
    GTX470/3.0   0.34s
    GTX285/3.0   0.40s
+   C1060/3.2    0.46s
+   GTX550Ti/4.0 0.57s
    GT220/3.2RC  3.80s
    8500GT/3.0   10.68s
"""
......
@@ -19,7 +19,7 @@ if not theano.misc.pycuda_init.pycuda_available:
if cuda_ndarray.cuda_available == False:
    from nose.plugins.skip import SkipTest
-    raise SkipTest('Optional package cuda disabled')
+    raise SkipTest('Optional theano package cuda disabled')

import pycuda
import pycuda.driver as drv
......
@@ -424,6 +424,9 @@ def pydotprint(fct, outfile=None,
        file to which the name of the scan op is concatenated and
        the index in the toposort of the scan.
        This index can be printed in the graph with the option with_ids.
+    :param var_with_name_simple: If true and a variable has a name,
+        we will print only the variable name.
+        Otherwise, we concatenate the type to the variable name.

In the graph, boxes are Apply nodes (the execution of an op) and ellipses are variables.
If variables have a name, it is used as their text (if multiple variables have the same name, they will be merged in the graph).
......
"""
Optimization to specialize gemm -> ger are not written
Scipy implementation is not written
We need to call scipy.linalg.blas.[cf]blas.[sdcz]ger here in order not to lose speed against the old Outer op.
Here is the scipy signature: ger(alpha,x,y,incx=1,incy=1,a=0.0,overwrite_x=1,overwrite_y=1,overwrite_a=0)
http://www.scipy.org/doc/api_docs/SciPy.lib.blas.info.html
C implementation is not written.
Tests are not written.
"""
class GER(Op):
    """
    General rank-1 update

    A <- A + a * x * y'

    for matrix A, vectors x and y, and scalar a.
    """
    def __init__(self, inplace):
        self.inplace = bool(inplace)
        if self.inplace:
            self.destroy_map = {0: [0]}

    def __hash__(self):
        return hash((type(self), self.inplace))

    def __eq__(self, other):
        # Ops compare equal when they have the same type and inplace flag.
        return type(self) == type(other) and self.inplace == other.inplace

    def make_node(self, *inputs):
        inputs = map(as_tensor_variable, inputs)
        A, a, x, y = inputs
        nx = x.type.ndim
        ny = y.type.ndim
        if nx != 1:
            raise TypeError('non-vector arg0 to outer()', x)
        if ny != 1:
            raise TypeError('non-vector arg1 to outer()', y)
        if A.dtype != a.dtype:
            raise TypeError('dtype mismatch', (A.dtype, a.dtype))
        if A.dtype != x.dtype:
            raise TypeError('dtype mismatch', (A.dtype, x.dtype))
        if A.dtype != y.dtype:
            raise TypeError('dtype mismatch', (A.dtype, y.dtype))
        return Apply(self, inputs, [A.type()])

    def perform(self, node, inp, out):
        A, a, x, y = inp
        if not self.inplace:
            A = A.copy()
        A += a * numpy.outer(x, y)
        out[0][0] = A

    # grad not needed because this is put in during optimization

    def __str__(self):
        return "GER"
@@ -297,6 +297,14 @@ def scan( fn
        loop are done (see ``theano.function`` for details about
        possible values and their meaning).
+    :param profile:
+        Flag or string. If true, or different from the empty string, a
+        profile object will be created and attached to the inner graph of
+        scan. If ``profile`` is True, the profile object will have the
+        name of the scan instance; otherwise it will have the passed string.
+        The profile object collects (and prints) information only when the
+        inner graph is run with the new CVM linker (with default modes,
+        for other linkers this argument is useless).
    :rtype: tuple
    :return: tuple of the form (outputs, updates); ``outputs`` is either a
......
@@ -17,17 +17,16 @@ import numpy
import sys

import theano
-from theano import tensor, scalar
-from theano.tensor import opt, TensorType, get_constant_value
+from theano import tensor
+from theano.tensor import opt, get_constant_value
from theano import gof
from theano.compile import optdb
-from theano.gof.opt import EquilibriumOptimizer
from theano import config
from theano.compile.function_module import deep_copy_op

import scan_op
import scan_utils
-from scan_utils import clone, equal_computations, find_up, scan_args
+from scan_utils import equal_computations, find_up, scan_args
from theano.gof.opt import pre_constant_merge, pre_greedy_local_optimizer

# Logging function for sending warning or info
@@ -454,12 +453,12 @@ class ScanSaveMem(gof.Optimizer):
                if i > op.n_mit_mot:
                    try:
                        length = shape_of[out][0]
-                    except:
+                    except Exception:
                        length = node.inputs[0] + init_l[i]
                else:
                    try:
                        length = shape_of[out][0]
-                    except:
+                    except Exception:
                        length = out.shape[0]
                cf_slice = tensor.basic.get_canonical_form_slice(
                    this_slice[0], length)
@@ -556,7 +555,7 @@ class ScanSaveMem(gof.Optimizer):
                else:
                    try:
                        length = shape_of[out][0]
-                    except:
+                    except Exception:
                        length = out.shape[0]
                cf_slice = tensor.basic.get_canonical_form_slice(
                    this_slice[0],length)
@@ -1124,5 +1123,3 @@ scan_seqopt.register('scanOp_merge_inouts',
                     3,
                     'fast_run',
                     'scan')
@@ -158,7 +158,7 @@ def as_tensor_variable(x, name=None, ndim=None):
    except TypeError:
        try:
            str_x = str(x)
-        except:
+        except Exception, e:
            str_x = repr(x)
    raise TypeError("Cannot convert %s to TensorType" % str_x, type(x))
@@ -340,7 +340,7 @@ def constant_or_value(x, rtype, name=None, ndim=None, dtype=None):
        else:
            # leave the shape out of the type
            return rtype(TensorType(dtype = x_.dtype, broadcastable = bcastable), x_, name=name)
-    except:
+    except Exception:
        raise TypeError("Could not convert %s to TensorType" % x, type(x))

def constant(x, name=None, ndim=None, dtype=None):
@@ -425,7 +425,7 @@ def get_constant_value(v):
        try:
            numpy.complex(data)  # works for all numeric scalars
            return data
-        except:
+        except Exception:
            raise TypeError('v.data is non-numeric, non-scalar, or has more than one unique value', v)
    if v.owner:
        if isinstance(v.owner.op, Alloc):
@@ -1361,7 +1361,7 @@ class TensorConstantSignature(tuple):
            return False
        try:
            (t0, d0), (t1, d1) = self, other
-        except:
+        except Exception, e:
            return False
        # N.B. compare shape to ensure no broadcasting in ==
        if t0 != t1 or d0.shape != d1.shape:
@@ -1994,7 +1994,7 @@ def max(x, axis='DEFAULT'):
        try:
            const = get_constant_value(axis)
            return CAReduce(scal.maximum, list(const))(x)
-        except:
+        except Exception:
            return max_and_argmax(x, axis)[0]

@constructor
@@ -2873,7 +2873,7 @@ def extract_constant(x):
    '''
    try:
        x = get_constant_value(x)
-    except:
+    except Exception:
        pass
    if isinstance(x, scal.ScalarVariable):
        if x.owner and isinstance(x.owner.op, ScalarFromTensor):
@@ -4398,7 +4398,7 @@ class Reshape(Op):
                ', should be %i' % (len(shp), self.ndim), shp)
        try:
            out[0] = numpy.reshape(x, shp)
-        except:
+        except Exception, e:
            raise ValueError('Cannot reshape input of shape %s to shape %s' % (x.shape, shp))

    def grad(self, inp, grads):
        x, shp = inp
@@ -4593,7 +4593,7 @@ class ARange(Op):
        try:
            v = get_constant_value(var)
            return numpy.all(v == value)
-        except:
+        except Exception:
            pass
        return False
......
@@ -12,7 +12,7 @@ class ConvTransp3D(theano.Op):
        return hash(type(self))

    def c_code_cache_version(self):
-        return (1,)
+        return (2,)

    def make_node(self, W, b, d, H, RShape = None):
        """
@@ -266,11 +266,11 @@ class ConvTransp3D(theano.Op):
for (int i = 0; i < batchSize; i++) {
    for (int r = 0; r < videoHeight; r++) {
-        const int frc = std::max(0.0, ceil(float(r-filterHeight+1)/float(dr)));
+        const int frc = (int)std::max(0.0f, ceilf(float(r-filterHeight+1)/float(dr)));
        for (int c = 0; c < videoWidth; c++) {
-            const int fcc = std::max(0.0, ceil(float(c-filterWidth +1)/float(dc)));
+            const int fcc = (int)std::max(0.0f, ceilf(float(c-filterWidth +1)/float(dc)));
            for (int t = 0; t < videoDur; t++) {
-                const int ftc = std::max(0.0, ceil(float(t-filterDur +1) /float(dt)));
+                const int ftc = (int)std::max(0.0f, ceilf(float(t-filterDur +1) /float(dt)));
                long long Rpost = i * %(R)s->strides[0] + r * %(R)s->strides[1] + c * %(R)s->strides[2] + t * %(R)s->strides[3];
......
@@ -117,7 +117,7 @@ def Rop(f, wrt, eval_points):
    return rval

-def Lop(f, wrt, eval_points, consider_constant=[], warn_type=False,
+def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
        disconnected_inputs='raise'):
    """
    Computes the L operation on `f` wrt to `wrt` evaluated at points given
@@ -140,6 +140,8 @@ def Lop(f, wrt, eval_points, consider_constant=[], warn_type=False,
        indices that specify both the position within a list and all
        coordinates of the tensor element in the last
    """
+    if consider_constant is None:
+        consider_constant = []
    if not isinstance(f, TensorVariable):
        raise TypeError('In tensor.Lop(), cost argument should be a TensorVariable.', f)
@@ -155,7 +157,6 @@ def Lop(f, wrt, eval_points, consider_constant=[], warn_type=False,
            list(inputs) + list(consider_constant),
            warn_type=warn_type)

    # Note : If p is not in gmap there can be several reasons, among which
    # is the fact that p might not be part of the computational graph. A
    # simple example is that for a+b for e.g. a[0] is not part of the graph,
@@ -196,7 +197,7 @@ def Lop(f, wrt, eval_points, consider_constant=[], warn_type=False,
# Gradient
#########################

-def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False,
+def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
        disconnected_inputs='raise'):
    """
    :type cost: Scalar (0-dimensional) `Variable`
@@ -228,6 +229,9 @@ def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False,
        `theano.gradient.grad_sources_inputs``.
    """
+    if consider_constant is None:
+        consider_constant = []
    if not isinstance(cost, TensorVariable):
        raise TypeError('In tensor.grad(), cost argument should be a TensorVariable.', cost)
......
@@ -7,6 +7,7 @@ import unittest
from nose.plugins.skip import SkipTest
import numpy
from numpy.testing import dec
+from numpy.testing.noseclasses import KnownFailureTest

from theano.tensor import *
from theano.tensor import basic as tensor  # for hidden symbols
@@ -4736,6 +4737,22 @@ class test_arithmetic_cast(unittest.TestCase):
                        config.int_division == 'floatX'):
                    assert theano_dtype == config.floatX
                    continue
+                if (cfg == 'numpy+floatX' and
+                        a_type == 'complex128' and
+                        b_type == 'float32' and
+                        combo == ('scalar', 'array') and
+                        numpy.__version__ == '1.6.0' and
+                        theano_dtype == 'complex128' and
+                        numpy_dtypes == ['complex64', 'complex64']):
+                    # In numpy 1.6.0 adding a complex128 with
+                    # a float32 may result in a complex64. This
+                    # may be a bug (investigation is currently
+                    # in progress), so in the meantime we just
+                    # mark this test as a known failure.
+                    raise KnownFailureTest('Known issue with '
+                                           'numpy 1.6.0, see #761')
                # In any other situation: something wrong is
                # going on!
                assert False
......
@@ -82,7 +82,7 @@ class test_Broadcast(unittest.TestCase):
        self.assertTrue((f(xv, yv) == zv).all())

-        #test CAReduce.infer_shape
+        #test Elemwise.infer_shape
        #the Shape op don't implement c_code!
        if isinstance(linker, gof.PerformLinker):
            x = TensorType('float64', [(entry == 1) for entry in xsh])('x')
@@ -111,7 +111,7 @@ class test_Broadcast(unittest.TestCase):
        f(xv, yv)
        self.assertTrue((xv == zv).all())

-        #test CAReduce.infer_shape
+        #test Elemwise.infer_shape
        #the Shape op don't implement c_code!
        if isinstance(linker, gof.PerformLinker):
            x = TensorType('float64', [(entry == 1) for entry in xsh])('x')
......