Commit 79fda719 authored by Olivier Delalleau

Merged (resolved conflict in theano/configdefaults.py)

......@@ -24,27 +24,27 @@ instructions below for detailed installation steps):
Python_ >= 2.4
The development package (``python-dev`` or ``python-devel``
on most Linux distributions) is recommended (see just below).
``g++``, ``python-dev``
Not technically required but *highly* recommended, in order to compile
generated C code. Theano `can` fall back on a NumPy-based Python execution
model, but a C compiler allows for vastly faster execution.
`NumPy <http://numpy.scipy.org/>`_ >= 1.3.0
Earlier versions have memory leaks.
`SciPy <http://scipy.org>`_
Only currently required for sparse matrix and special functions
support, but *highly* recommended. We recommend SciPy
>=0.7 if you are using sparse matrices, because ``scipy.sparse``
is buggy in 0.6 (the ``scipy.csc_matrix`` version of ``dot()`` has a
bug with singleton dimensions, there may be more bugs).
A `BLAS`_ installation (with Level 3 functionality)
Including the development headers (``-dev``, ``-devel``, depending on
your Linux distribution). Mac OS X comes with the `Accelerate
framework`_ built in, and various options exist for Windows (see
below).
.. _BLAS: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
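A quick way to confirm that the installed NumPy and SciPy versions meet these
minimums (a minimal sketch, not part of the official instructions; it assumes
both packages import cleanly):

.. code-block:: python

    import numpy
    import scipy

    print numpy.__version__   # should be >= 1.3.0
    print scipy.__version__   # >= 0.7 recommended for sparse matrix support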
......@@ -55,14 +55,17 @@ The following libraries and software are optional:
`nose <http://somethingaboutorange.com/mrl/projects/nose/>`_
Recommended, to run Theano's test-suite.
`Sphinx <http://sphinx.pocoo.org/>`_ >= 0.5.1, `pygments <http://pygments.org/>`_
For building the documentation. LaTeX_ and dvipng_ are also necessary
for math to show up as images.
`Mercurial <http://mercurial.selenic.com/>`_
To download bleeding-edge versions of Theano.
`NVIDIA CUDA drivers and SDK`_
Required for GPU code generation/execution. Only NVIDIA GPUs using
32-bit floating point numbers are currently supported.
.. _LaTeX: http://www.latex-project.org/
.. _dvipng: http://savannah.nongnu.org/projects/dvipng/
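Once Theano is installed, a quick way to check whether it can see a working
CUDA setup (a minimal sketch; ``cuda_available`` is False when the driver, the
toolkit or the compiled ``cuda_ndarray`` module is missing):

.. code-block:: python

    import theano.sandbox.cuda as cuda

    print cuda.cuda_available   # True only when GPU code generation is possible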
......@@ -77,7 +80,7 @@ Basic user install instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The easiest way to obtain the released version of Theano is from PyPI using
pip_ (a replacement for easy_install_ provided by setuptools_/distribute_)
by typing
.. code-block:: bash
......@@ -111,7 +114,7 @@ directory; see the `virtualenv documentation`_ for details.
``easy_install`` such as more intelligent dependency management, better
error messages and a ``pip uninstall`` command for easily removing
packages.
If you do not have ``pip`` installed but do have ``easy_install``, you can
get ``pip`` by simply typing ``easy_install pip``.
......@@ -320,7 +323,7 @@ correctly (for example, for MKL this might be ``-lmkl -lguide -lpthread`` or
.. note::
Make sure your BLAS
libraries are available as dynamically-loadable libraries.
ATLAS is often installed only as a static library. Theano is not able to
use this static library. Your ATLAS installation might need to be modified
to provide dynamically loadable libraries. (On Linux this
......@@ -334,8 +337,8 @@ correctly (for example, for MKL this might be ``-lmkl -lguide -lpthread`` or
Mac
---
- If the above required libraries are not already installed on your Mac,
  one option is to first install `MacPorts <http://www.macports.org/>`__.
- Then, in order to install one or more of the required libraries, use "port install", e.g. as follows:
......@@ -359,7 +362,7 @@ Mac
reason this is necessary is because you might have an Apple-provided python
(via, for example, an Xcode installation). After performing this step, you
should check that the symbolic link provided by ``which python`` points to
the MacPorts python. For instance, on Snow Leopard with the latest MacPorts,
the output of ``which python`` is ``/opt/local/bin/python`` and this symbolic
link points to ``/opt/local/bin/python2.6``. When executing ``sudo
python_select python26-apple`` (which you should **not** do), the link
......@@ -376,7 +379,7 @@ Mac
- Please follow the same procedure with ``numpy``.
- Put ``export PYTHONPATH=/opt/local/lib/python2.6/site-packages:$PYTHONPATH``
in your ``.bashrc`` in order to include your MacPorts Python packages
(NumPy, SciPy) in Python's path.
- Make sure that the gcc version that you have installed on your system is
......@@ -408,7 +411,7 @@ Mac
- An obscure ``Bus error`` can sometimes be caused when linking
Theano-generated object files against the ``framework`` library in Leopard.
For this reason, we've disabled linking with ``-framework Python``, since on
most configurations this solves the ``Bus error`` problem. If this default
configuration causes problems with your Python/Theano installation and you think
that linking with ``-framework Python`` might help, then either set
......@@ -421,9 +424,10 @@ Mac
mac_framework_link=True
Please inform us if you have trouble installing and running Theano on your Mac.
We would be especially interested in dependencies that we missed
listing, as well as tests that fail on your platform (use the
``theano-users@googlegroups.com`` mailing list, but note that you must
first register to it, by going to `theano-users`_).
Windows
......@@ -533,7 +537,7 @@ used within a MinGW Shell (not available if you only installed Python(x,y)).
to create under Windows) in your user profile directory (the directory you
are into when you start a new command prompt with ``cmd``), containing the
following two lines:
.. code-block:: cfg
[blas]
......@@ -564,7 +568,7 @@ used within a MinGW Shell (not available if you only installed Python(x,y)).
error of the type ``"Not enough storage is available to
process this command"``): one workaround is to run nosetests
multiple times under individual subdirectories.
Compiling a faster BLAS
~~~~~~~~~~~~~~~~~~~~~~~
......@@ -598,7 +602,7 @@ top of the MinGW installation included within Python(x,y), as follows:
- In a prompt (``cmd``), install MSYS with
.. code-block:: bash
mingw-get install msys-base
- Edit ``pythonxy\mingw\msys\1.0\msys.bat`` (e.g. in Wordpad) and add as first
......@@ -619,7 +623,7 @@ follows:
a) Download `ActivePerl <http://www.activestate.com/activeperl/downloads>`_ and
install it (other Perl interpreters should also work, but were not
tested).
b) Unpack GotoBLAS2, either using `7-zip <http://www.7-zip.org/>`__ or in
a shell with:
......@@ -633,7 +637,7 @@ follows:
.. code-block:: bash
quickbuild.win32 1>log.txt 2>err.txt
Compilation should take a few minutes. Afterwards, you will probably
find many error messages in err.txt, but there should be an ``exports``
......@@ -695,20 +699,21 @@ use a compilation directory located somewhere else:
base_compiledir=path_to_a_directory_without_such_characters
Then
1) Install the CUDA driver (32-bit on 32-bit Windows, and likewise for 64-bit).
2) Install the CUDA toolkit 32-bit (even if your computer is 64-bit, it
   must match the Python installation version).
3) Install CUDA SDK 32-bit.
4) Test some pre-compiled examples from the SDK.
5) Download Visual Studio 2008 Express (free; VS2010 is not supported by nvcc 3.1,
   VS2005 is not available for download but is supported by nvcc, and the non-free
   version should work too).
6) Follow the instructions in the GettingStartedWindows.pdf file from the CUDA web
site to compile CUDA code with VS2008. If that does not work, you will
not be able to compile GPU code with Theano.
......@@ -729,7 +734,7 @@ Then
9) Then run the Theano CUDA test files with nosetests from the
``theano/sandbox/cuda/tests`` subdirectory. In the current version of
Theano, this should fail with an error like:
.. code-block:: bash
NVCC: nvcc fatal: Don't know what to do with
......
......@@ -156,7 +156,7 @@ This is primarily for internal debugging, not for typical use.
To keep the different kinds of optimization Theano can make transparent, the
policy is that get_value() returns, by default, the same object type it received
when the shared variable was created. So if you manually created data on the
GPU and created a shared variable on the GPU with this data, get_value will
always return GPU data, even when return_internal_type=False.
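For illustration, a minimal sketch of this policy (a sketch only: it assumes a
working GPU setup, and that ``CudaNdarray`` can be built directly from a float32
NumPy array and passed to ``theano.shared``):

.. code-block:: python

    import numpy
    import theano
    import theano.sandbox.cuda as cuda

    data = numpy.random.rand(3, 4).astype('float32')
    gpu_data = cuda.CudaNdarray(data)       # data created manually on the GPU
    s = theano.shared(gpu_data)             # shared variable holding GPU data

    v = s.get_value(return_internal_type=False)
    print type(v)                           # still a CudaNdarray, per the policy above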
......
......@@ -6,6 +6,7 @@ from StringIO import StringIO
import numpy
import theano
from theano import gof
from theano.gof import Env, graph, utils, link
from theano.gof.link import WrapLinkerMany, raise_with_op
......@@ -536,6 +537,9 @@ def _check_inputs(node, storage_map, r_vals, dr_vals, active_nodes, clobber_dr_v
# But this depends on the version of numpy!
if getattr(out_var,'size',2)==1:
continue
if isinstance(node.op, theano.compile.mode.OutputGuard):
# This class is not in the final graph.
continue
if not _may_share_memory(out_var, in_var):
#when a subtensor returns a tensor of ndim==0, numpy seems to return a copy.
#when we have an empty ndarray (happens with OutputGuard), it is not the same. Why?
......
......@@ -24,7 +24,10 @@ AddConfigVar('device',
)
AddConfigVar('init_gpu_device',
"Initialize the gpu device to use. This don't change the default behavior. We don't default to try to move the computation to it. We don't default to put shared variable of float32 on it. Useful to run the test on a specific gpu.",
("Initialize the gpu device to use, works only if device=cpu. "
"Unlike 'device', setting this option will NOT move computations, "
"nor shared variables, to the specified GPU. "
"It can be used to run GPU-specific tests on a particular GPU."),
EnumStr('', 'gpu0', 'gpu1', 'gpu2', 'gpu3',
allow_override=False)
)
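For reference, an illustrative way of exercising this flag (assumed usage, not
part of the commit): it is normally set through the ``THEANO_FLAGS`` environment
variable or ``.theanorc`` before Theano is first imported.

.. code-block:: python

    # Illustrative only: the flag must be in place before "import theano".
    import os
    os.environ['THEANO_FLAGS'] = 'device=cpu,init_gpu_device=gpu1'

    import theano
    print theano.config.init_gpu_device    # -> 'gpu1'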
......
......@@ -104,7 +104,8 @@ if cuda_available:
cuda_available = False
cuda_initialization_error_message = e.message
# We must do these imports to be able to create the full doc when
# nvcc is not available
from theano.sandbox.cuda.var import (CudaNdarrayVariable,
CudaNdarrayConstant,
CudaNdarraySharedVariable,
......@@ -115,7 +116,12 @@ if cuda_available:
#check if there is an old cuda_ndarray that was loaded instead of the one we compiled!
import cuda_ndarray.cuda_ndarray
if cuda_ndarray_so != cuda_ndarray.cuda_ndarray.__file__:
warning("WARNING: cuda_ndarray was loaded from",cuda_ndarray.cuda_ndarray.__file__,"This is not expected as theano should compile it automatically for you. Do you have a directory called cuda_ndarray in your LD_LIBRARY_PATH environment variable? If so, please remove it as it is outdated!")
warning("WARNING: cuda_ndarray was loaded from",
cuda_ndarray.cuda_ndarray.__file__,
"""This is not expected as theano should compile it
automatically for you. Do you have a directory called cuda_ndarray in your
LD_LIBRARY_PATH environment variable? If so, please remove it as it is
outdated!""")
shared_constructor = float32_shared_constructor
......@@ -204,8 +210,14 @@ def handle_shared_float32(tf):
raise NotImplementedError('removing our handler')
if config.device.startswith('gpu'):
use(config.device, config.force_device)
use(device=config.device, force=config.force_device)
elif config.init_gpu_device:
assert config.device=="cpu", "We can use the theano flags init_gpu_device only when the theano flags device=='cpu'"
print "Will init the gpu to use a specific gpu device. This don't default tomove computation and allocate shared variable of float32 to this device. For that try the theano flags device."
use(config.init_gpu_device, config.force_device, False, False)
assert config.device=="cpu", "We can use the Theano flag init_gpu_device only when the Theano flag device=='cpu'"
warning(("GPU device %s will be initialized, and used if a GPU is needed. "
"However, no computation, nor shared variables, will be implicitly "
"moved to that device. If you want that behavior, use the 'device' "
"flag instead.") % config.init_gpu_device)
use(device=config.init_gpu_device,
force=config.force_device,
default_to_move_computation_to_gpu=False,
move_shared_float32_to_gpu=False)
......@@ -335,12 +335,14 @@ class GpuConv(Op):
^ hash(self.imshp)
def __str__(self):
return '%s{%s, %s, %s, %s, %s}' %(self.__class__.__name__,
return '%s{%s, %s, %s, %s, %s, %s, %s}' %(self.__class__.__name__,
self.border_mode,
str(self.subsample),
str(self.logical_img_hw),
str(self.logical_kern_hw),
str(self.logical_kern_align_top))
str(self.logical_kern_align_top),
str(self.imshp),
str(self.kshp))
def make_node(self, img, kern):
if img.type.ndim != 4:
......
......@@ -56,7 +56,7 @@ class InputToGpuOptimizer(Optimizer):
new_input = host_from_gpu(gpu_from_host(input))
if new_input.type==input.type:
env.replace_validate(input, new_input, "To allow further optimisation to move Ops to gpu")
env.replace_validate(input, new_input, "InputToGpuOptimizer")
except Exception, e:
#as we currently only support float32, this can fail.
#Using try/except means that we won't need
......@@ -113,9 +113,12 @@ def local_gpu_elemwise_0(node):
else:
return False
gpu_elemwise = split_huge_add_or_mul(gpu_elemwise.owner).outputs[0]
return [host_from_gpu(gpu_elemwise)]
gpu_elemwise = split_huge_add_or_mul(gpu_elemwise.owner)
if not gpu_elemwise:
return False
if max_inputs_to_GpuElemwise(node)<len(gpu_elemwise.inputs):
return False
return [host_from_gpu(gpu_elemwise.outputs[0])]
@register_opt()
@local_optimizer([])
def local_gpu_elemwise_1(node):
......@@ -130,8 +133,10 @@ def local_gpu_elemwise_1(node):
new_op = GpuElemwise(elemwise_node.op.scalar_op)
if all([i.dtype=='float32' for i in elemwise_node.inputs]):
gpu_elemwise = new_op(*[gpu_from_host(i) for i in elemwise_node.inputs])
gpu_elemwise = split_huge_add_or_mul(gpu_elemwise.owner).outputs[0]
return [gpu_elemwise]
gpu_elemwise = split_huge_add_or_mul(gpu_elemwise.owner)
if not gpu_elemwise:
return False
return [gpu_elemwise.outputs[0]]
return False
@register_opt()
......@@ -730,24 +735,35 @@ optdb.register('InplaceGpuBlasOpt',
max_use_ratio=5),
70.0, 'fast_run', 'inplace')
gpu_ptr_size = 8
cpu_ptr_size = 8
int_size = 8
try:
#RETURN (gpu ptr size, cpu ptr size, int sizes)
t = cuda_ndarray.cuda_ndarray.ptr_int_size()
gpu_ptr_size, cpu_ptr_size, int_size = t
except Exception, e:
_logger.warning(("OPTIMIZATION WARNING: "
"Got the following error, but we can ignore it. "
"This could cause less GpuElemwise fused together.\n"
"%s") % e)
def max_inputs_to_GpuElemwise(node):
"""
Return the maximum number of inputs this GpuElemwise Apply node can accept.
This is needed because there is currently a limit of 256 bytes of parameters for the gpu function.
This measures the number of parameters we put in our gpu function and computes the maximum number of inputs that respects the 256-byte limit.
"""
#TODO: detect the size of gpu pointer and C int.
int_size = 8
ptr_size = 8
argument_limit = 256 # if was 240, with this note: 16 bytes are used for block and thread coords etc.
argument_limit = 232 # some bytes are used for block and thread coords etc.
ndim = node.inputs[0].type.ndim
size_param_mandatory = int_size #for numels
size_param_mandatory += int_size * node.inputs[0].type.ndim # for the shape#node.outputs[0].ndim+1+node.inputs[0].ndim+1
size_param_mandatory += sum((ptr_size + int_size * i.type.ndim) for i in node.outputs)
size_param_mandatory += int_size * ndim # for the shape
size_param_mandatory += sum((gpu_ptr_size + int_size * ndim) for i in node.outputs)
nb_bytes_avail = argument_limit-size_param_mandatory
nb_bytes_per_inputs = (node.inputs[0].ndim*int_size)+ptr_size
max_nb_inputs = nb_bytes_avail//nb_bytes_per_inputs
nb_bytes_avail = argument_limit - size_param_mandatory
nb_bytes_per_inputs = (ndim*int_size) + gpu_ptr_size
max_nb_inputs = nb_bytes_avail // nb_bytes_per_inputs
return max_nb_inputs
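As a worked illustration of this budget (sizes assumed: 8-byte pointers and
ints, a single output and 4-dimensional tensors, matching the fallback values
above), the arithmetic can be replayed by hand:

.. code-block:: python

    # Hypothetical sizes, mirroring the defaults used when ptr_int_size() fails.
    int_size = gpu_ptr_size = 8
    ndim, n_outputs, argument_limit = 4, 1, 232

    mandatory = int_size                                    # numels
    mandatory += int_size * ndim                            # the shape
    mandatory += n_outputs * (gpu_ptr_size + int_size * ndim)

    per_input = ndim * int_size + gpu_ptr_size
    print (argument_limit - mandatory) // per_input         # -> 3 inputs fit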
def split_huge_add_or_mul(node):
......@@ -762,6 +778,8 @@ def split_huge_add_or_mul(node):
"""
if node.op.scalar_op in (scal.add, scal.mul):
max_nb_inputs = max_inputs_to_GpuElemwise(node)
if max_nb_inputs<=1 and len(node.inputs)>1:
return False
while len(node.inputs)>max_nb_inputs:
inner_op = []
for i in range(0,len(node.inputs),max_nb_inputs):
......
......@@ -161,8 +161,9 @@ def test_huge_elemwise_fusion():
in case there are too many inputs, which would make it bust the 256
bytes limit.
"""
shape = (3,4,5,6)
vars = [tensor.tanh(tensor.ftensor4()) for x in range(10)]
shape = (2,3,4,5,6)
ttype = tensor.tensor(dtype='float32',broadcastable=(False,)*len(shape))
vars = [tensor.tanh(ttype) for x in range(10)]
f = pfunc(vars, [vars[0]-vars[1]-vars[2]-vars[3]-vars[4]-vars[5]-vars[6]], mode=mode_with_gpu)
topo = f.maker.env.toposort()
#theano.printing.debugprint(f)
......@@ -170,12 +171,29 @@ def test_huge_elemwise_fusion():
# print >> sys.stdout, i, node
assert len(topo)==10
assert sum([isinstance(node.op, cuda.GpuElemwise) for node in topo])==2
assert isinstance(topo[7].op.scalar_op,theano.scalar.basic.Composite)
assert isinstance(topo[7].op.scalar_op,theano.scalar.basic.Sub)
assert isinstance(topo[8].op.scalar_op,theano.scalar.basic.Composite)
#let debugmode catch errors
gen = lambda : theano._asarray(numpy.random.rand(*shape), dtype='float32')
f(gen(),gen(),gen(),gen(),gen(),gen(),gen(),gen(),gen(),gen())
# Test the case where we can't put the computation on the gpu! There are too many
# dimensions in the input to have 2 inputs to the op!
shape = (1,2,3,4,5,6,7,2,2,3,2,1,2,2,2,)
ttype = tensor.tensor(dtype='float32',broadcastable=(False,)*len(shape))
vars = [tensor.tanh(ttype) for x in range(10)]
f = pfunc(vars, [vars[0]-vars[1]-vars[2]-vars[3]-vars[4]-vars[5]-vars[6]], mode=mode_with_gpu)
topo = f.maker.env.toposort()
#theano.printing.debugprint(f)
assert len(topo)==1
assert sum([isinstance(node.op, cuda.GpuElemwise) for node in topo])==0
assert sum([isinstance(node.op, tensor.Elemwise) for node in topo])==1
#let debugmode catch errors
gen = lambda : theano._asarray(numpy.random.rand(*shape), dtype='float32')
f(gen(),gen(),gen(),gen(),gen(),gen(),gen(),gen(),gen(),gen())
def test_elemwise_fusion():
""" Test the the GpuElemwise fusion work correctly"""
shape = (3,4)
......
......@@ -10,6 +10,7 @@ from theano import scalar as scal
try:
# We must do these imports to be able to create the full doc when nvcc
# is not available
import cuda_ndarray.cuda_ndarray as cuda
from theano.sandbox.cuda.nvcc_compiler import nvcc_module_compile_str
import cuda_ndarray
......
......@@ -10,6 +10,7 @@ from theano.compile import SharedVariable
from theano.sandbox.cuda.type import CudaNdarrayType
try:
# We must do these imports to be able to create the full doc when nvcc
# is not available
from theano.sandbox.cuda import filter as type_support_filter
from theano.sandbox.cuda.basic_ops import HostFromGpu, GpuFromHost
except ImportError:
......
#definitions of theano.scalar ops that have their python implementation taken from scipy
#as scipy is not always available, we treat them separately
from theano.scalar.basic import UnaryScalarOp,exp,sqrt,upgrade_to_float,complex_types,float_types,upcast
import numpy
from theano.scalar.basic import UnaryScalarOp,exp,upgrade_to_float,float_types
from theano.scalar.basic import upgrade_to_float_no_complex,complex_types,upcast
imported_scipy_special = False
try:
import scipy.special
......@@ -49,4 +50,6 @@ class Erfc(UnaryScalarOp):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = erfc(%(x)s);" % locals()
erfc = Erfc(upgrade_to_float, name = 'erfc')
# scipy.special.erfc does not support complex. Why?
erfc = Erfc(upgrade_to_float_no_complex, name = 'erfc')
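An illustrative way to apply this scalar Op elementwise through the tensor API
(a sketch only: it assumes SciPy is installed and that this module is importable
as ``theano.scalar.basic_scipy``, a path the collapsed diff does not confirm):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T
    from theano.scalar.basic_scipy import erfc as scalar_erfc   # path assumed

    x = T.dvector('x')
    f = theano.function([x], T.Elemwise(scalar_erfc)(x))
    print f(numpy.array([0.0, 1.0]))   # approximately [1.0, 0.1573]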
......@@ -4414,6 +4414,7 @@ class numeric_grad:
x[i] += eps
f_eps = f(*apt)
gx[i] = numpy.asarray((f_eps - f_x)/eps)
if packed_pt:
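The hunk above touches numeric_grad's forward finite-difference loop; for
clarity, a self-contained sketch of that scheme (illustrative only, not
Theano's supported verify_grad interface):

.. code-block:: python

    import numpy

    def finite_diff_grad(f, x, eps=1e-7):
        # Forward differences: g_i ~= (f(x + eps*e_i) - f(x)) / eps
        x = numpy.asarray(x, dtype='float64')
        f_x = f(x)
        gx = numpy.zeros_like(x)
        for i in range(x.size):
            x_eps = x.copy()
            x_eps.flat[i] += eps
            gx.flat[i] = (f(x_eps) - f_x) / eps
        return gx

    # The gradient of sum(x**2) is 2*x, so this prints roughly [2., 4., 6.]
    print finite_diff_grad(lambda v: (v ** 2).sum(), [1.0, 2.0, 3.0])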
......@@ -4594,6 +4595,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None, abs_tol=None, rel_tol=No
for test_num in xrange(n_tests):
num_grad = numeric_grad(cost_fn, [p.copy() for p in pt], eps)
analytic_grad = grad_fn(*[p.copy() for p in pt])
if not isinstance(analytic_grad, (list, tuple)):
......@@ -4621,6 +4623,7 @@ class GradientError(Exception):
self.abs_tol = abs_tol
self.rel_tol = rel_tol
def __str__(self):
return """GradientError: numeric gradient and analytic gradient exceed tolerance:
At position %i of argument %i,
......
......@@ -49,12 +49,16 @@ class Conv3D(theano.Op):
def __str__(self):
return "Conv3D"
def c_code_cache_version(self):
return (1,)
def make_node(self, V, W, b, d):
"""
:param V: Visible unit, input
:param W: Weights, filter
:param V: Visible unit, input (batch, row, column, time, in channel)
:param W: Weights, filter (out channel, row, column, time, in channel)
:param b: bias, shape == (W.shape[0],)
:param d: strides when moving the filter over the input
:param d: strides when moving the filter over the input (dx, dy, dt)
"""
V_ = T.as_tensor_variable(V)
......@@ -82,22 +86,22 @@ class Conv3D(theano.Op):
dCdb = T.sum(dCdH, axis=(0,1,2,3))
dCdd = None #not differentiable, since d is not continuous
if 'name' in dir(dCdH) and dCdH.name != None:
if 'name' in dir(dCdH) and dCdH.name is not None:
dCdH_name = dCdH.name
else:
dCdH_name = 'anon'
if 'name' in dir(V) and V.name != None:
if 'name' in dir(V) and V.name is not None:
V_name = V.name
else:
V_name = 'anon'
if 'name' in dir(W) and W.name != None:
if 'name' in dir(W) and W.name is not None:
W_name = W.name
else:
W_name = 'anon'
if 'name' in dir(b) and b.name != None:
if 'name' in dir(b) and b.name is not None:
b_name = b.name
else:
b_name = 'anon'
......
......@@ -3,6 +3,10 @@ from theano.tensor import basic as T
from theano.misc import strutil
import numpy as N
#TODO: speed up by reordering loops. Should pass through the videos once, incrementing all weight gradients, rather
# than visiting each weight gradient element once and passing through the whole video
class ConvGrad3D(theano.Op):
""" Gradient of Conv3D with respect to W """
def __eq__(self,other):
......@@ -11,6 +15,9 @@ class ConvGrad3D(theano.Op):
def __hash__(self):
return hash(type(self))
def c_code_cache_version(self):
return (1,)
def make_node(self, V, d, WShape, dCdH):
V_ = T.as_tensor_variable(V)
d_ = T.as_tensor_variable(d)
......
......@@ -11,6 +11,9 @@ class ConvTransp3D(theano.Op):
def __hash__(self):
return hash(type(self))
def c_code_cache_version(self):
return (1,)
def make_node(self, W, b, d, H, RShape = None):
"""
:param W: Weights, filter
......@@ -50,22 +53,22 @@ class ConvTransp3D(theano.Op):
dCdRShape = None #not differentiable, since RShape is not continuous
if 'name' in dir(dCdR) and dCdR.name != None:
if 'name' in dir(dCdR) and dCdR.name is not None:
dCdR_name = dCdR.name
else:
dCdR_name = 'anon'
if 'name' in dir(H) and H.name != None:
if 'name' in dir(H) and H.name is not None:
H_name = H.name
else:
H_name = 'anon'
if 'name' in dir(W) and W.name != None:
if 'name' in dir(W) and W.name is not None:
W_name = W.name
else:
W_name = 'anon'
if 'name' in dir(b) and b.name != None:
if 'name' in dir(b) and b.name is not None:
b_name = b.name
else:
b_name = 'anon'
......@@ -79,9 +82,9 @@ class ConvTransp3D(theano.Op):
def perform(self, node, inputs, output_storage):
W, b, d, H, RShape = inputs
print "\t\t\t\tConvTransp3D python code"
output_storage[0][0] = computeR(W,b,d,H,RShape)
def c_code(self, node, nodename, (W, b, d, H, RShape), outputs, sub):
fail = sub['fail']
......@@ -360,7 +363,7 @@ def computeR(W,b,d,H,Rshape = None):
videoWidth = (outputWidth-1) * dc + filterWidth
videoDur = (outputDur-1) * dt + filterDur
if Rshape != None and Rshape[0] != -1:
if Rshape is not None and Rshape[0] != -1:
if Rshape[0] < videoHeight:
print (Rshape[0], videoHeight)
assert False
......
......@@ -290,7 +290,7 @@ class ConvOp(Op):
:type dx: int
:param dx: patch stride rows
:type dy: int
:param dx: patch stride cols
:param dy: patch stride cols
Params which select the version of code used:
......
......@@ -283,7 +283,6 @@ def local_dimshuffle_lift(node):
else:
return DimShuffle(iinput.type.broadcastable, new_order, inplace).make_node(iinput).outputs
@register_specialize
@gof.local_optimizer([])
def dimshuffle_as_view(node):
op = node.op
......@@ -293,6 +292,7 @@ def dimshuffle_as_view(node):
return [new_op(*node.inputs)]
register_specialize(dimshuffle_as_view, 'inplace')
register_canonicalize(local_dimshuffle_lift)
register_specialize(local_dimshuffle_lift)
......@@ -2313,15 +2313,21 @@ def local_add_specialize(node):
y = get_constant_value(input)
except TypeError:
y = input
if N.all(y == 0.0):
if numpy.all(y == 0.0):
continue
new_inputs.append(input)
if len(new_inputs) < len(node.inputs):
if len(new_inputs) == 0:
#we got rid of the entire expression!
return fill_chain(T.TensorConstant(T.TensorType(dtype=node.outputs[0].type.dtype,
broadcastable = [True] * node.outputs[0].ndim), N.asarray(0)))
ndim = node.outputs[0].type.ndim
dtype = node.outputs[0].type.dtype
return fill_chain(
T.TensorConstant(
T.TensorType(
dtype=dtype,
broadcastable = [True] * ndim),
numpy.zeros((1,)*ndim, dtype=dtype)))
if len(new_inputs) == 1:
return fill_chain(new_inputs[0])
......
......@@ -876,8 +876,7 @@ class test_fusion(unittest.TestCase):
self.do(mode, cuda.float32_shared_constructor, shp, gpu=True)
def test_gpu_fusion_3d(self):
shp=(5,5,5)
def test_gpu_fusion_Xd(self):
#we need the optimisation enabled; debug mode does this.
if theano.config.mode == "FAST_COMPILE":
mode = theano.compile.mode.get_mode("FAST_RUN").including('local_elemwise_fusion','canonicalize','gpu')
......@@ -886,7 +885,10 @@ class test_fusion(unittest.TestCase):
import theano.sandbox.cuda as cuda
if not cuda.cuda_available:
raise SkipTest("cuda not available")
if cuda.opt.int_size == 4:
shp=(5,5,5,5)
else:
shp=(5,5,5)
self.do(mode, cuda.float32_shared_constructor, shp, gpu=True)
def speed_fusion(self, shared_fn = shared, gpu = False, s=None):
......@@ -2174,3 +2176,15 @@ def test_local_mul_to_neg():
aval = numpy.random.randint(0,10,(2,2)).astype('int32')
assert f1(aval).dtype == a.dtype
assert f2(aval).dtype == 'float64'
def test_local_add_specialize():
# test of non-zero dimension
a = TT.vector()
s = TT.add(TT.zeros_like(a))
assert local_add_specialize.transform(s.owner)
# test of 0-d
a = TT.scalar()
s = TT.add(TT.zeros_like(a))
assert local_add_specialize.transform(s.owner)
......@@ -231,6 +231,7 @@ def makeSharedTester(shared_constructor_,
total = self.theano_fct(x_shared)
total_func = theano.function([],total)
total_func()
values_to_div = .5
if self.op_by_matrix:
......@@ -418,6 +419,7 @@ def makeSharedTester(shared_constructor_,
#Test that we can replace with values of the different shape
# but that will raise an error in some case, but not all
specify_shape_fct()
x1_shared.set_value(x2)
self.assertRaises(AssertionError, specify_shape_fct)
......@@ -450,6 +452,7 @@ def makeSharedTester(shared_constructor_,
assert numpy.allclose(self.ref_fct(x1_shared.value), self.ref_fct( x1_2))
shape_op_fct = theano.function([],x1_shared.shape)
topo = shape_op_fct.maker.env.toposort()
shape_op_fct()
if theano.config.mode!='FAST_COMPILE':
assert len(topo)==3
assert isinstance(topo[0].op,tensor.opt.Shape_i)
......@@ -458,6 +461,7 @@ def makeSharedTester(shared_constructor_,
#Test that we forward the input
specify_shape_fct = theano.function([],x1_specify_shape)
specify_shape_fct()
#theano.printing.debugprint(specify_shape_fct)
assert numpy.all(self.ref_fct(specify_shape_fct())
==self.ref_fct(x1_2))
......