Commit adc97e87 authored by abergeron

Merge pull request #2247 from nouiz/opt

Opt
...@@ -72,13 +72,33 @@ and use directly the optimized graph from the pickled file.
Faster Theano function
----------------------
You can set the Theano flag ``allow_gc`` to ``False`` to get a speed-up by using
more memory. By default, Theano frees intermediate results when we don't need
them anymore. Doing so prevents us from reusing this memory. So disabling the
garbage collection keeps the memory of all intermediate results, so it can be
reused during the next call to the same Theano function, provided they have the
correct shape. The shape could change if the shapes of the inputs change.
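As a quick sketch of how this flag is set (the commented-out import is illustrative only), note that Theano reads ``THEANO_FLAGS`` once, at import time, so the variable must be set before the first ``import theano``:

```python
import os

# Theano reads THEANO_FLAGS at import time, so set it before importing.
os.environ["THEANO_FLAGS"] = "allow_gc=False"

# import theano  # Theano would now keep intermediate buffers between calls

print(os.environ["THEANO_FLAGS"])
```

From a shell, the equivalent is ``THEANO_FLAGS='allow_gc=False' python script.py``.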
.. _unsafe_optimization:

Unsafe optimization
===================
Some Theano optimizations assume that user inputs are valid. This means
that if a user provides invalid values (such as incompatible shapes or
out-of-bounds indices) and those optimizations are applied, the user's
error will be lost. Most of the time the inputs are indeed valid, so
applying the optimization is good, but losing the error is bad. The
newer optimizations in Theano that make such assumptions add an
assertion to the graph to preserve the user's error message. Computing
these assertions can take some time. If you are sure everything in your
graph is valid and want the fastest possible Theano, you can enable an
optimization that removes those assertions with:
``optimizer_including=local_remove_all_assert``
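For illustration (the script name below is hypothetical), this unsafe optimizer is requested like any other Theano flag, either from the shell or programmatically before Theano is imported:

```python
import os

# From a shell you would run, e.g.:
#   THEANO_FLAGS='optimizer_including=local_remove_all_assert' python script.py
# Programmatically, set the flag before the first `import theano`:
os.environ["THEANO_FLAGS"] = "optimizer_including=local_remove_all_assert"

# import theano  # compiled graphs would now contain no Assert nodes

print(os.environ["THEANO_FLAGS"])
```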
Faster Small Theano function
----------------------------
......
...@@ -460,7 +460,9 @@ import theano and print the config variable, as in:
Default: '-lblas'
Link arguments to link against a (Fortran) level-3 BLAS
implementation. By default we test whether '-lblas' works; if not,
we disable our C code for BLAS.
.. attribute:: config.experimental.local_alloc_elemwise_assert
......
...@@ -51,13 +51,13 @@ Convolution Ops
===============
.. automodule:: theano.sandbox.cuda.dnn
    :members: GpuDnnConvDesc, GpuDnnConv, GpuDnnConvGradW, GpuDnnConvGradI
Pooling Ops
===========
.. automodule:: theano.sandbox.cuda.dnn
    :members: GpuDnnPoolDesc, GpuDnnPool, GpuDnnPoolGrad
Softmax Ops
===========
......
...@@ -164,8 +164,6 @@ TODO: Give examples on how to use these things! They are pretty complicated.
.. autofunction:: theano.tensor.nnet.conv.conv2d
.. autofunction:: theano.sandbox.cuda.fftconv.conv2d_fft
.. autofunction:: theano.sandbox.cuda.blas.GpuCorrMM
.. autofunction:: theano.sandbox.cuda.dnn.dnn_conv
.. autofunction:: theano.tensor.nnet.Conv3D.conv3D
.. autofunction:: theano.sandbox.cuda.fftconv.conv3d_fft
.. autofunction:: theano.tensor.nnet.conv3d2d.conv3d
...@@ -12,4 +12,4 @@ Proposals for new/revised features
noupdates
opt_patterns2
graphical_models
complex_gradient
...@@ -117,6 +117,7 @@ An op has to implement some methods defined in the interface of
:func:`perform` method defines the Python implementation of an op.
It takes several arguments:
- ``node`` is a reference to an Apply node which was previously
obtained via the ``Op``'s :func:`make_node` method. It is typically not
used in simple ops, but it contains symbolic information that
...@@ -149,6 +150,7 @@ An op has to implement some methods defined in the interface of
It returns a thunk. A thunk is defined as a zero-argument
function which encapsulates the computation to be performed by an
op on the arguments of its corresponding node. It takes several parameters:
- ``node`` is the Apply instance for which a thunk is requested,
- ``storage_map`` is a dict of lists which maps variables to one-element
lists holding the variable's current value. The one-element list acts as
......
...@@ -2,6 +2,7 @@ import atexit
import copy
import os
import time
import warnings
import theano
from theano.gof.link import WrapLinker
...@@ -98,6 +99,10 @@ class Profile_Maker(FunctionMaker):
# Lazy import to avoid compilation when importing theano.
from theano.gof.cutils import run_cthunk
warnings.warn(
"DEPRECATION WARNING: ProfileMode is deprecated. Use the Theano"
" flag or the theano.function parameter 'profile=True' instead"
" of 'mode=ProfileMode'")
return ret
......
...@@ -951,10 +951,10 @@ CLazyLinker_set_allow_gc(CLazyLinker *self, PyObject *value, void *closure)
}
static PyGetSetDef CLazyLinker_getset[] = {
{(char*)"allow_gc",
(getter)CLazyLinker_get_allow_gc,
(setter)CLazyLinker_set_allow_gc,
(char*)"does this function support allow_gc",
NULL},
{NULL, NULL, NULL, NULL} /* Sentinel */
};
......
...@@ -32,7 +32,21 @@ class DB(object):
self.name = None  # will be reset by register
# (via obj.name by the thing doing the registering)
def register(self, name, obj, *tags, **kwargs):
"""
:param name: name of the optimizer.
:param obj: the optimizer to register.
:param tags: tag names that allow selecting the optimizer.
:param kwargs: if non-empty, it should contain
only use_db_name_as_tag=False.
By default, all optimizations registered in EquilibriumDB
are selected when the EquilibriumDB name is used as a
tag. We do not want this behavior for some optimizers like
local_remove_all_assert. use_db_name_as_tag=False removes
that behavior. Then only the optimizer name and the
specified tags will enable that optimization.
"""
# N.B. obj is not an instance of class Optimizer.
# It is an instance of a DB. In the tests, for example,
# this is not always the case.
...@@ -42,7 +56,10 @@ class DB(object): ...@@ -42,7 +56,10 @@ class DB(object):
raise ValueError('The name of the object cannot be an existing' raise ValueError('The name of the object cannot be an existing'
' tag or the name of another existing object.', ' tag or the name of another existing object.',
obj, name) obj, name)
if kwargs:
assert "use_db_name_as_tag" in kwargs
assert kwargs["use_db_name_as_tag"] is False
else:
if self.name is not None: if self.name is not None:
tags = tags + (self.name,) tags = tags + (self.name,)
obj.name = name obj.name = name
...@@ -155,6 +172,10 @@ class Query(object):
if isinstance(self.exclude, (list, tuple)):
self.exclude = OrderedSet(self.exclude)
def __str__(self):
return "Query{inc=%s,ex=%s,require=%s,subquery=%s,position_cutoff=%d}" % (
self.include, self.exclude, self.require, self.subquery, self.position_cutoff)
# add all opts with this tag
def including(self, *tags):
return Query(self.include.union(tags),
......
...@@ -728,8 +728,7 @@ class VM_Linker(link.LocalLinker):
if self.use_cloop and config.profile_memory:
warnings.warn(
'CVM does not support memory profile, using Stack VM.')
# Needed when allow_gc=True and profiling
deps = self.compute_gc_dependencies(storage_map)
vm = Stack(
nodes, thunks, pre_call_clear,
...@@ -765,13 +764,11 @@ class VM_Linker(link.LocalLinker):
assert type(storage_map_list[0]) is list
assert type(compute_map_list[0]) is list
# Needed when allow_gc=True and profiling
dependency_map = self.compute_gc_dependencies(storage_map)
dependency_map_list = [
[vars_idx[d] for d in dependency_map[vars_idx_inv[i]]]
for i in xrange(len(vars_idx_inv))]
# build the pointers to node inputs and offsets
base_input_output_list = []
...@@ -869,8 +866,7 @@ class VM_Linker(link.LocalLinker):
thunks,
pre_call_clear)
else:
# Needed when allow_gc=True and profiling
deps = self.compute_gc_dependencies(storage_map)
vm = Stack(
nodes, thunks, pre_call_clear,
......
...@@ -561,6 +561,7 @@ class GpuCAReduce(GpuOp):
self.pre_scalar_op = None
def make_node(self, x):
x = as_cuda_ndarray_variable(x)
if (x.type.ndim != len(self.reduce_mask)):
raise TypeError("x must have rank %i" % len(self.reduce_mask))
o_broadcast = [x.type.broadcastable[i] for i
......
...@@ -801,27 +801,6 @@ class BaseGpuCorrMM(GpuOp):
class GpuCorrMM(BaseGpuCorrMM):
"""GPU correlation implementation using Matrix Multiplication.
:note: You can either enable the Theano flag `optimizer_including=conv_gemm`
to automatically replace all convolution operations with `GpuCorrMM`
or one of its gradients, or you can use it as a replacement for
:func:`conv2d <theano.tensor.nnet.conv.conv2d>`, called as
`GpuCorrMM(subsample=...)(image, filters)`. The latter is currently
faster, but note that it computes a correlation -- if you need to
compute a convolution, flip the filters as `filters[:,:,::-1,::-1]`.
:warning: For 700 series Nvidia GPUs of compute capability 3.5 and CUDA 5.0
to 6.0, there is a bug in CUBLAS' matrix multiplication function that
can make GpuCorrMM or its gradients crash for some input and filter
shapes. So if you have a Tesla K20, Tesla K40, Quadro K6000, GeForce GT
640 (DDR5), GeForce GTX 780 (or Ti), GeForce GTX TITAN (or Black or Z)
and experience a crash, switching to CUDA 6.5 or CUDA 4.2 should fix it.
If this is not possible, changing the input or filter shapes (e.g., the
batchsize or number of filters) may also work around the CUBLAS bug.
"""
def __init__(self, border_mode="valid",
subsample=(1, 1),
pad=(0, 0)):
"""
:param border_mode: currently supports "valid" only; "full" can be
simulated by setting `pad="full"` (at the cost of performance), or
by using `GpuCorrMM_gradInputs`
...@@ -841,7 +820,27 @@ class GpuCorrMM(BaseGpuCorrMM):
C-contiguous. Use :func:`gpu_contiguous
<theano.sandbox.cuda.basic_ops.gpu_contiguous>` on these arguments
if needed.
:note: You can either enable the Theano flag `optimizer_including=conv_gemm`
to automatically replace all convolution operations with `GpuCorrMM`
or one of its gradients, or you can use it as a replacement for
:func:`conv2d <theano.tensor.nnet.conv.conv2d>`, called as
`GpuCorrMM(subsample=...)(image, filters)`. The latter is currently
faster, but note that it computes a correlation -- if you need to
compute a convolution, flip the filters as `filters[:,:,::-1,::-1]`.
:warning: For 700 series Nvidia GPUs of compute capability 3.5 and CUDA 5.0
to 6.0, there is a bug in CUBLAS' matrix multiplication function that
can make GpuCorrMM or its gradients crash for some input and filter
shapes. So if you have a Tesla K20, Tesla K40, Quadro K6000, GeForce GT
640 (DDR5), GeForce GTX 780 (or Ti), GeForce GTX TITAN (or Black or Z)
and experience a crash, switching to CUDA 6.5 or CUDA 4.2 should fix it.
If this is not possible, changing the input or filter shapes (e.g., the
batchsize or number of filters) may also work around the CUBLAS bug.
""" """
def __init__(self, border_mode="valid",
subsample=(1, 1),
pad=(0, 0)):
super(GpuCorrMM, self).__init__(border_mode, subsample, pad)
def make_node(self, img, kern):
......
...@@ -7,8 +7,7 @@ from theano.gof.type import CDataType
from theano.compat import PY3
from theano.tensor.nnet import SoftmaxGrad
from theano.sandbox.cuda.type import CudaNdarrayType
from theano.sandbox.cuda import (GpuOp, cuda_available)
from theano.sandbox.cuda.basic_ops import (as_cuda_ndarray_variable,
gpu_contiguous, HostFromGpu)
from theano.sandbox.cuda.blas import (GpuConv, GpuDownsampleFactorMax,
...@@ -21,8 +20,8 @@ from theano.sandbox.cuda.nvcc_compiler import NVCC_compiler
def dnn_available():
if dnn_available.avail is None:
dev = theano.sandbox.cuda.active_device_number()
if theano.sandbox.cuda.device_properties(dev)['major'] < 3:
dnn_available.msg = "Device not supported by cuDNN"
dnn_available.avail = False
else:
...@@ -295,9 +294,9 @@ if ((err%(id)d = cudnnCreateFilterDescriptor(&kerns%(id)d)) != CUDNN_STATUS_SUCC
def c_cleanup_code_struct(self, node, struct_id):
return """
if (input%(id)d != NULL) {cudnnDestroyTensor4dDescriptor(input%(id)d);}
if (output%(id)d != NULL) {cudnnDestroyTensor4dDescriptor(output%(id)d);}
if (kerns%(id)d != NULL) {cudnnDestroyFilterDescriptor(kerns%(id)d);}
""" % dict(id=struct_id)
def c_set_filter(self, var, desc, err, fail):
...@@ -400,7 +399,7 @@ if (err%(name)s != CUDNN_STATUS_SUCCESS) {
method=self.conv_op, path=self.path_flag)
def c_code_cache_version(self):
return (8,)
class GpuDnnConv(GpuDnnConvBase):
......
...@@ -48,6 +48,12 @@ class ScikitsCudaOp(GpuOp):
return theano.Apply(self, [inp], [self.output_type(inp)()])
def make_thunk(self, node, storage_map, _, _2):
if not scikits_cuda_available:
raise RuntimeError(
"scikits.cuda is needed for all GPU fft implementations,"
" including fftconv.")
class CuFFTOp(ScikitsCudaOp):
def output_type(self, inp):
...@@ -56,6 +62,8 @@ class CuFFTOp(ScikitsCudaOp):
broadcastable=[False] * (inp.type.ndim + 1))
def make_thunk(self, node, storage_map, _, _2):
super(CuFFTOp, self).make_thunk(node, storage_map, _, _2)
from theano.misc.pycuda_utils import to_gpuarray
inputs = [storage_map[v] for v in node.inputs]
outputs = [storage_map[v] for v in node.outputs]
...@@ -111,6 +119,8 @@ class CuIFFTOp(ScikitsCudaOp):
broadcastable=[False] * (inp.type.ndim - 1))
def make_thunk(self, node, storage_map, _, _2):
super(CuIFFTOp, self).make_thunk(node, storage_map, _, _2)
from theano.misc.pycuda_utils import to_gpuarray
inputs = [storage_map[v] for v in node.inputs]
outputs = [storage_map[v] for v in node.outputs]
...@@ -300,6 +310,8 @@ class BatchedComplexDotOp(ScikitsCudaOp):
return CudaNdarrayType(broadcastable=[False] * inp.type.ndim)
def make_thunk(self, node, storage_map, _, _2):
super(BatchedComplexDotOp, self).make_thunk(node, storage_map, _, _2)
inputs = [storage_map[v] for v in node.inputs]
outputs = [storage_map[v] for v in node.outputs]
......
...@@ -14,11 +14,13 @@ if cuda.cuda_available == False:
if theano.config.mode == 'FAST_COMPILE':
mode_with_gpu = theano.compile.mode.get_mode('FAST_RUN').including('gpu')
# We should not exclude the 'gpu' tag, as some CPU opts are tagged
# as GPU to make them run in fast_compile with gpu.
mode_without_gpu = theano.compile.mode.get_mode('FAST_RUN')
else:
mode_with_gpu = theano.compile.mode.get_default_mode().including('gpu')
mode_without_gpu = theano.compile.mode.get_default_mode()
def test_GpuCrossentropySoftmaxArgmax1HotWithBias():
......
...@@ -1171,14 +1171,22 @@ class ShapeFeature(object):
self.set_shape_i(v, ii, new_r)
self.shape_of_reverse_index[r] = set()
def same_shape(self, x, y, dim_x=None, dim_y=None):
"""Return True if we are able to assert that x and y have the
same shape.
dim_x and dim_y are optional. If given, they are indices that
restrict the comparison to a single dimension of x and/or y.
"""
sx = self.shape_of[x]
sy = self.shape_of[y]
if sx is None or sy is None:
return False
if dim_x is not None:
sx = [sx[dim_x]]
if dim_y is not None:
sy = [sy[dim_y]]
assert len(sx) == len(sy)
for dx, dy in zip(sx, sy):
...@@ -1449,6 +1457,29 @@ def local_alloc_unary(node):
return [T.alloc(T.cast(v, node.outputs[0].dtype), *shp)]
@register_canonicalize
@register_specialize
@gof.local_optimizer([T.Elemwise])
def local_cast_cast(node):
"""cast(cast(x, dtype1), dtype2)
when those contrain:
dtype1 == dtype2
TODO: the base dtype is the same (int, uint, float, complex)
and the first cast cause an upcast.
"""
if (not isinstance(node.op, T.Elemwise) or
not isinstance(node.op.scalar_op, scalar.Cast)):
return
x = node.inputs[0]
if (not x.owner or
not isinstance(x.owner.op, T.Elemwise) or
not isinstance(x.owner.op.scalar_op, scalar.Cast)):
return
if node.op.scalar_op.o_type == x.owner.op.scalar_op.o_type:
return [x]
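The redundancy this rewrite removes can be checked with plain NumPy (a sketch of the numerical claim, not Theano code): casting twice to the same dtype is equivalent to casting once, so the second Cast node can be dropped from the graph.

```python
import numpy as np

x = np.random.rand(3, 4).astype("float32")

# cast(cast(x, float64), float64): the outer cast is a no-op...
twice = x.astype("float64").astype("float64")
once = x.astype("float64")

assert twice.dtype == once.dtype == np.float64
assert np.array_equal(twice, once)  # ...so both graphs compute the same thing
```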
class Assert(T.Op):
"""
Implements assertion in a computational graph.
...@@ -1551,9 +1582,32 @@ def local_remove_useless_assert(node):
return [assert_(node.inputs[0], *cond)]
@gof.local_optimizer([Assert])
def local_remove_all_assert(node):
"""An optimization disabled by default that removes all asserts from
the graph.
:note: See the :ref:`unsafe` section to know how to enable it.
"""
if not isinstance(node.op, Assert):
return
return [node.inputs[0]]
# Disabled by default
compile.optdb['canonicalize'].register('local_remove_all_assert',
local_remove_all_assert,
use_db_name_as_tag=False)
compile.optdb['stabilize'].register('local_remove_all_assert',
local_remove_all_assert,
use_db_name_as_tag=False)
compile.optdb['specialize'].register('local_remove_all_assert',
local_remove_all_assert,
use_db_name_as_tag=False)
@register_specialize
@gof.local_optimizer([T.Elemwise])
def local_elemwise_alloc(node):
"""
elemwise(alloc(x, shp), ..., y.TensorType(BROADCAST CONDITION))
-> elemwise(x, y.TensorType(BROADCAST CONDITION))
...@@ -1692,10 +1746,12 @@ theano.configparser.AddConfigVar('experimental.local_alloc_elemwise',
is_valid=lambda x: x
),
in_c_key=False)
# False could make the graph faster but not as safe.
theano.configparser.AddConfigVar(
'experimental.local_alloc_elemwise_assert',
"When the local_alloc_elemwise is applied, add"
" an assert to highlight shape errors.",
theano.configparser.BoolParam(True),
in_c_key=False)
...@@ -2452,6 +2508,48 @@ def local_setsubtensor_of_constants(node):
return False
@register_canonicalize
@register_stabilize
@gof.local_optimizer([AdvancedSubtensor1])
def local_adv_sub1_adv_inc_sub1(node):
"""Optimize the possible AdvSub1(AdvIncSub1(...), ...)
AdvancedSubtensor1(AdvancedIncSubtensor1(0s, y, idx), idx) -> y
AdvancedSubtensor1(AdvancedSetSubtensor1(x, y, idx), idx) -> y
:note: This opt adds an AssertOp; otherwise it would hide shape and
index errors. If you want to get rid of them, see the
:ref:`unsafe_optimization` section.
"""
if not isinstance(node.op, AdvancedSubtensor1):
return
inp = node.inputs[0]
if (not inp.owner or
not isinstance(inp.owner.op, AdvancedIncSubtensor1)):
return
idx = node.inputs[1]
idx2 = inp.owner.inputs[2]
x = inp.owner.inputs[0]
y = inp.owner.inputs[1]
if idx is not idx2:
return
if (not inp.owner.op.set_instead_of_inc and
T.extract_constant(x) != 0):
return
cond = [T.all(T.and_(T.lt(idx, x.shape[0]),
T.ge(idx, -x.shape[0])))]
if not node.fgraph.shape_feature.same_shape(idx, y, 0, 0):
cond.append(T.eq(idx.shape[0], y.shape[0]))
y = Assert("Bad indexing or shapes in an AdvancedIncSubtensor1 "
"that was optimized away")(y, *cond)
if y.dtype == node.outputs[0].dtype:
return [y]
# It is possible that y is upcast or downcast to x.dtype.
# In all cases, as we set or add with 0, we can just cast y.
return [T.cast(y, node.outputs[0].dtype)]
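The identity this optimization exploits can be checked with plain NumPy (a sketch of the claim, not Theano code): reading back the rows just written by a set_subtensor, or incremented into zeros, yields y, provided the indices are in bounds and the shapes match.

```python
import numpy as np

x = np.zeros((4, 5))
y = np.random.rand(2, 5)
idx = np.array([1, 3])  # in bounds and idx.shape[0] == y.shape[0]

# AdvancedSubtensor1(AdvancedSetSubtensor1(x, y, idx), idx) -> y
z = x.copy()
z[idx] = y
assert np.allclose(z[idx], y)

# AdvancedSubtensor1(AdvancedIncSubtensor1(0s, y, idx), idx) -> y
w = np.zeros_like(x)
w[idx] += y
assert np.allclose(w[idx], y)
```

The conditions in ``cond`` guard exactly these preconditions, which is why the assert is kept unless ``local_remove_all_assert`` is enabled.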
####################
# Rebroadcast opts #
####################
......
...@@ -2417,6 +2417,84 @@ class test_local_subtensor_merge(unittest.TestCase):
f(x_val, *i_val)
class test_local_adv_sub1_adv_inc_sub1(unittest.TestCase):
def setUp(self):
utt.seed_rng()
mode = theano.compile.mode.get_default_mode()
self.mode = mode.including("local_adv_sub1_adv_inc_sub1").excluding("fusion")
self.mode_no_assert = self.mode.including("local_remove_all_assert")
def test0(self):
for dtype1, dtype2 in [("float32", "float32"),
("float32", "float64"),
("float64", "float32"),
("float64", "float64")]:
x = tensor.matrix(dtype=dtype1)
y = tensor.matrix(dtype=dtype2)
idx = tensor.ivector()
dx = numpy.random.rand(4, 5).astype(dtype1)
dy = numpy.random.rand(2, 5).astype(dtype2)
didx = numpy.asarray([1, 3], "int32")
# set_subtensor
inc = tensor.set_subtensor(x[idx], y)
o = inc[idx]
f = theano.function([x, y, idx], o, self.mode_no_assert)
res = f(dx, dy, didx)
assert numpy.allclose(dy, res)
topo = f.maker.fgraph.toposort()
# self.mode_no_assert includes the opt, so the indexing should
# collapse to (a copy or cast of) y.
assert len(topo) == 1
assert isinstance(topo[0].op, (compile.DeepCopyOp, T.Elemwise))
# inc_subtensor(data[idx], y)
inc = tensor.inc_subtensor(x[idx], y)
o = inc[idx]
f = theano.function([x, y, idx], o, self.mode_no_assert)
res = f(dx, dy, didx)
assert numpy.allclose((dx[didx] + dy), res)
topo = f.maker.fgraph.toposort()
assert len(topo) == 2
# inc_subtensor(0[idx], y)
inc = tensor.inc_subtensor(x.zeros_like()[idx], y)
o = inc[idx]
f = theano.function([x, y, idx], o, self.mode_no_assert)
res = f(dx, dy, didx)
assert numpy.allclose(dy, res)
topo = f.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, (compile.DeepCopyOp, T.Elemwise))
def test_assert(self):
x = tensor.matrix("x")
y = tensor.matrix("y")
idx = tensor.ivector()
dx = numpy.random.rand(4, 5).astype(config.floatX)
dy = numpy.random.rand(2, 5).astype(config.floatX)
didx = numpy.asarray([1, 3], "int32")
# set_subtensor
inc = tensor.set_subtensor(x[idx], y)
o = inc[idx]
f = theano.function([x, y, idx], o, self.mode)
# test wrong index
for i in [dx.shape[0], -dx.shape[0] - 1]:
self.assertRaises(AssertionError, f, dx, dy, [i, i])
# test wrong shape
self.assertRaises(AssertionError, f, dx, dy, [1])
class Test_alloc_zero(unittest.TestCase):
def setUp(self):
mode = theano.compile.mode.get_default_mode()
...@@ -2653,7 +2731,7 @@ def test_local_subtensor_of_dot():
assert test_equality(f(d1, d2, 1), numpy.dot(d1, d2)[1:4,:,1:,1])
class Test_local_elemwise_alloc(unittest.TestCase):
dtype = config.floatX
def setUp(self):
...@@ -3166,8 +3244,8 @@ class test_assert(utt.InferShapeTester):
f(1, 1)
self.assertRaises(AssertionError, f, 1, 0)
def test_local_remove_useless_assert1(self):
# remove asserts that are always true
mode = theano.config.mode
if mode == 'FAST_COMPILE':
mode = 'FAST_RUN'
...@@ -3181,8 +3259,8 @@ class test_assert(utt.InferShapeTester):
assert len(topo) == 1
assert topo[0].op == deep_copy_op
def test_local_remove_useless_assert2(self):
# remove assert conditions that are always true
mode = theano.config.mode
if mode == 'FAST_COMPILE':
mode = 'FAST_RUN'
...@@ -3199,8 +3277,8 @@ class test_assert(utt.InferShapeTester):
assert len(topo[0].inputs) == 2
assert topo[1].op == deep_copy_op
def test_local_remove_useless_assert3(self):
# don't remove assert conditions that are always false
mode = theano.config.mode
if mode == 'FAST_COMPILE':
mode = 'FAST_RUN'
...@@ -3216,6 +3294,22 @@ class test_assert(utt.InferShapeTester):
assert len(topo[0].inputs) == 3
assert topo[1].op == deep_copy_op
def test_local_remove_all_assert1(self):
# remove assert conditions whose truth value is unknown
mode = theano.config.mode
if mode == 'FAST_COMPILE':
mode = 'FAST_RUN'
mode = compile.mode.get_mode(mode).including('local_remove_all_assert')
x = T.scalar()
y = T.scalar()
f = theano.function([x, y], theano.tensor.opt.assert_op(x, y),
mode=mode)
f(1, 0) # Without opt, it should fail.
topo = f.maker.fgraph.toposort()
assert len(topo) == 1, topo
assert topo[0].op == deep_copy_op, topo
def test_infer_shape(self):
adscal = dscalar()
...@@ -3541,6 +3635,31 @@ class T_useless_elemwise(unittest.TestCase):
assert topo[0].op == deep_copy_op
class T_cast_cast(unittest.TestCase):
def setUp(self):
mode = theano.compile.get_default_mode()
self.mode = mode.including('local_cast_cast')
def test(self):
x = T.fmatrix()
o = T.Elemwise(scal.Cast(scal.Scalar("float64")))(x.astype("float64"))
f = theano.function([x], o, mode=self.mode)
dx = numpy.random.rand(5, 4).astype("float32")
f(dx)
topo = f.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, T.Elemwise)
x = T.dmatrix()
o = T.Elemwise(scal.Cast(scal.Scalar("float32")))(x.astype("float32"))
f = theano.function([x], o, mode=self.mode)
dx = numpy.random.rand(5, 4)
f(dx)
topo = f.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, T.Elemwise)
def test_constant_folding():
""" Test that constant folding gets registered at fast_compile
......