Commit 59cb7198 authored by lamblin

Merge pull request #1364 from delallea/minor

Minor fixes
......@@ -235,40 +235,40 @@ function. The ``[fibby]`` argument is a hint that our optimizer works on nodes
whose ``.op`` attribute equals ``fibby``.
The function here (``fibby_of_zero``) expects an ``Apply`` instance as an
argument for parameter ``node``. It tests using
function ``get_constant_value``, which determines if a
function ``get_scalar_constant_value``, which determines if a
Variable (``x``) is guaranteed to be a constant, and if so, what constant.
Test the optimization
=====================
Here is some code that test the optimization is applied only when needed.
Here is some code to test that the optimization is applied only when needed.
.. code-block:: python
# Test it don't apply when not needed
# Test it does not apply when not needed
x = T.dvector()
f = function([x], fibby(x))
#theano.printing.debugprint(f)
#We call the function to make sure it run.
#If you run in DebugMode, it will compare the C and Python output
# We call the function to make sure it runs.
# If you run in DebugMode, it will compare the C and Python outputs.
f(numpy.random.rand(5))
topo = f.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, Fibby)
# Test that the optimization get applied
# Test that the optimization gets applied.
f_zero = function([], fibby(T.zeros([5])))
#theano.printing.debugprint(f_zero)
#If you run in DebugMode, it will compare the output before
# and after the optimization
# If you run in DebugMode, it will compare the output before
# and after the optimization.
f_zero()
#Check that the optimization remove the Fibby Op.
#For security, the Theano memory interface make that the output
#of the function is always memory not aliaced to the input.
#That is why there is a DeepCopyOp op.
# Check that the optimization removes the Fibby Op.
# For security, the Theano memory interface ensures that the output
# of the function is always memory not aliased to the input.
# That is why there is a DeepCopyOp op.
topo = f_zero.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, theano.compile.ops.DeepCopyOp)
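The recurrence that ``Fibby`` computes can be checked against a plain NumPy loop. This is a sketch for illustration only (``fibby_reference`` is not part of Theano); it also shows why the optimization is valid: feeding zeros in yields zeros out, so ``fibby(T.zeros([5]))`` can be replaced by the zeros themselves.

```python
import numpy

def fibby_reference(x):
    # Pure-NumPy reference for the recurrence the Fibby Op computes:
    # y[i] = y[i-1] * y[i-2] + x[i], with y[0] = x[0] and y[1] = x[1].
    y = x.copy()
    for i in range(2, len(x)):
        y[i] = y[i - 1] * y[i - 2] + x[i]
    return y

out = fibby_reference(numpy.ones(5))    # [1., 1., 2., 3., 7.]
zeros_out = fibby_reference(numpy.zeros(5))  # all zeros: the optimization is safe
```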
......@@ -108,36 +108,36 @@ default values.
.. method:: get_shape_info(obj)
Optional. Only needed to profile the memory of this Type of object
Optional. Only needed to profile the memory of this Type of object.
Return the information needed to compute the memory size of obj.
Return the information needed to compute the memory size of ``obj``.
The memory size is only the data, so this exclude the container.
The memory size is only the data, so this excludes the container.
For an ndarray, this is the data, but not the ndarray object and
others data structures as shape and strides.
other data structures such as shape and strides.
get_shape_info() and get_size() work in tendem for the memory profiler.
``get_shape_info()`` and ``get_size()`` work in tandem for the memory profiler.
get_shape_info() is called during the execution of the function.
``get_shape_info()`` is called during the execution of the function.
So it is better that it is not too slow.
get_size() will be called with the output of this function
``get_size()`` will be called on the output of this function
when printing the memory profile.
:param obj: The object that this Type represent during execution
:return: Python object that self.get_size() understand
:param obj: The object that this Type represents during execution
:return: Python object that ``self.get_size()`` understands
.. method:: get_size(shape_info)
Number of bytes taken by the object represented by shape_info
Number of bytes taken by the object represented by shape_info.
Optional. Only needed to profile the memory of this Type of object
Optional. Only needed to profile the memory of this Type of object.
:param shape_info: the output of the call to get_shape_info()
:return: the number of bytes taken by the object described by
``shape_info``.
:param shape_info: the output of the call to get_shape_info()
:return: the number of bytes taken by the object described in
shape_info.
"""
For each method, the *default* is what ``Type`` defines
for you. So, if you create an instance of ``Type`` or an
instance of a subclass of ``Type``, you
......
......@@ -271,7 +271,7 @@ import theano and print the config variable, as in:
Default False
Do the vm/cvm linkers profile the execution of Theano functions?
Do the vm/cvm linkers profile the execution time of Theano functions?
.. attribute:: profile_memory
......@@ -279,8 +279,8 @@ import theano and print the config variable, as in:
Default False
Do the vm/cvm linkers profile the memory of Theano functions get printed?
It only work when profile=True.
Do the vm/cvm linkers profile the memory usage of Theano functions?
It only works when profile=True.
.. attribute:: profile_optimizer
......@@ -289,26 +289,26 @@ import theano and print the config variable, as in:
Default False
Do the vm/cvm linkers profile the optimization phase when compiling a Theano function?
It only work when profile=True.
It only works when profile=True.
.. attribute:: profiling.n_apply
Positive int value, default: 20.
The number of apply node to print in the profiler output
The number of Apply nodes to print in the profiler output
.. attribute:: profiling.n_ops
Positive int value, default: 20.
The number of ops to print in the profiler output
The number of Ops to print in the profiler output
.. attribute:: profiling.min_memory_size
Positive int value, default: 1024.
For the memory profile, do not print apply nodes if the size
of their outputs (in bytes) is lower then this.
For the memory profile, do not print Apply nodes if the size
of their outputs (in bytes) is lower than this.
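These attributes are usually set through the ``THEANO_FLAGS`` environment variable. A sketch of a typical invocation (``my_script.py`` is a placeholder for your own script); note that ``profile_memory`` and the ``profiling.*`` options only take effect when ``profile=True``:

```shell
# Enable time and memory profiling, and tune the profiler output size.
THEANO_FLAGS='profile=True,profile_memory=True,profiling.n_apply=30,profiling.min_memory_size=2048' \
    python my_script.py
```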
.. attribute:: config.lib.amdlibm
......
......@@ -904,23 +904,23 @@ Theano fully supports basic indexing
`Integer advanced indexing
<http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer>`_
will be supported in 0.6rc4 (or the development version). We do not
support boolean masks, as Theano do not have a boolean type (we use
int8 for the output of logic operator). To imitate boolean advanced
support boolean masks, as Theano does not have a boolean type (we use
int8 for the output of logic operators). To imitate boolean advanced
indexing, you can do::
# NumPy indexing with a mask
n = np.arange(9).reshape(3,3)
n[n>4] # array([5, 6, 7, 8])
n[n > 4] # array([5, 6, 7, 8])
# Theano indexing with a "mask"
t = tt.arange(9).reshape((3,3))
t[t>4].eval() # an array with shape (3, 3, 3)
# Theano indexing with a "mask" (incorrect approach)
t = theano.tensor.arange(9).reshape((3,3))
t[t > 4].eval() # an array with shape (3, 3, 3)
# getting a Theano result like NumPy
t[(t>4).nonzero()].eval() # array([5, 6, 7, 8])
t[(t > 4).nonzero()].eval() # array([5, 6, 7, 8])
The gradient of Advanced indexing need in many cases NumPy
1.8. It isn't released as of April 30, 2013. You can use NumPy
In many cases, the gradient of advanced indexing needs NumPy
1.8. It has not been released yet as of April 30th, 2013. You can use the NumPy
development version to have this feature now.
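The ``nonzero()`` rewrite is easy to sanity-check on the NumPy side alone: indexing with the tuple of index arrays that ``nonzero()`` returns selects the same elements as the boolean mask.

```python
import numpy as np

n = np.arange(9).reshape(3, 3)
mask_result = n[n > 4]                  # boolean mask: array([5, 6, 7, 8])
nonzero_result = n[(n > 4).nonzero()]   # index-array tuple: same values

assert (mask_result == nonzero_result).all()
```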
......
......@@ -49,12 +49,12 @@ class Profile_Maker(FunctionMaker):
theano.sandbox.cuda.cuda_enabled):
if os.environ.get('CUDA_LAUNCH_BLOCKING', '0') != '1':
raise Exception(
"You are running Theano profiler with CUDA enabled."
" Theano GPU ops execution are asynchron by default."
"You are running the Theano profiler with CUDA enabled."
" Theano GPU ops execution is asynchronous by default."
" So by default, the profile is useless."
" You must use set the environment variable"
" CUDA_LAUNCH_BLOCKING to 1 to tell the CUDA drvier to"
" synchonize the execution to get meaning full profile.")
" You must set the environment variable"
" CUDA_LAUNCH_BLOCKING to 1 to tell the CUDA driver to"
" synchronize the execution to get a meaningful profile.")
# create a function-specific storage container for profiling info
profile = ProfileStats(atexit_print=False)
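The fix the exception asks for is a one-line change to the invocation. A sketch (``my_script.py`` is a placeholder): making kernel launches synchronous lets the profiler attribute time to the GPU ops that actually spent it.

```shell
# Make GPU kernel launches synchronous so the profile is meaningful.
CUDA_LAUNCH_BLOCKING=1 THEANO_FLAGS='device=gpu,profile=True' python my_script.py
```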
......@@ -584,14 +584,21 @@ Test them first, as they are not guaranteed to always provide a speedup."""
if not config.lib.amdlibm and any([exp_float32_op(a.op) and
a.inputs[0].dtype == 'float32'
for i, a in apply_time]):
print " - With the default gcc libm, exp in float32 is slower than in float64! Try Theano flag floatX=float64, or install amdlibm and set the theano flags lib.amdlibm=True"
print (" - With the default gcc libm, exp in float32 is slower "
"than in float64! Try Theano flag floatX=float64, or "
"install amdlibm and set the theano flags lib.amdlibm=True")
printed_tip = True
#tip 4
for a, t in apply_time.iteritems():
node = a[1]
if isinstance(node.op, T.Dot) and all([ len(i.type.broadcastable)==2 for i in node.inputs]):
print " - You have a dot operation that was not optimized to dot22 (which is faster). Make sure the inputs are float32 or 64, and are the same for both inputs. Currently they are:",[i.type for i in node.inputs]
if (isinstance(node.op, T.Dot) and
all([len(i.type.broadcastable) == 2 for i in node.inputs])):
print (" - You have a dot operation that was not optimized to"
" dot22 (which is faster). Make sure the inputs are "
"float32 or float64, and are the same for both inputs. "
"Currently they are: %s" %
[i.type for i in node.inputs])
printed_tip = True
#tip 5
......@@ -599,9 +606,13 @@ Test them first, as they are not guaranteed to always provide a speedup."""
node = a[1]
if isinstance(node.op, RandomFunction):
printed_tip = True
print " - Replace the default random number generator by 'from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams', as this is is faster. It is still experimental, but seems to work correctly."
print (" - Replace the default random number generator by "
"'from theano.sandbox.rng_mrg import MRG_RandomStreams "
"as RandomStreams', as this is is faster. It is still "
"experimental, but seems to work correctly.")
if config.device.startswith("gpu"):
print " - MRG_RandomStreams is the only random number generator supported on the GPU."
print (" - MRG_RandomStreams is the only random number"
" generator supported on the GPU.")
break
if not printed_tip:
......
......@@ -37,18 +37,18 @@ AddConfigVar('profiling.time_thunks',
BoolParam(True))
AddConfigVar('profiling.n_apply',
"Number of apply instances to print by default",
"Number of Apply instances to print by default",
IntParam(20, lambda i: i > 0),
in_c_key=False)
AddConfigVar('profiling.n_ops',
"Number of ops to print by default",
"Number of Ops to print by default",
IntParam(20, lambda i: i > 0),
in_c_key=False)
AddConfigVar('profiling.min_memory_size',
"""For the memory profile, do not print apply nodes if the size
of their outputs (in bytes) is lower then this threshold""",
"""For the memory profile, do not print Apply nodes if the size
of their outputs (in bytes) is lower than this threshold""",
IntParam(1024, lambda i: i >= 0),
in_c_key=False)
......@@ -185,12 +185,12 @@ class ProfileStats(object):
theano.sandbox.cuda.cuda_enabled):
if os.environ.get('CUDA_LAUNCH_BLOCKING', '0') != '1':
raise Exception(
"You are running Theano profiler with CUDA enabled."
" Theano GPU ops execution are asynchron by default."
"You are running the Theano profiler with CUDA enabled."
" Theano GPU ops execution is asynchronous by default."
" So by default, the profile is useless."
" You must use set the environment variable"
" CUDA_LAUNCH_BLOCKING to 1 to tell the CUDA drvier to"
" synchonize the execution to get meaning full profile.")
" You must set the environment variable"
" CUDA_LAUNCH_BLOCKING to 1 to tell the CUDA driver to"
" synchronize the execution to get a meaningful profile.")
self.apply_callcount = {}
self.output_size = {}
......@@ -708,7 +708,7 @@ class ProfileStats(object):
if len(fct_memory) > 1:
print >> file, ("Memory Profile "
"(the max between all function in that profile)")
"(the max between all functions in that profile)")
else:
print >> file, "Memory Profile"
......@@ -717,15 +717,15 @@ class ProfileStats(object):
print >> file, "---"
# print >> file, " Max if no gc, inplace and view: %dKB" % int(
# round(max_sum_size / 1024))
print >> file, " Max if linker=cvm (default): unknow"
print >> file, " Max if linker=cvm (default): unknown"
print >> file, " Max if no gc (allow_gc=False): %dKB" % int(round(
max_node_memory_size / 1024.))
print >> file, " Max if linker=c|py: %dKB" % int(round(
max_running_max_memory_size / 1024.))
# print >> file, " Memory saved if view are used: %dKB" % int(round(
# max_node_memory_saved_by_view / 1024.))
# print >> file, " Memory saved if inplace op are used: %dKB" % int(
# round(max_node_memory_saved_by_inplace / 1024.))
# print >> file, " Memory saved if views are used: %dKB" % int(
# round(max_node_memory_saved_by_view / 1024.))
# print >> file, " Memory saved if inplace ops are used: %dKB" % \
# int(round(max_node_memory_saved_by_inplace / 1024.))
print >> file, " Memory saved if gc is enabled (linker=c|py): %dKB" % int(
round(max_node_memory_size - max_running_max_memory_size) / 1024.)
if (hasattr(theano, 'sandbox') and
......@@ -734,7 +734,7 @@ class ProfileStats(object):
hasattr(theano.sandbox.cuda.cuda_ndarray.cuda_ndarray,
'theano_allocated')):
_, gpu_max = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.theano_allocated()
print >> file, (" Max Memory allocated on the GPU"
print >> file, (" Max Memory allocated on the GPU "
"(for all functions): %dKB" %
int(round(gpu_max / 1024.)))
......@@ -785,11 +785,11 @@ class ProfileStats(object):
)
print >> file, ''
if N == 0:
print >> file, (' All Apply node have outputs size that take'
' less then %dB.' %
print >> file, (' All Apply nodes have output sizes that take'
' less than %dB.' %
config.profiling.min_memory_size)
print >> file, (
" <created/inplace/view> is taked from the op declaration.")
" <created/inplace/view> is taken from the Op's declaration.")
print >> file, (" Apply nodes marked 'inplace' or 'view' may"
" actually allocate memory, this is not reported"
" here. If you use DebugMode, warnings will be"
......@@ -999,16 +999,25 @@ if 0: # old code still to be ported from ProfileMode
#tip 4
for a, t in apply_time.iteritems():
node = a
if isinstance(node.op, T.Dot) and all([ len(i.type.broadcastable)==2 for i in node.inputs]):
print " - You have a dot operation that was not optimized to dot22 that is faster. Make sure the inputs are float32 or 64 and are the same for both input. Currently they are:",[i.type for i in node.inputs]
if (isinstance(node.op, T.Dot) and
all([len(i.type.broadcastable) == 2 for i in node.inputs])):
print (" - You have a dot operation that was not optimized "
"to dot22 that is faster. Make sure the inputs are "
"float32 or float64 and are the same for both inputs. "
"Currently they are: %s" %
[i.type for i in node.inputs])
#tip 5
for a, t in apply_time.iteritems():
node = a
if isinstance(node.op, RandomFunction):
print " - Replace the default random number generator by 'from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams' as this is is faster. It is still experimental, but seam to work correctly."
print (" - Replace the default random number generator by "
"'from theano.sandbox.rng_mrg import MRG_RandomStreams "
"as RandomStreams' as this is is faster. It is still "
"experimental, but seams to work correctly.")
if config.device.startswith("gpu"):
print " - MRG_RandomStreams is the only random number supported on the GPU."
print (" - MRG_RandomStreams is the only random number"
" supported on the GPU.")
break
def print_summary(self,
......
......@@ -287,7 +287,6 @@ class Stack(VM):
if self.allow_gc and self.dependencies is None:
raise ValueError("Must set dependencies when using GC")
def run_thunk_of_node(self, node):
"""Run the thunk corresponding to Apply instance `node`
......@@ -582,12 +581,12 @@ class VM_Linker(link.LocalLinker):
theano.sandbox.cuda.cuda_enabled):
if os.environ.get('CUDA_LAUNCH_BLOCKING', '0') != '1':
raise Exception(
"You are running Theano profiler with CUDA enabled."
" Theano GPU ops execution are asynchron by default."
"You are running the Theano profiler with CUDA enabled."
" Theano GPU ops execution is asynchronous by default."
" So by default, the profile is useless."
" You must use set the environment variable"
" CUDA_LAUNCH_BLOCKING to 1 to tell the CUDA drvier to"
" synchonize the execution to get meaning full profile.")
" You must set the environment variable"
" CUDA_LAUNCH_BLOCKING to 1 to tell the CUDA driver to"
" synchronize the execution to get a meaningful profile.")
if no_recycling is None:
no_recycling = []
......@@ -661,7 +660,9 @@ class VM_Linker(link.LocalLinker):
pre_call_clear = [storage_map[v] for v in self.no_recycling]
if self.callback is not None or (config.profile and config.profile_memory):
if (self.callback is not None or
(config.profile and config.profile_memory)):
if self.use_cloop and self.callback is not None:
logger.warn('CVM does not support callback, using Stack VM.')
if self.use_cloop and config.profile_memory:
......
......@@ -366,8 +366,8 @@ def constant_or_value(x, rtype, name=None, ndim=None, dtype=None):
# Theano graph, because on Windows 64, all shapes are expressed
# with longs.
# If a long fits in int64, we convert it into an int64, like
# numpy.asarray() does up to 1.7. NumPy 1.7.1 upcaset to int64
# if possible, but fallback to uint64 if int64 isn't possible but
# numpy.asarray() does up to 1.7. NumPy 1.7.1 upcasts to int64
# if possible, but falls back to uint64 if int64 isn't possible but
# uint64 is. We always do as NumPy 1.7.1 here.
# If x is too big, an OverflowError will be raised by numpy.
try:
......@@ -382,10 +382,10 @@ def constant_or_value(x, rtype, name=None, ndim=None, dtype=None):
if x.dtype == 'bool':
x_ = numpy.asarray(x_, dtype='uint8')
else:
# Here x is probably a list or a tuple. If it contain a long,
# we will behave like the current NumPy version: 1.7 and bellow,
# it will only work if the long fit in int64. For NumPy 1.7.1+,
# it will work if the long git in int64 or uint64.
# Here x is probably a list or a tuple. If it contains a long,
# we will behave like the current NumPy version: 1.7 and below,
# it will only work if the long fits in int64. For NumPy 1.7.1+,
# it will work if the long fits in int64 or uint64.
x_ = numpy.asarray(x)
assert type(x_) == numpy.ndarray
......@@ -1199,32 +1199,33 @@ class TensorType(Type):
return numpy.zeros(shape, dtype=self.dtype)
def get_shape_info(self, obj):
"""Return the information needed to compute the memory size of obj.
"""
Return the information needed to compute the memory size of ``obj``.
The memory size is only the data, so this exclude the container.
The memory size is only the data, so this excludes the container.
For an ndarray, this is the data, but not the ndarray object and
others data structures as shape and strides.
other data structures such as shape and strides.
get_shape_info() and get_size() work in tendem for the memory profiler.
``get_shape_info()`` and ``get_size()`` work in tandem for the memory
profiler.
get_shape_info() is called during the execution of the function.
``get_shape_info()`` is called during the execution of the function.
So it is better that it is not too slow.
get_size() will be called with the output of this function
``get_size()`` will be called on the output of this function
when printing the memory profile.
:param obj: The object that this Type represent during execution
:return: Python object that self.get_size() understand
:param obj: The object that this Type represents during execution
:return: Python object that ``self.get_size()`` understands
"""
return obj.shape
def get_size(self, shape_info):
""" Number of bytes taken by the object represented by shape_info
""" Number of bytes taken by the object represented by shape_info.
:param shape_info: the output of the call to get_shape_info()
:return: the number of bytes taken by the object described in
shape_info.
:return: the number of bytes taken by the object described by
``shape_info``.
"""
if shape_info:
return numpy.prod(shape_info) * numpy.dtype(self.dtype).itemsize
......
......@@ -844,10 +844,9 @@ class T_using_gpu(unittest.TestCase):
assert not numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()])
def test_using_gpu_3(self):
if theano.config.device.find('gpu') >-1:
if theano.config.device.find('gpu') > -1:
from theano import function, config, shared, sandbox, Out
import theano.tensor as T
......@@ -870,12 +869,14 @@ class T_using_gpu(unittest.TestCase):
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
print 'Numpy result is', numpy.asarray(r)
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
if numpy.any([isinstance(x.op, T.Elemwise)
for x in f.maker.fgraph.toposort()]):
print 'Used the cpu'
else:
print 'Used the gpu'
assert not numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()])
assert not numpy.any([isinstance(x.op, T.Elemwise)
for x in f.maker.fgraph.toposort()])
class T_fibby(unittest.TestCase):
......@@ -904,13 +905,14 @@ class T_fibby(unittest.TestCase):
return theano.Apply(self,
inputs=[x_],
outputs=[x_.type()])
# using x_.type() is dangerous, it copies x's broadcasting behaviour
# using x_.type() is dangerous, it copies x's broadcasting
# behaviour
def perform(self, node, inputs, output_storage):
x, = inputs
y = output_storage[0][0] = x.copy()
for i in range(2, len(x)):
y[i] = y[i-1] * y[i-2] + x[i]
y[i] = y[i - 1] * y[i - 2] + x[i]
def c_code(self, node, name, inames, onames, sub):
x, = inames
......@@ -950,30 +952,30 @@ class T_fibby(unittest.TestCase):
except NotScalarConstantError:
pass
# Test it don't apply when not needed
# Test it does not apply when not needed
x = T.dvector()
f = function([x], fibby(x))
#theano.printing.debugprint(f)
#We call the function to make sure it run.
#If you run in DebugMode, it will compare the C and Python output
# We call the function to make sure it runs.
# If you run in DebugMode, it will compare the C and Python outputs.
f(numpy.random.rand(5))
topo = f.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, Fibby)
# Test that the optimization get applied
# Test that the optimization gets applied.
f_zero = function([], fibby(T.zeros([5])))
#theano.printing.debugprint(f_zero)
#If you run in DebugMode, it will compare the output before
# and after the optimization
# If you run in DebugMode, it will compare the output before
# and after the optimization.
f_zero()
#Check that the optimization remove the Fibby Op.
#For security, the Theano memory interface make that the output
#of the function is always memory not aliaced to the input.
#That is why there is a DeepCopyOp op.
# Check that the optimization removes the Fibby Op.
# For security, the Theano memory interface ensures that the output
# of the function is always memory not aliased to the input.
# That is why there is a DeepCopyOp op.
topo = f_zero.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, theano.compile.ops.DeepCopyOp)
......@@ -1002,35 +1004,35 @@ class T_graphstructures(unittest.TestCase):
from theano.tensor import add, mul, Apply, Variable, TensorType
# Instantiate a type that represents a matrix of doubles
float64_matrix = TensorType(dtype = 'float64', # double
broadcastable = (False, False)) # matrix
float64_matrix = TensorType(dtype='float64', # double
broadcastable=(False, False)) # matrix
# We make the Variable instances we need.
x = Variable(type = float64_matrix, name = 'x')
y = Variable(type = float64_matrix, name = 'y')
z = Variable(type = float64_matrix, name = 'z')
x = Variable(type=float64_matrix, name='x')
y = Variable(type=float64_matrix, name='y')
z = Variable(type=float64_matrix, name='z')
# This is the Variable we want to symbolically represent y*z
mul_variable = Variable(type = float64_matrix)
mul_variable = Variable(type=float64_matrix)
assert mul_variable.owner is None
# Instantiate a symbolic multiplication
node_mul = Apply(op = mul,
inputs = [y, z],
outputs = [mul_variable])
node_mul = Apply(op=mul,
inputs=[y, z],
outputs=[mul_variable])
# Fields 'owner' and 'index' are set by Apply
assert mul_variable.owner is node_mul
# 'index' is the position of mul_variable in node_mul's outputs
assert mul_variable.index == 0
# This is the Variable we want to symbolically represent x+(y*z)
add_variable = Variable(type = float64_matrix)
add_variable = Variable(type=float64_matrix)
assert add_variable.owner is None
# Instantiate a symbolic addition
node_add = Apply(op = add,
inputs = [x, mul_variable],
outputs = [add_variable])
node_add = Apply(op=add,
inputs=[x, mul_variable],
outputs=[add_variable])
# Fields 'owner' and 'index' are set by Apply
assert add_variable.owner is node_add
assert add_variable.index == 0
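The ``owner``/``index`` bookkeeping that the asserts above exercise can be illustrated with a minimal pure-Python mock. These are not Theano's real classes, just a sketch of the invariant: constructing an ``Apply`` node sets ``owner`` and ``index`` on each of its output Variables.

```python
class Variable(object):
    """Toy stand-in for a theano Variable: only owner/index bookkeeping."""
    def __init__(self, name=None):
        self.name = name
        self.owner = None   # the Apply node that produces this Variable
        self.index = None   # position of this Variable in owner.outputs

class Apply(object):
    """Toy stand-in for theano Apply: links inputs to outputs via an op."""
    def __init__(self, op, inputs, outputs):
        self.op = op
        self.inputs = inputs
        self.outputs = outputs
        for i, out in enumerate(outputs):
            out.owner = self   # Apply sets these fields on its outputs
            out.index = i

x, y, z = Variable('x'), Variable('y'), Variable('z')
mul_variable = Variable()
node_mul = Apply('mul', [y, z], [mul_variable])   # mul_variable.owner is now node_mul
add_variable = Variable()
node_add = Apply('add', [x, mul_variable], [add_variable])
```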
......