Commit 59cb7198 authored by lamblin

Merge pull request #1364 from delallea/minor

Minor fixes

@@ -235,40 +235,40 @@ function. The ``[fibby]`` argument is a hint that our optimizer works on nodes
whose ``.op`` attribute equals ``fibby``.

The function here (``fibby_of_zero``) expects an ``Apply`` instance as an
argument for parameter ``node``. It tests using
function ``get_scalar_constant_value``, which determines if a
Variable (``x``) is guaranteed to be a constant, and if so, what constant.

Test the optimization
=====================

Here is some code to test that the optimization is applied only when needed.

.. code-block:: python

    # Test it does not apply when not needed
    x = T.dvector()
    f = function([x], fibby(x))
    #theano.printing.debugprint(f)

    # We call the function to make sure it runs.
    # If you run in DebugMode, it will compare the C and Python outputs.
    f(numpy.random.rand(5))
    topo = f.maker.fgraph.toposort()
    assert len(topo) == 1
    assert isinstance(topo[0].op, Fibby)

    # Test that the optimization gets applied.
    f_zero = function([], fibby(T.zeros([5])))
    #theano.printing.debugprint(f_zero)

    # If you run in DebugMode, it will compare the output before
    # and after the optimization.
    f_zero()

    # Check that the optimization removes the Fibby Op.
    # For security, the Theano memory interface ensures that the output
    # of the function is always memory not aliased to the input.
    # That is why there is a DeepCopyOp op.
    topo = f_zero.maker.fgraph.toposort()
    assert len(topo) == 1
    assert isinstance(topo[0].op, theano.compile.ops.DeepCopyOp)

@@ -108,36 +108,36 @@ default values.

.. method:: get_shape_info(obj)

    Optional. Only needed to profile the memory of this Type of object.

    Return the information needed to compute the memory size of ``obj``.
    The memory size is only the data, so this excludes the container.
    For an ndarray, this is the data, but not the ndarray object and
    other data structures such as shape and strides.

    ``get_shape_info()`` and ``get_size()`` work in tandem for the memory
    profiler. ``get_shape_info()`` is called during the execution of the
    function, so it should not be too slow. ``get_size()`` will be called
    on the output of this function when printing the memory profile.

    :param obj: The object that this Type represents during execution
    :return: Python object that ``self.get_size()`` understands

.. method:: get_size(shape_info)

    Number of bytes taken by the object represented by ``shape_info``.

    Optional. Only needed to profile the memory of this Type of object.

    :param shape_info: the output of the call to ``get_shape_info()``
    :return: the number of bytes taken by the object described by
        ``shape_info``.

For each method, the *default* is what ``Type`` defines
for you. So, if you create an instance of ``Type`` or an
instance of a subclass of ``Type``, you
...
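
To make the contract between these two methods concrete, here is a minimal
sketch (the ``ListOfFloats`` type is hypothetical, not part of Theano, and
the other methods a real Type needs are omitted):

.. code-block:: python

    from theano.gof import Type

    class ListOfFloats(Type):
        # Hypothetical Type whose runtime objects are plain Python lists
        # of floats.

        def get_shape_info(self, obj):
            # Called while the compiled function runs, so keep it cheap:
            # only record the length of the list.
            return len(obj)

        def get_size(self, shape_info):
            # Called later, when the memory profile is printed: 8 bytes
            # per float stored in the list (container overhead is ignored,
            # as documented above).
            return shape_info * 8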

@@ -271,7 +271,7 @@ import theano and print the config variable, as in:

    Default False

    Do the vm/cvm linkers profile the execution time of Theano functions?

.. attribute:: profile_memory

@@ -279,8 +279,8 @@ import theano and print the config variable, as in:

    Default False

    Do the vm/cvm linkers profile the memory usage of Theano functions?
    It only works when profile=True.

.. attribute:: profile_optimizer

@@ -289,26 +289,26 @@ import theano and print the config variable, as in:

    Default False

    Do the vm/cvm linkers profile the optimization phase when compiling a
    Theano function? It only works when profile=True.

.. attribute:: profiling.n_apply

    Positive int value, default: 20.

    The number of Apply nodes to print in the profiler output.

.. attribute:: profiling.n_ops

    Positive int value, default: 20.

    The number of Ops to print in the profiler output.

.. attribute:: profiling.min_memory_size

    Positive int value, default: 1024.

    For the memory profile, do not print Apply nodes if the size
    of their outputs (in bytes) is lower than this.

.. attribute:: config.lib.amdlibm
...
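
For instance, a minimal way to exercise the profiling flags above from a
script (the flag values are just illustrative) is to set ``THEANO_FLAGS``
before Theano is imported:

.. code-block:: python

    import os
    # Must be set before the first "import theano".
    os.environ['THEANO_FLAGS'] = ('profile=True,profile_memory=True,'
                                  'profiling.n_apply=10')

    import numpy
    import theano
    import theano.tensor as T

    x = T.dvector('x')
    f = theano.function([x], T.exp(x).sum())
    f(numpy.arange(100, dtype='float64'))
    # With profile=True, the timing (and memory) summary is printed when
    # the process exits.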

@@ -904,23 +904,23 @@ Theano fully supports basic indexing

`Integer advanced indexing
<http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer>`_
will be supported in 0.6rc4 (or the development version). We do not
support boolean masks, as Theano does not have a boolean type (we use
int8 for the output of logic operators). To imitate boolean advanced
indexing, you can do::

    # NumPy indexing with a mask
    n = np.arange(9).reshape(3, 3)
    n[n > 4]  # array([5, 6, 7, 8])

    # Theano indexing with a "mask" (incorrect approach)
    t = theano.tensor.arange(9).reshape((3, 3))
    t[t > 4].eval()  # an array with shape (3, 3, 3)

    # getting a Theano result like NumPy
    t[(t > 4).nonzero()].eval()  # array([5, 6, 7, 8])

In many cases, the gradient of advanced indexing requires NumPy 1.8, which
has not been released as of April 30, 2013. You can use the NumPy
development version to get this feature now.
...

@@ -49,12 +49,12 @@ class Profile_Maker(FunctionMaker):
                theano.sandbox.cuda.cuda_enabled):
            if os.environ.get('CUDA_LAUNCH_BLOCKING', '0') != '1':
                raise Exception(
                    "You are running the Theano profiler with CUDA enabled."
                    " Theano GPU ops execution is asynchronous by default."
                    " So by default, the profile is useless."
                    " You must set the environment variable"
                    " CUDA_LAUNCH_BLOCKING to 1 to tell the CUDA driver to"
                    " synchronize the execution to get a meaningful profile.")

        # create a function-specific storage container for profiling info
        profile = ProfileStats(atexit_print=False)
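
As the message says, the flag has to be in the environment before the CUDA
driver is initialized. A minimal sketch (script layout is illustrative) of
doing this from Python rather than from the shell:

    import os
    # Must be set before Theano initializes CUDA, otherwise kernel launches
    # stay asynchronous and the per-op timings are meaningless.
    os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

    import theano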

@@ -584,14 +584,21 @@ Test them first, as they are not guaranteed to always provide a speedup."""
        if not config.lib.amdlibm and any([exp_float32_op(a.op) and
                                           a.inputs[0].dtype == 'float32'
                                           for i, a in apply_time]):
            print (" - With the default gcc libm, exp in float32 is slower "
                   "than in float64! Try Theano flag floatX=float64, or "
                   "install amdlibm and set the theano flags lib.amdlibm=True")
            printed_tip = True

        #tip 4
        for a, t in apply_time.iteritems():
            node = a[1]
            if (isinstance(node.op, T.Dot) and
                    all([len(i.type.broadcastable) == 2 for i in node.inputs])):
                print (" - You have a dot operation that was not optimized to"
                       " dot22 (which is faster). Make sure the inputs are "
                       "float32 or float64, and are the same for both inputs. "
                       "Currently they are: %s" %
                       [i.type for i in node.inputs])
                printed_tip = True

        #tip 5

@@ -599,9 +606,13 @@ Test them first, as they are not guaranteed to always provide a speedup."""
            node = a[1]
            if isinstance(node.op, RandomFunction):
                printed_tip = True
                print (" - Replace the default random number generator by "
                       "'from theano.sandbox.rng_mrg import MRG_RandomStreams "
                       "as RandomStreams', as this is faster. It is still "
                       "experimental, but seems to work correctly.")
                if config.device.startswith("gpu"):
                    print (" - MRG_RandomStreams is the only random number"
                           " generator supported on the GPU.")
                break

        if not printed_tip:
...
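
For reference, a minimal usage sketch of the suggested generator (variable
names are illustrative):

    from theano import function
    from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

    srng = RandomStreams(seed=234)
    u = srng.uniform((2, 2))   # symbolic 2x2 matrix of uniform samples
    f = function([], u)
    f()                        # draws new samples at each call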

@@ -37,18 +37,18 @@ AddConfigVar('profiling.time_thunks',
             BoolParam(True))

AddConfigVar('profiling.n_apply',
             "Number of Apply instances to print by default",
             IntParam(20, lambda i: i > 0),
             in_c_key=False)

AddConfigVar('profiling.n_ops',
             "Number of Ops to print by default",
             IntParam(20, lambda i: i > 0),
             in_c_key=False)

AddConfigVar('profiling.min_memory_size',
             """For the memory profile, do not print Apply nodes if the size
             of their outputs (in bytes) is lower than this threshold""",
             IntParam(1024, lambda i: i >= 0),
             in_c_key=False)

@@ -185,12 +185,12 @@ class ProfileStats(object):
                theano.sandbox.cuda.cuda_enabled):
            if os.environ.get('CUDA_LAUNCH_BLOCKING', '0') != '1':
                raise Exception(
                    "You are running the Theano profiler with CUDA enabled."
                    " Theano GPU ops execution is asynchronous by default."
                    " So by default, the profile is useless."
                    " You must set the environment variable"
                    " CUDA_LAUNCH_BLOCKING to 1 to tell the CUDA driver to"
                    " synchronize the execution to get a meaningful profile.")

        self.apply_callcount = {}
        self.output_size = {}

@@ -708,7 +708,7 @@ class ProfileStats(object):
        if len(fct_memory) > 1:
            print >> file, ("Memory Profile "
                            "(the max between all functions in that profile)")
        else:
            print >> file, "Memory Profile"

@@ -717,15 +717,15 @@ class ProfileStats(object):
        print >> file, "---"
        # print >> file, " Max if no gc, inplace and view: %dKB" % int(
        #     round(max_sum_size / 1024))
        print >> file, " Max if linker=cvm (default): unknown"
        print >> file, " Max if no gc (allow_gc=False): %dKB" % int(round(
            max_node_memory_size / 1024.))
        print >> file, " Max if linker=c|py: %dKB" % int(round(
            max_running_max_memory_size / 1024.))
        # print >> file, " Memory saved if views are used: %dKB" % int(
        #     round(max_node_memory_saved_by_view / 1024.))
        # print >> file, " Memory saved if inplace ops are used: %dKB" % \
        #     int(round(max_node_memory_saved_by_inplace / 1024.))
        print >> file, " Memory saved if gc is enabled (linker=c|py): %dKB" % int(
            round(max_node_memory_size - max_running_max_memory_size) / 1024.)
        if (hasattr(theano, 'sandbox') and

@@ -734,7 +734,7 @@ class ProfileStats(object):
            hasattr(theano.sandbox.cuda.cuda_ndarray.cuda_ndarray,
                    'theano_allocated')):
            _, gpu_max = theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.theano_allocated()
            print >> file, (" Max Memory allocated on the GPU "
                            "(for all functions): %dKB" %
                            int(round(gpu_max / 1024.)))

@@ -785,11 +785,11 @@ class ProfileStats(object):
        )
        print >> file, ''
        if N == 0:
            print >> file, (' All Apply nodes have output sizes that take'
                            ' less than %dB.' %
                            config.profiling.min_memory_size)
        print >> file, (
            " <created/inplace/view> is taken from the Op's declaration.")
        print >> file, (" Apply nodes marked 'inplace' or 'view' may"
                        " actually allocate memory, this is not reported"
                        " here. If you use DebugMode, warnings will be"

@@ -999,16 +999,25 @@ if 0: # old code still to be ported from ProfileMode
        #tip 4
        for a, t in apply_time.iteritems():
            node = a
            if (isinstance(node.op, T.Dot) and
                    all([len(i.type.broadcastable) == 2 for i in node.inputs])):
                print (" - You have a dot operation that was not optimized "
                       "to dot22 (which is faster). Make sure the inputs are "
                       "float32 or float64 and are the same for both inputs. "
                       "Currently they are: %s" %
                       [i.type for i in node.inputs])

        #tip 5
        for a, t in apply_time.iteritems():
            node = a
            if isinstance(node.op, RandomFunction):
                print (" - Replace the default random number generator by "
                       "'from theano.sandbox.rng_mrg import MRG_RandomStreams "
                       "as RandomStreams' as this is faster. It is still "
                       "experimental, but seems to work correctly.")
                if config.device.startswith("gpu"):
                    print (" - MRG_RandomStreams is the only random number"
                           " generator supported on the GPU.")
                break

    def print_summary(self,
...

@@ -287,7 +287,6 @@ class Stack(VM):
        if self.allow_gc and self.dependencies is None:
            raise ValueError("Must set dependencies when using GC")

    def run_thunk_of_node(self, node):
        """Run the thunk corresponding to Apply instance `node`

@@ -582,12 +581,12 @@ class VM_Linker(link.LocalLinker):
                theano.sandbox.cuda.cuda_enabled):
            if os.environ.get('CUDA_LAUNCH_BLOCKING', '0') != '1':
                raise Exception(
                    "You are running the Theano profiler with CUDA enabled."
                    " Theano GPU ops execution is asynchronous by default."
                    " So by default, the profile is useless."
                    " You must set the environment variable"
                    " CUDA_LAUNCH_BLOCKING to 1 to tell the CUDA driver to"
                    " synchronize the execution to get a meaningful profile.")

        if no_recycling is None:
            no_recycling = []

@@ -661,7 +660,9 @@ class VM_Linker(link.LocalLinker):
        pre_call_clear = [storage_map[v] for v in self.no_recycling]

        if (self.callback is not None or
                (config.profile and config.profile_memory)):
            if self.use_cloop and self.callback is not None:
                logger.warn('CVM does not support callback, using Stack VM.')
            if self.use_cloop and config.profile_memory:
...

@@ -366,8 +366,8 @@ def constant_or_value(x, rtype, name=None, ndim=None, dtype=None):
            # Theano graph, because on Windows 64, all shapes are expressed
            # with longs.
            # If a long fits in int64, we convert it into an int64, like
            # numpy.asarray() does up to 1.7. NumPy 1.7.1 upcasts to int64
            # if possible, but falls back to uint64 if int64 isn't possible
            # but uint64 is. We always behave as NumPy 1.7.1 does here.
            # If x is too big, an OverflowError will be raised by numpy.
            try:

@@ -382,10 +382,10 @@ def constant_or_value(x, rtype, name=None, ndim=None, dtype=None):
            if x.dtype == 'bool':
                x_ = numpy.asarray(x_, dtype='uint8')
        else:
            # Here x is probably a list or a tuple. If it contains a long,
            # we will behave like the current NumPy version: for 1.7 and
            # below, it will only work if the long fits in int64. For NumPy
            # 1.7.1+, it will work if the long fits in int64 or uint64.
            x_ = numpy.asarray(x)

        assert type(x_) == numpy.ndarray

@@ -1199,32 +1199,33 @@ class TensorType(Type):
        return numpy.zeros(shape, dtype=self.dtype)

    def get_shape_info(self, obj):
        """
        Return the information needed to compute the memory size of ``obj``.

        The memory size is only the data, so this excludes the container.
        For an ndarray, this is the data, but not the ndarray object and
        other data structures such as shape and strides.

        ``get_shape_info()`` and ``get_size()`` work in tandem for the memory
        profiler.

        ``get_shape_info()`` is called during the execution of the function,
        so it should not be too slow.

        ``get_size()`` will be called on the output of this function
        when printing the memory profile.

        :param obj: The object that this Type represents during execution
        :return: Python object that ``self.get_size()`` understands
        """
        return obj.shape

    def get_size(self, shape_info):
        """Number of bytes taken by the object represented by ``shape_info``.

        :param shape_info: the output of the call to ``get_shape_info()``
        :return: the number of bytes taken by the object described by
            ``shape_info``.
        """
        if shape_info:
            return numpy.prod(shape_info) * numpy.dtype(self.dtype).itemsize
...
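
A quick check of the two methods above (the values are illustrative):

    import numpy
    from theano.tensor import TensorType

    mat = TensorType(dtype='float64', broadcastable=(False, False))
    obj = numpy.zeros((3, 4))

    info = mat.get_shape_info(obj)   # (3, 4): only the shape is recorded
    mat.get_size(info)               # 96 = 3 * 4 * 8 bytes of float64 data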

@@ -844,10 +844,9 @@ class T_using_gpu(unittest.TestCase):
        assert not numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()])

    def test_using_gpu_3(self):
        if theano.config.device.find('gpu') > -1:
            from theano import function, config, shared, sandbox, Out
            import theano.tensor as T

@@ -870,12 +869,14 @@ class T_using_gpu(unittest.TestCase):
            print 'Looping %d times took' % iters, t1 - t0, 'seconds'
            print 'Result is', r
            print 'Numpy result is', numpy.asarray(r)
            if numpy.any([isinstance(x.op, T.Elemwise)
                          for x in f.maker.fgraph.toposort()]):
                print 'Used the cpu'
            else:
                print 'Used the gpu'
            assert not numpy.any([isinstance(x.op, T.Elemwise)
                                  for x in f.maker.fgraph.toposort()])


class T_fibby(unittest.TestCase):

@@ -904,13 +905,14 @@ class T_fibby(unittest.TestCase):
            return theano.Apply(self,
                                inputs=[x_],
                                outputs=[x_.type()])
            # using x_.type() is dangerous, it copies x's broadcasting
            # behaviour

        def perform(self, node, inputs, output_storage):
            x, = inputs
            y = output_storage[0][0] = x.copy()
            for i in range(2, len(x)):
                y[i] = y[i - 1] * y[i - 2] + x[i]

        def c_code(self, node, name, inames, onames, sub):
            x, = inames

@@ -950,30 +952,30 @@ class T_fibby(unittest.TestCase):
        except NotScalarConstantError:
            pass

        # Test it does not apply when not needed
        x = T.dvector()
        f = function([x], fibby(x))
        #theano.printing.debugprint(f)

        # We call the function to make sure it runs.
        # If you run in DebugMode, it will compare the C and Python outputs.
        f(numpy.random.rand(5))
        topo = f.maker.fgraph.toposort()
        assert len(topo) == 1
        assert isinstance(topo[0].op, Fibby)

        # Test that the optimization gets applied.
        f_zero = function([], fibby(T.zeros([5])))
        #theano.printing.debugprint(f_zero)

        # If you run in DebugMode, it will compare the output before
        # and after the optimization.
        f_zero()

        # Check that the optimization removes the Fibby Op.
        # For security, the Theano memory interface ensures that the output
        # of the function is always memory not aliased to the input.
        # That is why there is a DeepCopyOp op.
        topo = f_zero.maker.fgraph.toposort()
        assert len(topo) == 1
        assert isinstance(topo[0].op, theano.compile.ops.DeepCopyOp)

@@ -1002,35 +1004,35 @@ class T_graphstructures(unittest.TestCase):
        from theano.tensor import add, mul, Apply, Variable, TensorType

        # Instantiate a type that represents a matrix of doubles
        float64_matrix = TensorType(dtype='float64',               # double
                                    broadcastable=(False, False))  # matrix

        # We make the Variable instances we need.
        x = Variable(type=float64_matrix, name='x')
        y = Variable(type=float64_matrix, name='y')
        z = Variable(type=float64_matrix, name='z')

        # This is the Variable that we want to symbolically represent y*z
        mul_variable = Variable(type=float64_matrix)
        assert mul_variable.owner is None

        # Instantiate a symbolic multiplication
        node_mul = Apply(op=mul,
                         inputs=[y, z],
                         outputs=[mul_variable])
        # Fields 'owner' and 'index' are set by Apply
        assert mul_variable.owner is node_mul
        # 'index' is the position of mul_variable in node_mul's outputs
        assert mul_variable.index == 0

        # This is the Variable that we want to symbolically represent x+(y*z)
        add_variable = Variable(type=float64_matrix)
        assert add_variable.owner is None

        # Instantiate a symbolic addition
        node_add = Apply(op=add,
                         inputs=[x, mul_variable],
                         outputs=[add_variable])
        # Fields 'owner' and 'index' are set by Apply
        assert add_variable.owner is node_add
        assert add_variable.index == 0
...