Commit cb315e22, authored March 26, 2009 by desjagui@atchoum.iro.umontreal.ca
Documentation for profilemode
Parent: a7b1f4ff

1 changed file, 117 additions and 0 deletions: doc/advanced/profilemode.txt (new file, mode 100644)

=========================================
ProfileMode
=========================================

To profile a Theano graph, pass a special mode called ProfileMode as an
argument when compiling the graph. Using ProfileMode is a three-step
process.

Creating a ProfileMode Instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

First, create a ProfileMode instance.

>>> import theano
>>> profmode = theano.ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())

The ProfileMode constructor takes an optimizer and a linker as input. Which
optimizer and linker to use depends on the application. For example, a user
who wants to profile only the Python implementations should use
``gof.PerformLinker`` (or "py" for short). A user who wants to profile a graph
using C implementations wherever possible should use ``gof.OpWiseCLinker``
(or "c|py").

Likewise, the optimizer passed to ProfileMode determines which optimizations
are applied to the graph before profiling. Changing the optimizer is
especially useful when developing new graph optimizations, because it lets
you evaluate their impact on performance.

Most users will want to use ProfileMode to optimize their graph and find
where most of the computation time is spent. In this context, the 'fast_run'
optimizer and ``gof.OpWiseCLinker`` are the most appropriate choices.

Compiling your Graph with ProfileMode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the ProfileMode instance is created, compile your graph as you
normally would, passing the instance as the ``mode`` parameter.

>>> # with functions
>>> f = theano.function([input1, input2], [output1], mode=profmode)

>>> # with modules
>>> m = theano.Module()
>>> minst = m.make(mode=profmode)

Retrieving Timing Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once your graph is compiled, run the program or operation you wish to
profile, then call ``profmode.print_summary()``. The summary reports timing
information that shows where your graph spends most of its time.
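
As a rough mental model of what such a summary measures, the sketch below times a list of callables ("thunks") one by one and reports each position's share of the total. It is a conceptual illustration in plain Python, not Theano's implementation: the helpers ``profile_thunks`` and ``format_summary`` and the stand-in workloads are hypothetical names introduced here.

```python
import time


def profile_thunks(thunks):
    """Time each (op_name, thunk) pair; return (local_time, records).

    Hypothetical helper for illustration only; not part of Theano.
    """
    records = []
    for position, (op_name, thunk) in enumerate(thunks):
        start = time.perf_counter()
        thunk()  # run one node's computation
        records.append((position, op_name, time.perf_counter() - start))
    # Total time spent running thunks.
    local_time = sum(seconds for _, _, seconds in records)
    return local_time, records


def format_summary(local_time, records):
    """List positions from slowest to fastest, as fractions of local_time."""
    lines = []
    for position, op_name, seconds in sorted(
            records, key=lambda r: r[2], reverse=True):
        fraction = seconds / local_time if local_time > 0 else 0.0
        lines.append("%.3f %d %s" % (fraction, position, op_name))
    return lines


# Stand-in workloads; a real graph would supply its own thunks.
thunks = [
    ("sum", lambda: sum(range(20000))),
    ("sqr", lambda: [x * x for x in range(2000)]),
]
local_time, records = profile_thunks(thunks)
summary_lines = format_summary(local_time, records)
print("\n".join(summary_lines))
```

The measured fractions vary from run to run, so profile a workload long enough for the timings to be meaningful.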

This is best shown through an example. Let's use the example of logistic
regression, covered previously in the `Module`_ section.

.. _Module: module.html?highlight=nnet#advanced-example

Compiling the module with ProfileMode and calling ``profmode.print_summary()``
generates the following output:

.. code-block:: text

    local_time 0.0508708953857 (Time spent running thunks)

    Apply-wise summary: <fraction of local_time spent at this position> (<Apply position>, <Apply Op name>)
    0.397 6 Subtensor{0, ::}
    0.110 18 <theano.tensor.blas.Gemm object at 0x15eb3d0>
    0.047 1 _dot22
    0.033 0 InplaceDimShuffle{x,0}
    0.032 2 InplaceDimShuffle{1,0}
    0.030 7 second
    0.029 8 <theano.tensor.nnet.SoftmaxWithBias object at 0x1619150>
    0.028 16 Sum
    0.027 3 InplaceDimShuffle{x}
    0.024 9 sub
    0.024 17 Sum{0}
    0.024 15 <theano.tensor.nnet.SoftmaxWithBiasDx object at 0x177fcd0>
    0.023 10 sqr
    0.023 12 Sum{1}
    0.023 4 neg
    ... (remaining 6 Apply instances account for 0.13 of the runtime)

    Op-wise summary: <fraction of local_time spent on this kind of Op> <Op name>
    0.397 Subtensor{0, ::}
    0.110 * <theano.tensor.blas.Gemm object at 0x15eb3d0>
    0.047 * _dot22
    0.043 * Elemwise{Mul{output_types_preference=<theano.scalar.basic.transfer_type object at 0x176dbd0>}}[(0, 1)]
    0.033 * InplaceDimShuffle{x,0}
    0.032 * InplaceDimShuffle{1,0}
    0.030 * second
    0.029 * <theano.tensor.nnet.SoftmaxWithBias object at 0x1619150>
    0.028 * Sum
    0.027 * InplaceDimShuffle{x}
    0.024 * sub
    0.024 * Sum{0}
    0.024 * <theano.tensor.nnet.SoftmaxWithBiasDx object at 0x177fcd0>
    0.023 * sqr
    0.023 * Sum{1}
    0.023 * neg
    0.022 * Elemwise{Sub{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1900850>}}[(0, 0)]
    0.021 * Elemwise{Add{output_types_preference=<theano.scalar.basic.transfer_type object at 0x18ab350>}}[(0, 0)]
    0.021 * Elemwise{Second{output_types_preference=<theano.scalar.basic.transfer_type object at 0x177f090>}}[(0, 1)]
    0.020 * Elemwise{Neg{output_types_preference=<theano.scalar.basic.transfer_type object at 0x17b4690>}}[(0, 0)]
    ... (remaining 0 Ops account for 0.00 of the runtime)
    (*) Op is running a c implementation

The summary has two components. The first, the Apply-wise summary, gives
timing information for the worst-offending Apply nodes, that is, the
individual nodes in your graph that take the longest to execute. The second,
the Op-wise summary, groups together the execution times of all Apply nodes
that execute the same Op and shows the total execution time per Op.
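
The grouping step can be sketched in a few lines of plain Python. The records below are made-up values for illustration, not measurements, and ``op_wise`` is a hypothetical helper rather than a Theano function; the point is only that several Apply positions running the same Op collapse into one Op-wise entry.

```python
from collections import defaultdict

# Made-up (Apply position, Op name, seconds) records, for illustration only.
apply_records = [
    (0, "DimShuffle", 0.002),
    (1, "Dot", 0.010),
    (2, "DimShuffle", 0.003),
    (3, "Sum", 0.001),
]


def op_wise(records):
    """Sum the time of every Apply node that runs the same Op.

    Returns (fraction of total time, Op name) pairs, largest first.
    """
    totals = defaultdict(float)
    for _position, op_name, seconds in records:
        totals[op_name] += seconds
    local_time = sum(totals.values())
    return sorted(
        ((seconds / local_time, op_name) for op_name, seconds in totals.items()),
        reverse=True,
    )


op_summary = op_wise(apply_records)
for fraction, op_name in op_summary:
    print("%.3f %s" % (fraction, op_name))
```

Here the two ``DimShuffle`` positions appear as a single entry whose time is the sum of both.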

Note that ProfileMode also shows which Ops were running a C implementation
(marked with ``*``). Developers who want to optimize the performance of their
graph should focus on the worst-offending Ops. If no C implementation exists
for such an Op, consider writing one yourself or suggesting on the mailing
list that one be provided.
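
To pick out the Ops that lack a C implementation, the Op-wise lines can be scanned for the ``*`` marker. The sketch below assumes the layout shown in the listing above (fraction, optional ``*``, Op name, separated by whitespace); ``parse_op_line`` is a hypothetical helper, and the three input lines are copied from that listing.

```python
# Lines copied from the Op-wise summary shown earlier in this document.
op_wise_lines = [
    "0.397 Subtensor{0, ::}",
    "0.110 * <theano.tensor.blas.Gemm object at 0x15eb3d0>",
    "0.047 * _dot22",
]


def parse_op_line(line):
    """Split one Op-wise line into (fraction, has_c_impl, op_name).

    Assumes a leading '*' after the fraction marks a C implementation.
    """
    fraction, rest = line.split(None, 1)
    has_c_impl = rest.startswith("*")
    op_name = rest[1:].strip() if has_c_impl else rest.strip()
    return float(fraction), has_c_impl, op_name


# Ops without the '*' marker are candidates for a C implementation.
python_only = [
    (fraction, op_name)
    for fraction, has_c_impl, op_name in map(parse_op_line, op_wise_lines)
    if not has_c_impl
]
for fraction, op_name in python_only:
    print("%.3f %s" % (fraction, op_name))
```

In the example output, ``Subtensor{0, ::}`` is the only Op without the marker, and it is also the most expensive one.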