testgroup / pytensor

Commit 0a7a4c06
authored Jun 10, 2016 by Chinnadhurai Sankar

Merge branch 'master' of git://github.com/Theano/Theano

Parents: 1a53098a, 59a5dfbb
Showing 65 changed files with 454 additions and 348 deletions (+454, -348)
.gitignore                                  +2   -0
README.txt                                  +6   -7
doc/extending/extending_theano.txt          +2   -2
doc/install.txt                             +10  -8
doc/install_ubuntu.txt                      +11  -11
doc/install_windows.txt                     +7   -5
doc/library/config.txt                      +0   -0
doc/library/tensor/basic.txt                +1   -1
doc/optimizations.txt                       +1   -1
doc/tutorial/aliasing.txt                   +0   -47
doc/tutorial/modes.txt                      +3   -3
doc/tutorial/using_gpu.txt                  +0   -0
doc/tutorial/using_gpu_solution_1.py        +0   -0
doc/tutorial/using_multi_gpu.txt            +3   -3
theano/compile/nanguardmode.py              +17  -46
theano/compile/profiling.py                 +9   -4
theano/configdefaults.py                    +7   -7
theano/gof/link.py                          +3   -0
theano/gof/opt.py                           +0   -0
theano/gof/optdb.py                         +12  -1
theano/gof/vm.py                            +26  -9
theano/gpuarray/__init__.py                 +10  -1
theano/gpuarray/basic_ops.py                +13  -6
theano/gpuarray/blockgemv.c                 +22  -29
theano/gpuarray/blockger.c                  +19  -26
theano/gpuarray/dnn.py                      +8   -8
theano/gpuarray/dnn_fwd.c                   +3   -3
theano/gpuarray/dnn_gi.c                    +3   -3
theano/gpuarray/dnn_gw.c                    +3   -3
theano/gpuarray/elemwise.py                 +3   -3
theano/gpuarray/extra_ops.py                +6   -9
theano/gpuarray/gemm16.c                    +1   -1
theano/gpuarray/neighbours.py               +1   -1
theano/gpuarray/nerv.py                     +1   -1
theano/gpuarray/nnet.py                     +4   -4
theano/gpuarray/opt.py                      +45  -28
theano/gpuarray/subtensor.py                +0   -0
theano/gpuarray/tests/test_basic_ops.py     +16  -1
theano/gpuarray/tests/test_elemwise.py      +2   -2
theano/gpuarray/tests/test_extra_ops.py     +1   -1
theano/gpuarray/tests/test_opt.py           +1   -1
theano/gpuarray/tests/test_subtensor.py     +29  -0
theano/gpuarray/type.py                     +2   -8
theano/misc/check_blas.py                   +6   -0
theano/sandbox/cuda/dnn.py                  +2   -1
theano/sandbox/cuda/opt.py                  +9   -9
theano/sandbox/gpuarray/__init__.py         +3   -2
theano/scalar/basic.py                      +1   -1
theano/scan_module/scan_opt.py              +3   -3
theano/tensor/basic.py                      +6   -9
theano/tensor/blas.py                       +4   -1
theano/tensor/nlinalg.py                    +59  -0
theano/tensor/nnet/sigm.py                  +14  -4
theano/tensor/nnet/tests/test_sigm.py       +11  -10
theano/tensor/opt.py                        +0   -0
theano/tensor/signal/pool.py                +23  -14
theano/tensor/signal/tests/test_pool.py     +0   -0
theano/tensor/slinalg.py                    +0   -0
theano/tensor/subtensor.py                  +0   -0
theano/tensor/tests/test_basic.py           +0   -0
theano/tensor/tests/test_blas_c.py          +0   -0
theano/tensor/tests/test_nlinalg.py         +0   -0
theano/tensor/tests/test_opt.py             +0   -0
theano/tensor/tests/test_slinalg.py         +0   -0
theano/tests/test_flake8.py                 +0   -0
.gitignore
@@ -37,3 +37,4 @@ Theano.suo
 .ipynb_checkpoints
 .pydevproject
 .ropeproject
+core
\ No newline at end of file
README.txt
@@ -10,15 +10,14 @@ Related Projects:
     https://github.com/Theano/Theano/wiki/Related-projects
-We recommend you look at the documentation on the website, since it
-will be more current than the documentation included with the package.
-If you really wish to build the documentation yourself, you will need
-sphinx. Issue the following command:
+It is recommended that you look at the documentation on the website, as it
+will be more current than the documentation included with the package.
+In order to build the documentation yourself, you will need sphinx.
+Issue the following command:
     python ./doc/scripts/docgen.py
 Documentation is built into html/
-The PDF of the documentation is html/theano.pdf
+The PDF of the documentation can be found at html/theano.pdf
 DIRECTORY LAYOUT
@@ -31,7 +30,7 @@ Theano (current directory) is the distribution directory.
 * tensor depends upon scalar
 * sparse depends upon tensor
 * sandbox can depend on everything else
-* Theano/examples are copies of the example on the wiki
+* Theano/examples are copies of the example found on the wiki
 * Theano/benchmark and Theano/examples are in the distribution, but not in
   the Python package
 * Theano/bin contains executable scripts that are copied to the bin folder
@@ -39,4 +38,4 @@ Theano (current directory) is the distribution directory.
 * Tests are distributed and are part of the package, i.e. fall in
   the appropriate submodules
 * Theano/doc contains files and scripts used to generate the documentation
-* Theano/html is the place where the documentation will be generated
+* Theano/html is where the documentation will be generated
doc/extending/extending_theano.txt
@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
 Testing GPU Ops
 ^^^^^^^^^^^^^^^
-Ops to be executed on the GPU should inherit from the
-``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
+When using the old GPU backend, Ops to be executed on the GPU should inherit
+from ``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
 Theano to distinguish them. Currently, we use this to test if the
 NVIDIA driver works correctly with our sum reduction code on the GPU.
doc/install.txt
@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add
 If you want GPU-related tests to run on a specific GPU device, and not
 the default one, you should use :attr:`~config.init_gpu_device`.
-For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=gpu1``.
+For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=cuda1``.
 See :ref:`libdoc_config` for more information on how to change these
 configuration options.
@@ -508,25 +508,25 @@ Any one of them is enough.
 :ref:`Ubuntu instructions <install_ubuntu_gpu>`.
+Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
 Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your
 computer, and set the default floating point computations to float32.
-For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu,floatX=float32'``.
+For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=cuda,floatX=float32'``.
 You can also set these options in the .theanorc file's ``[global]`` section:
 .. code-block:: cfg
     [global]
-        device = gpu
+        device = cuda
         floatX = float32
 Note that:
-* If your computer has multiple GPUs and you use 'device=gpu', the driver
-  selects the one to use (usually gpu0).
-* You can use the program nvida-smi to change this policy.
-* You can choose one specific GPU by specifying 'device=gpuX', with X the
+* If your computer has multiple GPUs and you use 'device=cuda', the driver
+  selects the one to use (usually cuda0).
+* You can use the program ``nvidia-smi`` to change this policy.
+* You can choose one specific GPU by specifying 'device=cudaX', with X the
   the corresponding GPU index (0, 1, 2, ...)
 * By default, when ``device`` indicates preference for GPU computations,
   Theano will fall back to the CPU if there is a problem with the GPU.
@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats:
   toggle your GPU on, which can be done with
   `gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__.
+Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
 Once your setup is complete, head to :ref:`using_gpu` to find how to verify
 everything is working properly.
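The ``THEANO_FLAGS`` values shown in this diff are comma-separated ``key=value`` pairs. As a rough illustration of that format only (this parser is a simplified sketch, not Theano's actual configuration machinery, and ``parse_flags`` is a hypothetical name):

```python
def parse_flags(flags):
    """Parse a comma-separated key=value string like THEANO_FLAGS.

    Simplified sketch: Theano's real parser also handles section-style
    keys and contradictory duplicates; here, later entries simply win.
    """
    result = {}
    for item in flags.split(','):
        if not item:
            continue
        key, _, value = item.partition('=')
        result[key.strip()] = value.strip()
    return result

flags = parse_flags("device=cuda,floatX=float32")
print(flags["device"])   # cuda
```

With the new backend the commit documents, the same string would carry ``device=cuda`` instead of the old ``device=gpu``.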
doc/install_ubuntu.txt
@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04:
     sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
     sudo pip install Theano
 On 14.04, this will install Python 2 by default. If you want to use Python 3:
 .. code-block:: bash
@@ -104,30 +104,30 @@ For Ubuntu 11.04:
 The development version of Theano supports Python 3.3 and
 probably supports Python 3.2, but we do not test on it.
 Bleeding Edge Installs
 ----------------------
 If you would like, instead, to install the bleeding edge Theano (from github)
 such that you can edit and contribute to Theano, replace the `pip install Theano`
 command with:
 .. code-block:: bash
     git clone git://github.com/Theano/Theano.git
     cd Theano
     python setup.py develop --user
     cd ..
 VirtualEnv
 ----------
 If you would like to install Theano in a VirtualEnv, you will want to pass the
 `--system-site-packages` flag when creating the VirtualEnv so that it will pick up
 the system-provided `Numpy` and `SciPy`.
 .. code-block:: bash
     virtualenv --system-site-packages -p python2.7 theano-env
     source theano-env/bin/activate
     pip install Theano
@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs
 Change to the Theano directory and run:
 .. code-block:: bash
     git pull
@@ -303,7 +303,7 @@ Test GPU configuration
 .. code-block:: bash
-    THEANO_FLAGS=floatX=float32,device=gpu python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
+    THEANO_FLAGS=floatX=float32,device=cuda python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
 .. note::
doc/install_windows.txt
@@ -423,16 +423,16 @@ Create a test file containing:
     print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %(
         np_end-np_start, t_end-t_start))
     print("Result difference: %f" % (np.abs(AB-tAB).max(), ))
 .. testoutput::
    :hide:
    :options: +ELLIPSIS
    NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!)
    Result difference: ...
 .. code-block:: none
    NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!)
    Result difference: 0.000000
@@ -445,6 +445,8 @@ routine for matrix multiplication)
 Configure Theano for GPU use
 ############################
+Install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_ if you have not already done so.
 Theano can be configured with a ``.theanorc`` text file (or
 ``.theanorc.txt``, whichever is easier for you to create under
 Windows). It should be placed in the directory pointed to by the
@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file:
 .. code-block:: cfg
     [global]
-        device = gpu
+        device = cuda
         floatX = float32
     [nvcc]
@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above.
 Compiling a faster BLAS
 ~~~~~~~~~~~~~~~~~~~~~~~
 If you installed Python through WinPython or EPD, Theano will automatically
 link with the MKL library, so you should not need to compile your own BLAS.
 .. note::
doc/library/config.txt
(diff collapsed; not shown)
doc/library/tensor/basic.txt
@@ -1414,7 +1414,7 @@ Mathematical
 .. function:: abs_(a)
-    Returns a variable representingthe absolute of a, ie ``|a|``.
+    Returns a variable representing the absolute of a, ie ``|a|``.
 .. note:: Can also be accessed with ``abs(a)``.
doc/optimizations.txt
@@ -32,6 +32,7 @@ Optimization FAST_RUN FAST_COMPILE
 ========================================================= ========= ============ =============
 :term:`merge`                                             x         x
 :term:`constant folding<constant folding>`                x         x
+:term:`GPU transfer`                                      x         x
 :term:`shape promotion<shape promotion>`                  x
 :term:`fill cut<fill cut>`                                x
 :term:`inc_subtensor srlz.<inc_subtensor serialization>`  x
@@ -52,7 +53,6 @@ Optimization FAST_RUN FAST_COMPILE
 :term:`inplace_elemwise`                                  x
 :term:`inplace_random`                                    x
 :term:`elemwise fusion`                                   x
-:term:`GPU transfer`                                      x
 :term:`local_log_softmax`                                 x         x
 :term:`local_remove_all_assert`
 ========================================================= ========= ============ =============
doc/tutorial/aliasing.txt
@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
 hints that give more flexibility to the compilation and optimization of the
 graph.
-For GPU graphs, this borrowing can have a major speed impact. See the following code:
-
-.. code-block:: python
-
-    from theano import function, config, shared, sandbox, tensor, Out
-    import numpy
-    import time
-
-    vlen = 10 * 30 * 768  # 10 x # cores x # threads per core
-    iters = 1000
-
-    rng = numpy.random.RandomState(22)
-    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
-    f1 = function([], sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)))
-    f2 = function([],
-                  Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
-                      borrow=True))
-    t0 = time.time()
-    for i in range(iters):
-        r = f1()
-    t1 = time.time()
-    no_borrow = t1 - t0
-    t0 = time.time()
-    for i in range(iters):
-        r = f2()
-    t1 = time.time()
-    print(
-        "Looping %s times took %s seconds without borrow "
-        "and %s seconds with borrow" % (iters, no_borrow, (t1 - t0))
-    )
-    if numpy.any([isinstance(x.op, tensor.Elemwise) and
-                  ('Gpu' not in type(x.op).__name__)
-                  for x in f1.maker.fgraph.toposort()]):
-        print('Used the cpu')
-    else:
-        print('Used the gpu')
-
-Which produces this output:
-
-.. code-block:: none
-
-    $ THEANO_FLAGS=device=gpu0,floatX=float32 python test1.py
-    Using gpu device 0: GeForce GTX 275
-    Looping 1000 times took 0.368273973465 seconds without borrow and 0.0240728855133 seconds with borrow.
-    Used the gpu
 *Take home message:*
 When an input *x* to a function is not needed after the function
@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory
 footprint), and you only need to read from it once, right away when
 it's returned, then consider marking it with an ``Out(y,
 borrow=True)``.
doc/tutorial/modes.txt
@@ -168,8 +168,8 @@ Linkers
 =======
 A mode is composed of 2 things: an optimizer and a linker. Some modes,
-like ``NanGuardMode`` and ``DebugMode``, add logic around the optimizer and
-linker. ``NanGuardMode`` and ``DebugMode`` use their own linker.
+like ``NanGuardMode`` and ``DebugMode``, add logic around the
+optimizer and linker. ``DebugMode`` uses its own linker.
 You can select which linker to use with the Theano flag :attr:`config.linker`.
 Here is a table to compare the different linkers.
@@ -183,7 +183,7 @@ c|py [#cpy1]_ yes yes "+++" Try C code. If none exis
 c|py_nogc     no        yes               "++"      As c|py, but without gc
 c             no        yes               "+"       Use only C code (if none available for an op, raise an error)
 py            yes       yes               "+++"     Use only Python code
-NanGuardMode  no        no                "++++"    Check if nodes generate NaN
+NanGuardMode  yes       yes               "++++"    Check if nodes generate NaN
 DebugMode     no        yes               VERY HIGH Make many checks on what Theano computes
 ============= ========= ================= ========= ===
doc/tutorial/using_gpu.txt
(diff collapsed; not shown)

doc/tutorial/using_gpu_solution_1.py
(diff collapsed; not shown)
doc/tutorial/using_multi_gpu.txt
@@ -81,7 +81,7 @@ single name and a single device.
 It is often the case that multi-gpu operation requires or assumes
 that all the GPUs involved are equivalent. This is not the case
 for this implementation. Since the user has the task of
-distrubuting the jobs across the different device a model can be
+distributing the jobs across the different device a model can be
 built on the assumption that one of the GPU is slower or has
 smaller memory.
@@ -140,5 +140,5 @@ is a example.
     cv = gv.transfer('cpu')
 Of course you can mix transfers and operations in any order you
 choose. However you should try to minimize transfer operations
-because they will introduce overhead any may reduce performance.
+because they will introduce overhead that may reduce performance.
theano/compile/nanguardmode.py
@@ -73,7 +73,7 @@ def contains_nan(arr, node=None):
     elif arr.size == 0:
         return False
     elif cuda.cuda_available and isinstance(arr, cuda.CudaNdarray):
-        if (hasattr(theano.sandbox, 'rng_mrg') and
+        if (node and hasattr(theano.sandbox, 'rng_mrg') and
                 isinstance(node.op,
                            # It store ints in float container
@@ -119,7 +119,7 @@ def contains_inf(arr, node=None):
     elif arr.size == 0:
         return False
     elif cuda.cuda_available and isinstance(arr, cuda.CudaNdarray):
-        if (hasattr(theano.sandbox, 'rng_mrg') and
+        if (node and hasattr(theano.sandbox, 'rng_mrg') and
                 isinstance(node.op,
                            # It store ints in float container
@@ -215,7 +215,7 @@ class NanGuardMode(Mode):
         assert nan_is_error or inf_is_error or big_is_error
         compile_gpu_func(nan_is_error, inf_is_error, big_is_error)
-        def do_check_on(var, nd, f, is_input):
+        def do_check_on(var, nd):
             """
             Checks `var` for NaNs / Infs. If detected, raises an exception
             and / or prints information about `nd`, `f`, and `is_input` to
@@ -227,11 +227,6 @@ class NanGuardMode(Mode):
                 The value to be checked.
             nd : theano.gof.Apply
                 The Apply node being executed.
-            f : callable
-                The thunk for the apply node.
-            is_input : bool
-                If True, `var` is an input to `nd`.
-                If False, it is an output.
             """
             error = False
@@ -262,17 +257,13 @@ class NanGuardMode(Mode):
                 print('Big value detected', file=sio)
                 error = True
             if error:
-                if not is_input:
-                    print("NanGuardMode found an error in the"
-                          " output of a node in this variable:", file=sio)
+                if nd:
+                    print("NanGuardMode found an error in the "
+                          "output of a node in this variable:", file=sio)
                     print(theano.printing.debugprint(nd, file='str'), file=sio)
                 else:
-                    print("NanGuardMode found an error in an"
-                          " input of this node.", file=sio)
-                    print('Node:', file=sio)
-                    print(nd, file=sio)
-                    print("The input variable that cause problem:", file=sio)
-                    print(theano.printing.debugprint(nd, file='str'), file=sio)
+                    print("NanGuardMode found an error in an input of the "
+                          "graph.", file=sio)
                 msg = sio.getvalue()
                 if config.NanGuardMode.action == 'raise':
                     raise AssertionError(msg)
@@ -283,36 +274,16 @@ class NanGuardMode(Mode):
                 elif config.NanGuardMode.action == 'warn':
                     logger.error(msg)
-        def nan_check(i, node, fn):
-            """
-            Runs `fn` while checking its inputs and outputs for NaNs / Infs.
-            Parameters
-            ----------
-            i :
-                Currently ignored.
-                TODO: determine why it is here or remove).
-            node : theano.gof.Apply
-                The Apply node currently being executed.
-            fn : callable
-                The thunk to execute for this Apply node.
-            """
-            inputs = fn.inputs
-            for x, var in zip(inputs, node.inputs):
-                # If the input is the result of computation, then we
-                # don't need to check it. It is already done after the
-                # computation.
-                if (var.owner is None and
-                        getattr(var.tag, 'nan_guard_mode_check', True)):
-                    do_check_on(x[0], node, fn, True)
-            fn()
-            outputs = fn.outputs
-            for x, var in zip(outputs, node.outputs):
-                if getattr(var.tag, 'nan_guard_mode_check', True):
-                    do_check_on(x[0], node, fn, False)
+        def nan_check(node, thunk, storage_map, compute_map):
+            for var in node.outputs:
+                if getattr(var.tag, 'nan_guard_mode_check', True):
+                    do_check_on(storage_map[var][0], node)
+        def nan_check_input(var, value):
+            if getattr(var.tag, 'nan_guard_mode_check', True):
+                do_check_on(value, None)
-        wrap_linker = theano.gof.WrapLinker([theano.gof.OpWiseCLinker()],
-                                            nan_check)
+        wrap_linker = theano.gof.vm.VM_Linker(callback=nan_check,
+                                              callback_input=nan_check_input)
         super(NanGuardMode, self).__init__(wrap_linker,
                                            optimizer=self.provided_optimizer)
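The rewritten ``nan_check`` runs as a linker callback that inspects each node's outputs after its thunk executes. A minimal pure-Python sketch of that pattern (hypothetical names throughout; this only mimics the shape of a VM-style callback, not Theano's actual ``VM_Linker`` API):

```python
import math

def contains_bad_value(values):
    """Return True if any value is NaN or Inf (the spirit of NanGuardMode's check)."""
    return any(math.isnan(v) or math.isinf(v) for v in values)

def run_with_guard(nodes, storage_map, callback):
    """Run each node's thunk, then invoke the callback, as a VM-style
    linker would. `nodes` is a list of (name, thunk, output_keys) tuples."""
    for name, thunk, output_keys in nodes:
        thunk()
        callback(name, output_keys, storage_map)

errors = []

def nan_check(name, output_keys, storage_map):
    # Check every output the node just produced, analogous to
    # looking up storage_map[var][0] for each var in node.outputs.
    for key in output_keys:
        if contains_bad_value(storage_map[key]):
            errors.append(name)

storage_map = {}
def thunk_ok():
    storage_map['a'] = [1.0, 2.0]
def thunk_bad():
    storage_map['b'] = [float('nan')]

run_with_guard([('ok', thunk_ok, ['a']), ('bad', thunk_bad, ['b'])],
               storage_map, nan_check)
print(errors)  # ['bad']
```

The separate ``nan_check_input`` hook in the diff plays the same role for graph inputs, which have no producing node to report.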
theano/compile/profiling.py
@@ -84,10 +84,15 @@ def _atexit_print_fn():
                 cum_attr[key] = val
         if cum.optimizer_profile and ps.optimizer_profile:
-            merge = cum.optimizer_profile[0].merge_profile(
-                cum.optimizer_profile[1],
-                ps.optimizer_profile[1])
-            cum.optimizer_profile = (cum.optimizer_profile[0], merge)
+            try:
+                merge = cum.optimizer_profile[0].merge_profile(
+                    cum.optimizer_profile[1],
+                    ps.optimizer_profile[1])
+                cum.optimizer_profile = (cum.optimizer_profile[0], merge)
+            except Exception as e:
+                print("Got an exception while merging profile")
+                print(e)
+                cum.optimizer_profile = None
         else:
             cum.optimizer_profile = None
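The change wraps the profile merge in a try/except so one unmergeable profile no longer aborts the atexit summary. The defensive pattern, sketched with a stand-in merge function (``merge_profiles`` and both merge callables are hypothetical names for illustration):

```python
def merge_profiles(cum, new, merge):
    """Merge `new` into `cum` using `merge`; on failure, drop the
    cumulative profile instead of raising, mirroring the new behaviour."""
    try:
        return merge(cum, new)
    except Exception as e:
        print("Got an exception while merging profile")
        print(e)
        return None

def good_merge(a, b):
    return a + b

def bad_merge(a, b):
    raise ValueError("incompatible profiles")

print(merge_profiles(1, 2, good_merge))  # 3
print(merge_profiles(1, 2, bad_merge))   # None
```

The design choice here is to degrade gracefully: a merge failure loses the cumulative optimizer profile but still lets every other profile print at exit.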
theano/configdefaults.py
@@ -104,10 +104,9 @@ class DeviceParam(ConfigParam):
 AddConfigVar(
     'device',
-    ("Default device for computations. If gpu*, change the default to try "
-     "to move computation to it and to put shared variable of float32 "
-     "on it. Do not use upper case letters, only lower case even if "
-     "NVIDIA use capital letters."),
+    ("Default device for computations. If cuda* or opencl*, change the "
+     "default to try to move computation to the GPU. Do not use upper case "
+     "letters, only lower case even if NVIDIA uses capital letters."),
     DeviceParam('cpu', allow_override=False),
     in_c_key=False)
@@ -273,7 +272,8 @@ def safe_no_dnn_workmem_bwd(workmem):
     return True
 AddConfigVar('dnn.conv.workmem_bwd',
-             "This flag is deprecated; use dnn.conv.algo_bwd.",
+             "This flag is deprecated; use `dnn.conv.algo_bwd_filter` "
+             "and `dnn.conv.algo_bwd_data` instead.",
              ConfigParam('', allow_override=False,
                          filter=safe_no_dnn_workmem_bwd),
              in_c_key=False)
@@ -651,8 +651,8 @@ AddConfigVar('warn.ignore_bug_before',
              "bugs found after that version. "
              "Warning for specific bugs can be configured with specific "
              "[warn] flags."),
-             EnumStr('0.7', 'None', 'all', '0.3', '0.4', '0.4.1', '0.5', '0.6',
-                     '0.7', '0.8',
-                     allow_override=False),
+             EnumStr('0.7', 'None', 'all', '0.3', '0.4', '0.4.1', '0.5', '0.6',
+                     '0.7', '0.8', '0.8.1', '0.8.2',
+                     allow_override=False),
              in_c_key=False)
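The deprecated ``dnn.conv.workmem_bwd`` flag keeps a ``filter`` callable that validates any value assigned to it. A sketch of that validate-on-set pattern (the ``ConfigParam`` class below is a minimal hypothetical stand-in, not Theano's):

```python
class ConfigParam:
    """Minimal stand-in for a config parameter with a validation filter."""
    def __init__(self, default, filter=None):
        self.filter = filter
        self.value = default

    def set(self, value):
        # Reject values the filter refuses, as Theano's config does.
        if self.filter is not None and not self.filter(value):
            raise ValueError("invalid value for deprecated flag: %r" % value)
        self.value = value

def safe_no_dnn_workmem_bwd(workmem):
    # Only the empty string ("flag unused") is accepted, which is one
    # way a deprecated flag can be locked down while staying defined.
    return workmem == ''

param = ConfigParam('', filter=safe_no_dnn_workmem_bwd)
param.set('')            # fine: the flag is left unused
try:
    param.set('small')   # a deprecated value is rejected
except ValueError:
    print("rejected")    # prints "rejected"
```

Keeping the parameter registered but filtered lets old ``.theanorc`` files fail loudly with the updated deprecation message rather than silently changing behaviour.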
theano/gof/link.py
...
@@ -165,6 +165,9 @@ def raise_with_op(node, thunk=None, exc_info=None, storage_map=None):
         detailed_err_msg += ("Inputs shapes: %s" % shapes +
                              "\nInputs strides: %s" % strides +
                              "\nInputs values: %s" % scalar_values)
+        if theano.config.exception_verbosity == 'high':
+            detailed_err_msg += "\nInputs type_num: %s" % str(
+                [getattr(getattr(i[0], 'dtype', ''), 'num', '')
+                 for i in thunk.inputs])
         if hasattr(node.op, '__input_name__'):
             detailed_err_msg += "\nInputs name: %s\n" % str(node.op.__input_name__)
...
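The new high-verbosity branch collects NumPy type numbers with nested getattr calls, so inputs that carry no dtype degrade to an empty string instead of raising inside the error reporter. A self-contained sketch of that pattern (Dtype and FakeArray are hypothetical stand-ins for real ndarray inputs):

```python
class Dtype:
    num = 11  # a NumPy-style type number, e.g. for float32

class FakeArray:
    dtype = Dtype()

def input_type_nums(values):
    # Mirrors the defensive expression from the diff:
    # getattr(getattr(i[0], 'dtype', ''), 'num', '')
    return [getattr(getattr(v, 'dtype', ''), 'num', '') for v in values]
```

A plain Python scalar has no `dtype` attribute, so it maps to `''` while the array-like input reports its type number.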
theano/gof/opt.py
(diff collapsed)
theano/gof/optdb.py
...
@@ -244,16 +244,26 @@ class EquilibriumDB(DB):
         optimization application. This could result in less fgraph iterations,
         but this doesn't mean it will be faster globally.
+    tracks_on_change_inputs
+        If True, we will re-apply local opt on nodes whose inputs
+        changed during local optimization application. This could
+        result in less fgraph iterations, but this doesn't mean it
+        will be faster globally.

     Notes
     -----
     We can put LocalOptimizer and Optimizer as EquilibriumOptimizer
     supports both.
+
+    It is probably not a good idea to have ignore_newtrees=False and
+    tracks_on_change_inputs=True.

     """

-    def __init__(self, ignore_newtrees=True):
+    def __init__(self, ignore_newtrees=True, tracks_on_change_inputs=False):
         super(EquilibriumDB, self).__init__()
         self.ignore_newtrees = ignore_newtrees
+        self.tracks_on_change_inputs = tracks_on_change_inputs
         self.__final__ = {}
         self.__cleanup__ = {}
...
@@ -281,6 +291,7 @@ class EquilibriumDB(DB):
             opts,
             max_use_ratio=config.optdb.max_use_ratio,
             ignore_newtrees=self.ignore_newtrees,
+            tracks_on_change_inputs=self.tracks_on_change_inputs,
             failure_callback=opt.NavigatorOptimizer.warn_inplace,
             final_optimizers=final_opts,
             cleanup_optimizers=cleanup_opts)
...
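A minimal sketch of how the new keyword travels from the database to the optimizer it builds. Both classes below are hypothetical stand-ins that mirror the names in the diff, not Theano's actual EquilibriumDB/EquilibriumOptimizer: the database stores the flag at construction time and forwards it in its query path.

```python
class EquilibriumOptimizerStub:
    # Stand-in for theano.gof.opt.EquilibriumOptimizer.
    def __init__(self, opts, ignore_newtrees=True,
                 tracks_on_change_inputs=False):
        self.opts = opts
        self.ignore_newtrees = ignore_newtrees
        self.tracks_on_change_inputs = tracks_on_change_inputs

class EquilibriumDBStub:
    # Stand-in for EquilibriumDB: stores the flag and forwards it
    # when building the optimizer, as the diff does in query().
    def __init__(self, ignore_newtrees=True, tracks_on_change_inputs=False):
        self.ignore_newtrees = ignore_newtrees
        self.tracks_on_change_inputs = tracks_on_change_inputs

    def query(self, opts):
        return EquilibriumOptimizerStub(
            opts,
            ignore_newtrees=self.ignore_newtrees,
            tracks_on_change_inputs=self.tracks_on_change_inputs)
```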
theano/gof/vm.py
...
@@ -332,7 +332,7 @@ class Stack(VM):
     def __init__(self, nodes, thunks, pre_call_clear,
                  storage_map, compute_map, fgraph, allow_gc,
-                 dependencies=None, callback=None):
+                 dependencies=None, callback=None, callback_input=None):
         super(Stack, self).__init__(nodes, thunks, pre_call_clear)
         self.allow_gc = allow_gc
...
@@ -345,6 +345,7 @@ class Stack(VM):
         self.compute_map = compute_map
         self.node_idx = node_idx = {}
         self.callback = callback
+        self.callback_input = callback_input

         ords = fgraph.orderings()
...
@@ -411,6 +412,8 @@ class Stack(VM):
         for k in self.storage_map:
             compute_map[k][0] = (k.owner is None)
+            if self.callback_input and compute_map[k][0]:
+                self.callback_input(k, self.storage_map[k][0])

         # apply_stack contains nodes
         if output_subset is not None:
...
@@ -684,6 +687,11 @@ class VM_Linker(link.LocalLinker):
         A callable object to call after each call to a thunk within
         the virtual machine. It will be called with four arguments called
         'node', 'thunk', 'storage_map', and 'compute_map'.
+    callback_input
+        A callable object to call on each input to the graph
+        (variables with no owner). This includes constants and shared
+        variables values. It will be called with two arguments:
+        'var', 'value'.
     lazy
         Useful only when use_cloop is False. When lazy is None, use the
         theano flag vm.lazy value. Then if we have a None (default) we auto
...
@@ -700,8 +708,8 @@ class VM_Linker(link.LocalLinker):
     """

     def __init__(self, allow_gc=None, use_cloop=False, callback=None,
-                 lazy=None, schedule=None, c_thunks=None,
-                 allow_partial_eval=None):
+                 callback_input=None, lazy=None, schedule=None,
+                 c_thunks=None, allow_partial_eval=None):
         # Note: if more parameters are added to __init__, make sure to forward
         # them in the "type(self)(...)" call in the "accept" method below.
         if allow_gc is None:
...
@@ -710,6 +718,7 @@ class VM_Linker(link.LocalLinker):
         self.allow_gc = allow_gc
         self.use_cloop = use_cloop
         self.callback = callback
+        self.callback_input = callback_input
         self.lazy = lazy
         self.c_thunks = c_thunks
         self.allow_partial_eval = allow_partial_eval
...
@@ -760,9 +769,11 @@ class VM_Linker(link.LocalLinker):
             allow_gc=self.allow_gc,
             use_cloop=self.use_cloop,
             callback=self.callback,
+            callback_input=self.callback_input,
             lazy=self.lazy,
             schedule=self.schedule,
             c_thunks=self.c_thunks,
-            ).accept(fgraph, no_recycling)
+            allow_partial_eval=self.allow_partial_eval).accept(fgraph, no_recycling)
         self.fgraph = fgraph
         self.no_recycling = no_recycling
...
@@ -829,16 +840,17 @@ class VM_Linker(link.LocalLinker):
         pre_call_clear = [storage_map[v] for v in self.no_recycling]

         if (self.callback is not None or
+                self.callback_input is not None or
                 (config.profile and config.profile_memory) or
-                getattr(self, 'allow_partial_eval', False)):
-            if self.use_cloop and self.callback is not None:
+                self.allow_partial_eval):
+            if self.use_cloop and (self.callback is not None or
+                                   self.callback_input is not None):
                 logger.warn('CVM does not support callback, using Stack VM.')
             if self.use_cloop and config.profile_memory:
                 warnings.warn(
                     'CVM does not support memory profile, using Stack VM.')
-            if self.use_cloop and getattr(self, 'allow_partial_eval', False):
+            if self.use_cloop and self.allow_partial_eval:
                 warnings.warn(
                     'CVM does not support partial evaluation yet, '
                     'using Stack VM.')
...
@@ -849,7 +861,8 @@ class VM_Linker(link.LocalLinker):
                 storage_map, compute_map,
                 self.fgraph, self.allow_gc,
                 dependencies=deps,
-                callback=self.callback)
+                callback=self.callback,
+                callback_input=self.callback_input)
         elif self.use_cloop:
             # create a map from nodes to ints and vars to ints
             nodes_idx = {}
...
@@ -1046,7 +1059,7 @@ class VM_Linker(link.LocalLinker):
         if lazy is None:
             lazy = not all([(not th.lazy) for th in thunks])
         if not (lazy or (config.profile and config.profile_memory) or
-                self.use_cloop or self.callback):
+                self.use_cloop or self.callback or self.callback_input):
             for pair in itervalues(reallocated_info):
                 storage_map[pair[1]] = storage_map[pair[0]]
...
@@ -1088,3 +1101,7 @@ class VM_Linker(link.LocalLinker):
         self.__dict__.update(d)
         if not hasattr(self, 'c_thunks'):
             self.c_thunks = True
+        if not hasattr(self, 'allow_partial_eval'):
+            self.allow_partial_eval = None
+        if not hasattr(self, 'callback_input'):
+            self.callback_input = None
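The run-time effect of callback_input can be sketched in isolation: before any node executes, every variable with no owner (a graph input, constant, or shared variable) is marked as already computed, and the hook fires once per such variable with its current value. The classes below are hypothetical stand-ins, not Theano's Variable or Stack VM.

```python
class Var:
    # Stand-in for a graph variable; owner is None for graph inputs.
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner

def mark_inputs(storage_map, compute_map, callback_input=None):
    # Mirrors the loop added to Stack: inputs are pre-computed, and
    # callback_input(var, value) fires for each of them.
    for var in storage_map:
        compute_map[var][0] = (var.owner is None)
        if callback_input and compute_map[var][0]:
            callback_input(var, storage_map[var][0])
```

For example, a hook that logs `(name, value)` pairs would see only the owner-less variable and skip the one produced by an apply node.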
theano/gpuarray/__init__.py
...
@@ -42,7 +42,7 @@ register_transfer(transfer)

 def init_dev(dev, name=None):
     v = pygpu.gpuarray.api_version()
-    expected = -9998
+    expected = -9997
     if v[0] != expected:
         raise RuntimeError("Wrong major API version for gpuarray:", v[0],
                            "Make sure Theano and libgpuarray/pygpu "
...
@@ -50,6 +50,15 @@ def init_dev(dev, name=None):
     if v[1] < 0:
         raise RuntimeError("Wrong minor API version for gpuarray:", v[1],
                            "Please update libgpuarray/pygpu.")
+    if len(v) < 3:
+        vpy = -1
+    else:
+        vpy = v[2]
+    vpye = 0
+    if vpy < vpye:
+        print("Wrong python API version for gpuarray:", vpy,
+              "expected:", vpye,
+              "Some python ops may not work correctly and/or crash. "
+              "Consider updating pygpu.", file=sys.stderr)
     global pygpu_activated
     if dev not in init_dev.devmap:
         ctx = pygpu.init(dev,
...
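The new guard treats an api_version() tuple that lacks a third entry as python-API version -1, and only warns (rather than raising) when that entry is too old; major and minor mismatches stay fatal. A stand-alone re-implementation of that decision logic (check_api_version is a hypothetical helper, not part of Theano):

```python
def check_api_version(v, expected_major=-9997, expected_py=0):
    # Returns (fatal_error_or_None, warned) for an api_version() tuple.
    if v[0] != expected_major:
        return ("Wrong major API version for gpuarray: %s" % v[0], False)
    if v[1] < 0:
        return ("Wrong minor API version for gpuarray: %s" % v[1], False)
    vpy = v[2] if len(v) >= 3 else -1   # missing entry means "too old"
    if vpy < expected_py:
        return (None, True)             # non-fatal: warn, keep going
    return (None, False)
```

A two-element tuple from an older pygpu therefore produces a warning but still initializes the device, matching the diff's behaviour.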
theano/gpuarray/basic_ops.py
...
@@ -259,14 +259,14 @@ class GpuKernelBase(object):
         int types[%(numargs)u] = {%(types)s};
         const char *bcode = %(bvar)s;
         size_t sz = sizeof(%(bvar)s);
-        if (GpuKernel_init(&%(ovar)s, %(ctx)s->ops, %(ctx)s->ctx, 1, &bcode, &sz,
+        if (GpuKernel_init(&%(ovar)s, %(ctx)s->ctx, 1, &bcode, &sz,
                            "%(kname)s", %(numargs)u, types, GA_USE_BINARY, NULL)
             != GA_NO_ERROR) {
-          if ((err = GpuKernel_init(&%(ovar)s, %(ctx)s->ops, %(ctx)s->ctx, 1,
+          if ((err = GpuKernel_init(&%(ovar)s, %(ctx)s->ctx, 1,
                                     &%(cname)s, NULL, "%(kname)s", %(numargs)u,
                                     types, %(flags)s, NULL)) != GA_NO_ERROR) {
             PyErr_Format(PyExc_RuntimeError, "GpuKernel_init error %%d: %%s",
-                         err, Gpu_error(%(ctx)s->ops, %(ctx)s->ctx, err));
+                         err, gpucontext_error(%(ctx)s->ctx, err));
             %(fail)s
           }
         }
...
@@ -310,7 +310,7 @@ class GpuKernelBase(object):
             The node that we need the cache version for.

         """
-        return (3, self.get_params(node).bin_id)
+        return (4, self.get_params(node).bin_id)


 class HostFromGpu(Op):
...
@@ -529,15 +529,22 @@ class GpuToGpu(Op):
     def c_code(self, node, name, inputs, outputs, sub):
         return """
         Py_XDECREF(%(out)s);
-        %(out)s = pygpu_transfer(%(inp)s, %(ctx)s, 0);
+        %(out)s = pygpu_empty(%(inp)s->ga.nd, %(inp)s->ga.dimensions,
+                              %(inp)s->ga.typecode,
+                              GpuArray_IS_C_CONTIGUOUS(&(%(inp)s->ga)) ? GA_C_ORDER:GA_F_ORDER,
+                              %(ctx)s, Py_None);
         if (%(out)s == NULL) {
             %(fail)s
         }
+        if (pygpu_transfer(%(out)s, %(inp)s)) {
+            %(fail)s
+        }
         """ % {'inp': inputs[0], 'ctx': sub['params'],
                'out': outputs[0], 'fail': sub['fail']}

     def c_code_cache_version(self):
-        return (0,)
+        return (1,)


 class GpuAlloc(HideC, Alloc):
...
theano/gpuarray/blockgemv.c
...
@@ -24,16 +24,9 @@ int APPLY_SPECIFIC(blockgemv)(PyGpuArrayObject *o, PyGpuArrayObject *W,
   size_t *offW = NULL;
   size_t *offInp = NULL;
   size_t *offOut = NULL;
-  gpuarray_blas_ops *blas_ops;
   int err;

-  err = ctx->ops->property(ctx->ctx, NULL, NULL,
-                           GA_CTX_PROP_BLAS_OPS, &blas_ops);
-  if (err != GA_NO_ERROR) {
-    PyErr_SetString(PyExc_RuntimeError, "Can't get blas ops");
-    return -1;
-  }
-  err = blas_ops->setup(ctx->ctx);
+  err = gpublas_setup(ctx->ctx);
   if (err != GA_NO_ERROR) {
     PyErr_SetString(PyExc_RuntimeError, "Can't setup blas");
     return -1;
...
@@ -93,29 +86,29 @@ int APPLY_SPECIFIC(blockgemv)(PyGpuArrayObject *o, PyGpuArrayObject *W,
   }

   if (out->ga.typecode == GA_FLOAT) {
-    err = blas_ops->sgemvBatch(cb_fortran, transA,
+    err = gpublas_sgemvBatch(cb_fortran, transA,
                              PyGpuArray_DIMS(out)[2],
                              PyGpuArray_DIMS(h)[2], 1,
                              W_list, offW, lda,
                              inp_list, offInp,
                              PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
                              1, out_list, offOut,
                              PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
                              PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0],
                              0);
   } else if (out->ga.typecode == GA_DOUBLE) {
-    err = blas_ops->dgemvBatch(cb_fortran, transA,
+    err = gpublas_dgemvBatch(cb_fortran, transA,
                              PyGpuArray_DIMS(out)[2],
                              PyGpuArray_DIMS(h)[2], 1,
                              W_list, offW, lda,
                              inp_list, offInp,
                              PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
                              1, out_list, offOut,
                              PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
                              PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0],
                              0);
   } else if (out->ga.typecode == GA_HALF) {
-    err = blas_ops->sgemvBatch(cb_fortran, transA,
+    err = gpublas_sgemvBatch(cb_fortran, transA,
                              PyGpuArray_DIMS(out)[2],
                              PyGpuArray_DIMS(h)[2], 1,
                              W_list, offW, lda,
                              inp_list, offInp,
                              PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
                              1, out_list, offOut,
                              PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
                              PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0],
                              0);
   } else {
     err = GA_INVALID_ERROR;
   }
...
theano/gpuarray/blockger.c
...
@@ -12,16 +12,9 @@ int APPLY_SPECIFIC(blockger)(PyGpuArrayObject *o, PyGpuArrayObject *x,
   size_t *offOut = NULL;
   size_t *offX = NULL;
   size_t *offY = NULL;
-  gpuarray_blas_ops *blas_ops;
   int err;

-  err = ctx->ops->property(ctx->ctx, NULL, NULL,
-                           GA_CTX_PROP_BLAS_OPS, &blas_ops);
-  if (err != GA_NO_ERROR) {
-    PyErr_SetString(PyExc_RuntimeError, "Can't get blas ops");
-    return -1;
-  }
-  err = blas_ops->setup(ctx->ctx);
+  err = gpublas_setup(ctx->ctx);
   if (err != GA_NO_ERROR) {
     PyErr_SetString(PyExc_RuntimeError, "Can't setup blas");
     return -1;
...
@@ -84,26 +77,26 @@ int APPLY_SPECIFIC(blockger)(PyGpuArrayObject *o, PyGpuArrayObject *x,
   ssize_t str_out = PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode);

   if (out->ga.typecode == GA_FLOAT) {
-    err = blas_ops->sgerBatch(cb_fortran,
+    err = gpublas_sgerBatch(cb_fortran,
                             PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
                             *(float *)PyArray_GETPTR1(alpha, 0),
                             y_list, offY, str_y,
                             x_list, offX, str_x,
                             o_list, offOut, str_out,
                             PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1],
                             0);
   } else if (out->ga.typecode == GA_DOUBLE) {
-    err = blas_ops->dgerBatch(cb_fortran,
+    err = gpublas_dgerBatch(cb_fortran,
                             PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
                             *(double *)PyArray_GETPTR1(alpha, 0),
                             y_list, offY, str_y,
                             x_list, offX, str_x,
                             o_list, offOut, str_out,
                             PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1],
                             0);
   } else if (out->ga.typecode == GA_HALF) {
-    err = blas_ops->hgerBatch(cb_fortran,
+    err = gpublas_hgerBatch(cb_fortran,
                             PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
                             *(float *)PyArray_GETPTR1(alpha, 0),
                             y_list, offY, str_y,
                             x_list, offX, str_x,
                             o_list, offOut, str_out,
                             PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1],
                             0);
   } else {
     err = GA_INVALID_ERROR;
   }
...
theano/gpuarray/dnn.py
...
@@ -125,7 +125,7 @@ def dnn_available(context_name):
     ctx = get_context(context_name)
-    if not ctx.kind == 'cuda':
+    if not ctx.kind == b'cuda':
         dnn_available.msg = "Not on a CUDA device."
         return False
...
@@ -1493,7 +1493,7 @@ def local_dnn_convi_output_merge(node, *inputs):
     return [GpuDnnConvGradI(algo=node.op.algo)(*inputs)]


-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([Pool])
 def local_pool_dnn_alternative(node, ctx_name):
     if not dnn_available(ctx_name):
...
@@ -1509,7 +1509,7 @@ def local_pool_dnn_alternative(node, ctx_name):
     return dnn_pool(gpu_contiguous(img), ds, stride=stride, pad=pad, mode=mode)


-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([MaxPoolGrad])
 def local_pool_dnn_grad_stride(node, ctx_name):
     if not dnn_available(ctx_name):
...
@@ -1533,7 +1533,7 @@ def local_pool_dnn_grad_stride(node, ctx_name):
                        pad)


-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([AveragePoolGrad])
 def local_avg_pool_dnn_grad_stride(node, ctx_name):
     if not dnn_available(ctx_name):
...
@@ -1556,7 +1556,7 @@ def local_avg_pool_dnn_grad_stride(node, ctx_name):
     return GpuDnnPoolGrad(mode=mode)(gpu_contiguous(inp), cg, cg, ds, st, pad)


-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @local_optimizer([GpuSoftmax])
 def local_softmax_dnn(node):
     if isinstance(node.op, GpuSoftmax):
...
@@ -1569,7 +1569,7 @@ def local_softmax_dnn(node):
         return [out]


-@register_opt('cudnn')
+@register_opt('cudnn', 'stabilize')
 @local_optimizer([GpuElemwise])
 def local_log_softmax_dnn(node):
     # This looks for GpuDnnSoftmax so we know that we have cudnn.
...
@@ -1586,7 +1586,7 @@ def local_log_softmax_dnn(node):
     return [new_softmax(softmax_node.inputs[0])]


-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([LogSoftmax])
 def local_logsoftmax_to_dnn(node, ctx_name):
     # Transform the input in the format expected by GpuDnnSoftmax
...
@@ -1624,7 +1624,7 @@ class NoCuDNNRaise(Optimizer):
 gpu_seqopt.register("NoCuDNNRaise", NoCuDNNRaise(), 0, 'cudnn')


-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([SoftmaxGrad])
 def local_softmax_dnn_grad(node, ctx_name):
     if not dnn_available(ctx_name):
...
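Tag-based registration like the change above can be sketched with a toy registry. REGISTRY, register_opt and query below are hypothetical stand-ins for Theano's optimizer database (which additionally handles positions and op lifting): adding 'fast_compile' or 'stabilize' to a registration makes the optimization visible to queries for that extra tag.

```python
REGISTRY = []

def register_opt(*tags):
    # Decorator: record the optimization under every given tag.
    def wrap(fn):
        REGISTRY.append((fn.__name__, frozenset(tags)))
        return fn
    return wrap

@register_opt('cudnn', 'fast_compile')
def local_pool_dnn_alternative_stub(node):
    return None

@register_opt('cudnn', 'stabilize')
def local_log_softmax_dnn_stub(node):
    return None

def query(tag):
    # All optimizations registered under a given tag, in order.
    return [name for name, tags in REGISTRY if tag in tags]
```

With this sketch, a 'fast_compile' query now picks up the pooling optimization, while the log-softmax rewrite answers to 'stabilize'; both still answer to 'cudnn'.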
theano/gpuarray/dnn_fwd.c
...
@@ -105,7 +105,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
     algo = choice.algo;
 #else
     size_t free;
-    int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
+    int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);

     if (err2 != GA_NO_ERROR) {
       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
...
@@ -234,7 +234,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
    * to place a nice get_work_mem() function in.
    */
   if (worksize != 0) {
-    workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
+    workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
     if (workspace == NULL) {
       PyErr_SetString(PyExc_RuntimeError,
                       "Could not allocate working memory");
...
@@ -258,7 +258,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
                           APPLY_SPECIFIC(output), PyGpuArray_DEV_DATA(*output));

   if (worksize != 0)
-    c->ops->buffer_release(workspace);
+    gpudata_release(workspace);

   cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
   cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
...
theano/gpuarray/dnn_gi.c
...
@@ -106,7 +106,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
     algo = choice.algo;
 #else
     size_t free;
-    int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
+    int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);

     if (err2 != GA_NO_ERROR) {
       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
...
@@ -204,7 +204,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
   }

   if (worksize != 0) {
-    workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
+    workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
     if (workspace == NULL) {
       PyErr_SetString(PyExc_RuntimeError,
                       "Could not allocate working memory");
...
@@ -227,7 +227,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
                           APPLY_SPECIFIC(input), PyGpuArray_DEV_DATA(*input));

   if (worksize != 0)
-    c->ops->buffer_release(workspace);
+    gpudata_release(workspace);

   cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
   cuda_record(output->ga.data, GPUARRAY_CUDA_WAIT_READ);
...
theano/gpuarray/dnn_gw.c
...
@@ -107,7 +107,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
     algo = choice.algo;
 #else
     size_t free;
-    int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
+    int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);

     if (err2 != GA_NO_ERROR) {
       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
...
@@ -192,7 +192,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
   }

   if (worksize != 0) {
-    workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
+    workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
     if (workspace == NULL) {
       PyErr_SetString(PyExc_RuntimeError, "Could not allocate working memory");
       cuda_exit(c->ctx);
...
@@ -214,7 +214,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
                           APPLY_SPECIFIC(kerns), PyGpuArray_DEV_DATA(*kerns));

   if (worksize != 0)
-    c->ops->buffer_release(workspace);
+    gpudata_release(workspace);

   cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
   cuda_record(output->ga.data, GPUARRAY_CUDA_WAIT_READ);
...
theano/gpuarray/elemwise.py
浏览文件 @
0a7a4c06
...
@@ -199,7 +199,7 @@ class GpuElemwise(HideC, Elemwise):
                            typecode=o.type.typecode)
        res += """
-        ge = GpuElemwise_new(%(ctx)s->ops, %(ctx)s->ctx, %(support)s, %(kop)s, %(nargs)s, args, %(nd)s, 0);
+        ge = GpuElemwise_new(%(ctx)s->ctx, %(support)s, %(kop)s, %(nargs)s, args, %(nd)s, 0);
        if (ge == NULL) {
          PyErr_SetString(PyExc_RuntimeError, "Could not initialize elemwise support");
          %(fail)s
...
@@ -360,7 +360,7 @@ class GpuElemwise(HideC, Elemwise):
    def c_code_cache_version(self):
        ver = self.scalar_op.c_code_cache_version()
        if ver:
-            return (6, ver)
+            return (7, ver)
        else:
            return ver
...
@@ -554,7 +554,7 @@ class GpuCAReduceCuda(GpuKernelBase, HideC, CAReduceDtype):
    def make_node(self, x):
        x = as_gpuarray_variable(x, infer_context_name(x))
-        if x.type.context.kind != 'cuda':
+        if x.type.context.kind != b'cuda':
            raise TypeError("GpuCAReduceCuda doesn't work for non-cuda devices")
        ret = super(GpuCAReduceCuda, self).make_node(x)
        self = copy.copy(self)
...
theano/gpuarray/extra_ops.py
...
@@ -26,11 +26,8 @@ class GpuCumsum(GpuKernelBase, Op):
    def __init__(self, axis):
        self.axis = axis

-    def __str__(self):
-        return "%s{%s}" % (self.__class__.__name__, self.axis)
-
-    def c_code_cache_version_apply(self, node):
-        return (1,)
+    def c_code_cache_version(self):
+        return (3,)

    def c_headers(self):
        return ['<numpy_compat.h>', '<gpuarray/types.h>', '<gpuarray_helper.h>']
...
...
@@ -221,7 +218,7 @@ class GpuCumsum(GpuKernelBase, Op):
        return kernels

    def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
            raise NotImplementedError("cuda only")
        x, = inp
        z, = out
...
@@ -249,17 +246,17 @@ class GpuCumsum(GpuKernelBase, Op):
        size_t max_grid_size1;
        size_t max_grid_size2;
        int err;
-        err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXLSIZE0, &max_threads_dim0);
+        err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXLSIZE0, &max_threads_dim0);
        if (err != GA_NO_ERROR){
            PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_threads_dims0");
            %(fail)s;
        }
-        err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXGSIZE1, &max_grid_size1);
+        err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXGSIZE1, &max_grid_size1);
        if (err != GA_NO_ERROR){
            PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_grid_size1");
            %(fail)s;
        }
-        err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXGSIZE2, &max_grid_size2);
+        err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXGSIZE2, &max_grid_size2);
        if (err != GA_NO_ERROR){
            PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_grid_size2");
            %(fail)s;
...
theano/gpuarray/gemm16.c
...
@@ -117,7 +117,7 @@ int gemm16(PyGpuArrayObject *C, float alpha,
  if (48 < n128 && n128 <= 64) {
    n64 = n / 64;
  if (nprocs == 0)
-    if (A->ga.ops->property(A->context->ctx, NULL, NULL,
+    if (gpucontext_property(A->context->ctx,
                            GA_CTX_PROP_NUMPROCS, &nprocs)) {
      nprocs = 0;
      res = 1;
...
theano/gpuarray/neighbours.py
...
@@ -243,7 +243,7 @@ class GpuImages2Neibs(GpuKernelBase, Images2Neibs, Op):
        return kernels

    def c_code(self, node, name, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
            raise NotImplementedError("cuda only")
        dtype_ten4 = node.inputs[0].dtype
        dtype_neib_shape = node.inputs[1].dtype
...
theano/gpuarray/nerv.py
...
@@ -105,7 +105,7 @@ class Gemm16(COp):
        return """
        bcode = bin_%(name)s;
        sz = sizeof(bin_%(name)s);
-        if (GpuKernel_init(&k_%(name)s, c->ops, c->ctx, 1, &bcode, &sz,
+        if (GpuKernel_init(&k_%(name)s, c->ctx, 1, &bcode, &sz,
                           "hgemm_%(name)s", 13, types, GA_USE_BINARY, NULL)
            != GA_NO_ERROR) {
          PyErr_SetString(PyExc_RuntimeError, "Could not initialize kernel %(name)s");
...
theano/gpuarray/nnet.py
...
@@ -189,7 +189,7 @@ class GpuCrossentropySoftmaxArgmax1HotWithBias(GpuKernelBase, Op):
                          flags=flags, objvar=k_var)]

    def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
            raise NotImplementedError('cuda only')
        typecode_x = pygpu.gpuarray.dtype_to_typecode(node.inputs[0].dtype)
        typecode_b = pygpu.gpuarray.dtype_to_typecode(node.inputs[1].dtype)
...
@@ -375,7 +375,7 @@ class GpuCrossentropySoftmax1HotWithBiasDx(GpuKernelBase, Op):
        return ['<numpy_compat.h>', '<gpuarray/types.h>']

    def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
            raise NotImplementedError("cuda only")
        typecode_dx = pygpu.gpuarray.dtype_to_typecode(node.outputs[0].dtype)
        itemsize_dnll = numpy.dtype(node.inputs[0].dtype).itemsize
...
@@ -584,7 +584,7 @@ class GpuSoftmax(GpuKernelBase, Op):
        return ['<numpy_compat.h>', '<gpuarray/types.h>']

    def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
            raise NotImplementedError("cuda only")
        dtype_x = node.inputs[0].dtype
        work_x = work_dtype(dtype_x)
...
@@ -783,7 +783,7 @@ class GpuSoftmaxWithBias(GpuKernelBase, Op):
        return ['<numpy_compat.h>', '<gpuarray/types.h>']

    def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
            raise NotImplementedError('cuda only')
        dtype_x = node.inputs[0].dtype
        dtype_b = node.inputs[1].dtype
...
theano/gpuarray/opt.py
...
@@ -33,12 +33,16 @@ from .basic_ops import (as_gpuarray_variable, infer_context_name,
                        GpuSplit, GpuContiguous, gpu_contiguous,
                        GpuAlloc, GpuAllocEmpty, GpuReshape,
                        GpuEye, gpu_join, GpuJoin)
-from .blas import (gpu_dot22, GpuGemv, GpuGemm, GpuGer, GpuGemmBatch,
-                   gpugemm_no_inplace, gpugemmbatch_no_inplace)
-from .blocksparse import GpuSparseBlockGemv, GpuSparseBlockOuter
-from .nnet import (GpuCrossentropySoftmaxArgmax1HotWithBias,
-                   GpuCrossentropySoftmax1HotWithBiasDx,
-                   GpuSoftmaxWithBias, GpuSoftmax)
+from .blas import (gpu_dot22, GpuGemm, GpuGer, GpuGemmBatch,
+                   gpugemm_no_inplace, gpugemm_inplace,
+                   gpugemmbatch_no_inplace,
+                   gpugemv_no_inplace, gpugemv_inplace)
+from .blocksparse import (GpuSparseBlockGemv, GpuSparseBlockOuter,
+                          gpu_sparse_block_outer, gpu_sparse_block_outer_inplace,
+                          gpu_sparse_block_gemv, gpu_sparse_block_gemv_inplace)
+from .nnet import (gpu_crossentropy_softmax_1hot_with_bias_dx,
+                   gpu_crossentropy_softmax_argmax_1hot_with_bias,
+                   gpu_softmax_with_bias, gpu_softmax)
from .elemwise import (GpuElemwise, GpuDimShuffle, GpuCAReduceCuda,
                       GpuCAReduceCPY)
from .subtensor import (GpuIncSubtensor, GpuSubtensor,
...
@@ -49,6 +53,7 @@ from .opt_util import alpha_merge, output_merge

_logger = logging.getLogger("theano.gpuarray.opt")

gpu_optimizer = EquilibriumDB()
gpu_cut_copies = EquilibriumDB()
...
@@ -146,7 +151,7 @@ def op_lifter(OP, cuda_only=False):
            # Check if we should replace
            if (not replace or
                (cuda_only and
-                 get_context(context_name).kind != 'cuda')):
+                 get_context(context_name).kind != b'cuda')):
                return False
            # tag the inputs with the context in case
...
@@ -643,7 +648,7 @@ def local_gpua_advanced_subtensor(node, context_name):
def local_gpua_advanced_incsubtensor(node, context_name):
    context = get_context(context_name)

    # This is disabled on non-cuda contexts
-    if context.kind != 'cuda':
+    if context.kind != b'cuda':
        return None

    x, y, ilist = node.inputs
...
...
@@ -674,12 +679,12 @@ def local_gpua_careduce(node, context_name):
if
isinstance
(
node
.
op
.
scalar_op
,
(
scalar
.
Add
,
scalar
.
Mul
,
if
isinstance
(
node
.
op
.
scalar_op
,
(
scalar
.
Add
,
scalar
.
Mul
,
scalar
.
Maximum
,
scalar
.
Minimum
)):
scalar
.
Maximum
,
scalar
.
Minimum
)):
ctx
=
get_context
(
context_name
)
ctx
=
get_context
(
context_name
)
if
ctx
.
kind
==
'opencl'
:
if
ctx
.
kind
==
b
'opencl'
:
op
=
GpuCAReduceCPY
op
=
GpuCAReduceCPY
if
node
.
op
.
scalar_op
not
in
[
scalar
.
add
,
scalar
.
mul
]:
if
node
.
op
.
scalar_op
not
in
[
scalar
.
add
,
scalar
.
mul
]:
# We don't support yet all reduction with cpy code.
# We don't support yet all reduction with cpy code.
return
return
elif
ctx
.
kind
==
'cuda'
:
elif
ctx
.
kind
==
b
'cuda'
:
op
=
GpuCAReduceCuda
op
=
GpuCAReduceCuda
else
:
else
:
return
False
return
False
...
@@ -711,18 +716,14 @@ def local_gpua_careduce(node, context_name):
                assert reduce_mask[a] == 0
                reduce_mask[a] = 1
-            shape_of = node.fgraph.shape_feature.shape_of
-            x_shape = shape_of[x]
-            new_in_shp = [x_shape[0]]
+            new_in_shp = [shape_i(x, 0)]
            new_mask = [reduce_mask[0]]
            for i in xrange(1, x.type.ndim):
                if reduce_mask[i] == reduce_mask[i - 1]:
-                    new_in_shp[-1] *= x_shape[i]
+                    new_in_shp[-1] *= shape_i(x, i)
                else:
                    new_mask.append(reduce_mask[i])
-                    new_in_shp.append(x_shape[i])
+                    new_in_shp.append(shape_i(x, i))
            new_axis = []
            for idx, m in enumerate(new_mask):
                if m == 1:
...
@@ -744,8 +745,12 @@ def local_gpua_careduce(node, context_name):
                    greduce(gpu_reshaped_x))
                if reduce_reshaped_x.ndim != node.outputs[0].ndim:
-                    unreshaped_reduce = reduce_reshaped_x.reshape(
-                        tensor.stack(shape_of[node.outputs[0]]))
+                    out_shp = []
+                    for i in range(x.ndim):
+                        if i not in node.op.axis:
+                            out_shp.append(shape_i(x, i))
+                    unreshaped_reduce = reduce_reshaped_x.reshape(
+                        tensor.stack(out_shp))
                else:
                    unreshaped_reduce = reduce_reshaped_x
                return [unreshaped_reduce]
...
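The careduce rewrite above collapses runs of dimensions that share the same reduce flag into a single dimension before reducing, then reshapes the result back. A standalone sketch of that mask/shape merging on concrete shapes (pure Python, independent of Theano's symbolic `shape_i`):

```python
def collapse(shape, reduce_mask):
    """Merge adjacent dims with equal reduce flags, as local_gpua_careduce does."""
    new_shape = [shape[0]]
    new_mask = [reduce_mask[0]]
    for i in range(1, len(shape)):
        if reduce_mask[i] == reduce_mask[i - 1]:
            new_shape[-1] *= shape[i]      # fuse into the previous dim
        else:
            new_mask.append(reduce_mask[i])
            new_shape.append(shape[i])
    return new_shape, new_mask


# Reducing axes 1 and 2 of a 4d tensor: the two reduced dims fuse into one.
assert collapse([2, 3, 4, 5], [0, 1, 1, 0]) == ([2, 12, 5], [0, 1, 0])
```

Fewer, larger dimensions let the GPU reduction kernel pick a specialization for a short reduce mask instead of handling every n-dimensional pattern.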
@@ -754,13 +759,19 @@ def local_gpua_careduce(node, context_name):
@register_opt('fast_compile')
@op_lifter([tensor.blas.Gemv, tensor.blas_c.CGemv])
def local_gpua_gemv(node, context_name):
-    return GpuGemv(inplace=node.op.inplace)
+    if node.op.inplace:
+        return gpugemv_inplace
+    else:
+        return gpugemv_no_inplace


@register_opt('fast_compile')
@op_lifter([tensor.blas.Gemm])
def local_gpua_gemm(node, context_name):
-    return GpuGemm(inplace=node.op.inplace)
+    if node.op.inplace:
+        return gpugemm_inplace
+    else:
+        return gpugemm_no_inplace


@register_opt('fast_compile')
...
@@ -834,7 +845,7 @@ def local_gpua_dot22scalar(node, context_name):
    x = as_gpuarray_variable(x, context_name)
    y = as_gpuarray_variable(y, context_name)
    z = GpuAllocEmpty(x.dtype, context_name)(x.shape[0], y.shape[1])
-    return [GpuGemm(inplace=False)(z, a, x, y, 0)]
+    return [gpugemm_no_inplace(z, a, x, y, 0)]


@register_opt('fast_compile')
...
@@ -846,25 +857,25 @@ def local_gpua_eye(node, context_name):
@register_opt('fast_compile')
@op_lifter([tensor.nnet.CrossentropySoftmaxArgmax1HotWithBias], cuda_only=True)
def local_gpua_crossentropysoftmaxargmax1hotwithbias(node, context_name):
-    return GpuCrossentropySoftmaxArgmax1HotWithBias()
+    return gpu_crossentropy_softmax_argmax_1hot_with_bias


@register_opt('fast_compile')
@op_lifter([tensor.nnet.CrossentropySoftmax1HotWithBiasDx], cuda_only=True)
def local_gpua_crossentropysoftmax1hotwithbiasdx(node, context_name):
-    return GpuCrossentropySoftmax1HotWithBiasDx()
+    return gpu_crossentropy_softmax_1hot_with_bias_dx


@register_opt('fast_compile')
@op_lifter([tensor.nnet.Softmax], cuda_only=True)
def local_gpua_softmax(node, context_name):
-    return GpuSoftmax()
+    return gpu_softmax


@register_opt('fast_compile')
@op_lifter([tensor.nnet.SoftmaxWithBias], cuda_only=True)
def local_gpua_softmaxwithbias(node, context_name):
-    return GpuSoftmaxWithBias()
+    return gpu_softmax_with_bias


@register_opt('fast_compile')
...
@@ -889,20 +900,26 @@ theano.tensor.nnet.conv2d()
@register_opt('fast_compile')
@op_lifter([SparseBlockGemv])
def local_lift_sparseblockgemv(node, context_name):
-    return GpuSparseBlockGemv(node.op.inplace)
+    if node.op.inplace:
+        return gpu_sparse_block_gemv_inplace
+    else:
+        return gpu_sparse_block_gemv


@register_opt('fast_compile')
@op_lifter([SparseBlockOuter])
def local_lift_sparseblockouter(node, context_name):
-    return GpuSparseBlockOuter(node.op.inplace)
+    if node.op.inplace:
+        return gpu_sparse_block_outer_inplace
+    else:
+        return gpu_sparse_block_outer


@register_inplace()
@local_optimizer([GpuSparseBlockGemv], inplace=True)
def local_inplace_sparseblockgemv(node):
    if isinstance(node.op, GpuSparseBlockGemv) and not node.op.inplace:
-        return [GpuSparseBlockGemv(inplace=True)(*node.inputs)]
+        return [gpu_sparse_block_gemv_inplace(*node.inputs)]


@register_inplace()
...
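A recurring pattern in these optimizer rewrites is switching from constructing a fresh op per rewrite (e.g. `GpuGemv(inplace=...)`) to returning one of two precreated module-level instances (`gpugemv_inplace` / `gpugemv_no_inplace`). Shared instances make equality and identity checks cheap and let the graph merger treat all rewrites of the same flavor as the same op. A toy sketch of the pattern (all names here are hypothetical stand-ins, not Theano APIs):

```python
class Gemv:
    """Stand-in for an Op: instances compare by identity unless __eq__ is defined."""
    def __init__(self, inplace):
        self.inplace = inplace


# Precreated singletons, mirroring gpugemv_inplace / gpugemv_no_inplace.
gemv_inplace = Gemv(inplace=True)
gemv_no_inplace = Gemv(inplace=False)


def lift(node_op):
    # Return a shared instance instead of building a new Gemv(...) each time.
    return gemv_inplace if node_op.inplace else gemv_no_inplace


# Two rewrites of inplace nodes now yield the *same* object,
# so identity-based caching and merging work.
assert lift(Gemv(True)) is lift(Gemv(True))
assert lift(Gemv(False)) is gemv_no_inplace
```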
...
theano/gpuarray/subtensor.py
(diff collapsed, not shown)
theano/gpuarray/tests/test_basic_ops.py
...
@@ -18,7 +18,7 @@ from theano.tests import unittest_tools as utt
from ..type import (GpuArrayType, get_context,
                    gpuarray_shared_constructor)
from ..basic_ops import (
    host_from_gpu, HostFromGpu, GpuFromHost, GpuReshape,
-    GpuAlloc, GpuAllocEmpty, GpuContiguous,
+    GpuToGpu, GpuAlloc, GpuAllocEmpty, GpuContiguous,
    gpu_join, GpuJoin, GpuSplit, GpuEye, gpu_contiguous)
from ..subtensor import GpuSubtensor
...
@@ -182,6 +182,21 @@ def test_transfer_cpu_gpu():
    assert numpy.all(fv == av)


+def test_transfer_gpu_gpu():
+    g = GpuArrayType(dtype='float32', broadcastable=(False, False),
+                     context_name=test_ctx_name)()
+    av = numpy.asarray(rng.rand(5, 4), dtype='float32')
+    gv = gpuarray.array(av, context=get_context(test_ctx_name))
+    mode = mode_with_gpu.excluding('cut_gpua_host_transfers',
+                                   'local_cut_gpua_host_gpua')
+    f = theano.function([g], GpuToGpu(test_ctx_name)(g), mode=mode)
+    topo = f.maker.fgraph.toposort()
+    assert len(topo) == 1
+    assert isinstance(topo[0].op, GpuToGpu)
+    fv = f(gv)
+    assert GpuArrayType.values_eq(fv, gv)
+
+
def test_transfer_strided():
    # This is just to ensure that it works in theano
    # libgpuarray has a much more comprehensive suit of tests to
...
theano/gpuarray/tests/test_elemwise.py
...
@@ -197,7 +197,7 @@ class test_GpuCAReduceCuda(test_GpuCAReduceCPY):
    def setUp(self):
        super(test_GpuCAReduceCuda, self).setUp()
-        if get_context(test_ctx_name).kind != 'cuda':
+        if get_context(test_ctx_name).kind != b'cuda':
            raise SkipTest("Cuda specific tests")
...
@@ -212,7 +212,7 @@ class T_gpureduce_dtype(test_elemwise.T_reduce_dtype):
                'float32', 'float64']

    def setUp(self):
-        if get_context(test_ctx_name).kind != 'cuda':
+        if get_context(test_ctx_name).kind != b'cuda':
            raise SkipTest("Cuda specific tests")
...
theano/gpuarray/tests/test_extra_ops.py
...
@@ -24,7 +24,7 @@ class TestGpuCumsum(theano.tensor.tests.test_extra_ops.TestCumsumOp):
    def setUp(self):
        super(TestGpuCumsum, self).setUp()
        test_ctx = get_context(test_ctx_name)
-        if test_ctx.kind != 'cuda':
+        if test_ctx.kind != b'cuda':
            raise SkipTest("Cuda specific tests")
        self.max_threads_dim0 = test_ctx.maxlsize0
        self.max_grid_size1 = test_ctx.maxgsize2
...
theano/gpuarray/tests/test_opt.py
...
@@ -125,7 +125,7 @@ def test_reduce():
            topo = f.maker.fgraph.toposort()
            ops = [type(node.op) for node in topo]
-            if kind == 'opencl' and method in ["max", "min"]:
+            if kind == b'opencl' and method in ["max", "min"]:
                assert not (GpuCAReduceCuda in ops or GpuCAReduceCPY in ops)
            else:
                assert GpuCAReduceCuda in ops or GpuCAReduceCPY in ops
...
theano/gpuarray/tests/test_subtensor.py
...
@@ -56,3 +56,32 @@ def test_advinc_subtensor1():
        rep = xval.copy()
        rep[[0, 2]] += yval
        assert numpy.allclose(rval, rep)


+def test_incsub_f16():
+    shp = (3, 3)
+    shared = gpuarray_shared_constructor
+    xval = numpy.arange(numpy.prod(shp), dtype='float16').reshape(shp) + 1
+    yval = numpy.empty((2,) + shp[1:], dtype='float16')
+    yval[:] = 2
+    x = shared(xval, name='x')
+    y = tensor.tensor(dtype='float16',
+                      broadcastable=(False,) * len(shp),
+                      name='y')
+    expr = tensor.advanced_inc_subtensor1(x, y, [0, 2])
+    f = theano.function([y], expr, mode=mode_with_gpu)
+    assert sum([isinstance(node.op, GpuAdvancedIncSubtensor1)
+                for node in f.maker.fgraph.toposort()]) == 1
+    rval = f(yval)
+    rep = xval.copy()
+    rep[[0, 2]] += yval
+    assert numpy.allclose(rval, rep)
+
+    expr = tensor.inc_subtensor(x[1:], y)
+    f = theano.function([y], expr, mode=mode_with_gpu)
+    assert sum([isinstance(node.op, GpuIncSubtensor)
+                for node in f.maker.fgraph.toposort()]) == 1
+    rval = f(yval)
+    rep = xval.copy()
+    rep[1:] += yval
+    assert numpy.allclose(rval, rep)
theano/gpuarray/type.py
...
@@ -301,20 +301,14 @@ class GpuArrayType(Type):
            raise NotImplementedError(
                "GpuArrayType.values_eq_approx() don't implemented the"
                " allow_remove_inf and allow_remove_nan parameter")
-        if a.dtype == 'float16' or b.dtype == 'float16':
-            an = numpy.asarray(a)
-            bn = numpy.asarray(b)
-            return tensor.TensorType.values_eq_approx(
-                an, bn, allow_remove_inf=allow_remove_inf,
-                allow_remove_nan=allow_remove_nan,
-                rtol=rtol, atol=atol)
        atol_, rtol_ = theano.tensor.basic._get_atol_rtol(a, b)
        if rtol is not None:
            rtol_ = rtol
        if atol is not None:
            atol_ = atol
        res = elemwise2(a, '', b, a, odtype=numpy.dtype('bool'),
-                        op_tmpl="res[i] = (fabs(%%(a)s - %%(b)s) <"
-                                "(%(atol_)s + %(rtol_)s * fabs(%%(b)s)))" %
+                        op_tmpl="res = (fabs(a - b) <"
+                                "(%(atol_)s + %(rtol_)s * fabs(b)))" %
                        locals())
        ret = numpy.asarray(res).all()
        if ret:
...
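The rewritten kernel template implements the standard mixed-tolerance test |a - b| < atol + rtol * |b| elementwise, then reduces with `all()`. The same check can be sketched in plain NumPy (the tolerance defaults here are illustrative, not Theano's):

```python
import numpy as np


def values_close(a, b, atol=1e-5, rtol=1e-5):
    # Elementwise |a - b| < atol + rtol * |b|, then reduce with all(),
    # matching the res = (fabs(a - b) < (atol + rtol * fabs(b))) kernel.
    return bool((np.abs(a - b) < atol + rtol * np.abs(b)).all())


a = np.array([1.0, 2.0, 3.0], dtype='float32')
assert values_close(a, a + 1e-7)        # within tolerance
assert not values_close(a, a + 1.0)     # far outside tolerance
```

Note the asymmetry: the relative term scales with |b|, so the comparison is not commutative in its arguments, just like `numpy.isclose`.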
...
theano/misc/check_blas.py (file mode changed 100755 → 100644)
...
@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000,
    t0 = 0
    t1 = -1

+    f()  # Ignore first function call to get representative time.
    if execute:
        sync = (hasattr(theano, "sandbox") and
                hasattr(theano.sandbox, "cuda") and
                theano.sandbox.cuda.cuda_available)
+        sync2 = (hasattr(theano, "gpuarray") and
+                 theano.gpuarray.pygpu_activated)
        t0 = time.time()
        for i in range(iters):
            f()
        if sync:
            theano.sandbox.cuda.synchronize()
+        if sync2:
+            c.get_value(borrow=True, return_internal_type=True).sync()
        t1 = time.time()
    return t1 - t0, impl
...
@@ -244,6 +249,7 @@ if __name__ == "__main__":
        cuda version      7.5    7.0    6.5
        gpu
+        M40              0.47s
        k80              0.96s
        K6000/NOECC      0.69s
        K40              0.88s
...
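The benchmark change above calls `f()` once before starting the clock, so compilation and first-call allocations do not pollute the measurement, and it synchronizes the GPU before reading the end time because kernel launches are asynchronous. A minimal sketch of that pattern, with a hypothetical `work()` standing in for the compiled Theano function:

```python
import time


def work():
    # Hypothetical CPU workload standing in for the compiled function f().
    return sum(i * i for i in range(10000))


def bench(iters=10):
    work()                  # warm-up call, excluded from the timing
    t0 = time.time()
    for _ in range(iters):
        work()
    # On a GPU backend you would synchronize here before reading the clock,
    # otherwise t1 would only measure the time to *launch* the kernels.
    t1 = time.time()
    return t1 - t0


elapsed = bench(2)
assert elapsed >= 0.0
```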
...
theano/sandbox/cuda/dnn.py
...
@@ -2526,7 +2526,8 @@ if True:
            out = as_cuda_ndarray_variable(out.dimshuffle(0, 1))
            return [out]

-    @register_opt('cudnn')
+    @register_opt('cudnn', 'stabilize', 'fast_compile')
+    # We put fast_compile as otherwise it won't be on the GPU.
    @local_optimizer([GpuElemwise, LogSoftmax])
    def local_log_softmax_dnn(node):
        # The log-softmax implementation is only available starting at cuDNN V3
...
theano/sandbox/cuda/opt.py
...
@@ -14,6 +14,7 @@ from . import dnn
import theano
from theano import scalar as scal
from theano import config, tensor, gof
+from theano.compile.ops import shape_i
import theano.ifelse
import theano.tensor.signal.pool
import theano.tensor.nnet
...
@@ -900,18 +901,14 @@ def local_gpu_careduce(node):
                # to make them a single dimension, do the reduction, and
                # then reshape to get them back.
-                shape_of = node.fgraph.shape_feature.shape_of
-                x_shape = shape_of[x]
-                new_in_shp = [x_shape[0]]
+                new_in_shp = [shape_i(x, 0)]
                new_mask = [reduce_mask[0]]
                for i in xrange(1, x.type.ndim):
                    if reduce_mask[i] == reduce_mask[i - 1]:
-                        new_in_shp[-1] *= x_shape[i]
+                        new_in_shp[-1] *= shape_i(x, i)
                    else:
                        new_mask.append(reduce_mask[i])
-                        new_in_shp.append(x_shape[i])
+                        new_in_shp.append(shape_i(x, i))

                new_greduce = GpuCAReduce(new_mask, scalar_op)
                new_x = x.reshape(tensor.stack(new_in_shp))
...
@@ -936,8 +933,11 @@ def local_gpu_careduce(node):
            # Restore the expected shape of the output
            if rval.ndim != out.ndim:
-                rval = rval.reshape(
-                    tensor.stack(shape_of[out]))
+                out_shp = []
+                for i in range(x.ndim):
+                    if i not in node.op.axis:
+                        out_shp.append(shape_i(x, i))
+                rval = rval.reshape(tensor.stack(out_shp))
            if rval.type == out.type:
                return [rval]
...
theano/sandbox/gpuarray/__init__.py
...
@@ -4,6 +4,7 @@ which refered to theano.sandbox.gpuarray."""
import warnings

from theano.gpuarray import *

-message = "theano.sandbox.gpuarray has been moved to theano.gpuarray." + \
-          " Please update your code and pickles."
+message = ("theano.sandbox.gpuarray has been moved to theano.gpuarray. "
+           "Please update your code and pickles. If the warning persists, "
+           "clear theano's cache ('$theano/bin/theano-cache clear')."
+           )

warnings.warn(message)
theano/scalar/basic.py:

```diff
@@ -2543,7 +2543,7 @@ class Log2(UnaryScalarOp):
             else:
                 return [x.zeros_like()]

-        return gz / (x * math.log(2.0)),
+        return gz / (x * numpy.asarray(math.log(2.0)).astype(x.dtype)),

     def c_code(self, node, name, inputs, outputs, sub):
         (x,) = inputs
```
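The `Log2` gradient change keeps the constant `log(2.0)` in the input's dtype instead of leaving it as a Python float, which Theano would turn into a float64 constant and thereby upcast a float32 graph. A small NumPy sketch of the fixed expression (the arrays here are illustrative stand-ins for symbolic values):

```python
import math
import numpy as np

x = np.array([1.0, 2.0, 8.0], dtype=np.float32)
gz = np.ones_like(x)

# The constant is cast to x's dtype, as in the patched grad, so the
# result of d/dx log2(x) = 1 / (x * ln 2) stays float32.
ln2 = np.asarray(math.log(2.0)).astype(x.dtype)
gx = gz / (x * ln2)

assert gx.dtype == np.float32
assert np.allclose(gx, 1.0 / (x * math.log(2.0)))
```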
theano/scan_module/scan_opt.py:

```diff
@@ -202,7 +202,7 @@ def remove_constants_and_unused_inputs_scan(node):
         # DEBUG CHECK
         nwScan = scan_op.Scan(nw_inner, op_outs, nw_info)
         nw_outs = nwScan(*nw_outer, **dict(return_list=True))
-        return dict([("remove", [node])] + list(zip(node.outputs, nw_outs)))
+        return OrderedDict([("remove", [node])] + list(zip(node.outputs, nw_outs)))
     else:
         return False
...
@@ -2072,8 +2072,8 @@ def scan_merge_inouts(node):
             new_outer_out_mit_mot.append(outer_omm)
         na.outer_out_mit_mot = new_outer_out_mit_mot
     if remove:
-        return dict([("remove", remove)] +
-                    list(zip(node.outputs, na.outer_outputs)))
+        return OrderedDict([("remove", remove)] +
+                           list(zip(node.outputs, na.outer_outputs)))
     return na.outer_outputs
...
```
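The `dict` to `OrderedDict` swap matters because this codebase still supported Python 2, where plain `dict` iteration order is arbitrary; iterating the replacement mapping in an unpredictable order could make graph rewrites non-deterministic. A sketch with stand-in names (not the real Theano node objects):

```python
from collections import OrderedDict

# Stand-ins for an Apply node's outputs and their replacements.
node_outputs = ['out0', 'out1']
nw_outs = ['new0', 'new1']

# OrderedDict preserves insertion order on every Python version,
# so the replacement pairs are always visited in the same order.
replacements = OrderedDict([("remove", ["node"])] +
                           list(zip(node_outputs, nw_outs)))

assert list(replacements) == ["remove", "out0", "out1"]
```

On Python 3.7+ a plain `dict` would also preserve insertion order, but `OrderedDict` makes the guarantee explicit and portable.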
theano/tensor/basic.py:

```diff
@@ -612,14 +612,14 @@ def get_scalar_constant_value(orig_v, elemwise=True,
             return numpy.asarray(v)

         if isinstance(v, numpy.ndarray):
-            return numpy_scalar(v)
+            return numpy_scalar(v).copy()

         if isinstance(v, Constant):
             if getattr(v.tag, 'unique_value', None) is not None:
                 data = v.tag.unique_value
             else:
                 data = v.data
-            return numpy_scalar(data)
+            return numpy_scalar(data).copy()

         if not only_process_constants and getattr(v, 'owner', None):
             if isinstance(v.owner.op, (Alloc, DimShuffle, Rebroadcast,
...
@@ -649,7 +649,7 @@ def get_scalar_constant_value(orig_v, elemwise=True,
                          for i in v.owner.inputs]
                 ret = [[None]]
                 v.owner.op.perform(v.owner, const, ret)
-                return ret[0][0]
+                return ret[0][0].copy()
             elif elemwise and isinstance(v.owner.op, Elemwise):
                 if isinstance(v.owner.op.scalar_op, scal.Second):
                     # We don't need both input to be constant for second
...
@@ -662,13 +662,13 @@ def get_scalar_constant_value(orig_v, elemwise=True,
                              for i in v.owner.inputs]
                     ret = [[None]]
                     v.owner.op.perform(v.owner, const, ret)
-                    return ret[0][0]
+                    return ret[0][0].copy()
             elif (isinstance(v.owner.op, theano.tensor.subtensor.Subtensor) and
                   v.ndim == 0):
                 if isinstance(v.owner.inputs[0], TensorConstant):
                     cdata = tuple(v.owner.op.get_constant_idx(v.owner.inputs))
                     try:
-                        return v.owner.inputs[0].data.__getitem__(cdata)
+                        return v.owner.inputs[0].data.__getitem__(cdata).copy()
                     except IndexError:
                         raise IndexError(
                             str(tuple(v.owner.op.idx_list)) +
...
@@ -1399,8 +1399,6 @@ class MaxAndArgmax(Op):
         %(axis_code)s
         %(max)s = (PyArrayObject*)PyArray_Max(%(x)s, axis, NULL);
         if(%(max)s == NULL){
-            PyErr_SetString(PyExc_ValueError,
-                            "MaxAndArgmax, max failed");
             %(fail)s;
         }
         if(!PyArray_CheckExact(%(max)s)){
...
@@ -1412,7 +1410,6 @@ class MaxAndArgmax(Op):
         %(argmax)s = (PyArrayObject*)PyArray_ArgMax(%(x)s, axis, NULL);
         if(%(argmax)s == NULL){
-            PyErr_SetString(PyExc_ValueError, "MaxAndArgmax, argmax failed");
             Py_CLEAR(%(max)s);
             %(fail)s;
         }
...
@@ -1434,7 +1431,7 @@ class MaxAndArgmax(Op):
         return ret % locals()

     def c_code_cache_version(self):
-        return (3,)
+        return (4,)

     def infer_shape(self, node, shapes):
         ishape, axis_shape = shapes
...
```
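The `.copy()` calls added to `get_scalar_constant_value` guard against aliasing: the function used to hand back the constant's own cached array, so an in-place update by a caller would silently corrupt the constant for every later lookup. A minimal sketch of the hazard (the `cache` variable is an illustrative stand-in for a constant's `.data`):

```python
import numpy as np

cache = np.asarray(5)

# Old behaviour: return the cached array itself.
view = cache
view += 1                # caller's in-place update...
assert cache == 6        # ...silently corrupts the cache

# New behaviour: return a copy, so the cache is isolated.
cache = np.asarray(5)
copy = cache.copy()
copy += 1
assert cache == 5
```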
theano/tensor/blas.py:

```diff
@@ -152,6 +152,7 @@ from theano.tensor import basic as T
 from theano.tensor.blas_headers import blas_header_text
 from theano.tensor.blas_headers import blas_header_version
 from theano.tensor.opt import in2out, local_dimshuffle_lift
+from theano.tensor.type import values_eq_approx_remove_inf_nan

 _logger = logging.getLogger('theano.tensor.blas')
...
@@ -1435,7 +1436,8 @@ class GemmOptimizer(Optimizer):
             if new_node is not node:
                 nodelist.append(new_node)
-        u = theano.gof.opt.Updater(on_import, None, None)
+        u = theano.gof.opt.Updater(on_import, None, None,
+                                   name="GemmOptimizer")
         fgraph.attach_feature(u)
         while did_something:
             nb_iter += 1
...
@@ -1465,6 +1467,7 @@ class GemmOptimizer(Optimizer):
                 if new_outputs:
                     new_outputs, old_dot22 = new_outputs
                     assert len(new_outputs) == len(node.outputs)
+                    new_outputs[0].tag.values_eq_approx = values_eq_approx_remove_inf_nan
                     try:
                         fgraph.replace_all_validate_remove(
                             list(zip(node.outputs, new_outputs)),
...
```
theano/tensor/nlinalg.py:

```diff
@@ -726,3 +726,62 @@ def norm(x, ord):
             raise ValueError(0)
         elif ndim > 2:
             raise NotImplementedError("We don't support norm witn ndim > 2")
+
+
+class TensorInv(Op):
+    """
+    Class wrapper for tensorinv() function;
+    Theano utilization of numpy.linalg.tensorinv;
+    """
+    _numop = staticmethod(numpy.linalg.tensorinv)
+    __props__ = ('ind',)
+
+    def __init__(self, ind=2):
+        self.ind = ind
+
+    def make_node(self, a):
+        a = as_tensor_variable(a)
+        out = a.type()
+        return Apply(self, [a], [out])
+
+    def perform(self, node, inputs, outputs):
+        (a,) = inputs
+        (x,) = outputs
+        x[0] = self._numop(a, self.ind)
+
+    def infer_shape(self, node, shapes):
+        sp = shapes[0][self.ind:] + shapes[0][:self.ind]
+        return [sp]
+
+
+def tensorinv(a, ind=2):
+    """
+    Does not run on GPU;
+    Theano utilization of numpy.linalg.tensorinv;
+
+    Compute the 'inverse' of an N-dimensional array.
+
+    The result is an inverse for `a` relative to the tensordot operation
+    ``tensordot(a, b, ind)``, i. e., up to floating-point accuracy,
+    ``tensordot(tensorinv(a), a, ind)`` is the "identity" tensor for the
+    tensordot operation.
+
+    Parameters
+    ----------
+    a : array_like
+        Tensor to 'invert'. Its shape must be 'square', i. e.,
+        ``prod(a.shape[:ind]) == prod(a.shape[ind:])``.
+    ind : int, optional
+        Number of first indices that are involved in the inverse sum.
+        Must be a positive integer, default is 2.
+
+    Returns
+    -------
+    b : ndarray
+        `a`'s tensordot inverse, shape ``a.shape[ind:] + a.shape[:ind]``.
+
+    Raises
+    ------
+    LinAlgError
+        If `a` is singular or not 'square' (in the above sense).
+
+    """
+    return TensorInv(ind)(a)
```
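The new `TensorInv` Op simply wraps `numpy.linalg.tensorinv`, so its contract can be illustrated directly in NumPy (the shapes below are arbitrary examples):

```python
import numpy as np

# With ind=2, a is 'square' because prod((4, 6)) == prod((8, 3)) == 24.
a = np.eye(24).reshape(4, 6, 8, 3)
ainv = np.linalg.tensorinv(a, ind=2)

# The inverse has the front and back index groups swapped.
assert ainv.shape == (8, 3, 4, 6)

# tensordot(ainv, b, 2) solves tensordot(a, x, 2) == b.
b = np.random.randn(4, 6)
x = np.tensordot(ainv, b, axes=2)
assert np.allclose(np.tensordot(a, x, axes=2), b)
```

This also shows why `infer_shape` returns `shapes[0][self.ind:] + shapes[0][:self.ind]`: the inverse's shape is the input shape with the two index groups exchanged.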
theano/tensor/nnet/sigm.py:

```diff
@@ -413,6 +413,7 @@ log1msigm_to_softplus = gof.PatternSub(
     values_eq_approx=values_eq_approx_remove_inf,
     skip_identities_fn=_skip_mul_1)

 log1pexp_to_softplus = gof.PatternSub(
     (tensor.log1p,
      (tensor.exp, 'x')),
...
@@ -420,12 +421,20 @@ log1pexp_to_softplus = gof.PatternSub(
     values_eq_approx=values_eq_approx_remove_inf,
     allow_multiple_clients=True)
+log1p_neg_sigmoid = gof.PatternSub(
+    (tensor.log1p,
+     (tensor.neg, (sigmoid, 'x'))),
+    (tensor.neg, (softplus, 'x')),
+    values_eq_approx=values_eq_approx_remove_inf,
+    allow_multiple_clients=True)

 opt.register_stabilize(logsigm_to_softplus, name='logsigm_to_softplus')
 opt.register_stabilize(log1msigm_to_softplus, name='log1msigm_to_softplus')
 opt.register_stabilize(log1pexp_to_softplus, name='log1pexp_to_softplus')
+opt.register_stabilize(log1p_neg_sigmoid, name='log1p_neg_sigmoid,')


-def is_1pexp(t):
+def is_1pexp(t, only_process_constants=True):
     """

     Returns
...
@@ -437,8 +446,9 @@ def is_1pexp(t):
     """
     if t.owner and t.owner.op == tensor.add:
         scalars, scalar_inputs, nonconsts = \
-            opt.scalarconsts_rest(t.owner.inputs)
-        # scalar_inputs are potentially dimshuffled and fill'd scalars
+            opt.scalarconsts_rest(t.owner.inputs,
+                                  only_process_constants=only_process_constants)
+        # scalar_inputs are potentially dimshuffled and filled with scalars
         if len(nonconsts) == 1:
             maybe_exp = nonconsts[0]
             if maybe_exp.owner and maybe_exp.owner.op == tensor.exp:
...
@@ -947,7 +957,7 @@ def local_inv_1_plus_exp(node):
     inv_arg = node.inputs[0]
     if inv_arg.owner and inv_arg.owner.op == tensor.add:
         scalars, scalar_inputs, nonconsts = \
-            opt.scalarconsts_rest(inv_arg.owner.inputs)
+            opt.scalarconsts_rest(inv_arg.owner.inputs, only_process_constants=True)
         # scalar_inputs are potentially dimshuffled and fill'd scalars
         if len(nonconsts) == 1:
             if nonconsts[0].owner and nonconsts[0].owner.op == tensor.exp:
...
```
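The new `log1p_neg_sigmoid` pattern rewrites `log1p(-sigmoid(x))` to `-softplus(x)`, which is numerically stable because `1 - sigmoid(x)` underflows to 0 for large `x` while `softplus` does not. The identity it relies on can be checked numerically (the `sigmoid`/`softplus` helpers here are plain NumPy stand-ins for the Theano ops):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.log1p(np.exp(x))

# log(1 - sigmoid(x)) = log(sigmoid(-x)) = -log(1 + exp(x)) = -softplus(x)
x = np.linspace(-5.0, 5.0, 101)
assert np.allclose(np.log1p(-sigmoid(x)), -softplus(x))
```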
theano/tensor/nnet/tests/test_sigm.py:

```diff
@@ -356,7 +356,6 @@ class T_sigmoid_opts(unittest.TestCase):
         f = theano.function([x], s, mode=mode)
         assert hasattr(f.maker.fgraph.outputs[0].tag, 'trace')
         topo = f.maker.fgraph.toposort()
-        assert len(topo) > 1
         assert not any([n.op == sigmoid for n in topo])
         ux_v = f([[-50, -10, -4, -1, 0, 1, 4, 10, 50]])
...
@@ -467,15 +466,17 @@ class T_sigmoid_utils(unittest.TestCase):
         try:
             x = tensor.vector('x')
             exp = tensor.exp
-            assert is_1pexp(1 + exp(x)) == (False, x)
-            assert is_1pexp(exp(x) + 1) == (False, x)
-            for neg, exp_arg in imap(is_1pexp,
-                                     [(1 + exp(-x)), (exp(-x) + 1)]):
+            assert is_1pexp(1 + exp(x), False) == (False, x)
+            assert is_1pexp(exp(x) + 1, False) == (False, x)
+            for neg, exp_arg in imap(
+                    lambda x: is_1pexp(x, only_process_constants=False),
+                    [(1 + exp(-x)), (exp(-x) + 1)]):
                 assert not neg and theano.gof.graph.is_same_graph(exp_arg, -x)
-            assert is_1pexp(1 - exp(x)) is None
-            assert is_1pexp(2 + exp(x)) is None
-            assert is_1pexp(exp(x) + 2) is None
-            assert is_1pexp(exp(x) - 1) is None
-            assert is_1pexp(-1 + exp(x)) is None
-            assert is_1pexp(1 + 2 * exp(x)) is None
+            assert is_1pexp(1 - exp(x), False) is None
+            assert is_1pexp(2 + exp(x), False) is None
+            assert is_1pexp(exp(x) + 2, False) is None
+            assert is_1pexp(exp(x) - 1, False) is None
+            assert is_1pexp(-1 + exp(x), False) is None
+            assert is_1pexp(1 + 2 * exp(x), False) is None
         finally:
             config.warn.identify_1pexp_bug = backup
```
theano/tensor/opt.py (diff collapsed; click to expand)
theano/tensor/signal/pool.py:

```diff
@@ -186,8 +186,12 @@ class Pool(Op):
         if st is None:
             st = ds
         r, c = imgshape[-2:]
-        r += padding[0] * 2
-        c += padding[1] * 2
+        r = tensor.extract_constant(r)
+        c = tensor.extract_constant(c)
+        if padding[0]:
+            r += padding[0] * 2
+        if padding[1]:
+            c += padding[1] * 2

         if ignore_border:
             if ds[0] == st[0]:
...
@@ -216,7 +220,7 @@ class Pool(Op):
             elif st[0] >= ds[0]:
                 nr = (r - 1) // st[0] + 1
             else:
-                nr = max(0, (r - 1 - ds[0]) // st[0] + 1) + 1
+                nr = max(0, (r - 1 - ds[0] + st[0]) // st[0]) + 1
             if isinstance(c, theano.Variable):
                 nc = tensor.switch(tensor.ge(st[1], ds[1]),
...
@@ -226,7 +230,7 @@ class Pool(Op):
             elif st[1] >= ds[1]:
                 nc = (c - 1) // st[1] + 1
             else:
-                nc = max(0, (c - 1 - ds[1]) // st[1] + 1) + 1
+                nc = max(0, (c - 1 - ds[1] + st[1]) // st[1]) + 1
         rval = list(imgshape[:-2]) + [nr, nc]
         return rval
...
@@ -257,10 +261,10 @@ class Pool(Op):
         self.mode = mode

     def make_node(self, x):
-        if x.type.ndim != 4:
-            raise TypeError()
         # TODO: consider restricting the dtype?
         x = tensor.as_tensor_variable(x)
+        if x.type.ndim != 4:
+            raise TypeError()
         # If the input shape are broadcastable we can have 0 in the output shape
         broad = x.broadcastable[:2] + (False, False)
         out = tensor.TensorType(x.dtype, broad)
...
@@ -274,6 +278,9 @@ class Pool(Op):
                 'Pool requires 4D input for now')
         z_shape = self.out_shape(x.shape, self.ds, self.ignore_border, self.st,
                                  self.padding)
+        if not self.ignore_border:
+            assert z_shape[2] > 0
+            assert z_shape[3] > 0
         if (z[0] is None) or (z[0].shape != z_shape):
             z[0] = numpy.empty(z_shape, dtype=x.dtype)
         zz = z[0]
...
@@ -403,7 +410,7 @@ class Pool(Op):
         }
         else
         {
-          z_r = std::max(0, (r - 1 - %(ds0)s) / %(st0)s + 1) + 1;
+          z_r = std::max(0, (r - 1 - %(ds0)s + %(st0)s) / %(st0)s) + 1;
         }
         // decide how many columns the output has
         if (%(st1)s >= %(ds1)s)
...
@@ -412,8 +419,10 @@ class Pool(Op):
         }
         else
         {
-          z_c = std::max(0, (c - 1 - %(ds1)s) / %(st1)s + 1) + 1;
+          z_c = std::max(0, (c - 1 - %(ds1)s + %(st0)s) / %(st1)s) + 1;
         }
+        assert(z_r > 0);
+        assert(z_c > 0);
     }
     // memory allocation of z if necessary
     if ((!%(z)s)
...
@@ -522,7 +531,7 @@ class Pool(Op):
         return ccode % locals()

     def c_code_cache_version(self):
-        return (0, 6, 8, 3)
+        return (0, 6, 8, 4)


 class PoolGrad(Op):
...
@@ -632,12 +641,12 @@ class MaxPoolGrad(PoolGrad):
     def make_node(self, x, maxout, gz):
         # make_node should only be called by the grad function of
         # Pool, so these asserts should not fail.
-        assert isinstance(x, Variable) and x.ndim == 4
-        assert isinstance(maxout, Variable) and maxout.ndim == 4
-        assert isinstance(gz, Variable) and gz.ndim == 4
         x = tensor.as_tensor_variable(x)
         maxout = tensor.as_tensor_variable(maxout)
         gz = tensor.as_tensor_variable(gz)
+        assert isinstance(x, Variable) and x.ndim == 4
+        assert isinstance(maxout, Variable) and maxout.ndim == 4
+        assert isinstance(gz, Variable) and gz.ndim == 4
         return Apply(self, [x, maxout, gz], [x.type()])
...
@@ -814,10 +823,10 @@ class AveragePoolGrad(PoolGrad):
     def make_node(self, x, gz, dummy=None):
         # make_node should only be called by the grad function of
         # Pool, so these asserts should not fail.
-        assert isinstance(x, Variable) and x.ndim == 4
-        assert isinstance(gz, Variable) and gz.ndim == 4
         x = tensor.as_tensor_variable(x)
         gz = tensor.as_tensor_variable(gz)
+        assert isinstance(x, Variable) and x.ndim == 4
+        assert isinstance(gz, Variable) and gz.ndim == 4
         return Apply(self, [x, gz], [x.type()])
...
```
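The corrected `else` branch computes the pooled output length for `ignore_border=False` with overlapping windows (that branch is only reached when `st < ds`). The old C expression `(r - 1 - ds) / st + 1` relied on Python's floor division, but C integer division truncates toward zero, which could produce one extra output row when `r - 1 - ds` is slightly negative. The new form `max(0, (r - 1 - ds + st) // st) + 1` keeps the numerator non-negative in the cases that matter and can be checked against a brute-force window count. This sketch, with hypothetical helpers `pool_out_len` and `count_windows`, is illustrative only:

```python
def pool_out_len(r, ds, st):
    # Output length along one axis, ignore_border=False, st < ds
    # (the fixed formula from the diff above).
    return max(0, (r - 1 - ds + st) // st) + 1

def count_windows(r, ds, st):
    # Brute force: slide a window of size ds by stride st, keep partial
    # windows, and stop after the window that reaches the end of the input.
    n, i = 0, 0
    while i < r:
        n += 1
        if i + ds >= r:
            break
        i += st
    return n

for r in range(1, 30):
    for ds in range(2, 8):
        for st in range(1, ds):   # this branch only handles st < ds
            assert pool_out_len(r, ds, st) == count_windows(r, ds, st)
```

For example, `r=5, ds=6, st=3` now gives 1 region (the single partial window covering the whole input), where the truncating C version gave 2; the new `assert(z_r > 0)` checks then guarantee the allocated output is never empty.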
theano/tensor/signal/tests/test_pool.py (diff collapsed; click to expand)
theano/tensor/slinalg.py (diff collapsed; click to expand)
theano/tensor/subtensor.py (diff collapsed; click to expand)
theano/tensor/tests/test_basic.py (diff collapsed; click to expand)
theano/tensor/tests/test_blas_c.py (diff collapsed; click to expand)
theano/tensor/tests/test_nlinalg.py (diff collapsed; click to expand)
theano/tensor/tests/test_opt.py (diff collapsed; click to expand)
theano/tensor/tests/test_slinalg.py (diff collapsed; click to expand)
theano/tests/test_flake8.py (diff collapsed; click to expand)