testgroup / pytensor · Commits

Commit 0a7a4c06 authored Jun 10, 2016 by Chinnadhurai Sankar

Merge branch 'master' of git://github.com/Theano/Theano

Parents: 1a53098a, 59a5dfbb
Showing 65 changed files with 454 additions and 348 deletions
.gitignore                                  +2  -0
README.txt                                  +6  -7
doc/extending/extending_theano.txt          +2  -2
doc/install.txt                             +10 -8
doc/install_ubuntu.txt                      +11 -11
doc/install_windows.txt                     +7  -5
doc/library/config.txt                      +0  -0
doc/library/tensor/basic.txt                +1  -1
doc/optimizations.txt                       +1  -1
doc/tutorial/aliasing.txt                   +0  -47
doc/tutorial/modes.txt                      +3  -3
doc/tutorial/using_gpu.txt                  +0  -0
doc/tutorial/using_gpu_solution_1.py        +0  -0
doc/tutorial/using_multi_gpu.txt            +3  -3
theano/compile/nanguardmode.py              +17 -46
theano/compile/profiling.py                 +9  -4
theano/configdefaults.py                    +7  -7
theano/gof/link.py                          +3  -0
theano/gof/opt.py                           +0  -0
theano/gof/optdb.py                         +12 -1
theano/gof/vm.py                            +26 -9
theano/gpuarray/__init__.py                 +10 -1
theano/gpuarray/basic_ops.py                +13 -6
theano/gpuarray/blockgemv.c                 +22 -29
theano/gpuarray/blockger.c                  +19 -26
theano/gpuarray/dnn.py                      +8  -8
theano/gpuarray/dnn_fwd.c                   +3  -3
theano/gpuarray/dnn_gi.c                    +3  -3
theano/gpuarray/dnn_gw.c                    +3  -3
theano/gpuarray/elemwise.py                 +3  -3
theano/gpuarray/extra_ops.py                +6  -9
theano/gpuarray/gemm16.c                    +1  -1
theano/gpuarray/neighbours.py               +1  -1
theano/gpuarray/nerv.py                     +1  -1
theano/gpuarray/nnet.py                     +4  -4
theano/gpuarray/opt.py                      +45 -28
theano/gpuarray/subtensor.py                +0  -0
theano/gpuarray/tests/test_basic_ops.py     +16 -1
theano/gpuarray/tests/test_elemwise.py      +2  -2
theano/gpuarray/tests/test_extra_ops.py     +1  -1
theano/gpuarray/tests/test_opt.py           +1  -1
theano/gpuarray/tests/test_subtensor.py     +29 -0
theano/gpuarray/type.py                     +2  -8
theano/misc/check_blas.py                   +6  -0
theano/sandbox/cuda/dnn.py                  +2  -1
theano/sandbox/cuda/opt.py                  +9  -9
theano/sandbox/gpuarray/__init__.py         +3  -2
theano/scalar/basic.py                      +1  -1
theano/scan_module/scan_opt.py              +3  -3
theano/tensor/basic.py                      +6  -9
theano/tensor/blas.py                       +4  -1
theano/tensor/nlinalg.py                    +59 -0
theano/tensor/nnet/sigm.py                  +14 -4
theano/tensor/nnet/tests/test_sigm.py       +11 -10
theano/tensor/opt.py                        +0  -0
theano/tensor/signal/pool.py                +23 -14
theano/tensor/signal/tests/test_pool.py     +0  -0
theano/tensor/slinalg.py                    +0  -0
theano/tensor/subtensor.py                  +0  -0
theano/tensor/tests/test_basic.py           +0  -0
theano/tensor/tests/test_blas_c.py          +0  -0
theano/tensor/tests/test_nlinalg.py         +0  -0
theano/tensor/tests/test_opt.py             +0  -0
theano/tensor/tests/test_slinalg.py         +0  -0
theano/tests/test_flake8.py                 +0  -0
.gitignore
@@ -37,3 +37,4 @@ Theano.suo
 .ipynb_checkpoints
 .pydevproject
 .ropeproject
+core
\ No newline at end of file
README.txt
@@ -10,15 +10,14 @@ Related Projects:
 https://github.com/Theano/Theano/wiki/Related-projects
 
-We recommend you look at the documentation on the website, since it
-will be more current than the documentation included with the package.
-If you really wish to build the documentation yourself, you will need
-sphinx. Issue the following command:
+It is recommended that you look at the documentation on the website, as it will be more current than the documentation included with the package.
+In order to build the documentation yourself, you will need sphinx. Issue the following command:
 
 python ./doc/scripts/docgen.py
 
 Documentation is built into html/
-The PDF of the documentation is html/theano.pdf
+The PDF of the documentation can be found at html/theano.pdf
 
 DIRECTORY LAYOUT
@@ -31,7 +30,7 @@ Theano (current directory) is the distribution directory.
 * tensor depends upon scalar
 * sparse depends upon tensor
 * sandbox can depend on everything else
-* Theano/examples are copies of the example on the wiki
+* Theano/examples are copies of the example found on the wiki
 * Theano/benchmark and Theano/examples are in the distribution, but not in
   the Python package
 * Theano/bin contains executable scripts that are copied to the bin folder
@@ -39,4 +38,4 @@ Theano (current directory) is the distribution directory.
 * Tests are distributed and are part of the package, i.e. fall in
   the appropriate submodules
 * Theano/doc contains files and scripts used to generate the documentation
-* Theano/html is the place where the documentation will be generated
+* Theano/html is where the documentation will be generated
doc/extending/extending_theano.txt
@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
 Testing GPU Ops
 ^^^^^^^^^^^^^^^
 
-Ops to be executed on the GPU should inherit from the
-``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
+When using the old GPU backend, Ops to be executed on the GPU should inherit
+from ``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
 Theano to distinguish them. Currently, we use this to test if the
 NVIDIA driver works correctly with our sum reduction code on the GPU.
doc/install.txt
@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add
 
 If you want GPU-related tests to run on a specific GPU device, and not
 the default one, you should use :attr:`~config.init_gpu_device`.
-For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=gpu1``.
+For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=cuda1``.
 
 See :ref:`libdoc_config` for more information on how to change these
 configuration options.
@@ -508,25 +508,25 @@ Any one of them is enough.
 :ref:`Ubuntu instructions <install_ubuntu_gpu>`.
 
 Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
 
 Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your
 computer, and set the default floating point computations to float32.
-For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu,floatX=float32'``.
+For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=cuda,floatX=float32'``.
 You can also set these options in the .theanorc file's ``[global]`` section:
 
 .. code-block:: cfg
 
     [global]
-    device = gpu
+    device = cuda
     floatX = float32
 
 Note that:
 
-* If your computer has multiple GPUs and you use 'device=gpu', the driver
-  selects the one to use (usually gpu0).
-* You can use the program nvida-smi to change this policy.
-* You can choose one specific GPU by specifying 'device=gpuX', with X the
+* If your computer has multiple GPUs and you use 'device=cuda', the driver
+  selects the one to use (usually cuda0).
+* You can use the program ``nvidia-smi`` to change this policy.
+* You can choose one specific GPU by specifying 'device=cudaX', with X the
   the corresponding GPU index (0, 1, 2, ...)
 * By default, when ``device`` indicates preference for GPU computations,
   Theano will fall back to the CPU if there is a problem with the GPU.
@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats:
   toggle your GPU on, which can be done with
   `gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__.
 
+Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
+
 Once your setup is complete, head to :ref:`using_gpu` to find how to verify
 everything is working properly.
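The doc changes above consistently rename the legacy ``device=gpu`` flag to the new backend's ``device=cuda``. For a single run, the renamed flag can be set through the environment instead of editing ``.theanorc`` (flag values taken from the diff above; this is a sketch of usage, not part of the commit):

```shell
# Select the new gpuarray backend and float32 for this process only,
# instead of writing the [global] section of .theanorc.
export THEANO_FLAGS='device=cuda,floatX=float32'
echo "$THEANO_FLAGS"
```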
doc/install_ubuntu.txt
(most hunks in this file strip trailing whitespace; paired -/+ lines below differ only in trailing spaces)
@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04:
     sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
     sudo pip install Theano
 
-On 14.04, this will install Python 2 by default. If you want to use Python 3: 
+On 14.04, this will install Python 2 by default. If you want to use Python 3:
 
 .. code-block:: bash
@@ -104,30 +104,30 @@ For Ubuntu 11.04:
 The development version of Theano supports Python 3.3 and
 probably supports Python 3.2, but we do not test on it.
 
 Bleeding Edge Installs
 ----------------------
 
-If you would like, instead, to install the bleeding edge Theano (from github) 
-such that you can edit and contribute to Theano, replace the `pip install Theano` 
+If you would like, instead, to install the bleeding edge Theano (from github)
+such that you can edit and contribute to Theano, replace the `pip install Theano`
 command with:
 
 .. code-block:: bash
 
     git clone git://github.com/Theano/Theano.git
-    cd Theano 
+    cd Theano
    python setup.py develop --user
     cd ..
 
 VirtualEnv
 ----------
 
-If you would like to install Theano in a VirtualEnv, you will want to pass the 
-`--system-site-packages` flag when creating the VirtualEnv so that it will pick up 
+If you would like to install Theano in a VirtualEnv, you will want to pass the
+`--system-site-packages` flag when creating the VirtualEnv so that it will pick up
 the system-provided `Numpy` and `SciPy`.
 
 .. code-block:: bash
 
     virtualenv --system-site-packages -p python2.7 theano-env
     source theano-env/bin/activate
     pip install Theano
@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs
 Change to the Theano directory and run:
 
 .. code-block:: bash
 
     git pull
@@ -303,7 +303,7 @@ Test GPU configuration
 .. code-block:: bash
 
-    THEANO_FLAGS=floatX=float32,device=gpu python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
+    THEANO_FLAGS=floatX=float32,device=cuda python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
 
 .. note::
doc/install_windows.txt
@@ -423,16 +423,16 @@ Create a test file containing:
     print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %(
         np_end-np_start, t_end-t_start))
     print("Result difference: %f" % (np.abs(AB-tAB).max(), ))
 
-.. testoutput::
-   :hide:
-   :options: +ELLIPSIS
-
-   NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!)
-   Result difference: ...
+.. code-block:: none
+
+   NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!)
+   Result difference: 0.000000
@@ -445,6 +445,8 @@ routine for matrix multiplication)
 
 Configure Theano for GPU use
 ############################
 
+Install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_ if you have not already done so.
+
 Theano can be configured with a ``.theanorc`` text file (or
 ``.theanorc.txt``, whichever is easier for you to create under
 Windows). It should be placed in the directory pointed to by the
@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file:
 .. code-block:: cfg
 
     [global]
-    device = gpu
+    device = cuda
     floatX = float32
 
     [nvcc]
@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above.
 Compiling a faster BLAS
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-If you installed Python through WinPython or EPD, Theano will automatically 
+If you installed Python through WinPython or EPD, Theano will automatically
 link with the MKL library, so you should not need to compile your own BLAS.
 
 .. note::
doc/library/config.txt
(diff collapsed; not shown)
doc/library/tensor/basic.txt
@@ -1414,7 +1414,7 @@ Mathematical
 .. function:: abs_(a)
 
-    Returns a variable representingthe absolute of a, ie ``|a|``.
+    Returns a variable representing the absolute of a, ie ``|a|``.
 
     .. note:: Can also be accessed with ``abs(a)``.
doc/optimizations.txt
@@ -32,6 +32,7 @@ Optimization                                              FAST_RUN  FAST_COMPILE
 =========================================================  ========= ============ =============
 :term:`merge`                                              x         x
 :term:`constant folding<constant folding>`                 x         x
+:term:`GPU transfer`                                       x         x
 :term:`shape promotion<shape promotion>`                   x
 :term:`fill cut<fill cut>`                                 x
 :term:`inc_subtensor srlz.<inc_subtensor serialization>`   x
@@ -52,7 +53,6 @@ Optimization                                              FAST_RUN  FAST_COMPILE
 :term:`inplace_elemwise`                                   x
 :term:`inplace_random`                                     x
 :term:`elemwise fusion`                                    x
-:term:`GPU transfer`                                       x
 :term:`local_log_softmax`                                  x         x
 :term:`local_remove_all_assert`
 =========================================================  ========= ============ =============
doc/tutorial/aliasing.txt
@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
 hints that give more flexibility to the compilation and optimization of the
 graph.
 
-For GPU graphs, this borrowing can have a major speed impact. See the following code:
-
-.. code-block:: python
-
-    from theano import function, config, shared, sandbox, tensor, Out
-    import numpy
-    import time
-
-    vlen = 10 * 30 * 768  # 10 x # cores x # threads per core
-    iters = 1000
-
-    rng = numpy.random.RandomState(22)
-    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
-    f1 = function([], sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)))
-    f2 = function([],
-                  Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
-                      borrow=True))
-    t0 = time.time()
-    for i in range(iters):
-        r = f1()
-    t1 = time.time()
-    no_borrow = t1 - t0
-    t0 = time.time()
-    for i in range(iters):
-        r = f2()
-    t1 = time.time()
-    print(
-        "Looping %s times took %s seconds without borrow "
-        "and %s seconds with borrow" % (iters, no_borrow, (t1 - t0))
-    )
-    if numpy.any([isinstance(x.op, tensor.Elemwise) and
-                  ('Gpu' not in type(x.op).__name__)
-                  for x in f1.maker.fgraph.toposort()]):
-        print('Used the cpu')
-    else:
-        print('Used the gpu')
-
-Which produces this output:
-
-.. code-block:: none
-
-    $ THEANO_FLAGS=device=gpu0,floatX=float32 python test1.py
-    Using gpu device 0: GeForce GTX 275
-    Looping 1000 times took 0.368273973465 seconds without borrow and 0.0240728855133 seconds with borrow.
-    Used the gpu
-
 *Take home message:*
 
 When an input *x* to a function is not needed after the function
@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory
 footprint), and you only need to read from it once, right away when
 it's returned, then consider marking it with an ``Out(y,
 borrow=True)``.
-
doc/tutorial/modes.txt
@@ -168,8 +168,8 @@ Linkers
 =======
 
 A mode is composed of 2 things: an optimizer and a linker. Some modes,
-like ``NanGuardMode`` and ``DebugMode``, add logic around the optimizer and
-linker. ``NanGuardMode`` and ``DebugMode`` use their own linker.
+like ``NanGuardMode`` and ``DebugMode``, add logic around the
+optimizer and linker. ``DebugMode`` uses its own linker.
 
 You can select which linker to use with the Theano flag :attr:`config.linker`.
 Here is a table to compare the different linkers.
@@ -183,7 +183,7 @@ c|py [#cpy1]_ yes      yes               "+++"     Try C code. If none exis
 c|py_nogc     no       yes               "++"      As c|py, but without gc
 c             no       yes               "+"       Use only C code (if none available for an op, raise an error)
 py            yes      yes               "+++"     Use only Python code
-NanGuardMode  no       no                "++++"    Check if nodes generate NaN
+NanGuardMode  yes      yes               "++++"    Check if nodes generate NaN
 DebugMode     no       yes               VERY HIGH Make many checks on what Theano computes
 ============= ========= ================= ========= ===
doc/tutorial/using_gpu.txt
(diff collapsed; not shown)
doc/tutorial/using_gpu_solution_1.py
(diff collapsed; not shown)
doc/tutorial/using_multi_gpu.txt
@@ -81,7 +81,7 @@ single name and a single device.
 It is often the case that multi-gpu operation requires or assumes
 that all the GPUs involved are equivalent. This is not the case
 for this implementation. Since the user has the task of
-distrubuting the jobs across the different device a model can be
+distributing the jobs across the different device a model can be
 built on the assumption that one of the GPU is slower or has
 smaller memory.
@@ -140,5 +140,5 @@ is a example.
     cv = gv.transfer('cpu')
 
 Of course you can mix transfers and operations in any order you
-choose.
-However you should try to minimize transfer operations
-because they will introduce overhead any may reduce performance.
+choose. However you should try to minimize transfer operations
+because they will introduce overhead that may reduce performance.
theano/compile/nanguardmode.py
@@ -73,7 +73,7 @@ def contains_nan(arr, node=None):
     elif arr.size == 0:
         return False
     elif cuda.cuda_available and isinstance(arr, cuda.CudaNdarray):
-        if (hasattr(theano.sandbox, 'rng_mrg') and
+        if (node and hasattr(theano.sandbox, 'rng_mrg') and
                 isinstance(node.op,
                            # It store ints in float container
@@ -119,7 +119,7 @@ def contains_inf(arr, node=None):
     elif arr.size == 0:
         return False
     elif cuda.cuda_available and isinstance(arr, cuda.CudaNdarray):
-        if (hasattr(theano.sandbox, 'rng_mrg') and
+        if (node and hasattr(theano.sandbox, 'rng_mrg') and
                 isinstance(node.op,
                            # It store ints in float container
@@ -215,7 +215,7 @@ class NanGuardMode(Mode):
         assert nan_is_error or inf_is_error or big_is_error
         compile_gpu_func(nan_is_error, inf_is_error, big_is_error)
 
-        def do_check_on(var, nd, f, is_input):
+        def do_check_on(var, nd):
             """
             Checks `var` for NaNs / Infs. If detected, raises an exception
             and / or prints information about `nd`, `f`, and `is_input` to
@@ -227,11 +227,6 @@ class NanGuardMode(Mode):
                 The value to be checked.
             nd : theano.gof.Apply
                 The Apply node being executed.
-            f : callable
-                The thunk for the apply node.
-            is_input : bool
-                If True, `var` is an input to `nd`.
-                If False, it is an output.
 
             """
             error = False
@@ -262,17 +257,13 @@ class NanGuardMode(Mode):
                 print('Big value detected', file=sio)
                 error = True
             if error:
-                if not is_input:
-                    print("NanGuardMode found an error in the"
-                          " output of a node in this variable:", file=sio)
+                if nd:
+                    print("NanGuardMode found an error in the "
+                          "output of a node in this variable:", file=sio)
                     print(theano.printing.debugprint(nd, file='str'), file=sio)
                 else:
-                    print("NanGuardMode found an error in an"
-                          " input of this node.", file=sio)
-                    print('Node:', file=sio)
-                    print(nd, file=sio)
-                    print("The input variable that cause problem:", file=sio)
-                    print(theano.printing.debugprint(nd, file='str'), file=sio)
+                    print("NanGuardMode found an error in an input of the "
+                          "graph.", file=sio)
                 msg = sio.getvalue()
                 if config.NanGuardMode.action == 'raise':
                     raise AssertionError(msg)
@@ -283,36 +274,16 @@ class NanGuardMode(Mode):
             elif config.NanGuardMode.action == 'warn':
                 logger.error(msg)
 
-        def nan_check(i, node, fn):
-            """
-            Runs `fn` while checking its inputs and outputs for NaNs / Infs.
-
-            Parameters
-            ----------
-            i :
-                Currently ignored.
-                TODO: determine why it is here or remove).
-            node : theano.gof.Apply
-                The Apply node currently being executed.
-            fn : callable
-                The thunk to execute for this Apply node.
-
-            """
-            inputs = fn.inputs
-            for x, var in zip(inputs, node.inputs):
-                # If the input is the result of computation, then we
-                # don't need to check it. It is already done after the
-                # computation.
-                if (var.owner is None and
-                        getattr(var.tag, 'nan_guard_mode_check', True)):
-                    do_check_on(x[0], node, fn, True)
-            fn()
-            outputs = fn.outputs
-            for x, var in zip(outputs, node.outputs):
+        def nan_check(node, thunk, storage_map, compute_map):
+            for var in node.outputs:
                 if getattr(var.tag, 'nan_guard_mode_check', True):
-                    do_check_on(x[0], node, fn, False)
+                    do_check_on(storage_map[var][0], node)
 
+        def nan_check_input(var, value):
+            if getattr(var.tag, 'nan_guard_mode_check', True):
+                do_check_on(value, None)
 
-        wrap_linker = theano.gof.WrapLinker([theano.gof.OpWiseCLinker()], nan_check)
+        wrap_linker = theano.gof.vm.VM_Linker(callback=nan_check,
+                                              callback_input=nan_check_input)
         super(NanGuardMode, self).__init__(wrap_linker,
                                            optimizer=self.provided_optimizer)
theano/compile/profiling.py
@@ -84,10 +84,15 @@ def _atexit_print_fn():
                     cum_attr[key] = val
 
             if cum.optimizer_profile and ps.optimizer_profile:
-                merge = cum.optimizer_profile[0].merge_profile(
-                    cum.optimizer_profile[1],
-                    ps.optimizer_profile[1])
-                cum.optimizer_profile = (cum.optimizer_profile[0], merge)
+                try:
+                    merge = cum.optimizer_profile[0].merge_profile(
+                        cum.optimizer_profile[1],
+                        ps.optimizer_profile[1])
+                    cum.optimizer_profile = (cum.optimizer_profile[0], merge)
+                except Exception as e:
+                    print("Got an exception while merging profile")
+                    print(e)
+                    cum.optimizer_profile = None
             else:
                 cum.optimizer_profile = None
theano/configdefaults.py
@@ -104,10 +104,9 @@ class DeviceParam(ConfigParam):
 AddConfigVar(
     'device',
-    ("Default device for computations. If gpu*, change the default to try "
-     "to move computation to it and to put shared variable of float32 "
-     "on it. Do not use upper case letters, only lower case even if "
-     "NVIDIA use capital letters."),
+    ("Default device for computations. If cuda* or opencl*, change the"
+     "default to try to move computation to the GPU. Do not use upper case"
+     "letters, only lower case even if NVIDIA uses capital letters."),
     DeviceParam('cpu', allow_override=False),
     in_c_key=False)
@@ -273,7 +272,8 @@ def safe_no_dnn_workmem_bwd(workmem):
     return True
 
 AddConfigVar('dnn.conv.workmem_bwd',
-             "This flag is deprecated; use dnn.conv.algo_bwd.",
+             "This flag is deprecated; use `dnn.conv.algo_bwd_filter` "
+             "and `dnn.conv.algo_bwd_data` instead.",
             ConfigParam('', allow_override=False,
                         filter=safe_no_dnn_workmem_bwd),
             in_c_key=False)
@@ -651,8 +651,8 @@ AddConfigVar('warn.ignore_bug_before',
             "bugs found after that version. "
             "Warning for specific bugs can be configured with specific "
             "[warn] flags."),
-            EnumStr('0.7', 'None', 'all', '0.3', '0.4', '0.4.1', '0.5',
-                    '0.7', '0.8',
+            EnumStr('0.7', 'None', 'all', '0.3', '0.4', '0.4.1', '0.5',
+                    '0.6', '0.7', '0.8', '0.8.1', '0.8.2',
                     allow_override=False),
             in_c_key=False)
theano/gof/link.py
@@ -165,6 +165,9 @@ def raise_with_op(node, thunk=None, exc_info=None, storage_map=None):
         detailed_err_msg += ("Inputs shapes: %s" % shapes +
                              "\nInputs strides: %s" % strides +
                              "\nInputs values: %s" % scalar_values)
+        if theano.config.exception_verbosity == 'high':
+            detailed_err_msg += "\nInputs type_num: %s" % str(
+                [getattr(getattr(i[0], 'dtype', ''), 'num', '') for i in thunk.inputs])
         if hasattr(node.op, '__input_name__'):
             detailed_err_msg += "\nInputs name: %s\n" % str(node.op.__input_name__)
theano/gof/opt.py
(diff collapsed; not shown)
theano/gof/optdb.py
@@ -244,16 +244,26 @@ class EquilibriumDB(DB):
         optimization application. This could result in less fgraph iterations,
         but this doesn't mean it will be faster globally.
+    tracks_on_change_inputs
+        If True, we will re-apply local opt on nodes whose inputs
+        changed during local optimization application. This could
+        result in less fgraph iterations, but this doesn't mean it
+        will be faster globally.
 
     Notes
     -----
     We can put LocalOptimizer and Optimizer as EquilibriumOptimizer
     suppor both.
 
+    It is probably not a good idea to have ignore_newtrees=False and
+    tracks_on_change_inputs=True
+
     """
 
-    def __init__(self, ignore_newtrees=True):
+    def __init__(self, ignore_newtrees=True,
+                 tracks_on_change_inputs=False):
         super(EquilibriumDB, self).__init__()
         self.ignore_newtrees = ignore_newtrees
+        self.tracks_on_change_inputs = tracks_on_change_inputs
         self.__final__ = {}
         self.__cleanup__ = {}
@@ -281,6 +291,7 @@ class EquilibriumDB(DB):
             opts,
             max_use_ratio=config.optdb.max_use_ratio,
             ignore_newtrees=self.ignore_newtrees,
+            tracks_on_change_inputs=self.tracks_on_change_inputs,
             failure_callback=opt.NavigatorOptimizer.warn_inplace,
             final_optimizers=final_opts,
             cleanup_optimizers=cleanup_opts)
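`EquilibriumDB` builds optimizers that re-apply local rewrites until a full pass changes nothing; the new `tracks_on_change_inputs` flag only widens which nodes get revisited. The fixpoint loop at the heart of that scheme can be sketched in plain Python (all names below are invented for illustration, not Theano's API):

```python
def run_to_equilibrium(value, rules, max_iters=100):
    """Apply rewrite rules repeatedly until no rule changes the value."""
    for _ in range(max_iters):
        new = value
        for rule in rules:
            new = rule(new)
        if new == value:  # equilibrium: one full pass changed nothing
            return value
        value = new
    raise RuntimeError("no equilibrium within max_iters")

# Toy string rewrites: fold "+0" away and collapse double negation.
rules = [lambda s: s.replace("+0", ""), lambda s: s.replace("--", "")]
print(run_to_equilibrium("--a+0+0", rules))  # a
```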
theano/gof/vm.py
@@ -332,7 +332,7 @@ class Stack(VM):
     def __init__(self, nodes, thunks, pre_call_clear,
                  storage_map, compute_map, fgraph, allow_gc,
-                 dependencies=None, callback=None):
+                 dependencies=None, callback=None, callback_input=None):
         super(Stack, self).__init__(nodes, thunks, pre_call_clear)
         self.allow_gc = allow_gc
@@ -345,6 +345,7 @@ class Stack(VM):
         self.compute_map = compute_map
         self.node_idx = node_idx = {}
         self.callback = callback
+        self.callback_input = callback_input
 
         ords = fgraph.orderings()
@@ -411,6 +412,8 @@ class Stack(VM):
         for k in self.storage_map:
             compute_map[k][0] = (k.owner is None)
+            if self.callback_input and compute_map[k][0]:
+                self.callback_input(k, self.storage_map[k][0])
 
         # apply_stack contains nodes
         if output_subset is not None:
@@ -684,6 +687,11 @@ class VM_Linker(link.LocalLinker):
         A callable object to call after each call to a thunk within
         the virtual machine. It will be called with four arguments called
         'node', 'thunk', 'storage_map', and 'compute_map'.
+    callback_input
+        A callable object to call on each input to the graph
+        (variables with no owner). This includes constants and shared
+        variables values. It will be called with two arguments:
+        'var', 'value'.
     lazy
         Useful only when use_cloop is False. When lazy is None, use the
         theano flag vm.lazy value. Then if we have a None (default) we auto
@@ -700,8 +708,8 @@ class VM_Linker(link.LocalLinker):
     """
 
     def __init__(self, allow_gc=None, use_cloop=False, callback=None,
-                 lazy=None, schedule=None, c_thunks=None,
-                 allow_partial_eval=None):
+                 callback_input=None, lazy=None, schedule=None,
+                 c_thunks=None, allow_partial_eval=None):
         # Note: if more parameters are added to __init__, make sure to forward
         # them in the "type(self)(...)" call in the "accept" method below.
         if allow_gc is None:
@@ -710,6 +718,7 @@ class VM_Linker(link.LocalLinker):
         self.allow_gc = allow_gc
         self.use_cloop = use_cloop
         self.callback = callback
+        self.callback_input = callback_input
         self.lazy = lazy
         self.c_thunks = c_thunks
         self.allow_partial_eval = allow_partial_eval
@@ -760,9 +769,11 @@ class VM_Linker(link.LocalLinker):
                 allow_gc=self.allow_gc,
                 use_cloop=self.use_cloop,
                 callback=self.callback,
+                callback_input=self.callback_input,
                 lazy=self.lazy,
                 schedule=self.schedule,
                 c_thunks=self.c_thunks,
                 allow_partial_eval=self.allow_partial_eval
             ).accept(fgraph, no_recycling)
         self.fgraph = fgraph
         self.no_recycling = no_recycling
@@ -829,16 +840,17 @@ class VM_Linker(link.LocalLinker):
         pre_call_clear = [storage_map[v] for v in self.no_recycling]
 
-        if (self.callback is not None or
+        if (self.callback is not None or self.callback_input is not None or
                 (config.profile and config.profile_memory) or
-                getattr(self, 'allow_partial_eval', False)):
-            if self.use_cloop and self.callback is not None:
+                self.allow_partial_eval):
+            if self.use_cloop and (self.callback is not None or
+                                   self.callback_input is not None):
                 logger.warn('CVM does not support callback, using Stack VM.')
             if self.use_cloop and config.profile_memory:
                 warnings.warn('CVM does not support memory profile, using Stack VM.')
-            if self.use_cloop and getattr(self, 'allow_partial_eval', False):
+            if self.use_cloop and self.allow_partial_eval:
                 warnings.warn('CVM does not support partial evaluation yet, '
                               'using Stack VM.')
@@ -849,7 +861,8 @@ class VM_Linker(link.LocalLinker):
                 storage_map, compute_map,
                 self.fgraph, self.allow_gc,
                 dependencies=deps,
-                callback=self.callback)
+                callback=self.callback,
+                callback_input=self.callback_input)
         elif self.use_cloop:
             # create a map from nodes to ints and vars to ints
             nodes_idx = {}
@@ -1046,7 +1059,7 @@ class VM_Linker(link.LocalLinker):
         if lazy is None:
             lazy = not all([(not th.lazy) for th in thunks])
         if not (lazy or (config.profile and config.profile_memory) or
-                self.use_cloop or self.callback):
+                self.use_cloop or self.callback or self.callback_input):
             for pair in itervalues(reallocated_info):
                 storage_map[pair[1]] = storage_map[pair[0]]
@@ -1088,3 +1101,7 @@ class VM_Linker(link.LocalLinker):
         self.__dict__.update(d)
         if not hasattr(self, 'c_thunks'):
             self.c_thunks = True
+        if not hasattr(self, 'allow_partial_eval'):
+            self.allow_partial_eval = None
+        if not hasattr(self, 'callback_input'):
+            self.callback_input = None
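The new `callback_input` hook complements the existing per-thunk `callback`: it fires once for each graph input (a variable with no owner) before thunks run, which is what lets `NanGuardMode` check constants and shared values without wrapping every node. A toy model of that dispatch order, with invented class and variable names rather than Theano's real VM classes:

```python
class ToyVM:
    """Minimal stand-in for the Stack VM's callback wiring (illustrative only)."""
    def __init__(self, inputs, thunks, callback=None, callback_input=None):
        self.inputs = inputs          # (name, value) pairs: variables with no owner
        self.thunks = thunks          # callables standing in for node thunks
        self.callback = callback
        self.callback_input = callback_input

    def __call__(self):
        # Graph inputs are reported first, mirroring the new hook in Stack.
        if self.callback_input:
            for var, value in self.inputs:
                self.callback_input(var, value)
        for thunk in self.thunks:
            thunk()
            if self.callback:
                self.callback(thunk)

events = []
vm = ToyVM(inputs=[("x", 1.0)],
           thunks=[lambda: events.append("ran thunk")],
           callback=lambda th: events.append("callback"),
           callback_input=lambda var, val: events.append("input %s=%s" % (var, val)))
vm()
print(events)  # ['input x=1.0', 'ran thunk', 'callback']
```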
theano/gpuarray/__init__.py
@@ -42,7 +42,7 @@ register_transfer(transfer)
 def init_dev(dev, name=None):
     v = pygpu.gpuarray.api_version()
-    expected = -9998
+    expected = -9997
     if v[0] != expected:
         raise RuntimeError("Wrong major API version for gpuarray:", v[0],
                            "Make sure Theano and libgpuarray/pygpu "
@@ -50,6 +50,15 @@ def init_dev(dev, name=None):
     if v[1] < 0:
         raise RuntimeError("Wrong minor API version for gpuarray:", v[1],
                            "Please update libgpuarray/pygpu.")
+    if len(v) < 3:
+        vpy = -1
+    else:
+        vpy = v[2]
+    vpye = 0
+    if vpy < vpye:
+        print("Wrong python API version for gpuarray:", vpy,
+              "expected:", vpye,
+              "Some python ops may not work correctly and/or crash. "
+              "Consider updating pygpu.", file=sys.stderr)
     global pygpu_activated
     if dev not in init_dev.devmap:
         ctx = pygpu.init(dev,
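`init_dev` now also inspects a third slot of the version tuple (the Python-side API version), warning instead of raising so older pygpu builds keep working. The shape of that check can be sketched independently of pygpu (the function name and return convention below are illustrative, not the module's API):

```python
def check_python_api(v, expected=0):
    """Return a warning string if the python-API slot of version tuple `v`
    is older than `expected`, else None. Mirrors the new len(v) < 3 guard."""
    vpy = v[2] if len(v) >= 3 else -1   # old tuples have no third slot
    if vpy < expected:
        return ("Wrong python API version for gpuarray: %d expected: %d"
                % (vpy, expected))
    return None

print(check_python_api((-9997, 0)))     # two-element tuple -> warning string
print(check_python_api((-9997, 0, 0)))  # matching version -> None
```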
theano/gpuarray/basic_ops.py
@@ -259,14 +259,14 @@ class GpuKernelBase(object):
         int types[%(numargs)u] = {%(types)s};
         const char *bcode = %(bvar)s;
         size_t sz = sizeof(%(bvar)s);
-        if (GpuKernel_init(&%(ovar)s, %(ctx)s->ops, %(ctx)s->ctx, 1, &bcode, &sz,
+        if (GpuKernel_init(&%(ovar)s, %(ctx)s->ctx, 1, &bcode, &sz,
                            "%(kname)s", %(numargs)u, types, GA_USE_BINARY, NULL)
             != GA_NO_ERROR) {
-          if ((err = GpuKernel_init(&%(ovar)s, %(ctx)s->ops, %(ctx)s->ctx, 1,
+          if ((err = GpuKernel_init(&%(ovar)s, %(ctx)s->ctx, 1,
                                     &%(cname)s, NULL, "%(kname)s", %(numargs)u,
                                     types, %(flags)s, NULL)) != GA_NO_ERROR) {
             PyErr_Format(PyExc_RuntimeError, "GpuKernel_init error %%d: %%s",
-                         err, Gpu_error(%(ctx)s->ops, %(ctx)s->ctx, err));
+                         err, gpucontext_error(%(ctx)s->ctx, err));
             %(fail)s
           }
         }
@@ -310,7 +310,7 @@ class GpuKernelBase(object):
             The node that we need the cache version for.
 
         """
-        return (3, self.get_params(node).bin_id)
+        return (4, self.get_params(node).bin_id)
 
 
 class HostFromGpu(Op):
@@ -529,15 +529,22 @@ class GpuToGpu(Op):
     def c_code(self, node, name, inputs, outputs, sub):
         return """
         Py_XDECREF(%(out)s);
-        %(out)s = pygpu_transfer(%(inp)s, %(ctx)s, 0);
+        %(out)s = pygpu_empty(%(inp)s->ga.nd, %(inp)s->ga.dimensions,
+                              %(inp)s->ga.typecode,
+                              GpuArray_IS_C_CONTIGUOUS(&(%(inp)s->ga)) ? GA_C_ORDER : GA_F_ORDER,
+                              %(ctx)s, Py_None);
         if (%(out)s == NULL) {
             %(fail)s
         }
+        if (pygpu_transfer(%(out)s, %(inp)s)) {
+            %(fail)s
+        }
         """ % {'inp': inputs[0], 'ctx': sub['params'],
                'out': outputs[0], 'fail': sub['fail']}
 
     def c_code_cache_version(self):
-        return (0,)
+        return (1,)
 
 
 class GpuAlloc(HideC, Alloc):
theano/gpuarray/blockgemv.c
@@ -24,16 +24,9 @@ int APPLY_SPECIFIC(blockgemv)(PyGpuArrayObject *o, PyGpuArrayObject *W,
   size_t *offW = NULL;
   size_t *offInp = NULL;
   size_t *offOut = NULL;
-  gpuarray_blas_ops *blas_ops;
   int err;

-  err = ctx->ops->property(ctx->ctx, NULL, NULL, GA_CTX_PROP_BLAS_OPS, &blas_ops);
-  if (err != GA_NO_ERROR) {
-    PyErr_SetString(PyExc_RuntimeError, "Can't get blas ops");
-    return -1;
-  }
-
-  err = blas_ops->setup(ctx->ctx);
+  err = gpublas_setup(ctx->ctx);
   if (err != GA_NO_ERROR) {
     PyErr_SetString(PyExc_RuntimeError, "Can't setup blas");
     return -1;
@@ -93,29 +86,29 @@ int APPLY_SPECIFIC(blockgemv)(PyGpuArrayObject *o, PyGpuArrayObject *W,
   }

   if (out->ga.typecode == GA_FLOAT) {
-    err = blas_ops->sgemvBatch(cb_fortran, transA,
+    err = gpublas_sgemvBatch(cb_fortran, transA,
                              PyGpuArray_DIMS(out)[2], PyGpuArray_DIMS(h)[2],
                              1, W_list, offW, lda, inp_list, offInp,
                              PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
                              1, out_list, offOut,
                              PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
                              PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0],
                              0);
   } else if (out->ga.typecode == GA_DOUBLE) {
-    err = blas_ops->dgemvBatch(cb_fortran, transA,
+    err = gpublas_dgemvBatch(cb_fortran, transA,
                              PyGpuArray_DIMS(out)[2], PyGpuArray_DIMS(h)[2],
                              1, W_list, offW, lda, inp_list, offInp,
                              PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
                              1, out_list, offOut,
                              PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
                              PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0],
                              0);
   } else if (out->ga.typecode == GA_HALF) {
-    err = blas_ops->sgemvBatch(cb_fortran, transA,
+    err = gpublas_sgemvBatch(cb_fortran, transA,
                              PyGpuArray_DIMS(out)[2], PyGpuArray_DIMS(h)[2],
                              1, W_list, offW, lda, inp_list, offInp,
                              PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
                              1, out_list, offOut,
                              PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
                              PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0],
                              0);
   } else {
     err = GA_INVALID_ERROR;
   }
theano/gpuarray/blockger.c
@@ -12,16 +12,9 @@ int APPLY_SPECIFIC(blockger)(PyGpuArrayObject *o, PyGpuArrayObject *x,
   size_t *offOut = NULL;
   size_t *offX = NULL;
   size_t *offY = NULL;
-  gpuarray_blas_ops *blas_ops;
   int err;

-  err = ctx->ops->property(ctx->ctx, NULL, NULL, GA_CTX_PROP_BLAS_OPS, &blas_ops);
-  if (err != GA_NO_ERROR) {
-    PyErr_SetString(PyExc_RuntimeError, "Can't get blas ops");
-    return -1;
-  }
-
-  err = blas_ops->setup(ctx->ctx);
+  err = gpublas_setup(ctx->ctx);
   if (err != GA_NO_ERROR) {
     PyErr_SetString(PyExc_RuntimeError, "Can't setup blas");
     return -1;
@@ -84,26 +77,26 @@ int APPLY_SPECIFIC(blockger)(PyGpuArrayObject *o, PyGpuArrayObject *x,
   ssize_t str_out = PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode);

   if (out->ga.typecode == GA_FLOAT) {
-    err = blas_ops->sgerBatch(cb_fortran,
+    err = gpublas_sgerBatch(cb_fortran,
                             PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
                             *(float *)PyArray_GETPTR1(alpha, 0),
                             y_list, offY, str_y, x_list, offX, str_x,
                             o_list, offOut, str_out,
                             PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1],
                             0);
   } else if (out->ga.typecode == GA_DOUBLE) {
-    err = blas_ops->dgerBatch(cb_fortran,
+    err = gpublas_dgerBatch(cb_fortran,
                             PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
                             *(double *)PyArray_GETPTR1(alpha, 0),
                             y_list, offY, str_y, x_list, offX, str_x,
                             o_list, offOut, str_out,
                             PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1],
                             0);
   } else if (out->ga.typecode == GA_HALF) {
-    err = blas_ops->hgerBatch(cb_fortran,
+    err = gpublas_hgerBatch(cb_fortran,
                             PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
                             *(float *)PyArray_GETPTR1(alpha, 0),
                             y_list, offY, str_y, x_list, offX, str_x,
                             o_list, offOut, str_out,
                             PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1],
                             0);
   } else {
     err = GA_INVALID_ERROR;
   }
theano/gpuarray/dnn.py
@@ -125,7 +125,7 @@ def dnn_available(context_name):
     ctx = get_context(context_name)
-    if not ctx.kind == 'cuda':
+    if not ctx.kind == b'cuda':
         dnn_available.msg = "Not on a CUDA device."
         return False
@@ -1493,7 +1493,7 @@ def local_dnn_convi_output_merge(node, *inputs):
     return [GpuDnnConvGradI(algo=node.op.algo)(*inputs)]

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([Pool])
 def local_pool_dnn_alternative(node, ctx_name):
     if not dnn_available(ctx_name):
@@ -1509,7 +1509,7 @@ def local_pool_dnn_alternative(node, ctx_name):
     return dnn_pool(gpu_contiguous(img), ds, stride=stride, pad=pad, mode=mode)

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([MaxPoolGrad])
 def local_pool_dnn_grad_stride(node, ctx_name):
     if not dnn_available(ctx_name):
@@ -1533,7 +1533,7 @@ def local_pool_dnn_grad_stride(node, ctx_name):
                               pad)

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([AveragePoolGrad])
 def local_avg_pool_dnn_grad_stride(node, ctx_name):
     if not dnn_available(ctx_name):
@@ -1556,7 +1556,7 @@ def local_avg_pool_dnn_grad_stride(node, ctx_name):
     return GpuDnnPoolGrad(mode=mode)(gpu_contiguous(inp), cg, cg, ds, st, pad)

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @local_optimizer([GpuSoftmax])
 def local_softmax_dnn(node):
     if isinstance(node.op, GpuSoftmax):
@@ -1569,7 +1569,7 @@ def local_softmax_dnn(node):
         return [out]

-@register_opt('cudnn')
+@register_opt('cudnn', 'stabilize')
 @local_optimizer([GpuElemwise])
 def local_log_softmax_dnn(node):
     # This looks for GpuDnnSoftmax so we know that we have cudnn.
@@ -1586,7 +1586,7 @@ def local_log_softmax_dnn(node):
     return [new_softmax(softmax_node.inputs[0])]

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([LogSoftmax])
 def local_logsoftmax_to_dnn(node, ctx_name):
     # Transform the input in the format expected by GpuDnnSoftmax
@@ -1624,7 +1624,7 @@ class NoCuDNNRaise(Optimizer):
 gpu_seqopt.register("NoCuDNNRaise", NoCuDNNRaise(), 0, 'cudnn')

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([SoftmaxGrad])
 def local_softmax_dnn_grad(node, ctx_name):
     if not dnn_available(ctx_name):
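Most of the Python-side hunks in this commit change context-kind comparisons from `'cuda'` to `b'cuda'`. A minimal plain-Python sketch (not Theano code; `kind` here stands in for what a C-backed context object returns) of why that matters: on Python 3, comparing `bytes` to `str` is always `False` and raises no error, so the old string literal could never match.

```python
# Plain-Python sketch of the str/bytes pitfall behind the 'cuda' -> b'cuda'
# change; `kind` stands in for a context kind returned as bytes.
kind = b'cuda'

assert (kind == 'cuda') is False       # silently False on Python 3, no error
assert (kind == b'cuda') is True       # the comparison the diff switches to
assert kind.decode('ascii') == 'cuda'  # alternative: decode once at the boundary
```

Comparing against a bytes literal everywhere (as the diff does) avoids decoding at every call site.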
theano/gpuarray/dnn_fwd.c
@@ -105,7 +105,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
     algo = choice.algo;
 #else
     size_t free;
-    int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
+    int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);

     if (err2 != GA_NO_ERROR) {
       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
@@ -234,7 +234,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
    * to place a nice get_work_mem() function in.
    */
   if (worksize != 0) {
-    workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
+    workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
     if (workspace == NULL) {
       PyErr_SetString(PyExc_RuntimeError, "Could not allocate working memory");
@@ -258,7 +258,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
                            APPLY_SPECIFIC(output), PyGpuArray_DEV_DATA(*output));

   if (worksize != 0)
-    c->ops->buffer_release(workspace);
+    gpudata_release(workspace);

   cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
   cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
theano/gpuarray/dnn_gi.c
@@ -106,7 +106,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
     algo = choice.algo;
 #else
     size_t free;
-    int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
+    int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);

     if (err2 != GA_NO_ERROR) {
       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
@@ -204,7 +204,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
   }

   if (worksize != 0) {
-    workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
+    workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
     if (workspace == NULL) {
       PyErr_SetString(PyExc_RuntimeError, "Could not allocate working memory");
@@ -227,7 +227,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
                            APPLY_SPECIFIC(input), PyGpuArray_DEV_DATA(*input));

   if (worksize != 0)
-    c->ops->buffer_release(workspace);
+    gpudata_release(workspace);

   cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
   cuda_record(output->ga.data, GPUARRAY_CUDA_WAIT_READ);
theano/gpuarray/dnn_gw.c
@@ -107,7 +107,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
     algo = choice.algo;
 #else
     size_t free;
-    int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
+    int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);

     if (err2 != GA_NO_ERROR) {
       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
@@ -192,7 +192,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
   }

   if (worksize != 0) {
-    workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
+    workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
     if (workspace == NULL) {
       PyErr_SetString(PyExc_RuntimeError, "Could not allocate working memory");
       cuda_exit(c->ctx);
@@ -214,7 +214,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
                            APPLY_SPECIFIC(kerns), PyGpuArray_DEV_DATA(*kerns));

   if (worksize != 0)
-    c->ops->buffer_release(workspace);
+    gpudata_release(workspace);

   cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
   cuda_record(output->ga.data, GPUARRAY_CUDA_WAIT_READ);
theano/gpuarray/elemwise.py
@@ -199,7 +199,7 @@ class GpuElemwise(HideC, Elemwise):
                 typecode=o.type.typecode)
         res += """
-        ge = GpuElemwise_new(%(ctx)s->ops, %(ctx)s->ctx, %(support)s, %(kop)s, %(nargs)s, args, %(nd)s, 0);
+        ge = GpuElemwise_new(%(ctx)s->ctx, %(support)s, %(kop)s, %(nargs)s, args, %(nd)s, 0);
         if (ge == NULL) {
           PyErr_SetString(PyExc_RuntimeError, "Could not initialize elemwise support");
           %(fail)s
@@ -360,7 +360,7 @@ class GpuElemwise(HideC, Elemwise):
     def c_code_cache_version(self):
         ver = self.scalar_op.c_code_cache_version()
         if ver:
-            return (6, ver)
+            return (7, ver)
         else:
             return ver
@@ -554,7 +554,7 @@ class GpuCAReduceCuda(GpuKernelBase, HideC, CAReduceDtype):
     def make_node(self, x):
         x = as_gpuarray_variable(x, infer_context_name(x))
-        if x.type.context.kind != 'cuda':
+        if x.type.context.kind != b'cuda':
             raise TypeError("GpuCAReduceCuda doesn't work for non-cuda devices")
         ret = super(GpuCAReduceCuda, self).make_node(x)
         self = copy.copy(self)
theano/gpuarray/extra_ops.py
@@ -26,11 +26,8 @@ class GpuCumsum(GpuKernelBase, Op):
     def __init__(self, axis):
         self.axis = axis

     def __str__(self):
         return "%s{%s}" % (self.__class__.__name__, self.axis)

-    def c_code_cache_version_apply(self, node):
-        return (1,)
+    def c_code_cache_version(self):
+        return (3,)

     def c_headers(self):
         return ['<numpy_compat.h>', '<gpuarray/types.h>', '<gpuarray_helper.h>']
@@ -221,7 +218,7 @@ class GpuCumsum(GpuKernelBase, Op):
         return kernels

     def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
             raise NotImplementedError("cuda only")
         x, = inp
         z, = out
@@ -249,17 +246,17 @@ class GpuCumsum(GpuKernelBase, Op):
             size_t max_grid_size1;
             size_t max_grid_size2;
             int err;
-            err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXLSIZE0, &max_threads_dim0);
+            err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXLSIZE0, &max_threads_dim0);
             if (err != GA_NO_ERROR){
                 PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_threads_dims0");
                 %(fail)s;
             }
-            err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXGSIZE1, &max_grid_size1);
+            err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXGSIZE1, &max_grid_size1);
             if (err != GA_NO_ERROR){
                 PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_grid_size1");
                 %(fail)s;
             }
-            err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXGSIZE2, &max_grid_size2);
+            err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXGSIZE2, &max_grid_size2);
             if (err != GA_NO_ERROR){
                 PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_grid_size2");
                 %(fail)s;
theano/gpuarray/gemm16.c
@@ -117,7 +117,7 @@ int gemm16(PyGpuArrayObject *C, float alpha,
   if (48 < n128 && n128 <= 64) {
     n64 = n / 64;

   if (nprocs == 0)
-    if (A->ga.ops->property(A->context->ctx, NULL, NULL,
+    if (gpucontext_property(A->context->ctx,
                             GA_CTX_PROP_NUMPROCS, &nprocs)) {
       nprocs = 0;
       res = 1;
theano/gpuarray/neighbours.py
@@ -243,7 +243,7 @@ class GpuImages2Neibs(GpuKernelBase, Images2Neibs, Op):
         return kernels

     def c_code(self, node, name, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
             raise NotImplementedError("cuda only")
         dtype_ten4 = node.inputs[0].dtype
         dtype_neib_shape = node.inputs[1].dtype
theano/gpuarray/nerv.py
@@ -105,7 +105,7 @@ class Gemm16(COp):
         return """
     bcode = bin_%(name)s;
     sz = sizeof(bin_%(name)s);
-    if (GpuKernel_init(&k_%(name)s, c->ops, c->ctx, 1, &bcode, &sz,
+    if (GpuKernel_init(&k_%(name)s, c->ctx, 1, &bcode, &sz,
                        "hgemm_%(name)s", 13, types, GA_USE_BINARY, NULL)
         != GA_NO_ERROR) {
       PyErr_SetString(PyExc_RuntimeError, "Could not initialize kernel %(name)s");
theano/gpuarray/nnet.py
@@ -189,7 +189,7 @@ class GpuCrossentropySoftmaxArgmax1HotWithBias(GpuKernelBase, Op):
                 flags=flags, objvar=k_var)]

     def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
             raise NotImplementedError('cuda only')
         typecode_x = pygpu.gpuarray.dtype_to_typecode(node.inputs[0].dtype)
         typecode_b = pygpu.gpuarray.dtype_to_typecode(node.inputs[1].dtype)
@@ -375,7 +375,7 @@ class GpuCrossentropySoftmax1HotWithBiasDx(GpuKernelBase, Op):
         return ['<numpy_compat.h>', '<gpuarray/types.h>']

     def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
             raise NotImplementedError("cuda only")
         typecode_dx = pygpu.gpuarray.dtype_to_typecode(node.outputs[0].dtype)
         itemsize_dnll = numpy.dtype(node.inputs[0].dtype).itemsize
@@ -584,7 +584,7 @@ class GpuSoftmax(GpuKernelBase, Op):
         return ['<numpy_compat.h>', '<gpuarray/types.h>']

     def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
             raise NotImplementedError("cuda only")
         dtype_x = node.inputs[0].dtype
         work_x = work_dtype(dtype_x)
@@ -783,7 +783,7 @@ class GpuSoftmaxWithBias(GpuKernelBase, Op):
         return ['<numpy_compat.h>', '<gpuarray/types.h>']

     def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
             raise NotImplementedError('cuda only')
         dtype_x = node.inputs[0].dtype
         dtype_b = node.inputs[1].dtype
theano/gpuarray/opt.py
@@ -33,12 +33,16 @@ from .basic_ops import (as_gpuarray_variable, infer_context_name,
                         GpuSplit, GpuContiguous, gpu_contiguous,
                         GpuAlloc, GpuAllocEmpty, GpuReshape,
                         GpuEye, gpu_join, GpuJoin)
-from .blas import (gpu_dot22, GpuGemv, GpuGemm, GpuGer, GpuGemmBatch,
-                   gpugemm_no_inplace, gpugemmbatch_no_inplace)
-from .blocksparse import GpuSparseBlockGemv, GpuSparseBlockOuter
-from .nnet import (GpuCrossentropySoftmaxArgmax1HotWithBias,
-                   GpuCrossentropySoftmax1HotWithBiasDx,
-                   GpuSoftmaxWithBias, GpuSoftmax)
+from .blas import (gpu_dot22, GpuGemm, GpuGer, GpuGemmBatch,
+                   gpugemm_no_inplace, gpugemm_inplace,
+                   gpugemmbatch_no_inplace,
+                   gpugemv_no_inplace, gpugemv_inplace)
+from .blocksparse import (GpuSparseBlockGemv, GpuSparseBlockOuter,
+                          gpu_sparse_block_outer,
+                          gpu_sparse_block_outer_inplace,
+                          gpu_sparse_block_gemv,
+                          gpu_sparse_block_gemv_inplace)
+from .nnet import (gpu_crossentropy_softmax_1hot_with_bias_dx,
+                   gpu_crossentropy_softmax_argmax_1hot_with_bias,
+                   gpu_softmax_with_bias, gpu_softmax)
 from .elemwise import (GpuElemwise, GpuDimShuffle, GpuCAReduceCuda,
                        GpuCAReduceCPY)
 from .subtensor import (GpuIncSubtensor, GpuSubtensor,
@@ -49,6 +53,7 @@ from .opt_util import alpha_merge, output_merge
 _logger = logging.getLogger("theano.gpuarray.opt")

 gpu_optimizer = EquilibriumDB()
 gpu_cut_copies = EquilibriumDB()
@@ -146,7 +151,7 @@ def op_lifter(OP, cuda_only=False):
             # Check if we should replace
             if (not replace or
                     (cuda_only and
-                     get_context(context_name).kind != 'cuda')):
+                     get_context(context_name).kind != b'cuda')):
                 return False

             # tag the inputs with the context in case
@@ -643,7 +648,7 @@ def local_gpua_advanced_subtensor(node, context_name):
 def local_gpua_advanced_incsubtensor(node, context_name):
     context = get_context(context_name)
     # This is disabled on non-cuda contexts
-    if context.kind != 'cuda':
+    if context.kind != b'cuda':
         return None

     x, y, ilist = node.inputs
@@ -674,12 +679,12 @@ def local_gpua_careduce(node, context_name):
     if isinstance(node.op.scalar_op, (scalar.Add, scalar.Mul,
                                       scalar.Maximum, scalar.Minimum)):
         ctx = get_context(context_name)
-        if ctx.kind == 'opencl':
+        if ctx.kind == b'opencl':
             op = GpuCAReduceCPY
             if node.op.scalar_op not in [scalar.add, scalar.mul]:
                 # We don't support yet all reduction with cpy code.
                 return
-        elif ctx.kind == 'cuda':
+        elif ctx.kind == b'cuda':
             op = GpuCAReduceCuda
         else:
             return False
@@ -711,18 +716,14 @@ def local_gpua_careduce(node, context_name):
                 assert reduce_mask[a] == 0
                 reduce_mask[a] = 1
-            shape_of = node.fgraph.shape_feature.shape_of
-            x_shape = shape_of[x]
-            new_in_shp = [x_shape[0]]
+            new_in_shp = [shape_i(x, 0)]
             new_mask = [reduce_mask[0]]
             for i in xrange(1, x.type.ndim):
                 if reduce_mask[i] == reduce_mask[i - 1]:
-                    new_in_shp[-1] *= x_shape[i]
+                    new_in_shp[-1] *= shape_i(x, i)
                 else:
                     new_mask.append(reduce_mask[i])
-                    new_in_shp.append(x_shape[i])
+                    new_in_shp.append(shape_i(x, i))

             new_axis = []
             for idx, m in enumerate(new_mask):
                 if m == 1:
@@ -744,8 +745,12 @@ def local_gpua_careduce(node, context_name):
                 greduce(gpu_reshaped_x))
             if reduce_reshaped_x.ndim != node.outputs[0].ndim:
-                unreshaped_reduce = reduce_reshaped_x.reshape(
-                    tensor.stack(shape_of[node.outputs[0]]))
+                out_shp = []
+                for i in range(x.ndim):
+                    if i not in node.op.axis:
+                        out_shp.append(shape_i(x, i))
+                unreshaped_reduce = reduce_reshaped_x.reshape(
+                    tensor.stack(out_shp))
             else:
                 unreshaped_reduce = reduce_reshaped_x

             return [unreshaped_reduce]
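The `local_gpua_careduce` hunk above collapses consecutive dimensions that share the same reduce flag before building the GPU reduction. A standalone sketch of that loop (a hypothetical helper, not part of Theano) with concrete integers in place of the symbolic `shape_i` values:

```python
def collapse_dims(shape, reduce_mask):
    """Merge consecutive dims with an equal reduce flag, mirroring the
    new_in_shp/new_mask loop in local_gpua_careduce."""
    new_shape = [shape[0]]
    new_mask = [reduce_mask[0]]
    for i in range(1, len(shape)):
        if reduce_mask[i] == reduce_mask[i - 1]:
            new_shape[-1] *= shape[i]  # fuse into the previous dim
        else:
            new_mask.append(reduce_mask[i])
            new_shape.append(shape[i])
    return new_shape, new_mask

# reducing axes 0 and 1 of a 4-d tensor collapses it to a 2-d reduction
print(collapse_dims([2, 3, 4, 5], [1, 1, 0, 0]))  # -> ([6, 20], [1, 0])
```

Fewer dimensions means the reduction kernel has fewer strides to track, which is why the optimizer does this before lifting to the GPU op.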
@@ -754,13 +759,19 @@ def local_gpua_careduce(node, context_name):
 @register_opt('fast_compile')
 @op_lifter([tensor.blas.Gemv, tensor.blas_c.CGemv])
 def local_gpua_gemv(node, context_name):
-    return GpuGemv(inplace=node.op.inplace)
+    if node.op.inplace:
+        return gpugemv_inplace
+    else:
+        return gpugemv_no_inplace

 @register_opt('fast_compile')
 @op_lifter([tensor.blas.Gemm])
 def local_gpua_gemm(node, context_name):
-    return GpuGemm(inplace=node.op.inplace)
+    if node.op.inplace:
+        return gpugemm_inplace
+    else:
+        return gpugemm_no_inplace

 @register_opt('fast_compile')
@@ -834,7 +845,7 @@ def local_gpua_dot22scalar(node, context_name):
     x = as_gpuarray_variable(x, context_name)
     y = as_gpuarray_variable(y, context_name)
     z = GpuAllocEmpty(x.dtype, context_name)(x.shape[0], y.shape[1])
-    return [GpuGemm(inplace=False)(z, a, x, y, 0)]
+    return [gpugemm_no_inplace(z, a, x, y, 0)]

 @register_opt('fast_compile')
@@ -846,25 +857,25 @@ def local_gpua_eye(node, context_name):
 @register_opt('fast_compile')
 @op_lifter([tensor.nnet.CrossentropySoftmaxArgmax1HotWithBias], cuda_only=True)
 def local_gpua_crossentropysoftmaxargmax1hotwithbias(node, context_name):
-    return GpuCrossentropySoftmaxArgmax1HotWithBias()
+    return gpu_crossentropy_softmax_argmax_1hot_with_bias

 @register_opt('fast_compile')
 @op_lifter([tensor.nnet.CrossentropySoftmax1HotWithBiasDx], cuda_only=True)
 def local_gpua_crossentropysoftmax1hotwithbiasdx(node, context_name):
-    return GpuCrossentropySoftmax1HotWithBiasDx()
+    return gpu_crossentropy_softmax_1hot_with_bias_dx

 @register_opt('fast_compile')
 @op_lifter([tensor.nnet.Softmax], cuda_only=True)
 def local_gpua_softmax(node, context_name):
-    return GpuSoftmax()
+    return gpu_softmax

 @register_opt('fast_compile')
 @op_lifter([tensor.nnet.SoftmaxWithBias], cuda_only=True)
 def local_gpua_softmaxwithbias(node, context_name):
-    return GpuSoftmaxWithBias()
+    return gpu_softmax_with_bias

 @register_opt('fast_compile')
@@ -889,20 +900,26 @@ theano.tensor.nnet.conv2d()
 @register_opt('fast_compile')
 @op_lifter([SparseBlockGemv])
 def local_lift_sparseblockgemv(node, context_name):
-    return GpuSparseBlockGemv(node.op.inplace)
+    if node.op.inplace:
+        return gpu_sparse_block_gemv_inplace
+    else:
+        return gpu_sparse_block_gemv

 @register_opt('fast_compile')
 @op_lifter([SparseBlockOuter])
 def local_lift_sparseblockouter(node, context_name):
-    return GpuSparseBlockOuter(node.op.inplace)
+    if node.op.inplace:
+        return gpu_sparse_block_outer_inplace
+    else:
+        return gpu_sparse_block_outer

 @register_inplace()
 @local_optimizer([GpuSparseBlockGemv], inplace=True)
 def local_inplace_sparseblockgemv(node):
     if isinstance(node.op, GpuSparseBlockGemv) and not node.op.inplace:
-        return [GpuSparseBlockGemv(inplace=True)(*node.inputs)]
+        return [gpu_sparse_block_gemv_inplace(*node.inputs)]

 @register_inplace()
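Several hunks in `opt.py` replace per-call construction such as `GpuGemv(inplace=...)` with module-level instances like `gpugemv_inplace`. A small sketch of that pattern (the class below is a stand-in for illustration, not the real op class):

```python
class FakeGemv(object):
    """Stand-in for an Op class; the real GPU op is omitted here."""
    def __init__(self, inplace):
        self.inplace = inplace

# module-level singletons, as the diff introduces for the real ops
gemv_no_inplace = FakeGemv(inplace=False)
gemv_inplace = FakeGemv(inplace=True)

def lift_gemv(op_inplace):
    # returning a shared instance means repeated lifts yield the very same
    # object, avoiding re-construction and making ops comparable by identity
    return gemv_inplace if op_inplace else gemv_no_inplace

assert lift_gemv(True) is lift_gemv(True)
assert lift_gemv(False).inplace is False
```

The same dispatch shape appears in `local_lift_sparseblockgemv` and `local_lift_sparseblockouter` above.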
theano/gpuarray/subtensor.py
(diff collapsed; not shown)
theano/gpuarray/tests/test_basic_ops.py
@@ -18,7 +18,7 @@ from theano.tests import unittest_tools as utt
 from ..type import (GpuArrayType, get_context,
                     gpuarray_shared_constructor)
-from ..basic_ops import (host_from_gpu, HostFromGpu, GpuFromHost, GpuReshape,
+from ..basic_ops import (host_from_gpu, HostFromGpu, GpuFromHost, GpuReshape, GpuToGpu,
                          GpuAlloc, GpuAllocEmpty, GpuContiguous,
                          gpu_join, GpuJoin, GpuSplit, GpuEye, gpu_contiguous)
 from ..subtensor import GpuSubtensor
@@ -182,6 +182,21 @@ def test_transfer_cpu_gpu():
     assert numpy.all(fv == av)

+def test_transfer_gpu_gpu():
+    g = GpuArrayType(dtype='float32', broadcastable=(False, False),
+                     context_name=test_ctx_name)()
+
+    av = numpy.asarray(rng.rand(5, 4), dtype='float32')
+    gv = gpuarray.array(av, context=get_context(test_ctx_name))
+    mode = mode_with_gpu.excluding('cut_gpua_host_transfers',
+                                   'local_cut_gpua_host_gpua')
+    f = theano.function([g], GpuToGpu(test_ctx_name)(g), mode=mode)
+    topo = f.maker.fgraph.toposort()
+    assert len(topo) == 1
+    assert isinstance(topo[0].op, GpuToGpu)
+    fv = f(gv)
+    assert GpuArrayType.values_eq(fv, gv)
+
 def test_transfer_strided():
     # This is just to ensure that it works in theano
     # libgpuarray has a much more comprehensive suit of tests to
theano/gpuarray/tests/test_elemwise.py
@@ -197,7 +197,7 @@ class test_GpuCAReduceCuda(test_GpuCAReduceCPY):
     def setUp(self):
         super(test_GpuCAReduceCuda, self).setUp()
-        if get_context(test_ctx_name).kind != 'cuda':
+        if get_context(test_ctx_name).kind != b'cuda':
             raise SkipTest("Cuda specific tests")
@@ -212,7 +212,7 @@ class T_gpureduce_dtype(test_elemwise.T_reduce_dtype):
                      'float32', 'float64']

     def setUp(self):
-        if get_context(test_ctx_name).kind != 'cuda':
+        if get_context(test_ctx_name).kind != b'cuda':
             raise SkipTest("Cuda specific tests")
theano/gpuarray/tests/test_extra_ops.py
@@ -24,7 +24,7 @@ class TestGpuCumsum(theano.tensor.tests.test_extra_ops.TestCumsumOp):
     def setUp(self):
         super(TestGpuCumsum, self).setUp()
         test_ctx = get_context(test_ctx_name)
-        if test_ctx.kind != 'cuda':
+        if test_ctx.kind != b'cuda':
             raise SkipTest("Cuda specific tests")
         self.max_threads_dim0 = test_ctx.maxlsize0
         self.max_grid_size1 = test_ctx.maxgsize2
theano/gpuarray/tests/test_opt.py
@@ -125,7 +125,7 @@ def test_reduce():
         topo = f.maker.fgraph.toposort()
         ops = [type(node.op) for node in topo]

-        if kind == 'opencl' and method in ["max", "min"]:
+        if kind == b'opencl' and method in ["max", "min"]:
             assert not (GpuCAReduceCuda in ops or GpuCAReduceCPY in ops)
         else:
             assert GpuCAReduceCuda in ops or GpuCAReduceCPY in ops
theano/gpuarray/tests/test_subtensor.py
@@ -56,3 +56,32 @@ def test_advinc_subtensor1():
     rep = xval.copy()
     rep[[0, 2]] += yval
     assert numpy.allclose(rval, rep)
+
+
+def test_incsub_f16():
+    shp = (3, 3)
+    shared = gpuarray_shared_constructor
+    xval = numpy.arange(numpy.prod(shp), dtype='float16').reshape(shp) + 1
+    yval = numpy.empty((2,) + shp[1:], dtype='float16')
+    yval[:] = 2
+    x = shared(xval, name='x')
+    y = tensor.tensor(dtype='float16',
+                      broadcastable=(False,) * len(shp),
+                      name='y')
+    expr = tensor.advanced_inc_subtensor1(x, y, [0, 2])
+    f = theano.function([y], expr, mode=mode_with_gpu)
+    assert sum([isinstance(node.op, GpuAdvancedIncSubtensor1)
+                for node in f.maker.fgraph.toposort()]) == 1
+    rval = f(yval)
+    rep = xval.copy()
+    rep[[0, 2]] += yval
+    assert numpy.allclose(rval, rep)
+
+    expr = tensor.inc_subtensor(x[1:], y)
+    f = theano.function([y], expr, mode=mode_with_gpu)
+    assert sum([isinstance(node.op, GpuIncSubtensor)
+                for node in f.maker.fgraph.toposort()]) == 1
+    rval = f(yval)
+    rep = xval.copy()
+    rep[1:] += yval
+    assert numpy.allclose(rval, rep)
theano/gpuarray/type.py
@@ -301,20 +301,14 @@ class GpuArrayType(Type):
             raise NotImplementedError(
                 "GpuArrayType.values_eq_approx() don't implemented the"
                 " allow_remove_inf and allow_remove_nan parameter")
-        if a.dtype == 'float16' or b.dtype == 'float16':
-            an = numpy.asarray(a)
-            bn = numpy.asarray(b)
-            return tensor.TensorType.values_eq_approx(
-                an, bn, allow_remove_inf=allow_remove_inf,
-                allow_remove_nan=allow_remove_nan,
-                rtol=rtol, atol=atol)
+        atol_, rtol_ = theano.tensor.basic._get_atol_rtol(a, b)
+        if rtol is not None:
+            rtol_ = rtol
+        if atol is not None:
+            atol_ = atol
         res = elemwise2(a, '', b, a, odtype=numpy.dtype('bool'),
-                        op_tmpl="res[i] = (fabs(%%(a)s - %%(b)s) <"
-                                "(%(atol_)s + %(rtol_)s * fabs(%%(b)s)))" %
+                        op_tmpl="res = (fabs(a - b) <"
+                                "(%(atol_)s + %(rtol_)s * fabs(b)))" %
                         locals())
         ret = numpy.asarray(res).all()
         if ret:
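The rewritten `op_tmpl` in `values_eq_approx` implements the usual elementwise closeness predicate `|a - b| < atol + rtol * |b|`. A NumPy sketch of the same test (values chosen for illustration, not taken from the code):

```python
import numpy as np

a = np.array([1.0, 1.00005, 2.0], dtype='float32')
b = np.array([1.0, 1.0,     2.5], dtype='float32')
atol, rtol = 1e-5, 1e-4

# same predicate as the generated kernel: res = fabs(a - b) < atol + rtol * fabs(b)
close = np.abs(a - b) < (atol + rtol * np.abs(b))
print(close)  # the 2.0 vs 2.5 entry exceeds the tolerance
```

Note the asymmetry: the tolerance scales with `|b|`, so the predicate is not symmetric in its two arguments, which mirrors how `numpy.allclose` behaves as well.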
theano/misc/check_blas.py  (mode changed 100755 → 100644)
@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000,
     t0 = 0
     t1 = -1

     f()  # Ignore first function call to get representative time.
     if execute:
         sync = (hasattr(theano, "sandbox") and
                 hasattr(theano.sandbox, "cuda") and
                 theano.sandbox.cuda.cuda_available)
+        sync2 = (hasattr(theano, "gpuarray") and
+                 theano.gpuarray.pygpu_activated)
         t0 = time.time()
         for i in range(iters):
             f()
         if sync:
             theano.sandbox.cuda.synchronize()
+        if sync2:
+            c.get_value(borrow=True, return_internal_type=True).sync()
         t1 = time.time()
     return t1 - t0, impl
@@ -244,6 +249,7 @@ if __name__ == "__main__":
         cuda version      7.5    7.0    6.5
         gpu
         M40               0.47s
+        k80               0.96s
         K6000/NOECC       0.69s
         K40               0.88s
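The `check_blas.py` hunk adds a `.sync()` call for the gpuarray backend before reading the clock, because GPU kernel launches return to Python before the device work actually finishes. A minimal sketch of the timing pattern (the `sync` callback is a stand-in for `synchronize()`/`.sync()`; names here are illustrative):

```python
import time

def timeit(f, iters, sync=lambda: None):
    f()  # ignore the first call, as execute() does, to skip one-time costs
    t0 = time.time()
    for _ in range(iters):
        f()
    sync()  # block until asynchronous device work completes before stopping the clock
    return time.time() - t0

elapsed = timeit(lambda: sum(range(1000)), iters=10)
assert elapsed >= 0.0
```

Without the `sync()` barrier, a benchmark would measure only kernel-launch overhead, not the actual computation.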
theano/sandbox/cuda/dnn.py
@@ -2526,7 +2526,8 @@ if True:
         out = as_cuda_ndarray_variable(out.dimshuffle(0, 1))
         return [out]

-@register_opt('cudnn')
+@register_opt('cudnn', 'stabilize', 'fast_compile')
+# We put fast_compile as otherwise it won't be on the GPU.
 @local_optimizer([GpuElemwise, LogSoftmax])
 def local_log_softmax_dnn(node):
     # The log-softmax implementation is only available starting at cuDNN V3
theano/sandbox/cuda/opt.py
...
@@ -14,6 +14,7 @@ from . import dnn
 import theano
 from theano import scalar as scal
 from theano import config, tensor, gof
+from theano.compile.ops import shape_i
 import theano.ifelse
 import theano.tensor.signal.pool
 import theano.tensor.nnet
...
@@ -900,18 +901,14 @@ def local_gpu_careduce(node):
                 # to make them a single dimension, do the reduction, and
                 # then reshape to get them back.
-                shape_of = node.fgraph.shape_feature.shape_of
-                x_shape = shape_of[x]
-                new_in_shp = [x_shape[0]]
+                new_in_shp = [shape_i(x, 0)]
                 new_mask = [reduce_mask[0]]
                 for i in xrange(1, x.type.ndim):
                     if reduce_mask[i] == reduce_mask[i - 1]:
-                        new_in_shp[-1] *= x_shape[i]
+                        new_in_shp[-1] *= shape_i(x, i)
                     else:
                         new_mask.append(reduce_mask[i])
-                        new_in_shp.append(x_shape[i])
+                        new_in_shp.append(shape_i(x, i))
                 new_greduce = GpuCAReduce(new_mask, scalar_op)
                 new_x = x.reshape(tensor.stack(new_in_shp))
...
@@ -936,8 +933,11 @@ def local_gpu_careduce(node):
                 # Restore the expected shape of the output
                 if rval.ndim != out.ndim:
-                    rval = rval.reshape(tensor.stack(shape_of[out]))
+                    out_shp = []
+                    for i in range(x.ndim):
+                        if i not in node.op.axis:
+                            out_shp.append(shape_i(x, i))
+                    rval = rval.reshape(tensor.stack(out_shp))
                 if rval.type == out.type:
                     return [rval]
...
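The `local_gpu_careduce` rewrite above merges contiguous reduced dimensions into one axis before launching the GPU reduction, then reshapes the result back. The underlying equivalence can be checked directly with NumPy (a small sketch, not Theano code):

```python
import numpy as np

# Reducing over the contiguous axes (1, 2) of a C-ordered 4-d array is the
# same as first merging those axes into one and reducing over that axis.
a = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
direct = a.sum(axis=(1, 2))                  # shape (2, 5)
collapsed = a.reshape(2, 3 * 4, 5).sum(axis=1)  # shape (2, 5)
print(np.array_equal(direct, collapsed))  # True
```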
theano/sandbox/gpuarray/__init__.py
...
@@ -4,6 +4,7 @@ which refered to theano.sandbox.gpuarray."""
 import warnings

 from theano.gpuarray import *

-message = "theano.sandbox.gpuarray has been moved to theano.gpuarray." + \
-    " Please update your code and pickles."
+message = ("theano.sandbox.gpuarray has been moved to theano.gpuarray. "
+           "Please update your code and pickles. If the warning persists, "
+           "clear theano's cache ('$theano/bin/theano-cache clear').")

 warnings.warn(message)
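This module is a compatibility shim: the old import path keeps working but emits a warning pointing at the new location. A generic sketch of the pattern (the `old_entry_point`/`new_entry_point` names are hypothetical, not Theano APIs):

```python
import warnings

def new_entry_point():
    return 42

def old_entry_point():
    # Compatibility wrapper: warn, then delegate to the new location.
    warnings.warn("old_entry_point has moved; please update your code.",
                  DeprecationWarning, stacklevel=2)
    return new_entry_point()

# Callers can verify the warning is emitted without breaking old code:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_entry_point()
print(result, len(caught))  # 42 1
```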
theano/scalar/basic.py
...
@@ -2543,7 +2543,7 @@ class Log2(UnaryScalarOp):
         else:
             return [x.zeros_like()]

-        return gz / (x * math.log(2.0)),
+        return gz / (x * numpy.asarray(math.log(2.0)).astype(x.dtype)),

     def c_code(self, node, name, inputs, outputs, sub):
         (x,) = inputs
...
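The `Log2` gradient fix above keeps the constant `log(2)` in the input's dtype: in a Theano graph, a bare Python float becomes a float64 constant and upcasts the whole float32 gradient. A NumPy sketch of the dtype-preserving form:

```python
import math
import numpy as np

x = np.ones(3, dtype=np.float32)
gz = np.ones(3, dtype=np.float32)

# Casting the 0-d constant to the input's dtype keeps the result float32.
const = np.asarray(math.log(2.0)).astype(x.dtype)
grad = gz / (x * const)
print(const.dtype, grad.dtype)  # float32 float32
```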
theano/scan_module/scan_opt.py
...
@@ -202,7 +202,7 @@ def remove_constants_and_unused_inputs_scan(node):
         # DEBUG CHECK
         nwScan = scan_op.Scan(nw_inner, op_outs, nw_info)
         nw_outs = nwScan(*nw_outer, **dict(return_list=True))
-        return dict([("remove", [node])] + list(zip(node.outputs, nw_outs)))
+        return OrderedDict([("remove", [node])] + list(zip(node.outputs, nw_outs)))
     else:
         return False
...
@@ -2072,8 +2072,8 @@ def scan_merge_inouts(node):
             new_outer_out_mit_mot.append(outer_omm)
         na.outer_out_mit_mot = new_outer_out_mit_mot
     if remove:
-        return dict([("remove", remove)] + list(zip(node.outputs, na.outer_outputs)))
+        return OrderedDict([("remove", remove)] + list(zip(node.outputs, na.outer_outputs)))
     return na.outer_outputs
...
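Switching the returned replacement mapping from `dict` to `OrderedDict` makes its iteration order deterministic (plain `dict` order was arbitrary before CPython 3.7), so graph rewrites are applied in a reproducible order. A minimal illustration:

```python
from collections import OrderedDict

# Same shape as the mapping built above: a "remove" entry followed by
# (old_output, new_output) pairs.
pairs = [("remove", ["node"]), ("out0", "new0"), ("out1", "new1")]
mapping = OrderedDict(pairs)

# Iteration follows insertion order, so downstream consumers always see
# the replacements in the same, stable order.
print(list(mapping))  # ['remove', 'out0', 'out1']
```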
theano/tensor/basic.py
...
@@ -612,14 +612,14 @@ def get_scalar_constant_value(orig_v, elemwise=True,
         return numpy.asarray(v)
     if isinstance(v, numpy.ndarray):
-        return numpy_scalar(v)
+        return numpy_scalar(v).copy()
     if isinstance(v, Constant):
         if getattr(v.tag, 'unique_value', None) is not None:
             data = v.tag.unique_value
         else:
             data = v.data
-        return numpy_scalar(data)
+        return numpy_scalar(data).copy()
     if not only_process_constants and getattr(v, 'owner', None):
         if isinstance(v.owner.op, (Alloc, DimShuffle, Rebroadcast,
...
@@ -649,7 +649,7 @@ def get_scalar_constant_value(orig_v, elemwise=True,
                      for i in v.owner.inputs]
             ret = [[None]]
             v.owner.op.perform(v.owner, const, ret)
-            return ret[0][0]
+            return ret[0][0].copy()
         elif elemwise and isinstance(v.owner.op, Elemwise):
             if isinstance(v.owner.op.scalar_op, scal.Second):
                 # We don't need both input to be constant for second
...
@@ -662,13 +662,13 @@ def get_scalar_constant_value(orig_v, elemwise=True,
                          for i in v.owner.inputs]
                 ret = [[None]]
                 v.owner.op.perform(v.owner, const, ret)
-                return ret[0][0]
+                return ret[0][0].copy()
         elif (isinstance(v.owner.op, theano.tensor.subtensor.Subtensor) and
               v.ndim == 0):
             if isinstance(v.owner.inputs[0], TensorConstant):
                 cdata = tuple(v.owner.op.get_constant_idx(v.owner.inputs))
                 try:
-                    return v.owner.inputs[0].data.__getitem__(cdata)
+                    return v.owner.inputs[0].data.__getitem__(cdata).copy()
                 except IndexError:
                     raise IndexError(
                         str(tuple(v.owner.op.idx_list)) +
...
@@ -1399,8 +1399,6 @@ class MaxAndArgmax(Op):
             %(axis_code)s
             %(max)s = (PyArrayObject*)PyArray_Max(%(x)s, axis, NULL);
             if(%(max)s == NULL){
-                PyErr_SetString(PyExc_ValueError,
-                                "MaxAndArgmax, max failed");
                 %(fail)s;
             }
             if(!PyArray_CheckExact(%(max)s)){
...
@@ -1412,7 +1410,6 @@ class MaxAndArgmax(Op):
             %(argmax)s = (PyArrayObject*)PyArray_ArgMax(%(x)s, axis, NULL);
             if(%(argmax)s == NULL){
-                PyErr_SetString(PyExc_ValueError, "MaxAndArgmax, argmax failed");
                 Py_CLEAR(%(max)s);
                 %(fail)s;
             }
...
@@ -1434,7 +1431,7 @@ class MaxAndArgmax(Op):
         return ret % locals()

     def c_code_cache_version(self):
-        return (3,)
+        return (4,)

     def infer_shape(self, node, shapes):
         ishape, axis_shape = shapes
...
theano/tensor/blas.py
...
@@ -152,6 +152,7 @@ from theano.tensor import basic as T
 from theano.tensor.blas_headers import blas_header_text
 from theano.tensor.blas_headers import blas_header_version
 from theano.tensor.opt import in2out, local_dimshuffle_lift
+from theano.tensor.type import values_eq_approx_remove_inf_nan

 _logger = logging.getLogger('theano.tensor.blas')
...
@@ -1435,7 +1436,8 @@ class GemmOptimizer(Optimizer):
             if new_node is not node:
                 nodelist.append(new_node)

-        u = theano.gof.opt.Updater(on_import, None, None)
+        u = theano.gof.opt.Updater(on_import, None, None,
+                                   name="GemmOptimizer")
         fgraph.attach_feature(u)
         while did_something:
             nb_iter += 1
...
@@ -1465,6 +1467,7 @@ class GemmOptimizer(Optimizer):
             if new_outputs:
                 new_outputs, old_dot22 = new_outputs
                 assert len(new_outputs) == len(node.outputs)
+                new_outputs[0].tag.values_eq_approx = values_eq_approx_remove_inf_nan
                 try:
                     fgraph.replace_all_validate_remove(
                         list(zip(node.outputs, new_outputs)),
...
theano/tensor/nlinalg.py
...
@@ -726,3 +726,62 @@ def norm(x, ord):
             raise ValueError(0)
     elif ndim > 2:
         raise NotImplementedError("We don't support norm witn ndim > 2")
+
+
+class TensorInv(Op):
+    """
+    Class wrapper for tensorinv() function;
+    Theano utilization of numpy.linalg.tensorinv;
+    """
+    _numop = staticmethod(numpy.linalg.tensorinv)
+    __props__ = ('ind',)
+
+    def __init__(self, ind=2):
+        self.ind = ind
+
+    def make_node(self, a):
+        a = as_tensor_variable(a)
+        out = a.type()
+        return Apply(self, [a], [out])
+
+    def perform(self, node, inputs, outputs):
+        (a,) = inputs
+        (x,) = outputs
+        x[0] = self._numop(a, self.ind)
+
+    def infer_shape(self, node, shapes):
+        sp = shapes[0][self.ind:] + shapes[0][:self.ind]
+        return [sp]
+
+
+def tensorinv(a, ind=2):
+    """
+    Does not run on GPU;
+    Theano utilization of numpy.linalg.tensorinv;
+
+    Compute the 'inverse' of an N-dimensional array.
+    The result is an inverse for `a` relative to the tensordot operation
+    ``tensordot(a, b, ind)``, i. e., up to floating-point accuracy,
+    ``tensordot(tensorinv(a), a, ind)`` is the "identity" tensor for the
+    tensordot operation.
+
+    Parameters
+    ----------
+    a : array_like
+        Tensor to 'invert'. Its shape must be 'square', i. e.,
+        ``prod(a.shape[:ind]) == prod(a.shape[ind:])``.
+    ind : int, optional
+        Number of first indices that are involved in the inverse sum.
+        Must be a positive integer, default is 2.
+
+    Returns
+    -------
+    b : ndarray
+        `a`'s tensordot inverse, shape ``a.shape[ind:] + a.shape[:ind]``.
+
+    Raises
+    ------
+    LinAlgError
+        If `a` is singular or not 'square' (in the above sense).
+
+    """
+    return TensorInv(ind)(a)
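The new `TensorInv` op simply defers to `numpy.linalg.tensorinv` in `perform`, and its `infer_shape` rule rotates the shape: `shapes[0][ind:] + shapes[0][:ind]`. A NumPy-only sketch of the behavior being wrapped:

```python
import numpy as np

# Invertible 4-d tensor: the identity on a 24-dimensional space, viewed as
# a map between the (4, 6) and (8, 3) index blocks (prod matches: 24 == 24).
a = np.eye(4 * 6).reshape(4, 6, 8, 3)
ainv = np.linalg.tensorinv(a, ind=2)

# Output shape follows the infer_shape rule: a.shape[ind:] + a.shape[:ind].
print(ainv.shape)  # (8, 3, 4, 6)

# tensordot(tensorinv(a), a, 2) is the identity for the tensordot operation.
ident = np.tensordot(ainv, a, axes=2).reshape(24, 24)
print(np.allclose(ident, np.eye(24)))  # True
```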
theano/tensor/nnet/sigm.py
...
@@ -413,6 +413,7 @@ log1msigm_to_softplus = gof.PatternSub(
     values_eq_approx=values_eq_approx_remove_inf,
     skip_identities_fn=_skip_mul_1)

 log1pexp_to_softplus = gof.PatternSub(
     (tensor.log1p,
      (tensor.exp, 'x')),
...
@@ -420,12 +421,20 @@ log1pexp_to_softplus = gof.PatternSub(
     values_eq_approx=values_eq_approx_remove_inf,
     allow_multiple_clients=True)

+log1p_neg_sigmoid = gof.PatternSub(
+    (tensor.log1p,
+     (tensor.neg, (sigmoid, 'x'))),
+    (tensor.neg, (softplus, 'x')),
+    values_eq_approx=values_eq_approx_remove_inf,
+    allow_multiple_clients=True)
+
 opt.register_stabilize(logsigm_to_softplus, name='logsigm_to_softplus')
 opt.register_stabilize(log1msigm_to_softplus, name='log1msigm_to_softplus')
 opt.register_stabilize(log1pexp_to_softplus, name='log1pexp_to_softplus')
+opt.register_stabilize(log1p_neg_sigmoid, name='log1p_neg_sigmoid,')


-def is_1pexp(t):
+def is_1pexp(t, only_process_constants=True):
     """
     Returns
...
@@ -437,8 +446,9 @@ def is_1pexp(t):
     """
     if t.owner and t.owner.op == tensor.add:
-        scalars, scalar_inputs, nonconsts = \
-            opt.scalarconsts_rest(t.owner.inputs)
-        # scalar_inputs are potentially dimshuffled and fill'd scalars
+        scalars, scalar_inputs, nonconsts = \
+            opt.scalarconsts_rest(t.owner.inputs,
+                                  only_process_constants=only_process_constants)
+        # scalar_inputs are potentially dimshuffled and filled with scalars
         if len(nonconsts) == 1:
             maybe_exp = nonconsts[0]
             if maybe_exp.owner and maybe_exp.owner.op == tensor.exp:
...
@@ -947,7 +957,7 @@ def local_inv_1_plus_exp(node):
     inv_arg = node.inputs[0]
     if inv_arg.owner and inv_arg.owner.op == tensor.add:
         scalars, scalar_inputs, nonconsts = \
-            opt.scalarconsts_rest(inv_arg.owner.inputs)
+            opt.scalarconsts_rest(inv_arg.owner.inputs,
+                                  only_process_constants=True)
         # scalar_inputs are potentially dimshuffled and fill'd scalars
         if len(nonconsts) == 1:
             if nonconsts[0].owner and nonconsts[0].owner.op == tensor.exp:
...
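The stabilization patterns above rewrite `log1p(exp(x))` into `softplus(x)` and `log1p(-sigmoid(x))` into `-softplus(x)`. The numerical motivation can be seen with NumPy: the naive form overflows for large `x`, while a stable softplus (sketched here via `logaddexp`) stays finite:

```python
import numpy as np

def softplus(x):
    # Stable softplus: log(1 + exp(x)) computed as logaddexp(0, x),
    # which never forms the overflowing intermediate exp(x).
    return np.logaddexp(0.0, x)

x = 1000.0
with np.errstate(over='ignore'):
    naive = np.log1p(np.exp(x))  # exp(1000) overflows -> inf
stable = softplus(x)             # ~1000.0, finite

print(np.isinf(naive), np.isfinite(stable))  # True True
```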
theano/tensor/nnet/tests/test_sigm.py
...
@@ -356,7 +356,6 @@ class T_sigmoid_opts(unittest.TestCase):
         f = theano.function([x], s, mode=mode)
-        assert hasattr(f.maker.fgraph.outputs[0].tag, 'trace')
         topo = f.maker.fgraph.toposort()
         assert len(topo) > 1
         assert not any([n.op == sigmoid for n in topo])
         ux_v = f([[-50, -10, -4, -1, 0, 1, 4, 10, 50]])
...
@@ -467,15 +466,17 @@ class T_sigmoid_utils(unittest.TestCase):
         try:
             x = tensor.vector('x')
             exp = tensor.exp
-            assert is_1pexp(1 + exp(x)) == (False, x)
-            assert is_1pexp(exp(x) + 1) == (False, x)
-            for neg, exp_arg in imap(is_1pexp, [(1 + exp(-x)),
-                                                (exp(-x) + 1)]):
+            assert is_1pexp(1 + exp(x), False) == (False, x)
+            assert is_1pexp(exp(x) + 1, False) == (False, x)
+            for neg, exp_arg in imap(lambda x: is_1pexp(x, only_process_constants=False),
+                                     [(1 + exp(-x)), (exp(-x) + 1)]):
                 assert not neg and theano.gof.graph.is_same_graph(exp_arg, -x)
-            assert is_1pexp(1 - exp(x)) is None
-            assert is_1pexp(2 + exp(x)) is None
-            assert is_1pexp(exp(x) + 2) is None
-            assert is_1pexp(exp(x) - 1) is None
-            assert is_1pexp(-1 + exp(x)) is None
-            assert is_1pexp(1 + 2 * exp(x)) is None
+            assert is_1pexp(1 - exp(x), False) is None
+            assert is_1pexp(2 + exp(x), False) is None
+            assert is_1pexp(exp(x) + 2, False) is None
+            assert is_1pexp(exp(x) - 1, False) is None
+            assert is_1pexp(-1 + exp(x), False) is None
+            assert is_1pexp(1 + 2 * exp(x), False) is None
         finally:
             config.warn.identify_1pexp_bug = backup
...
theano/tensor/opt.py (diff collapsed)
theano/tensor/signal/pool.py
...
@@ -186,8 +186,12 @@ class Pool(Op):
         if st is None:
             st = ds
         r, c = imgshape[-2:]
-        r += padding[0] * 2
-        c += padding[1] * 2
+        r = tensor.extract_constant(r)
+        c = tensor.extract_constant(c)
+        if padding[0]:
+            r += padding[0] * 2
+        if padding[1]:
+            c += padding[1] * 2

         if ignore_border:
             if ds[0] == st[0]:
...
@@ -216,7 +220,7 @@ class Pool(Op):
             elif st[0] >= ds[0]:
                 nr = (r - 1) // st[0] + 1
             else:
-                nr = max(0, (r - 1 - ds[0]) // st[0] + 1) + 1
+                nr = max(0, (r - 1 - ds[0] + st[0]) // st[0]) + 1

             if isinstance(c, theano.Variable):
                 nc = tensor.switch(tensor.ge(st[1], ds[1]),
...
@@ -226,7 +230,7 @@ class Pool(Op):
             elif st[1] >= ds[1]:
                 nc = (c - 1) // st[1] + 1
             else:
-                nc = max(0, (c - 1 - ds[1]) // st[1] + 1) + 1
+                nc = max(0, (c - 1 - ds[1] + st[1]) // st[1]) + 1

         rval = list(imgshape[:-2]) + [nr, nc]
         return rval
...
@@ -257,10 +261,10 @@ class Pool(Op):
         self.mode = mode

     def make_node(self, x):
-        if x.type.ndim != 4:
-            raise TypeError()
         # TODO: consider restricting the dtype?
         x = tensor.as_tensor_variable(x)
+        if x.type.ndim != 4:
+            raise TypeError()
         # If the input shape are broadcastable we can have 0 in the output shape
         broad = x.broadcastable[:2] + (False, False)
         out = tensor.TensorType(x.dtype, broad)
...
@@ -274,6 +278,9 @@ class Pool(Op):
                              'Pool requires 4D input for now')
         z_shape = self.out_shape(x.shape, self.ds, self.ignore_border,
                                  self.st, self.padding)
+        if not self.ignore_border:
+            assert z_shape[2] > 0
+            assert z_shape[3] > 0
         if (z[0] is None) or (z[0].shape != z_shape):
             z[0] = numpy.empty(z_shape, dtype=x.dtype)
         zz = z[0]
...
@@ -403,7 +410,7 @@ class Pool(Op):
         }
         else
         {
-          z_r = std::max(0, (r - 1 - %(ds0)s) / %(st0)s + 1) + 1;
+          z_r = std::max(0, (r - 1 - %(ds0)s + %(st0)s) / %(st0)s) + 1;
         }
         // decide how many columns the output has
         if (%(st1)s >= %(ds1)s)
...
@@ -412,8 +419,10 @@ class Pool(Op):
         }
         else
         {
-          z_c = std::max(0, (c - 1 - %(ds1)s) / %(st1)s + 1) + 1;
+          z_c = std::max(0, (c - 1 - %(ds1)s + %(st0)s) / %(st1)s) + 1;
         }
+        assert(z_r > 0);
+        assert(z_c > 0);
     }
     // memory allocation of z if necessary
     if ((!%(z)s)
...
@@ -522,7 +531,7 @@ class Pool(Op):
         return ccode % locals()

     def c_code_cache_version(self):
-        return (0, 6, 8, 3)
+        return (0, 6, 8, 4)

 class PoolGrad(Op):
...
@@ -632,12 +641,12 @@ class MaxPoolGrad(PoolGrad):
     def make_node(self, x, maxout, gz):
         # make_node should only be called by the grad function of
         # Pool, so these asserts should not fail.
-        assert isinstance(x, Variable) and x.ndim == 4
-        assert isinstance(maxout, Variable) and maxout.ndim == 4
-        assert isinstance(gz, Variable) and gz.ndim == 4
+        x = tensor.as_tensor_variable(x)
+        maxout = tensor.as_tensor_variable(maxout)
+        gz = tensor.as_tensor_variable(gz)
+        assert isinstance(x, Variable) and x.ndim == 4
+        assert isinstance(maxout, Variable) and maxout.ndim == 4
+        assert isinstance(gz, Variable) and gz.ndim == 4
         return Apply(self, [x, maxout, gz], [x.type()])
...
@@ -814,10 +823,10 @@ class AveragePoolGrad(PoolGrad):
     def make_node(self, x, gz, dummy=None):
         # make_node should only be called by the grad function of
         # Pool, so these asserts should not fail.
-        assert isinstance(x, Variable) and x.ndim == 4
-        assert isinstance(gz, Variable) and gz.ndim == 4
+        x = tensor.as_tensor_variable(x)
+        gz = tensor.as_tensor_variable(gz)
+        assert isinstance(x, Variable) and x.ndim == 4
+        assert isinstance(gz, Variable) and gz.ndim == 4
         return Apply(self, [x, gz], [x.type()])
...
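The `out_shape` hunks above restructure the per-dimension output-length rule for pooling. A minimal Python sketch of that rule (the `pool_out_len` helper is hypothetical), checked against Theano's documented `max_pool_2d` behavior for a 5x6 input with a (2, 2) window: (2, 3) with `ignore_border=True`, (3, 3) without:

```python
def pool_out_len(r, ds, st=None, ignore_border=False):
    # Output length when pooling one dimension of size r with window ds
    # and stride st, following the rewritten out_shape rule.
    if st is None:
        st = ds
    if ignore_border:
        return (r - ds) // st + 1
    if st >= ds:
        return (r - 1) // st + 1
    # st < ds: count full windows plus the final (possibly partial) one.
    return max(0, (r - 1 - ds + st) // st) + 1

print(pool_out_len(5, 2, ignore_border=True),   # 2
      pool_out_len(6, 2, ignore_border=True),   # 3
      pool_out_len(5, 2),                       # 3
      pool_out_len(6, 2))                       # 3
```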
theano/tensor/signal/tests/test_pool.py (diff collapsed)
theano/tensor/slinalg.py (diff collapsed)
theano/tensor/subtensor.py (diff collapsed)
theano/tensor/tests/test_basic.py (diff collapsed)
theano/tensor/tests/test_blas_c.py (diff collapsed)
theano/tensor/tests/test_nlinalg.py (diff collapsed)
theano/tensor/tests/test_opt.py (diff collapsed)
theano/tensor/tests/test_slinalg.py (diff collapsed)
theano/tests/test_flake8.py (diff collapsed)