Commit 0a7a4c06 authored by Chinnadhurai Sankar

Merge branch 'master' of git://github.com/Theano/Theano

@@ -37,3 +37,4 @@ Theano.suo
 .ipynb_checkpoints
 .pydevproject
 .ropeproject
+core
\ No newline at end of file
@@ -10,15 +10,14 @@ Related Projects:
 https://github.com/Theano/Theano/wiki/Related-projects
-We recommend you look at the documentation on the website, since it
-will be more current than the documentation included with the package.
-If you really wish to build the documentation yourself, you will need
-sphinx. Issue the following command:
+It is recommended that you look at the documentation on the website, as it
+will be more current than the documentation included with the package.
+In order to build the documentation yourself, you will need sphinx. Issue
+the following command:
 python ./doc/scripts/docgen.py
 Documentation is built into html/
-The PDF of the documentation is html/theano.pdf
+The PDF of the documentation can be found at html/theano.pdf
 DIRECTORY LAYOUT
@@ -31,7 +30,7 @@ Theano (current directory) is the distribution directory.
 * tensor depends upon scalar
 * sparse depends upon tensor
 * sandbox can depend on everything else
-* Theano/examples are copies of the example on the wiki
+* Theano/examples are copies of the example found on the wiki
 * Theano/benchmark and Theano/examples are in the distribution, but not in
   the Python package
 * Theano/bin contains executable scripts that are copied to the bin folder
@@ -39,4 +38,4 @@ Theano (current directory) is the distribution directory.
 * Tests are distributed and are part of the package, i.e. fall in
   the appropriate submodules
 * Theano/doc contains files and scripts used to generate the documentation
-* Theano/html is the place where the documentation will be generated
+* Theano/html is where the documentation will be generated
@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
 Testing GPU Ops
 ^^^^^^^^^^^^^^^
-Ops to be executed on the GPU should inherit from the
-``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
+When using the old GPU backend, Ops to be executed on the GPU should inherit
+from ``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
 Theano to distinguish them. Currently, we use this to test if the
 NVIDIA driver works correctly with our sum reduction code on the GPU.
......
@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add
 If you want GPU-related tests to run on a specific GPU device, and not
 the default one, you should use :attr:`~config.init_gpu_device`.
-For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=gpu1``.
+For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=cuda1``.
 See :ref:`libdoc_config` for more information on how to change these
 configuration options.
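The flag string changed above follows the ``THEANO_FLAGS`` comma-separated ``key=value`` syntax. A minimal sketch of that decomposition (plain Python, not Theano's actual configuration parser):

```python
def parse_flags(flags):
    """Split a THEANO_FLAGS-style string into a dict (simplified sketch)."""
    result = {}
    for item in flags.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result

flags = parse_flags("device=cpu,init_gpu_device=cuda1")
print(flags)  # {'device': 'cpu', 'init_gpu_device': 'cuda1'}
```

Real Theano flags also support nested sections (for example ``dnn.conv.algo_bwd_data``); this sketch only illustrates the top-level splitting.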
@@ -508,25 +508,25 @@ Any one of them is enough.
 :ref:`Ubuntu instructions <install_ubuntu_gpu>`.
+Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
 Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your
 computer, and set the default floating point computations to float32.
-For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu,floatX=float32'``.
+For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=cuda,floatX=float32'``.
 You can also set these options in the .theanorc file's ``[global]`` section:
 .. code-block:: cfg
     [global]
-    device = gpu
+    device = cuda
     floatX = float32
 Note that:
-* If your computer has multiple GPUs and you use 'device=gpu', the driver
-  selects the one to use (usually gpu0).
-* You can use the program nvida-smi to change this policy.
-* You can choose one specific GPU by specifying 'device=gpuX', with X the
+* If your computer has multiple GPUs and you use 'device=cuda', the driver
+  selects the one to use (usually cuda0).
+* You can use the program ``nvidia-smi`` to change this policy.
+* You can choose one specific GPU by specifying 'device=cudaX', with X the
   corresponding GPU index (0, 1, 2, ...)
 * By default, when ``device`` indicates preference for GPU computations,
   Theano will fall back to the CPU if there is a problem with the GPU.
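The ``[global]`` section above is plain INI syntax, so its effect can be previewed with the standard library. A sketch (hypothetical file contents, not Theano's own config loader):

```python
import configparser

# A .theanorc-style file as discussed above (illustrative contents only).
THEANORC = """\
[global]
device = cuda
floatX = float32
"""

config = configparser.ConfigParser()
config.read_string(THEANORC)

# Theano reads these keys to pick the compute device and the default dtype.
device = config.get("global", "device")
floatX = config.get("global", "floatX")
print(device, floatX)  # cuda float32
```

Theano's real loader also merges ``THEANO_FLAGS`` on top of the file, with the environment taking precedence.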
@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats:
 toggle your GPU on, which can be done with
 `gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__.
+Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
 Once your setup is complete, head to :ref:`using_gpu` to find how to verify
 everything is working properly.
......
@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04:
 sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
 sudo pip install Theano
 On 14.04, this will install Python 2 by default. If you want to use Python 3:
 .. code-block:: bash
@@ -104,30 +104,30 @@ For Ubuntu 11.04:
 The development version of Theano supports Python 3.3 and
 probably supports Python 3.2, but we do not test on it.
 Bleeding Edge Installs
 ----------------------
 If you would like, instead, to install the bleeding edge Theano (from github)
 such that you can edit and contribute to Theano, replace the `pip install Theano`
 command with:
 .. code-block:: bash
     git clone git://github.com/Theano/Theano.git
     cd Theano
     python setup.py develop --user
     cd ..
 VirtualEnv
 ----------
 If you would like to install Theano in a VirtualEnv, you will want to pass the
 `--system-site-packages` flag when creating the VirtualEnv so that it will pick up
 the system-provided `Numpy` and `SciPy`.
 .. code-block:: bash
     virtualenv --system-site-packages -p python2.7 theano-env
     source theano-env/bin/activate
     pip install Theano
@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs
 Change to the Theano directory and run:
 .. code-block:: bash
     git pull
@@ -303,7 +303,7 @@ Test GPU configuration
 .. code-block:: bash
-    THEANO_FLAGS=floatX=float32,device=gpu python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
+    THEANO_FLAGS=floatX=float32,device=cuda python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
 .. note::
......
@@ -423,16 +423,16 @@ Create a test file containing:
     print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %(
         np_end-np_start, t_end-t_start))
     print("Result difference: %f" % (np.abs(AB-tAB).max(), ))
 .. testoutput::
    :hide:
    :options: +ELLIPSIS
    NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!)
    Result difference: ...
 .. code-block:: none
     NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!)
     Result difference: 0.000000
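The check above times two implementations of the same product and compares their results. The same pattern, sketched without NumPy or Theano on a small dot product (both functions here are pure Python; they stand in for the "NumPy" and "Theano" sides):

```python
import time

def dot_loop(a, b):
    # Straightforward generator-expression implementation.
    return sum(x * y for x, y in zip(a, b))

def dot_alt(a, b):
    # Equivalent formulation; stands in for the "other backend" here.
    return sum(map(lambda p: p[0] * p[1], zip(a, b)))

a = [float(i) for i in range(1000)]
b = [float(i % 7) for i in range(1000)]

t0 = time.time()
r1 = dot_loop(a, b)
t1 = time.time()
r2 = dot_alt(a, b)
t2 = time.time()

print("loop time: %f[s], alt time: %f[s]" % (t1 - t0, t2 - t1))
print("Result difference: %f" % abs(r1 - r2))
```

Because both versions sum the same products in the same order, the reported difference is exactly zero, mirroring the "Result difference: 0.000000" expected on CPU.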
@@ -445,6 +445,8 @@ routine for matrix multiplication)
 Configure Theano for GPU use
 ############################
+Install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_ if you have not already done so.
 Theano can be configured with a ``.theanorc`` text file (or
 ``.theanorc.txt``, whichever is easier for you to create under
 Windows). It should be placed in the directory pointed to by the
@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file:
 .. code-block:: cfg
     [global]
-    device = gpu
+    device = cuda
     floatX = float32
     [nvcc]
@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above.
 Compiling a faster BLAS
 ~~~~~~~~~~~~~~~~~~~~~~~
 If you installed Python through WinPython or EPD, Theano will automatically
 link with the MKL library, so you should not need to compile your own BLAS.
 .. note::
......
Diff collapsed.
@@ -1414,7 +1414,7 @@ Mathematical
 .. function:: abs_(a)
-   Returns a variable representingthe absolute of a, ie ``|a|``.
+   Returns a variable representing the absolute value of a, i.e. ``|a|``.
 .. note:: Can also be accessed with ``abs(a)``.
......
@@ -32,6 +32,7 @@ Optimization FAST_RUN FAST_COMPILE
 ========================================================= ========= ============ =============
 :term:`merge`                                             x         x
 :term:`constant folding<constant folding>`                x         x
+:term:`GPU transfer`                                      x         x
 :term:`shape promotion<shape promotion>`                  x
 :term:`fill cut<fill cut>`                                x
 :term:`inc_subtensor srlz.<inc_subtensor serialization>`  x
@@ -52,7 +53,6 @@ Optimization FAST_RUN FAST_COMPILE
 :term:`inplace_elemwise`                                  x
 :term:`inplace_random`                                    x
 :term:`elemwise fusion`                                   x
-:term:`GPU transfer`                                      x
 :term:`local_log_softmax`                                 x         x
 :term:`local_remove_all_assert`
 ========================================================= ========= ============ =============
......
@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
 hints that give more flexibility to the compilation and optimization of the
 graph.
-For GPU graphs, this borrowing can have a major speed impact. See the following code:
-
-.. code-block:: python
-
-    from theano import function, config, shared, sandbox, tensor, Out
-    import numpy
-    import time
-
-    vlen = 10 * 30 * 768  # 10 x # cores x # threads per core
-    iters = 1000
-
-    rng = numpy.random.RandomState(22)
-    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
-    f1 = function([], sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)))
-    f2 = function([],
-                  Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
-                      borrow=True))
-
-    t0 = time.time()
-    for i in range(iters):
-        r = f1()
-    t1 = time.time()
-    no_borrow = t1 - t0
-
-    t0 = time.time()
-    for i in range(iters):
-        r = f2()
-    t1 = time.time()
-
-    print(
-        "Looping %s times took %s seconds without borrow "
-        "and %s seconds with borrow" % (iters, no_borrow, (t1 - t0))
-    )
-    if numpy.any([isinstance(x.op, tensor.Elemwise) and
-                  ('Gpu' not in type(x.op).__name__)
-                  for x in f1.maker.fgraph.toposort()]):
-        print('Used the cpu')
-    else:
-        print('Used the gpu')
-
-Which produces this output:
-
-.. code-block:: none
-
-    $ THEANO_FLAGS=device=gpu0,floatX=float32 python test1.py
-    Using gpu device 0: GeForce GTX 275
-    Looping 1000 times took 0.368273973465 seconds without borrow and 0.0240728855133 seconds with borrow.
-    Used the gpu
-
 *Take home message:*
 When an input *x* to a function is not needed after the function
@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory
 footprint), and you only need to read from it once, right away when
 it's returned, then consider marking it with an ``Out(y,
 borrow=True)``.
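The borrow semantics summarized above trade a defensive copy for aliased storage. A toy sketch with plain lists standing in for device buffers (not Theano's actual ``Out`` class; names are illustrative):

```python
import copy

class Result:
    """Toy container: with borrow=True the caller gets the internal buffer itself."""
    def __init__(self, buffer, borrow=False):
        # Without borrow, the caller owns a private deep copy.
        # With borrow, the caller aliases the producer's storage: no copy cost,
        # but the value may change when the producer reuses its buffer.
        self.value = buffer if borrow else copy.deepcopy(buffer)

internal = [1.0, 2.0, 3.0]

safe = Result(internal, borrow=False)  # private copy: later writes don't leak
fast = Result(internal, borrow=True)   # alias: no copy, but shared storage

internal[0] = 99.0  # the producer reuses its buffer on the next call

print(safe.value[0], fast.value[0])  # 1.0 99.0
```

This is why the take-home message restricts ``borrow=True`` to values you read once, right away: a later call can silently overwrite the aliased storage.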
@@ -168,8 +168,8 @@ Linkers
 =======
 A mode is composed of 2 things: an optimizer and a linker. Some modes,
-like ``NanGuardMode`` and ``DebugMode``, add logic around the optimizer and
-linker. ``NanGuardMode`` and ``DebugMode`` use their own linker.
+like ``NanGuardMode`` and ``DebugMode``, add logic around the
+optimizer and linker. ``DebugMode`` uses its own linker.
 You can select which linker to use with the Theano flag :attr:`config.linker`.
 Here is a table to compare the different linkers.
@@ -183,7 +183,7 @@ c|py [#cpy1]_ yes yes "+++" Try C code. If none exis
 c|py_nogc     no   yes  "++"      As c|py, but without gc
 c             no   yes  "+"       Use only C code (if none available for an op, raise an error)
 py            yes  yes  "+++"     Use only Python code
-NanGuardMode  no   no   "++++"    Check if nodes generate NaN
+NanGuardMode  yes  yes  "++++"    Check if nodes generate NaN
 DebugMode     no   yes  VERY HIGH Make many checks on what Theano computes
 ============= ========= ================= ========= ===
......
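A rough sketch of the per-value test that the ``NanGuardMode`` row above refers to (pure Python with ``math``; the function name and the ``1e10`` threshold are illustrative, not Theano's implementation):

```python
import math

def check_value(x, nan_is_error=True, inf_is_error=True, big_is_error=True):
    """Return a list of problems detected in a scalar value (sketch)."""
    problems = []
    if nan_is_error and math.isnan(x):
        problems.append("NaN detected")
    if inf_is_error and math.isinf(x):
        problems.append("Inf detected")
    # Infinity also exceeds the magnitude threshold, so it trips both checks.
    if big_is_error and not math.isnan(x) and abs(x) > 1e10:
        problems.append("Big value detected")
    return problems

nan_problems = check_value(float("nan"))
inf_problems = check_value(float("inf"))
ok_problems = check_value(1.0)
print(nan_problems, inf_problems, ok_problems)
```

In the real mode, a non-empty problem list is turned into an error or a warning according to ``config.NanGuardMode.action``.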
Diff collapsed.
@@ -81,7 +81,7 @@ single name and a single device.
 It is often the case that multi-gpu operation requires or assumes
 that all the GPUs involved are equivalent. This is not the case
 for this implementation. Since the user has the task of
-distrubuting the jobs across the different device a model can be
+distributing the jobs across the different device a model can be
 built on the assumption that one of the GPU is slower or has
 smaller memory.
@@ -140,5 +140,5 @@ is a example.
 cv = gv.transfer('cpu')
 Of course you can mix transfers and operations in any order you
 choose. However you should try to minimize transfer operations
-because they will introduce overhead any may reduce performance.
+because they will introduce overhead that may reduce performance.
@@ -73,7 +73,7 @@ def contains_nan(arr, node=None):
     elif arr.size == 0:
         return False
     elif cuda.cuda_available and isinstance(arr, cuda.CudaNdarray):
-        if (hasattr(theano.sandbox, 'rng_mrg') and
+        if (node and hasattr(theano.sandbox, 'rng_mrg') and
             isinstance(
                 node.op,
                 # It store ints in float container
@@ -119,7 +119,7 @@ def contains_inf(arr, node=None):
     elif arr.size == 0:
         return False
     elif cuda.cuda_available and isinstance(arr, cuda.CudaNdarray):
-        if (hasattr(theano.sandbox, 'rng_mrg') and
+        if (node and hasattr(theano.sandbox, 'rng_mrg') and
             isinstance(
                 node.op,
                 # It store ints in float container
@@ -215,7 +215,7 @@ class NanGuardMode(Mode):
         assert nan_is_error or inf_is_error or big_is_error
         compile_gpu_func(nan_is_error, inf_is_error, big_is_error)
-        def do_check_on(var, nd, f, is_input):
+        def do_check_on(var, nd):
             """
             Checks `var` for NaNs / Infs. If detected, raises an exception
             and / or prints information about `nd`, `f`, and `is_input` to
@@ -227,11 +227,6 @@ class NanGuardMode(Mode):
                 The value to be checked.
             nd : theano.gof.Apply
                 The Apply node being executed.
-            f : callable
-                The thunk for the apply node.
-            is_input : bool
-                If True, `var` is an input to `nd`.
-                If False, it is an output.
             """
             error = False
@@ -262,17 +257,13 @@ class NanGuardMode(Mode):
                 print('Big value detected', file=sio)
                 error = True
             if error:
-                if not is_input:
-                    print("NanGuardMode found an error in the"
-                          " output of a node in this variable:", file=sio)
+                if nd:
+                    print("NanGuardMode found an error in the "
+                          "output of a node in this variable:", file=sio)
                     print(theano.printing.debugprint(nd, file='str'), file=sio)
                 else:
-                    print("NanGuardMode found an error in an"
-                          " input of this node.", file=sio)
-                    print('Node:', file=sio)
-                    print(nd, file=sio)
-                    print("The input variable that cause problem:", file=sio)
-                    print(theano.printing.debugprint(nd, file='str'), file=sio)
+                    print("NanGuardMode found an error in an input of the "
+                          "graph.", file=sio)
                 msg = sio.getvalue()
                 if config.NanGuardMode.action == 'raise':
                     raise AssertionError(msg)
@@ -283,36 +274,16 @@ class NanGuardMode(Mode):
                 elif config.NanGuardMode.action == 'warn':
                     logger.error(msg)
-        def nan_check(i, node, fn):
-            """
-            Runs `fn` while checking its inputs and outputs for NaNs / Infs.
-
-            Parameters
-            ----------
-            i :
-                Currently ignored.
-                TODO: determine why it is here or remove).
-            node : theano.gof.Apply
-                The Apply node currently being executed.
-            fn : callable
-                The thunk to execute for this Apply node.
-            """
-            inputs = fn.inputs
-            for x, var in zip(inputs, node.inputs):
-                # If the input is the result of computation, then we
-                # don't need to check it. It is already done after the
-                # computation.
-                if (var.owner is None and
-                        getattr(var.tag, 'nan_guard_mode_check', True)):
-                    do_check_on(x[0], node, fn, True)
-            fn()
-            outputs = fn.outputs
-            for x, var in zip(outputs, node.outputs):
+        def nan_check(node, thunk, storage_map, compute_map):
+            for var in node.outputs:
                 if getattr(var.tag, 'nan_guard_mode_check', True):
-                    do_check_on(x[0], node, fn, False)
+                    do_check_on(storage_map[var][0], node)
+
+        def nan_check_input(var, value):
+            if getattr(var.tag, 'nan_guard_mode_check', True):
+                do_check_on(value, None)
-        wrap_linker = theano.gof.WrapLinker([theano.gof.OpWiseCLinker()],
-                                            nan_check)
+        wrap_linker = theano.gof.vm.VM_Linker(callback=nan_check,
+                                              callback_input=nan_check_input)
         super(NanGuardMode, self).__init__(wrap_linker,
                                            optimizer=self.provided_optimizer)
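The rewrite above moves checking into linker callbacks fed by a ``storage_map``. A toy model of that callback protocol (plain dicts and functions, no Theano; ``run_graph`` and the node format are invented for illustration):

```python
# Each "node" computes outputs from inputs; storage_map holds one-element
# cells so results can be shared by reference, as in Theano's VMs.
def run_graph(nodes, storage_map, callback=None, callback_input=None):
    # Graph inputs are variables produced by no node: check them once, up front.
    if callback_input:
        produced = {v for n in nodes for v in n["outputs"]}
        for var, cell in storage_map.items():
            if var not in produced and cell[0] is not None:
                callback_input(var, cell[0])
    for node in nodes:
        args = [storage_map[v][0] for v in node["inputs"]]
        for var, val in zip(node["outputs"], node["fn"](*args)):
            storage_map[var][0] = val
        # After each thunk, hand the node and storage to the callback.
        if callback:
            callback(node, None, storage_map, None)

checked = []
storage = {"x": [2.0], "y": [None]}
nodes = [{"inputs": ["x"], "outputs": ["y"], "fn": lambda x: (x * x,)}]
run_graph(
    nodes, storage,
    callback=lambda node, thunk, sm, cm: checked.extend(node["outputs"]),
    callback_input=lambda var, value: checked.append(var),
)
print(storage["y"][0], checked)  # 4.0 ['x', 'y']
```

This mirrors why the new ``nan_check`` only inspects outputs: inputs with no owner are covered once by ``callback_input`` instead of being re-checked at every node that consumes them.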
@@ -84,10 +84,15 @@ def _atexit_print_fn():
                 cum_attr[key] = val
             if cum.optimizer_profile and ps.optimizer_profile:
-                merge = cum.optimizer_profile[0].merge_profile(
-                    cum.optimizer_profile[1],
-                    ps.optimizer_profile[1])
-                cum.optimizer_profile = (cum.optimizer_profile[0], merge)
+                try:
+                    merge = cum.optimizer_profile[0].merge_profile(
+                        cum.optimizer_profile[1],
+                        ps.optimizer_profile[1])
+                    cum.optimizer_profile = (cum.optimizer_profile[0], merge)
+                except Exception as e:
+                    print("Got an exception while merging profile")
+                    print(e)
+                    cum.optimizer_profile = None
             else:
                 cum.optimizer_profile = None
......
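The guard added above wraps the merge in try/except so a bad profile cannot crash the atexit printer. The same defensive pattern, sketched generically on dict-based profiles (a hypothetical helper, not Theano's ``merge_profile``):

```python
def merge_profiles(cum, new):
    """Merge two profile dicts; on any failure, drop the cumulative profile
    rather than crash the caller (mirrors the guard above; sketch only)."""
    try:
        merged = dict(cum)
        for key, val in new.items():
            merged[key] = merged.get(key, 0) + val
        return merged
    except Exception as e:
        print("Got an exception while merging profile")
        print(e)
        return None

good = merge_profiles({"opt_a": 1.5}, {"opt_a": 0.5, "opt_b": 2.0})
bad = merge_profiles({"opt_a": 1.5}, None)  # merging fails -> None
print(good, bad)
```

Swallowing the exception is deliberate here: profile reporting is best-effort diagnostics, and losing one merged profile is preferable to aborting program shutdown.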
@@ -104,10 +104,9 @@ class DeviceParam(ConfigParam):
 AddConfigVar(
     'device',
-    ("Default device for computations. If gpu*, change the default to try "
-     "to move computation to it and to put shared variable of float32 "
-     "on it. Do not use upper case letters, only lower case even if "
-     "NVIDIA use capital letters."),
+    ("Default device for computations. If cuda* or opencl*, change the "
+     "default to try to move computation to the GPU. Do not use upper case "
+     "letters, only lower case even if NVIDIA uses capital letters."),
     DeviceParam('cpu', allow_override=False),
     in_c_key=False)
@@ -273,7 +272,8 @@ def safe_no_dnn_workmem_bwd(workmem):
     return True
 AddConfigVar('dnn.conv.workmem_bwd',
-             "This flag is deprecated; use dnn.conv.algo_bwd.",
+             "This flag is deprecated; use `dnn.conv.algo_bwd_filter` "
+             "and `dnn.conv.algo_bwd_data` instead.",
              ConfigParam('', allow_override=False,
                          filter=safe_no_dnn_workmem_bwd),
              in_c_key=False)
@@ -651,8 +651,8 @@ AddConfigVar('warn.ignore_bug_before',
              "bugs found after that version. "
              "Warning for specific bugs can be configured with specific "
              "[warn] flags."),
-             EnumStr('0.7', 'None', 'all', '0.3', '0.4', '0.4.1', '0.5', '0.7',
-                     '0.8',
+             EnumStr('0.7', 'None', 'all', '0.3', '0.4', '0.4.1', '0.5', '0.6',
+                     '0.7', '0.8', '0.8.1', '0.8.2',
                      allow_override=False),
              in_c_key=False)
......
@@ -165,6 +165,9 @@ def raise_with_op(node, thunk=None, exc_info=None, storage_map=None):
     detailed_err_msg += ("Inputs shapes: %s" % shapes +
                          "\nInputs strides: %s" % strides +
                          "\nInputs values: %s" % scalar_values)
+    if theano.config.exception_verbosity == 'high':
+        detailed_err_msg += "\nInputs type_num: %s" % str(
+            [getattr(getattr(i[0], 'dtype', ''), 'num', '') for i in thunk.inputs])
     if hasattr(node.op, '__input_name__'):
         detailed_err_msg += "\nInputs name: %s\n" % str(node.op.__input_name__)
Diff collapsed.
@@ -244,16 +244,26 @@ class EquilibriumDB(DB):
         optimization application. This could result in less fgraph iterations,
         but this doesn't mean it will be faster globally.
+    tracks_on_change_inputs
+        If True, we will re-apply local opt on nodes whose inputs
+        changed during local optimization application. This could
+        result in less fgraph iterations, but this doesn't mean it
+        will be faster globally.
     Notes
     -----
     We can put LocalOptimizer and Optimizer as EquilibriumOptimizer
-    suppor both.
+    support both.
+
+    It is probably not a good idea to have ignore_newtrees=False and
+    tracks_on_change_inputs=True
     """
-    def __init__(self, ignore_newtrees=True):
+    def __init__(self, ignore_newtrees=True, tracks_on_change_inputs=False):
         super(EquilibriumDB, self).__init__()
         self.ignore_newtrees = ignore_newtrees
+        self.tracks_on_change_inputs = tracks_on_change_inputs
         self.__final__ = {}
         self.__cleanup__ = {}
@@ -281,6 +291,7 @@ class EquilibriumDB(DB):
             opts,
             max_use_ratio=config.optdb.max_use_ratio,
             ignore_newtrees=self.ignore_newtrees,
+            tracks_on_change_inputs=self.tracks_on_change_inputs,
             failure_callback=opt.NavigatorOptimizer.warn_inplace,
             final_optimizers=final_opts,
             cleanup_optimizers=cleanup_opts)
......
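``EquilibriumDB`` builds optimizers that re-apply local rewrites until nothing changes. The fixpoint idea can be sketched on a toy term rewriter (illustrative only; Theano's optimizer works on ``fgraph`` nodes, not tuples):

```python
def equilibrium(expr, rules, max_iters=100):
    """Apply rewrite rules repeatedly until a full pass changes nothing."""
    for _ in range(max_iters):
        changed = False
        for rule in rules:
            new_expr = rule(expr)
            if new_expr != expr:
                expr, changed = new_expr, True
        if not changed:
            return expr  # equilibrium reached
    raise RuntimeError("no equilibrium within max_iters")

# Toy rules over tuple expressions: ("add", x, 0) -> x and ("mul", x, 1) -> x.
def drop_add_zero(e):
    if isinstance(e, tuple) and e[0] == "add" and e[2] == 0:
        return drop_add_zero(e[1])
    return e

def drop_mul_one(e):
    if isinstance(e, tuple) and e[0] == "mul" and e[2] == 1:
        return drop_mul_one(e[1])
    return e

result = equilibrium(("mul", ("add", "x", 0), 1), [drop_add_zero, drop_mul_one])
print(result)  # x
```

One rewrite can expose another (removing the ``mul`` reveals the ``add``), which is why the loop re-runs every rule until a whole pass is a no-op; ``tracks_on_change_inputs`` is about revisiting exactly such newly exposed nodes sooner.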
@@ -332,7 +332,7 @@ class Stack(VM):
     def __init__(self, nodes, thunks, pre_call_clear,
                  storage_map, compute_map, fgraph, allow_gc,
-                 dependencies=None, callback=None):
+                 dependencies=None, callback=None, callback_input=None):
         super(Stack, self).__init__(nodes, thunks, pre_call_clear)
         self.allow_gc = allow_gc
@@ -345,6 +345,7 @@ class Stack(VM):
         self.compute_map = compute_map
         self.node_idx = node_idx = {}
         self.callback = callback
+        self.callback_input = callback_input
         ords = fgraph.orderings()
@@ -411,6 +412,8 @@ class Stack(VM):
         for k in self.storage_map:
             compute_map[k][0] = (k.owner is None)
+            if self.callback_input and compute_map[k][0]:
+                self.callback_input(k, self.storage_map[k][0])
         # apply_stack contains nodes
         if output_subset is not None:
@@ -684,6 +687,11 @@ class VM_Linker(link.LocalLinker):
         A callable object to call after each call to a thunk within
         the virtual machine. It will be called with four arguments called
         'node', 'thunk', 'storage_map', and 'compute_map'.
+    callback_input
+        A callable object to call on each input to the graph
+        (variables with no owner). This includes constants and shared
+        variables values. It will be called with two arguments:
+        'var', 'value'.
     lazy
         Useful only when use_cloop is False. When lazy is None, use the
         theano flag vm.lazy value. Then if we have a None (default) we auto
@@ -700,8 +708,8 @@ class VM_Linker(link.LocalLinker):
     """
     def __init__(self, allow_gc=None, use_cloop=False, callback=None,
-                 lazy=None, schedule=None, c_thunks=None,
-                 allow_partial_eval=None):
+                 callback_input=None, lazy=None, schedule=None,
+                 c_thunks=None, allow_partial_eval=None):
         # Note: if more parameters are added to __init__, make sure to forward
         # them in the "type(self)(...)" call in the "accept" method below.
         if allow_gc is None:
@@ -710,6 +718,7 @@ class VM_Linker(link.LocalLinker):
         self.allow_gc = allow_gc
         self.use_cloop = use_cloop
         self.callback = callback
+        self.callback_input = callback_input
         self.lazy = lazy
         self.c_thunks = c_thunks
         self.allow_partial_eval = allow_partial_eval
@@ -760,9 +769,11 @@ class VM_Linker(link.LocalLinker):
                 allow_gc=self.allow_gc,
                 use_cloop=self.use_cloop,
                 callback=self.callback,
+                callback_input=self.callback_input,
                 lazy=self.lazy,
                 schedule=self.schedule,
                 c_thunks=self.c_thunks,
+                allow_partial_eval=self.allow_partial_eval
             ).accept(fgraph, no_recycling)
         self.fgraph = fgraph
         self.no_recycling = no_recycling
@@ -829,16 +840,17 @@ class VM_Linker(link.LocalLinker):
         pre_call_clear = [storage_map[v] for v in self.no_recycling]
-        if (self.callback is not None or
+        if (self.callback is not None or self.callback_input is not None or
                 (config.profile and config.profile_memory) or
-                getattr(self, 'allow_partial_eval', False)):
-            if self.use_cloop and self.callback is not None:
+                self.allow_partial_eval):
+            if self.use_cloop and (self.callback is not None or
+                                   self.callback_input is not None):
                 logger.warn('CVM does not support callback, using Stack VM.')
if self.use_cloop and config.profile_memory: if self.use_cloop and config.profile_memory:
warnings.warn( warnings.warn(
'CVM does not support memory profile, using Stack VM.') 'CVM does not support memory profile, using Stack VM.')
if self.use_cloop and getattr(self, 'allow_partial_eval', False): if self.use_cloop and self.allow_partial_eval:
warnings.warn( warnings.warn(
'CVM does not support partial evaluation yet, ' 'CVM does not support partial evaluation yet, '
'using Stack VM.') 'using Stack VM.')
...@@ -849,7 +861,8 @@ class VM_Linker(link.LocalLinker): ...@@ -849,7 +861,8 @@ class VM_Linker(link.LocalLinker):
storage_map, compute_map, storage_map, compute_map,
self.fgraph, self.allow_gc, self.fgraph, self.allow_gc,
dependencies=deps, dependencies=deps,
callback=self.callback) callback=self.callback,
callback_input=self.callback_input)
elif self.use_cloop: elif self.use_cloop:
# create a map from nodes to ints and vars to ints # create a map from nodes to ints and vars to ints
nodes_idx = {} nodes_idx = {}
...@@ -1046,7 +1059,7 @@ class VM_Linker(link.LocalLinker): ...@@ -1046,7 +1059,7 @@ class VM_Linker(link.LocalLinker):
if lazy is None: if lazy is None:
lazy = not all([(not th.lazy) for th in thunks]) lazy = not all([(not th.lazy) for th in thunks])
if not (lazy or (config.profile and config.profile_memory) or if not (lazy or (config.profile and config.profile_memory) or
self.use_cloop or self.callback): self.use_cloop or self.callback or self.callback_input):
for pair in itervalues(reallocated_info): for pair in itervalues(reallocated_info):
storage_map[pair[1]] = storage_map[pair[0]] storage_map[pair[1]] = storage_map[pair[0]]
...@@ -1088,3 +1101,7 @@ class VM_Linker(link.LocalLinker): ...@@ -1088,3 +1101,7 @@ class VM_Linker(link.LocalLinker):
self.__dict__.update(d) self.__dict__.update(d)
if not hasattr(self, 'c_thunks'): if not hasattr(self, 'c_thunks'):
self.c_thunks = True self.c_thunks = True
if not hasattr(self, 'allow_partial_eval'):
self.allow_partial_eval = None
if not hasattr(self, 'callback_input'):
self.callback_input = None
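The hunks above add a `callback_input` hook to the Stack VM: before execution, every graph input (a variable with no owner, which covers constants and shared variable values) is passed to the callback as `(var, value)`. A rough stand-alone sketch of that behavior, using hypothetical stand-in classes rather than Theano's real `Variable` and storage map:

```python
# Minimal stand-ins for a Theano variable and the VM's storage_map,
# used only to illustrate the callback_input contract.

class Var:
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner  # None marks a graph input

def run_input_callbacks(storage_map, callback_input):
    """Call callback_input(var, value) for each owner-less variable."""
    seen = []
    for var, cell in storage_map.items():
        if var.owner is None:          # same test the Stack VM uses
            callback_input(var, cell[0])
            seen.append(var.name)
    return seen

x = Var("x")                  # graph input (no owner)
y = Var("y", owner=object())  # intermediate result, has an owner
storage_map = {x: [42], y: [None]}

logged = []
names = run_input_callbacks(storage_map,
                            lambda v, val: logged.append((v.name, val)))
```

Only `x` reaches the callback; `y` is skipped because it is produced by a node. Note that, as the warning in the diff states, the CVM path does not support this hook, so supplying it forces the Stack VM.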
@@ -42,7 +42,7 @@ register_transfer(transfer)

 def init_dev(dev, name=None):
     v = pygpu.gpuarray.api_version()
-    expected = -9998
+    expected = -9997
     if v[0] != expected:
         raise RuntimeError("Wrong major API version for gpuarray:", v[0],
                            "Make sure Theano and libgpuarray/pygpu "
@@ -50,6 +50,15 @@ def init_dev(dev, name=None):
     if v[1] < 0:
         raise RuntimeError("Wrong minor API version for gpuarray:", v[1],
                            "Please update libgpuarray/pygpu.")
+    if len(v) < 3:
+        vpy = -1
+    else:
+        vpy = v[2]
+    vpye = 0
+    if vpy < vpye:
+        print("Wrong python API version for gpuarray:", vpy, "expected:", vpye,
+              "Some python ops may not work correctly and/or crash. "
+              "Consider updating pygpu.", file=sys.stderr)
     global pygpu_activated
     if dev not in init_dev.devmap:
         ctx = pygpu.init(dev,
......
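This hunk makes `init_dev` perform a three-level check: a hard failure on the major API version, a hard failure on a negative minor version, and (new here) a soft warning when the optional third "python API" entry is missing or too old. A hypothetical pure-Python rendering of that logic, with illustrative default values rather than pygpu's real ones:

```python
def check_api_version(v, expected_major=-9997, expected_py=0):
    """Mirror the major/minor/python-API checks; return soft warnings."""
    warnings = []
    if v[0] != expected_major:
        raise RuntimeError("Wrong major API version for gpuarray: %d" % v[0])
    if v[1] < 0:
        raise RuntimeError("Wrong minor API version for gpuarray: %d" % v[1])
    # A tuple without a third entry means the python API is "too old".
    vpy = v[2] if len(v) >= 3 else -1
    if vpy < expected_py:
        warnings.append("python API too old: %d < %d" % (vpy, expected_py))
    return warnings
```

The asymmetry is deliberate: a wrong python-side version only degrades some Python ops, so it warns instead of raising.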
@@ -259,14 +259,14 @@ class GpuKernelBase(object):
         int types[%(numargs)u] = {%(types)s};
         const char *bcode = %(bvar)s;
         size_t sz = sizeof(%(bvar)s);
-        if (GpuKernel_init(&%(ovar)s, %(ctx)s->ops, %(ctx)s->ctx, 1, &bcode, &sz,
+        if (GpuKernel_init(&%(ovar)s, %(ctx)s->ctx, 1, &bcode, &sz,
                            "%(kname)s", %(numargs)u, types, GA_USE_BINARY, NULL)
             != GA_NO_ERROR) {
-          if ((err = GpuKernel_init(&%(ovar)s, %(ctx)s->ops, %(ctx)s->ctx, 1,
+          if ((err = GpuKernel_init(&%(ovar)s, %(ctx)s->ctx, 1,
                                     &%(cname)s, NULL, "%(kname)s", %(numargs)u,
                                     types, %(flags)s, NULL)) != GA_NO_ERROR) {
             PyErr_Format(PyExc_RuntimeError, "GpuKernel_init error %%d: %%s",
-                         err, Gpu_error(%(ctx)s->ops, %(ctx)s->ctx, err));
+                         err, gpucontext_error(%(ctx)s->ctx, err));
             %(fail)s
           }
         }
@@ -310,7 +310,7 @@ class GpuKernelBase(object):
             The node that we need the cache version for.
         """
-        return (3, self.get_params(node).bin_id)
+        return (4, self.get_params(node).bin_id)


 class HostFromGpu(Op):
@@ -529,15 +529,22 @@ class GpuToGpu(Op):
     def c_code(self, node, name, inputs, outputs, sub):
         return """
         Py_XDECREF(%(out)s);
-        %(out)s = pygpu_transfer(%(inp)s, %(ctx)s, 0);
+        %(out)s = pygpu_empty(%(inp)s->ga.nd,
+                              %(inp)s->ga.dimensions,
+                              %(inp)s->ga.typecode,
+                              GpuArray_IS_C_CONTIGUOUS(&(%(inp)s->ga)) ? GA_C_ORDER:GA_F_ORDER,
+                              %(ctx)s, Py_None);
         if (%(out)s == NULL) {
             %(fail)s
         }
+        if (pygpu_transfer(%(out)s, %(inp)s)) {
+            %(fail)s
+        }
         """ % {'inp': inputs[0], 'ctx': sub['params'],
                'out': outputs[0], 'fail': sub['fail']}

     def c_code_cache_version(self):
-        return (0,)
+        return (1,)


 class GpuAlloc(HideC, Alloc):
......
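The `GpuToGpu` hunk replaces a single allocate-and-copy call with an explicit two-step sequence: allocate an empty array on the destination context, then transfer into it, checking each step separately. A hypothetical pure-Python analogue of that pattern (lists standing in for GPU buffers):

```python
def transfer(dst, src):
    """Copy src into the preallocated dst; return 0 on success."""
    if len(dst) != len(src):
        return 1  # nonzero return signals failure, as in the C code
    dst[:] = src
    return 0

def to_other_context(src):
    out = [None] * len(src)   # step 1: allocate an empty destination
    if transfer(out, src):    # step 2: copy, with its own error check
        raise RuntimeError("transfer failed")
    return out
```

Splitting the steps lets the destination's shape, typecode, and memory order be chosen up front (the diff preserves C or Fortran order explicitly), and gives each failure mode its own error path.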
@@ -24,16 +24,9 @@ int APPLY_SPECIFIC(blockgemv)(PyGpuArrayObject *o, PyGpuArrayObject *W,
   size_t *offW = NULL;
   size_t *offInp = NULL;
   size_t *offOut = NULL;
-  gpuarray_blas_ops *blas_ops;
   int err;

-  err = ctx->ops->property(ctx->ctx, NULL, NULL,
-                           GA_CTX_PROP_BLAS_OPS, &blas_ops);
-  if (err != GA_NO_ERROR) {
-    PyErr_SetString(PyExc_RuntimeError, "Can't get blas ops");
-    return -1;
-  }
-  err = blas_ops->setup(ctx->ctx);
+  err = gpublas_setup(ctx->ctx);
   if (err != GA_NO_ERROR) {
     PyErr_SetString(PyExc_RuntimeError, "Can't setup blas");
     return -1;
@@ -93,29 +86,29 @@ int APPLY_SPECIFIC(blockgemv)(PyGpuArrayObject *o, PyGpuArrayObject *W,
   }

   if (out->ga.typecode == GA_FLOAT) {
-    err = blas_ops->sgemvBatch(cb_fortran, transA,
+    err = gpublas_sgemvBatch(cb_fortran, transA,
                              PyGpuArray_DIMS(out)[2],
                              PyGpuArray_DIMS(h)[2], 1,
                              W_list, offW, lda,
                              inp_list, offInp, PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
                              1, out_list, offOut, PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
                              PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0], 0);
   } else if (out->ga.typecode == GA_DOUBLE) {
-    err = blas_ops->dgemvBatch(cb_fortran, transA,
+    err = gpublas_dgemvBatch(cb_fortran, transA,
                              PyGpuArray_DIMS(out)[2],
                              PyGpuArray_DIMS(h)[2], 1,
                              W_list, offW, lda,
                              inp_list, offInp, PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
                              1, out_list, offOut, PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
                              PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0], 0);
   } else if (out->ga.typecode == GA_HALF) {
-    err = blas_ops->sgemvBatch(cb_fortran, transA,
+    err = gpublas_sgemvBatch(cb_fortran, transA,
                              PyGpuArray_DIMS(out)[2],
                              PyGpuArray_DIMS(h)[2], 1,
                              W_list, offW, lda,
                              inp_list, offInp, PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
                              1, out_list, offOut, PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
                              PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0], 0);
   } else {
     err = GA_INVALID_ERROR;
   }
......
@@ -12,16 +12,9 @@ int APPLY_SPECIFIC(blockger)(PyGpuArrayObject *o, PyGpuArrayObject *x,
   size_t *offOut = NULL;
   size_t *offX = NULL;
   size_t *offY = NULL;
-  gpuarray_blas_ops *blas_ops;
   int err;

-  err = ctx->ops->property(ctx->ctx, NULL, NULL,
-                           GA_CTX_PROP_BLAS_OPS, &blas_ops);
-  if (err != GA_NO_ERROR) {
-    PyErr_SetString(PyExc_RuntimeError, "Can't get blas ops");
-    return -1;
-  }
-  err = blas_ops->setup(ctx->ctx);
+  err = gpublas_setup(ctx->ctx);
   if (err != GA_NO_ERROR) {
     PyErr_SetString(PyExc_RuntimeError, "Can't setup blas");
     return -1;
@@ -84,26 +77,26 @@ int APPLY_SPECIFIC(blockger)(PyGpuArrayObject *o, PyGpuArrayObject *x,
   ssize_t str_out = PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode);

   if (out->ga.typecode == GA_FLOAT) {
-    err = blas_ops->sgerBatch(cb_fortran,
+    err = gpublas_sgerBatch(cb_fortran,
                             PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
                             *(float *)PyArray_GETPTR1(alpha, 0),
                             y_list, offY, str_y, x_list, offX, str_x,
                             o_list, offOut, str_out,
                             PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1], 0);
   } else if (out->ga.typecode == GA_DOUBLE) {
-    err = blas_ops->dgerBatch(cb_fortran,
+    err = gpublas_dgerBatch(cb_fortran,
                             PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
                             *(double *)PyArray_GETPTR1(alpha, 0),
                             y_list, offY, str_y, x_list, offX, str_x,
                             o_list, offOut, str_out,
                             PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1], 0);
   } else if (out->ga.typecode == GA_HALF) {
-    err = blas_ops->hgerBatch(cb_fortran,
+    err = gpublas_hgerBatch(cb_fortran,
                             PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
                             *(float *)PyArray_GETPTR1(alpha, 0),
                             y_list, offY, str_y, x_list, offX, str_x,
                             o_list, offOut, str_out,
                             PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1], 0);
   } else {
     err = GA_INVALID_ERROR;
   }
......
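Both BLAS kernels above dispatch on the array typecode to choose a batched routine, with half precision reusing a single-precision call in `blockgemv` (and `hgerBatch` in `blockger` when available). A hypothetical table-driven version of that dispatch, in Python for brevity:

```python
# Typecode names are illustrative; libgpuarray uses integer GA_* codes.
GA_FLOAT, GA_DOUBLE, GA_HALF = "float32", "float64", "float16"

def pick_batch_routine(typecode, routines):
    """Map a typecode to a batched BLAS routine name.

    `routines` holds the 's' (float32) and 'd' (float64) entries, plus
    an optional 'h' entry; float16 falls back to the float32 routine
    when no native half-precision call exists.
    """
    table = {GA_FLOAT: routines["s"],
             GA_DOUBLE: routines["d"],
             GA_HALF: routines.get("h", routines["s"])}
    if typecode not in table:
        raise ValueError("unsupported typecode")  # GA_INVALID_ERROR
    return table[typecode]
```

The final `else { err = GA_INVALID_ERROR; }` branch in the C code corresponds to the `ValueError` here: any typecode outside the table is rejected rather than silently computed.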
@@ -125,7 +125,7 @@ def dnn_available(context_name):
         ctx = get_context(context_name)
-        if not ctx.kind == 'cuda':
+        if not ctx.kind == b'cuda':
             dnn_available.msg = "Not on a CUDA device."
             return False
@@ -1493,7 +1493,7 @@ def local_dnn_convi_output_merge(node, *inputs):
     return [GpuDnnConvGradI(algo=node.op.algo)(*inputs)]

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([Pool])
 def local_pool_dnn_alternative(node, ctx_name):
     if not dnn_available(ctx_name):
@@ -1509,7 +1509,7 @@ def local_pool_dnn_alternative(node, ctx_name):
     return dnn_pool(gpu_contiguous(img), ds, stride=stride, pad=pad, mode=mode)

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([MaxPoolGrad])
 def local_pool_dnn_grad_stride(node, ctx_name):
     if not dnn_available(ctx_name):
@@ -1533,7 +1533,7 @@ def local_pool_dnn_grad_stride(node, ctx_name):
                               pad)

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([AveragePoolGrad])
 def local_avg_pool_dnn_grad_stride(node, ctx_name):
     if not dnn_available(ctx_name):
@@ -1556,7 +1556,7 @@ def local_avg_pool_dnn_grad_stride(node, ctx_name):
     return GpuDnnPoolGrad(mode=mode)(gpu_contiguous(inp), cg, cg, ds, st, pad)

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @local_optimizer([GpuSoftmax])
 def local_softmax_dnn(node):
     if isinstance(node.op, GpuSoftmax):
@@ -1569,7 +1569,7 @@ def local_softmax_dnn(node):
         return [out]

-@register_opt('cudnn')
+@register_opt('cudnn', 'stabilize')
 @local_optimizer([GpuElemwise])
 def local_log_softmax_dnn(node):
     # This looks for GpuDnnSoftmax so we know that we have cudnn.
@@ -1586,7 +1586,7 @@ def local_log_softmax_dnn(node):
     return [new_softmax(softmax_node.inputs[0])]

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([LogSoftmax])
 def local_logsoftmax_to_dnn(node, ctx_name):
     # Transform the input in the format expected by GpuDnnSoftmax
@@ -1624,7 +1624,7 @@ class NoCuDNNRaise(Optimizer):
 gpu_seqopt.register("NoCuDNNRaise", NoCuDNNRaise(), 0, 'cudnn')

-@register_opt('cudnn')
+@register_opt('cudnn', 'fast_compile')
 @op_lifter([SoftmaxGrad])
 def local_softmax_dnn_grad(node, ctx_name):
     if not dnn_available(ctx_name):
......
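The dnn.py hunks change no optimizer bodies, only the tags each one is registered under, adding 'fast_compile' (and 'stabilize' for the log-softmax rewrite) so these cuDNN substitutions also run in the lighter optimization modes that select by tag. A toy tag-based registry showing why the extra tag is all that is needed (names are stand-ins for Theano's real machinery):

```python
# A minimal registry: each optimizer is stored with its tag set, and an
# optimization mode picks every optimizer carrying its tag.
REGISTRY = []

def register_opt(*tags):
    def wrap(fn):
        REGISTRY.append((fn.__name__, frozenset(tags)))
        return fn
    return wrap

@register_opt('cudnn', 'fast_compile')
def local_pool_dnn_alternative(node, ctx_name):
    return None  # body elided; only the registration matters here

def opts_for_mode(tag):
    return [name for name, tags in REGISTRY if tag in tags]
```

With only the 'cudnn' tag, `opts_for_mode('fast_compile')` would come back empty and the rewrite would be skipped under fast_compile; the added tag opts it in without touching its logic.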
@@ -105,7 +105,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
     algo = choice.algo;
 #else
     size_t free;
-    int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
+    int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);
     if (err2 != GA_NO_ERROR) {
       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
@@ -234,7 +234,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
    * to place a nice get_work_mem() function in.
    */
   if (worksize != 0) {
-    workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
+    workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
     if (workspace == NULL) {
       PyErr_SetString(PyExc_RuntimeError,
                       "Could not allocate working memory");
@@ -258,7 +258,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
               APPLY_SPECIFIC(output), PyGpuArray_DEV_DATA(*output));

   if (worksize != 0)
-    c->ops->buffer_release(workspace);
+    gpudata_release(workspace);

   cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
   cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
......
@@ -106,7 +106,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
     algo = choice.algo;
 #else
     size_t free;
-    int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
+    int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);
     if (err2 != GA_NO_ERROR) {
       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
@@ -204,7 +204,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
   }

   if (worksize != 0) {
-    workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
+    workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
     if (workspace == NULL) {
       PyErr_SetString(PyExc_RuntimeError,
                       "Could not allocate working memory");
@@ -227,7 +227,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
               APPLY_SPECIFIC(input), PyGpuArray_DEV_DATA(*input));

   if (worksize != 0)
-    c->ops->buffer_release(workspace);
+    gpudata_release(workspace);

   cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
   cuda_record(output->ga.data, GPUARRAY_CUDA_WAIT_READ);
......
@@ -107,7 +107,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
     algo = choice.algo;
 #else
     size_t free;
-    int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
+    int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);
     if (err2 != GA_NO_ERROR) {
       PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
@@ -192,7 +192,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
   }

   if (worksize != 0) {
-    workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
+    workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
     if (workspace == NULL) {
       PyErr_SetString(PyExc_RuntimeError, "Could not allocate working memory");
       cuda_exit(c->ctx);
@@ -214,7 +214,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
               APPLY_SPECIFIC(kerns), PyGpuArray_DEV_DATA(*kerns));

   if (worksize != 0)
-    c->ops->buffer_release(workspace);
+    gpudata_release(workspace);

   cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
   cuda_record(output->ga.data, GPUARRAY_CUDA_WAIT_READ);
......
@@ -199,7 +199,7 @@ class GpuElemwise(HideC, Elemwise):
                 typecode=o.type.typecode)
         res += """
-        ge = GpuElemwise_new(%(ctx)s->ops, %(ctx)s->ctx, %(support)s, %(kop)s, %(nargs)s, args, %(nd)s, 0);
+        ge = GpuElemwise_new(%(ctx)s->ctx, %(support)s, %(kop)s, %(nargs)s, args, %(nd)s, 0);
         if (ge == NULL) {
           PyErr_SetString(PyExc_RuntimeError, "Could not initialize elemwise support");
           %(fail)s
@@ -360,7 +360,7 @@ class GpuElemwise(HideC, Elemwise):
     def c_code_cache_version(self):
         ver = self.scalar_op.c_code_cache_version()
         if ver:
-            return (6, ver)
+            return (7, ver)
         else:
             return ver
@@ -554,7 +554,7 @@ class GpuCAReduceCuda(GpuKernelBase, HideC, CAReduceDtype):
     def make_node(self, x):
         x = as_gpuarray_variable(x, infer_context_name(x))
-        if x.type.context.kind != 'cuda':
+        if x.type.context.kind != b'cuda':
             raise TypeError("GpuCAReduceCuda doesn't work for non-cuda devices")
         ret = super(GpuCAReduceCuda, self).make_node(x)
         self = copy.copy(self)
......
@@ -26,11 +26,8 @@ class GpuCumsum(GpuKernelBase, Op):
     def __init__(self, axis):
         self.axis = axis

-    def __str__(self):
-        return "%s{%s}" % (self.__class__.__name__, self.axis)
-
-    def c_code_cache_version_apply(self, node):
-        return (1,)
+    def c_code_cache_version(self):
+        return (3,)

     def c_headers(self):
         return ['<numpy_compat.h>', '<gpuarray/types.h>', '<gpuarray_helper.h>']
@@ -221,7 +218,7 @@ class GpuCumsum(GpuKernelBase, Op):
         return kernels

     def c_code(self, node, nodename, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
             raise NotImplementedError("cuda only")
         x, = inp
         z, = out
@@ -249,17 +246,17 @@ class GpuCumsum(GpuKernelBase, Op):
         size_t max_threads_dim0;
         size_t max_grid_size1;
         size_t max_grid_size2;
         int err;
-        err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXLSIZE0, &max_threads_dim0);
+        err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXLSIZE0, &max_threads_dim0);
        if (err != GA_NO_ERROR){
            PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_threads_dims0");
            %(fail)s;
        }
-        err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXGSIZE1, &max_grid_size1);
+        err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXGSIZE1, &max_grid_size1);
        if (err != GA_NO_ERROR){
            PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_grid_size1");
            %(fail)s;
        }
-        err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXGSIZE2, &max_grid_size2);
+        err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXGSIZE2, &max_grid_size2);
        if (err != GA_NO_ERROR){
            PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_grid_size2");
            %(fail)s;
......
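A recurring change in these hunks is querying device limits through the flat `gpucontext_property()` entry point instead of the old `ctx->ops->property(...)` vtable call, with each failed query turned into a Python error. A hypothetical Python rendering of that "fetch limits or fail" pattern (the dict stands in for the GPU context):

```python
GA_NO_ERROR = 0

def gpucontext_property(props, key):
    """Return (err, value); a nonzero err means the query failed."""
    if key not in props:
        return (1, None)
    return (GA_NO_ERROR, props[key])

def fetch_limits(props):
    """Fetch every required launch limit, failing fast on the first miss."""
    limits = {}
    for key in ("MAXLSIZE0", "MAXGSIZE1", "MAXGSIZE2"):
        err, val = gpucontext_property(props, key)
        if err != GA_NO_ERROR:
            raise RuntimeError("Could not fetch " + key)
        limits[key] = val
    return limits
```

As in the C code, every property is checked individually before any kernel launch parameters are computed from it.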
@@ -117,7 +117,7 @@ int gemm16(PyGpuArrayObject *C, float alpha,
   if (48 < n128 && n128 <= 64) {
     n64 = n / 64;
     if (nprocs == 0)
-      if (A->ga.ops->property(A->context->ctx, NULL, NULL,
+      if (gpucontext_property(A->context->ctx,
                               GA_CTX_PROP_NUMPROCS, &nprocs)) {
         nprocs = 0;
         res = 1;
......
@@ -243,7 +243,7 @@ class GpuImages2Neibs(GpuKernelBase, Images2Neibs, Op):
         return kernels

     def c_code(self, node, name, inp, out, sub):
-        if node.inputs[0].type.context.kind != 'cuda':
+        if node.inputs[0].type.context.kind != b'cuda':
             raise NotImplementedError("cuda only")
         dtype_ten4 = node.inputs[0].dtype
         dtype_neib_shape = node.inputs[1].dtype
......
@@ -105,7 +105,7 @@ class Gemm16(COp):
         return """
         bcode = bin_%(name)s;
         sz = sizeof(bin_%(name)s);
-        if (GpuKernel_init(&k_%(name)s, c->ops, c->ctx, 1, &bcode, &sz,
+        if (GpuKernel_init(&k_%(name)s, c->ctx, 1, &bcode, &sz,
                            "hgemm_%(name)s", 13, types, GA_USE_BINARY, NULL)
             != GA_NO_ERROR) {
           PyErr_SetString(PyExc_RuntimeError, "Could not initialize kernel %(name)s");
......
...@@ -189,7 +189,7 @@ class GpuCrossentropySoftmaxArgmax1HotWithBias(GpuKernelBase, Op): ...@@ -189,7 +189,7 @@ class GpuCrossentropySoftmaxArgmax1HotWithBias(GpuKernelBase, Op):
flags=flags, objvar=k_var)] flags=flags, objvar=k_var)]
def c_code(self, node, nodename, inp, out, sub): def c_code(self, node, nodename, inp, out, sub):
if node.inputs[0].type.context.kind != 'cuda': if node.inputs[0].type.context.kind != b'cuda':
raise NotImplementedError('cuda only') raise NotImplementedError('cuda only')
typecode_x = pygpu.gpuarray.dtype_to_typecode(node.inputs[0].dtype)
typecode_b = pygpu.gpuarray.dtype_to_typecode(node.inputs[1].dtype)
@@ -375,7 +375,7 @@ class GpuCrossentropySoftmax1HotWithBiasDx(GpuKernelBase, Op):
        return ['<numpy_compat.h>', '<gpuarray/types.h>']
    def c_code(self, node, nodename, inp, out, sub):
-       if node.inputs[0].type.context.kind != 'cuda':
+       if node.inputs[0].type.context.kind != b'cuda':
            raise NotImplementedError("cuda only")
        typecode_dx = pygpu.gpuarray.dtype_to_typecode(node.outputs[0].dtype)
        itemsize_dnll = numpy.dtype(node.inputs[0].dtype).itemsize
@@ -584,7 +584,7 @@ class GpuSoftmax(GpuKernelBase, Op):
        return ['<numpy_compat.h>', '<gpuarray/types.h>']
    def c_code(self, node, nodename, inp, out, sub):
-       if node.inputs[0].type.context.kind != 'cuda':
+       if node.inputs[0].type.context.kind != b'cuda':
            raise NotImplementedError("cuda only")
        dtype_x = node.inputs[0].dtype
        work_x = work_dtype(dtype_x)
@@ -783,7 +783,7 @@ class GpuSoftmaxWithBias(GpuKernelBase, Op):
        return ['<numpy_compat.h>', '<gpuarray/types.h>']
    def c_code(self, node, nodename, inp, out, sub):
-       if node.inputs[0].type.context.kind != 'cuda':
+       if node.inputs[0].type.context.kind != b'cuda':
            raise NotImplementedError('cuda only')
        dtype_x = node.inputs[0].dtype
        dtype_b = node.inputs[1].dtype
...
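The repeated `'cuda'` → `b'cuda'` change matters because in Python 3 a `bytes` value never compares equal to a `str`, so a check against a text literal silently fails when the context's `kind` attribute is actually bytes. A minimal sketch (the sample `kind` value is an assumption for illustration):

```python
# In Python 3, bytes and str never compare equal, so a check against a
# text literal silently fails when the attribute is actually bytes.
kind = b'cuda'  # e.g. what a C-backed library might return

assert kind != 'cuda'                   # str comparison: always False in Python 3
assert kind == b'cuda'                  # bytes comparison: matches
assert kind.decode('ascii') == 'cuda'   # or normalize to str first
```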
@@ -33,12 +33,16 @@ from .basic_ops import (as_gpuarray_variable, infer_context_name,
                        GpuSplit, GpuContiguous, gpu_contiguous,
                        GpuAlloc, GpuAllocEmpty, GpuReshape,
                        GpuEye, gpu_join, GpuJoin)
-from .blas import (gpu_dot22, GpuGemv, GpuGemm, GpuGer, GpuGemmBatch,
-                   gpugemm_no_inplace, gpugemmbatch_no_inplace)
-from .blocksparse import GpuSparseBlockGemv, GpuSparseBlockOuter
-from .nnet import (GpuCrossentropySoftmaxArgmax1HotWithBias,
-                   GpuCrossentropySoftmax1HotWithBiasDx,
-                   GpuSoftmaxWithBias, GpuSoftmax)
+from .blas import (gpu_dot22, GpuGemm, GpuGer, GpuGemmBatch,
+                   gpugemm_no_inplace, gpugemm_inplace, gpugemmbatch_no_inplace,
+                   gpugemv_no_inplace, gpugemv_inplace)
+from .blocksparse import (GpuSparseBlockGemv, GpuSparseBlockOuter,
+                          gpu_sparse_block_outer, gpu_sparse_block_outer_inplace,
+                          gpu_sparse_block_gemv, gpu_sparse_block_gemv_inplace)
+from .nnet import (gpu_crossentropy_softmax_1hot_with_bias_dx,
+                   gpu_crossentropy_softmax_argmax_1hot_with_bias,
+                   gpu_softmax_with_bias, gpu_softmax)
from .elemwise import (GpuElemwise, GpuDimShuffle, GpuCAReduceCuda,
                       GpuCAReduceCPY)
from .subtensor import (GpuIncSubtensor, GpuSubtensor,
@@ -49,6 +53,7 @@ from .opt_util import alpha_merge, output_merge
_logger = logging.getLogger("theano.gpuarray.opt")
gpu_optimizer = EquilibriumDB()
gpu_cut_copies = EquilibriumDB()
@@ -146,7 +151,7 @@ def op_lifter(OP, cuda_only=False):
        # Check if we should replace
        if (not replace or
            (cuda_only and
-            get_context(context_name).kind != 'cuda')):
+            get_context(context_name).kind != b'cuda')):
            return False
        # tag the inputs with the context in case
@@ -643,7 +648,7 @@ def local_gpua_advanced_subtensor(node, context_name):
def local_gpua_advanced_incsubtensor(node, context_name):
    context = get_context(context_name)
    # This is disabled on non-cuda contexts
-   if context.kind != 'cuda':
+   if context.kind != b'cuda':
        return None
    x, y, ilist = node.inputs
@@ -674,12 +679,12 @@ def local_gpua_careduce(node, context_name):
    if isinstance(node.op.scalar_op, (scalar.Add, scalar.Mul,
                                      scalar.Maximum, scalar.Minimum)):
        ctx = get_context(context_name)
-       if ctx.kind == 'opencl':
+       if ctx.kind == b'opencl':
            op = GpuCAReduceCPY
            if node.op.scalar_op not in [scalar.add, scalar.mul]:
                # We don't support yet all reduction with cpy code.
                return
-       elif ctx.kind == 'cuda':
+       elif ctx.kind == b'cuda':
            op = GpuCAReduceCuda
        else:
            return False
@@ -711,18 +716,14 @@ def local_gpua_careduce(node, context_name):
            assert reduce_mask[a] == 0
            reduce_mask[a] = 1
-       shape_of = node.fgraph.shape_feature.shape_of
-       x_shape = shape_of[x]
-       new_in_shp = [x_shape[0]]
+       new_in_shp = [shape_i(x, 0)]
        new_mask = [reduce_mask[0]]
        for i in xrange(1, x.type.ndim):
            if reduce_mask[i] == reduce_mask[i - 1]:
-               new_in_shp[-1] *= x_shape[i]
+               new_in_shp[-1] *= shape_i(x, i)
            else:
                new_mask.append(reduce_mask[i])
-               new_in_shp.append(x_shape[i])
+               new_in_shp.append(shape_i(x, i))
        new_axis = []
        for idx, m in enumerate(new_mask):
            if m == 1:
@@ -744,8 +745,12 @@ def local_gpua_careduce(node, context_name):
            greduce(gpu_reshaped_x))
        if reduce_reshaped_x.ndim != node.outputs[0].ndim:
+           out_shp = []
+           for i in range(x.ndim):
+               if i not in node.op.axis:
+                   out_shp.append(shape_i(x, i))
            unreshaped_reduce = reduce_reshaped_x.reshape(
-               tensor.stack(shape_of[node.outputs[0]]))
+               tensor.stack(out_shp))
        else:
            unreshaped_reduce = reduce_reshaped_x
        return [unreshaped_reduce]
@@ -754,13 +759,19 @@ def local_gpua_careduce(node, context_name):
@register_opt('fast_compile')
@op_lifter([tensor.blas.Gemv, tensor.blas_c.CGemv])
def local_gpua_gemv(node, context_name):
-   return GpuGemv(inplace=node.op.inplace)
+   if node.op.inplace:
+       return gpugemv_inplace
+   else:
+       return gpugemv_no_inplace
@register_opt('fast_compile')
@op_lifter([tensor.blas.Gemm])
def local_gpua_gemm(node, context_name):
-   return GpuGemm(inplace=node.op.inplace)
+   if node.op.inplace:
+       return gpugemm_inplace
+   else:
+       return gpugemm_no_inplace
@register_opt('fast_compile')
@@ -834,7 +845,7 @@ def local_gpua_dot22scalar(node, context_name):
    x = as_gpuarray_variable(x, context_name)
    y = as_gpuarray_variable(y, context_name)
    z = GpuAllocEmpty(x.dtype, context_name)(x.shape[0], y.shape[1])
-   return [GpuGemm(inplace=False)(z, a, x, y, 0)]
+   return [gpugemm_no_inplace(z, a, x, y, 0)]
@register_opt('fast_compile')
@@ -846,25 +857,25 @@ def local_gpua_eye(node, context_name):
@register_opt('fast_compile')
@op_lifter([tensor.nnet.CrossentropySoftmaxArgmax1HotWithBias], cuda_only=True)
def local_gpua_crossentropysoftmaxargmax1hotwithbias(node, context_name):
-   return GpuCrossentropySoftmaxArgmax1HotWithBias()
+   return gpu_crossentropy_softmax_argmax_1hot_with_bias
@register_opt('fast_compile')
@op_lifter([tensor.nnet.CrossentropySoftmax1HotWithBiasDx], cuda_only=True)
def local_gpua_crossentropysoftmax1hotwithbiasdx(node, context_name):
-   return GpuCrossentropySoftmax1HotWithBiasDx()
+   return gpu_crossentropy_softmax_1hot_with_bias_dx
@register_opt('fast_compile')
@op_lifter([tensor.nnet.Softmax], cuda_only=True)
def local_gpua_softmax(node, context_name):
-   return GpuSoftmax()
+   return gpu_softmax
@register_opt('fast_compile')
@op_lifter([tensor.nnet.SoftmaxWithBias], cuda_only=True)
def local_gpua_softmaxwithbias(node, context_name):
-   return GpuSoftmaxWithBias()
+   return gpu_softmax_with_bias
@register_opt('fast_compile')
@@ -889,20 +900,26 @@ theano.tensor.nnet.conv2d()
@register_opt('fast_compile')
@op_lifter([SparseBlockGemv])
def local_lift_sparseblockgemv(node, context_name):
-   return GpuSparseBlockGemv(node.op.inplace)
+   if node.op.inplace:
+       return gpu_sparse_block_gemv_inplace
+   else:
+       return gpu_sparse_block_gemv
@register_opt('fast_compile')
@op_lifter([SparseBlockOuter])
def local_lift_sparseblockouter(node, context_name):
-   return GpuSparseBlockOuter(node.op.inplace)
+   if node.op.inplace:
+       return gpu_sparse_block_outer_inplace
+   else:
+       return gpu_sparse_block_outer
@register_inplace()
@local_optimizer([GpuSparseBlockGemv], inplace=True)
def local_inplace_sparseblockgemv(node):
    if isinstance(node.op, GpuSparseBlockGemv) and not node.op.inplace:
-       return [GpuSparseBlockGemv(inplace=True)(*node.inputs)]
+       return [gpu_sparse_block_gemv_inplace(*node.inputs)]
@register_inplace()
...
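The careduce rewrite above collapses runs of consecutive dimensions that share the same reduce flag into a single dimension, reduces, then reshapes the result back to the expected output shape. A hedged NumPy sketch of the same idea (the function name and mask convention are mine, not Theano's):

```python
import numpy as np

def collapsed_sum(x, reduce_mask):
    """Sum over axes where reduce_mask[i] == 1, by first merging
    consecutive axes with the same mask value into one axis."""
    new_shp = [x.shape[0]]
    new_mask = [reduce_mask[0]]
    for i in range(1, x.ndim):
        if reduce_mask[i] == reduce_mask[i - 1]:
            new_shp[-1] *= x.shape[i]        # merge with previous axis
        else:
            new_mask.append(reduce_mask[i])
            new_shp.append(x.shape[i])
    y = x.reshape(new_shp)
    axes = tuple(i for i, m in enumerate(new_mask) if m == 1)
    out = y.sum(axis=axes)
    # restore the expected output shape (the non-reduced input dims)
    out_shp = [s for s, m in zip(x.shape, reduce_mask) if m == 0]
    return out.reshape(out_shp)

x = np.arange(24.0).reshape(2, 3, 4)
# Axes 1 and 2 share the flag, so they are merged into one axis of size 12.
assert np.allclose(collapsed_sum(x, [0, 1, 1]), x.sum(axis=(1, 2)))
```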
@@ -18,7 +18,7 @@ from theano.tests import unittest_tools as utt
from ..type import (GpuArrayType, get_context,
                    gpuarray_shared_constructor)
from ..basic_ops import (
-   host_from_gpu, HostFromGpu, GpuFromHost, GpuReshape,
+   host_from_gpu, HostFromGpu, GpuFromHost, GpuReshape, GpuToGpu,
    GpuAlloc, GpuAllocEmpty, GpuContiguous,
    gpu_join, GpuJoin, GpuSplit, GpuEye, gpu_contiguous)
from ..subtensor import GpuSubtensor
@@ -182,6 +182,21 @@ def test_transfer_cpu_gpu():
    assert numpy.all(fv == av)
+def test_transfer_gpu_gpu():
+    g = GpuArrayType(dtype='float32', broadcastable=(False, False),
+                     context_name=test_ctx_name)()
+    av = numpy.asarray(rng.rand(5, 4), dtype='float32')
+    gv = gpuarray.array(av, context=get_context(test_ctx_name))
+    mode = mode_with_gpu.excluding('cut_gpua_host_transfers', 'local_cut_gpua_host_gpua')
+    f = theano.function([g], GpuToGpu(test_ctx_name)(g), mode=mode)
+    topo = f.maker.fgraph.toposort()
+    assert len(topo) == 1
+    assert isinstance(topo[0].op, GpuToGpu)
+    fv = f(gv)
+    assert GpuArrayType.values_eq(fv, gv)
def test_transfer_strided():
    # This is just to ensure that it works in theano
    # libgpuarray has a much more comprehensive suit of tests to
...
@@ -197,7 +197,7 @@ class test_GpuCAReduceCuda(test_GpuCAReduceCPY):
    def setUp(self):
        super(test_GpuCAReduceCuda, self).setUp()
-       if get_context(test_ctx_name).kind != 'cuda':
+       if get_context(test_ctx_name).kind != b'cuda':
            raise SkipTest("Cuda specific tests")
@@ -212,7 +212,7 @@ class T_gpureduce_dtype(test_elemwise.T_reduce_dtype):
                  'float32', 'float64']
    def setUp(self):
-       if get_context(test_ctx_name).kind != 'cuda':
+       if get_context(test_ctx_name).kind != b'cuda':
            raise SkipTest("Cuda specific tests")
...
@@ -24,7 +24,7 @@ class TestGpuCumsum(theano.tensor.tests.test_extra_ops.TestCumsumOp):
    def setUp(self):
        super(TestGpuCumsum, self).setUp()
        test_ctx = get_context(test_ctx_name)
-       if test_ctx.kind != 'cuda':
+       if test_ctx.kind != b'cuda':
            raise SkipTest("Cuda specific tests")
        self.max_threads_dim0 = test_ctx.maxlsize0
        self.max_grid_size1 = test_ctx.maxgsize2
...
@@ -125,7 +125,7 @@ def test_reduce():
    topo = f.maker.fgraph.toposort()
    ops = [type(node.op) for node in topo]
-   if kind == 'opencl' and method in ["max", "min"]:
+   if kind == b'opencl' and method in ["max", "min"]:
        assert not(GpuCAReduceCuda in ops or GpuCAReduceCPY in ops)
    else:
        assert GpuCAReduceCuda in ops or GpuCAReduceCPY in ops
...
@@ -56,3 +56,32 @@ def test_advinc_subtensor1():
    rep = xval.copy()
    rep[[0, 2]] += yval
    assert numpy.allclose(rval, rep)
+def test_incsub_f16():
+    shp = (3, 3)
+    shared = gpuarray_shared_constructor
+    xval = numpy.arange(numpy.prod(shp), dtype='float16').reshape(shp) + 1
+    yval = numpy.empty((2,) + shp[1:], dtype='float16')
+    yval[:] = 2
+    x = shared(xval, name='x')
+    y = tensor.tensor(dtype='float16',
+                      broadcastable=(False,) * len(shp),
+                      name='y')
+    expr = tensor.advanced_inc_subtensor1(x, y, [0, 2])
+    f = theano.function([y], expr, mode=mode_with_gpu)
+    assert sum([isinstance(node.op, GpuAdvancedIncSubtensor1)
+                for node in f.maker.fgraph.toposort()]) == 1
+    rval = f(yval)
+    rep = xval.copy()
+    rep[[0, 2]] += yval
+    assert numpy.allclose(rval, rep)
+    expr = tensor.inc_subtensor(x[1:], y)
+    f = theano.function([y], expr, mode=mode_with_gpu)
+    assert sum([isinstance(node.op, GpuIncSubtensor)
+                for node in f.maker.fgraph.toposort()]) == 1
+    rval = f(yval)
+    rep = xval.copy()
+    rep[1:] += yval
+    assert numpy.allclose(rval, rep)
@@ -301,20 +301,14 @@ class GpuArrayType(Type):
            raise NotImplementedError(
                "GpuArrayType.values_eq_approx() don't implemented the"
                " allow_remove_inf and allow_remove_nan parameter")
-       if a.dtype == 'float16' or b.dtype == 'float16':
-           an = numpy.asarray(a)
-           bn = numpy.asarray(b)
-           return tensor.TensorType.values_eq_approx(
-               an, bn, allow_remove_inf=allow_remove_inf,
-               allow_remove_nan=allow_remove_nan, rtol=rtol, atol=atol)
        atol_, rtol_ = theano.tensor.basic._get_atol_rtol(a, b)
        if rtol is not None:
            rtol_ = rtol
        if atol is not None:
            atol_ = atol
        res = elemwise2(a, '', b, a, odtype=numpy.dtype('bool'),
-                       op_tmpl="res[i] = (fabs(%%(a)s - %%(b)s) <"
-                               "(%(atol_)s + %(rtol_)s * fabs(%%(b)s)))" %
+                       op_tmpl="res = (fabs(a - b) <"
+                               "(%(atol_)s + %(rtol_)s * fabs(b)))" %
                        locals())
        ret = numpy.asarray(res).all()
        if ret:
...
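The `op_tmpl` above implements the usual element-wise tolerance test `|a - b| < atol + rtol * |b|` on the GPU, reduced with `all()`. The same check written out in NumPy terms (a sketch of the formula, not the Theano code path):

```python
import numpy as np

def values_eq_approx(a, b, rtol=1e-5, atol=1e-8):
    # Element-wise |a - b| < atol + rtol * |b|, reduced with all(),
    # mirroring the kernel template res = (fabs(a - b) < atol + rtol * fabs(b)).
    return bool(np.all(np.abs(a - b) < atol + rtol * np.abs(b)))

a = np.array([1.0, 2.0, 3.0])
assert values_eq_approx(a, a + 1e-9)       # within tolerance
assert not values_eq_approx(a, a + 1.0)    # clearly outside tolerance
```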
@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000,
    t0 = 0
    t1 = -1
+   f()  # Ignore first function call to get representative time.
    if execute:
        sync = (hasattr(theano, "sandbox") and
                hasattr(theano.sandbox, "cuda") and
                theano.sandbox.cuda.cuda_available)
+       sync2 = (hasattr(theano, "gpuarray") and
+                theano.gpuarray.pygpu_activated)
        t0 = time.time()
        for i in range(iters):
            f()
        if sync:
            theano.sandbox.cuda.synchronize()
+       if sync2:
+           c.get_value(borrow=True, return_internal_type=True).sync()
        t1 = time.time()
    return t1 - t0, impl
@@ -244,6 +249,7 @@ if __name__ == "__main__":
cuda version 7.5 7.0 6.5
gpu
+M40 0.47s
k80 0.96s
K6000/NOECC 0.69s
K40 0.88s
...
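The benchmark changes follow the standard GPU-timing pattern: call the function once before starting the clock, so compilation and first-call overhead are excluded, and synchronize before reading the end time, so asynchronously queued kernels are actually counted. A generic sketch, with `f` and `sync` standing in for the compiled function and the backend's synchronize call:

```python
import time

def bench(f, iters, sync=None):
    f()  # warm-up: exclude compilation / first-call overhead
    t0 = time.time()
    for _ in range(iters):
        f()
    if sync is not None:
        sync()  # wait for queued async GPU work before stopping the clock
    return time.time() - t0

elapsed = bench(lambda: sum(range(1000)), iters=10)
assert elapsed >= 0.0
```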
@@ -2526,7 +2526,8 @@ if True:
        out = as_cuda_ndarray_variable(out.dimshuffle(0, 1))
        return [out]
-@register_opt('cudnn')
+@register_opt('cudnn', 'stabilize', 'fast_compile')
+# We put fast_compile as otherwise it won't be on the GPU.
@local_optimizer([GpuElemwise, LogSoftmax])
def local_log_softmax_dnn(node):
    # The log-softmax implementation is only available starting at cuDNN V3
...
@@ -14,6 +14,7 @@ from . import dnn
import theano
from theano import scalar as scal
from theano import config, tensor, gof
+from theano.compile.ops import shape_i
import theano.ifelse
import theano.tensor.signal.pool
import theano.tensor.nnet
@@ -900,18 +901,14 @@ def local_gpu_careduce(node):
            # to make them a single dimension, do the reduction, and
            # then reshape to get them back.
-           shape_of = node.fgraph.shape_feature.shape_of
-           x_shape = shape_of[x]
-           new_in_shp = [x_shape[0]]
+           new_in_shp = [shape_i(x, 0)]
            new_mask = [reduce_mask[0]]
            for i in xrange(1, x.type.ndim):
                if reduce_mask[i] == reduce_mask[i - 1]:
-                   new_in_shp[-1] *= x_shape[i]
+                   new_in_shp[-1] *= shape_i(x, i)
                else:
                    new_mask.append(reduce_mask[i])
-                   new_in_shp.append(x_shape[i])
+                   new_in_shp.append(shape_i(x, i))
            new_greduce = GpuCAReduce(new_mask, scalar_op)
            new_x = x.reshape(tensor.stack(new_in_shp))
@@ -936,8 +933,11 @@ def local_gpu_careduce(node):
            # Restore the expected shape of the output
            if rval.ndim != out.ndim:
-               rval = rval.reshape(
-                   tensor.stack(shape_of[out]))
+               out_shp = []
+               for i in range(x.ndim):
+                   if i not in node.op.axis:
+                       out_shp.append(shape_i(x, i))
+               rval = rval.reshape(tensor.stack(out_shp))
            if rval.type == out.type:
                return [rval]
...
@@ -4,6 +4,7 @@ which refered to theano.sandbox.gpuarray."""
import warnings
from theano.gpuarray import *
-message = "theano.sandbox.gpuarray has been moved to theano.gpuarray." + \
-          " Please update your code and pickles."
+message = ("theano.sandbox.gpuarray has been moved to theano.gpuarray. "
+           "Please update your code and pickles. If the warning persists, "
+           "clear theano's cache ('$theano/bin/theano-cache clear').")
warnings.warn(message)
@@ -2543,7 +2543,7 @@ class Log2(UnaryScalarOp):
        else:
            return [x.zeros_like()]
-       return gz / (x * math.log(2.0)),
+       return gz / (x * numpy.asarray(math.log(2.0)).astype(x.dtype)),
    def c_code(self, node, name, inputs, outputs, sub):
        (x,) = inputs
...
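The `Log2` gradient change avoids an accidental upcast: in Theano's symbolic graph a bare Python float constant can promote a float32 expression to float64, while first casting the constant to the input's dtype keeps the gradient in the input's precision. A NumPy sketch of the dtype-preserving cast:

```python
import math
import numpy as np

x = np.ones(3, dtype='float32')

# Cast the Python-float constant to x's dtype before using it, so the
# gradient expression cannot be promoted to a wider float type.
c = np.asarray(math.log(2.0)).astype(x.dtype)
assert c.dtype == np.float32
assert (x * c).dtype == np.float32
```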
@@ -202,7 +202,7 @@ def remove_constants_and_unused_inputs_scan(node):
        # DEBUG CHECK
        nwScan = scan_op.Scan(nw_inner, op_outs, nw_info)
        nw_outs = nwScan(*nw_outer, **dict(return_list=True))
-       return dict([("remove", [node])] + list(zip(node.outputs, nw_outs)))
+       return OrderedDict([("remove", [node])] + list(zip(node.outputs, nw_outs)))
    else:
        return False
@@ -2072,8 +2072,8 @@ def scan_merge_inouts(node):
        new_outer_out_mit_mot.append(outer_omm)
    na.outer_out_mit_mot = new_outer_out_mit_mot
    if remove:
-       return dict([("remove", remove)] +
-                   list(zip(node.outputs, na.outer_outputs)))
+       return OrderedDict([("remove", remove)] +
+                          list(zip(node.outputs, na.outer_outputs)))
    return na.outer_outputs
...
@@ -612,14 +612,14 @@ def get_scalar_constant_value(orig_v, elemwise=True,
        return numpy.asarray(v)
    if isinstance(v, numpy.ndarray):
-       return numpy_scalar(v)
+       return numpy_scalar(v).copy()
    if isinstance(v, Constant):
        if getattr(v.tag, 'unique_value', None) is not None:
            data = v.tag.unique_value
        else:
            data = v.data
-       return numpy_scalar(data)
+       return numpy_scalar(data).copy()
    if not only_process_constants and getattr(v, 'owner', None):
        if isinstance(v.owner.op, (Alloc, DimShuffle, Rebroadcast,
@@ -649,7 +649,7 @@ def get_scalar_constant_value(orig_v, elemwise=True,
                     for i in v.owner.inputs]
            ret = [[None]]
            v.owner.op.perform(v.owner, const, ret)
-           return ret[0][0]
+           return ret[0][0].copy()
        elif elemwise and isinstance(v.owner.op, Elemwise):
            if isinstance(v.owner.op.scalar_op, scal.Second):
                # We don't need both input to be constant for second
@@ -662,13 +662,13 @@ def get_scalar_constant_value(orig_v, elemwise=True,
                     for i in v.owner.inputs]
            ret = [[None]]
            v.owner.op.perform(v.owner, const, ret)
-           return ret[0][0]
+           return ret[0][0].copy()
        elif (isinstance(v.owner.op, theano.tensor.subtensor.Subtensor) and
              v.ndim == 0):
            if isinstance(v.owner.inputs[0], TensorConstant):
                cdata = tuple(v.owner.op.get_constant_idx(v.owner.inputs))
                try:
-                   return v.owner.inputs[0].data.__getitem__(cdata)
+                   return v.owner.inputs[0].data.__getitem__(cdata).copy()
                except IndexError:
                    raise IndexError(
                        str(tuple(v.owner.op.idx_list)) +
@@ -1399,8 +1399,6 @@ class MaxAndArgmax(Op):
        %(axis_code)s
        %(max)s = (PyArrayObject*)PyArray_Max(%(x)s, axis, NULL);
        if(%(max)s == NULL){
-           PyErr_SetString(PyExc_ValueError,
-               "MaxAndArgmax, max failed");
            %(fail)s;
        }
        if(!PyArray_CheckExact(%(max)s)){
@@ -1412,7 +1410,6 @@ class MaxAndArgmax(Op):
        %(argmax)s = (PyArrayObject*)PyArray_ArgMax(%(x)s, axis, NULL);
        if(%(argmax)s == NULL){
-           PyErr_SetString(PyExc_ValueError, "MaxAndArgmax, argmax failed");
            Py_CLEAR(%(max)s);
            %(fail)s;
        }
@@ -1434,7 +1431,7 @@ class MaxAndArgmax(Op):
        return ret % locals()
    def c_code_cache_version(self):
-       return (3,)
+       return (4,)
    def infer_shape(self, node, shapes):
        ishape, axis_shape = shapes
...
@@ -152,6 +152,7 @@ from theano.tensor import basic as T
from theano.tensor.blas_headers import blas_header_text
from theano.tensor.blas_headers import blas_header_version
from theano.tensor.opt import in2out, local_dimshuffle_lift
+from theano.tensor.type import values_eq_approx_remove_inf_nan
_logger = logging.getLogger('theano.tensor.blas')
@@ -1435,7 +1436,8 @@ class GemmOptimizer(Optimizer):
            if new_node is not node:
                nodelist.append(new_node)
-       u = theano.gof.opt.Updater(on_import, None, None)
+       u = theano.gof.opt.Updater(on_import, None, None,
+                                  name="GemmOptimizer")
        fgraph.attach_feature(u)
        while did_something:
            nb_iter += 1
@@ -1465,6 +1467,7 @@ class GemmOptimizer(Optimizer):
            if new_outputs:
                new_outputs, old_dot22 = new_outputs
                assert len(new_outputs) == len(node.outputs)
+               new_outputs[0].tag.values_eq_approx = values_eq_approx_remove_inf_nan
                try:
                    fgraph.replace_all_validate_remove(
                        list(zip(node.outputs, new_outputs)),
...
@@ -726,3 +726,62 @@ def norm(x, ord):
        raise ValueError(0)
    elif ndim > 2:
        raise NotImplementedError("We don't support norm witn ndim > 2")
+class TensorInv(Op):
+    """
+    Class wrapper for tensorinv() function;
+    Theano utilization of numpy.linalg.tensorinv;
+    """
+    _numop = staticmethod(numpy.linalg.tensorinv)
+    __props__ = ('ind',)
+
+    def __init__(self, ind=2):
+        self.ind = ind
+
+    def make_node(self, a):
+        a = as_tensor_variable(a)
+        out = a.type()
+        return Apply(self, [a], [out])
+
+    def perform(self, node, inputs, outputs):
+        (a,) = inputs
+        (x,) = outputs
+        x[0] = self._numop(a, self.ind)
+
+    def infer_shape(self, node, shapes):
+        sp = shapes[0][self.ind:] + shapes[0][:self.ind]
+        return [sp]
+
+def tensorinv(a, ind=2):
+    """
+    Does not run on GPU;
+    Theano utilization of numpy.linalg.tensorinv;
+
+    Compute the 'inverse' of an N-dimensional array.
+    The result is an inverse for `a` relative to the tensordot operation
+    ``tensordot(a, b, ind)``, i. e., up to floating-point accuracy,
+    ``tensordot(tensorinv(a), a, ind)`` is the "identity" tensor for the
+    tensordot operation.
+
+    Parameters
+    ----------
+    a : array_like
+        Tensor to 'invert'. Its shape must be 'square', i. e.,
+        ``prod(a.shape[:ind]) == prod(a.shape[ind:])``.
+    ind : int, optional
+        Number of first indices that are involved in the inverse sum.
+        Must be a positive integer, default is 2.
+
+    Returns
+    -------
+    b : ndarray
+        `a`'s tensordot inverse, shape ``a.shape[ind:] + a.shape[:ind]``.
+
+    Raises
+    ------
+    LinAlgError
+        If `a` is singular or not 'square' (in the above sense).
+    """
+    return TensorInv(ind)(a)
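Since the new `tensorinv` wraps `numpy.linalg.tensorinv` directly, its behavior can be checked against NumPy: the result is the inverse with respect to `tensordot(a, b, ind)`, recovering the identity up to floating point.

```python
import numpy as np

# A 'square' tensor: prod(shape[:ind]) == prod(shape[ind:]) with ind=2.
rng = np.random.RandomState(42)
a = rng.rand(4, 6, 8, 3)          # 4*6 == 8*3 == 24
ainv = np.linalg.tensorinv(a, ind=2)

# ainv has shape a.shape[ind:] + a.shape[:ind] ...
assert ainv.shape == (8, 3, 4, 6)

# ... and tensordot(ainv, a, ind) is the identity tensor.
ident = np.tensordot(ainv, a, axes=2)
expected = np.eye(24).reshape(8, 3, 8, 3)
assert np.allclose(ident, expected)
```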
...@@ -413,6 +413,7 @@ log1msigm_to_softplus = gof.PatternSub( ...@@ -413,6 +413,7 @@ log1msigm_to_softplus = gof.PatternSub(
values_eq_approx=values_eq_approx_remove_inf, values_eq_approx=values_eq_approx_remove_inf,
skip_identities_fn=_skip_mul_1) skip_identities_fn=_skip_mul_1)
log1pexp_to_softplus = gof.PatternSub( log1pexp_to_softplus = gof.PatternSub(
(tensor.log1p, (tensor.log1p,
(tensor.exp, 'x')), (tensor.exp, 'x')),
...@@ -420,12 +421,20 @@ log1pexp_to_softplus = gof.PatternSub( ...@@ -420,12 +421,20 @@ log1pexp_to_softplus = gof.PatternSub(
values_eq_approx=values_eq_approx_remove_inf, values_eq_approx=values_eq_approx_remove_inf,
allow_multiple_clients=True) allow_multiple_clients=True)
log1p_neg_sigmoid = gof.PatternSub(
(tensor.log1p,
(tensor.neg, (sigmoid, 'x'))),
(tensor.neg, (softplus, 'x')),
values_eq_approx=values_eq_approx_remove_inf,
allow_multiple_clients=True)
 opt.register_stabilize(logsigm_to_softplus, name='logsigm_to_softplus')
 opt.register_stabilize(log1msigm_to_softplus, name='log1msigm_to_softplus')
 opt.register_stabilize(log1pexp_to_softplus, name='log1pexp_to_softplus')
+opt.register_stabilize(log1p_neg_sigmoid, name='log1p_neg_sigmoid')

-def is_1pexp(t):
+def is_1pexp(t, only_process_constants=True):
     """
     Returns
@@ -437,8 +446,9 @@ def is_1pexp(t):
     """
     if t.owner and t.owner.op == tensor.add:
         scalars, scalar_inputs, nonconsts = \
-            opt.scalarconsts_rest(t.owner.inputs)
-        # scalar_inputs are potentially dimshuffled and fill'd scalars
+            opt.scalarconsts_rest(t.owner.inputs,
+                                  only_process_constants=only_process_constants)
+        # scalar_inputs are potentially dimshuffled and filled with scalars
         if len(nonconsts) == 1:
             maybe_exp = nonconsts[0]
             if maybe_exp.owner and maybe_exp.owner.op == tensor.exp:
@@ -947,7 +957,7 @@ def local_inv_1_plus_exp(node):
     inv_arg = node.inputs[0]
     if inv_arg.owner and inv_arg.owner.op == tensor.add:
         scalars, scalar_inputs, nonconsts = \
-            opt.scalarconsts_rest(inv_arg.owner.inputs)
+            opt.scalarconsts_rest(inv_arg.owner.inputs, only_process_constants=True)
         # scalar_inputs are potentially dimshuffled and fill'd scalars
         if len(nonconsts) == 1:
             if nonconsts[0].owner and nonconsts[0].owner.op == tensor.exp:
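The stabilization rewrites registered above replace expressions such as log(sigmoid(x)) with softplus forms. A minimal standalone sketch of why this matters (the helper names below are illustrative, not Theano's): the naive expression overflows inside exp() for large negative inputs, while the softplus form stays finite.

```python
import math

def softplus(x):
    # numerically stable softplus: log(1 + exp(x)) without overflow
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def log_sigmoid_naive(x):
    # overflows in exp() once -x is large (e.g. x = -800)
    return math.log(1.0 / (1.0 + math.exp(-x)))

def log_sigmoid_stable(x):
    # the identity the rewrite exploits: log(sigmoid(x)) == -softplus(-x)
    return -softplus(-x)
```

At x = -800 the naive form raises OverflowError, while the stable form returns -800 as expected; for moderate x the two agree to machine precision.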
@@ -356,7 +356,6 @@ class T_sigmoid_opts(unittest.TestCase):
         f = theano.function([x], s, mode=mode)
         assert hasattr(f.maker.fgraph.outputs[0].tag, 'trace')
         topo = f.maker.fgraph.toposort()
-        assert len(topo) > 1
         assert not any([n.op == sigmoid for n in topo])
         ux_v = f([[-50, -10, -4, -1, 0, 1, 4, 10, 50]])
@@ -467,15 +466,17 @@ class T_sigmoid_utils(unittest.TestCase):
         try:
             x = tensor.vector('x')
             exp = tensor.exp
-            assert is_1pexp(1 + exp(x)) == (False, x)
-            assert is_1pexp(exp(x) + 1) == (False, x)
-            for neg, exp_arg in imap(is_1pexp, [(1 + exp(-x)), (exp(-x) + 1)]):
+            assert is_1pexp(1 + exp(x), False) == (False, x)
+            assert is_1pexp(exp(x) + 1, False) == (False, x)
+            for neg, exp_arg in imap(lambda x:
+                                     is_1pexp(x, only_process_constants=False),
+                                     [(1 + exp(-x)), (exp(-x) + 1)]):
                 assert not neg and theano.gof.graph.is_same_graph(exp_arg, -x)
-            assert is_1pexp(1 - exp(x)) is None
-            assert is_1pexp(2 + exp(x)) is None
-            assert is_1pexp(exp(x) + 2) is None
-            assert is_1pexp(exp(x) - 1) is None
-            assert is_1pexp(-1 + exp(x)) is None
-            assert is_1pexp(1 + 2 * exp(x)) is None
+            assert is_1pexp(1 - exp(x), False) is None
+            assert is_1pexp(2 + exp(x), False) is None
+            assert is_1pexp(exp(x) + 2, False) is None
+            assert is_1pexp(exp(x) - 1, False) is None
+            assert is_1pexp(-1 + exp(x), False) is None
+            assert is_1pexp(1 + 2 * exp(x), False) is None
         finally:
             config.warn.identify_1pexp_bug = backup
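The updated test pins the new keyword argument through a lambda so that `imap` can still drive the loop. Pre-binding the keyword with functools.partial expresses the same thing; the `is_1pexp_demo` stand-in below is hypothetical and only mirrors the helper's signature.

```python
from functools import partial

def is_1pexp_demo(t, only_process_constants=True):
    # hypothetical stand-in mirroring the signature of the helper in the diff
    return (t, only_process_constants)

# bind the keyword once instead of wrapping every call in a lambda
check = partial(is_1pexp_demo, only_process_constants=False)
results = list(map(check, [1, 2]))
```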
@@ -186,8 +186,12 @@ class Pool(Op):
         if st is None:
             st = ds
         r, c = imgshape[-2:]
-        r += padding[0] * 2
-        c += padding[1] * 2
+        r = tensor.extract_constant(r)
+        c = tensor.extract_constant(c)
+        if padding[0]:
+            r += padding[0] * 2
+        if padding[1]:
+            c += padding[1] * 2
         if ignore_border:
             if ds[0] == st[0]:
@@ -216,7 +220,7 @@ class Pool(Op):
             elif st[0] >= ds[0]:
                 nr = (r - 1) // st[0] + 1
             else:
-                nr = max(0, (r - 1 - ds[0]) // st[0] + 1) + 1
+                nr = max(0, (r - 1 - ds[0] + st[0]) // st[0]) + 1
         if isinstance(c, theano.Variable):
             nc = tensor.switch(tensor.ge(st[1], ds[1]),
@@ -226,7 +230,7 @@ class Pool(Op):
             elif st[1] >= ds[1]:
                 nc = (c - 1) // st[1] + 1
             else:
-                nc = max(0, (c - 1 - ds[1] + st[1]) // st[1]) + 1
+                nc = max(0, (c - 1 - ds[1] + st[1]) // st[1]) + 1
         rval = list(imgshape[:-2]) + [nr, nc]
         return rval
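Under Python's floor division the rewritten nr/nc expression is algebraically identical to the old one, since (a + st) // st == a // st + 1. The same formula is also emitted as C code later in this file, where integer division truncates toward zero, and there the two forms can differ by one when the numerator is negative; that is plausibly what motivated the rewrite. A standalone check, with `c_div` emulating C semantics:

```python
def floor_div(a, b):
    # Python's //: floor division, rounds toward negative infinity
    return a // b

def c_div(a, b):
    # emulate C integer division, which truncates toward zero instead
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def nr_before(r, ds, st, div):
    # output row count, expression before this change
    return max(0, div(r - 1 - ds, st) + 1) + 1

def nr_after(r, ds, st, div):
    # output row count, expression after this change
    return max(0, div(r - 1 - ds + st, st)) + 1
```

For example, with r=4, ds=4, st=2 the numerator r - 1 - ds is -1; the old expression evaluated with C-style division overcounts the rows, while the new one does not.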
@@ -257,10 +261,10 @@ class Pool(Op):
         self.mode = mode

     def make_node(self, x):
-        if x.type.ndim != 4:
-            raise TypeError()
         # TODO: consider restricting the dtype?
         x = tensor.as_tensor_variable(x)
+        if x.type.ndim != 4:
+            raise TypeError()
         # If the input shape are broadcastable we can have 0 in the output shape
         broad = x.broadcastable[:2] + (False, False)
         out = tensor.TensorType(x.dtype, broad)
@@ -274,6 +278,9 @@ class Pool(Op):
                 'Pool requires 4D input for now')
         z_shape = self.out_shape(x.shape, self.ds, self.ignore_border, self.st,
                                  self.padding)
+        if not self.ignore_border:
+            assert z_shape[2] > 0
+            assert z_shape[3] > 0
         if (z[0] is None) or (z[0].shape != z_shape):
             z[0] = numpy.empty(z_shape, dtype=x.dtype)
         zz = z[0]
@@ -403,7 +410,7 @@ class Pool(Op):
                 }
                 else
                 {
-                    z_r = std::max(0, (r - 1 - %(ds0)s) / %(st0)s + 1) + 1;
+                    z_r = std::max(0, (r - 1 - %(ds0)s + %(st0)s) / %(st0)s) + 1;
                 }
                 // decide how many columns the output has
                 if (%(st1)s >= %(ds1)s)
@@ -412,8 +419,10 @@ class Pool(Op):
                 }
                 else
                 {
-                    z_c = std::max(0, (c - 1 - %(ds1)s) / %(st1)s + 1) + 1;
+                    z_c = std::max(0, (c - 1 - %(ds1)s + %(st1)s) / %(st1)s) + 1;
                 }
+                assert(z_r > 0);
+                assert(z_c > 0);
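The C snippets above are Python %-format templates later rendered with a mapping of parameter names, so a mistyped key (e.g. `%(st0)s` where `%(st1)s` is meant) silently substitutes the row stride for the column stride without any error. A minimal illustration of that templating mechanism (the parameter values are made up):

```python
# hypothetical parameter values, for illustration only
params = {'ds1': 3, 'st0': 2, 'st1': 1}

# every %(name)s placeholder is replaced by str(params[name])
ccode = "z_c = std::max(0, (c - 1 - %(ds1)s + %(st1)s) / %(st1)s) + 1;"
rendered = ccode % params
```

A wrong key would still render successfully as long as it exists in the mapping, which is why such substitution bugs only surface as wrong output shapes at run time.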
                 }
             // memory allocation of z if necessary
             if ((!%(z)s)
@@ -522,7 +531,7 @@ class Pool(Op):
         return ccode % locals()

     def c_code_cache_version(self):
-        return (0, 6, 8, 3)
+        return (0, 6, 8, 4)

 class PoolGrad(Op):
@@ -632,12 +641,12 @@ class MaxPoolGrad(PoolGrad):
     def make_node(self, x, maxout, gz):
         # make_node should only be called by the grad function of
         # Pool, so these asserts should not fail.
-        assert isinstance(x, Variable) and x.ndim == 4
-        assert isinstance(maxout, Variable) and maxout.ndim == 4
-        assert isinstance(gz, Variable) and gz.ndim == 4
         x = tensor.as_tensor_variable(x)
         maxout = tensor.as_tensor_variable(maxout)
         gz = tensor.as_tensor_variable(gz)
+        assert isinstance(x, Variable) and x.ndim == 4
+        assert isinstance(maxout, Variable) and maxout.ndim == 4
+        assert isinstance(gz, Variable) and gz.ndim == 4
         return Apply(self, [x, maxout, gz], [x.type()])
@@ -814,10 +823,10 @@ class AveragePoolGrad(PoolGrad):
     def make_node(self, x, gz, dummy=None):
         # make_node should only be called by the grad function of
         # Pool, so these asserts should not fail.
-        assert isinstance(x, Variable) and x.ndim == 4
-        assert isinstance(gz, Variable) and gz.ndim == 4
         x = tensor.as_tensor_variable(x)
         gz = tensor.as_tensor_variable(gz)
+        assert isinstance(x, Variable) and x.ndim == 4
+        assert isinstance(gz, Variable) and gz.ndim == 4
         return Apply(self, [x, gz], [x.type()])
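The make_node reorderings above all follow the same convert-then-validate pattern: call as_tensor_variable first, so raw inputs such as nested lists are wrapped before their rank is checked. A NumPy sketch of the idea (`to_tensor` is a stand-in, not Theano's API):

```python
import numpy as np

def to_tensor(x):
    # stand-in for tensor.as_tensor_variable: coerce input to an array
    return np.asarray(x)

def make_node_sketch(x):
    x = to_tensor(x)       # convert first...
    assert x.ndim == 4     # ...then validate the rank, as in the reordered code
    return x

# a plain nested list has no .ndim attribute, so checking rank before
# conversion would raise AttributeError instead of a clean failure
```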