Commit 0a7a4c06 authored by Chinnadhurai Sankar

Merge branch 'master' of git://github.com/Theano/Theano

......@@ -37,3 +37,4 @@ Theano.suo
.ipynb_checkpoints
.pydevproject
.ropeproject
core
\ No newline at end of file
......@@ -10,15 +10,14 @@ Related Projects:
https://github.com/Theano/Theano/wiki/Related-projects
We recommend you look at the documentation on the website, since it
will be more current than the documentation included with the package.
If you really wish to build the documentation yourself, you will need
sphinx. Issue the following command:
It is recommended that you look at the documentation on the website, as it will be more current than the documentation included with the package.
In order to build the documentation yourself, you will need sphinx. Issue the following command:
python ./doc/scripts/docgen.py
Documentation is built into html/
The PDF of the documentation is html/theano.pdf
The PDF of the documentation can be found at html/theano.pdf
DIRECTORY LAYOUT
......@@ -31,7 +30,7 @@ Theano (current directory) is the distribution directory.
* tensor depends upon scalar
* sparse depends upon tensor
* sandbox can depend on everything else
* Theano/examples are copies of the example on the wiki
* Theano/examples are copies of the examples found on the wiki
* Theano/benchmark and Theano/examples are in the distribution, but not in
the Python package
* Theano/bin contains executable scripts that are copied to the bin folder
......@@ -39,4 +38,4 @@ Theano (current directory) is the distribution directory.
* Tests are distributed and are part of the package, i.e. fall in
the appropriate submodules
* Theano/doc contains files and scripts used to generate the documentation
* Theano/html is the place where the documentation will be generated
* Theano/html is where the documentation will be generated
......@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
Testing GPU Ops
^^^^^^^^^^^^^^^
Ops to be executed on the GPU should inherit from the
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
When using the old GPU backend, Ops to be executed on the GPU should inherit
from ``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
Theano to distinguish them. Currently, we use this to test if the
NVIDIA driver works correctly with our sum reduction code on the GPU.
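The inheritance check described above can be sketched in plain Python; the class names mirror the old backend, but this is an illustration, not Theano's actual code:

```python
class Op(object):
    """Base class for all Ops (sketch)."""

class GpuOp(Op):
    """Marker base class: Ops meant to run on the GPU inherit from this."""

def runs_on_gpu(op):
    # Theano can tell GPU Ops apart with a simple isinstance check
    return isinstance(op, GpuOp)
```

With these definitions, ``runs_on_gpu(GpuOp())`` is true while ``runs_on_gpu(Op())`` is false, which is all the distinction the test machinery needs.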
......
......@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add
If you want GPU-related tests to run on a specific GPU device, and not
the default one, you should use :attr:`~config.init_gpu_device`.
For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=gpu1``.
For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=cuda1``.
See :ref:`libdoc_config` for more information on how to change these
configuration options.
......@@ -508,25 +508,25 @@ Any one of them is enough.
:ref:`Ubuntu instructions <install_ubuntu_gpu>`.
Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your
computer, and set the default floating point computations to float32.
For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu,floatX=float32'``.
For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=cuda,floatX=float32'``.
You can also set these options in the .theanorc file's ``[global]`` section:
.. code-block:: cfg
[global]
device = gpu
device = cuda
floatX = float32
Note that:
* If your computer has multiple GPUs and you use 'device=gpu', the driver
selects the one to use (usually gpu0).
* You can use the program nvida-smi to change this policy.
* You can choose one specific GPU by specifying 'device=gpuX', with X the
* If your computer has multiple GPUs and you use 'device=cuda', the driver
selects the one to use (usually cuda0).
* You can use the program ``nvidia-smi`` to change this policy.
* You can choose one specific GPU by specifying 'device=cudaX', with X the
corresponding GPU index (0, 1, 2, ...).
* By default, when ``device`` indicates preference for GPU computations,
Theano will fall back to the CPU if there is a problem with the GPU.
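For example, to pin computations to the second GPU with the new backend, the ``.theanorc`` could read (a sketch of the flags discussed above):

```cfg
[global]
device = cuda1
floatX = float32
```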
......@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats:
toggle your GPU on, which can be done with
`gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__.
Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
Once your setup is complete, head to :ref:`using_gpu` to find out how to verify
that everything is working properly.
......
......@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04:
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip install Theano
On 14.04, this will install Python 2 by default. If you want to use Python 3:
.. code-block:: bash
......@@ -104,30 +104,30 @@ For Ubuntu 11.04:
The development version of Theano supports Python 3.3 and
probably supports Python 3.2, but we do not test on it.
Bleeding Edge Installs
----------------------
If you would like, instead, to install the bleeding edge Theano (from github)
such that you can edit and contribute to Theano, replace the `pip install Theano`
If you would like, instead, to install the bleeding edge Theano (from github)
such that you can edit and contribute to Theano, replace the `pip install Theano`
command with:
.. code-block:: bash
git clone git://github.com/Theano/Theano.git
cd Theano
cd Theano
python setup.py develop --user
cd ..
VirtualEnv
----------
If you would like to install Theano in a VirtualEnv, you will want to pass the
`--system-site-packages` flag when creating the VirtualEnv so that it will pick up
If you would like to install Theano in a VirtualEnv, you will want to pass the
`--system-site-packages` flag when creating the VirtualEnv so that it will pick up
the system-provided `Numpy` and `SciPy`.
.. code-block:: bash
virtualenv --system-site-packages -p python2.7 theano-env
source theano-env/bin/activate
pip install Theano
......@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs
Change to the Theano directory and run:
.. code-block:: bash
git pull
......@@ -303,7 +303,7 @@ Test GPU configuration
.. code-block:: bash
THEANO_FLAGS=floatX=float32,device=gpu python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
THEANO_FLAGS=floatX=float32,device=cuda python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
.. note::
......
......@@ -423,16 +423,16 @@ Create a test file containing:
print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %(
np_end-np_start, t_end-t_start))
print("Result difference: %f" % (np.abs(AB-tAB).max(), ))
.. testoutput::
:hide:
:options: +ELLIPSIS
NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!)
Result difference: ...
.. code-block:: none
NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!)
Result difference: 0.000000
......@@ -445,6 +445,8 @@ routine for matrix multiplication)
Configure Theano for GPU use
############################
Install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_ if you have not already done so.
Theano can be configured with a ``.theanorc`` text file (or
``.theanorc.txt``, whichever is easier for you to create under
Windows). It should be placed in the directory pointed to by the
......@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file:
.. code-block:: cfg
[global]
device = gpu
device = cuda
floatX = float32
[nvcc]
......@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above.
Compiling a faster BLAS
~~~~~~~~~~~~~~~~~~~~~~~
If you installed Python through WinPython or EPD, Theano will automatically
If you installed Python through WinPython or EPD, Theano will automatically
link with the MKL library, so you should not need to compile your own BLAS.
.. note::
......
Diff collapsed.
......@@ -1414,7 +1414,7 @@ Mathematical
.. function:: abs_(a)
Returns a variable representingthe absolute of a, ie ``|a|``.
Returns a variable representing the absolute value of a, i.e. ``|a|``.
.. note:: Can also be accessed with ``abs(a)``.
......
......@@ -32,6 +32,7 @@ Optimization FAST_RUN FAST_COMPILE
========================================================= ========= ============ =============
:term:`merge` x x
:term:`constant folding<constant folding>` x x
:term:`GPU transfer` x x
:term:`shape promotion<shape promotion>` x
:term:`fill cut<fill cut>` x
:term:`inc_subtensor srlz.<inc_subtensor serialization>` x
......@@ -52,7 +53,6 @@ Optimization FAST_RUN FAST_COMPILE
:term:`inplace_elemwise` x
:term:`inplace_random` x
:term:`elemwise fusion` x
:term:`GPU transfer` x
:term:`local_log_softmax` x x
:term:`local_remove_all_assert`
========================================================= ========= ============ =============
......
......@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
hints that give more flexibility to the compilation and optimization of the
graph.
For GPU graphs, this borrowing can have a major speed impact. See the following code:
.. code-block:: python
from theano import function, config, shared, sandbox, tensor, Out
import numpy
import time
vlen = 10 * 30 * 768 # 10 x # cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f1 = function([], sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)))
f2 = function([],
Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
borrow=True))
t0 = time.time()
for i in range(iters):
r = f1()
t1 = time.time()
no_borrow = t1 - t0
t0 = time.time()
for i in range(iters):
r = f2()
t1 = time.time()
print(
"Looping %s times took %s seconds without borrow "
"and %s seconds with borrow" % (iters, no_borrow, (t1 - t0))
)
if numpy.any([isinstance(x.op, tensor.Elemwise) and
('Gpu' not in type(x.op).__name__)
for x in f1.maker.fgraph.toposort()]):
print('Used the cpu')
else:
print('Used the gpu')
Which produces this output:
.. code-block:: none
$ THEANO_FLAGS=device=gpu0,floatX=float32 python test1.py
Using gpu device 0: GeForce GTX 275
Looping 1000 times took 0.368273973465 seconds without borrow and 0.0240728855133 seconds with borrow.
Used the gpu
*Take home message:*
When an input *x* to a function is not needed after the function
......@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory
footprint), and you only need to read from it once, right away when
it's returned, then consider marking it with an ``Out(y,
borrow=True)``.
......@@ -168,8 +168,8 @@ Linkers
=======
A mode is composed of 2 things: an optimizer and a linker. Some modes,
like ``NanGuardMode`` and ``DebugMode``, add logic around the optimizer and
linker. ``NanGuardMode`` and ``DebugMode`` use their own linker.
like ``NanGuardMode`` and ``DebugMode``, add logic around the
optimizer and linker. ``DebugMode`` uses its own linker.
You can select which linker to use with the Theano flag :attr:`config.linker`.
Here is a table to compare the different linkers.
......@@ -183,7 +183,7 @@ c|py [#cpy1]_ yes yes "+++" Try C code. If none exis
c|py_nogc no yes "++" As c|py, but without gc
c no yes "+" Use only C code (if none available for an op, raise an error)
py yes yes "+++" Use only Python code
NanGuardMode no no "++++" Check if nodes generate NaN
NanGuardMode yes yes "++++" Check if nodes generate NaN
DebugMode no yes VERY HIGH Make many checks on what Theano computes
============= ========= ================= ========= ===
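The linker from the table is selected via the :attr:`config.linker` flag; for instance, in ``.theanorc`` (a sketch):

```cfg
[global]
# cvm is the usual default; c|py falls back to Python when no C code exists
linker = cvm
```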
......
Diff collapsed.
......@@ -81,7 +81,7 @@ single name and a single device.
It is often the case that multi-gpu operation requires or assumes
that all the GPUs involved are equivalent. This is not the case
for this implementation. Since the user has the task of
distrubuting the jobs across the different device a model can be
distributing the jobs across the different devices, a model can be
built on the assumption that one of the GPUs is slower or has
less memory.
......@@ -140,5 +140,5 @@ is a example.
cv = gv.transfer('cpu')
Of course you can mix transfers and operations in any order you
choose. However you should try to minimize transfer operations
because they will introduce overhead any may reduce performance.
choose. However you should try to minimize transfer operations
because they will introduce overhead that may reduce performance.
......@@ -73,7 +73,7 @@ def contains_nan(arr, node=None):
elif arr.size == 0:
return False
elif cuda.cuda_available and isinstance(arr, cuda.CudaNdarray):
if (hasattr(theano.sandbox, 'rng_mrg') and
if (node and hasattr(theano.sandbox, 'rng_mrg') and
isinstance(
node.op,
# It stores ints in a float container
......@@ -119,7 +119,7 @@ def contains_inf(arr, node=None):
elif arr.size == 0:
return False
elif cuda.cuda_available and isinstance(arr, cuda.CudaNdarray):
if (hasattr(theano.sandbox, 'rng_mrg') and
if (node and hasattr(theano.sandbox, 'rng_mrg') and
isinstance(
node.op,
# It stores ints in a float container
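The ``contains_nan`` helper being patched above can be sketched in pure Python; this is a simplified stand-in for the real function, which handles numpy arrays and, on the old backend, ``CudaNdarray`` values:

```python
import math

def contains_nan(arr, node=None):
    # Sketch: `arr` is a flat sequence of floats; `node` is accepted to
    # mirror the real signature but is unused here.
    if len(arr) == 0:  # mirrors the `arr.size == 0` early exit
        return False
    return any(math.isnan(x) for x in arr)
```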
......@@ -215,7 +215,7 @@ class NanGuardMode(Mode):
assert nan_is_error or inf_is_error or big_is_error
compile_gpu_func(nan_is_error, inf_is_error, big_is_error)
def do_check_on(var, nd, f, is_input):
def do_check_on(var, nd):
"""
Checks `var` for NaNs / Infs. If detected, raises an exception
and / or prints information about `nd`, `f`, and `is_input` to
......@@ -227,11 +227,6 @@ class NanGuardMode(Mode):
The value to be checked.
nd : theano.gof.Apply
The Apply node being executed.
f : callable
The thunk for the apply node.
is_input : bool
If True, `var` is an input to `nd`.
If False, it is an output.
"""
error = False
......@@ -262,17 +257,13 @@ class NanGuardMode(Mode):
print('Big value detected', file=sio)
error = True
if error:
if not is_input:
print("NanGuardMode found an error in the"
" output of a node in this variable:", file=sio)
if nd:
print("NanGuardMode found an error in the "
"output of a node in this variable:", file=sio)
print(theano.printing.debugprint(nd, file='str'), file=sio)
else:
print("NanGuardMode found an error in an"
" input of this node.", file=sio)
print('Node:', file=sio)
print(nd, file=sio)
print("The input variable that cause problem:", file=sio)
print(theano.printing.debugprint(nd, file='str'), file=sio)
print("NanGuardMode found an error in an input of the "
"graph.", file=sio)
msg = sio.getvalue()
if config.NanGuardMode.action == 'raise':
raise AssertionError(msg)
......@@ -283,36 +274,16 @@ class NanGuardMode(Mode):
elif config.NanGuardMode.action == 'warn':
logger.error(msg)
def nan_check(i, node, fn):
"""
Runs `fn` while checking its inputs and outputs for NaNs / Infs.
Parameters
----------
i :
Currently ignored.
TODO: determine why it is here or remove).
node : theano.gof.Apply
The Apply node currently being executed.
fn : callable
The thunk to execute for this Apply node.
"""
inputs = fn.inputs
for x, var in zip(inputs, node.inputs):
# If the input is the result of computation, then we
# don't need to check it. It is already done after the
# computation.
if (var.owner is None and
getattr(var.tag, 'nan_guard_mode_check', True)):
do_check_on(x[0], node, fn, True)
fn()
outputs = fn.outputs
for x, var in zip(outputs, node.outputs):
def nan_check(node, thunk, storage_map, compute_map):
for var in node.outputs:
if getattr(var.tag, 'nan_guard_mode_check', True):
do_check_on(x[0], node, fn, False)
do_check_on(storage_map[var][0], node)
def nan_check_input(var, value):
if getattr(var.tag, 'nan_guard_mode_check', True):
do_check_on(value, None)
wrap_linker = theano.gof.WrapLinker([theano.gof.OpWiseCLinker()],
nan_check)
wrap_linker = theano.gof.vm.VM_Linker(callback=nan_check,
callback_input=nan_check_input)
super(NanGuardMode, self).__init__(wrap_linker,
optimizer=self.provided_optimizer)
......@@ -84,10 +84,15 @@ def _atexit_print_fn():
cum_attr[key] = val
if cum.optimizer_profile and ps.optimizer_profile:
merge = cum.optimizer_profile[0].merge_profile(
cum.optimizer_profile[1],
ps.optimizer_profile[1])
cum.optimizer_profile = (cum.optimizer_profile[0], merge)
try:
merge = cum.optimizer_profile[0].merge_profile(
cum.optimizer_profile[1],
ps.optimizer_profile[1])
cum.optimizer_profile = (cum.optimizer_profile[0], merge)
except Exception as e:
print("Got an exception while merging profile")
print(e)
cum.optimizer_profile = None
else:
cum.optimizer_profile = None
......
......@@ -104,10 +104,9 @@ class DeviceParam(ConfigParam):
AddConfigVar(
'device',
("Default device for computations. If gpu*, change the default to try "
"to move computation to it and to put shared variable of float32 "
"on it. Do not use upper case letters, only lower case even if "
"NVIDIA use capital letters."),
("Default device for computations. If cuda* or opencl*, change the "
"default to try to move computation to the GPU. Do not use upper case "
"letters, only lower case even if NVIDIA uses capital letters."),
DeviceParam('cpu', allow_override=False),
in_c_key=False)
......@@ -273,7 +272,8 @@ def safe_no_dnn_workmem_bwd(workmem):
return True
AddConfigVar('dnn.conv.workmem_bwd',
"This flag is deprecated; use dnn.conv.algo_bwd.",
"This flag is deprecated; use `dnn.conv.algo_bwd_filter` "
"and `dnn.conv.algo_bwd_data` instead.",
ConfigParam('', allow_override=False,
filter=safe_no_dnn_workmem_bwd),
in_c_key=False)
......@@ -651,8 +651,8 @@ AddConfigVar('warn.ignore_bug_before',
"bugs found after that version. "
"Warning for specific bugs can be configured with specific "
"[warn] flags."),
EnumStr('0.7', 'None', 'all', '0.3', '0.4', '0.4.1', '0.5', '0.7',
'0.8',
EnumStr('0.7', 'None', 'all', '0.3', '0.4', '0.4.1', '0.5', '0.6',
'0.7', '0.8', '0.8.1', '0.8.2',
allow_override=False),
in_c_key=False)
......
......@@ -165,6 +165,9 @@ def raise_with_op(node, thunk=None, exc_info=None, storage_map=None):
detailed_err_msg += ("Inputs shapes: %s" % shapes +
"\nInputs strides: %s" % strides +
"\nInputs values: %s" % scalar_values)
if theano.config.exception_verbosity == 'high':
detailed_err_msg += "\nInputs type_num: %s" % str(
[getattr(getattr(i[0], 'dtype', ''), 'num', '') for i in thunk.inputs])
if hasattr(node.op, '__input_name__'):
detailed_err_msg += "\nInputs name: %s\n" % str(node.op.__input_name__)
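The added ``type_num`` line relies on a defensive ``getattr`` chain so that inputs without a dtype yield an empty string instead of raising. A stand-alone sketch with fake container types (no numpy required; the class names are hypothetical):

```python
def input_type_nums(inputs):
    # Each input is a one-element storage cell; missing dtypes yield ''
    return [getattr(getattr(i[0], 'dtype', ''), 'num', '') for i in inputs]

class FakeDtype(object):
    num = 12  # stand-in for a numpy type number

class FakeArray(object):
    dtype = FakeDtype()
```

Given one array-like input and one object with no ``dtype``, the helper returns ``[12, '']`` rather than failing on the second input.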
......
Diff collapsed.
......@@ -244,16 +244,26 @@ class EquilibriumDB(DB):
optimization application. This could result in less fgraph iterations,
but this doesn't mean it will be faster globally.
tracks_on_change_inputs
If True, we will re-apply local opt on nodes whose inputs
changed during local optimization application. This could
result in fewer fgraph iterations, but this doesn't mean it
will be faster globally.
Notes
-----
We can put LocalOptimizer and Optimizer here, as EquilibriumOptimizer
supports both.
It is probably not a good idea to have ignore_newtrees=False and
tracks_on_change_inputs=True.
"""
def __init__(self, ignore_newtrees=True):
def __init__(self, ignore_newtrees=True, tracks_on_change_inputs=False):
super(EquilibriumDB, self).__init__()
self.ignore_newtrees = ignore_newtrees
self.tracks_on_change_inputs = tracks_on_change_inputs
self.__final__ = {}
self.__cleanup__ = {}
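An ``EquilibriumOptimizer`` keeps applying local rewrites until a full pass leaves the graph unchanged. A toy model of that loop, with strings standing in for graphs and a hypothetical rewrite:

```python
def run_to_equilibrium(rewrites, graph, max_iters=100):
    # Repeatedly apply every rewrite until one full pass changes nothing
    for _ in range(max_iters):
        changed = False
        for rewrite in rewrites:
            new_graph = rewrite(graph)
            if new_graph != graph:
                graph, changed = new_graph, True
        if not changed:
            break
    return graph

# collapse repeated 'a's, the way a local opt might merge duplicate nodes
dedup = lambda g: g.replace('aa', 'a')
```

``run_to_equilibrium([dedup], 'aaaab')`` needs several passes before it stabilizes at ``'ab'``, which is why re-applying opts on changed inputs can reduce the number of fgraph iterations.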
......@@ -281,6 +291,7 @@ class EquilibriumDB(DB):
opts,
max_use_ratio=config.optdb.max_use_ratio,
ignore_newtrees=self.ignore_newtrees,
tracks_on_change_inputs=self.tracks_on_change_inputs,
failure_callback=opt.NavigatorOptimizer.warn_inplace,
final_optimizers=final_opts,
cleanup_optimizers=cleanup_opts)
......
......@@ -332,7 +332,7 @@ class Stack(VM):
def __init__(self, nodes, thunks, pre_call_clear,
storage_map, compute_map, fgraph, allow_gc,
dependencies=None, callback=None):
dependencies=None, callback=None, callback_input=None):
super(Stack, self).__init__(nodes, thunks, pre_call_clear)
self.allow_gc = allow_gc
......@@ -345,6 +345,7 @@ class Stack(VM):
self.compute_map = compute_map
self.node_idx = node_idx = {}
self.callback = callback
self.callback_input = callback_input
ords = fgraph.orderings()
......@@ -411,6 +412,8 @@ class Stack(VM):
for k in self.storage_map:
compute_map[k][0] = (k.owner is None)
if self.callback_input and compute_map[k][0]:
self.callback_input(k, self.storage_map[k][0])
# apply_stack contains nodes
if output_subset is not None:
......@@ -684,6 +687,11 @@ class VM_Linker(link.LocalLinker):
A callable object to call after each call to a thunk within
the virtual machine. It will be called with four arguments called
'node', 'thunk', 'storage_map', and 'compute_map'.
callback_input
A callable object to call on each input to the graph
(variables with no owner). This includes constants and shared
variables values. It will be called with two arguments:
'var', 'value'.
lazy
Useful only when use_cloop is False. When lazy is None, use the
theano flag vm.lazy value. Then if we have a None (default) we auto
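The ``callback``/``callback_input`` split can be pictured with a stripped-down VM loop; every class and name here is a hypothetical stand-in for Theano's internals:

```python
class Var(object):
    def __init__(self, name, owner=None):
        # owner is None for graph inputs (constants, shared values)
        self.name, self.owner = name, owner

def run_vm(nodes_and_thunks, storage_map, callback=None, callback_input=None):
    # callback_input fires once per graph input (variables with no owner)
    for var, cell in storage_map.items():
        if var.owner is None and callback_input is not None:
            callback_input(var, cell[0])
    # callback fires after every thunk, as documented above
    for node, thunk in nodes_and_thunks:
        thunk()
        if callback is not None:
            callback(node, thunk, storage_map, None)
```

In this sketch, a variable produced by an Apply node (non-None ``owner``) never reaches ``callback_input``; its value is checked by ``callback`` after the thunk that computes it.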
......@@ -700,8 +708,8 @@ class VM_Linker(link.LocalLinker):
"""
def __init__(self, allow_gc=None, use_cloop=False, callback=None,
lazy=None, schedule=None, c_thunks=None,
allow_partial_eval=None):
callback_input=None, lazy=None, schedule=None,
c_thunks=None, allow_partial_eval=None):
# Note: if more parameters are added to __init__, make sure to forward
# them in the "type(self)(...)" call in the "accept" method below.
if allow_gc is None:
......@@ -710,6 +718,7 @@ class VM_Linker(link.LocalLinker):
self.allow_gc = allow_gc
self.use_cloop = use_cloop
self.callback = callback
self.callback_input = callback_input
self.lazy = lazy
self.c_thunks = c_thunks
self.allow_partial_eval = allow_partial_eval
......@@ -760,9 +769,11 @@ class VM_Linker(link.LocalLinker):
allow_gc=self.allow_gc,
use_cloop=self.use_cloop,
callback=self.callback,
callback_input=self.callback_input,
lazy=self.lazy,
schedule=self.schedule,
c_thunks=self.c_thunks,
allow_partial_eval=self.allow_partial_eval
).accept(fgraph, no_recycling)
self.fgraph = fgraph
self.no_recycling = no_recycling
......@@ -829,16 +840,17 @@ class VM_Linker(link.LocalLinker):
pre_call_clear = [storage_map[v] for v in self.no_recycling]
if (self.callback is not None or
if (self.callback is not None or self.callback_input is not None or
(config.profile and config.profile_memory) or
getattr(self, 'allow_partial_eval', False)):
self.allow_partial_eval):
if self.use_cloop and self.callback is not None:
if self.use_cloop and (self.callback is not None or
self.callback_input is not None):
logger.warn('CVM does not support callback, using Stack VM.')
if self.use_cloop and config.profile_memory:
warnings.warn(
'CVM does not support memory profile, using Stack VM.')
if self.use_cloop and getattr(self, 'allow_partial_eval', False):
if self.use_cloop and self.allow_partial_eval:
warnings.warn(
'CVM does not support partial evaluation yet, '
'using Stack VM.')
......@@ -849,7 +861,8 @@ class VM_Linker(link.LocalLinker):
storage_map, compute_map,
self.fgraph, self.allow_gc,
dependencies=deps,
callback=self.callback)
callback=self.callback,
callback_input=self.callback_input)
elif self.use_cloop:
# create a map from nodes to ints and vars to ints
nodes_idx = {}
......@@ -1046,7 +1059,7 @@ class VM_Linker(link.LocalLinker):
if lazy is None:
lazy = not all([(not th.lazy) for th in thunks])
if not (lazy or (config.profile and config.profile_memory) or
self.use_cloop or self.callback):
self.use_cloop or self.callback or self.callback_input):
for pair in itervalues(reallocated_info):
storage_map[pair[1]] = storage_map[pair[0]]
......@@ -1088,3 +1101,7 @@ class VM_Linker(link.LocalLinker):
self.__dict__.update(d)
if not hasattr(self, 'c_thunks'):
self.c_thunks = True
if not hasattr(self, 'allow_partial_eval'):
self.allow_partial_eval = None
if not hasattr(self, 'callback_input'):
self.callback_input = None
......@@ -42,7 +42,7 @@ register_transfer(transfer)
def init_dev(dev, name=None):
v = pygpu.gpuarray.api_version()
expected = -9998
expected = -9997
if v[0] != expected:
raise RuntimeError("Wrong major API version for gpuarray:", v[0],
"Make sure Theano and libgpuarray/pygpu "
......@@ -50,6 +50,15 @@ def init_dev(dev, name=None):
if v[1] < 0:
raise RuntimeError("Wrong minor API version for gpuarray:", v[1],
"Please update libgpuarray/pygpu.")
if len(v) < 3:
vpy = -1
else:
vpy = v[2]
vpye = 0
if vpy < vpye:
print("Wrong python API version for gpuarray:", vpy, "expected:", vpye,
"Some python ops may not work correctly and/or crash. "
"Consider updating pygpu.", file=sys.stderr)
global pygpu_activated
if dev not in init_dev.devmap:
ctx = pygpu.init(dev,
......
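The version gate in ``init_dev`` can be summarized as a small predicate (a sketch; the expected numbers follow the diff above, and the function name is hypothetical):

```python
def check_gpuarray_api(v, expected_major=-9997, expected_py=0):
    # Hard failure on a mismatched major or a negative minor version
    if v[0] != expected_major:
        raise RuntimeError("Wrong major API version for gpuarray: %d" % v[0])
    if v[1] < 0:
        raise RuntimeError("Wrong minor API version for gpuarray: %d" % v[1])
    # Older pygpu builds report no python API slot; treat that as -1
    vpy = v[2] if len(v) >= 3 else -1
    # Soft failure: the caller only warns on stderr when this is False
    return vpy >= expected_py
```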
......@@ -259,14 +259,14 @@ class GpuKernelBase(object):
int types[%(numargs)u] = {%(types)s};
const char *bcode = %(bvar)s;
size_t sz = sizeof(%(bvar)s);
if (GpuKernel_init(&%(ovar)s, %(ctx)s->ops, %(ctx)s->ctx, 1, &bcode, &sz,
if (GpuKernel_init(&%(ovar)s, %(ctx)s->ctx, 1, &bcode, &sz,
"%(kname)s", %(numargs)u, types, GA_USE_BINARY, NULL)
!= GA_NO_ERROR) {
if ((err = GpuKernel_init(&%(ovar)s, %(ctx)s->ops, %(ctx)s->ctx, 1,
if ((err = GpuKernel_init(&%(ovar)s, %(ctx)s->ctx, 1,
&%(cname)s, NULL, "%(kname)s", %(numargs)u,
types, %(flags)s, NULL)) != GA_NO_ERROR) {
PyErr_Format(PyExc_RuntimeError, "GpuKernel_init error %%d: %%s",
err, Gpu_error(%(ctx)s->ops, %(ctx)s->ctx, err));
err, gpucontext_error(%(ctx)s->ctx, err));
%(fail)s
}
}
......@@ -310,7 +310,7 @@ class GpuKernelBase(object):
The node that we need the cache version for.
"""
return (3, self.get_params(node).bin_id)
return (4, self.get_params(node).bin_id)
class HostFromGpu(Op):
......@@ -529,15 +529,22 @@ class GpuToGpu(Op):
def c_code(self, node, name, inputs, outputs, sub):
return """
Py_XDECREF(%(out)s);
%(out)s = pygpu_transfer(%(inp)s, %(ctx)s, 0);
%(out)s = pygpu_empty(%(inp)s->ga.nd,
%(inp)s->ga.dimensions,
%(inp)s->ga.typecode,
GpuArray_IS_C_CONTIGUOUS(&(%(inp)s->ga)) ? GA_C_ORDER:GA_F_ORDER,
%(ctx)s, Py_None);
if (%(out)s == NULL) {
%(fail)s
}
if (pygpu_transfer(%(out)s, %(inp)s)) {
%(fail)s
}
""" % {'inp': inputs[0], 'ctx': sub['params'],
'out': outputs[0], 'fail': sub['fail']}
def c_code_cache_version(self):
return (0,)
return (1,)
class GpuAlloc(HideC, Alloc):
......
......@@ -24,16 +24,9 @@ int APPLY_SPECIFIC(blockgemv)(PyGpuArrayObject *o, PyGpuArrayObject *W,
size_t *offW = NULL;
size_t *offInp = NULL;
size_t *offOut = NULL;
gpuarray_blas_ops *blas_ops;
int err;
err = ctx->ops->property(ctx->ctx, NULL, NULL,
GA_CTX_PROP_BLAS_OPS, &blas_ops);
if (err != GA_NO_ERROR) {
PyErr_SetString(PyExc_RuntimeError, "Can't get blas ops");
return -1;
}
err = blas_ops->setup(ctx->ctx);
err = gpublas_setup(ctx->ctx);
if (err != GA_NO_ERROR) {
PyErr_SetString(PyExc_RuntimeError, "Can't setup blas");
return -1;
......@@ -93,29 +86,29 @@ int APPLY_SPECIFIC(blockgemv)(PyGpuArrayObject *o, PyGpuArrayObject *W,
}
if (out->ga.typecode == GA_FLOAT) {
err = blas_ops->sgemvBatch(cb_fortran, transA,
PyGpuArray_DIMS(out)[2],
PyGpuArray_DIMS(h)[2], 1,
W_list, offW, lda,
inp_list, offInp, PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
1, out_list, offOut, PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0], 0);
err = gpublas_sgemvBatch(cb_fortran, transA,
PyGpuArray_DIMS(out)[2],
PyGpuArray_DIMS(h)[2], 1,
W_list, offW, lda,
inp_list, offInp, PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
1, out_list, offOut, PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0], 0);
} else if (out->ga.typecode == GA_DOUBLE) {
err = blas_ops->dgemvBatch(cb_fortran, transA,
PyGpuArray_DIMS(out)[2],
PyGpuArray_DIMS(h)[2], 1,
W_list, offW, lda,
inp_list, offInp, PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
1, out_list, offOut, PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0], 0);
err = gpublas_dgemvBatch(cb_fortran, transA,
PyGpuArray_DIMS(out)[2],
PyGpuArray_DIMS(h)[2], 1,
W_list, offW, lda,
inp_list, offInp, PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
1, out_list, offOut, PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0], 0);
} else if (out->ga.typecode == GA_HALF) {
err = blas_ops->sgemvBatch(cb_fortran, transA,
PyGpuArray_DIMS(out)[2],
PyGpuArray_DIMS(h)[2], 1,
W_list, offW, lda,
inp_list, offInp, PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
1, out_list, offOut, PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0], 0);
err = gpublas_sgemvBatch(cb_fortran, transA,
PyGpuArray_DIMS(out)[2],
PyGpuArray_DIMS(h)[2], 1,
W_list, offW, lda,
inp_list, offInp, PyGpuArray_STRIDES(h)[2] / gpuarray_get_elsize(h->ga.typecode),
1, out_list, offOut, PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode),
PyGpuArray_DIMS(out)[1] * PyGpuArray_DIMS(h)[1] * PyGpuArray_DIMS(out)[0], 0);
} else {
err = GA_INVALID_ERROR;
}
......
......@@ -12,16 +12,9 @@ int APPLY_SPECIFIC(blockger)(PyGpuArrayObject *o, PyGpuArrayObject *x,
size_t *offOut = NULL;
size_t *offX = NULL;
size_t *offY = NULL;
gpuarray_blas_ops *blas_ops;
int err;
err = ctx->ops->property(ctx->ctx, NULL, NULL,
GA_CTX_PROP_BLAS_OPS, &blas_ops);
if (err != GA_NO_ERROR) {
PyErr_SetString(PyExc_RuntimeError, "Can't get blas ops");
return -1;
}
err = blas_ops->setup(ctx->ctx);
err = gpublas_setup(ctx->ctx);
if (err != GA_NO_ERROR) {
PyErr_SetString(PyExc_RuntimeError, "Can't setup blas");
return -1;
......@@ -84,26 +77,26 @@ int APPLY_SPECIFIC(blockger)(PyGpuArrayObject *o, PyGpuArrayObject *x,
ssize_t str_out = PyGpuArray_STRIDES(out)[2] / gpuarray_get_elsize(out->ga.typecode);
if (out->ga.typecode == GA_FLOAT) {
err = blas_ops->sgerBatch(cb_fortran,
PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
*(float *)PyArray_GETPTR1(alpha, 0),
y_list, offY, str_y, x_list, offX, str_x,
o_list, offOut, str_out,
PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1], 0);
err = gpublas_sgerBatch(cb_fortran,
PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
*(float *)PyArray_GETPTR1(alpha, 0),
y_list, offY, str_y, x_list, offX, str_x,
o_list, offOut, str_out,
PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1], 0);
} else if (out->ga.typecode == GA_DOUBLE) {
err = blas_ops->dgerBatch(cb_fortran,
PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
*(double *)PyArray_GETPTR1(alpha, 0),
y_list, offY, str_y, x_list, offX, str_x,
o_list, offOut, str_out,
PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1], 0);
err = gpublas_dgerBatch(cb_fortran,
PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
*(double *)PyArray_GETPTR1(alpha, 0),
y_list, offY, str_y, x_list, offX, str_x,
o_list, offOut, str_out,
PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1], 0);
} else if (out->ga.typecode == GA_HALF) {
err = blas_ops->hgerBatch(cb_fortran,
PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
*(float *)PyArray_GETPTR1(alpha, 0),
y_list, offY, str_y, x_list, offX, str_x,
o_list, offOut, str_out,
PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1], 0);
err = gpublas_hgerBatch(cb_fortran,
PyGpuArray_DIMS(y)[2], PyGpuArray_DIMS(x)[2],
*(float *)PyArray_GETPTR1(alpha, 0),
y_list, offY, str_y, x_list, offX, str_x,
o_list, offOut, str_out,
PyGpuArray_DIMS(x)[0] * PyGpuArray_DIMS(x)[1] * PyGpuArray_DIMS(y)[1], 0);
} else {
err = GA_INVALID_ERROR;
}
......
......@@ -125,7 +125,7 @@ def dnn_available(context_name):
ctx = get_context(context_name)
if not ctx.kind == 'cuda':
if not ctx.kind == b'cuda':
dnn_available.msg = "Not on a CUDA device."
return False
......@@ -1493,7 +1493,7 @@ def local_dnn_convi_output_merge(node, *inputs):
return [GpuDnnConvGradI(algo=node.op.algo)(*inputs)]
@register_opt('cudnn')
@register_opt('cudnn', 'fast_compile')
@op_lifter([Pool])
def local_pool_dnn_alternative(node, ctx_name):
if not dnn_available(ctx_name):
......@@ -1509,7 +1509,7 @@ def local_pool_dnn_alternative(node, ctx_name):
return dnn_pool(gpu_contiguous(img), ds, stride=stride, pad=pad, mode=mode)
@register_opt('cudnn')
@register_opt('cudnn', 'fast_compile')
@op_lifter([MaxPoolGrad])
def local_pool_dnn_grad_stride(node, ctx_name):
if not dnn_available(ctx_name):
......@@ -1533,7 +1533,7 @@ def local_pool_dnn_grad_stride(node, ctx_name):
pad)
@register_opt('cudnn')
@register_opt('cudnn', 'fast_compile')
@op_lifter([AveragePoolGrad])
def local_avg_pool_dnn_grad_stride(node, ctx_name):
if not dnn_available(ctx_name):
......@@ -1556,7 +1556,7 @@ def local_avg_pool_dnn_grad_stride(node, ctx_name):
return GpuDnnPoolGrad(mode=mode)(gpu_contiguous(inp), cg, cg, ds, st, pad)
@register_opt('cudnn')
@register_opt('cudnn', 'fast_compile')
@local_optimizer([GpuSoftmax])
def local_softmax_dnn(node):
if isinstance(node.op, GpuSoftmax):
......@@ -1569,7 +1569,7 @@ def local_softmax_dnn(node):
return [out]
@register_opt('cudnn')
@register_opt('cudnn', 'stabilize')
@local_optimizer([GpuElemwise])
def local_log_softmax_dnn(node):
# This looks for GpuDnnSoftmax so we know that we have cudnn.
......@@ -1586,7 +1586,7 @@ def local_log_softmax_dnn(node):
return [new_softmax(softmax_node.inputs[0])]
@register_opt('cudnn')
@register_opt('cudnn', 'fast_compile')
@op_lifter([LogSoftmax])
def local_logsoftmax_to_dnn(node, ctx_name):
# Transform the input in the format expected by GpuDnnSoftmax
......@@ -1624,7 +1624,7 @@ class NoCuDNNRaise(Optimizer):
gpu_seqopt.register("NoCuDNNRaise", NoCuDNNRaise(), 0, 'cudnn')
@register_opt('cudnn')
@register_opt('cudnn', 'fast_compile')
@op_lifter([SoftmaxGrad])
def local_softmax_dnn_grad(node, ctx_name):
if not dnn_available(ctx_name):
......
......@@ -105,7 +105,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
algo = choice.algo;
#else
size_t free;
int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);
if (err2 != GA_NO_ERROR) {
PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
......@@ -234,7 +234,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
* to place a nice get_work_mem() function in.
*/
if (worksize != 0) {
workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
if (workspace == NULL) {
PyErr_SetString(PyExc_RuntimeError,
"Could not allocate working memory");
......@@ -258,7 +258,7 @@ APPLY_SPECIFIC(conv_fwd)(PyGpuArrayObject *input, PyGpuArrayObject *kerns,
APPLY_SPECIFIC(output), PyGpuArray_DEV_DATA(*output));
if (worksize != 0)
c->ops->buffer_release(workspace);
gpudata_release(workspace);
cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
......
......@@ -106,7 +106,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
algo = choice.algo;
#else
size_t free;
int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);
if (err2 != GA_NO_ERROR) {
PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
......@@ -204,7 +204,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
}
if (worksize != 0) {
workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
if (workspace == NULL) {
PyErr_SetString(PyExc_RuntimeError,
"Could not allocate working memory");
......@@ -227,7 +227,7 @@ APPLY_SPECIFIC(conv_gi)(PyGpuArrayObject *kerns, PyGpuArrayObject *output,
APPLY_SPECIFIC(input), PyGpuArray_DEV_DATA(*input));
if (worksize != 0)
c->ops->buffer_release(workspace);
gpudata_release(workspace);
cuda_record(kerns->ga.data, GPUARRAY_CUDA_WAIT_READ);
cuda_record(output->ga.data, GPUARRAY_CUDA_WAIT_READ);
......
......@@ -107,7 +107,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
algo = choice.algo;
#else
size_t free;
int err2 = c->ops->property(c->ctx, NULL, NULL, GA_CTX_PROP_FREE_GMEM, &free);
int err2 = gpucontext_property(c->ctx, GA_CTX_PROP_FREE_GMEM, &free);
if (err2 != GA_NO_ERROR) {
PyErr_Format(PyExc_RuntimeError, "Error when trying to find the "
......@@ -192,7 +192,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
}
if (worksize != 0) {
workspace = c->ops->buffer_alloc(c->ctx, worksize, NULL, 0, NULL);
workspace = gpudata_alloc(c->ctx, worksize, NULL, 0, NULL);
if (workspace == NULL) {
PyErr_SetString(PyExc_RuntimeError, "Could not allocate working memory");
cuda_exit(c->ctx);
......@@ -214,7 +214,7 @@ APPLY_SPECIFIC(conv_gw)(PyGpuArrayObject *input, PyGpuArrayObject *output,
APPLY_SPECIFIC(kerns), PyGpuArray_DEV_DATA(*kerns));
if (worksize != 0)
c->ops->buffer_release(workspace);
gpudata_release(workspace);
cuda_record(input->ga.data, GPUARRAY_CUDA_WAIT_READ);
cuda_record(output->ga.data, GPUARRAY_CUDA_WAIT_READ);
......
......@@ -199,7 +199,7 @@ class GpuElemwise(HideC, Elemwise):
typecode=o.type.typecode)
res += """
ge = GpuElemwise_new(%(ctx)s->ops, %(ctx)s->ctx, %(support)s, %(kop)s, %(nargs)s, args, %(nd)s, 0);
ge = GpuElemwise_new(%(ctx)s->ctx, %(support)s, %(kop)s, %(nargs)s, args, %(nd)s, 0);
if (ge == NULL) {
PyErr_SetString(PyExc_RuntimeError, "Could not initialize elemwise support");
%(fail)s
......@@ -360,7 +360,7 @@ class GpuElemwise(HideC, Elemwise):
def c_code_cache_version(self):
ver = self.scalar_op.c_code_cache_version()
if ver:
return (6, ver)
return (7, ver)
else:
return ver
......@@ -554,7 +554,7 @@ class GpuCAReduceCuda(GpuKernelBase, HideC, CAReduceDtype):
def make_node(self, x):
x = as_gpuarray_variable(x, infer_context_name(x))
if x.type.context.kind != 'cuda':
if x.type.context.kind != b'cuda':
raise TypeError("GpuCAReduceCuda doesn't work for non-cuda devices")
ret = super(GpuCAReduceCuda, self).make_node(x)
self = copy.copy(self)
......
......@@ -26,11 +26,8 @@ class GpuCumsum(GpuKernelBase, Op):
def __init__(self, axis):
self.axis = axis
def __str__(self):
return "%s{%s}" % (self.__class__.__name__, self.axis)
def c_code_cache_version_apply(self, node):
return (1,)
def c_code_cache_version(self):
return (3,)
def c_headers(self):
return ['<numpy_compat.h>', '<gpuarray/types.h>', '<gpuarray_helper.h>']
......@@ -221,7 +218,7 @@ class GpuCumsum(GpuKernelBase, Op):
return kernels
def c_code(self, node, nodename, inp, out, sub):
if node.inputs[0].type.context.kind != 'cuda':
if node.inputs[0].type.context.kind != b'cuda':
raise NotImplementedError("cuda only")
x, = inp
z, = out
......@@ -249,17 +246,17 @@ class GpuCumsum(GpuKernelBase, Op):
size_t max_grid_size1;
size_t max_grid_size2;
int err;
err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXLSIZE0, &max_threads_dim0);
err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXLSIZE0, &max_threads_dim0);
if (err != GA_NO_ERROR){
PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_threads_dims0");
%(fail)s;
}
err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXGSIZE1, &max_grid_size1);
err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXGSIZE1, &max_grid_size1);
if (err != GA_NO_ERROR){
PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_grid_size1");
%(fail)s;
}
err = %(ctx)s->ops->property(%(ctx)s->ctx, NULL, NULL, GA_CTX_PROP_MAXGSIZE2, &max_grid_size2);
err = gpucontext_property(%(ctx)s->ctx, GA_CTX_PROP_MAXGSIZE2, &max_grid_size2);
if (err != GA_NO_ERROR){
PyErr_SetString(PyExc_RuntimeError, "Could not fetch max_grid_size2");
%(fail)s;
......
......@@ -117,7 +117,7 @@ int gemm16(PyGpuArrayObject *C, float alpha,
if (48 < n128 && n128 <= 64) {
n64 = n / 64;
if (nprocs == 0)
if (A->ga.ops->property(A->context->ctx, NULL, NULL,
if (gpucontext_property(A->context->ctx,
GA_CTX_PROP_NUMPROCS, &nprocs)) {
nprocs = 0;
res = 1;
......
......@@ -243,7 +243,7 @@ class GpuImages2Neibs(GpuKernelBase, Images2Neibs, Op):
return kernels
def c_code(self, node, name, inp, out, sub):
if node.inputs[0].type.context.kind != 'cuda':
if node.inputs[0].type.context.kind != b'cuda':
raise NotImplementedError("cuda only")
dtype_ten4 = node.inputs[0].dtype
dtype_neib_shape = node.inputs[1].dtype
......
......@@ -105,7 +105,7 @@ class Gemm16(COp):
return """
bcode = bin_%(name)s;
sz = sizeof(bin_%(name)s);
if (GpuKernel_init(&k_%(name)s, c->ops, c->ctx, 1, &bcode, &sz,
if (GpuKernel_init(&k_%(name)s, c->ctx, 1, &bcode, &sz,
"hgemm_%(name)s", 13, types, GA_USE_BINARY, NULL)
!= GA_NO_ERROR) {
PyErr_SetString(PyExc_RuntimeError, "Could not initialize kernel %(name)s");
......
......@@ -189,7 +189,7 @@ class GpuCrossentropySoftmaxArgmax1HotWithBias(GpuKernelBase, Op):
flags=flags, objvar=k_var)]
def c_code(self, node, nodename, inp, out, sub):
if node.inputs[0].type.context.kind != 'cuda':
if node.inputs[0].type.context.kind != b'cuda':
raise NotImplementedError('cuda only')
typecode_x = pygpu.gpuarray.dtype_to_typecode(node.inputs[0].dtype)
typecode_b = pygpu.gpuarray.dtype_to_typecode(node.inputs[1].dtype)
......@@ -375,7 +375,7 @@ class GpuCrossentropySoftmax1HotWithBiasDx(GpuKernelBase, Op):
return ['<numpy_compat.h>', '<gpuarray/types.h>']
def c_code(self, node, nodename, inp, out, sub):
if node.inputs[0].type.context.kind != 'cuda':
if node.inputs[0].type.context.kind != b'cuda':
raise NotImplementedError("cuda only")
typecode_dx = pygpu.gpuarray.dtype_to_typecode(node.outputs[0].dtype)
itemsize_dnll = numpy.dtype(node.inputs[0].dtype).itemsize
......@@ -584,7 +584,7 @@ class GpuSoftmax(GpuKernelBase, Op):
return ['<numpy_compat.h>', '<gpuarray/types.h>']
def c_code(self, node, nodename, inp, out, sub):
if node.inputs[0].type.context.kind != 'cuda':
if node.inputs[0].type.context.kind != b'cuda':
raise NotImplementedError("cuda only")
dtype_x = node.inputs[0].dtype
work_x = work_dtype(dtype_x)
......@@ -783,7 +783,7 @@ class GpuSoftmaxWithBias(GpuKernelBase, Op):
return ['<numpy_compat.h>', '<gpuarray/types.h>']
def c_code(self, node, nodename, inp, out, sub):
if node.inputs[0].type.context.kind != 'cuda':
if node.inputs[0].type.context.kind != b'cuda':
raise NotImplementedError('cuda only')
dtype_x = node.inputs[0].dtype
dtype_b = node.inputs[1].dtype
......
......@@ -33,12 +33,16 @@ from .basic_ops import (as_gpuarray_variable, infer_context_name,
GpuSplit, GpuContiguous, gpu_contiguous,
GpuAlloc, GpuAllocEmpty, GpuReshape,
GpuEye, gpu_join, GpuJoin)
from .blas import (gpu_dot22, GpuGemv, GpuGemm, GpuGer, GpuGemmBatch,
gpugemm_no_inplace, gpugemmbatch_no_inplace)
from .blocksparse import GpuSparseBlockGemv, GpuSparseBlockOuter
from .nnet import (GpuCrossentropySoftmaxArgmax1HotWithBias,
GpuCrossentropySoftmax1HotWithBiasDx,
GpuSoftmaxWithBias, GpuSoftmax)
from .blas import (gpu_dot22, GpuGemm, GpuGer, GpuGemmBatch,
gpugemm_no_inplace, gpugemm_inplace, gpugemmbatch_no_inplace,
gpugemv_no_inplace, gpugemv_inplace)
from .blocksparse import (GpuSparseBlockGemv, GpuSparseBlockOuter,
gpu_sparse_block_outer, gpu_sparse_block_outer_inplace,
gpu_sparse_block_gemv, gpu_sparse_block_gemv_inplace)
from .nnet import (gpu_crossentropy_softmax_1hot_with_bias_dx,
gpu_crossentropy_softmax_argmax_1hot_with_bias,
gpu_softmax_with_bias, gpu_softmax)
from .elemwise import (GpuElemwise, GpuDimShuffle, GpuCAReduceCuda,
GpuCAReduceCPY)
from .subtensor import (GpuIncSubtensor, GpuSubtensor,
......@@ -49,6 +53,7 @@ from .opt_util import alpha_merge, output_merge
_logger = logging.getLogger("theano.gpuarray.opt")
gpu_optimizer = EquilibriumDB()
gpu_cut_copies = EquilibriumDB()
......@@ -146,7 +151,7 @@ def op_lifter(OP, cuda_only=False):
# Check if we should replace
if (not replace or
(cuda_only and
get_context(context_name).kind != 'cuda')):
get_context(context_name).kind != b'cuda')):
return False
# tag the inputs with the context in case
......@@ -643,7 +648,7 @@ def local_gpua_advanced_subtensor(node, context_name):
def local_gpua_advanced_incsubtensor(node, context_name):
context = get_context(context_name)
# This is disabled on non-cuda contexts
if context.kind != 'cuda':
if context.kind != b'cuda':
return None
x, y, ilist = node.inputs
......@@ -674,12 +679,12 @@ def local_gpua_careduce(node, context_name):
if isinstance(node.op.scalar_op, (scalar.Add, scalar.Mul,
scalar.Maximum, scalar.Minimum)):
ctx = get_context(context_name)
if ctx.kind == 'opencl':
if ctx.kind == b'opencl':
op = GpuCAReduceCPY
if node.op.scalar_op not in [scalar.add, scalar.mul]:
# We don't support yet all reduction with cpy code.
return
elif ctx.kind == 'cuda':
elif ctx.kind == b'cuda':
op = GpuCAReduceCuda
else:
return False
......@@ -711,18 +716,14 @@ def local_gpua_careduce(node, context_name):
assert reduce_mask[a] == 0
reduce_mask[a] = 1
shape_of = node.fgraph.shape_feature.shape_of
x_shape = shape_of[x]
new_in_shp = [x_shape[0]]
new_in_shp = [shape_i(x, 0)]
new_mask = [reduce_mask[0]]
for i in xrange(1, x.type.ndim):
if reduce_mask[i] == reduce_mask[i - 1]:
new_in_shp[-1] *= x_shape[i]
new_in_shp[-1] *= shape_i(x, i)
else:
new_mask.append(reduce_mask[i])
new_in_shp.append(x_shape[i])
new_in_shp.append(shape_i(x, i))
new_axis = []
for idx, m in enumerate(new_mask):
if m == 1:
......@@ -744,8 +745,12 @@ def local_gpua_careduce(node, context_name):
greduce(gpu_reshaped_x))
if reduce_reshaped_x.ndim != node.outputs[0].ndim:
out_shp = []
for i in range(x.ndim):
if i not in node.op.axis:
out_shp.append(shape_i(x, i))
unreshaped_reduce = reduce_reshaped_x.reshape(
tensor.stack(shape_of[node.outputs[0]]))
tensor.stack(out_shp))
else:
unreshaped_reduce = reduce_reshaped_x
return [unreshaped_reduce]
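The replacement above rebuilds the reduced output shape by keeping only the non-reduced axes of `x` (via `shape_i`), instead of reading `shape_of[node.outputs[0]]`. A standalone sketch of that shape computation on plain tuples, with illustrative values:

```python
def reduced_shape(in_shape, axis):
    """Keep only the dimensions of in_shape that are not reduced over."""
    return [s for i, s in enumerate(in_shape) if i not in axis]

# Reducing a (2, 3, 4) input over axes (0, 2) leaves shape [3].
assert reduced_shape((2, 3, 4), (0, 2)) == [3]
# With no reduced axes, the shape is unchanged.
assert reduced_shape((5, 6), ()) == [5, 6]
```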
......@@ -754,13 +759,19 @@ def local_gpua_careduce(node, context_name):
@register_opt('fast_compile')
@op_lifter([tensor.blas.Gemv, tensor.blas_c.CGemv])
def local_gpua_gemv(node, context_name):
return GpuGemv(inplace=node.op.inplace)
if node.op.inplace:
return gpugemv_inplace
else:
return gpugemv_no_inplace
@register_opt('fast_compile')
@op_lifter([tensor.blas.Gemm])
def local_gpua_gemm(node, context_name):
return GpuGemm(inplace=node.op.inplace)
if node.op.inplace:
return gpugemm_inplace
else:
return gpugemm_no_inplace
@register_opt('fast_compile')
......@@ -834,7 +845,7 @@ def local_gpua_dot22scalar(node, context_name):
x = as_gpuarray_variable(x, context_name)
y = as_gpuarray_variable(y, context_name)
z = GpuAllocEmpty(x.dtype, context_name)(x.shape[0], y.shape[1])
return [GpuGemm(inplace=False)(z, a, x, y, 0)]
return [gpugemm_no_inplace(z, a, x, y, 0)]
@register_opt('fast_compile')
......@@ -846,25 +857,25 @@ def local_gpua_eye(node, context_name):
@register_opt('fast_compile')
@op_lifter([tensor.nnet.CrossentropySoftmaxArgmax1HotWithBias], cuda_only=True)
def local_gpua_crossentropysoftmaxargmax1hotwithbias(node, context_name):
return GpuCrossentropySoftmaxArgmax1HotWithBias()
return gpu_crossentropy_softmax_argmax_1hot_with_bias
@register_opt('fast_compile')
@op_lifter([tensor.nnet.CrossentropySoftmax1HotWithBiasDx], cuda_only=True)
def local_gpua_crossentropysoftmax1hotwithbiasdx(node, context_name):
return GpuCrossentropySoftmax1HotWithBiasDx()
return gpu_crossentropy_softmax_1hot_with_bias_dx
@register_opt('fast_compile')
@op_lifter([tensor.nnet.Softmax], cuda_only=True)
def local_gpua_softmax(node, context_name):
return GpuSoftmax()
return gpu_softmax
@register_opt('fast_compile')
@op_lifter([tensor.nnet.SoftmaxWithBias], cuda_only=True)
def local_gpua_softmaxwithbias(node, context_name):
return GpuSoftmaxWithBias()
return gpu_softmax_with_bias
@register_opt('fast_compile')
......@@ -889,20 +900,26 @@ theano.tensor.nnet.conv2d()
@register_opt('fast_compile')
@op_lifter([SparseBlockGemv])
def local_lift_sparseblockgemv(node, context_name):
return GpuSparseBlockGemv(node.op.inplace)
if node.op.inplace:
return gpu_sparse_block_gemv_inplace
else:
return gpu_sparse_block_gemv
@register_opt('fast_compile')
@op_lifter([SparseBlockOuter])
def local_lift_sparseblockouter(node, context_name):
return GpuSparseBlockOuter(node.op.inplace)
if node.op.inplace:
return gpu_sparse_block_outer_inplace
else:
return gpu_sparse_block_outer
@register_inplace()
@local_optimizer([GpuSparseBlockGemv], inplace=True)
def local_inplace_sparseblockgemv(node):
if isinstance(node.op, GpuSparseBlockGemv) and not node.op.inplace:
return [GpuSparseBlockGemv(inplace=True)(*node.inputs)]
return [gpu_sparse_block_gemv_inplace(*node.inputs)]
@register_inplace()
......
......@@ -18,7 +18,7 @@ from theano.tests import unittest_tools as utt
from ..type import (GpuArrayType, get_context,
gpuarray_shared_constructor)
from ..basic_ops import (
host_from_gpu, HostFromGpu, GpuFromHost, GpuReshape,
host_from_gpu, HostFromGpu, GpuFromHost, GpuReshape, GpuToGpu,
GpuAlloc, GpuAllocEmpty, GpuContiguous,
gpu_join, GpuJoin, GpuSplit, GpuEye, gpu_contiguous)
from ..subtensor import GpuSubtensor
......@@ -182,6 +182,21 @@ def test_transfer_cpu_gpu():
assert numpy.all(fv == av)
def test_transfer_gpu_gpu():
g = GpuArrayType(dtype='float32', broadcastable=(False, False),
context_name=test_ctx_name)()
av = numpy.asarray(rng.rand(5, 4), dtype='float32')
gv = gpuarray.array(av, context=get_context(test_ctx_name))
mode = mode_with_gpu.excluding('cut_gpua_host_transfers', 'local_cut_gpua_host_gpua')
f = theano.function([g], GpuToGpu(test_ctx_name)(g), mode=mode)
topo = f.maker.fgraph.toposort()
assert len(topo) == 1
assert isinstance(topo[0].op, GpuToGpu)
fv = f(gv)
assert GpuArrayType.values_eq(fv, gv)
def test_transfer_strided():
# This is just to ensure that it works in Theano;
# libgpuarray has a much more comprehensive suite of tests to
......
......@@ -197,7 +197,7 @@ class test_GpuCAReduceCuda(test_GpuCAReduceCPY):
def setUp(self):
super(test_GpuCAReduceCuda, self).setUp()
if get_context(test_ctx_name).kind != 'cuda':
if get_context(test_ctx_name).kind != b'cuda':
raise SkipTest("Cuda specific tests")
......@@ -212,7 +212,7 @@ class T_gpureduce_dtype(test_elemwise.T_reduce_dtype):
'float32', 'float64']
def setUp(self):
if get_context(test_ctx_name).kind != 'cuda':
if get_context(test_ctx_name).kind != b'cuda':
raise SkipTest("Cuda specific tests")
......
......@@ -24,7 +24,7 @@ class TestGpuCumsum(theano.tensor.tests.test_extra_ops.TestCumsumOp):
def setUp(self):
super(TestGpuCumsum, self).setUp()
test_ctx = get_context(test_ctx_name)
if test_ctx.kind != 'cuda':
if test_ctx.kind != b'cuda':
raise SkipTest("Cuda specific tests")
self.max_threads_dim0 = test_ctx.maxlsize0
self.max_grid_size1 = test_ctx.maxgsize2
......
......@@ -125,7 +125,7 @@ def test_reduce():
topo = f.maker.fgraph.toposort()
ops = [type(node.op) for node in topo]
if kind == 'opencl' and method in ["max", "min"]:
if kind == b'opencl' and method in ["max", "min"]:
assert not(GpuCAReduceCuda in ops or GpuCAReduceCPY in ops)
else:
assert GpuCAReduceCuda in ops or GpuCAReduceCPY in ops
......
......@@ -56,3 +56,32 @@ def test_advinc_subtensor1():
rep = xval.copy()
rep[[0, 2]] += yval
assert numpy.allclose(rval, rep)
def test_incsub_f16():
shp = (3, 3)
shared = gpuarray_shared_constructor
xval = numpy.arange(numpy.prod(shp), dtype='float16').reshape(shp) + 1
yval = numpy.empty((2,) + shp[1:], dtype='float16')
yval[:] = 2
x = shared(xval, name='x')
y = tensor.tensor(dtype='float16',
broadcastable=(False,) * len(shp),
name='y')
expr = tensor.advanced_inc_subtensor1(x, y, [0, 2])
f = theano.function([y], expr, mode=mode_with_gpu)
assert sum([isinstance(node.op, GpuAdvancedIncSubtensor1)
for node in f.maker.fgraph.toposort()]) == 1
rval = f(yval)
rep = xval.copy()
rep[[0, 2]] += yval
assert numpy.allclose(rval, rep)
expr = tensor.inc_subtensor(x[1:], y)
f = theano.function([y], expr, mode=mode_with_gpu)
assert sum([isinstance(node.op, GpuIncSubtensor)
for node in f.maker.fgraph.toposort()]) == 1
rval = f(yval)
rep = xval.copy()
rep[1:] += yval
assert numpy.allclose(rval, rep)
......@@ -301,20 +301,14 @@ class GpuArrayType(Type):
raise NotImplementedError(
"GpuArrayType.values_eq_approx() doesn't implement the"
" allow_remove_inf and allow_remove_nan parameters")
if a.dtype == 'float16' or b.dtype == 'float16':
an = numpy.asarray(a)
bn = numpy.asarray(b)
return tensor.TensorType.values_eq_approx(
an, bn, allow_remove_inf=allow_remove_inf,
allow_remove_nan=allow_remove_nan, rtol=rtol, atol=atol)
atol_, rtol_ = theano.tensor.basic._get_atol_rtol(a, b)
if rtol is not None:
rtol_ = rtol
if atol is not None:
atol_ = atol
res = elemwise2(a, '', b, a, odtype=numpy.dtype('bool'),
op_tmpl="res[i] = (fabs(%%(a)s - %%(b)s) <"
"(%(atol_)s + %(rtol_)s * fabs(%%(b)s)))" %
op_tmpl="res = (fabs(a - b) <"
"(%(atol_)s + %(rtol_)s * fabs(b)))" %
locals())
ret = numpy.asarray(res).all()
if ret:
......
......@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000,
t0 = 0
t1 = -1
f() # Ignore first function call to get representative time.
if execute:
sync = (hasattr(theano, "sandbox") and
hasattr(theano.sandbox, "cuda") and
theano.sandbox.cuda.cuda_available)
sync2 = (hasattr(theano, "gpuarray") and
theano.gpuarray.pygpu_activated)
t0 = time.time()
for i in range(iters):
f()
if sync:
theano.sandbox.cuda.synchronize()
if sync2:
c.get_value(borrow=True, return_internal_type=True).sync()
t1 = time.time()
return t1 - t0, impl
......@@ -244,6 +249,7 @@ if __name__ == "__main__":
cuda version 7.5 7.0 6.5
gpu
M40 0.47s
k80 0.96s
K6000/NOECC 0.69s
K40 0.88s
......
......@@ -2526,7 +2526,8 @@ if True:
out = as_cuda_ndarray_variable(out.dimshuffle(0, 1))
return [out]
@register_opt('cudnn')
@register_opt('cudnn', 'stabilize', 'fast_compile')
# We add fast_compile because otherwise this won't run on the GPU.
@local_optimizer([GpuElemwise, LogSoftmax])
def local_log_softmax_dnn(node):
# The log-softmax implementation is only available starting at cuDNN V3
......
......@@ -14,6 +14,7 @@ from . import dnn
import theano
from theano import scalar as scal
from theano import config, tensor, gof
from theano.compile.ops import shape_i
import theano.ifelse
import theano.tensor.signal.pool
import theano.tensor.nnet
......@@ -900,18 +901,14 @@ def local_gpu_careduce(node):
# to make them a single dimension, do the reduction, and
# then reshape to get them back.
shape_of = node.fgraph.shape_feature.shape_of
x_shape = shape_of[x]
new_in_shp = [x_shape[0]]
new_in_shp = [shape_i(x, 0)]
new_mask = [reduce_mask[0]]
for i in xrange(1, x.type.ndim):
if reduce_mask[i] == reduce_mask[i - 1]:
new_in_shp[-1] *= x_shape[i]
new_in_shp[-1] *= shape_i(x, i)
else:
new_mask.append(reduce_mask[i])
new_in_shp.append(x_shape[i])
new_in_shp.append(shape_i(x, i))
new_greduce = GpuCAReduce(new_mask, scalar_op)
new_x = x.reshape(tensor.stack(new_in_shp))
......@@ -936,8 +933,11 @@ def local_gpu_careduce(node):
# Restore the expected shape of the output
if rval.ndim != out.ndim:
rval = rval.reshape(
tensor.stack(shape_of[out]))
out_shp = []
for i in range(x.ndim):
if i not in node.op.axis:
out_shp.append(shape_i(x, i))
rval = rval.reshape(tensor.stack(out_shp))
if rval.type == out.type:
return [rval]
......
......@@ -4,6 +4,7 @@ which referred to theano.sandbox.gpuarray."""
import warnings
from theano.gpuarray import *
message = "theano.sandbox.gpuarray has been moved to theano.gpuarray." + \
" Please update your code and pickles."
message = ("theano.sandbox.gpuarray has been moved to theano.gpuarray. "
"Please update your code and pickles. If the warning persists, "
"clear theano's cache ('$theano/bin/theano-cache clear').")
warnings.warn(message)
......@@ -2543,7 +2543,7 @@ class Log2(UnaryScalarOp):
else:
return [x.zeros_like()]
return gz / (x * math.log(2.0)),
return gz / (x * numpy.asarray(math.log(2.0)).astype(x.dtype)),
def c_code(self, node, name, inputs, outputs, sub):
(x,) = inputs
......
......@@ -202,7 +202,7 @@ def remove_constants_and_unused_inputs_scan(node):
# DEBUG CHECK
nwScan = scan_op.Scan(nw_inner, op_outs, nw_info)
nw_outs = nwScan(*nw_outer, **dict(return_list=True))
return dict([("remove", [node])] + list(zip(node.outputs, nw_outs)))
return OrderedDict([("remove", [node])] + list(zip(node.outputs, nw_outs)))
else:
return False
......@@ -2072,8 +2072,8 @@ def scan_merge_inouts(node):
new_outer_out_mit_mot.append(outer_omm)
na.outer_out_mit_mot = new_outer_out_mit_mot
if remove:
return dict([("remove", remove)] +
list(zip(node.outputs, na.outer_outputs)))
return OrderedDict([("remove", remove)] +
list(zip(node.outputs, na.outer_outputs)))
return na.outer_outputs
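The two scan hunks above swap `dict` for `OrderedDict` in the returned replacement mapping; on the Python versions Theano supported, plain `dict` iteration order was not deterministic, while `OrderedDict` always preserves insertion order. A minimal sketch of the same construction pattern, with illustrative names:

```python
from collections import OrderedDict

outputs = ['out0', 'out1']
new_outs = ['new0', 'new1']

# Same construction pattern as in the scan optimizations above.
repl = OrderedDict([("remove", ['node'])] + list(zip(outputs, new_outs)))

# Iteration order is guaranteed to follow insertion order.
assert list(repl) == ['remove', 'out0', 'out1']
assert repl['out1'] == 'new1'
```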
......
......@@ -612,14 +612,14 @@ def get_scalar_constant_value(orig_v, elemwise=True,
return numpy.asarray(v)
if isinstance(v, numpy.ndarray):
return numpy_scalar(v)
return numpy_scalar(v).copy()
if isinstance(v, Constant):
if getattr(v.tag, 'unique_value', None) is not None:
data = v.tag.unique_value
else:
data = v.data
return numpy_scalar(data)
return numpy_scalar(data).copy()
if not only_process_constants and getattr(v, 'owner', None):
if isinstance(v.owner.op, (Alloc, DimShuffle, Rebroadcast,
......@@ -649,7 +649,7 @@ def get_scalar_constant_value(orig_v, elemwise=True,
for i in v.owner.inputs]
ret = [[None]]
v.owner.op.perform(v.owner, const, ret)
return ret[0][0]
return ret[0][0].copy()
elif elemwise and isinstance(v.owner.op, Elemwise):
if isinstance(v.owner.op.scalar_op, scal.Second):
# We don't need both input to be constant for second
......@@ -662,13 +662,13 @@ def get_scalar_constant_value(orig_v, elemwise=True,
for i in v.owner.inputs]
ret = [[None]]
v.owner.op.perform(v.owner, const, ret)
return ret[0][0]
return ret[0][0].copy()
elif (isinstance(v.owner.op, theano.tensor.subtensor.Subtensor) and
v.ndim == 0):
if isinstance(v.owner.inputs[0], TensorConstant):
cdata = tuple(v.owner.op.get_constant_idx(v.owner.inputs))
try:
return v.owner.inputs[0].data.__getitem__(cdata)
return v.owner.inputs[0].data.__getitem__(cdata).copy()
except IndexError:
raise IndexError(
str(tuple(v.owner.op.idx_list)) +
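The `.copy()` calls added in this hunk keep callers of `get_scalar_constant_value` from mutating data cached inside constants. A small NumPy sketch of the aliasing hazard (the names are illustrative, not Theano internals):

```python
import numpy as np

cached = np.array(7.0)   # stands in for a Constant's cached .data

view = cached            # returning without .copy() hands out an alias
view += 1                # an in-place update corrupts the cached value
assert cached == 8.0

cached = np.array(7.0)
safe = cached.copy()     # what the patched code returns
safe += 1
assert cached == 7.0     # the cached constant is untouched
```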
......@@ -1399,8 +1399,6 @@ class MaxAndArgmax(Op):
%(axis_code)s
%(max)s = (PyArrayObject*)PyArray_Max(%(x)s, axis, NULL);
if(%(max)s == NULL){
PyErr_SetString(PyExc_ValueError,
"MaxAndArgmax, max failed");
%(fail)s;
}
if(!PyArray_CheckExact(%(max)s)){
......@@ -1412,7 +1410,6 @@ class MaxAndArgmax(Op):
%(argmax)s = (PyArrayObject*)PyArray_ArgMax(%(x)s, axis, NULL);
if(%(argmax)s == NULL){
PyErr_SetString(PyExc_ValueError, "MaxAndArgmax, argmax failed");
Py_CLEAR(%(max)s);
%(fail)s;
}
......@@ -1434,7 +1431,7 @@ class MaxAndArgmax(Op):
return ret % locals()
def c_code_cache_version(self):
return (3,)
return (4,)
def infer_shape(self, node, shapes):
ishape, axis_shape = shapes
......
......@@ -152,6 +152,7 @@ from theano.tensor import basic as T
from theano.tensor.blas_headers import blas_header_text
from theano.tensor.blas_headers import blas_header_version
from theano.tensor.opt import in2out, local_dimshuffle_lift
from theano.tensor.type import values_eq_approx_remove_inf_nan
_logger = logging.getLogger('theano.tensor.blas')
......@@ -1435,7 +1436,8 @@ class GemmOptimizer(Optimizer):
if new_node is not node:
nodelist.append(new_node)
u = theano.gof.opt.Updater(on_import, None, None)
u = theano.gof.opt.Updater(on_import, None, None,
name="GemmOptimizer")
fgraph.attach_feature(u)
while did_something:
nb_iter += 1
......@@ -1465,6 +1467,7 @@ class GemmOptimizer(Optimizer):
if new_outputs:
new_outputs, old_dot22 = new_outputs
assert len(new_outputs) == len(node.outputs)
new_outputs[0].tag.values_eq_approx = values_eq_approx_remove_inf_nan
try:
fgraph.replace_all_validate_remove(
list(zip(node.outputs, new_outputs)),
......
......@@ -726,3 +726,62 @@ def norm(x, ord):
raise ValueError(0)
elif ndim > 2:
raise NotImplementedError("We don't support norm with ndim > 2")
class TensorInv(Op):
"""
Op wrapper for the tensorinv() function;
Theano counterpart of numpy.linalg.tensorinv.
"""
_numop = staticmethod(numpy.linalg.tensorinv)
__props__ = ('ind',)
def __init__(self, ind=2):
self.ind = ind
def make_node(self, a):
a = as_tensor_variable(a)
out = a.type()
return Apply(self, [a], [out])
def perform(self, node, inputs, outputs):
(a,) = inputs
(x,) = outputs
x[0] = self._numop(a, self.ind)
def infer_shape(self, node, shapes):
sp = shapes[0][self.ind:] + shapes[0][:self.ind]
return [sp]
def tensorinv(a, ind=2):
"""
Theano counterpart of numpy.linalg.tensorinv; does not run on the GPU.
Compute the 'inverse' of an N-dimensional array.
The result is an inverse for `a` relative to the tensordot operation
``tensordot(a, b, ind)``, i.e., up to floating-point accuracy,
``tensordot(tensorinv(a), a, ind)`` is the "identity" tensor for the
tensordot operation.
Parameters
----------
a : array_like
Tensor to 'invert'. Its shape must be 'square', i.e.,
``prod(a.shape[:ind]) == prod(a.shape[ind:])``.
ind : int, optional
Number of first indices that are involved in the inverse sum.
Must be a positive integer, default is 2.
Returns
-------
b : ndarray
`a`'s tensordot inverse, shape ``a.shape[ind:] + a.shape[:ind]``.
Raises
------
LinAlgError
If `a` is singular or not 'square' (in the above sense).
"""
return TensorInv(ind)(a)
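A short usage sketch for the new `tensorinv` wrapper, written with NumPy directly since the Op's `perform` just defers to `numpy.linalg.tensorinv`:

```python
import numpy as np

# A 'square' 4-D tensor: prod(shape[:2]) == prod(shape[2:]) == 24.
a = np.eye(4 * 6).reshape(4, 6, 8, 3)
ainv = np.linalg.tensorinv(a, ind=2)

# Shape matches infer_shape above: a.shape[ind:] + a.shape[:ind].
assert ainv.shape == (8, 3, 4, 6)

# tensordot(tensorinv(a), a, ind) is the 'identity' for tensordot.
ident = np.eye(24).reshape(8, 3, 8, 3)
assert np.allclose(np.tensordot(ainv, a, axes=2), ident)
```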
......@@ -413,6 +413,7 @@ log1msigm_to_softplus = gof.PatternSub(
values_eq_approx=values_eq_approx_remove_inf,
skip_identities_fn=_skip_mul_1)
log1pexp_to_softplus = gof.PatternSub(
(tensor.log1p,
(tensor.exp, 'x')),
......@@ -420,12 +421,20 @@ log1pexp_to_softplus = gof.PatternSub(
values_eq_approx=values_eq_approx_remove_inf,
allow_multiple_clients=True)
log1p_neg_sigmoid = gof.PatternSub(
(tensor.log1p,
(tensor.neg, (sigmoid, 'x'))),
(tensor.neg, (softplus, 'x')),
values_eq_approx=values_eq_approx_remove_inf,
allow_multiple_clients=True)
opt.register_stabilize(logsigm_to_softplus, name='logsigm_to_softplus')
opt.register_stabilize(log1msigm_to_softplus, name='log1msigm_to_softplus')
opt.register_stabilize(log1pexp_to_softplus, name='log1pexp_to_softplus')
opt.register_stabilize(log1p_neg_sigmoid, name='log1p_neg_sigmoid')
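The new `log1p_neg_sigmoid` pattern rewrites `log1p(-sigmoid(x))` into `-softplus(x)`. The identity behind it is `log(1 - sigmoid(x)) = -x - log(1 + exp(-x)) = -log(1 + exp(x))`, checked numerically here with plain `math`:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    return math.log1p(math.exp(x))

# log1p(-sigmoid(x)) == -softplus(x); the right-hand side is far more
# numerically stable than evaluating the left-hand side directly.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert abs(math.log1p(-sigmoid(x)) + softplus(x)) < 1e-9
```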
def is_1pexp(t):
def is_1pexp(t, only_process_constants=True):
"""
Returns
......@@ -437,8 +446,9 @@ def is_1pexp(t):
"""
if t.owner and t.owner.op == tensor.add:
scalars, scalar_inputs, nonconsts = \
opt.scalarconsts_rest(t.owner.inputs)
# scalar_inputs are potentially dimshuffled and fill'd scalars
opt.scalarconsts_rest(t.owner.inputs,
only_process_constants=only_process_constants)
# scalar_inputs are potentially dimshuffled and filled with scalars
if len(nonconsts) == 1:
maybe_exp = nonconsts[0]
if maybe_exp.owner and maybe_exp.owner.op == tensor.exp:
......@@ -947,7 +957,7 @@ def local_inv_1_plus_exp(node):
inv_arg = node.inputs[0]
if inv_arg.owner and inv_arg.owner.op == tensor.add:
scalars, scalar_inputs, nonconsts = \
opt.scalarconsts_rest(inv_arg.owner.inputs)
opt.scalarconsts_rest(inv_arg.owner.inputs, only_process_constants=True)
# scalar_inputs are potentially dimshuffled and fill'd scalars
if len(nonconsts) == 1:
if nonconsts[0].owner and nonconsts[0].owner.op == tensor.exp:
......
......@@ -356,7 +356,6 @@ class T_sigmoid_opts(unittest.TestCase):
f = theano.function([x], s, mode=mode)
assert hasattr(f.maker.fgraph.outputs[0].tag, 'trace')
topo = f.maker.fgraph.toposort()
assert len(topo) > 1
assert not any([n.op == sigmoid for n in topo])
ux_v = f([[-50, -10, -4, -1, 0, 1, 4, 10, 50]])
......@@ -467,15 +466,17 @@ class T_sigmoid_utils(unittest.TestCase):
try:
x = tensor.vector('x')
exp = tensor.exp
assert is_1pexp(1 + exp(x)) == (False, x)
assert is_1pexp(exp(x) + 1) == (False, x)
for neg, exp_arg in imap(is_1pexp, [(1 + exp(-x)), (exp(-x) + 1)]):
assert is_1pexp(1 + exp(x), False) == (False, x)
assert is_1pexp(exp(x) + 1, False) == (False, x)
for neg, exp_arg in imap(lambda x:
is_1pexp(x, only_process_constants=False),
[(1 + exp(-x)), (exp(-x) + 1)]):
assert not neg and theano.gof.graph.is_same_graph(exp_arg, -x)
assert is_1pexp(1 - exp(x)) is None
assert is_1pexp(2 + exp(x)) is None
assert is_1pexp(exp(x) + 2) is None
assert is_1pexp(exp(x) - 1) is None
assert is_1pexp(-1 + exp(x)) is None
assert is_1pexp(1 + 2 * exp(x)) is None
assert is_1pexp(1 - exp(x), False) is None
assert is_1pexp(2 + exp(x), False) is None
assert is_1pexp(exp(x) + 2, False) is None
assert is_1pexp(exp(x) - 1, False) is None
assert is_1pexp(-1 + exp(x), False) is None
assert is_1pexp(1 + 2 * exp(x), False) is None
finally:
config.warn.identify_1pexp_bug = backup
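The updated tests pin down the contract of `is_1pexp`: it returns `(negated, exp_argument)` when the sum is exactly `1 + exp(arg)`, and `None` for any other constant or coefficient. A toy re-implementation over tuple expressions (not Theano graphs; names hypothetical) makes that contract concrete:

```python
def toy_is_1pexp(terms):
    # terms represents a sum; a "constant" here is just a Python number.
    consts = [t for t in terms if isinstance(t, (int, float))]
    others = [t for t in terms if not isinstance(t, (int, float))]
    # accept only: constants summing to exactly 1, plus a single exp(...)
    if sum(consts) == 1 and len(others) == 1:
        op, arg = others[0]
        if op == 'exp':
            return (False, arg)
    return None

assert toy_is_1pexp([1, ('exp', 'x')]) == (False, 'x')
assert toy_is_1pexp([('exp', 'x'), 1]) == (False, 'x')
assert toy_is_1pexp([2, ('exp', 'x')]) is None
assert toy_is_1pexp([-1, ('exp', 'x')]) is None
```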
@@ -186,8 +186,12 @@ class Pool(Op):
if st is None:
st = ds
r, c = imgshape[-2:]
r += padding[0] * 2
c += padding[1] * 2
r = tensor.extract_constant(r)
c = tensor.extract_constant(c)
if padding[0]:
r += padding[0] * 2
if padding[1]:
c += padding[1] * 2
if ignore_border:
if ds[0] == st[0]:
@@ -216,7 +220,7 @@ class Pool(Op):
elif st[0] >= ds[0]:
nr = (r - 1) // st[0] + 1
else:
nr = max(0, (r - 1 - ds[0]) // st[0] + 1) + 1
nr = max(0, (r - 1 - ds[0] + st[0]) // st[0]) + 1
if isinstance(c, theano.Variable):
nc = tensor.switch(tensor.ge(st[1], ds[1]),
@@ -226,7 +230,7 @@ class Pool(Op):
elif st[1] >= ds[1]:
nc = (c - 1) // st[1] + 1
else:
nc = max(0, (c - 1 - ds[1]) // st[1] + 1) + 1
nc = max(0, (c - 1 - ds[1] + st[1]) // st[1]) + 1
rval = list(imgshape[:-2]) + [nr, nc]
return rval
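The `nr`/`nc` hunks above replace `max(0, (r - 1 - ds) // st + 1) + 1` with `max(0, (r - 1 - ds + st) // st) + 1`. With Python's floor division the two are equal, but the matching C code truncates integer division toward zero, so the old form overcounts by one whenever the input is smaller than the pool window. A standalone sketch (not Theano code) that simulates C division shows the discrepancy:

```python
def c_div(a, b):
    # C integer division truncates toward zero; Python's // floors
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def old_rows(r, ds, st):
    return max(0, c_div(r - 1 - ds, st) + 1) + 1

def new_rows(r, ds, st):
    return max(0, c_div(r - 1 - ds + st, st)) + 1

# r=3, ds=4, st=3: only one (partial) pooling window fits, but the old
# formula reports two, because c_div(-2, 3) == 0 instead of floor's -1.
assert old_rows(3, 4, 3) == 2
assert new_rows(3, 4, 3) == 1
# For inputs at least as large as the window, the two formulas agree.
assert old_rows(10, 5, 3) == new_rows(10, 5, 3) == 3
```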
@@ -257,10 +261,10 @@ class Pool(Op):
self.mode = mode
def make_node(self, x):
if x.type.ndim != 4:
raise TypeError()
# TODO: consider restricting the dtype?
x = tensor.as_tensor_variable(x)
if x.type.ndim != 4:
raise TypeError()
# If the input dimensions are broadcastable, we can have 0 in the output shape
broad = x.broadcastable[:2] + (False, False)
out = tensor.TensorType(x.dtype, broad)
@@ -274,6 +278,9 @@ class Pool(Op):
'Pool requires 4D input for now')
z_shape = self.out_shape(x.shape, self.ds, self.ignore_border, self.st,
self.padding)
if not self.ignore_border:
assert z_shape[2] > 0
assert z_shape[3] > 0
if (z[0] is None) or (z[0].shape != z_shape):
z[0] = numpy.empty(z_shape, dtype=x.dtype)
zz = z[0]
@@ -403,7 +410,7 @@ class Pool(Op):
}
else
{
z_r = std::max(0, (r - 1 - %(ds0)s) / %(st0)s + 1) + 1;
z_r = std::max(0, (r - 1 - %(ds0)s + %(st0)s) / %(st0)s) + 1;
}
// decide how many columns the output has
if (%(st1)s >= %(ds1)s)
@@ -412,8 +419,10 @@ class Pool(Op):
}
else
{
z_c = std::max(0, (c - 1 - %(ds1)s) / %(st1)s + 1) + 1;
z_c = std::max(0, (c - 1 - %(ds1)s + %(st1)s) / %(st1)s) + 1;
}
assert(z_r > 0);
assert(z_c > 0);
}
// memory allocation of z if necessary
if ((!%(z)s)
@@ -522,7 +531,7 @@ class Pool(Op):
return ccode % locals()
def c_code_cache_version(self):
return (0, 6, 8, 3)
return (0, 6, 8, 4)
class PoolGrad(Op):
@@ -632,12 +641,12 @@ class MaxPoolGrad(PoolGrad):
def make_node(self, x, maxout, gz):
# make_node should only be called by the grad function of
# Pool, so these asserts should not fail.
assert isinstance(x, Variable) and x.ndim == 4
assert isinstance(maxout, Variable) and maxout.ndim == 4
assert isinstance(gz, Variable) and gz.ndim == 4
x = tensor.as_tensor_variable(x)
maxout = tensor.as_tensor_variable(maxout)
gz = tensor.as_tensor_variable(gz)
assert isinstance(x, Variable) and x.ndim == 4
assert isinstance(maxout, Variable) and maxout.ndim == 4
assert isinstance(gz, Variable) and gz.ndim == 4
return Apply(self, [x, maxout, gz], [x.type()])
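The reordering in this hunk matters because the asserts inspect `.ndim` and `isinstance(..., Variable)`: calling `tensor.as_tensor_variable` first lets callers pass anything convertible (a numpy array or a nested list) rather than an already-built Variable. The same convert-then-validate pattern in a plain-Python sketch, with numpy standing in for Theano types and the function name hypothetical:

```python
import numpy as np

def make_node_fixed(x):
    # convert first (stand-in for tensor.as_tensor_variable) ...
    x = np.asarray(x)
    # ... then validate; a nested list has no .ndim until it is converted,
    # so asserting before conversion would reject valid inputs
    assert x.ndim == 4
    return x

out = make_node_fixed([[[[1.0, 2.0]]]])
assert out.shape == (1, 1, 1, 2)
```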
@@ -814,10 +823,10 @@ class AveragePoolGrad(PoolGrad):
def make_node(self, x, gz, dummy=None):
# make_node should only be called by the grad function of
# Pool, so these asserts should not fail.
assert isinstance(x, Variable) and x.ndim == 4
assert isinstance(gz, Variable) and gz.ndim == 4
x = tensor.as_tensor_variable(x)
gz = tensor.as_tensor_variable(gz)
assert isinstance(x, Variable) and x.ndim == 4
assert isinstance(gz, Variable) and gz.ndim == 4
return Apply(self, [x, gz], [x.type()])