Commit 839fa93b authored by Frédéric Bastien, committed by GitHub

Merge branch 'master' into ipT_grad

 #!/bin/bash
 BUILDBOT_DIR=$WORKSPACE/nightly_build
-THEANO_PARAM="theano --with-timer --timer-top-n 10"
+THEANO_PARAM="theano --with-timer --timer-top-n 10 -v"
 export THEANO_FLAGS=init_gpu_device=gpu
 # CUDA
......
@@ -66,8 +66,7 @@ features:
 * tight integration with NumPy: a similar interface to NumPy's.
   numpy.ndarrays are also used internally in Theano-compiled functions.
-* transparent use of a GPU: perform data-intensive computations up to
-  140x faster than on a CPU (support for float32 only).
+* transparent use of a GPU: perform data-intensive computations much faster than on a CPU.
 * efficient symbolic differentiation: Theano can compute derivatives
   for functions of one or many inputs.
 * speed and stability optimizations: avoid nasty bugs when computing
......
@@ -7,7 +7,7 @@ evaluate mathematical expressions involving multi-dimensional
 arrays efficiently. Theano features:
 * **tight integration with NumPy** -- Use `numpy.ndarray` in Theano-compiled functions.
-* **transparent use of a GPU** -- Perform data-intensive calculations up to 140x faster than with CPU.(float32 only)
+* **transparent use of a GPU** -- Perform data-intensive computations much faster than on a CPU.
 * **efficient symbolic differentiation** -- Theano does your derivatives for functions with one or many inputs.
 * **speed and stability optimizations** -- Get the right answer for ``log(1+x)`` even when ``x`` is really tiny.
 * **dynamic C code generation** -- Evaluate expressions faster.
......
@@ -12,6 +12,7 @@ CentOS 6 Installation Instructions
 page <http://deeplearning.net/software/theano_versions/dev/install_centos6.html>`_.
 .. |PlatformCompiler| replace:: ``python-dev``, ``g++`` >= 4.2
+.. |CompilerName| replace:: ``g++``
 .. include:: requirements.inc
......
@@ -9,6 +9,24 @@ Installation

 Stable Installation
 -------------------

+With ``conda``
+^^^^^^^^^^^^^^
+
+If you use conda, you can directly install both theano and pygpu. Libgpuarray
+will be automatically installed as a dependency.
+
+.. code-block:: bash
+
+    conda install theano pygpu
+
+With ``pip``
+^^^^^^^^^^^^
+
+If you use pip, you have to install Theano and libgpuarray separately.
+
+theano
+::::::
+
 Install the latest stable version of Theano with:

 .. raw:: html
@@ -27,23 +45,18 @@ Install the latest stable version of Theano with:

 If you encountered any trouble, head to the :ref:`troubleshooting` page.

-libgpuarray
-^^^^^^^^^^^
-
-It is recommanded that you don't use 0.8.2 for the new back-end. Use
-the dev version of Theano or 0.9rc3.
-
-For the stable version of Theano(0.8.2) you need a specific version of libgpuarray,
-that has been tagged ``v-9998``.
-Download it with:
-
-.. raw:: html
-
-   <div class='highlight'><pre>
-   git clone https://github.com/Theano/libgpuarray.git --tags
-   git checkout origin/v-9998
-   cd libgpuarray
-   </pre></div>
+The latest stable version of Theano is ``0.9.0`` (tagged with ``rel-0.9.0``).
+
+libgpuarray
+:::::::::::
+
+For the stable version of Theano you need a specific version of libgpuarray,
+that has been tagged ``v0.6.2``.
+Download it with::
+
+    git clone https://github.com/Theano/libgpuarray.git
+    cd libgpuarray
+    git checkout tags/v0.6.2 -b v0.6.2

 and then follow the `Step-by-step instructions <http://deeplearning.net/software/libgpuarray/installation.html#step-by-step-install>`__.
......
@@ -20,6 +20,7 @@ alternative instructions here.
 .. _theano-users: http://groups.google.com/group/theano-users?pli=1
 .. |PlatformCompiler| replace:: ``clang`` (the system version)
+.. |CompilerName| replace:: ``Clang``
 .. include:: requirements.inc
......
@@ -14,6 +14,7 @@ Ubuntu Installation Instructions
 .. _gpu_linux:
 .. |PlatformCompiler| replace:: ``python-dev``, ``g++`` >= 4.2
+.. |CompilerName| replace:: ``g++``
 .. include:: requirements.inc
......
(Diff collapsed.)
@@ -153,7 +153,7 @@ For final releases, send the e-mail to the following mailing lists:
 * theano-users
 * theano-announce
 * numpy-discussion@scipy.org
-* scipy-user@scipy.org
+* scipy-user@python.org
 * G+, Scientific Python: https://plus.google.com/communities/108773711053400791849
 For release candidates, only e-mail:
......
@@ -219,6 +219,7 @@ TODO: Give examples on how to use these things! They are pretty complicated.
 It flip the kernel.
 .. autofunction:: theano.tensor.nnet.conv2d
+.. autofunction:: theano.tensor.nnet.conv2d_transpose
 .. autofunction:: theano.tensor.nnet.conv3d
 .. autofunction:: theano.sandbox.cuda.fftconv.conv2d_fft
 .. autofunction:: theano.tensor.nnet.Conv3D.conv3D
......
@@ -7,21 +7,28 @@ Requirements

 .. _BLAS: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
 .. _Python: http://www.python.org/
+.. _LaTeX: http://www.latex-project.org/
+.. _dvipng: http://savannah.nongnu.org/projects/dvipng/
+.. _NVIDIA CUDA drivers and SDK: http://developer.nvidia.com/object/gpucomputing.html
+.. _libgpuarray: http://deeplearning.net/software/libgpuarray/installation.html
+.. _pycuda: https://mathema.tician.de/software/pycuda/
+.. _skcuda: http://scikit-cuda.readthedocs.io/en/latest/

-Python_ >= 2.7 or >= 3.3
+Python_ == 2.7 or ( >= 3.3 and <= 3.5 )
     The development package (python-dev or
     python-devel on most Linux distributions) is recommended (see
     just below). Python 2.4 was supported up to and including the
     release 0.6. Python 2.6 was supported up to and including the
     release 0.8.2. Python 3 is supported past the 3.3 release.
-`NumPy <http://numpy.scipy.org/>`_ >= 1.9.1 < 1.11.1
+`NumPy <http://numpy.scipy.org/>`_ >= 1.9.1 <= 1.12
     Earlier versions could work, but we dont test it.
 `SciPy <http://scipy.org>`_ >= 0.14 < 0.17.1
     Only currently required for sparse matrix and special functions support, but highly recommended. SciPy >=0.8 could work, but earlier versions have known bugs with sparse matrices.
 `BLAS`_ installation (with Level 3 functionality)
   * **Recommended**: MKL, which is free through Conda.
   * Alternatively, we suggest to install OpenBLAS, with the development headers (``-dev``, ``-devel``, depending on your Linux distribution).

 **Optional requirements**
@@ -42,10 +49,9 @@ Requirements
     **Highly recommended** Required for GPU code generation/execution on NVIDIA gpus. See instruction below.
 `libgpuarray`_
-    Required for GPU/CPU code generation on CUDA and OpenCL devices (see: :ref:`gpuarray`.)
+    Required for GPU/CPU code generation on CUDA and OpenCL devices (see: :ref:`gpuarray`).
 `pycuda`_ and `skcuda`_
     Required for some extra operations on the GPU like fft and
     solvers. We use them to wrap cufft and cusolver. Quick install
     ``pip install pycuda scikit-cuda``. For cuda 8, the dev
@@ -63,7 +69,9 @@ Follow this `link <http://conda.pydata.org/miniconda.html>`__ to install Minicon

 .. note::

-    If you want fast compiled code (recommended), make sure you have g++ (Windows/Linux) or Clang (OS X) installed.
+    If you want fast compiled code (recommended), make sure you have |CompilerName| installed.
+
+.. install_requirements_and_optional_packages

 Install requirements and optional packages
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -109,9 +117,4 @@ Install and configure the GPU drivers (recommended)
 * add a ``cuda.root`` flag to :envvar:`THEANO_FLAGS`, as in ``THEANO_FLAGS='cuda.root=/path/to/cuda/root'``, or
 * add a [cuda] section to your .theanorc file containing the option ``root = /path/to/cuda/root``.

-.. _LaTeX: http://www.latex-project.org/
-.. _dvipng: http://savannah.nongnu.org/projects/dvipng/
-.. _NVIDIA CUDA drivers and SDK: http://developer.nvidia.com/object/gpucomputing.html
-.. _libgpuarray: http://deeplearning.net/software/libgpuarray/installation.html
-.. _pycuda: https://mathema.tician.de/software/pycuda/
-.. _skcuda: http://scikit-cuda.readthedocs.io/en/latest/

 .. |PlatformCompiler| replace:: ``g++`` (Linux and Windows), ``clang`` (OS X)
+.. |CompilerName| replace:: ``g++`` (Windows/Linux) or ``Clang`` (OS X)
 .. include:: requirements.inc
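As a sketch of the ``[cuda]`` option mentioned in the hunk above, a minimal ``.theanorc`` would look like this (the path is illustrative, not a detected location):

```ini
[cuda]
root = /usr/local/cuda
```

The equivalent environment-variable form is ``THEANO_FLAGS='cuda.root=/usr/local/cuda'``.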
@@ -220,6 +220,36 @@ The ``compute_test_value`` mechanism works as follows:

 This feature is currently incompatible with ``Scan`` and also with ops
 which do not implement a ``perform`` method.

+It is also possible to override a variable's ``__repr__`` method to have it return ``tag.test_value``.
+
+.. testsetup:: printtestvalue
+
+    import theano
+    import theano.tensor as T
+
+.. testcode:: printtestvalue
+
+    x = T.scalar('x')
+
+    # Assigning test value
+    x.tag.test_value = 42
+
+    # Enable test value printing
+    theano.config.print_test_value = True
+    print(x.__repr__())
+
+    # Disable test value printing
+    theano.config.print_test_value = False
+    print(x.__repr__())
+
+Running the code above returns the following output:
+
+.. testoutput:: printtestvalue
+
+    x
+    array(42.0)
+    x
+
 "How do I Print an Intermediate Value in a Function?"
 -----------------------------------------------------
......
@@ -31,14 +31,36 @@ import logging
 import sys

+
+def has_handlers(logger):
+    # copied from Logger.hasHandlers() (introduced in Python 3.2)
+    _logger = logger
+    _has_handler = False
+    while _logger:
+        if _logger.handlers:
+            _has_handler = True
+            break
+        if not _logger.propagate:
+            break
+        else:
+            _logger = _logger.parent
+    return _has_handler
+
 theano_logger = logging.getLogger("theano")
 logging_default_handler = logging.StreamHandler()
 logging_default_formatter = logging.Formatter(
     fmt='%(levelname)s (%(name)s): %(message)s')
 logging_default_handler.setFormatter(logging_default_formatter)
-theano_logger.addHandler(logging_default_handler)
 theano_logger.setLevel(logging.WARNING)
+
+if has_handlers(theano_logger) is False:
+    theano_logger.addHandler(logging_default_handler)
+
+
+# Disable default log handler added to theano_logger when the module
+# is imported.
+def disable_log_handler(logger=theano_logger, handler=logging_default_handler):
+    if has_handlers(logger):
+        logger.removeHandler(handler)
+
 # Version information.
 from theano.version import version as __version__
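The hunk above follows a common library-logging pattern: attach a default handler only when the application has not configured one anywhere up the logger hierarchy, and expose a function to detach it again. A self-contained sketch of the same pattern (the logger name `mylib_demo` is illustrative, not part of Theano):

```python
import logging

def has_handlers(logger):
    # Walk up the logger hierarchy, as logging.Logger.hasHandlers()
    # does (that method only exists on Python >= 3.2).
    while logger:
        if logger.handlers:
            return True
        if not logger.propagate:
            break
        logger = logger.parent
    return False

lib_logger = logging.getLogger("mylib_demo")   # illustrative name
default_handler = logging.StreamHandler()

# Attach the library's default handler only when the embedding
# application has not already installed one.
if not has_handlers(lib_logger):
    lib_logger.addHandler(default_handler)

def disable_log_handler(logger=lib_logger, handler=default_handler):
    # Let the embedding application detach the library's default handler.
    if has_handlers(logger):
        logger.removeHandler(handler)
```

This keeps a library from double-logging when the application already configured `logging`, while still printing warnings by default in bare scripts.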
......
@@ -2302,6 +2302,7 @@ class GCC_compiler(Compiler):
             if status:
                 tf = tempfile.NamedTemporaryFile(
+                    mode='w',
                     prefix='theano_compilation_error_',
                     delete=False
                 )
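The added ``mode='w'`` matters because ``NamedTemporaryFile`` defaults to binary mode (``'w+b'``), where writing a ``str`` raises ``TypeError`` on Python 3. A quick sketch of the difference:

```python
import os
import tempfile

# Default mode is 'w+b': writing str fails on Python 3.
with tempfile.NamedTemporaryFile(delete=False) as tf_bin:
    try:
        tf_bin.write("compilation error text")
        wrote_str_in_binary_mode = True
    except TypeError:
        wrote_str_in_binary_mode = False
os.unlink(tf_bin.name)

# With mode='w', the same str write succeeds.
tf = tempfile.NamedTemporaryFile(mode='w',
                                 prefix='theano_compilation_error_',
                                 delete=False)
tf.write("compilation error text")
tf.close()
os.unlink(tf.name)
```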
......
@@ -1375,3 +1375,27 @@ def list_of_nodes(inputs, outputs):
         lambda o: [inp.owner for inp in o.inputs
                    if inp.owner and
                    not any(i in inp.owner.outputs for i in inputs)])
+
+
+def is_in_ancestors(l_node, f_node):
+    r"""
+    Goes up in the graph and returns True if the apply node f_node is found.
+
+    Use a stack implementation as the vm algo.
+    We suppose all nodes are not lazy
+    (i.e. for IfElse we suppose all inputs are computed)
+    """
+    computed = set()
+    todo = [l_node]
+    while todo:
+        cur = todo.pop()
+        if cur.outputs[0] in computed:
+            continue
+        if all([i in computed or i.owner is None for i in cur.inputs]):
+            computed.update(cur.outputs)
+            if cur is f_node:
+                return True
+        else:
+            todo.append(cur)
+            todo.extend(i.owner for i in cur.inputs if i.owner)
+    return False
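To illustrate the traversal the new helper performs, here is a self-contained sketch with minimal stand-ins for Theano's variables and apply nodes (the `Var`/`Node` classes are hypothetical mocks, not Theano API; the function body mirrors the hunk above):

```python
class Var:
    """Minimal stand-in for a Theano variable."""
    def __init__(self, owner=None):
        self.owner = owner  # the Node that produced this variable, if any

class Node:
    """Minimal stand-in for a Theano apply node with one output."""
    def __init__(self, inputs):
        self.inputs = inputs
        self.outputs = [Var(owner=self)]

def is_in_ancestors(l_node, f_node):
    # Same stack-based walk as in the diff: mark a node computed once
    # all its inputs are computed (or are graph inputs), and report
    # whether f_node is reached on the way.
    computed = set()
    todo = [l_node]
    while todo:
        cur = todo.pop()
        if cur.outputs[0] in computed:
            continue
        if all(i in computed or i.owner is None for i in cur.inputs):
            computed.update(cur.outputs)
            if cur is f_node:
                return True
        else:
            todo.append(cur)
            todo.extend(i.owner for i in cur.inputs if i.owner)
    return False

# Build a two-node chain: x -> n1 -> n2
x = Var()                    # graph input, no owner
n1 = Node([x])
n2 = Node([n1.outputs[0]])
```

With this chain, `n1` is an ancestor of `n2` but not the other way around.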
@@ -2089,13 +2089,7 @@ class TopoOptimizer(NavigatorOptimizer):
             if node is not current_node:
                 q.append(node)

-        def pruner(node):
-            if node is not current_node:
-                try:
-                    q.remove(node)
-                except ValueError:
-                    pass
-
-        u = self.attach_updater(fgraph, importer, pruner,
+        u = self.attach_updater(fgraph, importer, None,
                                 name=getattr(self, 'name', None))
         nb = 0
         try:
@@ -2105,6 +2099,8 @@ class TopoOptimizer(NavigatorOptimizer):
                     node = q.pop()
                 else:
                     node = q.popleft()
+                if node not in fgraph.apply_nodes:
+                    continue
                 current_node = node
                 nb += self.process_node(fgraph, node)
         loop_t = time.time() - t0
@@ -2217,17 +2213,13 @@ class OpKeyOptimizer(NavigatorOptimizer):
             if node.op == op:
                 q.append(node)

-        def pruner(node):
-            if node is not current_node and node.op == op:
-                try:
-                    q.remove(node)
-                except ValueError:
-                    pass
-
-        u = self.attach_updater(fgraph, importer, pruner,
+        u = self.attach_updater(fgraph, importer, None,
                                 name=getattr(self, 'name', None))
         try:
             while q:
                 node = q.pop()
+                if node not in fgraph.apply_nodes:
+                    continue
                 current_node = node
                 self.process_node(fgraph, node)
         finally:
......
@@ -73,7 +73,7 @@ def as_gpuarray_variable(x, context_name):
     # If we couldn't deal with transfers, then maybe it's a tensor
     if isinstance(x.type, tensor.TensorType):
-        return gpu_from_host(context_name)(x)
+        return GpuFromHost(context_name)(x)

     # Try _as_GpuArrayVariable if possible
     if hasattr(x, '_as_GpuArrayVariable'):
@@ -617,7 +617,7 @@ class HostFromGpu(Op):
     def grad(self, inputs, grads):
         gz, = grads
-        return [gpu_from_host(inputs[0].type.context_name)(gz)]
+        return [GpuFromHost(inputs[0].type.context_name)(gz)]

     def R_op(self, inputs, eval_points):
         ev, = eval_points
@@ -663,8 +663,8 @@ class GpuFromHost(Op):
     def grad(self, inputs, grads):
         gz, = grads
-        return [host_from_gpu(as_gpuarray_variable(
-            gz, context_name=self.context_name))]
+        return [as_gpuarray_variable(
+            gz, context_name=self.context_name).transfer('cpu')]

     def R_op(self, inputs, eval_points):
         ev, = eval_points
@@ -722,14 +722,6 @@ class GpuFromHost(Op):
         return (9,)


-# Caching GPUAlloc
-def gpu_from_host(ctx):
-    if ctx not in gpu_alloc.cache:
-        gpu_from_host.cache[ctx] = GpuFromHost(ctx)
-    return gpu_from_host.cache[ctx]
-gpu_from_host.cache = {}
-
-
 class GpuToGpu(Op):
     """
     Transfer data between GPUs.
@@ -953,15 +945,6 @@ class GpuAlloc(HideC, Alloc):
         return True


-# Caching GPUAlloc
-def gpu_alloc(ctx, memset_0=False):
-    key = (ctx, memset_0)
-    if key not in gpu_alloc.cache:
-        gpu_alloc.cache[key] = GpuAlloc(ctx, memset_0)
-    return gpu_alloc.cache[key]
-gpu_alloc.cache = {}
-
-
 class GpuAllocEmpty(HideC, AllocEmpty):
     """
     Allocate uninitialized memory on the GPU.
@@ -1048,14 +1031,6 @@ def empty_like(var):
     return GpuAllocEmpty(var.type.dtype, var.type.context_name)(*var.shape)


-def gpu_alloc_empty(ctx, dtype):
-    key = (dtype, ctx)
-    if key not in gpu_alloc_empty.cache:
-        gpu_alloc_empty.cache[key] = GpuAllocEmpty(dtype, ctx)
-    return gpu_alloc_empty.cache[key]
-gpu_alloc_empty.cache = {}
-
-
 class GpuContiguous(Op):
     """
     Return a C contiguous version of the input.
@@ -1132,7 +1107,7 @@ class GpuReshape(HideC, tensor.Reshape):
         ctx_name = infer_context_name(x)
         x = as_gpuarray_variable(x, context_name=ctx_name)
         shp = tensor.as_tensor_variable(shp)
-        res = host_from_gpu(x).reshape(shp, ndim=self.ndim)
+        res = x.transfer('cpu').reshape(shp, ndim=self.ndim)
         otype = GpuArrayType(dtype=res.dtype,
                              broadcastable=res.broadcastable,
                              context_name=ctx_name)
......
(Diff collapsed.)
@@ -2,13 +2,13 @@ from __future__ import absolute_import, print_function, division
 import os

 from theano import Apply, Op
 from theano.tensor.extra_ops import CumOp
-from .basic_ops import infer_context_name

 try:
     from pygpu import gpuarray
 except ImportError:
     pass

-from .basic_ops import (as_gpuarray_variable, GpuKernelBase, Kernel, GpuReshape)
+from .basic_ops import (as_gpuarray_variable, GpuKernelBase, Kernel, GpuReshape, infer_context_name)
 from .opt import register_opt, op_lifter, register_opt2
......
@@ -10,7 +10,7 @@ from theano.scalar import as_scalar, constant

 from . import opt
 from .basic_ops import (as_gpuarray_variable, GpuAllocEmpty,
-                        infer_context_name, gpu_alloc_empty)
+                        infer_context_name)
 from .type import gpu_context_type
 from .opt_util import alpha_merge, output_merge
@@ -158,7 +158,7 @@ def local_gpua_dot_to_gemm16(op, ctx_name, inputs, outputs):
     if (A.ndim == 2 and B.ndim == 2 and
             A.dtype == 'float16' and B.dtype == 'float16'):
         fgraph = getattr(outputs[0], 'fgraph', None)
-        C = gpu_alloc_empty(ctx_name, dtype='float16')(
+        C = GpuAllocEmpty('float16', ctx_name)(
             shape_i(A, 0, fgraph), shape_i(B, 1, fgraph))
         return Gemm16()(C, 1.0, A, B, 0.0)
......
@@ -44,8 +44,7 @@ from .basic_ops import (as_gpuarray_variable, infer_context_name,
                         HostFromGpu, GpuFromHost,
                         GpuSplit, GpuContiguous, gpu_contiguous,
                         GpuAlloc, GpuAllocEmpty, GpuReshape,
-                        GpuEye, gpu_join, GpuJoin, gpu_alloc_empty,
-                        gpu_alloc, gpu_from_host)
+                        GpuEye, gpu_join, GpuJoin)
 from .blas import (gpu_dot22, GpuGemm, GpuGer, GpuGemmBatch,
                    gpugemm_no_inplace, gpugemm_inplace,
                    gpugemmbatch_no_inplace,
@@ -61,9 +60,8 @@ from .blocksparse import (GpuSparseBlockGemv, GpuSparseBlockOuter,
 from .nnet import (gpu_crossentropy_softmax_1hot_with_bias_dx,
                    gpu_crossentropy_softmax_argmax_1hot_with_bias,
                    gpu_softmax_with_bias, gpu_softmax)
 from .elemwise import (GpuElemwise, GpuDimShuffle, GpuCAReduceCuda,
-                       GpuCAReduceCPY, gpu_ca_reduce_cuda, gpu_erfinv, gpu_erfcinv,
+                       GpuCAReduceCPY, gpu_erfinv, gpu_erfcinv,
                        max_inputs_to_GpuElemwise)
 from .subtensor import (GpuIncSubtensor, GpuSubtensor,
                         GpuAdvancedSubtensor,
@@ -165,14 +163,14 @@ gpu_optimizer.register('local_remove_all_assert',

 def safe_to_gpu(x, ctx_name):
     if isinstance(x.type, tensor.TensorType):
-        return gpu_from_host(ctx_name)(x)
+        return GpuFromHost(ctx_name)(x)
     else:
         return x

 def safe_to_cpu(x):
     if isinstance(x.type, GpuArrayType):
-        return host_from_gpu(x)
+        return x.transfer('cpu')
     else:
         return x
@@ -236,7 +234,7 @@ def op_lifter(OP, cuda_only=False):
             elif isinstance(new_op, (tuple, list)):
                 return [safe_to_cpu(o) for o in new_op]
             else:  # suppose it is a variable on the GPU
-                return [host_from_gpu(new_op)]
+                return [new_op.transfer('cpu')]
         return False
     local_opt.__name__ = maker.__name__
     return local_optimizer(OP)(local_opt)
@@ -269,7 +267,7 @@ class InputToGpuOptimizer(Optimizer):
                 continue
             try:
-                new_input = host_from_gpu(gpu_from_host(target)(input))
+                new_input = GpuFromHost(target)(input).transfer('cpu')
                 fgraph.replace_validate(input, new_input,
                                         "InputToGpuOptimizer")
             except TypeError:
@@ -546,7 +544,7 @@ def local_cut_gpu_transfers(node):
         # gpub ->
         if isinstance(n2.op, GpuToGpu):
-            return [host_from_gpu(n2.inputs[0])]
+            return [n2.inputs[0].transfer('cpu')]

     # ? -> gpua -> gpub
     elif isinstance(node.op, GpuToGpu):
@@ -600,14 +598,14 @@ def local_gpua_alloc2(node):
                 i.owner.op in [host_from_gpu, tensor.alloc]
                 for i in c.inputs[1:])
             for c, idx in node.outputs[0].clients)):
-        return [host_from_gpu(gpu_alloc(None)(*node.inputs))]
+        return [GpuAlloc(None)(*node.inputs).transfer('cpu')]


 @register_opt('fast_compile')
 @op_lifter([tensor.Alloc])
 @register_opt2([tensor.Alloc], 'fast_compile')
-def local_gpua_alloc(op, context_name, inputs, outputs):
-    return gpu_alloc(context_name)
+def local_gpuaalloc(op, context_name, inputs, outputs):
+    return GpuAlloc(context_name)(*inputs)


 @register_opt('fast_compile')
@@ -616,7 +614,7 @@ def local_gpua_alloc(op, context_name, inputs, outputs):
 def local_gpua_alloc_empty(op, context_name, inputs, outputs):
     # We use _props_dict() to make sure that the GPU op know all the
     # CPU op props.
-    return gpu_alloc_empty(context_name, **op._props_dict())
+    return GpuAllocEmpty(context_name=context_name, **op._props_dict())(*inputs)


 @register_opt()
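The ``_props_dict()`` idiom used in the hunk above can be sketched in isolation: an op declares its parameters in ``__props__``, and ``_props_dict()`` turns them into kwargs from which an equivalent op (here, a GPU counterpart) can be rebuilt without listing each prop by hand. The classes below are illustrative mocks, not Theano API:

```python
class OpBase:
    """Mock of the relevant bit of an Op base class (illustrative)."""
    __props__ = ()

    def _props_dict(self):
        # Map each declared prop name to its current value, mirroring
        # what a _props_dict()-style helper provides.
        return {p: getattr(self, p) for p in self.__props__}

class AllocEmptyLike(OpBase):
    """Stand-in for a CPU AllocEmpty-style op with one prop."""
    __props__ = ('dtype',)

    def __init__(self, dtype):
        self.dtype = dtype

# A counterpart op can be constructed with exactly the CPU op's props:
cpu_op = AllocEmptyLike('float16')
gpu_kwargs = cpu_op._props_dict()
```

The benefit is that adding a new prop to the CPU op automatically flows through to any op rebuilt this way.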
@@ -627,7 +625,7 @@ def local_gpualloc_memset_0(node):
     if (isinstance(inp, GpuArrayConstant) and
             inp.data.size == 1 and
             (np.asarray(inp.data) == 0).all()):
-        new_op = gpu_alloc(node.op.context_name, memset_0=True)
+        new_op = GpuAlloc(node.op.context_name, memset_0=True)
         return [new_op(*node.inputs)]
@@ -637,8 +635,8 @@ def local_gpua_alloc_empty_to_zeros(node):
     if isinstance(node.op, GpuAllocEmpty):
         context_name = infer_context_name(*node.inputs)
         z = np.asarray(0, dtype=node.outputs[0].dtype)
-        return [gpu_alloc(context_name)(as_gpuarray_variable(z, context_name),
-                                        *node.inputs)]
+        return [GpuAlloc(context_name)(as_gpuarray_variable(z, context_name),
+                                       *node.inputs)]
 optdb.register('local_gpua_alloc_empty_to_zeros',
                theano.tensor.opt.in2out(local_gpua_alloc_empty_to_zeros),
                # After move to gpu and merge2, before inplace.
@@ -918,7 +916,7 @@ def local_gpu_pdbbreakpoint_op(node):
     new_outputs = []
     for i in range(len(new_op_outputs)):
         if input_transfered[i]:
-            new_outputs.append(host_from_gpu(new_op_outputs[i]))
+            new_outputs.append(new_op_outputs[i].transfer('cpu'))
         else:
             new_outputs.append(new_op_outputs[i])
@@ -983,7 +981,7 @@ def local_gpua_subtensor(op, context_name, inputs, outputs):
                 for n, _ in outputs[0].clients]):
             return
         else:
-            return [host_from_gpu(gpu_x.owner.op(outputs[0]))]
+            return [gpu_x.owner.op(outputs[0]).transfer('cpu')]
     return GpuSubtensor(op.idx_list)
@@ -1234,7 +1232,7 @@ def local_gpua_dot22scalar(op, context_name, inputs, outputs):
     x, y, a = inputs
     x = as_gpuarray_variable(x, context_name)
     y = as_gpuarray_variable(y, context_name)
-    z = gpu_alloc_empty(context_name, dtype=x.dtype)(x.shape[0], y.shape[1])
+    z = GpuAllocEmpty(x.dtype, context_name)(x.shape[0], y.shape[1])
     return [gpugemm_no_inplace(z, a, x, y, 0)]
@@ -1804,10 +1802,10 @@ def local_gpu_elemwise_careduce(node):
             isinstance(node.inputs[0].owner.op.scalar_op, scalar.basic.Sqr)):
         op = node.op
         inp = node.inputs[0].owner.inputs[0]
-        return [gpu_ca_reduce_cuda(scalar_op=op.scalar_op,
-                                   axis=op.axis,
-                                   reduce_mask=op.reduce_mask,
-                                   pre_scalar_op=scalar.basic.sqr)(inp)]
+        return [GpuCAReduceCuda(scalar_op=op.scalar_op,
+                                axis=op.axis,
+                                reduce_mask=op.reduce_mask,
+                                pre_scalar_op=scalar.basic.sqr)(inp)]

 @local_optimizer(None)
......
@@ -8,7 +8,7 @@ from theano.gof import local_optimizer
 from theano.tensor import (DimShuffle, get_scalar_constant_value,
                            NotScalarConstantError)
-from .basic_ops import GpuFromHost, HostFromGpu, GpuAllocEmpty, GpuReshape, gpu_alloc_empty
+from .basic_ops import GpuFromHost, HostFromGpu, GpuAllocEmpty, GpuReshape
 from .elemwise import GpuDimShuffle, GpuElemwise

 _one = scal.constant(np.asarray(1.0, dtype='float32'))
@@ -324,7 +324,7 @@ def inplace_allocempty(op, idx):
         if (alloc.owner and
                 isinstance(alloc.owner.op, GpuAllocEmpty) and
                 len(alloc.clients) > 1):
-            alloc_op = gpu_alloc_empty(alloc.owner.op.context_name, dtype=alloc.owner.op.dtype)
+            alloc_op = GpuAllocEmpty(alloc.owner.op.dtype, alloc.owner.op.context_name)
             inputs[idx] = alloc_op(*alloc.owner.inputs)
         return maker(node, inputs)
     return opt
...
@@ -271,7 +271,7 @@ class GpuArrayType(Type):
         return data

     def filter_variable(self, other, allow_convert=True):
-        from theano.gpuarray.basic_ops import gpu_from_host
+        from theano.gpuarray.basic_ops import GpuFromHost

         if hasattr(other, '_as_GpuArrayVariable'):
             other = other._as_GpuArrayVariable(self.context_name)
@@ -303,7 +303,7 @@ class GpuArrayType(Type):
                      str(self.broadcastable)))
             other = other2

-        return gpu_from_host(self.context_name)(other)
+        return GpuFromHost(self.context_name)(other)

     @staticmethod
     def values_eq(a, b, force_same_dtype=True):
...
@@ -1712,6 +1712,9 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
         if max_abs_err > abs_tol and max_rel_err > rel_tol:
             raise verify_grad.E_grad(max_arg, max_err_pos,
+                                     analytic_grad[max_arg].shape,
+                                     analytic_grad[max_arg].flatten()[max_err_pos],
+                                     num_grad.gf[max_arg].flatten()[max_err_pos],
                                      max_abs_err, max_rel_err,
                                      abs_tol, rel_tol)
@@ -1727,10 +1730,14 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
 class GradientError(Exception):
     """This error is raised when a gradient is calculated, but incorrect."""

-    def __init__(self, arg, err_pos, abs_err, rel_err, abs_tol, rel_tol):
+    def __init__(self, arg, err_pos, shape, val1, val2,
+                 abs_err, rel_err, abs_tol, rel_tol):
         Exception.__init__(self)  # to be compatible with python2.4
         self.arg = arg
         self.err_pos = err_pos
+        self.shape = shape
+        self.val1 = val1
+        self.val2 = val2
         self.abs_err = abs_err
         self.rel_err = rel_err
         self.abs_tol = abs_tol
@@ -1741,10 +1748,13 @@ class GradientError(Exception):
         args_msg = ", ".join(str(a) for a in self.args)
         return """\
GradientError: numeric gradient and analytic gradient exceed tolerance:
-        At position %i of argument %i,
+        At position %i of argument %i with shape %s,
+        val1 = %f , val2 = %f
         abs. error = %f, abs. tolerance = %f
         rel. error = %f, rel. tolerance = %f
        Exception args: %s""" % (self.err_pos, self.arg,
+                                 self.shape,
+                                 self.val1, self.val2,
                                  self.abs_err, self.abs_tol,
                                  self.rel_err, self.rel_tol,
                                  args_msg)
...
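For reference, the extended `GradientError` message interpolates the new `shape`, `val1`, and `val2` fields with ordinary `%`-formatting. A standalone sketch with made-up values (outside Theano, field order matching the hunk above) renders like this:

```python
# Standalone rendering of the extended GradientError message.
# The numbers here are invented for illustration only.
shape = (2, 2)
msg = """\
GradientError: numeric gradient and analytic gradient exceed tolerance:
        At position %i of argument %i with shape %s,
        val1 = %f , val2 = %f
        abs. error = %f, abs. tolerance = %f
        rel. error = %f, rel. tolerance = %f""" % (
    3, 0, shape, 0.5, 0.75, 0.25, 0.0001, 0.333333, 0.0001)
print(msg)
```

The `%s` conversion of the shape tuple is what produces the human-readable `(2, 2)` in the report.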
@@ -26,7 +26,6 @@ from six import iteritems
 from six.moves import xrange

 from theano.compile import optdb
 from theano.tensor import opt
-from theano.scan_module.scan_utils import find_up
 from theano.scan_module.scan_utils import clone
@@ -578,7 +577,7 @@ class CondMerge(gof.Optimizer):
         merging_node = cond_nodes[0]
         for proposal in cond_nodes[1:]:
             if (proposal.inputs[0] == merging_node.inputs[0] and
-                    not find_up(proposal, merging_node)):
+                    not gof.graph.is_in_ancestors(proposal, merging_node)):
                 # Create a list of replacements for proposal
                 mn_ts = merging_node.inputs[1:][:merging_node.op.n_outs]
                 mn_fs = merging_node.inputs[1:][merging_node.op.n_outs:]
@@ -683,8 +682,8 @@ def cond_merge_random_op(main_node):
     merging_node = cond_nodes[0]
     for proposal in cond_nodes[1:]:
         if (proposal.inputs[0] == merging_node.inputs[0] and
-                not find_up(proposal, merging_node) and
-                not find_up(merging_node, proposal)):
+                not gof.graph.is_in_ancestors(proposal, merging_node) and
+                not gof.graph.is_in_ancestors(merging_node, proposal)):
             # Create a list of replacements for proposal
             mn_ts = merging_node.inputs[1:][:merging_node.op.n_outs]
             mn_fs = merging_node.inputs[1:][merging_node.op.n_outs:]
...
@@ -9,7 +9,7 @@ import theano
 y = theano.tensor.fvector()
 x = theano.shared(np.zeros(1, dtype='float32'))
 f1 = theano.function([y], updates={x: y})
-f2 = theano.function([], theano.sandbox.cuda.host_from_gpu(x))
+f2 = theano.function([], x.transfer('cpu'))
 print(f1.maker.fgraph.toposort())
 print(f2.maker.fgraph.toposort())
 for i in [1, 10, 100, 1000, 10000, 100000, 1000000, 10000000]:
...
 from __future__ import absolute_import, print_function, division
-from .ops import (cholesky, matrix_inverse, solve,
-                  diag, extract_diag, alloc_diag,
-                  det, psd, eig, eigh, eigvalsh,
-                  trace, spectral_radius_bound)
+from theano.tensor.slinalg import (cholesky, solve, eigvalsh)
+from theano.tensor.nlinalg import (matrix_inverse,
+                                   diag, extract_diag, alloc_diag,
+                                   det, eig, eigh,
+                                   trace)
+from theano.sandbox.linalg.ops import psd, spectral_radius_bound
 from __future__ import absolute_import, print_function, division
 import logging
-logger = logging.getLogger(__name__)
-import numpy
 from six import iteritems, integer_types
 from six.moves import xrange
 from theano.gof import Op, Apply
-from theano.tensor import as_tensor_variable, dot, DimShuffle, Dot
+from theano.tensor import DimShuffle, Dot
 from theano.tensor.blas import Dot22
 from theano import tensor
 import theano.tensor
 from theano.tensor.opt import (register_stabilize,
-                               register_specialize, register_canonicalize)
+                               register_specialize,
+                               register_canonicalize)
 from theano.gof import local_optimizer
 from theano.gof.opt import Optimizer
-from theano.gradient import DisconnectedType
-from theano.tensor.nlinalg import ( MatrixInverse,
-    matrix_inverse,
-    MatrixPinv,
-    pinv,
-    AllocDiag,
-    alloc_diag,
-    ExtractDiag,
-    extract_diag,
-    diag,
-    trace,
-    Det,
-    det,
-    Eig,
-    eig,
-    Eigh,
-    EighGrad,
-    eigh,
-    matrix_dot,
-    _zero_disconnected,
-    qr,
-    svd,
-    lstsq,
-    matrix_power,
-    norm
-    )
-from theano.tensor.slinalg import ( Cholesky,
-    cholesky,
-    CholeskyGrad,
-    Solve,
-    solve,
-    Eigvalsh,
-    EigvalshGrad,
-    eigvalsh
-    )
-try:
-    import scipy.linalg
-    imported_scipy = True
-except ImportError:
-    # some ops (e.g. Cholesky, Solve, A_Xinv_b) won't work
-    imported_scipy = False
+from theano.tensor.nlinalg import (MatrixInverse,
+                                   matrix_inverse,
+                                   extract_diag,
+                                   trace,
+                                   det)
+from theano.tensor.slinalg import (Cholesky,
+                                   cholesky,
+                                   Solve,
+                                   solve,
+                                   imported_scipy)
+
+logger = logging.getLogger(__name__)

 class Hint(Op):
@@ -212,8 +180,6 @@ class HintsFeature(object):
 class HintsOptimizer(Optimizer):
     """
     Optimizer that serves to add HintsFeature as an fgraph feature.
     """

     def __init__(self):
@@ -310,8 +276,8 @@ def tag_solve_triangular(node):
                 return [Solve('lower_triangular')(A, b)]
             else:
                 return [Solve('upper_triangular')(A, b)]
-    if (A.owner and isinstance(A.owner.op, DimShuffle)
-            and A.owner.op.new_order == (1, 0)):
+    if (A.owner and isinstance(A.owner.op, DimShuffle) and
+            A.owner.op.new_order == (1, 0)):
         A_T, = A.owner.inputs
         if A_T.owner and isinstance(A_T.owner.op, type(cholesky)):
             if A_T.owner.op.lower:
@@ -423,6 +389,5 @@ def spectral_radius_bound(X, log2_exponent):
     XX = X
     for i in xrange(log2_exponent):
         XX = tensor.dot(XX, XX)
-    return tensor.pow(
-        trace(XX),
-        2 ** (-log2_exponent))
+    return tensor.pow(trace(XX),
+                      2 ** (-log2_exponent))
@@ -163,4 +163,4 @@ def test_matrix_inverse_solve():
     b = theano.tensor.dmatrix('b')
     node = matrix_inverse(A).dot(b).owner
     [out] = inv_as_solve.transform(node)
     assert isinstance(out.owner.op, Solve)
@@ -29,8 +29,7 @@ from theano.gpuarray.basic_ops import GpuKernelBase, Kernel, infer_context_name,
 from theano.gpuarray.type import GpuArrayType
 from theano.gpuarray.fp16_help import write_w
 from theano.gpuarray.opt import (register_opt as register_gpua,
-                                 register_opt2,
-                                 host_from_gpu as host_from_gpua)
+                                 register_opt2)

 if theano.sandbox.cuda.cuda_available:
     from theano.sandbox.cuda import (CudaNdarrayType,
                                      float32_shared_constructor)
@@ -1621,7 +1620,7 @@ def local_gpua_mrg_graph(op, context_name, inputs, outputs):
                  op.output_type.ndim,
                  op.output_type.dtype,
                  inputs[1])
-        return [outs[0], host_from_gpua(outs[1])]
+        return [outs[0], outs[1].transfer('cpu')]

 @register_gpua('fast_compile')
...
@@ -70,7 +70,7 @@ from theano.gof.opt import pre_constant_merge, pre_greedy_local_optimizer
 from theano.scan_module import scan_op
 from theano.scan_module import scan_utils
-from theano.scan_module.scan_utils import equal_computations, find_up, scan_args
+from theano.scan_module.scan_utils import equal_computations, scan_args

 __docformat__ = 'restructedtext en'
 __authors__ = ("Razvan Pascanu "
@@ -1605,7 +1605,7 @@ class ScanSaveMem(gof.Optimizer):
                     nw_pos = compress_map[idx]
                     old_new += [(o, new_outs[nw_pos])]
         # Check if the new outputs depend on the old scan node
-        old_scan_is_used = [scan_utils.find_up(new.owner, node)
+        old_scan_is_used = [gof.graph.is_in_ancestors(new.owner, node)
                             for old, new in old_new]
         if any(old_scan_is_used):
             return False
@@ -1829,19 +1829,21 @@ class ScanMerge(gof.Optimizer):
         except tensor.NotScalarConstantError:
             pass

+        if nsteps != rep_nsteps:
+            return False
+
         # Check to see if it is an input of a different node
         for nd in set_nodes:
-            if find_up(node, nd) or find_up(nd, node):
+            if gof.graph.is_in_ancestors(node, nd) or gof.graph.is_in_ancestors(nd, node):
                 return False

         if not node.op.as_while:
-            return nsteps == rep_nsteps
+            return True
         cond = node.op.outputs[-1]
         rep_cond = rep.op.outputs[-1]
-        same_cond = scan_utils.equal_computations([cond], [rep_cond],
-                                                  node.op.inputs,
-                                                  rep.op.inputs)
-        return same_cond and (nsteps == rep_nsteps)
+        return scan_utils.equal_computations([cond], [rep_cond],
+                                             node.op.inputs,
+                                             rep.op.inputs)

     def apply(self, fgraph):
         # Collect all scan nodes ordered according to toposort
...
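The `ScanMerge` hunk above moves the `nsteps != rep_nsteps` test to an early return, so the (comparatively expensive) condition-equality check only runs for while-scans with matching step counts. A schematic of that control flow, with toy stand-ins invented for illustration rather than Theano's actual classes:

```python
# Schematic of the restructured merge-eligibility check (toy stand-ins,
# not Theano's ScanMerge). conds_equal plays the role of the
# equal_computations test on the two while-scan conditions.
def can_merge(nsteps, rep_nsteps, as_while, conds_equal):
    if nsteps != rep_nsteps:      # early exit, as in the new code
        return False
    if not as_while:              # plain scans need nothing more
        return True
    return conds_equal()          # only evaluated for while-scans

calls = []
def conds_equal():
    # Record that the expensive check actually ran.
    calls.append(1)
    return True
```

With mismatched step counts, or for non-while scans, `conds_equal` is never invoked; only a while-scan with equal step counts reaches it.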
@@ -152,7 +152,7 @@ def traverse(out, x, x_copy, d, visited=None):
         return d
     visited.add(out)
     from theano.sandbox import cuda
-    from theano.gpuarray.basic_ops import gpu_from_host, host_from_gpu
+    from theano.gpuarray.basic_ops import GpuFromHost, host_from_gpu
     from theano.gpuarray import pygpu_activated
     from theano.gpuarray.type import GpuArrayType
     if out == x:
@@ -160,7 +160,7 @@ def traverse(out, x, x_copy, d, visited=None):
             d[out] = cuda.gpu_from_host(x_copy)
         else:
             assert isinstance(x.type, GpuArrayType)
-            d[out] = gpu_from_host(x.type.context_name)(x_copy)
+            d[out] = GpuFromHost(x.type.context_name)(x_copy)
         return d
     elif out.owner is None:
         return d
@@ -876,10 +876,13 @@ class Validator(object):
         if out.owner is None:
             if isinstance(out, tensor.TensorConstant):
-                if hasattr(out, 'fgraph'):
+                if hasattr(out, 'fgraph') or getattr(out, 'cached', False):
                     # If out have an fgraph, we aren't sure if it
                     # is from the inner graph or outer graph, so
                     # clone it.
+                    # As it will be used as is in an FunctionGraph
+                    # (won't be cloned later), it can't be a
+                    # cached variable
                     cloned_out = out.clone()
                     self.valid.add(cloned_out)
                     self.invalid.add(out)
@@ -1113,20 +1116,6 @@ def compress_outs(op, not_required, inputs):
     return (op_inputs, op_outputs, info, node_inputs, map_old_new)

-def find_up(l_node, f_node):
-    r"""
-    Goes up in the graph and returns True if a node in nodes is found.
-    """
-    if isinstance(l_node, gof.Apply):
-        l_outs = l_node.outputs
-    else:
-        l_outs = l_node
-    l_ins = gof.graph.inputs(l_outs)
-    nodes = gof.graph.io_toposort(l_ins, l_outs)
-    return f_node in nodes

 def reconstruct_graph(inputs, outputs, tag=None):
     """
     Different interface to clone, that allows you to pass inputs.
...
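The removed `find_up(l_node, f_node)` answered "does `f_node` appear in the graph above `l_node`?", which the callers now get directly from `gof.graph.is_in_ancestors`. A self-contained toy version of that ancestry check (the `Node` class and the traversal below are invented for illustration; this is not Theano's `gof.graph` implementation):

```python
# Toy sketch of the ancestry check that replaced find_up.
# Node and the traversal are stand-ins, not Theano's classes.
class Node:
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)   # the nodes this one depends on

def is_in_ancestors(l_node, f_node):
    # Walk upward from l_node through .inputs; True if f_node is reached.
    stack, seen = [l_node], set()
    while stack:
        n = stack.pop()
        if n is f_node:
            return True
        if id(n) in seen:
            continue
        seen.add(id(n))
        stack.extend(n.inputs)
    return False

# A tiny chain a -> b -> c (c depends on b, which depends on a).
a = Node('a')
b = Node('b', [a])
c = Node('c', [b])
```

As with the removed helper (which checked membership in a toposort of the subgraph), a node counts as its own ancestor here.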
@@ -332,7 +332,7 @@ def make_gpu_optimizer(op, to_gpu):
                 new_inp[idx] = cuda.gpu_from_host(new_inp[idx])
             result_node = op()(*new_inp)
             copy_stack_trace(node.outputs[0], result_node)
-            transfer_node = cuda.host_from_gpu(result_node)
+            transfer_node = result_node.transfer('cpu')
             copy_stack_trace(node.outputs[0], transfer_node)
             return [transfer_node]
         if node.op == cuda.gpu_from_host:
...
@@ -8,7 +8,7 @@ __docformat__ = 'restructedtext en'
 from collections import OrderedDict

-import numpy
+import numpy as np

 import theano
 import theano.tensor as T
@@ -17,12 +17,12 @@ import theano.tensor as T
 def gen_data():
     # generate the dataset
-    train_set = (numpy.asarray(numpy.random.rand(10000, 784), dtype='float32'),
-                 numpy.asarray(numpy.random.rand(10000)*10, dtype='int64'))
-    valid_set = (numpy.asarray(numpy.random.rand(10000, 784), dtype='float32'),
-                 numpy.asarray(numpy.random.rand(10000)*10, dtype='int64'))
-    test_set = (numpy.asarray(numpy.random.rand(10000, 784), dtype='float32'),
-                numpy.asarray(numpy.random.rand(10000)*10, dtype='int64'))
+    train_set = (np.asarray(np.random.rand(10000, 784), dtype='float32'),
+                 np.asarray(np.random.rand(10000)*10, dtype='int64'))
+    valid_set = (np.asarray(np.random.rand(10000, 784), dtype='float32'),
+                 np.asarray(np.random.rand(10000)*10, dtype='int64'))
+    test_set = (np.asarray(np.random.rand(10000, 784), dtype='float32'),
+                np.asarray(np.random.rand(10000)*10, dtype='int64'))

     def shared_dataset(data_xy):
         """ Function that loads the dataset into shared variables
@@ -33,8 +33,8 @@ def gen_data():
         variable) would lead to a large decrease in performance.
         """
         data_x, data_y = data_xy
-        shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX))
-        shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX))
+        shared_x = theano.shared(np.asarray(data_x, dtype=theano.config.floatX))
+        shared_y = theano.shared(np.asarray(data_y, dtype=theano.config.floatX))
         # When storing data on the GPU it has to be stored as floats
         # therefore we will store the labels as ``floatX`` as well
         # (``shared_y`` does exactly that). But during our computations
@@ -79,7 +79,7 @@ class LogisticRegression(object):
         """
         # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
-        self.W = theano.shared(value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
+        self.W = theano.shared(value=np.zeros((n_in, n_out), dtype=theano.config.floatX),
                                name=name_prefix+'W')

         # compute vector of class-membership probabilities in symbolic form
@@ -129,7 +129,7 @@ class HiddenLayer(object):
         Hidden unit activation is given by: tanh(dot(input,W) + b)

-        :type rng: numpy.random.RandomState
+        :type rng: np.random.RandomState
         :param rng: a random number generator used to initialize weights

         :type input: theano.tensor.dmatrix
@@ -151,9 +151,9 @@ class HiddenLayer(object):
         # from -6./sqrt(n_in+n_hidden) and 6./sqrt(n_in+n_hidden)
         # the output of uniform if converted using asarray to dtype
         # theano.config.floatX so that the code is runable on GPU
-        W_values = numpy.asarray( rng.uniform( \
-              low=-numpy.sqrt(6./(n_in+n_out)), \
-              high=numpy.sqrt(6./(n_in+n_out)), \
+        W_values = np.asarray( rng.uniform( \
+              low=-np.sqrt(6./(n_in+n_out)), \
+              high=np.sqrt(6./(n_in+n_out)), \
               size=(n_in, n_out)), dtype=theano.config.floatX)
         self.W = theano.shared(value=W_values, name=name_prefix+'W')
@@ -176,7 +176,7 @@ class MLP(object):
     def __init__(self, rng, input, n_in, n_hidden, n_out):
         """Initialize the parameters for the multilayer perceptron

-        :type rng: numpy.random.RandomState
+        :type rng: np.random.RandomState
         :param rng: a random number generator used to initialize weights

         :type input: theano.tensor.TensorType
@@ -265,7 +265,7 @@ def test_mlp():
     y = T.ivector('y')  # the labels are presented as 1D vector of
                         # [int] labels

-    rng = numpy.random.RandomState(1234)
+    rng = np.random.RandomState(1234)

     # construct the MLP class
     classifier = MLP( rng=rng, input=x, n_in=28*28, n_hidden=500, n_out=10)
...
This source diff could not be displayed because it is too large.
Diff collapsed.
Diff collapsed.