Commit 839fa93b authored by Frédéric Bastien, committed by GitHub

Merge branch 'master' into ipT_grad

#!/bin/bash
BUILDBOT_DIR=$WORKSPACE/nightly_build
THEANO_PARAM="theano --with-timer --timer-top-n 10"
THEANO_PARAM="theano --with-timer --timer-top-n 10 -v"
export THEANO_FLAGS=init_gpu_device=gpu
# CUDA
......
......@@ -66,8 +66,7 @@ features:
* tight integration with NumPy: a similar interface to NumPy's.
numpy.ndarrays are also used internally in Theano-compiled functions.
* transparent use of a GPU: perform data-intensive computations up to
140x faster than on a CPU (support for float32 only).
* transparent use of a GPU: perform data-intensive computations much faster than on a CPU.
* efficient symbolic differentiation: Theano can compute derivatives
for functions of one or many inputs.
* speed and stability optimizations: avoid nasty bugs when computing
......
......@@ -7,7 +7,7 @@ evaluate mathematical expressions involving multi-dimensional
arrays efficiently. Theano features:
* **tight integration with NumPy** -- Use `numpy.ndarray` in Theano-compiled functions.
* **transparent use of a GPU** -- Perform data-intensive calculations up to 140x faster than with CPU.(float32 only)
* **transparent use of a GPU** -- Perform data-intensive computations much faster than on a CPU.
* **efficient symbolic differentiation** -- Theano does your derivatives for functions with one or many inputs.
* **speed and stability optimizations** -- Get the right answer for ``log(1+x)`` even when ``x`` is really tiny.
* **dynamic C code generation** -- Evaluate expressions faster.
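The ``log(1+x)`` stability claim above can be checked outside Theano with plain float64 arithmetic; here is a minimal sketch using Python's stdlib ``math`` module (this illustrates the numerical issue, not Theano's own graph rewrite):

```python
import math

x = 1e-20

# Naive formula: 1 + 1e-20 rounds to exactly 1.0 in float64,
# so the logarithm collapses to 0.0 and all information is lost.
naive = math.log(1 + x)

# math.log1p evaluates log(1+x) accurately even for tiny x.
stable = math.log1p(x)

print(naive)   # 0.0
print(stable)  # ~1e-20
```

Theano's stability optimization rewrites ``log(1+x)`` in the computation graph to the accurate ``log1p``-style form automatically.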
......
......@@ -12,6 +12,7 @@ CentOS 6 Installation Instructions
page <http://deeplearning.net/software/theano_versions/dev/install_centos6.html>`_.
.. |PlatformCompiler| replace:: ``python-dev``, ``g++`` >= 4.2
.. |CompilerName| replace:: ``g++``
.. include:: requirements.inc
......
......@@ -9,6 +9,24 @@ Installation
Stable Installation
-------------------
With ``conda``
^^^^^^^^^^^^^^
If you use conda, you can directly install both theano and pygpu. Libgpuarray
will be automatically installed as a dependency.
.. code-block:: bash
conda install theano pygpu
With ``pip``
^^^^^^^^^^^^
If you use pip, you have to install Theano and libgpuarray separately.
theano
::::::
Install the latest stable version of Theano with:
.. raw:: html
......@@ -27,23 +45,18 @@ Install the latest stable version of Theano with:
If you encounter any trouble, head to the :ref:`troubleshooting` page.
libgpuarray
^^^^^^^^^^^
It is recommended that you don't use 0.8.2 for the new back-end. Use
the dev version of Theano or 0.9rc3.
The latest stable version of Theano is ``0.9.0`` (tagged with ``rel-0.9.0``).
For the stable version of Theano (0.8.2) you need a specific version of libgpuarray,
that has been tagged ``v-9998``.
Download it with:
libgpuarray
:::::::::::
.. raw:: html
For the stable version of Theano you need a specific version of libgpuarray,
that has been tagged ``v0.6.2``.
Download it with::
<div class='highlight'><pre>
git clone https://github.com/Theano/libgpuarray.git --tags
git checkout origin/v-9998
git clone https://github.com/Theano/libgpuarray.git
cd libgpuarray
</pre></div>
git checkout tags/v0.6.2 -b v0.6.2
and then follow the `Step-by-step instructions <http://deeplearning.net/software/libgpuarray/installation.html#step-by-step-install>`__.
......
......@@ -20,6 +20,7 @@ alternative instructions here.
.. _theano-users: http://groups.google.com/group/theano-users?pli=1
.. |PlatformCompiler| replace:: ``clang`` (the system version)
.. |CompilerName| replace:: ``Clang``
.. include:: requirements.inc
......
......@@ -14,6 +14,7 @@ Ubuntu Installation Instructions
.. _gpu_linux:
.. |PlatformCompiler| replace:: ``python-dev``, ``g++`` >= 4.2
.. |CompilerName| replace:: ``g++``
.. include:: requirements.inc
......
The diff is collapsed.
......@@ -153,7 +153,7 @@ For final releases, send the e-mail to the following mailing lists:
* theano-users
* theano-announce
* numpy-discussion@scipy.org
* scipy-user@scipy.org
* scipy-user@python.org
* G+, Scientific Python: https://plus.google.com/communities/108773711053400791849
For release candidates, only e-mail:
......
......@@ -219,6 +219,7 @@ TODO: Give examples on how to use these things! They are pretty complicated.
It flips the kernel.
.. autofunction:: theano.tensor.nnet.conv2d
.. autofunction:: theano.tensor.nnet.conv2d_transpose
.. autofunction:: theano.tensor.nnet.conv3d
.. autofunction:: theano.sandbox.cuda.fftconv.conv2d_fft
.. autofunction:: theano.tensor.nnet.Conv3D.conv3D
......
......@@ -7,21 +7,28 @@ Requirements
.. _BLAS: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
.. _Python: http://www.python.org/
.. _LaTeX: http://www.latex-project.org/
.. _dvipng: http://savannah.nongnu.org/projects/dvipng/
.. _NVIDIA CUDA drivers and SDK: http://developer.nvidia.com/object/gpucomputing.html
.. _libgpuarray: http://deeplearning.net/software/libgpuarray/installation.html
.. _pycuda: https://mathema.tician.de/software/pycuda/
.. _skcuda: http://scikit-cuda.readthedocs.io/en/latest/
Python_ >= 2.7 or >= 3.3 The development package (python-dev or
Python_ == 2.7 or ( >= 3.3 and <= 3.5 )
The development package (python-dev or
python-devel on most Linux distributions) is recommended (see
just below). Python 2.4 was supported up to and including the
release 0.6. Python 2.6 was supported up to and including the
release 0.8.2. Python 3 is supported past the 3.3 release.
`NumPy <http://numpy.scipy.org/>`_ >= 1.9.1 < 1.11.1
`NumPy <http://numpy.scipy.org/>`_ >= 1.9.1 <= 1.12
Earlier versions could work, but we don't test them.
`SciPy <http://scipy.org>`_ >= 0.14 < 0.17.1
Only currently required for sparse matrix and special functions support, but highly recommended. SciPy >=0.8 could work, but earlier versions have known bugs with sparse matrices.
`BLAS`_ installation (with Level 3 functionality)
* **Recommended**: MKL, which is free through Conda.
* **Recommended**: MKL, which is free through Conda.
* Alternatively, we suggest to install OpenBLAS, with the development headers (``-dev``, ``-devel``, depending on your Linux distribution).
**Optional requirements**
......@@ -42,10 +49,9 @@ Requirements
**Highly recommended** Required for GPU code generation/execution on NVIDIA GPUs. See instructions below.
`libgpuarray`_
Required for GPU/CPU code generation on CUDA and OpenCL devices (see: :ref:`gpuarray`.)
Required for GPU/CPU code generation on CUDA and OpenCL devices (see: :ref:`gpuarray`).
`pycuda`_ and `skcuda`_
Required for some extra operations on the GPU like fft and
solvers. We use them to wrap cufft and cusolver. Quick install
``pip install pycuda scikit-cuda``. For cuda 8, the dev
......@@ -63,7 +69,9 @@ Follow this `link <http://conda.pydata.org/miniconda.html>`__ to install Minicon
.. note::
If you want fast compiled code (recommended), make sure you have g++ (Windows/Linux) or Clang (OS X) installed.
If you want fast compiled code (recommended), make sure you have |CompilerName| installed.
.. install_requirements_and_optional_packages
Install requirements and optional packages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -109,9 +117,4 @@ Install and configure the GPU drivers (recommended)
* add a ``cuda.root`` flag to :envvar:`THEANO_FLAGS`, as in ``THEANO_FLAGS='cuda.root=/path/to/cuda/root'``, or
* add a [cuda] section to your .theanorc file containing the option ``root = /path/to/cuda/root``.
.. _LaTeX: http://www.latex-project.org/
.. _dvipng: http://savannah.nongnu.org/projects/dvipng/
.. _NVIDIA CUDA drivers and SDK: http://developer.nvidia.com/object/gpucomputing.html
.. _libgpuarray: http://deeplearning.net/software/libgpuarray/installation.html
.. _pycuda: https://mathema.tician.de/software/pycuda/
.. _skcuda: http://scikit-cuda.readthedocs.io/en/latest/
.. |PlatformCompiler| replace:: ``g++`` (Linux and Windows), ``clang`` (OS X)
.. |CompilerName| replace:: ``g++`` (Windows/Linux) or ``Clang`` (OS X)
.. include:: requirements.inc
......@@ -220,6 +220,36 @@ The ``compute_test_value`` mechanism works as follows:
This feature is currently incompatible with ``Scan`` and also with ops
which do not implement a ``perform`` method.
It is also possible to override variables' ``__repr__`` method to have them return ``tag.test_value``.
.. testsetup:: printtestvalue
import theano
import theano.tensor as T
.. testcode:: printtestvalue
x = T.scalar('x')
# Assigning test value
x.tag.test_value = 42
# Enable test value printing
theano.config.print_test_value = True
print(x.__repr__())
# Disable test value printing
theano.config.print_test_value = False
print(x.__repr__())
Running the code above returns the following output:
.. testoutput:: printtestvalue
x
array(42.0)
x
"How do I Print an Intermediate Value in a Function?"
-----------------------------------------------------
......
......@@ -31,14 +31,36 @@ import logging
import sys
def has_handlers(logger):
# copied from Logger.hasHandlers() (introduced in Python 3.2)
_logger = logger
_has_handler = False
while _logger:
if _logger.handlers:
_has_handler = True
break
if not _logger.propagate:
break
else:
_logger = _logger.parent
return _has_handler
theano_logger = logging.getLogger("theano")
logging_default_handler = logging.StreamHandler()
logging_default_formatter = logging.Formatter(
fmt='%(levelname)s (%(name)s): %(message)s')
logging_default_handler.setFormatter(logging_default_formatter)
theano_logger.addHandler(logging_default_handler)
theano_logger.setLevel(logging.WARNING)
if has_handlers(theano_logger) is False:
theano_logger.addHandler(logging_default_handler)
# Disable default log handler added to theano_logger when the module
# is imported.
def disable_log_handler(logger=theano_logger, handler=logging_default_handler):
if has_handlers(logger):
logger.removeHandler(handler)
# Version information.
from theano.version import version as __version__
......
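The ``has_handlers`` guard added above is, as its comment says, a backport of ``logging.Logger.hasHandlers()`` from the standard library. The same check can be exercised in isolation with only stdlib ``logging`` (the logger name here is illustrative, not Theano's):

```python
import logging

def has_handlers(logger):
    # Walk up the logger hierarchy, stopping where propagation ends,
    # and report whether any logger on the path has a handler attached.
    _logger = logger
    while _logger:
        if _logger.handlers:
            return True
        if not _logger.propagate:
            break
        _logger = _logger.parent
    return False

demo = logging.getLogger("demo_pkg")
demo.propagate = False          # keep the check local to this logger

print(has_handlers(demo))       # False: nothing attached yet

handler = logging.StreamHandler()
demo.addHandler(handler)
print(has_handlers(demo))       # True

demo.removeHandler(handler)     # the disable_log_handler pattern
print(has_handlers(demo))       # False again
```

On Python >= 3.2, ``demo.hasHandlers()`` gives the same answer; the backport exists only to keep Python 2 support.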
......@@ -2302,6 +2302,7 @@ class GCC_compiler(Compiler):
if status:
tf = tempfile.NamedTemporaryFile(
mode='w',
prefix='theano_compilation_error_',
delete=False
)
......
......@@ -1375,3 +1375,27 @@ def list_of_nodes(inputs, outputs):
lambda o: [inp.owner for inp in o.inputs
if inp.owner and
not any(i in inp.owner.outputs for i in inputs)])
def is_in_ancestors(l_node, f_node):
r"""
Goes up in the graph and returns True if the apply node f_node is found.
Uses a stack-based traversal, like the VM evaluation algorithm.
We assume no node is lazy
(i.e. for IfElse we assume all inputs are computed).
"""
computed = set()
todo = [l_node]
while todo:
cur = todo.pop()
if cur.outputs[0] in computed:
continue
if all([i in computed or i.owner is None for i in cur.inputs]):
computed.update(cur.outputs)
if cur is f_node:
return True
else:
todo.append(cur)
todo.extend(i.owner for i in cur.inputs if i.owner)
return False
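The new ``is_in_ancestors`` traversal can be exercised without Theano on a minimal stand-in for the graph; this sketch uses a toy ``Node`` class (not Theano's ``Apply`` API) and a simpler depth-first walk, but decides the same ancestor relation:

```python
class Node:
    """Toy stand-in for an Apply node: a name plus parent nodes."""
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

def is_in_ancestors(l_node, f_node):
    # Depth-first walk from l_node toward the graph inputs,
    # returning True as soon as f_node is reached.
    todo = [l_node]
    seen = set()
    while todo:
        cur = todo.pop()
        if cur in seen:
            continue
        seen.add(cur)
        if cur is f_node:
            return True
        todo.extend(cur.parents)
    return False

a = Node("a")
b = Node("b", [a])
c = Node("c", [b])
d = Node("d")

print(is_in_ancestors(c, a))  # True: a feeds b, which feeds c
print(is_in_ancestors(a, c))  # False: c is downstream of a
print(is_in_ancestors(c, d))  # False: d is disconnected
```

The real implementation above checks ``f_node`` at the point where a node becomes "computed" (all inputs done) rather than on first visit, mirroring the VM's evaluation order, but the reachability question it answers is the same.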
......@@ -2089,13 +2089,7 @@ class TopoOptimizer(NavigatorOptimizer):
if node is not current_node:
q.append(node)
def pruner(node):
if node is not current_node:
try:
q.remove(node)
except ValueError:
pass
u = self.attach_updater(fgraph, importer, pruner,
u = self.attach_updater(fgraph, importer, None,
name=getattr(self, 'name', None))
nb = 0
try:
......@@ -2105,6 +2099,8 @@ class TopoOptimizer(NavigatorOptimizer):
node = q.pop()
else:
node = q.popleft()
if node not in fgraph.apply_nodes:
continue
current_node = node
nb += self.process_node(fgraph, node)
loop_t = time.time() - t0
......@@ -2217,17 +2213,13 @@ class OpKeyOptimizer(NavigatorOptimizer):
if node.op == op:
q.append(node)
def pruner(node):
if node is not current_node and node.op == op:
try:
q.remove(node)
except ValueError:
pass
u = self.attach_updater(fgraph, importer, pruner,
u = self.attach_updater(fgraph, importer, None,
name=getattr(self, 'name', None))
try:
while q:
node = q.pop()
if node not in fgraph.apply_nodes:
continue
current_node = node
self.process_node(fgraph, node)
finally:
......
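Both optimizer hunks above trade the eager ``pruner`` callback (which did an O(n) ``q.remove(node)`` on every graph change) for lazy deletion: stale entries stay in the queue and are skipped at pop time via ``if node not in fgraph.apply_nodes: continue``. The pattern in isolation, with a toy queue and membership set rather than Theano's ``fgraph``:

```python
from collections import deque

live = {"a", "b", "c", "d"}        # stand-in for fgraph.apply_nodes
q = deque(["a", "b", "c", "d"])

live.discard("b")                  # nodes removed from the graph...
live.discard("d")                  # ...while the queue is left untouched

processed = []
while q:
    node = q.popleft()
    if node not in live:           # lazy deletion: skip stale entries
        continue
    processed.append(node)

print(processed)  # ['a', 'c']
```

The cost moves from O(n) per removal to one set-membership test per pop, which is cheap since the optimizers already hold the ``fgraph``.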
......@@ -73,7 +73,7 @@ def as_gpuarray_variable(x, context_name):
# If we couldn't deal with transfers, then maybe it's a tensor
if isinstance(x.type, tensor.TensorType):
return gpu_from_host(context_name)(x)
return GpuFromHost(context_name)(x)
# Try _as_GpuArrayVariable if possible
if hasattr(x, '_as_GpuArrayVariable'):
......@@ -617,7 +617,7 @@ class HostFromGpu(Op):
def grad(self, inputs, grads):
gz, = grads
return [gpu_from_host(inputs[0].type.context_name)(gz)]
return [GpuFromHost(inputs[0].type.context_name)(gz)]
def R_op(self, inputs, eval_points):
ev, = eval_points
......@@ -663,8 +663,8 @@ class GpuFromHost(Op):
def grad(self, inputs, grads):
gz, = grads
return [host_from_gpu(as_gpuarray_variable(
gz, context_name=self.context_name))]
return [as_gpuarray_variable(
gz, context_name=self.context_name).transfer('cpu')]
def R_op(self, inputs, eval_points):
ev, = eval_points
......@@ -722,14 +722,6 @@ class GpuFromHost(Op):
return (9,)
# Caching GPUAlloc
def gpu_from_host(ctx):
if ctx not in gpu_alloc.cache:
gpu_from_host.cache[ctx] = GpuFromHost(ctx)
return gpu_from_host.cache[ctx]
gpu_from_host.cache = {}
class GpuToGpu(Op):
"""
Transfer data between GPUs.
......@@ -953,15 +945,6 @@ class GpuAlloc(HideC, Alloc):
return True
# Caching GPUAlloc
def gpu_alloc(ctx, memset_0=False):
key = (ctx, memset_0)
if key not in gpu_alloc.cache:
gpu_alloc.cache[key] = GpuAlloc(ctx, memset_0)
return gpu_alloc.cache[key]
gpu_alloc.cache = {}
class GpuAllocEmpty(HideC, AllocEmpty):
"""
Allocate uninitialized memory on the GPU.
......@@ -1048,14 +1031,6 @@ def empty_like(var):
return GpuAllocEmpty(var.type.dtype, var.type.context_name)(*var.shape)
def gpu_alloc_empty(ctx, dtype):
key = (dtype, ctx)
if key not in gpu_alloc_empty.cache:
gpu_alloc_empty.cache[key] = GpuAllocEmpty(dtype, ctx)
return gpu_alloc_empty.cache[key]
gpu_alloc_empty.cache = {}
class GpuContiguous(Op):
"""
Return a C contiguous version of the input.
......@@ -1132,7 +1107,7 @@ class GpuReshape(HideC, tensor.Reshape):
ctx_name = infer_context_name(x)
x = as_gpuarray_variable(x, context_name=ctx_name)
shp = tensor.as_tensor_variable(shp)
res = host_from_gpu(x).reshape(shp, ndim=self.ndim)
res = x.transfer('cpu').reshape(shp, ndim=self.ndim)
otype = GpuArrayType(dtype=res.dtype,
broadcastable=res.broadcastable,
context_name=ctx_name)
......
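The deleted ``gpu_from_host``, ``gpu_alloc``, and ``gpu_alloc_empty`` helpers were three copies of one memoized-factory pattern (note that the deleted ``gpu_from_host`` even consulted the wrong cache, ``gpu_alloc.cache``, on its lookup). The generic pattern, as a standalone sketch with a toy class rather than real Ops:

```python
def memoized_factory(cls):
    # Return a factory that builds at most one instance of cls per key.
    cache = {}
    def make(*key):
        if key not in cache:
            cache[key] = cls(*key)
        return cache[key]
    make.cache = cache
    return make

class GpuOpStandIn:
    """Toy stand-in for an Op parametrized by a context name."""
    def __init__(self, ctx):
        self.ctx = ctx

gpu_op = memoized_factory(GpuOpStandIn)
print(gpu_op("dev0") is gpu_op("dev0"))   # True: same key, cached instance
print(gpu_op("dev0") is gpu_op("dev1"))   # False: distinct keys
```

The branch replaces the factories with direct construction (``GpuFromHost(ctx)``, ``GpuAllocEmpty(dtype, ctx)``), presumably because Ops defining ``__props__`` compare equal by their properties, so the graph merger deduplicates separate instances without identity caching.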
The diff is collapsed.
......@@ -2,13 +2,13 @@ from __future__ import absolute_import, print_function, division
import os
from theano import Apply, Op
from theano.tensor.extra_ops import CumOp
from .basic_ops import infer_context_name
try:
from pygpu import gpuarray
except ImportError:
pass
from .basic_ops import (as_gpuarray_variable, GpuKernelBase, Kernel, GpuReshape)
from .basic_ops import (as_gpuarray_variable, GpuKernelBase, Kernel, GpuReshape, infer_context_name)
from .opt import register_opt, op_lifter, register_opt2
......
......@@ -10,7 +10,7 @@ from theano.scalar import as_scalar, constant
from . import opt
from .basic_ops import (as_gpuarray_variable, GpuAllocEmpty,
infer_context_name, gpu_alloc_empty)
infer_context_name)
from .type import gpu_context_type
from .opt_util import alpha_merge, output_merge
......@@ -158,7 +158,7 @@ def local_gpua_dot_to_gemm16(op, ctx_name, inputs, outputs):
if (A.ndim == 2 and B.ndim == 2 and
A.dtype == 'float16' and B.dtype == 'float16'):
fgraph = getattr(outputs[0], 'fgraph', None)
C = gpu_alloc_empty(ctx_name, dtype='float16')(
C = GpuAllocEmpty('float16', ctx_name)(
shape_i(A, 0, fgraph), shape_i(B, 1, fgraph))
return Gemm16()(C, 1.0, A, B, 0.0)
......
......@@ -44,8 +44,7 @@ from .basic_ops import (as_gpuarray_variable, infer_context_name,
HostFromGpu, GpuFromHost,
GpuSplit, GpuContiguous, gpu_contiguous,
GpuAlloc, GpuAllocEmpty, GpuReshape,
GpuEye, gpu_join, GpuJoin, gpu_alloc_empty,
gpu_alloc, gpu_from_host)
GpuEye, gpu_join, GpuJoin)
from .blas import (gpu_dot22, GpuGemm, GpuGer, GpuGemmBatch,
gpugemm_no_inplace, gpugemm_inplace,
gpugemmbatch_no_inplace,
......@@ -61,9 +60,8 @@ from .blocksparse import (GpuSparseBlockGemv, GpuSparseBlockOuter,
from .nnet import (gpu_crossentropy_softmax_1hot_with_bias_dx,
gpu_crossentropy_softmax_argmax_1hot_with_bias,
gpu_softmax_with_bias, gpu_softmax)
from .elemwise import (GpuElemwise, GpuDimShuffle, GpuCAReduceCuda,
GpuCAReduceCPY, gpu_ca_reduce_cuda, gpu_erfinv, gpu_erfcinv,
GpuCAReduceCPY, gpu_erfinv, gpu_erfcinv,
max_inputs_to_GpuElemwise)
from .subtensor import (GpuIncSubtensor, GpuSubtensor,
GpuAdvancedSubtensor,
......@@ -165,14 +163,14 @@ gpu_optimizer.register('local_remove_all_assert',
def safe_to_gpu(x, ctx_name):
if isinstance(x.type, tensor.TensorType):
return gpu_from_host(ctx_name)(x)
return GpuFromHost(ctx_name)(x)
else:
return x
def safe_to_cpu(x):
if isinstance(x.type, GpuArrayType):
return host_from_gpu(x)
return x.transfer('cpu')
else:
return x
......@@ -236,7 +234,7 @@ def op_lifter(OP, cuda_only=False):
elif isinstance(new_op, (tuple, list)):
return [safe_to_cpu(o) for o in new_op]
else: # suppose it is a variable on the GPU
return [host_from_gpu(new_op)]
return [new_op.transfer('cpu')]
return False
local_opt.__name__ = maker.__name__
return local_optimizer(OP)(local_opt)
......@@ -269,7 +267,7 @@ class InputToGpuOptimizer(Optimizer):
continue
try:
new_input = host_from_gpu(gpu_from_host(target)(input))
new_input = GpuFromHost(target)(input).transfer('cpu')
fgraph.replace_validate(input, new_input,
"InputToGpuOptimizer")
except TypeError:
......@@ -546,7 +544,7 @@ def local_cut_gpu_transfers(node):
# gpub ->
if isinstance(n2.op, GpuToGpu):
return [host_from_gpu(n2.inputs[0])]
return [n2.inputs[0].transfer('cpu')]
# ? -> gpua -> gpub
elif isinstance(node.op, GpuToGpu):
......@@ -600,14 +598,14 @@ def local_gpua_alloc2(node):
i.owner.op in [host_from_gpu, tensor.alloc]
for i in c.inputs[1:])
for c, idx in node.outputs[0].clients)):
return [host_from_gpu(gpu_alloc(None)(*node.inputs))]
return [GpuAlloc(None)(*node.inputs).transfer('cpu')]
@register_opt('fast_compile')
@op_lifter([tensor.Alloc])
@register_opt2([tensor.Alloc], 'fast_compile')
def local_gpua_alloc(op, context_name, inputs, outputs):
return gpu_alloc(context_name)
def local_gpuaalloc(op, context_name, inputs, outputs):
return GpuAlloc(context_name)(*inputs)
@register_opt('fast_compile')
......@@ -616,7 +614,7 @@ def local_gpua_alloc(op, context_name, inputs, outputs):
def local_gpua_alloc_empty(op, context_name, inputs, outputs):
# We use _props_dict() to make sure that the GPU op know all the
# CPU op props.
return gpu_alloc_empty(context_name, **op._props_dict())
return GpuAllocEmpty(context_name=context_name, **op._props_dict())(*inputs)
@register_opt()
......@@ -627,7 +625,7 @@ def local_gpualloc_memset_0(node):
if (isinstance(inp, GpuArrayConstant) and
inp.data.size == 1 and
(np.asarray(inp.data) == 0).all()):
new_op = gpu_alloc(node.op.context_name, memset_0=True)
new_op = GpuAlloc(node.op.context_name, memset_0=True)
return [new_op(*node.inputs)]
......@@ -637,8 +635,8 @@ def local_gpua_alloc_empty_to_zeros(node):
if isinstance(node.op, GpuAllocEmpty):
context_name = infer_context_name(*node.inputs)
z = np.asarray(0, dtype=node.outputs[0].dtype)
return [gpu_alloc(context_name)(as_gpuarray_variable(z, context_name),
*node.inputs)]
return [GpuAlloc(context_name)(as_gpuarray_variable(z, context_name),
*node.inputs)]
optdb.register('local_gpua_alloc_empty_to_zeros',
theano.tensor.opt.in2out(local_gpua_alloc_empty_to_zeros),
# After move to gpu and merge2, before inplace.
......@@ -918,7 +916,7 @@ def local_gpu_pdbbreakpoint_op(node):
new_outputs = []
for i in range(len(new_op_outputs)):
if input_transfered[i]:
new_outputs.append(host_from_gpu(new_op_outputs[i]))
new_outputs.append(new_op_outputs[i].transfer('cpu'))
else:
new_outputs.append(new_op_outputs[i])
......@@ -983,7 +981,7 @@ def local_gpua_subtensor(op, context_name, inputs, outputs):
for n, _ in outputs[0].clients]):
return
else:
return [host_from_gpu(gpu_x.owner.op(outputs[0]))]
return [gpu_x.owner.op(outputs[0]).transfer('cpu')]
return GpuSubtensor(op.idx_list)
......@@ -1234,7 +1232,7 @@ def local_gpua_dot22scalar(op, context_name, inputs, outputs):
x, y, a = inputs
x = as_gpuarray_variable(x, context_name)
y = as_gpuarray_variable(y, context_name)
z = gpu_alloc_empty(context_name, dtype=x.dtype)(x.shape[0], y.shape[1])
z = GpuAllocEmpty(x.dtype, context_name)(x.shape[0], y.shape[1])
return [gpugemm_no_inplace(z, a, x, y, 0)]
......@@ -1804,10 +1802,10 @@ def local_gpu_elemwise_careduce(node):
isinstance(node.inputs[0].owner.op.scalar_op, scalar.basic.Sqr)):
op = node.op
inp = node.inputs[0].owner.inputs[0]
return [gpu_ca_reduce_cuda(scalar_op=op.scalar_op,
axis=op.axis,
reduce_mask=op.reduce_mask,
pre_scalar_op=scalar.basic.sqr)(inp)]
return [GpuCAReduceCuda(scalar_op=op.scalar_op,
axis=op.axis,
reduce_mask=op.reduce_mask,
pre_scalar_op=scalar.basic.sqr)(inp)]
@local_optimizer(None)
......
......@@ -8,7 +8,7 @@ from theano.gof import local_optimizer
from theano.tensor import (DimShuffle, get_scalar_constant_value,
NotScalarConstantError)
from .basic_ops import GpuFromHost, HostFromGpu, GpuAllocEmpty, GpuReshape, gpu_alloc_empty
from .basic_ops import GpuFromHost, HostFromGpu, GpuAllocEmpty, GpuReshape
from .elemwise import GpuDimShuffle, GpuElemwise
_one = scal.constant(np.asarray(1.0, dtype='float32'))
......@@ -324,7 +324,7 @@ def inplace_allocempty(op, idx):
if (alloc.owner and
isinstance(alloc.owner.op, GpuAllocEmpty) and
len(alloc.clients) > 1):
alloc_op = gpu_alloc_empty(alloc.owner.op.context_name, dtype=alloc.owner.op.dtype)
alloc_op = GpuAllocEmpty(alloc.owner.op.dtype, alloc.owner.op.context_name)
inputs[idx] = alloc_op(*alloc.owner.inputs)
return maker(node, inputs)
return opt
......
......@@ -271,7 +271,7 @@ class GpuArrayType(Type):
return data
def filter_variable(self, other, allow_convert=True):
from theano.gpuarray.basic_ops import gpu_from_host
from theano.gpuarray.basic_ops import GpuFromHost
if hasattr(other, '_as_GpuArrayVariable'):
other = other._as_GpuArrayVariable(self.context_name)
......@@ -303,7 +303,7 @@ class GpuArrayType(Type):
str(self.broadcastable)))
other = other2
return gpu_from_host(self.context_name)(other)
return GpuFromHost(self.context_name)(other)
@staticmethod
def values_eq(a, b, force_same_dtype=True):
......
......@@ -1712,6 +1712,9 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
if max_abs_err > abs_tol and max_rel_err > rel_tol:
raise verify_grad.E_grad(max_arg, max_err_pos,
analytic_grad[max_arg].shape,
analytic_grad[max_arg].flatten()[max_err_pos],
num_grad.gf[max_arg].flatten()[max_err_pos],
max_abs_err, max_rel_err,
abs_tol, rel_tol)
......@@ -1727,10 +1730,14 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
class GradientError(Exception):
"""This error is raised when a calculated gradient is incorrect."""
def __init__(self, arg, err_pos, abs_err, rel_err, abs_tol, rel_tol):
def __init__(self, arg, err_pos, shape, val1, val2,
abs_err, rel_err, abs_tol, rel_tol):
Exception.__init__(self) # to be compatible with python2.4
self.arg = arg
self.err_pos = err_pos
self.shape = shape
self.val1 = val1
self.val2 = val2
self.abs_err = abs_err
self.rel_err = rel_err
self.abs_tol = abs_tol
......@@ -1741,10 +1748,13 @@ class GradientError(Exception):
args_msg = ", ".join(str(a) for a in self.args)
return """\
GradientError: numeric gradient and analytic gradient exceed tolerance:
At position %i of argument %i,
At position %i of argument %i with shape %s,
val1 = %f , val2 = %f
abs. error = %f, abs. tolerance = %f
rel. error = %f, rel. tolerance = %f
Exception args: %s""" % (self.err_pos, self.arg,
self.shape,
self.val1, self.val2,
self.abs_err, self.abs_tol,
self.rel_err, self.rel_tol,
args_msg)
......
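The extra fields added here give ``verify_grad`` failures enough context (shape plus the two disagreeing values) to locate the offending entry. The underlying check is a plain numeric-vs-analytic comparison; a minimal sketch for a scalar function, with illustrative tolerances rather than Theano's defaults:

```python
def numeric_grad(f, x, eps=1e-6):
    # Central finite-difference approximation of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def check_grad(f, analytic_grad, x, abs_tol=1e-8, rel_tol=1e-5):
    num = numeric_grad(f, x)
    ana = analytic_grad(x)
    abs_err = abs(num - ana)
    rel_err = abs_err / max(abs(num), abs(ana), 1e-12)
    # Mirroring verify_grad: fail only if BOTH tolerances are exceeded,
    # and report the two values so the failure is debuggable.
    if abs_err > abs_tol and rel_err > rel_tol:
        raise ValueError("val1 = %g , val2 = %g, abs. error = %g, "
                         "rel. error = %g" % (ana, num, abs_err, rel_err))
    return abs_err, rel_err

f = lambda x: x ** 3
df = lambda x: 3 * x ** 2
abs_err, rel_err = check_grad(f, df, 2.0)   # passes: gradients agree
```

Passing a wrong gradient (say ``lambda x: 2 * x`` for ``x ** 3``) raises with both values in the message, which is exactly the debugging information this commit threads into ``GradientError``.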
......@@ -26,7 +26,6 @@ from six import iteritems
from six.moves import xrange
from theano.compile import optdb
from theano.tensor import opt
from theano.scan_module.scan_utils import find_up
from theano.scan_module.scan_utils import clone
......@@ -578,7 +577,7 @@ class CondMerge(gof.Optimizer):
merging_node = cond_nodes[0]
for proposal in cond_nodes[1:]:
if (proposal.inputs[0] == merging_node.inputs[0] and
not find_up(proposal, merging_node)):
not gof.graph.is_in_ancestors(proposal, merging_node)):
# Create a list of replacements for proposal
mn_ts = merging_node.inputs[1:][:merging_node.op.n_outs]
mn_fs = merging_node.inputs[1:][merging_node.op.n_outs:]
......@@ -683,8 +682,8 @@ def cond_merge_random_op(main_node):
merging_node = cond_nodes[0]
for proposal in cond_nodes[1:]:
if (proposal.inputs[0] == merging_node.inputs[0] and
not find_up(proposal, merging_node) and
not find_up(merging_node, proposal)):
not gof.graph.is_in_ancestors(proposal, merging_node) and
not gof.graph.is_in_ancestors(merging_node, proposal)):
# Create a list of replacements for proposal
mn_ts = merging_node.inputs[1:][:merging_node.op.n_outs]
mn_fs = merging_node.inputs[1:][merging_node.op.n_outs:]
......
......@@ -9,7 +9,7 @@ import theano
y = theano.tensor.fvector()
x = theano.shared(np.zeros(1, dtype='float32'))
f1 = theano.function([y], updates={x: y})
f2 = theano.function([], theano.sandbox.cuda.host_from_gpu(x))
f2 = theano.function([], x.transfer('cpu'))
print(f1.maker.fgraph.toposort())
print(f2.maker.fgraph.toposort())
for i in [1, 10, 100, 1000, 10000, 100000, 1000000, 10000000]:
......
from __future__ import absolute_import, print_function, division
from .ops import (cholesky, matrix_inverse, solve,
diag, extract_diag, alloc_diag,
det, psd, eig, eigh, eigvalsh,
trace, spectral_radius_bound)
from theano.tensor.slinalg import (cholesky, solve, eigvalsh)
from theano.tensor.nlinalg import (matrix_inverse,
diag, extract_diag, alloc_diag,
det, eig, eigh,
trace)
from theano.sandbox.linalg.ops import psd, spectral_radius_bound
from __future__ import absolute_import, print_function, division
import logging
logger = logging.getLogger(__name__)
import numpy
from six import iteritems, integer_types
from six.moves import xrange
from theano.gof import Op, Apply
from theano.tensor import as_tensor_variable, dot, DimShuffle, Dot
from theano.tensor import DimShuffle, Dot
from theano.tensor.blas import Dot22
from theano import tensor
import theano.tensor
from theano.tensor.opt import (register_stabilize,
register_specialize, register_canonicalize)
register_specialize,
register_canonicalize)
from theano.gof import local_optimizer
from theano.gof.opt import Optimizer
from theano.gradient import DisconnectedType
from theano.tensor.nlinalg import ( MatrixInverse,
matrix_inverse,
MatrixPinv,
pinv,
AllocDiag,
alloc_diag,
ExtractDiag,
extract_diag,
diag,
trace,
Det,
det,
Eig,
eig,
Eigh,
EighGrad,
eigh,
matrix_dot,
_zero_disconnected,
qr,
svd,
lstsq,
matrix_power,
norm
)
from theano.tensor.slinalg import ( Cholesky,
cholesky,
CholeskyGrad,
Solve,
solve,
Eigvalsh,
EigvalshGrad,
eigvalsh
)
try:
import scipy.linalg
imported_scipy = True
except ImportError:
# some ops (e.g. Cholesky, Solve, A_Xinv_b) won't work
imported_scipy = False
from theano.tensor.nlinalg import (MatrixInverse,
matrix_inverse,
extract_diag,
trace,
det)
from theano.tensor.slinalg import (Cholesky,
cholesky,
Solve,
solve,
imported_scipy)
logger = logging.getLogger(__name__)
class Hint(Op):
......@@ -212,8 +180,6 @@ class HintsFeature(object):
class HintsOptimizer(Optimizer):
"""
Optimizer that serves to add HintsFeature as an fgraph feature.
"""
def __init__(self):
......@@ -310,8 +276,8 @@ def tag_solve_triangular(node):
return [Solve('lower_triangular')(A, b)]
else:
return [Solve('upper_triangular')(A, b)]
if (A.owner and isinstance(A.owner.op, DimShuffle)
and A.owner.op.new_order == (1, 0)):
if (A.owner and isinstance(A.owner.op, DimShuffle) and
A.owner.op.new_order == (1, 0)):
A_T, = A.owner.inputs
if A_T.owner and isinstance(A_T.owner.op, type(cholesky)):
if A_T.owner.op.lower:
......@@ -423,6 +389,5 @@ def spectral_radius_bound(X, log2_exponent):
XX = X
for i in xrange(log2_exponent):
XX = tensor.dot(XX, XX)
return tensor.pow(
trace(XX),
2 ** (-log2_exponent))
return tensor.pow(trace(XX),
2 ** (-log2_exponent))
......@@ -163,4 +163,4 @@ def test_matrix_inverse_solve():
b = theano.tensor.dmatrix('b')
node = matrix_inverse(A).dot(b).owner
[out] = inv_as_solve.transform(node)
assert isinstance(out.owner.op, Solve)
assert isinstance(out.owner.op, Solve)
......@@ -29,8 +29,7 @@ from theano.gpuarray.basic_ops import GpuKernelBase, Kernel, infer_context_name,
from theano.gpuarray.type import GpuArrayType
from theano.gpuarray.fp16_help import write_w
from theano.gpuarray.opt import (register_opt as register_gpua,
register_opt2,
host_from_gpu as host_from_gpua)
register_opt2)
if theano.sandbox.cuda.cuda_available:
from theano.sandbox.cuda import (CudaNdarrayType,
float32_shared_constructor)
......@@ -1621,7 +1620,7 @@ def local_gpua_mrg_graph(op, context_name, inputs, outputs):
op.output_type.ndim,
op.output_type.dtype,
inputs[1])
return [outs[0], host_from_gpua(outs[1])]
return [outs[0], outs[1].transfer('cpu')]
@register_gpua('fast_compile')
......
......@@ -70,7 +70,7 @@ from theano.gof.opt import pre_constant_merge, pre_greedy_local_optimizer
from theano.scan_module import scan_op
from theano.scan_module import scan_utils
from theano.scan_module.scan_utils import equal_computations, find_up, scan_args
from theano.scan_module.scan_utils import equal_computations, scan_args
__docformat__ = 'restructedtext en'
__authors__ = ("Razvan Pascanu "
......@@ -1605,7 +1605,7 @@ class ScanSaveMem(gof.Optimizer):
nw_pos = compress_map[idx]
old_new += [(o, new_outs[nw_pos])]
# Check if the new outputs depend on the old scan node
old_scan_is_used = [scan_utils.find_up(new.owner, node)
old_scan_is_used = [gof.graph.is_in_ancestors(new.owner, node)
for old, new in old_new]
if any(old_scan_is_used):
return False
......@@ -1829,19 +1829,21 @@ class ScanMerge(gof.Optimizer):
except tensor.NotScalarConstantError:
pass
if nsteps != rep_nsteps:
return False
# Check to see if it is an input of a different node
for nd in set_nodes:
if find_up(node, nd) or find_up(nd, node):
if gof.graph.is_in_ancestors(node, nd) or gof.graph.is_in_ancestors(nd, node):
return False
if not node.op.as_while:
return nsteps == rep_nsteps
return True
cond = node.op.outputs[-1]
rep_cond = rep.op.outputs[-1]
same_cond = scan_utils.equal_computations([cond], [rep_cond],
node.op.inputs,
rep.op.inputs)
return same_cond and (nsteps == rep_nsteps)
return scan_utils.equal_computations([cond], [rep_cond],
node.op.inputs,
rep.op.inputs)
def apply(self, fgraph):
# Collect all scan nodes ordered according to toposort
......
......@@ -152,7 +152,7 @@ def traverse(out, x, x_copy, d, visited=None):
return d
visited.add(out)
from theano.sandbox import cuda
from theano.gpuarray.basic_ops import gpu_from_host, host_from_gpu
from theano.gpuarray.basic_ops import GpuFromHost, host_from_gpu
from theano.gpuarray import pygpu_activated
from theano.gpuarray.type import GpuArrayType
if out == x:
......@@ -160,7 +160,7 @@ def traverse(out, x, x_copy, d, visited=None):
d[out] = cuda.gpu_from_host(x_copy)
else:
assert isinstance(x.type, GpuArrayType)
d[out] = gpu_from_host(x.type.context_name)(x_copy)
d[out] = GpuFromHost(x.type.context_name)(x_copy)
return d
elif out.owner is None:
return d
......@@ -876,10 +876,13 @@ class Validator(object):
if out.owner is None:
if isinstance(out, tensor.TensorConstant):
if hasattr(out, 'fgraph'):
if hasattr(out, 'fgraph') or getattr(out, 'cached', False):
# If out has an fgraph, we aren't sure if it
# is from the inner graph or outer graph, so
# clone it.
# As it will be used as-is in a FunctionGraph
# (it won't be cloned later), it can't be a
# cached variable.
cloned_out = out.clone()
self.valid.add(cloned_out)
self.invalid.add(out)
......@@ -1113,20 +1116,6 @@ def compress_outs(op, not_required, inputs):
return (op_inputs, op_outputs, info, node_inputs, map_old_new)
def find_up(l_node, f_node):
r"""
Goes up in the graph and returns True if a node in nodes is found.
"""
if isinstance(l_node, gof.Apply):
l_outs = l_node.outputs
else:
l_outs = l_node
l_ins = gof.graph.inputs(l_outs)
nodes = gof.graph.io_toposort(l_ins, l_outs)
return f_node in nodes
def reconstruct_graph(inputs, outputs, tag=None):
"""
Alternative interface to clone that allows you to pass inputs.
@@ -332,7 +332,7 @@ def make_gpu_optimizer(op, to_gpu):
new_inp[idx] = cuda.gpu_from_host(new_inp[idx])
result_node = op()(*new_inp)
copy_stack_trace(node.outputs[0], result_node)
transfer_node = cuda.host_from_gpu(result_node)
transfer_node = result_node.transfer('cpu')
copy_stack_trace(node.outputs[0], transfer_node)
return [transfer_node]
if node.op == cuda.gpu_from_host:
@@ -8,7 +8,7 @@ __docformat__ = 'restructedtext en'
from collections import OrderedDict
import numpy
import numpy as np
import theano
import theano.tensor as T
@@ -17,12 +17,12 @@ import theano.tensor as T
def gen_data():
# generate the dataset
train_set = (numpy.asarray(numpy.random.rand(10000, 784), dtype='float32'),
numpy.asarray(numpy.random.rand(10000)*10, dtype='int64'))
valid_set = (numpy.asarray(numpy.random.rand(10000, 784), dtype='float32'),
numpy.asarray(numpy.random.rand(10000)*10, dtype='int64'))
test_set = (numpy.asarray(numpy.random.rand(10000, 784), dtype='float32'),
numpy.asarray(numpy.random.rand(10000)*10, dtype='int64'))
train_set = (np.asarray(np.random.rand(10000, 784), dtype='float32'),
np.asarray(np.random.rand(10000)*10, dtype='int64'))
valid_set = (np.asarray(np.random.rand(10000, 784), dtype='float32'),
np.asarray(np.random.rand(10000)*10, dtype='int64'))
test_set = (np.asarray(np.random.rand(10000, 784), dtype='float32'),
np.asarray(np.random.rand(10000)*10, dtype='int64'))
def shared_dataset(data_xy):
""" Function that loads the dataset into shared variables
@@ -33,8 +33,8 @@ def gen_data():
variable) would lead to a large decrease in performance.
"""
data_x, data_y = data_xy
shared_x = theano.shared(numpy.asarray(data_x, dtype=theano.config.floatX))
shared_y = theano.shared(numpy.asarray(data_y, dtype=theano.config.floatX))
shared_x = theano.shared(np.asarray(data_x, dtype=theano.config.floatX))
shared_y = theano.shared(np.asarray(data_y, dtype=theano.config.floatX))
# When storing data on the GPU it has to be stored as floats
# therefore we will store the labels as ``floatX`` as well
# (``shared_y`` does exactly that). But during our computations
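The comment above explains why GPU-stored labels live as ``floatX`` and are cast back to int for computation. A pure-NumPy sketch (no Theano required) showing that this round-trip is exact for small integer labels:

```python
import numpy as np

# Integer class labels in [0, 10); the tutorial stores them as
# float32 ("floatX") so they can live on the GPU, then casts back
# to int for indexing. The round-trip is exact for small integers,
# since float32 represents every integer up to 2**24 exactly.
labels = np.random.randint(0, 10, size=10000).astype('int64')
as_floatX = labels.astype('float32')   # what the shared variable stores
recovered = as_floatX.astype('int64')  # what the cast back to int yields
assert np.array_equal(labels, recovered)
```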
@@ -79,7 +79,7 @@ class LogisticRegression(object):
"""
# initialize with 0 the weights W as a matrix of shape (n_in, n_out)
self.W = theano.shared(value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),
self.W = theano.shared(value=np.zeros((n_in, n_out), dtype=theano.config.floatX),
name=name_prefix+'W')
# compute vector of class-membership probabilities in symbolic form
@@ -129,7 +129,7 @@ class HiddenLayer(object):
Hidden unit activation is given by: tanh(dot(input,W) + b)
:type rng: numpy.random.RandomState
:type rng: np.random.RandomState
:param rng: a random number generator used to initialize weights
:type input: theano.tensor.dmatrix
@@ -151,9 +151,9 @@ class HiddenLayer(object):
# between -6./sqrt(n_in+n_hidden) and 6./sqrt(n_in+n_hidden);
# the output of uniform is converted using asarray to dtype
# theano.config.floatX so that the code is runnable on GPU
W_values = numpy.asarray( rng.uniform( \
low=-numpy.sqrt(6./(n_in+n_out)), \
high=numpy.sqrt(6./(n_in+n_out)), \
W_values = np.asarray( rng.uniform( \
low=-np.sqrt(6./(n_in+n_out)), \
high=np.sqrt(6./(n_in+n_out)), \
size=(n_in, n_out)), dtype=theano.config.floatX)
self.W = theano.shared(value=W_values, name=name_prefix+'W')
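The initialization above draws ``W`` uniformly from ±sqrt(6/(n_in+n_out)), the standard heuristic for tanh hidden layers. As a standalone NumPy sketch (``init_tanh_weights`` is an illustrative name, not part of the tutorial code):

```python
import numpy as np

def init_tanh_weights(rng, n_in, n_out, dtype='float32'):
    """Uniform init in [-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))],
    the heuristic used above for tanh hidden layers."""
    bound = np.sqrt(6.0 / (n_in + n_out))
    return np.asarray(rng.uniform(low=-bound, high=bound,
                                  size=(n_in, n_out)), dtype=dtype)

rng = np.random.RandomState(1234)
W = init_tanh_weights(rng, 784, 500)
assert W.shape == (784, 500) and W.dtype == np.float32
assert np.abs(W).max() <= np.sqrt(6.0 / (784 + 500))
```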
@@ -176,7 +176,7 @@ class MLP(object):
def __init__(self, rng, input, n_in, n_hidden, n_out):
"""Initialize the parameters for the multilayer perceptron
:type rng: numpy.random.RandomState
:type rng: np.random.RandomState
:param rng: a random number generator used to initialize weights
:type input: theano.tensor.TensorType
@@ -265,7 +265,7 @@ def test_mlp():
y = T.ivector('y') # the labels are presented as 1D vector of
# [int] labels
rng = numpy.random.RandomState(1234)
rng = np.random.RandomState(1234)
# construct the MLP class
classifier = MLP( rng=rng, input=x, n_in=28*28, n_hidden=500, n_out=10)
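The hidden activation described earlier (``tanh(dot(input, W) + b)``) followed by a softmax output layer can be sketched as a plain-NumPy forward pass, with the tutorial's shapes (28*28 inputs, 500 hidden units, 10 classes). This is an illustrative sketch, not the Theano graph the MLP class builds:

```python
import numpy as np

def mlp_forward(x, W_h, b_h, W_out, b_out):
    """One MLP forward pass: tanh hidden layer, softmax output."""
    h = np.tanh(x.dot(W_h) + b_h)                # hidden activations
    logits = h.dot(W_out) + b_out
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)      # class probabilities

rng = np.random.RandomState(1234)
x = rng.rand(4, 28 * 28).astype('float32')
p = mlp_forward(x,
                rng.randn(28 * 28, 500) * 0.01, np.zeros(500),
                rng.randn(500, 10) * 0.01, np.zeros(10))
assert p.shape == (4, 10)
assert np.allclose(p.sum(axis=1), 1.0)           # rows are distributions
```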