Commit 54e96754 authored by Pascal Lamblin

Merge pull request #3559 from abergeron/multi_gpu_doc

Multi gpu doc
@@ -132,7 +132,7 @@ Roughly in order of what you'll want to check out:
* :ref:`extending` -- Learn to add a Type, Op, or graph optimization.
* :ref:`dev_start_guide` -- How to contribute code to Theano.
* :ref:`developer` -- Primarily of interest to developers of Theano
* :ref:`internal` -- How to maintain Theano and more...
* :ref:`release` -- How our release should work.
* :ref:`acknowledgement` -- What we took from other projects.
* `Related Projects`_ -- link to other projects that implement new functionalities on top of Theano
...
@@ -5,16 +5,11 @@
Internal Documentation
======================
If you're feeling ambitious, go fix some `pylint
<http://lgcm.iro.umontreal.ca/auto_theano_pylint/pylint_global.html>` errors!
.. toctree::
:maxdepth: 2
release
dev_start_guide
lisa_labo
mammouth
metadocumentation
python
how_to_release
.. _lisa_labo:
===============================
LISA Labo specific instructions
===============================
Tips for running at LISA
------------------------
Shell configuration files ``/opt/lisa/os/.local.{bash,csh}rc`` should define
:envvar:`THEANORC` to include ``/opt/lisa/os/.local.theanorc`` as a
configuration file.
``/opt/lisa/os/.local.theanorc`` should include the right default values for
the lab, in particular, ``blas.ldflags`` should contain '-lgoto'.
Tips for running on a cluster
-----------------------------
:ref:`mammouth`
For instructions on running Theano on the mammouth cluster.
.. _mammouth:
===========================
Running Theano on Mammouth
===========================
To run Theano on the Mammouth cluster, follow these simple steps:
* Make sure to source Fred's .local.bashrc file. It contains all
the goodies for using the latest and greatest (optimized) libraries
(numpy, scipy, etc.)
.. code-block:: sh
source /home/bastienf/.local.bashrc
Perhaps even put this in your ``.bashrc``
* set ``config.blas.ldflags`` to ``'-lmkl -lguide -fopenmp'``
(see :mod:`config` to know how)
Note: the -lguide flag works, however the fix should probably be considered temporary.
Intel has deprecated libguide.so in favor of the newer library libiomp5.so. However,
both libraries are mutually exclusive and one component (theano, numpy or scipy?) already
seems to be using libguide.so (hence -liomp5 causes a linking error when compiling thunks)
@@ -110,9 +110,6 @@ pylint output is not autogenerated anymore.
Pylint documentation is generated using pylintrc file: ``Theano/doc/pylintrc``
You can see a list of all `pylint messages
<http://www.logilab.org/card/pylintfeatures>`__.
.. _metadocumentation_nightly_build:
...
.. _libdoc_gpuarray_dnn:
===========================================
:mod:`theano.sandbox.gpuarray.dnn` -- cuDNN
===========================================
.. moduleauthor:: LISA
`cuDNN <https://developer.nvidia.com/cuDNN>`_ is an NVIDIA library
with functionality used by deep neural networks. It provides optimized
versions of some operations like the convolution. cuDNN is not
currently installed with CUDA. You must download and install it
yourself.
To install it, decompress the downloaded file and make the ``*.h`` and
``*.so*`` files available to the compilation environment.
There are at least three possible ways of doing so:
- The easiest is to include them in your CUDA installation. Copy the
``*.h`` files to ``CUDA_ROOT/include`` and the ``*.so*`` files to
``CUDA_ROOT/lib64`` (by default, ``CUDA_ROOT`` is ``/usr/local/cuda``
on Linux).
- Alternatively, on Linux, you can set the environment variables
``LD_LIBRARY_PATH``, ``LIBRARY_PATH`` and ``CPATH`` to the directory
extracted from the download. If needed, separate multiple directories
with ``:`` as in the ``PATH`` environment variable.
For example::
export LD_LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
export CPATH=/home/user/path_to_CUDNN_folder/include:$CPATH
export LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LIBRARY_PATH
- And as a third way, also on Linux, you can copy the ``*.h`` files
to ``/usr/include`` and the ``*.so*`` files to ``/lib64``.
By default, Theano will detect if it can use cuDNN. If so, it will use
it. If not, Theano optimizations will not introduce cuDNN ops. So
Theano will still work if the user did not introduce them manually.
To get an error if Theano cannot use cuDNN, use this Theano flag:
``optimizer_including=cudnn``.
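For instance, to fail loudly instead of silently falling back to non-cuDNN ops, the flag can be passed on the command line (``my_script.py`` is a placeholder for your own program):

.. code-block:: sh

   THEANO_FLAGS="optimizer_including=cudnn" python my_script.py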
.. note::
CuDNN v3 has now been released. CuDNN v2 remains supported but CuDNN v3 is
faster and offers many more options. We recommend that everybody update to
v3.
.. note::
Starting in CuDNN v3, multiple convolution implementations are offered and
it is possible to use heuristics to automatically choose a convolution
implementation well suited to the parameters of the convolution.
The Theano flag ``dnn.conv.algo_fwd`` lets you specify the CuDNN
convolution implementation that Theano should use for forward convolutions.
Possible values include:
* ``small`` (default) : use a convolution implementation with small memory
usage
* ``none`` : use a slower implementation with minimal memory usage
* ``large`` : use a sometimes faster implementation with large memory usage
* ``fft`` : use the Fast Fourier Transform implementation of convolution
(very high memory usage)
* ``guess_once`` : the first time a convolution is executed, the
implementation to use is chosen according to CuDNN's heuristics and reused
for every subsequent execution of the convolution.
* ``guess_on_shape_change`` : like ``guess_once`` but a new convolution
implementation is selected every time the shapes of the inputs and kernels
don't match the shapes from the last execution.
* ``time_once`` : the first time a convolution is executed, every convolution
implementation offered by CuDNN is executed and timed. The fastest is
reused for every subsequent execution of the convolution.
* ``time_on_shape_change`` : like ``time_once`` but a new convolution
implementation is selected every time the shapes of the inputs and kernels
don't match the shapes from the last execution.
The Theano flag ``dnn.conv.algo_bwd`` lets you specify the CuDNN
convolution implementation that Theano should use for gradient convolutions.
Possible values include:
* ``none`` (default) : use the default non-deterministic convolution
implementation
* ``deterministic`` : use a slower but deterministic implementation
* ``fft`` : use the Fast Fourier Transform implementation of convolution
(very high memory usage)
* ``guess_once`` : the first time a convolution is executed, the
implementation to use is chosen according to CuDNN's heuristics and reused
for every subsequent execution of the convolution.
* ``guess_on_shape_change`` : like ``guess_once`` but a new convolution
implementation is selected every time the shapes of the inputs and kernels
don't match the shapes from the last execution.
* ``time_once`` : the first time a convolution is executed, every convolution
implementation offered by CuDNN is executed and timed. The fastest is
reused for every subsequent execution of the convolution.
* ``time_on_shape_change`` : like ``time_once`` but a new convolution
implementation is selected every time the shapes of the inputs and kernels
don't match the shapes from the last execution.
``guess_*`` and ``time_*`` flag values take into account the amount of
available memory when selecting an implementation. This means that slower
implementations might be selected if not enough memory is available for the
faster implementations.
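As a sketch of how these flags can be combined (the values chosen here are illustrative, not recommendations), both directions can be set at once on the command line:

.. code-block:: sh

   THEANO_FLAGS="dnn.conv.algo_fwd=time_once,dnn.conv.algo_bwd=deterministic" python my_script.py

The same settings could also live in your ``.theanorc`` so that every run picks them up.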
.. note::
Normally you should not call GPU Ops directly, but the CPU interface
currently does not allow all options supported by cuDNN ops. So it is
possible that you will need to call them manually.
.. note::
The cuDNN documentation states that, for the following two operations,
reproducibility is not guaranteed with the default implementation:
`cudnnConvolutionBackwardFilter` and `cudnnConvolutionBackwardData`.
Those correspond to the gradient wrt the weights and the gradient wrt the
input of the convolution. They are also used sometimes in the forward
pass, when they give a speed up.
The Theano flag ``dnn.conv.algo_bwd`` can be used to force the use of a
slower but deterministic convolution implementation.
.. note::
There is a problem we do not understand yet when the cuDNN paths
involve symbolic links, so avoid using them.
.. note::
cudnn.so* must be readable and executable by everybody.
cudnn.h must be readable by everybody.
Functions
=========
.. automodule:: theano.sandbox.gpuarray.dnn
:noindex:
:members: dnn_conv, dnn_pool
Convolution Ops
===============
.. automodule:: theano.sandbox.gpuarray.dnn
:noindex:
:members: GpuDnnConvDesc, GpuDnnConv, GpuDnnConvGradW, GpuDnnConvGradI
Pooling Ops
===========
.. automodule:: theano.sandbox.gpuarray.dnn
:noindex:
:members: GpuDnnPoolDesc, GpuDnnPool, GpuDnnPoolGrad
Softmax Ops
===========
.. automodule:: theano.sandbox.gpuarray.dnn
:noindex:
:members: GpuDnnSoftmax, GpuDnnSoftmaxGrad
.. _libdoc_gpuarray_extra:
=================
Utility functions
=================
Optimisation
------------
.. automodule:: theano.sandbox.gpuarray.opt_util
:members:
Kernel generation
-----------------
.. automodule:: theano.sandbox.gpuarray.kernel_codegen
:members:
.. _libdoc_gpuarray:
=======================================================
:mod:`theano.sandbox.gpuarray` -- The (new) GPU backend
=======================================================
.. module:: theano.sandbox.gpuarray
:platform: Unix, Windows
:synopsis: Code for GPU programming (new)
.. moduleauthor:: MILA
.. toctree::
:maxdepth: 1
op
dnn
type
extra
.. _libdoc_gpuarray_op:
================================
List of gpuarray Ops implemented
================================
.. moduleauthor:: LISA
Normally you should not call these Ops directly! Theano should
automatically transform CPU ops to their GPU equivalents. This list
is just here to let people know what is implemented on the GPU.
Basic Op
========
.. automodule:: theano.sandbox.gpuarray.basic_ops
:members:
Blas Op
=======
.. automodule:: theano.sandbox.gpuarray.blas
:members:
.. automodule:: theano.sandbox.gpuarray.nerv
:members:
Elemwise Op
===========
.. automodule:: theano.sandbox.gpuarray.elemwise
:members:
Subtensor Op
============
.. automodule:: theano.sandbox.gpuarray.subtensor
:members:
Nnet Op
=======
.. automodule:: theano.sandbox.gpuarray.nnet
:members:
.. automodule:: theano.sandbox.gpuarray.neighbours
:members:
.. _libdoc_gpuarray_type:
===================================================
:mod:`theano.sandbox.gpuarray.type` -- Type classes
===================================================
.. automodule:: theano.sandbox.gpuarray.type
:members:
@@ -14,6 +14,8 @@
:maxdepth: 1
cuda/index
gpuarray/index
linalg
neighbours
rng_mrg
blocksparse
@@ -37,6 +37,7 @@ you out.
loop
sparse
using_gpu
using_multi_gpu
gpu_data_convert
aliasing
shape_info
...
.. _tut_using_multi_gpu:
===================
Using multiple GPUs
===================
Theano has a feature to allow the use of multiple GPUs at the same
time in one function. The multi-GPU feature requires the use of
the :ref:`gpuarray` backend, so make sure that works correctly.
In order to keep a reasonably high level of abstraction you do not
refer to device names directly for multiple-gpu use. You instead
refer to what we call context names. These are then mapped to a
device using the theano configuration. This allows portability of
models between machines.
.. warning::
The code is rather new and is still considered experimental at this
point. It has been tested and seems to perform correctly in all
cases observed, but make sure to double-check your results before
publishing a paper or anything of the sort.
Defining the context map
------------------------
The mapping from context names to devices is done through the
:attr:`config.contexts` option. The format looks like this::
dev0->cuda0;dev1->cuda1
Let's break it down. First there is a list of mappings. Each of
these mappings is separated by a semicolon ';'. There can be any
number of such mappings, but in the example above we have two of them:
`dev0->cuda0` and `dev1->cuda1`.
The mappings themselves are composed of a context name followed by the
two characters '->' and the device name. The context name is a simple
string which does not have any special meaning for Theano. For
parsing reasons, the context name cannot contain the sequence '->' or
';'. To avoid confusion, context names that begin with 'cuda' or
'opencl' are disallowed. The device name is a device in the form that
gpuarray expects like 'cuda0' or 'opencl0:0'.
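The parsing rules above can be sketched with a small hypothetical helper (this is an illustration of the format, not Theano's actual parser):

```python
def parse_contexts(spec):
    """Parse a context map like 'dev0->cuda0;dev1->cuda1' into a dict."""
    mapping = {}
    # Mappings are separated by ';', so a context name can never contain one.
    for entry in spec.split(';'):
        name, arrow, device = entry.partition('->')
        # Each mapping must contain exactly one '->' separator.
        if arrow != '->' or '->' in device:
            raise ValueError("invalid mapping: %r" % entry)
        # Context names starting with 'cuda' or 'opencl' are disallowed.
        if name.startswith(('cuda', 'opencl')):
            raise ValueError("context name %r is reserved" % name)
        mapping[name] = device
    return mapping

print(parse_contexts("dev0->cuda0;dev1->cuda1"))
# {'dev0': 'cuda0', 'dev1': 'cuda1'}
```

Note that a device name like 'opencl0:0' passes through unchanged; only the context-name side of each mapping is restricted.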
.. note::
Since there are a bunch of shell special characters in the syntax,
defining this on the command-line will require proper quoting, like this:
.. code-block:: shell
$ THEANO_FLAGS="contexts=dev0->cuda0"
When you define a context map, if :attr:`config.print_active_device`
is `True` (the default), Theano will print the mappings as they are
defined. This will look like this:
.. code-block:: bash
$ THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1" python -c 'import theano'
Mapped name dev0 to device cuda0: GeForce GTX TITAN X
Mapped name dev1 to device cuda1: GeForce GTX TITAN X
If you don't have enough GPUs for a certain model, you can assign the
same device to more than one name. You can also assign extra names
that a model doesn't need to some other devices. However, a
proliferation of names is not always a good idea, since Theano often
assumes that different context names will be on different devices and
will optimize accordingly. So you may get faster performance for a
single name and a single device.
.. note::
It is often the case that multi-gpu operation requires or assumes
that all the GPUs involved are equivalent. This is not the case
for this implementation. Since the user has the task of
distributing the jobs across the different devices, a model can be
built on the assumption that one of the GPUs is slower or has
smaller memory.
A simple graph on two GPUs
--------------------------
The following simple program works on two GPUs. It builds a function
that performs two dot products on two different GPUs.
.. code-block:: python
import numpy
import theano
v01 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
target='dev0')
v02 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
target='dev0')
v11 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
target='dev1')
v12 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
target='dev1')
f = theano.function([], [theano.tensor.dot(v01, v02),
theano.tensor.dot(v11, v12)])
f()
This model requires a context map with assignments for 'dev0' and
'dev1'. It should run twice as fast when the devices are different.
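Assuming the program above is saved as ``two_dots.py`` (a hypothetical file name) on a machine with two devices, it could be run with:

.. code-block:: sh

   THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1" python two_dots.py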
Explicit transfers of data
--------------------------
Since operations themselves cannot work on more than one device, they
will pick a device to work on based on their inputs and automatically
insert transfers for any input which is not on the right device.
However you may want some explicit control over where and how these
transfers are done at some points. This is done by using the new
:meth:`transfer` method that is present on variables. It works for
moving data between GPUs and also between the host and the GPUs. Here
is an example.
.. code-block:: python
import theano
v = theano.tensor.fmatrix()
# Move to the device associated with 'gpudev'
gv = v.transfer('gpudev')
# Move back to the cpu
cv = gv.transfer('cpu')
Of course you can mix transfers and operations in any order you
choose. However you should try to minimize transfer operations
because they will introduce overhead and may reduce performance.
@@ -12,7 +12,6 @@ import numpy
import theano
from theano.sandbox.gpuarray import init_dev
from theano.sandbox.gpuarray.type import gpuarray_shared_constructor as shared
from theano.sandbox.gpuarray.blas import gpu_dot22
@@ -22,13 +21,13 @@ def main(dev1, dev2):
size = 1024 * 16
data = numpy.random.randn(size, size).astype('float32')
val1a = theano.shared(data, target='ctx1')
val1b = theano.shared(data, target='ctx1')
val1c = theano.shared(data, target='ctx1')
val1d = theano.shared(data, target='ctx1')
val2a = theano.shared(data, target='ctx2')
val2b = theano.shared(data, target='ctx2')
f1 = theano.function([], [gpu_dot22(val1a, val1b),
gpu_dot22(val1c, val1d)])
...
@@ -27,6 +27,20 @@ from .fp16_help import write_w
def as_gpuarray_variable(x, context_name):
"""
This will attempt to convert `x` into a variable on the GPU.
It can take either a value or another variable. If `x` is already
suitable, it will be returned as-is.
Parameters
----------
x
Object to convert
context_name : str or None
target context name for the result
"""
# If this is already some form of variable, try to avoid an extra transfer
if isinstance(x, Variable):
while True:
@@ -174,6 +188,13 @@ class Kernel(object):
class GpuKernelBase(object):
"""
Base class for operations that need to compile kernels.
It is not mandatory to use this class, but it helps with a lot of
the small things that you have to pay attention to.
"""
params_type = gpu_context_type
def gpu_kernels(self, node, name):
@@ -274,10 +295,25 @@ class GpuKernelBase(object):
return (self.c_code_cache_version(), self.kernel_version(node))
def kernel_version(self, node):
"""
If you override :meth:`c_code_cache_version_apply`, call this
method to obtain the version of the kernel support code and
device.
Parameters
----------
node : apply node
The node that we need the cache version for.
"""
return (3, self.get_params(node).bin_id)
class HostFromGpu(Op):
"""
Transfer data to CPU.
"""
__props__ = ()
_f16_ok = True
@@ -356,6 +392,10 @@ host_from_gpu = HostFromGpu()
class GpuFromHost(Op):
"""
Transfer data to GPU.
"""
__props__ = ('context_name',)
_f16_ok = True
params_type = gpu_context_type
@@ -443,6 +483,10 @@ class GpuFromHost(Op):
class GpuToGpu(Op):
"""
Transfer data between GPUs.
"""
__props__ = ('context_name',)
_f16_ok = True
params_type = gpu_context_type
@@ -494,6 +538,7 @@ class GpuToGpu(Op):
class GpuAlloc(HideC, Alloc):
"""
Allocate initialized memory on the GPU.
Parameters
----------
@@ -654,6 +699,10 @@ class GpuAlloc(HideC, Alloc):
class GpuAllocEmpty(HideC, Alloc):
"""
Allocate uninitialized memory on the GPU.
"""
__props__ = ('dtype', 'context_name')
_f16_ok = True
params_type = gpu_context_type
@@ -732,8 +781,10 @@ def empty_like(var):
class GpuContiguous(Op):
""" """
Always return a c contiguous output. Copy the input only if it is Return a C contiguous version of the input.
not already c contiguous.
This may either pass the object as-is (if already C contiguous) or
make a copy.
""" """
__props__ = ()
@@ -793,7 +844,7 @@ gpu_contiguous = GpuContiguous()
class GpuReshape(HideC, tensor.Reshape):
"""
Reshape for GPU variables.
"""
@@ -914,6 +965,10 @@ class GpuReshape(HideC, tensor.Reshape):
class GpuJoin(HideC, Join):
"""
Join for GPU.
"""
_f16_ok = True
params_type = gpu_context_type
@@ -991,6 +1046,10 @@ gpu_join = GpuJoin()
class GpuSplit(HideC, Split):
"""
Split for GPU.
"""
def make_node(self, x, axis, splits):
node = Split.make_node(self, x, axis, splits)
x = as_gpuarray_variable(x, infer_context_name(x))
@@ -1002,6 +1061,10 @@ class GpuSplit(HideC, Split):
class GpuEye(GpuKernelBase, Op):
"""
Eye for GPU.
"""
__props__ = ('dtype', 'context_name')
_f16_ok = True
...
@@ -31,6 +31,10 @@ class BlasOp(Op):
class GpuGemv(BlasOp):
"""
Gemv on the GPU.
"""
__props__ = ('inplace',)
def __init__(self, inplace=False):
@@ -107,6 +111,10 @@ gpugemv_inplace = GpuGemv(inplace=True)
class GpuGemm(BlasOp):
"""
Gemm on the GPU.
"""
__props__ = ('inplace',)
_f16_ok = True
@@ -184,6 +192,10 @@ gpugemm_inplace = GpuGemm(inplace=True)
class GpuGer(BlasOp):
"""
Ger on the GPU.
"""
__props__ = ('inplace',)
def __init__(self, inplace=False):
@@ -256,6 +268,10 @@ gpuger_inplace = GpuGer(inplace=True)
class GpuDot22(BlasOp):
"""
Dot22 on the GPU.
"""
__props__ = ()
def make_node(self, x, y):
...
@@ -57,6 +57,10 @@ def as_C_string_const(s):
class GpuElemwise(GpuKernelBase, HideC, Elemwise):
"""
Elemwise on the GPU.
"""
nin = property(lambda self: self.scalar_op.nin)
nout = property(lambda self: self.scalar_op.nout)
_f16_ok = True
@@ -445,6 +449,10 @@ class SupportCodeError(Exception):
class GpuDimShuffle(HideC, DimShuffle):
"""
DimShuffle on the GPU.
"""
_f16_ok = True
def make_node(self, input):
@@ -548,7 +556,7 @@ class GpuCAReduceCuda(GpuKernelBase, HideC, CAReduceDtype):
Parameters
----------
reduce_mask
The dimensions along which to reduce. The `reduce_mask` is a tuple of
booleans (actually integers 0 or 1) that specify for each input
dimension, whether to reduce it (1) or not (0).
@@ -1279,14 +1287,6 @@ class GpuCAReduceCuda(GpuKernelBase, HideC, CAReduceDtype):
""" % locals()
def c_code_reduce_ccontig(self, sio, node, name, x, z, fail):
"""
WRITEME
IG: I believe, based on how this is called in c_code, that it
is for the case where we are reducing on all axes and x is
C contiguous.
"""
in_dtype = "npy_" + node.inputs[0].dtype
out_dtype = "npy_" + node.outputs[0].dtype
if getattr(self.scalar_op, 'identity', None) == 0:
@@ -2666,8 +2666,6 @@ class GpuCAReduceCPY(GpuKernelBase, HideC, CAReduceDtype):
"""
CAReduce that reuses the python code from gpuarray.
Too slow for now as it only have a python interface.
""" """
def __init__(self, scalar_op, axis=None, dtype=None, acc_dtype=None): def __init__(self, scalar_op, axis=None, dtype=None, acc_dtype=None):
if not hasattr(scalar_op, 'identity'): if not hasattr(scalar_op, 'identity'):
......
@@ -71,17 +71,19 @@ def inline_reduce(N, buf, pos, count, manner_fn):
count
Number of executing threads.
manner_fn
A function that accepts strings of arguments a and b, and
returns c code for their reduction.
Example: return "%(a)s + %(b)s" for a sum reduction.
Notes
-----
`buf` should be in gpu shared memory, we access it many times.
This function leaves the answer in position 0 of the buffer. The
rest of the buffer is trashed by this function.
"""
loop_line = manner_fn("%s[%s]" % (buf, pos), "%s[i]" % (buf))
@@ -149,6 +151,13 @@ def inline_reduce_prod(N, buf, pos, count):
inline_reduce_sum.code_version)
def inline_softmax(N, buf, buf2, threadPos, threadCount, dtype="float32"):
"""
Generate code for a softmax.
On entry, `buf` and `buf2` must contain two identical copies of
the input to softmax.
After the code returns, `buf` contains the softmax and `buf2` contains
the un-normalized softmax.
Parameters
----------
@@ -161,14 +170,10 @@ def inline_softmax(N, buf, buf2, threadPos, threadCount, dtype="float32"):
dtype
Dtype of the softmax's output.
:Precondition: buf and buf2 contain two identical copies of the input
to softmax
:Postcondition: buf contains the softmax, buf2 contains un-normalized
softmax
Notes
-----
`buf` and `buf2` should be in gpu shared memory, we access it many times.
We use __i as an int variable in a loop.
@@ -205,6 +210,9 @@ def inline_reduce_fixed_shared(N, buf, x, stride_x, load_x, pos, count,
"""
Return C++ code for a function that reduces a contiguous buffer.
This function leaves the answer in position 0 of the buffer. The
rest of the buffer is trashed by this function.
Parameters
----------
N
@@ -230,20 +238,19 @@ def inline_reduce_fixed_shared(N, buf, x, stride_x, load_x, pos, count,
dtype
Optional, the dtype of the output.
manner_fn
A function that accepts strings of arguments a and b, and
returns c code for their reduction.
Example: return "%(a)s + %(b)s" for a sum reduction.
manner_init
A function that accepts strings of arguments a and returns c
code for its initialization.
Notes
-----
`buf` should be in gpu shared memory, we access it many times.
""" """
if b: if b:
...@@ -320,6 +327,13 @@ def inline_softmax_fixed_shared(N, buf, x, stride_x, load_x, ...@@ -320,6 +327,13 @@ def inline_softmax_fixed_shared(N, buf, x, stride_x, load_x,
b='', stride_b='', load_b='', b='', stride_b='', load_b='',
dtype="float32"): dtype="float32"):
""" """
Generate code to perform softmax with a fixed amount of shared
memory.
On entry, `buf` is assumed to be empty.
On exit, `buf[0]` contains the softmax, `buf2` contains
un-normalized softmax.
Parameters
----------
@@ -352,13 +366,9 @@ def inline_softmax_fixed_shared(N, buf, x, stride_x, load_x,
dtype
Optional, the dtype of the softmax's output if not float32.
:Precondition: buf is empty
:Postcondition: buf[0] contains the softmax, buf2 contains un-normalized
softmax
Notes
-----
`buf` should be in gpu shared memory, we access it many times.
We use tx as an int variable in a loop.
...
@@ -17,6 +17,10 @@ from .type import GpuArrayType
class GpuImages2Neibs(GpuKernelBase, Images2Neibs, Op):
"""
Images2Neibs for the GPU.
"""
def __init__(self, mode='valid'):
    if mode not in ['valid', 'ignore_borders', 'wrap_centered']:
        raise NotImplementedError("Only the mode valid, ignore_borders"
@@ -41,6 +41,9 @@ def ensure_float(val, name):

class Gemm16(COp):
"""
Gemm for float16 using the nervena kernels.
"""
__props__ = ('relu', 'inplace')
_f16_ok = True
params_type = gpu_context_type
@@ -22,7 +22,7 @@ def grab_cpu_scalar(v, nd):
Parameters
----------
v
    Theano variable to extract the constant value from.
nd : int
    Expected number of dimensions for the variable (for
@@ -55,7 +55,7 @@ def find_node(v, cls, ignore_clients=False):
Parameters
----------
v
    The variable to dig through
cls : Op class
    The type of the node we are looking for
@@ -84,9 +84,9 @@ def is_equal(var, val):
Parameters
----------
var
    Variable to compare
val
    Python value
"""
@@ -101,11 +101,11 @@ def alpha_merge(cls, alpha_in, beta_in):
"""
Decorator to merge multiplication by a scalar on the output.

This will find a pattern of `scal * <yourop>(some, params, alpha,
beta)` and update it so that the scalar multiplication happens as
part of your op.

The op needs to accept an alpha and a beta scalar which act this way::

    out = Op() * alpha + out_like * beta
@@ -113,7 +113,7 @@ def alpha_merge(cls, alpha_in, beta_in):
and gets added to the "real" output of the operation. An example
of an operation that respects this pattern is GEMM from blas.

The decorated function must have this signature::

    maker(node, *inputs)
@@ -122,7 +122,7 @@ def alpha_merge(cls, alpha_in, beta_in):
for your op so that the new version performs the same computation.
The `*inputs` parameter contains the new inputs for your op. You
MUST use those inputs instead of the ones on `node`. Note that
this function can be as simple as::

    def maker(node, *inputs):
        return node.op(*inputs)
@@ -138,8 +139,9 @@ def alpha_merge(cls, alpha_in, beta_in):
Returns
-------
local optimizer
    an unregistered local optimizer that has the same name as the
    decorated function.

Notes
-----
@@ -191,11 +192,11 @@ def output_merge(cls, alpha_in, beta_in, out_in):
"""
Decorator to merge addition by a value on the output.

This will find a pattern of `val + <yourop>(some, params, alpha,
beta, out_like)` and update it so that the addition happens as
part of your op.

The op needs to accept an alpha and a beta scalar which act this way::

    out = Op() * alpha + out_like * beta
@@ -203,7 +204,7 @@ def output_merge(cls, alpha_in, beta_in, out_in):
and gets added to the "real" output of the operation. An example
of an operation that respects this pattern is GEMM from blas.

The decorated function must have this signature::

    maker(node, *inputs)
@@ -212,7 +213,7 @@ def output_merge(cls, alpha_in, beta_in, out_in):
for your op so that the new version performs the same computation.
The `*inputs` parameter contains the new inputs for your op. You
MUST use those inputs instead of the ones on `node`. Note that
this function can be as simple as::

    def maker(node, *inputs):
        return node.op(*inputs)
@@ -230,8 +231,9 @@ def output_merge(cls, alpha_in, beta_in, out_in):
Returns
-------
local optimizer
    an unregistered local optimizer that has the same name as the
    decorated function.

Notes
-----
@@ -281,7 +283,7 @@ def inplace_allocempty(op, idx):
This will duplicate the alloc input if it has more than one client
to allow the op to work on it inplace.

The decorated function must have this signature::

    maker(node, inputs)
@@ -291,7 +293,7 @@ def inplace_allocempty(op, idx):
You should also switch the op to work inplace. The `*inputs`
parameter contains the new inputs for your op. You MUST use
those inputs instead of the ones on `node`. Note that this
function can be as simple as::

    def maker(node, inputs):
        return [node.op.__class__(inplace=True)(*inputs)]
@@ -305,8 +307,9 @@ def inplace_allocempty(op, idx):
Returns
-------
local optimizer
    an unregistered inplace local optimizer that has the same name
    as the decorated function.
"""
def wrapper(maker):
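The alpha/beta contract that `alpha_merge` and `output_merge` build on (`out = Op() * alpha + out_like * beta`) can be checked numerically. A pure-Python sketch under the assumption that the op is a plain function — `gemm_like` is an illustrative stand-in, not a Theano op:

```python
# Toy model of an op honouring the alpha/beta contract described above.
def gemm_like(op_result, alpha, out_like, beta):
    return [o * alpha + l * beta for o, l in zip(op_result, out_like)]

raw = [1.0, 2.0, 3.0]          # the "real" output of the op
out_like = [10.0, 10.0, 10.0]  # pre-existing output buffer

# The pattern alpha_merge matches: scal * <yourop>(..., alpha=1, beta=0)
scal = 2.0
unfused = [scal * v for v in gemm_like(raw, 1.0, out_like, 0.0)]

# After the rewrite, the scalar is folded into alpha instead.
fused = gemm_like(raw, scal, out_like, 0.0)

assert unfused == fused == [2.0, 4.0, 6.0]
```

Folding the multiplication into `alpha` saves one elemwise pass over the output, which is the point of the optimization.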
@@ -24,6 +24,9 @@ from .elemwise import GpuElemwise

class GpuSubtensor(HideC, Subtensor):
"""
Subtensor on the GPU.
"""
_f16_ok = True

def make_node(self, x, *inputs):
@@ -173,8 +176,8 @@ class GpuIncSubtensor(GpuKernelBase, IncSubtensor):
The optimization to make this inplace is in tensor/opt.
The same optimization handles IncSubtensor and GpuIncSubtensor.
This Op has c_code too; it inherits tensor.IncSubtensor's c_code.
The helper methods like :meth:`do_type_checking`,
:meth:`copy_of_x`, etc. specialize the c_code for this Op.
"""
@@ -405,6 +408,9 @@ class GpuIncSubtensor(GpuKernelBase, IncSubtensor):

class GpuAdvancedSubtensor1(HideC, tensor.AdvancedSubtensor1):
"""
AdvancedSubrensor1 on the GPU.
"""
def make_node(self, x, ilist):
    ctx_name = infer_context_name(x, ilist)
    x_ = as_gpuarray_variable(x, ctx_name)
@@ -580,8 +586,10 @@ class GpuAdvancedIncSubtensor1_dev20(GpuKernelBase, GpuAdvancedIncSubtensor1):
_f16_ok = True

def make_node(self, x, y, ilist):
"""It defer from GpuAdvancedIncSubtensor1 in that it make sure """
the index are of type long. It differs from GpuAdvancedIncSubtensor1 in that it makes sure
the indexes are of type long.
""" """
ctx_name = infer_context_name(x, y, ilist)
x_ = as_gpuarray_variable(x, ctx_name)
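For reference, `GpuAdvancedIncSubtensor1` implements the semantics of `inc_subtensor(x[ilist], y)` along the first dimension, where repeated indices accumulate. A pure-Python sketch of those semantics for 1-d `x` (an illustration, not the actual GPU code):

```python
# Reference semantics for advanced inc_subtensor along axis 0.
# Repeated indices accumulate into the same output cell, which is
# why the GPU kernels must combine concurrent updates carefully.
def advanced_inc_subtensor1(x, y, ilist):
    out = list(x)
    for value, idx in zip(y, ilist):
        out[idx] += value
    return out

x = [0.0, 0.0, 0.0]
y = [1.0, 2.0, 4.0]
print(advanced_inc_subtensor1(x, y, [0, 2, 2]))  # [1.0, 0.0, 6.0]
```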
@@ -67,6 +67,7 @@ def get_context(name):

def list_contexts():
    """
    Return an iterable of all the registered context names.
    """
    return _context_reg.keys()
@@ -85,6 +86,54 @@ def _unreg_context(name):

class GpuArrayType(Type):
"""
The type that represents an array on a gpu.
The `dtype` indicates what scalar data type the elements of
variables of this type will be.
`broadcastable` indicates whether each dimension is broadcastable
or not (to be broadcastable a dimension must always be of length
1).
The `context_name` is the name of the context on will values of
variables of this type will be stored.
Parameters
----------
dtype : str
The name of a numpy dtype
broadcastable : tuple of bools
A tuple that indicates both the number of dimensions (by its
length) and whether those dimensions are broadcastable or not
(by the boolean values).
context_name : str
The name of the context the that this type is attached to
(default: None, which is the context specified by
config.device).
name : string, optional
A name for the type that will be used in printouts.
Attributes
----------
dtype : str
Data type used for scalar elements of variables.
broadcastable : tuple of bools
Indicates whether the dimensions are broadcastable or not.
ndim : int
The number of dimensions
context_name : str
The name of a gpu context on which variables will have their values.
name : str
A string used to print the type if given.
typecode : int
The gpuarray typecode for `dtype`
See Also
--------
theano.gof.type.PureType
"""
def __init__(self, dtype, broadcastable, context_name=None, name=None):
    # In case this was not provided and no global value is available
    self.dtype = str(dtype)
@@ -111,6 +160,11 @@ class GpuArrayType(Type):
# This is a property to keep the type pickleable
@property
def context(self):
"""
The context object mapped to the type's :attr:`context_name`.
This is a property.
"""
    return get_context(self.context_name)

def __repr__(self):
@@ -306,8 +360,6 @@ class GpuArrayType(Type):
This function is used internally as part of C code generation.
"""
try:
    return {
        'float16': (float, 'npy_float16', 'NPY_FLOAT16'),
@@ -321,8 +373,8 @@ class GpuArrayType(Type):
        'int32': (int, 'npy_int32', 'NPY_INT32'),
        'uint64': (int, 'npy_uint64', 'NPY_UINT64'),
        'int64': (int, 'npy_int64', 'NPY_INT64'),
        # 'complex128': (complex, 'theano_complex128', 'NPY_COMPLEX128'),
        # 'complex64': (complex, 'theano_complex64', 'NPY_COMPLEX64')
    }[self.dtype]
except KeyError:
    raise TypeError("Unsupported dtype for %s: %s" %
@@ -420,10 +472,21 @@ class _operators(_tensor_py_operators):

class GpuArrayVariable(_operators, Variable):
"""
A variable representing a computation on a certain GPU.
This supports all the operations that :class:`TensorType`
supports.
See Also
--------
Variable
"""
# override the default
def __repr_test_value__(self):
    return repr(numpy.array(theano.gof.op.get_test_value(self)))

GpuArrayType.Variable = GpuArrayVariable
@@ -436,6 +499,17 @@ class GpuArraySignature(tensor.TensorConstantSignature):

class GpuArrayConstant(_operators, Constant):
"""
A constant representing a value on a certain GPU.
This supports all the operations that :class:`TensorType`
supports.
See Also
--------
Constant
"""
def signature(self):
    return GpuArraySignature((self.type, numpy.asarray(self.data)))
@@ -453,6 +527,17 @@ GpuArrayType.Constant = GpuArrayConstant

class GpuArraySharedVariable(_operators, SharedVariable):
"""
A variable representing a shared value on a certain GPU.
This supports all the operations that :class:`TensorType`
supports.
See Also
--------
SharedVariable
"""
def get_value(self, borrow=False, return_internal_type=False):
    if return_internal_type:
        if borrow:
@@ -481,6 +566,8 @@ def gpuarray_shared_constructor(value, name=None, strict=False,
"""
SharedVariable constructor for GpuArrayType.
See :func:`theano.shared`.
""" """
if target == 'gpu' or target == 'cpu': if target == 'gpu' or target == 'cpu':
raise TypeError('not for me') raise TypeError('not for me')
@@ -596,6 +683,13 @@ theano.compile.register_specify_shape_c_code(

class GpuContextType(Type):
"""
Minimal type used for passing contexts to nodes.
This Type is not a complete type and should never be used for
regular graph operations.
"""
def filter(self, data, strict=False, allow_downcast=None):
    if not isinstance(data, gpuarray.GpuContext):
        raise TypeError('context is not a GpuContext')
@@ -652,4 +746,8 @@ Py_INCREF(%(name)s);

# Variable, Constant, ... not declared
"""
Instance of :class:`GpuContextType` to use for the context_type
declaration of an operation.
"""
gpu_context_type = GpuContextType()