Commit ec0419a6, authored by Pascal Lamblin

Merge pull request #4500 from slefrancois/gpu_out_sandbox

Update doc with instructions for using new gpu backend
...@@ -37,3 +37,4 @@ Theano.suo ...@@ -37,3 +37,4 @@ Theano.suo
.ipynb_checkpoints .ipynb_checkpoints
.pydevproject .pydevproject
.ropeproject .ropeproject
core
\ No newline at end of file
...@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this: ...@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
Testing GPU Ops Testing GPU Ops
^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^
Ops to be executed on the GPU should inherit from the When using the old GPU backend, Ops to be executed on the GPU should inherit
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows from ``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
Theano to distinguish them. Currently, we use this to test if the Theano to distinguish them. Currently, we use this to test if the
NVIDIA driver works correctly with our sum reduction code on the GPU. NVIDIA driver works correctly with our sum reduction code on the GPU.
......
...@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add ...@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add
If you want GPU-related tests to run on a specific GPU device, and not If you want GPU-related tests to run on a specific GPU device, and not
the default one, you should use :attr:`~config.init_gpu_device`. the default one, you should use :attr:`~config.init_gpu_device`.
For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=gpu1``. For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=cuda1``.
See :ref:`libdoc_config` for more information on how to change these See :ref:`libdoc_config` for more information on how to change these
configuration options. configuration options.
...@@ -508,25 +508,25 @@ Any one of them is enough. ...@@ -508,25 +508,25 @@ Any one of them is enough.
:ref:`Ubuntu instructions <install_ubuntu_gpu>`. :ref:`Ubuntu instructions <install_ubuntu_gpu>`.
Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your
computer, and set the default floating point computations to float32. computer, and set the default floating point computations to float32.
For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu,floatX=float32'``. For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=cuda,floatX=float32'``.
You can also set these options in the .theanorc file's ``[global]`` section: You can also set these options in the .theanorc file's ``[global]`` section:
.. code-block:: cfg .. code-block:: cfg
[global] [global]
device = gpu device = cuda
floatX = float32 floatX = float32
Note that: Note that:
* If your computer has multiple GPUs and you use 'device=gpu', the driver * If your computer has multiple GPUs and you use 'device=cuda', the driver
selects the one to use (usually gpu0). selects the one to use (usually cuda0).
* You can use the program nvida-smi to change this policy. * You can use the program ``nvidia-smi`` to change this policy.
* You can choose one specific GPU by specifying 'device=gpuX', with X the * You can choose one specific GPU by specifying 'device=cudaX', with X the
corresponding GPU index (0, 1, 2, ...) corresponding GPU index (0, 1, 2, ...)
* By default, when ``device`` indicates preference for GPU computations, * By default, when ``device`` indicates preference for GPU computations,
Theano will fall back to the CPU if there is a problem with the GPU. Theano will fall back to the CPU if there is a problem with the GPU.
...@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats: ...@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats:
toggle your GPU on, which can be done with toggle your GPU on, which can be done with
`gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__. `gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__.
Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
Once your setup is complete, head to :ref:`using_gpu` to find how to verify Once your setup is complete, head to :ref:`using_gpu` to find how to verify
everything is working properly. everything is working properly.
......
...@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04: ...@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04:
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip install Theano sudo pip install Theano
On 14.04, this will install Python 2 by default. If you want to use Python 3: On 14.04, this will install Python 2 by default. If you want to use Python 3:
.. code-block:: bash .. code-block:: bash
...@@ -104,30 +104,30 @@ For Ubuntu 11.04: ...@@ -104,30 +104,30 @@ For Ubuntu 11.04:
The development version of Theano supports Python 3.3 and The development version of Theano supports Python 3.3 and
probably supports Python 3.2, but we do not test on it. probably supports Python 3.2, but we do not test on it.
Bleeding Edge Installs Bleeding Edge Installs
---------------------- ----------------------
If you would like, instead, to install the bleeding edge Theano (from github) If you would like, instead, to install the bleeding edge Theano (from github)
such that you can edit and contribute to Theano, replace the `pip install Theano` such that you can edit and contribute to Theano, replace the `pip install Theano`
command with: command with:
.. code-block:: bash .. code-block:: bash
git clone git://github.com/Theano/Theano.git git clone git://github.com/Theano/Theano.git
cd Theano cd Theano
python setup.py develop --user python setup.py develop --user
cd .. cd ..
VirtualEnv VirtualEnv
---------- ----------
If you would like to install Theano in a VirtualEnv, you will want to pass the If you would like to install Theano in a VirtualEnv, you will want to pass the
`--system-site-packages` flag when creating the VirtualEnv so that it will pick up `--system-site-packages` flag when creating the VirtualEnv so that it will pick up
the system-provided `Numpy` and `SciPy`. the system-provided `Numpy` and `SciPy`.
.. code-block:: bash .. code-block:: bash
virtualenv --system-site-packages -p python2.7 theano-env virtualenv --system-site-packages -p python2.7 theano-env
source theano-env/bin/activate source theano-env/bin/activate
pip install Theano pip install Theano
...@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs ...@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs
Change to the Theano directory and run: Change to the Theano directory and run:
.. code-block:: bash .. code-block:: bash
git pull git pull
...@@ -303,7 +303,7 @@ Test GPU configuration ...@@ -303,7 +303,7 @@ Test GPU configuration
.. code-block:: bash .. code-block:: bash
THEANO_FLAGS=floatX=float32,device=gpu python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py THEANO_FLAGS=floatX=float32,device=cuda python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
.. note:: .. note::
......
...@@ -423,16 +423,16 @@ Create a test file containing: ...@@ -423,16 +423,16 @@ Create a test file containing:
print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %( print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %(
np_end-np_start, t_end-t_start)) np_end-np_start, t_end-t_start))
print("Result difference: %f" % (np.abs(AB-tAB).max(), )) print("Result difference: %f" % (np.abs(AB-tAB).max(), ))
.. testoutput:: .. testoutput::
:hide: :hide:
:options: +ELLIPSIS :options: +ELLIPSIS
NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!) NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!)
Result difference: ... Result difference: ...
.. code-block:: none .. code-block:: none
NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!) NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!)
Result difference: 0.000000 Result difference: 0.000000
...@@ -445,6 +445,8 @@ routine for matrix multiplication) ...@@ -445,6 +445,8 @@ routine for matrix multiplication)
Configure Theano for GPU use Configure Theano for GPU use
############################ ############################
Install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_ if you have not already done so.
Theano can be configured with a ``.theanorc`` text file (or Theano can be configured with a ``.theanorc`` text file (or
``.theanorc.txt``, whichever is easier for you to create under ``.theanorc.txt``, whichever is easier for you to create under
Windows). It should be placed in the directory pointed to by the Windows). It should be placed in the directory pointed to by the
...@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file: ...@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file:
.. code-block:: cfg .. code-block:: cfg
[global] [global]
device = gpu device = cuda
floatX = float32 floatX = float32
[nvcc] [nvcc]
...@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above. ...@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above.
Compiling a faster BLAS Compiling a faster BLAS
~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
If you installed Python through WinPython or EPD, Theano will automatically If you installed Python through WinPython or EPD, Theano will automatically
link with the MKL library, so you should not need to compile your own BLAS. link with the MKL library, so you should not need to compile your own BLAS.
.. note:: .. note::
......
...@@ -51,11 +51,11 @@ Environment Variables ...@@ -51,11 +51,11 @@ Environment Variables
.. code-block:: bash .. code-block:: bash
THEANO_FLAGS='floatX=float32,device=gpu0,lib.cnmem=1' python <myscript>.py THEANO_FLAGS='floatX=float32,device=cuda0,lib.cnmem=1' python <myscript>.py
If a value is defined several times in ``THEANO_FLAGS``, If a value is defined several times in ``THEANO_FLAGS``,
the right-most definition is used. So, for instance, if the right-most definition is used. So, for instance, if
``THEANO_FLAGS='device=cpu,device=gpu0'``, then gpu0 will be used. ``THEANO_FLAGS='device=cpu,device=cuda0'``, then cuda0 will be used.
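The right-most-wins rule described above can be sketched in a few lines of Python. This only illustrates the rule and is not Theano's actual flag parser; ``parse_flags`` is a hypothetical helper introduced here.

```python
def parse_flags(flags):
    """Parse a THEANO_FLAGS-style string into a dict (illustrative only)."""
    parsed = {}
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        key, _, value = item.partition("=")
        # A later assignment overwrites an earlier one, so the
        # right-most definition of a key is the one that survives.
        parsed[key.strip()] = value.strip()
    return parsed


flags = parse_flags("device=cpu,floatX=float32,device=cuda0")
print(flags["device"])  # cuda0
```

Storing the flags in a plain dict gives the override behaviour with no extra logic, since each assignment replaces the previous value for that key.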
.. envvar:: THEANORC .. envvar:: THEANORC
...@@ -70,7 +70,7 @@ Environment Variables ...@@ -70,7 +70,7 @@ Environment Variables
[global] [global]
floatX = float32 floatX = float32
device = gpu0 device = cuda0
[lib] [lib]
cnmem = 1 cnmem = 1
...@@ -102,22 +102,21 @@ import theano and print the config variable, as in: ...@@ -102,22 +102,21 @@ import theano and print the config variable, as in:
.. attribute:: device .. attribute:: device
String value: either ``'cpu'``, ``'gpu'``, ``'gpu0'``, ``'gpu1'``, String value: either ``'cpu'``, ``'cuda'``, ``'cuda0'``, ``'cuda1'``,
``'gpu2'``, or ``'gpu3'`` ``'opencl0:0'``, ``'opencl0:1'``, ``'gpu'``, ``'gpu0'`` ...
Default device for computations. If ``gpu*``, change the default to try Default device for computations. If ``'cuda*'``, change the default to try
to move computation to it and to put shared variable of float32 on to move computation to the GPU using CUDA libraries. If ``'opencl*'``,
it. the openCL libraries will be used. To let the driver select the device,
Choose the default compute device for theano graphs. Setting this to a use ``'cuda'`` or ``'opencl'``. If ``'gpu*'``, the old gpu backend will
``gpu*`` string will make theano to try by default to move computation to it. be used, although users are encouraged to migrate to the new GpuArray
Also it will make theano put by default shared variable of float32 on it. backend. If we are not able to use the GPU,
``'gpu'`` lets the driver select the GPU to use, while ``'gpu?'`` makes Theano try either we fall back on the CPU, or an error is raised, depending
to use a specific device. If we are not able to use the GPU, either we fall back on the :attr:`force_device` flag.
on the CPU, or an error is raised, depending on the :attr:`force_device` flag.
This flag's value cannot be modified during the program execution. This flag's value cannot be modified during the program execution.
Do not use upper case letters, only lower case even if NVIDIA use Do not use upper case letters, only lower case even if NVIDIA uses
capital letters. capital letters.
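As a rough illustration of how the new ``device`` values map to backends, the sketch below classifies a device string by its prefix. It assumes nothing about Theano's internals: ``backend_for`` and the labels it returns are hypothetical and exist only for this example.

```python
def backend_for(device):
    """Map a ``device`` string to the backend described above (sketch)."""
    if device != device.lower():
        # The documentation asks for lower case names only.
        raise ValueError("device names must be lower case: %r" % device)
    if device == "cpu":
        return "cpu"
    if device.startswith("cuda"):
        return "gpuarray (CUDA)"
    if device.startswith("opencl"):
        return "gpuarray (OpenCL)"
    if device.startswith("gpu"):
        return "old backend"
    raise ValueError("unrecognized device: %r" % device)


for name in ("cpu", "cuda", "cuda1", "opencl0:1", "gpu0"):
    print(name, "->", backend_for(name))
```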
.. attribute:: force_device .. attribute:: force_device
...@@ -138,11 +137,12 @@ import theano and print the config variable, as in: ...@@ -138,11 +137,12 @@ import theano and print the config variable, as in:
.. attribute:: init_gpu_device .. attribute:: init_gpu_device
String value: either ``''``, ``'gpu'``, ``'gpu0'``, ``'gpu1'``, ``'gpu2'``, String value: either ``''``, ``'cuda'``, ``'cuda0'``, ``'cuda1'``,
or ``'gpu3'`` ``'opencl0:0'``, ``'opencl0:1'``, ``'gpu'``, ``'gpu0'`` ...
Initialize the gpu device to use. Initialize the gpu device to use.
When its value is gpu*, the theano flag :attr:`device` must be ``"cpu"``. When its value is ``'cuda*'``, ``'opencl*'`` or ``'gpu*'``, the theano
flag :attr:`device` must be ``'cpu'``.
Unlike :attr:`device`, setting this flag to a specific GPU will not Unlike :attr:`device`, setting this flag to a specific GPU will not
try to use this device by default, in particular it will **not** move try to use this device by default, in particular it will **not** move
computations, nor shared variables, to the specified GPU. computations, nor shared variables, to the specified GPU.
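The constraint between the two flags can be written down as a small check. This is a minimal sketch of the documented rule, not code from Theano; ``check_device_flags`` is a hypothetical name.

```python
def check_device_flags(device, init_gpu_device=""):
    """Raise if ``init_gpu_device`` is set while ``device`` is not 'cpu'."""
    if init_gpu_device and device != "cpu":
        raise ValueError(
            "init_gpu_device=%r requires device='cpu', got device=%r"
            % (init_gpu_device, device))
    return True


check_device_flags("cpu", "cuda1")  # accepted: GPU initialized, not default
check_device_flags("cuda0")         # accepted: init_gpu_device left empty
```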
......
...@@ -32,6 +32,7 @@ Optimization FAST_RUN FAST_COMPILE ...@@ -32,6 +32,7 @@ Optimization FAST_RUN FAST_COMPILE
========================================================= ========= ============ ============= ========================================================= ========= ============ =============
:term:`merge` x x :term:`merge` x x
:term:`constant folding<constant folding>` x x :term:`constant folding<constant folding>` x x
:term:`GPU transfer` x x
:term:`shape promotion<shape promotion>` x :term:`shape promotion<shape promotion>` x
:term:`fill cut<fill cut>` x :term:`fill cut<fill cut>` x
:term:`inc_subtensor srlz.<inc_subtensor serialization>` x :term:`inc_subtensor srlz.<inc_subtensor serialization>` x
...@@ -52,7 +53,6 @@ Optimization FAST_RUN FAST_COMPILE ...@@ -52,7 +53,6 @@ Optimization FAST_RUN FAST_COMPILE
:term:`inplace_elemwise` x :term:`inplace_elemwise` x
:term:`inplace_random` x :term:`inplace_random` x
:term:`elemwise fusion` x :term:`elemwise fusion` x
:term:`GPU transfer` x
:term:`local_log_softmax` x x :term:`local_log_softmax` x x
:term:`local_remove_all_assert` :term:`local_remove_all_assert`
========================================================= ========= ============ ============= ========================================================= ========= ============ =============
......
...@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to ...@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
hints that give more flexibility to the compilation and optimization of the hints that give more flexibility to the compilation and optimization of the
graph. graph.
For GPU graphs, this borrowing can have a major speed impact. See the following code:
.. code-block:: python
from theano import function, config, shared, sandbox, tensor, Out
import numpy
import time
vlen = 10 * 30 * 768 # 10 x # cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f1 = function([], sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)))
f2 = function([],
Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
borrow=True))
t0 = time.time()
for i in range(iters):
r = f1()
t1 = time.time()
no_borrow = t1 - t0
t0 = time.time()
for i in range(iters):
r = f2()
t1 = time.time()
print(
"Looping %s times took %s seconds without borrow "
"and %s seconds with borrow" % (iters, no_borrow, (t1 - t0))
)
if numpy.any([isinstance(x.op, tensor.Elemwise) and
('Gpu' not in type(x.op).__name__)
for x in f1.maker.fgraph.toposort()]):
print('Used the cpu')
else:
print('Used the gpu')
Which produces this output:
.. code-block:: none
$ THEANO_FLAGS=device=gpu0,floatX=float32 python test1.py
Using gpu device 0: GeForce GTX 275
Looping 1000 times took 0.368273973465 seconds without borrow and 0.0240728855133 seconds with borrow.
Used the gpu
*Take home message:* *Take home message:*
When an input *x* to a function is not needed after the function When an input *x* to a function is not needed after the function
...@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory ...@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory
footprint), and you only need to read from it once, right away when footprint), and you only need to read from it once, right away when
it's returned, then consider marking it with an ``Out(y, it's returned, then consider marking it with an ``Out(y,
borrow=True)``. borrow=True)``.
...@@ -81,7 +81,7 @@ single name and a single device. ...@@ -81,7 +81,7 @@ single name and a single device.
It is often the case that multi-gpu operation requires or assumes It is often the case that multi-gpu operation requires or assumes
that all the GPUs involved are equivalent. This is not the case that all the GPUs involved are equivalent. This is not the case
for this implementation. Since the user has the task of for this implementation. Since the user has the task of
distrubuting the jobs across the different device a model can be distributing the jobs across the different device a model can be
built on the assumption that one of the GPU is slower or has built on the assumption that one of the GPU is slower or has
smaller memory. smaller memory.
...@@ -140,5 +140,5 @@ is a example. ...@@ -140,5 +140,5 @@ is a example.
cv = gv.transfer('cpu') cv = gv.transfer('cpu')
Of course you can mix transfers and operations in any order you Of course you can mix transfers and operations in any order you
choose. However you should try to minimize transfer operations choose. However you should try to minimize transfer operations
because they will introduce overhead any may reduce performance. because they will introduce overhead that may reduce performance.
...@@ -104,10 +104,9 @@ class DeviceParam(ConfigParam): ...@@ -104,10 +104,9 @@ class DeviceParam(ConfigParam):
AddConfigVar( AddConfigVar(
'device', 'device',
("Default device for computations. If gpu*, change the default to try " ("Default device for computations. If cuda* or opencl*, change the "
"to move computation to it and to put shared variable of float32 " "default to try to move computation to the GPU. Do not use upper case "
"on it. Do not use upper case letters, only lower case even if " "letters, only lower case even if NVIDIA uses capital letters."),
"NVIDIA use capital letters."),
DeviceParam('cpu', allow_override=False), DeviceParam('cpu', allow_override=False),
in_c_key=False) in_c_key=False)
......
...@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000, ...@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000,
t0 = 0 t0 = 0
t1 = -1 t1 = -1
f() # Ignore first function call to get representative time.
if execute: if execute:
sync = (hasattr(theano, "sandbox") and sync = (hasattr(theano, "sandbox") and
hasattr(theano.sandbox, "cuda") and hasattr(theano.sandbox, "cuda") and
theano.sandbox.cuda.cuda_available) theano.sandbox.cuda.cuda_available)
sync2 = (hasattr(theano, "gpuarray") and
theano.gpuarray.pygpu_activated)
t0 = time.time() t0 = time.time()
for i in range(iters): for i in range(iters):
f() f()
if sync: if sync:
theano.sandbox.cuda.synchronize() theano.sandbox.cuda.synchronize()
if sync2:
c.get_value(borrow=True, return_internal_type=True).sync()
t1 = time.time() t1 = time.time()
return t1 - t0, impl return t1 - t0, impl
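The timing pattern in this hunk (one ignored warm-up call, then a synchronization before the clock is stopped) can be sketched without Theano. GPU calls may return before the device has finished, so the sync step keeps queued work from being left out of the measurement. ``time_calls`` and its ``sync`` callback are hypothetical names used only for this illustration.

```python
import time


def time_calls(f, iters, sync=None):
    """Time ``iters`` calls of ``f`` after one untimed warm-up call."""
    f()  # warm-up: the first call may include one-off setup cost
    t0 = time.time()
    for _ in range(iters):
        f()
    if sync is not None:
        sync()  # wait for queued work before stopping the clock
    return time.time() - t0


calls = []
elapsed = time_calls(lambda: calls.append(1), iters=10)
print(len(calls))  # 11: one warm-up call plus ten timed calls
```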
......
...@@ -4,6 +4,7 @@ which refered to theano.sandbox.gpuarray.""" ...@@ -4,6 +4,7 @@ which refered to theano.sandbox.gpuarray."""
import warnings import warnings
from theano.gpuarray import * from theano.gpuarray import *
message = "theano.sandbox.gpuarray has been moved to theano.gpuarray." + \ message = ("theano.sandbox.gpuarray has been moved to theano.gpuarray. "
" Please update your code and pickles." "Please update your code and pickles. If the warning persists, "
"clear theano's cache ('$theano/bin/theano-cache clear').")
warnings.warn(message) warnings.warn(message)
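The relocation warning issued by this shim can be reproduced and captured with the standard ``warnings`` module alone. The sketch below assumes nothing about Theano being installed; ``warn_moved`` is a hypothetical helper that mirrors the message pattern.

```python
import warnings


def warn_moved(old_name, new_name):
    """Emit a relocation warning like the one in the shim above (sketch)."""
    message = ("%s has been moved to %s. "
               "Please update your code and pickles." % (old_name, new_name))
    warnings.warn(message)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # do not let the filter hide repeats
    warn_moved("theano.sandbox.gpuarray", "theano.gpuarray")

print(caught[0].message)
```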