Commit bd544674 authored by slefrancois

Update doc with instructions for using new gpu backend

Parent 319382b5
...@@ -37,3 +37,4 @@ Theano.suo ...@@ -37,3 +37,4 @@ Theano.suo
.ipynb_checkpoints .ipynb_checkpoints
.pydevproject .pydevproject
.ropeproject .ropeproject
core
\ No newline at end of file
...@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this: ...@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
Testing GPU Ops Testing GPU Ops
^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^
Ops to be executed on the GPU should inherit from the When using the old GPU backend, Ops to be executed on the GPU should inherit
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows from ``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
Theano to distinguish them. Currently, we use this to test if the Theano to distinguish them. Currently, we use this to test if the
NVIDIA driver works correctly with our sum reduction code on the GPU. NVIDIA driver works correctly with our sum reduction code on the GPU.
......
...@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add ...@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add
If you want GPU-related tests to run on a specific GPU device, and not If you want GPU-related tests to run on a specific GPU device, and not
the default one, you should use :attr:`~config.init_gpu_device`. the default one, you should use :attr:`~config.init_gpu_device`.
For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=gpu1``. For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=cuda1``.
See :ref:`libdoc_config` for more information on how to change these See :ref:`libdoc_config` for more information on how to change these
configuration options. configuration options.
...@@ -508,25 +508,25 @@ Any one of them is enough. ...@@ -508,25 +508,25 @@ Any one of them is enough.
:ref:`Ubuntu instructions <install_ubuntu_gpu>`. :ref:`Ubuntu instructions <install_ubuntu_gpu>`.
Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your
computer, and set the default floating point computations to float32. computer, and set the default floating point computations to float32.
For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu,floatX=float32'``. For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=cuda,floatX=float32'``.
You can also set these options in the .theanorc file's ``[global]`` section: You can also set these options in the .theanorc file's ``[global]`` section:
.. code-block:: cfg .. code-block:: cfg
[global] [global]
device = gpu device = cuda
floatX = float32 floatX = float32
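The same ``[global]`` section can also be generated programmatically. The following is a minimal sketch, assuming Python 3 and its standard ``configparser`` module; it only builds the text in memory and does not write to your ``.theanorc``. Nothing here is Theano API.

```python
import configparser
import io

parser = configparser.ConfigParser()
# configparser lowercases option names by default; keep "floatX" as written.
parser.optionxform = str
parser["global"] = {"device": "cuda", "floatX": "float32"}

# Render the configuration to a string instead of touching any real file.
buf = io.StringIO()
parser.write(buf)
text = buf.getvalue()
print(text)
```

To persist it, the rendered text would be written to the ``.theanorc`` file described above.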
Note that: Note that:
* If your computer has multiple GPUs and you use 'device=gpu', the driver * If your computer has multiple GPUs and you use 'device=cuda', the driver
selects the one to use (usually gpu0). selects the one to use (usually gpu0).
* You can use the program nvidia-smi to change this policy. * You can use the program nvidia-smi to change this policy.
* You can choose one specific GPU by specifying 'device=gpuX', with X the * You can choose one specific GPU by specifying 'device=cudaX', with X the
corresponding GPU index (0, 1, 2, ...) corresponding GPU index (0, 1, 2, ...)
* By default, when ``device`` indicates preference for GPU computations, * By default, when ``device`` indicates preference for GPU computations,
Theano will fall back to the CPU if there is a problem with the GPU. Theano will fall back to the CPU if there is a problem with the GPU.
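The device naming in the notes above can be sketched in Python. This is a minimal illustration, assuming the flags are passed through the ``THEANO_FLAGS`` environment variable and that the variable is set before Theano is imported; the helper name ``build_theano_flags`` is hypothetical and not part of Theano.

```python
import os


def build_theano_flags(device_index=None, floatx="float32"):
    """Return a THEANO_FLAGS string for the new GPU backend.

    Hypothetical helper, not part of Theano. With device_index=None the
    result uses 'device=cuda' and the driver selects the GPU; with an
    integer X it uses 'device=cudaX'.
    """
    device = "cuda" if device_index is None else "cuda%d" % device_index
    return "device=%s,floatX=%s" % (device, floatx)


# Set the variable before `import theano` so the flags are picked up.
os.environ["THEANO_FLAGS"] = build_theano_flags(device_index=1)
print(os.environ["THEANO_FLAGS"])
```

Calling ``build_theano_flags()`` with no index yields the generic ``device=cuda`` form, which leaves the choice of GPU to the driver.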
...@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats: ...@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats:
toggle your GPU on, which can be done with toggle your GPU on, which can be done with
`gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__. `gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__.
Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
Once your setup is complete, head to :ref:`using_gpu` to find how to verify Once your setup is complete, head to :ref:`using_gpu` to find how to verify
everything is working properly. everything is working properly.
......
...@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04: ...@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04:
sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
sudo pip install Theano sudo pip install Theano
On 14.04, this will install Python 2 by default. If you want to use Python 3: On 14.04, this will install Python 2 by default. If you want to use Python 3:
.. code-block:: bash .. code-block:: bash
...@@ -104,30 +104,30 @@ For Ubuntu 11.04: ...@@ -104,30 +104,30 @@ For Ubuntu 11.04:
The development version of Theano supports Python 3.3 and The development version of Theano supports Python 3.3 and
probably supports Python 3.2, but we do not test on it. probably supports Python 3.2, but we do not test on it.
Bleeding Edge Installs Bleeding Edge Installs
---------------------- ----------------------
If you would like, instead, to install the bleeding edge Theano (from github) If you would like, instead, to install the bleeding edge Theano (from github)
such that you can edit and contribute to Theano, replace the `pip install Theano` such that you can edit and contribute to Theano, replace the `pip install Theano`
command with: command with:
.. code-block:: bash .. code-block:: bash
git clone git://github.com/Theano/Theano.git git clone git://github.com/Theano/Theano.git
cd Theano cd Theano
python setup.py develop --user python setup.py develop --user
cd .. cd ..
VirtualEnv VirtualEnv
---------- ----------
If you would like to install Theano in a VirtualEnv, you will want to pass the If you would like to install Theano in a VirtualEnv, you will want to pass the
`--system-site-packages` flag when creating the VirtualEnv so that it will pick up `--system-site-packages` flag when creating the VirtualEnv so that it will pick up
the system-provided `Numpy` and `SciPy`. the system-provided `Numpy` and `SciPy`.
.. code-block:: bash .. code-block:: bash
virtualenv --system-site-packages -p python2.7 theano-env virtualenv --system-site-packages -p python2.7 theano-env
source theano-env/bin/activate source theano-env/bin/activate
pip install Theano pip install Theano
...@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs ...@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs
Change to the Theano directory and run: Change to the Theano directory and run:
.. code-block:: bash .. code-block:: bash
git pull git pull
...@@ -303,7 +303,7 @@ Test GPU configuration ...@@ -303,7 +303,7 @@ Test GPU configuration
.. code-block:: bash .. code-block:: bash
THEANO_FLAGS=floatX=float32,device=gpu python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py THEANO_FLAGS=floatX=float32,device=cuda python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
.. note:: .. note::
......
...@@ -423,16 +423,16 @@ Create a test file containing: ...@@ -423,16 +423,16 @@ Create a test file containing:
print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %( print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %(
np_end-np_start, t_end-t_start)) np_end-np_start, t_end-t_start))
print("Result difference: %f" % (np.abs(AB-tAB).max(), )) print("Result difference: %f" % (np.abs(AB-tAB).max(), ))
.. testoutput:: .. testoutput::
:hide: :hide:
:options: +ELLIPSIS :options: +ELLIPSIS
NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!) NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!)
Result difference: ... Result difference: ...
.. code-block:: none .. code-block:: none
NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!) NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!)
Result difference: 0.000000 Result difference: 0.000000
...@@ -445,6 +445,8 @@ routine for matrix multiplication) ...@@ -445,6 +445,8 @@ routine for matrix multiplication)
Configure Theano for GPU use Configure Theano for GPU use
############################ ############################
Install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_ if you have not already done so.
Theano can be configured with a ``.theanorc`` text file (or Theano can be configured with a ``.theanorc`` text file (or
``.theanorc.txt``, whichever is easier for you to create under ``.theanorc.txt``, whichever is easier for you to create under
Windows). It should be placed in the directory pointed to by the Windows). It should be placed in the directory pointed to by the
...@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file: ...@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file:
.. code-block:: cfg .. code-block:: cfg
[global] [global]
device = gpu device = cuda
floatX = float32 floatX = float32
[nvcc] [nvcc]
...@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above. ...@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above.
Compiling a faster BLAS Compiling a faster BLAS
~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
If you installed Python through WinPython or EPD, Theano will automatically If you installed Python through WinPython or EPD, Theano will automatically
link with the MKL library, so you should not need to compile your own BLAS. link with the MKL library, so you should not need to compile your own BLAS.
.. note:: .. note::
......
...@@ -32,6 +32,7 @@ Optimization FAST_RUN FAST_COMPILE ...@@ -32,6 +32,7 @@ Optimization FAST_RUN FAST_COMPILE
========================================================= ========= ============ ============= ========================================================= ========= ============ =============
:term:`merge` x x :term:`merge` x x
:term:`constant folding<constant folding>` x x :term:`constant folding<constant folding>` x x
:term:`GPU transfer` x x
:term:`shape promotion<shape promotion>` x :term:`shape promotion<shape promotion>` x
:term:`fill cut<fill cut>` x :term:`fill cut<fill cut>` x
:term:`inc_subtensor srlz.<inc_subtensor serialization>` x :term:`inc_subtensor srlz.<inc_subtensor serialization>` x
...@@ -52,7 +53,6 @@ Optimization FAST_RUN FAST_COMPILE ...@@ -52,7 +53,6 @@ Optimization FAST_RUN FAST_COMPILE
:term:`inplace_elemwise` x :term:`inplace_elemwise` x
:term:`inplace_random` x :term:`inplace_random` x
:term:`elemwise fusion` x :term:`elemwise fusion` x
:term:`GPU transfer` x
:term:`local_log_softmax` x x :term:`local_log_softmax` x x
:term:`local_remove_all_assert` :term:`local_remove_all_assert`
========================================================= ========= ============ ============= ========================================================= ========= ============ =============
......
...@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to ...@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
hints that give more flexibility to the compilation and optimization of the hints that give more flexibility to the compilation and optimization of the
graph. graph.
For GPU graphs, this borrowing can have a major speed impact. See the following code:
.. code-block:: python
from theano import function, config, shared, sandbox, tensor, Out
import numpy
import time
vlen = 10 * 30 * 768 # 10 x # cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f1 = function([], sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)))
f2 = function([],
Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
borrow=True))
t0 = time.time()
for i in range(iters):
r = f1()
t1 = time.time()
no_borrow = t1 - t0
t0 = time.time()
for i in range(iters):
r = f2()
t1 = time.time()
print(
"Looping %s times took %s seconds without borrow "
"and %s seconds with borrow" % (iters, no_borrow, (t1 - t0))
)
if numpy.any([isinstance(x.op, tensor.Elemwise) and
('Gpu' not in type(x.op).__name__)
for x in f1.maker.fgraph.toposort()]):
print('Used the cpu')
else:
print('Used the gpu')
Which produces this output:
.. code-block:: none
$ THEANO_FLAGS=device=gpu0,floatX=float32 python test1.py
Using gpu device 0: GeForce GTX 275
Looping 1000 times took 0.368273973465 seconds without borrow and 0.0240728855133 seconds with borrow.
Used the gpu
*Take home message:* *Take home message:*
When an input *x* to a function is not needed after the function When an input *x* to a function is not needed after the function
...@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory ...@@ -317,4 +271,3 @@ requirement. When a return value *y* is large (in terms of memory
footprint), and you only need to read from it once, right away when footprint), and you only need to read from it once, right away when
it's returned, then consider marking it with an ``Out(y, it's returned, then consider marking it with an ``Out(y,
borrow=True)``. borrow=True)``.
Diff collapsed.
...@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000, ...@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000,
t0 = 0 t0 = 0
t1 = -1 t1 = -1
f() # Ignore first function call to get representative time.
if execute: if execute:
sync = (hasattr(theano, "sandbox") and sync = (hasattr(theano, "sandbox") and
hasattr(theano.sandbox, "cuda") and hasattr(theano.sandbox, "cuda") and
theano.sandbox.cuda.cuda_available) theano.sandbox.cuda.cuda_available)
sync2 = (hasattr(theano, "gpuarray") and
theano.gpuarray.pygpu_activated)
t0 = time.time() t0 = time.time()
for i in range(iters): for i in range(iters):
f() f()
if sync: if sync:
theano.sandbox.cuda.synchronize() theano.sandbox.cuda.synchronize()
if sync2:
c.get_value(borrow=True, return_internal_type=True).sync()
t1 = time.time() t1 = time.time()
return t1 - t0, impl return t1 - t0, impl
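The timing pattern this hunk introduces (one discarded warm-up call, then a synchronization after each timed call) can be sketched without Theano. This is a minimal illustration under stated assumptions: ``time_calls`` and its ``sync`` parameter are hypothetical names, ``time.perf_counter`` is used in place of ``time.time``, and, unlike the hunk, the sketch also synchronizes after the warm-up call.

```python
import time


def time_calls(f, iters=10, sync=None):
    """Time `iters` calls of `f`, ignoring one warm-up call.

    `sync` is an optional zero-argument callable invoked after every call;
    for asynchronous GPU work it should block until the device is done.
    """
    f()  # Warm-up call, excluded from the measurement.
    if sync is not None:
        sync()  # Ensure the warm-up work has finished before timing starts.
    t0 = time.perf_counter()
    for _ in range(iters):
        f()
        if sync is not None:
            sync()
    t1 = time.perf_counter()
    return t1 - t0


# Usage with a trivial CPU-only function that records each call.
calls = []
elapsed = time_calls(lambda: calls.append(1), iters=5)
print("5 timed calls took %f s (%d calls in total)" % (elapsed, len(calls)))
```

Without the per-iteration synchronization, asynchronous GPU calls could return before the work completes, and the loop would measure launch overhead rather than compute time.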
......