about how to carry out those computations. One of the ways we take
advantage of this flexibility is in carrying out calculations on a
graphics card.

There are currently two ways to use a GPU: one that should support any OpenCL
device as well as NVIDIA cards (:ref:`gpuarray`), and the old backend that
only supports NVIDIA cards (:ref:`cuda`).

.. warning::

    If you want to use the new GpuArray backend, make sure to have the
    development version of Theano installed. The 0.8.X releases have not
    been optimized to work correctly with the new backend.

.. _gpuarray:

GpuArray Backend
----------------

If you have not done so already, you will need to install libgpuarray
as well as at least one computing toolkit. Instructions for doing so
are provided at `libgpuarray
<http://deeplearning.net/software/libgpuarray/installation.html>`_.
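Once everything is installed, one way to make Theano select the GPU by
default is through the ``[global]`` section of the ``.theanorc``
configuration file. A minimal sketch (``floatX = float32`` is optional
but commonly set alongside the device):

```
[global]
device = cuda
floatX = float32
```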

While all types of devices are supported if using OpenCL, for the
remainder of this section whatever compute device you are using will be
referred to as the GPU.

.. warning::

    The backend was designed to support OpenCL; however, current support is
    incomplete. Many useful ops do not yet support it because they were
    ported from the old backend with minimal changes.

Testing Theano with GPU
~~~~~~~~~~~~~~~~~~~~~~~

To see if your GPU is being used, cut and paste the following program into a
file and run it.

Use the Theano flag ``device=cuda`` to require the use of the GPU. Use the flag
``device=cuda{0,1,...}`` to specify which GPU to use.
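For example, the flag can be supplied per run through the ``THEANO_FLAGS``
environment variable. A sketch, assuming the program was saved as
``gpu_test.py`` (a hypothetical filename):

```
# Require the default GPU for this run only
THEANO_FLAGS=device=cuda python gpu_test.py

# Or pin the run to a specific device, e.g. the second GPU
THEANO_FLAGS=device=cuda1 python gpu_test.py
```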

.. testcode::

    from theano import function, config, shared, tensor
    import numpy
    import time
...
    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], tensor.exp(x))
    print(f.maker.fgraph.toposort())
    t0 = time.time()
    for i in range(iters):
...
    t1 = time.time()
    print("Looping %d times took %f seconds" % (iters, t1 - t0))
    print("Result is %s" % (r,))
    if numpy.any([isinstance(x.op, tensor.Elemwise) and
                  ('Gpu' not in type(x.op).__name__)
                  for x in f.maker.fgraph.toposort()]):
        print('Used the cpu')
    else:
        print('Used the gpu')

The program just computes ``exp()`` of a bunch of random numbers. Note
that we use the :func:`theano.shared` function to make sure that the
input *x* is stored on the GPU.
.. the following figures have been measured twice on BART3 on Aug 2nd 2012 with no other job running simultaneously
If I run this program (in check1.py) with ``device=cpu``, my computer takes a little over 3 seconds,
whereas on the GPU it takes just over 0.64 seconds. The GPU will not always produce the exact
same floating-point numbers as the CPU. As a benchmark, a loop that calls ``numpy.exp(x.get_value())`` takes about 46 seconds.
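The NumPy baseline above can be sketched as a plain CPU loop. This
scaled-down version substitutes small assumed values (``10000`` elements,
``10`` iterations) for the elided ``vlen`` and ``iters``, and draws the
array directly instead of reading it back from a shared variable:

```python
import numpy
import time

# Hypothetical stand-in for the shared variable's contents; the real
# script uses rng.rand(vlen) with vlen defined in the elided setup.
rng = numpy.random.RandomState(22)
x = rng.rand(10000).astype('float32')
iters = 10

t0 = time.time()
for i in range(iters):
    # Pure NumPy element-wise exp on the CPU, repeated iters times
    r = numpy.exp(x)
t1 = time.time()
print("NumPy CPU loop: %d iterations took %f seconds" % (iters, t1 - t0))
```

This only measures the raw NumPy computation; the Theano timings
additionally reflect the compiled graph and the chosen ``device`` flag.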

.. testoutput::
   :hide:
...