One of Theano's design goals is to specify computations at an
abstract level, so that the internal function compiler has a lot of flexibility
about how to carry out those computations. One of the ways we take advantage of
this flexibility is in carrying out calculations on an Nvidia graphics card when
there is a CUDA-enabled device in your computer.

Setting up CUDA
----------------

The first thing you'll need for Theano to use your GPU is Nvidia's
GPU-programming toolchain. You should install at least the CUDA driver and
the CUDA Toolkit, as `described here
<http://www.nvidia.com/object/cuda_get.html>`_. After installing these
tools, there should be a folder on your computer with a ``bin`` subfolder
containing the ``nvcc`` executable, a ``lib`` subfolder containing
libcudart among other things, and an ``include`` directory.
This directory containing the ``bin``, ``lib``, and ``include`` folders is
called the *cuda root* directory, and Theano needs to know where it is in
order to use GPU functionality.

On Linux or OS X, add the cuda root's ``lib`` (and/or ``lib64``, if you
have a 64-bit computer) directory to your ``LD_LIBRARY_PATH`` environment
variable so that dynamic loading of modules linked with the cuda libraries
can work. (TODO: equivalent instructions for Windows.)

Making Theano use CUDA
----------------------
There are three ways to tell Theano where the cuda root is. Any one of them
is enough (and it would be confusing to use more than one):

* define a :envvar:`CUDA_ROOT` environment variable equal to the cuda root
  directory, as in ``CUDA_ROOT=/path/to/cuda/root``, or
* add a ``cuda.root`` flag to :envvar:`THEANO_FLAGS`, as in
  ``THEANO_FLAGS='cuda.root=/path/to/cuda/root'``, or
* add a ``[cuda]`` section to your ``.theanorc`` file containing the option
  ``root = /path/to/cuda/root``.
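
For example, using the third option, a minimal ``.theanorc`` might look
like the following sketch (the path is a placeholder for your actual cuda
root directory):

.. code-block:: cfg

    [cuda]
    root = /path/to/cuda/root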

Once everything is set up correctly, the only thing left to tell Theano to
use the GPU is to change the ``device`` option to name the GPU device in
your computer. For example:
``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu0'``.
You can also set the device option in the ``.theanorc`` file's ``[global]``
section. If your computer has multiple GPU devices, you can address them as
``gpu0``, ``gpu1``, ``gpu2``, or ``gpu3``. (If you have more than four
devices you are very lucky, but you'll have to modify Theano's
``configdefaults.py`` file to define more gpu devices to choose from.)
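
Combining the two ``.theanorc`` settings just described, a sketch of a
configuration that selects the first GPU and points Theano at the cuda root
might look like this (the path and the device number are placeholders for
your own setup):

.. code-block:: cfg

    [global]
    device = gpu0

    [cuda]
    root = /path/to/cuda/root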

Putting it all Together
-------------------------

To see if your GPU is being used, cut and paste the following program into
a file and run it.

.. code-block:: python

    from theano import function, config, shared, sandbox
    import theano.tensor as T
    import numpy
    import time

    vlen = 100000
    iters = 1000

    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], T.exp(x))
    t0 = time.time()
    for i in xrange(iters):
        r = f()
    print 'Looping %d times took' % iters, time.time() - t0, 'seconds'
    print 'Result is', r

The program just computes ``exp()`` of a bunch of random numbers. Note that
we use the `shared` function to make sure that the input ``x`` is stored on
the graphics device.
If I run this program (in ``thing.py``) with ``device=cpu``, my computer
takes a little over 3 seconds, whereas on the GPU it takes just over 0.2
seconds. Note that the results are close but not identical: the GPU will
not always produce exactly the same floating-point numbers as the CPU.
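
These discrepancies are ordinary floating-point rounding effects. Though it
does not involve the GPU at all, the following numpy sketch illustrates the
size of the effect by computing ``exp()`` in single and double precision;
comparisons between CPU and GPU results should likewise use a tolerance
rather than exact equality:

.. code-block:: python

    import numpy

    rng = numpy.random.RandomState(22)
    v = rng.rand(1000)                      # float64 samples in [0, 1)
    exp64 = numpy.exp(v)                    # double-precision exp()
    exp32 = numpy.exp(v.astype('float32'))  # single-precision exp()

    # The two results agree to within single-precision tolerance,
    # but they are not bit-for-bit identical.
    assert numpy.allclose(exp32, exp64, atol=1e-6)
    assert not (exp32 == exp64).all()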