Update doc with instructions for using new gpu backend

bd544674 · slefrancois · 319382b5
--- a/.gitignore
+++ b/.gitignore
@@ -37,3 +37,4 @@ Theano.suo
 .ipynb_checkpoints
 .pydevproject
 .ropeproject
+core
\ No newline at end of file
--- a/doc/extending/extending_theano.txt
+++ b/doc/extending/extending_theano.txt
@@ -681,8 +681,8 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
 Testing GPU Ops
 ^^^^^^^^^^^^^^^
-Ops to be executed on the GPU should inherit from the
+When using the old GPU backend, Ops to be executed on the GPU should inherit
-``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
+from ``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
 Theano to distinguish them. Currently, we use this to test if the
 NVIDIA driver works correctly with our sum reduction code on the GPU.

--- a/doc/install.txt
+++ b/doc/install.txt
@@ -375,7 +375,7 @@ If ``theano-nose`` is not found by your shell, you will need to add
    If you want GPU-related tests to run on a specific GPU device, and not
    the default one, you should use :attr:`~config.init_gpu_device`.
-    For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=gpu1``.
+    For instance: ``THEANO_FLAGS=device=cpu,init_gpu_device=cuda1``.
    See :ref:`libdoc_config` for more information on how to change these
    configuration options.
@@ -508,25 +508,25 @@ Any one of them is enough.
    :ref:`Ubuntu instructions <install_ubuntu_gpu>`.
+Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
 Once that is done, the only thing left is to change the ``device`` option to name the GPU device in your
 computer, and set the default floating point computations to float32.
-For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=gpu,floatX=float32'``.
+For example: ``THEANO_FLAGS='cuda.root=/path/to/cuda/root,device=cuda,floatX=float32'``.
 You can also set these options in the .theanorc file's ``[global]`` section:
     .. code-block:: cfg
        [global]
-        device = gpu
+        device = cuda
        floatX = float32
 Note that:
-    * If your computer has multiple GPUs and you use 'device=gpu', the driver
+    * If your computer has multiple GPUs and you use 'device=cuda', the driver
      selects the one to use (usually gpu0).
    * You can use the program nvidia-smi to change this policy.
-    * You can choose one specific GPU by specifying 'device=gpuX', with X the
+    * You can choose one specific GPU by specifying 'device=cudaX', with X the
      corresponding GPU index (0, 1, 2, ...)
    * By default, when ``device`` indicates preference for GPU computations,
      Theano will fall back to the CPU if there is a problem with the GPU.
@@ -794,6 +794,8 @@ setup CUDA, but be aware of the following caveats:
     toggle your GPU on, which can be done with
     `gfxCardStatus <http://codykrieger.com/gfxCardStatus>`__.
+Next, install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_.
 Once your setup is complete, head to :ref:`using_gpu` to find how to verify
 everything is working properly.
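
Note on the renamed device values in ``doc/install.txt``: the hunks above replace ``gpu``/``gpuN`` with ``cuda``/``cudaN`` in ``device`` and ``init_gpu_device``. A minimal sketch of that rename applied to an existing ``THEANO_FLAGS`` string might look like the following. ``migrate_flags`` is a hypothetical helper written for illustration; it is not part of Theano, and it assumes the flags are a plain comma-separated list of ``key=value`` pairs.

```python
import re

# Hypothetical helper, NOT part of Theano: matches old-backend device
# names such as "gpu", "gpu0", "gpu1" and captures the optional index.
_OLD_DEVICE = re.compile(r"^gpu(\d*)$")


def migrate_flags(flags):
    """Return a THEANO_FLAGS string with gpuN device values renamed to cudaN."""
    migrated = []
    for item in flags.split(","):
        key, sep, value = item.partition("=")
        # Only the two options touched by this change are rewritten;
        # everything else (floatX, cuda.root, ...) is passed through.
        if key.strip() in ("device", "init_gpu_device"):
            value = _OLD_DEVICE.sub(r"cuda\1", value.strip())
        migrated.append(key + sep + value)
    return ",".join(migrated)


print(migrate_flags("device=cpu,init_gpu_device=gpu1"))
print(migrate_flags("floatX=float32,device=gpu"))
```

Values that are already in the new form, or that name the CPU, are left unchanged, so the helper can be run repeatedly on the same string.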

--- a/doc/install_ubuntu.txt
+++ b/doc/install_ubuntu.txt
@@ -43,7 +43,7 @@ For Ubuntu 11.10 through 14.04:
    sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
    sudo pip install Theano
 On 14.04, this will install Python 2 by default. If you want to use Python 3:
 .. code-block:: bash
@@ -104,30 +104,30 @@ For Ubuntu 11.04:
   The development version of Theano supports Python 3.3 and
   probably supports Python 3.2, but we do not test on it.
 Bleeding Edge Installs
 ----------------------
-If you would like, instead, to install the bleeding edge Theano (from github) 
+If you would like, instead, to install the bleeding edge Theano (from github)
-such that you can edit and contribute to Theano, replace the `pip install Theano` 
+such that you can edit and contribute to Theano, replace the `pip install Theano`
 command with:
 .. code-block:: bash
    git clone git://github.com/Theano/Theano.git
-    cd Theano 
+    cd Theano
    python setup.py develop --user
    cd ..
 VirtualEnv
 ----------
-If you would like to install Theano in a VirtualEnv, you will want to pass the 
+If you would like to install Theano in a VirtualEnv, you will want to pass the
-`--system-site-packages` flag when creating the VirtualEnv so that it will pick up 
+`--system-site-packages` flag when creating the VirtualEnv so that it will pick up
 the system-provided `Numpy` and `SciPy`.
 .. code-block:: bash
    virtualenv --system-site-packages -p python2.7 theano-env
    source theano-env/bin/activate
    pip install Theano
@@ -208,7 +208,7 @@ Updating Bleeding Edge Installs
 Change to the Theano directory and run:
 .. code-block:: bash
    git pull
@@ -303,7 +303,7 @@ Test GPU configuration
 .. code-block:: bash
-    THEANO_FLAGS=floatX=float32,device=gpu python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
+    THEANO_FLAGS=floatX=float32,device=cuda python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
 .. note::

--- a/doc/install_windows.txt
+++ b/doc/install_windows.txt
@@ -423,16 +423,16 @@ Create a test file containing:
   print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" %(
                                              np_end-np_start, t_end-t_start))
   print("Result difference: %f" % (np.abs(AB-tAB).max(), ))
 .. testoutput::
   :hide:
   :options: +ELLIPSIS
   NP time: ...[s], theano time: ...[s] (times should be close when run on CPU!)
   Result difference: ...
 .. code-block:: none
   NP time: 1.480863[s], theano time: 1.475381[s] (times should be close when run on CPU!)
   Result difference: 0.000000
@@ -445,6 +445,8 @@ routine for matrix multiplication)
 Configure Theano for GPU use
 ############################
+Install `libgpuarray <http://deeplearning.net/software/libgpuarray/installation.html>`_ if you have not already done so.
 Theano can be configured with a ``.theanorc`` text file (or
 ``.theanorc.txt``, whichever is easier for you to create under
 Windows). It should be placed in the directory pointed to by the
@@ -457,7 +459,7 @@ To use the GPU please write the following configuration file:
 .. code-block:: cfg
   [global]
-   device = gpu
+   device = cuda
   floatX = float32
   [nvcc]
@@ -498,7 +500,7 @@ within an MSYS shell if you installed Nose manually as described above.
 Compiling a faster BLAS
 ~~~~~~~~~~~~~~~~~~~~~~~
-If you installed Python through WinPython or EPD, Theano will automatically 
+If you installed Python through WinPython or EPD, Theano will automatically
 link with the MKL library, so you should not need to compile your own BLAS.
 .. note::

--- a/doc/optimizations.txt
+++ b/doc/optimizations.txt
@@ -32,6 +32,7 @@ Optimization                                              FAST_RUN  FAST_COMPILE
 ========================================================= ========= ============ =============
 :term:`merge`                                             x         x
 :term:`constant folding<constant folding>`                x         x
+:term:`GPU transfer`                                      x         x
 :term:`shape promotion<shape promotion>`                  x
 :term:`fill cut<fill cut>`                                x
 :term:`inc_subtensor srlz.<inc_subtensor serialization>`  x
@@ -52,7 +53,6 @@ Optimization                                              FAST_RUN  FAST_COMPILE
 :term:`inplace_elemwise`                                  x
 :term:`inplace_random`                                    x
 :term:`elemwise fusion`                                   x
-:term:`GPU transfer`                                      x
 :term:`local_log_softmax`                                 x                      x
 :term:`local_remove_all_assert`                                                   
 ========================================================= ========= ============ =============

--- a/doc/tutorial/aliasing.txt
+++ b/doc/tutorial/aliasing.txt
@@ -261,52 +261,6 @@ combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
 hints that give more flexibility to the compilation and optimization of the
 graph.
-For GPU graphs, this borrowing can have a major speed impact.  See the following code:
-.. code-block:: python
-   from theano import function, config, shared, sandbox, tensor, Out
-   import numpy
-   import time
-   vlen = 10 * 30 * 768  # 10 x # cores x # threads per core
-   iters = 1000
-   rng = numpy.random.RandomState(22)
-   x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
-   f1 = function([], sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)))
-   f2 = function([],
-                 Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
-                     borrow=True))
-   t0 = time.time()
-   for i in range(iters):
-       r = f1()
-   t1 = time.time()
-   no_borrow = t1 - t0
-   t0 = time.time()
-   for i in range(iters):
-       r = f2()
-   t1 = time.time()
-   print(
-       "Looping %s times took %s seconds without borrow "
-       "and %s seconds with borrow" % (iters, no_borrow, (t1 - t0))
-   )
-   if numpy.any([isinstance(x.op, tensor.Elemwise) and
-                 ('Gpu' not in type(x.op).__name__)
-                 for x in f1.maker.fgraph.toposort()]):
-       print('Used the cpu')
-   else:
-       print('Used the gpu')
-Which produces this output:
-.. code-block:: none
-   $ THEANO_FLAGS=device=gpu0,floatX=float32 python test1.py
-   Using gpu device 0: GeForce GTX 275
-   Looping 1000 times took 0.368273973465 seconds without borrow and 0.0240728855133 seconds with borrow.
-   Used the gpu
 *Take home message:*
 When an input *x* to a function is not needed after the function
@@ -317,4 +271,3 @@ requirement.  When a return value *y* is large (in terms of memory
 footprint), and you only need to read from it once, right away when
 it's returned, then consider marking it with an ``Out(y,
 borrow=True)``.
--- a/doc/tutorial/using_gpu.txt
+++ b/doc/tutorial/using_gpu.txt
--- a/doc/tutorial/using_gpu_solution_1.py
+++ b/doc/tutorial/using_gpu_solution_1.py
--- a/theano/misc/check_blas.py
+++ b/theano/misc/check_blas.py
@@ -86,15 +86,20 @@ def execute(execute=True, verbose=True, M=2000, N=2000, K=2000,
    t0 = 0
    t1 = -1
+    f()  # Ignore first function call to get representative time.
    if execute:
        sync = (hasattr(theano, "sandbox") and
                hasattr(theano.sandbox, "cuda") and
                theano.sandbox.cuda.cuda_available)
+        sync2 = (hasattr(theano, "gpuarray") and
+                theano.gpuarray.pygpu_activated)
        t0 = time.time()
        for i in range(iters):
            f()
        if sync:
            theano.sandbox.cuda.synchronize()
+        if sync2:
+            c.get_value(borrow=True, return_internal_type=True).sync()
        t1 = time.time()
    return t1 - t0, impl
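
Note on the timing change in ``theano/misc/check_blas.py``: the hunk above does two things, discarding one warm-up call and synchronizing with the device before reading the end time, so that asynchronously queued GPU work is counted. The pattern can be sketched without Theano as below. ``time_calls`` and its ``sync`` parameter are hypothetical names used only for illustration, and the sketch uses ``time.perf_counter`` where the patch uses ``time.time``.

```python
import time


def time_calls(f, iters=10, sync=None):
    """Time `iters` calls of `f`, excluding one warm-up call.

    `sync` is an optional callable (hypothetical stand-in for a device
    synchronization call) that blocks until queued work has finished.
    """
    f()  # warm-up call, not timed
    t0 = time.perf_counter()
    for _ in range(iters):
        f()
    if sync is not None:
        sync()  # wait for pending work before stopping the timer
    t1 = time.perf_counter()
    return t1 - t0


# Usage with plain Python callables standing in for the compiled function.
calls = []
elapsed = time_calls(lambda: calls.append(1), iters=5,
                     sync=lambda: calls.append("sync"))
print(len(calls), elapsed >= 0)
```

Without the final synchronization, the timer could stop while the device is still executing, which would make an asynchronous backend look faster than it is.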