Commit cdfbcbfa authored by Simon Lefrancois, committed by GitHub

Merge pull request #5231 from chinnadhurai/ccw_5186

Clarify GPU memory pre-allocation in new and old backend
......@@ -162,6 +162,12 @@ but requires that all nodes in the graph have a C implementation:

    f = function([x], (x + 1.) * 2, mode=theano.Mode(linker='c'))
    f(10.)

New GPU backend using libgpuarray
---------------------------------

The new Theano GPU backend (:ref:`gpuarray`) uses ``config.gpuarray.preallocate``
for GPU memory allocation. Likewise, the old backend uses ``config.lib.cnmem``.

Related Projects
----------------
......
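The pairing described in the hunk above (new backend → ``gpuarray.preallocate``, old backend → ``lib.cnmem``) can be summed up in a small illustrative helper; the function and its name are hypothetical, not part of Theano:

```python
# Illustrative helper (not part of Theano): map a backend name to the
# config flag that controls its GPU memory preallocation.
def preallocation_flag(backend):
    flags = {
        "gpuarray": "gpuarray.preallocate",  # new libgpuarray backend
        "cuda": "lib.cnmem",                 # old CUDA backend
    }
    return flags[backend]

print(preallocation_flag("gpuarray"))  # gpuarray.preallocate
print(preallocation_flag("cuda"))      # lib.cnmem
```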
......@@ -416,32 +416,36 @@ import theano and print the config variable, as in:

    `amdlibm <http://developer.amd.com/cpu/libraries/libm/>`__
    library, which is faster than the standard libm.

.. attribute:: config.gpuarray.preallocate

    Float value

    Default: 0 (Preallocation of size 0, only cache the allocation)

    Controls the preallocation of memory with the gpuarray backend.

    The value represents the start size (either in MB or the fraction
    of total GPU memory) of the memory pool. If more memory is needed,
    Theano will try to obtain more, but this can cause memory
    fragmentation.

    A negative value will completely disable the allocation cache.
    This can have a severe impact on performance and so should not be
    done outside of debugging.

    * < 0: disabled
    * 0 <= N <= 1: use this fraction of the total GPU memory (clipped to .95 for driver memory).
    * > 1: use this number in megabytes (MB) of memory.

    .. note::

        This value allocates GPU memory ONLY when using the new backend
        (:ref:`gpuarray`). For the old backend, please see ``config.lib.cnmem``.

    .. note::

        This could cause memory fragmentation. So if you have a memory
        error while using the cache, try to allocate more memory at
        the start or disable it. If you try this, report your result
        on :ref:`theano-dev`.
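The documented value ranges can be illustrated with a short sketch; ``start_pool_mb`` is a hypothetical helper for exposition, not Theano's actual allocator code:

```python
# Illustrative sketch (not Theano's actual code) of how the documented
# ranges of ``gpuarray.preallocate`` map to a starting pool size.
def start_pool_mb(preallocate, total_gpu_mb):
    if preallocate < 0:
        return None                      # allocation cache disabled
    if preallocate <= 1:
        # fraction of total GPU memory, clipped to .95 for driver memory
        return min(preallocate, 0.95) * total_gpu_mb
    return preallocate                   # already a size in MB

print(start_pool_mb(0.5, 8000))   # fraction of 8000 MB -> 4000.0
print(start_pool_mb(2048, 8000))  # absolute size -> 2048
```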
......@@ -452,31 +456,38 @@ import theano and print the config variable, as in:

    automatically to get more memory. But this can cause
    fragmentation, see note above.

.. attribute:: config.lib.cnmem

    Float value: >= 0

    Default: 0

    Controls the use of `CNMeM <https://github.com/NVIDIA/cnmem>`_ (a
    faster CUDA memory allocator). Applies to the old GPU backend
    :ref:`cuda` up to Theano release 0.8.

    The CNMeM library is included in Theano and does not need to be
    separately installed.

    The value represents the start size (either in MB or the fraction
    of total GPU memory) of the memory pool. If more memory is needed,
    Theano will try to obtain more, but this can cause memory
    fragmentation.

    * 0: not enabled.
    * 0 < N <= 1: use this fraction of the total GPU memory (clipped to .95 for driver memory).
    * > 1: use this number in megabytes (MB) of memory.

    .. note::

        This value allocates GPU memory ONLY when using the old backend
        (:ref:`cuda`) and has no effect when the GPU backend is
        (:ref:`gpuarray`). For the new backend, please see
        ``config.gpuarray.preallocate``.

    .. note::

        This could cause memory fragmentation. So if you have a
        memory error while using CNMeM, try to allocate more memory at
        the start or disable it. If you try this, report your result
        on :ref:`theano-dev`.
......
......@@ -144,11 +144,11 @@ Could speed up and lower memory usage:

Could raise memory usage but speed up computation:

- :attr:`config.gpuarray.preallocate` =1 # Preallocates the GPU memory for the
  new backend (:ref:`gpuarray`) and then manages it in a smart way. Does not
  raise memory usage much, but if you are at the limit of available GPU memory
  you might need to specify a lower value. GPU only.
- :attr:`config.lib.cnmem` =1 # Equivalent for the old backend (:ref:`cuda`). GPU only.
- :attr:`config.allow_gc` =False
- :attr:`config.optimizer_excluding` =low_memory , GPU only for now.
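These flags can also be set persistently. A hypothetical ``~/.theanorc`` fragment (the section/option names follow Theano's ``config.gpuarray.preallocate`` and ``config.lib.cnmem`` naming; use the line matching your backend):

```ini
# Hypothetical ~/.theanorc fragment; values are examples, not defaults.
[gpuarray]
preallocate = 1

[lib]
cnmem = 1
```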
......
......@@ -64,6 +64,10 @@ While all types of devices are supported if using OpenCL, for the

remainder of this section, whatever compute device you are using will
be referred to as GPU.

.. note::

    The GpuArray backend uses ``config.gpuarray.preallocate`` for GPU memory
    allocation. For the old backend, please see ``config.lib.cnmem``.
.. warning::
If you want to use the new GpuArray backend, make sure to have the
......@@ -283,6 +287,9 @@ Tips for Improving Performance on GPU

a value to `assert_no_cpu_op` flag, i.e. `warn`, for warning, `raise` for
raising an error or `pdb` for putting a breakpoint in the computational
graph if there is a CPU Op.

* Please note that ``config.lib.cnmem`` and ``config.gpuarray.preallocate``
  control GPU memory allocation when using :ref:`cuda` and :ref:`gpuarray`
  as Theano backends, respectively.
.. _gpu_async:
......@@ -409,8 +416,8 @@ We provide installation instructions for :ref:`Linux <gpu_linux>`,

The old CUDA backend can be activated using the flags ``device=gpu`` or
``device=gpu{0,1,...}``

.. note::

    * The CUDA backend uses ``config.lib.cnmem`` for GPU memory allocation.
      For the new backend (:ref:`gpuarray`), please see ``config.gpuarray.preallocate``.
    * Only 32 bit floats are supported.
    * ``Shared`` variables with *float32* dtype are by default moved to the GPU memory space.
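A minimal sketch of activating the old backend together with CNMeM preallocation, assuming the flags are set before ``import theano`` is executed (the 0.8 fraction is an arbitrary example value):

```python
import os

# Select the old CUDA backend and ask CNMeM to start with 80% of GPU
# memory. THEANO_FLAGS must be set before ``import theano``.
os.environ["THEANO_FLAGS"] = "device=gpu,lib.cnmem=0.8"

print(os.environ["THEANO_FLAGS"])  # device=gpu,lib.cnmem=0.8
```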
......