Commit 42dd8a8f authored by Frédéric Bastien

Merge pull request #3856 from abergeron/transfer_noints

Don't transfer int inputs to the GPU by default
@@ -538,6 +538,10 @@ int, ...) however GPU support varies and some units can't deal with
 double (float64) or small (less than 32 bits like int16) data types.
 You will get an error at compile time or runtime if this is the case.
 
+By default all inputs will get transferred to GPU. You can prevent an
+input from getting transferred by setting its tag.target attribute to
+'cpu'.
+
 Complex support is untested and most likely completely broken.
 
 In general, large operations like matrix multiplication, or
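The added paragraph describes a simple decision: an input is transferred to the GPU unless its ``tag.target`` is ``'cpu'``. A minimal self-contained sketch of that rule (the ``Tag`` and ``Variable`` classes here are toy stand-ins, not Theano's own):

```python
class Tag:
    """Toy stand-in for a variable's tag object."""
    pass

class Variable:
    """Toy stand-in for a Theano variable carrying a tag."""
    def __init__(self):
        self.tag = Tag()

def should_transfer(var):
    # Mirror the rule from the diff: transfer by default,
    # skip only when the tag explicitly pins the input to 'cpu'.
    target = getattr(var.tag, 'target', None)
    return target != 'cpu'

v = Variable()
assert should_transfer(v)      # no target set: transferred by default
v.tag.target = 'cpu'
assert not should_transfer(v)  # pinned to the host
```

This is the same ``getattr``-with-default pattern the optimizer itself uses in the code hunk below, so an unset tag behaves exactly like the pre-patch default.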
@@ -553,19 +557,12 @@ means that they are only scheduled to run and the function returns.
 This is made somewhat transparently by the underlying libgpuarray.
 
 A forced synchronization point is introduced when doing memory
-transfers between device and host. Another is introduced when
-releasing active memory buffers on the GPU (active buffers are buffers
-that are still in use by a kernel).
+transfers between device and host.
 
 It is possible to force synchronization for a particular GpuArray by
 calling its ``sync()`` method. This is useful to get accurate timings
 when doing benchmarks.
 
-The forced synchronization points interact with the garbage collection
-of the intermediate results. To get the fastest speed possible, you
-should disable the garbage collector by using the theano flag
-``allow_gc=False``. Be aware that this will increase memory usage
-sometimes significantly.
-
 -------------------------------------------
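The retained paragraph about ``sync()`` matters for benchmarking: since kernels are only scheduled, timing without a synchronization point measures queueing, not execution. A minimal sketch of the pattern, using a toy stand-in for a GpuArray (the real class lives in pygpu/libgpuarray):

```python
import time

class FakeGpuArray:
    """Toy stand-in: launches only queue work; sync() blocks until done."""
    def __init__(self):
        self.pending = 0

    def launch(self):
        self.pending += 1   # returns immediately, work is only scheduled

    def sync(self):
        self.pending = 0    # wait here for all queued work to finish

a = FakeGpuArray()
start = time.perf_counter()
for _ in range(3):
    a.launch()              # asynchronous: the loop itself is nearly free
a.sync()                    # forced synchronization point before timing
elapsed = time.perf_counter() - start
```

Without the ``sync()`` call, ``elapsed`` would reflect only the scheduling overhead, which is the pitfall the documentation warns about.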
@@ -159,7 +159,6 @@ class InputToGpuOptimizer(Optimizer):
     Transfer the input to the gpu to start the rolling wave.
     """
     def add_requirements(self, fgraph):
         fgraph.attach_feature(toolbox.ReplaceValidate())
@@ -173,16 +172,19 @@ class InputToGpuOptimizer(Optimizer):
                     for cl in input.clients)):
                 continue
 
-            ctx_name = getattr(input.tag, 'context_name', None)
+            target = getattr(input.tag, 'target', None)
+            if target == 'cpu':
+                continue
+
             try:
-                new_input = host_from_gpu(GpuFromHost(ctx_name)(input))
+                new_input = host_from_gpu(GpuFromHost(target)(input))
                 fgraph.replace_validate(input, new_input,
                                         "InputToGpuOptimizer")
             except TypeError:
                 # This could fail if the inputs are not TensorTypes
                 pass
             except ContextNotDefined:
-                if hasattr(input.tag, 'context_name'):
+                if hasattr(input.tag, 'target'):
                     raise
                 # If there is no context tag and no default context
                 # then it stays on the CPU
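The ``ContextNotDefined`` branch in the hunk above encodes a fallback: an explicitly set target that fails to resolve is an error, while an unset one silently leaves the input on the CPU. A standalone paraphrase of that control flow (``lookup_context`` and its registry are hypothetical; only the except-clause logic mirrors the diff):

```python
class ContextNotDefined(Exception):
    """Stand-in for the exception raised when no context resolves."""

def lookup_context(name):
    # Hypothetical registry: only a 'dev0' context is defined here.
    contexts = {'dev0': object()}
    if name not in contexts:
        raise ContextNotDefined(name)
    return contexts[name]

def transfer_or_stay(tag):
    """An explicit tag.target that fails to resolve is an error;
    an unset target just leaves the input on the CPU (returns None)."""
    target = getattr(tag, 'target', None)
    try:
        return lookup_context(target)
    except ContextNotDefined:
        if hasattr(tag, 'target'):
            raise        # explicit target must resolve
        return None      # no tag, no default context: stays on the CPU
```

This asymmetry is what lets the patch skip integer inputs by default while still reporting a hard error for a user-supplied target that does not exist.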