Commit 42dd8a8f authored by Frédéric Bastien

Merge pull request #3856 from abergeron/transfer_noints

Don't transfer int inputs to the GPU by default
@@ -538,6 +538,10 @@ int, ...) however GPU support varies and some units can't deal with
 double (float64) or small (less than 32 bits like int16) data types.
 You will get an error at compile time or runtime if this is the case.
 
+By default all inputs will get transferred to GPU. You can prevent an
+input from getting transferred by setting its tag.target attribute to
+'cpu'.
+
 Complex support is untested and most likely completely broken.
 
 In general, large operations like matrix multiplication, or
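The added paragraph describes a simple decision: an input is transferred to the GPU unless its ``tag.target`` is ``'cpu'``. A minimal self-contained sketch of that rule (the ``Tag`` and ``Variable`` classes here are toy stand-ins, not Theano's own):

```python
class Tag:
    """Toy stand-in for a variable's tag object."""
    pass

class Variable:
    """Toy stand-in for a Theano variable carrying a tag."""
    def __init__(self):
        self.tag = Tag()

def should_transfer(var):
    # Mirror the rule from the diff: transfer by default,
    # skip only when the tag explicitly pins the input to 'cpu'.
    target = getattr(var.tag, 'target', None)
    return target != 'cpu'

v = Variable()
assert should_transfer(v)      # no target set: transferred by default
v.tag.target = 'cpu'
assert not should_transfer(v)  # pinned to the host
```

This is the same ``getattr``-with-default pattern the optimizer itself uses in the code hunk below, so an unset tag behaves exactly like the pre-patch default.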
@@ -553,19 +557,12 @@ means that they are only scheduled to run and the function returns.
 This is made somewhat transparently by the underlying libgpuarray.
 
 A forced synchronization point is introduced when doing memory
-transfers between device and host. Another is introduced when
-releasing active memory buffers on the GPU (active buffers are buffers
-that are still in use by a kernel).
+transfers between device and host.
 
 It is possible to force synchronization for a particular GpuArray by
 calling its ``sync()`` method. This is useful to get accurate timings
 when doing benchmarks.
 
-The forced synchronization points interact with the garbage collection
-of the intermediate results. To get the fastest speed possible, you
-should disable the garbage collector by using the theano flag
-``allow_gc=False``. Be aware that this will increase memory usage
-sometimes significantly.
-
 -------------------------------------------
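The retained paragraph about ``sync()`` matters for benchmarking: since kernels are only scheduled, timing without a synchronization point measures queueing, not execution. A minimal sketch of the pattern, using a toy stand-in for a GpuArray (the real class lives in pygpu/libgpuarray):

```python
import time

class FakeGpuArray:
    """Toy stand-in: launches only queue work; sync() blocks until done."""
    def __init__(self):
        self.pending = 0

    def launch(self):
        self.pending += 1   # returns immediately, work is only scheduled

    def sync(self):
        self.pending = 0    # wait here for all queued work to finish

a = FakeGpuArray()
start = time.perf_counter()
for _ in range(3):
    a.launch()              # asynchronous: the loop itself is nearly free
a.sync()                    # forced synchronization point before timing
elapsed = time.perf_counter() - start
```

Without the ``sync()`` call, ``elapsed`` would reflect only the scheduling overhead, which is the pitfall the documentation warns about.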
@@ -159,7 +159,6 @@ class InputToGpuOptimizer(Optimizer):
     Transfer the input to the gpu to start the rolling wave.
     """
     def add_requirements(self, fgraph):
         fgraph.attach_feature(toolbox.ReplaceValidate())
@@ -173,16 +172,19 @@ class InputToGpuOptimizer(Optimizer):
                     for cl in input.clients)):
                 continue
 
-            ctx_name = getattr(input.tag, 'context_name', None)
+            target = getattr(input.tag, 'target', None)
+            if target == 'cpu':
+                continue
+
             try:
-                new_input = host_from_gpu(GpuFromHost(ctx_name)(input))
+                new_input = host_from_gpu(GpuFromHost(target)(input))
                 fgraph.replace_validate(input, new_input,
                                         "InputToGpuOptimizer")
             except TypeError:
                 # This could fail if the inputs are not TensorTypes
                 pass
             except ContextNotDefined:
-                if hasattr(input.tag, 'context_name'):
+                if hasattr(input.tag, 'target'):
                     raise
                 # If there is no context tag and no default context
                 # then it stays on the CPU
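The ``ContextNotDefined`` branch in the hunk above encodes a fallback: an explicitly set target that fails to resolve is an error, while an unset one silently leaves the input on the CPU. A standalone paraphrase of that control flow (``lookup_context`` and its registry are hypothetical; only the except-clause logic mirrors the diff):

```python
class ContextNotDefined(Exception):
    """Stand-in for the exception raised when no context resolves."""

def lookup_context(name):
    # Hypothetical registry: only a 'dev0' context is defined here.
    contexts = {'dev0': object()}
    if name not in contexts:
        raise ContextNotDefined(name)
    return contexts[name]

def transfer_or_stay(tag):
    """An explicit tag.target that fails to resolve is an error;
    an unset target just leaves the input on the CPU (returns None)."""
    target = getattr(tag, 'target', None)
    try:
        return lookup_context(target)
    except ContextNotDefined:
        if hasattr(tag, 'target'):
            raise        # explicit target must resolve
        return None      # no tag, no default context: stays on the CPU
```

This asymmetry is what lets the patch skip integer inputs by default while still reporting a hard error for a user-supplied target that does not exist.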