Commit 7826b448, authored by Iban Harlouchet, committed by Arnaud Bergeron

testcode for doc/tutorial/using_gpu.txt

Parent ce3ca941
@@ -36,7 +36,7 @@ file and run it.

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_using_gpu.test_using_gpu_1

.. testcode::

    from theano import function, config, shared, sandbox
    import theano.tensor as T

@@ -49,17 +49,17 @@ file and run it.

    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], T.exp(x))
    print(f.maker.fgraph.toposort())
    t0 = time.time()
    for i in xrange(iters):
        r = f()
    t1 = time.time()
    print("Looping %d times took" % iters, t1 - t0, "seconds")
    print("Result is", r)
    if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
        print('Used the cpu')
    else:
        print('Used the gpu')

The program just computes the ``exp()`` of a bunch of random numbers.
Note that we use the ``shared`` function to

@@ -71,7 +71,24 @@ If I run this program (in check1.py) with ``device=cpu``, my computer takes a li
whereas on the GPU it takes just over 0.64 seconds. The GPU will not always produce the exact
same floating-point numbers as the CPU. As a benchmark, a loop that calls ``numpy.exp(x.get_value())`` takes about 46 seconds.
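
As a rough sketch of that NumPy benchmark (reusing the ``x`` and ``iters``
variables from the program above; the exact timing will of course vary by
machine):

.. code-block:: python

    import time
    import numpy

    # Each iteration copies the data out of the shared variable with
    # get_value() and applies exp() on the host with NumPy.
    t0 = time.time()
    for i in range(iters):
        r = numpy.exp(x.get_value())
    t1 = time.time()
    print("NumPy looping %d times took" % iters, t1 - t0, "seconds")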

.. testoutput::
    :hide:
    :options: +ELLIPSIS

    $ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python check1.py
    [Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
    Looping 1000 times took ... seconds
    Result is ...
    Used the cpu
    $ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check1.py
    Using gpu device 0: GeForce GTX 580
    [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
    Looping 1000 times took ... seconds
    Result is ...
    Used the gpu

.. code-block:: none

    $ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python check1.py
    [Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]

@@ -105,7 +122,7 @@ after the ``T.exp(x)`` is replaced by a GPU version of ``exp()``.

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_using_gpu.test_using_gpu_2

.. testcode::

    from theano import function, config, shared, sandbox
    import theano.sandbox.cuda.basic_ops

@@ -119,22 +136,34 @@ after the ``T.exp(x)`` is replaced by a GPU version of ``exp()``.

    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)))
    print(f.maker.fgraph.toposort())
    t0 = time.time()
    for i in xrange(iters):
        r = f()
    t1 = time.time()
    print("Looping %d times took" % iters, t1 - t0, "seconds")
    print("Result is", r)
    print("Numpy result is", numpy.asarray(r))
    if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
        print('Used the cpu')
    else:
        print('Used the gpu')

The output from this program is

.. testoutput::
    :hide:
    :options: +ELLIPSIS

    $ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check2.py
    Using gpu device 0: GeForce GTX 580
    [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>)]
    Looping 1000 times took ... seconds
    Result is <CudaNdarray object at 0x...>
    Numpy result is ...
    Used the gpu

.. code-block:: none

    $ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check2.py
    Using gpu device 0: GeForce GTX 580

@@ -253,7 +282,7 @@ Exercise

Consider again the logistic regression:

.. testcode::

    import numpy
    import theano

@@ -294,25 +323,74 @@ Consider again the logistic regression:

    if any([x.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for x in
            train.maker.fgraph.toposort()]):
        print('Used the cpu')
    elif any([x.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for x in
              train.maker.fgraph.toposort()]):
        print('Used the gpu')
    else:
        print('ERROR, not able to tell if theano used the cpu or the gpu')
        print(train.maker.fgraph.toposort())

    for i in range(training_steps):
        pred, err = train(D[0], D[1])

    #print("Final model:")
    #print(w.get_value(), b.get_value())

    print("target values for D")
    print(D[1])
    print("prediction on D")
    print(predict(D[0]))

.. testoutput::
    :hide:
    :options: +ELLIPSIS

    Used the cpu
    target values for D
    ...
    prediction on D
    ...

.. code-block:: none
Used the cpu
target values for D
[ 0. 1. 0. 0. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0.
0. 0. 0. 0. 1. 1. 0. 1. 1. 0. 0. 1. 1. 1. 1. 0. 1. 1.
0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 1. 0. 0. 0. 0. 1. 0.
0. 1. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 0. 1. 0. 0.
1. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 0. 0. 0. 1. 0.
0. 0. 0. 1. 0. 1. 1. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0.
0. 1. 0. 1. 0. 1. 1. 0. 0. 1. 0. 0. 1. 0. 1. 1. 0. 1.
1. 1. 0. 1. 0. 1. 1. 0. 1. 1. 1. 0. 0. 0. 0. 0. 1. 0.
0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0.
0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 1. 1. 1. 1. 0.
1. 0. 1. 1. 1. 0. 0. 0. 1. 0. 1. 1. 0. 1. 1. 0. 1. 1.
1. 1. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 1. 1. 0. 0. 0.
1. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 1. 0. 0. 1.
0. 0. 0. 1. 1. 0. 0. 0. 1. 1. 0. 1. 0. 1. 0. 0. 1. 0.
1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 1. 1.
0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 1. 1. 0. 1. 1. 1.
0. 1. 0. 1. 0. 0. 0. 1. 1. 1. 0. 1. 0. 1. 1. 1. 0. 0.
0. 1. 0. 1. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 1. 0. 0. 0.
1. 1. 0. 1. 0. 1. 1. 1. 1. 1. 0. 1. 1. 0. 0. 0. 0. 1.
1. 1. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 1. 0. 0. 1. 0. 1.
0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 1. 1. 0.
0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 1.
1. 1. 0. 1.]
prediction on D
[0 1 0 0 1 1 1 0 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 0 1 1 1 1 0 1 1 0
0 0 0 0 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 1 1 0 1 0 0 1 1
0 0 0 0 0 1 1 1 0 1 0 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 0
1 0 1 1 0 0 1 0 0 1 0 1 1 0 1 1 1 0 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0
1 0 1 0 0 0 1 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 0 1 1 1
0 0 0 1 0 1 1 0 1 1 0 1 1 1 1 0 0 0 0 0 1 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 1
1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 1 0 0 0 1 1 0 1 0 1 0 0 1 0 1 1 1 1 1 1 0
0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 1 1 1 0 1 1 0 1 1 1 0 1 0 1 0 0 0 1
1 1 0 1 0 1 1 1 0 0 0 1 0 1 1 0 0 0 1 0 1 0 1 0 1 0 0 0 1 1 0 1 0 1 1 1 1
1 0 1 1 0 0 0 0 1 1 1 0 1 1 1 0 1 0 1 1 0 1 0 0 1 0 1 0 0 0 0 0 0 1 1 1 1
1 1 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 1 0 1]

Modify and execute this example to run on GPU with ``floatX=float32`` and

@@ -373,7 +451,7 @@ Testing Theano with GPU

To see if your GPU is being used, cut and paste the following program
into a file and run it.

.. testcode::

    from theano import function, config, shared, tensor, sandbox
    import numpy

@@ -383,27 +461,44 @@ into a file and run it.

    iters = 1000

    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], tensor.exp(x))
    print(f.maker.fgraph.toposort())
    t0 = time.time()
    for i in xrange(iters):
        r = f()
    t1 = time.time()
    print("Looping %d times took" % iters, t1 - t0, "seconds")
    print("Result is", r)
    if numpy.any([isinstance(x.op, tensor.Elemwise) and
                  ('Gpu' not in type(x.op).__name__)
                  for x in f.maker.fgraph.toposort()]):
        print('Used the cpu')
    else:
        print('Used the gpu')

The program just computes ``exp()`` of a bunch of random numbers. Note
that we use the :func:`theano.shared` function to make sure that the
input *x* is stored on the GPU.
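
To confirm where the data lives, you can probe the shared variable directly.
This is a minimal sketch, assuming the ``x`` defined above; the exact type
name printed depends on the backend and device, while ``get_value()`` always
returns a host-side NumPy array:

.. code-block:: python

    # On a GPU device the shared variable's Theano type is a GPU array type;
    # get_value() copies the data back into an ordinary numpy.ndarray.
    print(x.type)
    print(type(x.get_value()))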

.. testoutput::
    :hide:
    :options: +ELLIPSIS

    $ THEANO_FLAGS=device=cpu python check1.py
    [Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
    Looping 1000 times took ... seconds
    Result is ...
    Used the cpu
    $ THEANO_FLAGS=device=cuda0 python check1.py
    Using device cuda0: GeForce GTX 275
    [GpuElemwise{exp,no_inplace}(<GpuArray<float64>>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
    Looping 1000 times took ... seconds
    Result is ...
    Used the gpu

.. code-block:: none

    $ THEANO_FLAGS=device=cpu python check1.py
    [Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]

@@ -432,7 +527,7 @@ the value of the ``device`` flag without touching the code.

If you don't mind a loss of flexibility, you can ask Theano to return
the GPU object directly. The following code is modified to do just that.

.. testcode::
    :emphasize-lines: 10,17

    from theano import function, config, shared, tensor, sandbox

@@ -445,19 +540,19 @@ the GPU object directly. The following code is modified to do just that.

    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], sandbox.gpuarray.basic_ops.gpu_from_host(tensor.exp(x)))
    print(f.maker.fgraph.toposort())
    t0 = time.time()
    for i in xrange(iters):
        r = f()
    t1 = time.time()
    print("Looping %d times took" % iters, t1 - t0, "seconds")
    print("Result is", numpy.asarray(r))
    if numpy.any([isinstance(x.op, tensor.Elemwise) and
                  ('Gpu' not in type(x.op).__name__)
                  for x in f.maker.fgraph.toposort()]):
        print('Used the cpu')
    else:
        print('Used the gpu')

Here the :func:`theano.sandbox.gpuarray.basic.gpu_from_host` call
means "copy input to the GPU". However during the optimization phase,

@@ -466,7 +561,18 @@ used here to tell theano that we want the result on the GPU.

The output is

.. testoutput::
    :hide:
    :options: +ELLIPSIS

    $ THEANO_FLAGS=device=cuda0 python check2.py
    Using device cuda0: GeForce GTX 275
    [GpuElemwise{exp,no_inplace}(<GpuArray<float64>>)]
    Looping 1000 times took ... seconds
    Result is ...
    Used the gpu

.. code-block:: none

    $ THEANO_FLAGS=device=cuda0 python check2.py
    Using device cuda0: GeForce GTX 275

@@ -636,7 +742,7 @@ you feel competent enough, you may try yourself on the corresponding exercises.

**Example: PyCUDA**

.. testcode::

    # (from PyCUDA's documentation)
    import pycuda.autoinit

@@ -663,7 +769,7 @@ you feel competent enough, you may try yourself on the corresponding exercises.

        block=(400,1,1), grid=(1,1))

    assert numpy.allclose(dest, a*b)
    print(dest)

Exercise

@@ -680,7 +786,7 @@ Modify and execute to work for a matrix of shape (20, 10).
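
A possible solution is sketched below (an illustrative sketch, not the
official answer; the kernel name ``multiply_2d`` is our own). The idea is to
launch one thread per element and recover a flat index from the 2-D thread
coordinates:

.. code-block:: python

    import numpy
    import pycuda.autoinit
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void multiply_2d(float *dest, float *a, float *b)
    {
      // One thread per element of the (20, 10) row-major arrays.
      const int i = threadIdx.x + threadIdx.y * blockDim.x;
      dest[i] = a[i] * b[i];
    }
    """)
    multiply_2d = mod.get_function("multiply_2d")

    a = numpy.random.randn(20, 10).astype(numpy.float32)
    b = numpy.random.randn(20, 10).astype(numpy.float32)
    dest = numpy.zeros_like(a)

    # block=(x, y, z): x covers the 10 columns, y the 20 rows.
    multiply_2d(drv.Out(dest), drv.In(a), drv.In(b),
                block=(10, 20, 1), grid=(1, 1))

    assert numpy.allclose(dest, a * b)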

**Example: Theano + PyCUDA**

.. testcode::

    import numpy, theano
    import theano.misc.pycuda_init

@@ -725,7 +831,7 @@ Use this code to test it:

>>> f = theano.function([x], PyCUDADoubleOp()(x))
>>> xv = numpy.ones((4, 5), dtype="float32")
>>> assert numpy.allclose(f(xv), xv*2)
>>> print(numpy.asarray(f(xv)))

Exercise