vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
...
The program just computes the exp() of a bunch of random numbers.
Note that we use the `shared` function to
make sure that the input `x` is stored on the graphics device.
If I run this program (in thing.py) with device=cpu, my computer takes a little over 7 seconds,
whereas on the GPU it takes just over 0.4 seconds. Note that the results are close but not
identical! The GPU will not always produce the exact same floating-point numbers as the CPU.
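The discrepancy comes down to floating-point precision: the GPU backend evaluates the graph in single precision (float32), while NumPy defaults to double precision (float64). A quick NumPy-only sketch (not Theano code) shows that the two precisions agree closely but are not bitwise identical:

```python
import numpy

rng = numpy.random.RandomState(22)
x64 = rng.rand(1000)                         # float64 inputs
x32 = x64.astype(numpy.float32)              # same values rounded to float32

e64 = numpy.exp(x64)
e32 = numpy.exp(x32).astype(numpy.float64)

# Close to within single precision...
print(numpy.allclose(e64, e32, rtol=1e-6))   # True
# ...but not the exact same numbers.
print(numpy.array_equal(e64, e32))           # False
```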
As a point of reference, a loop that calls ``numpy.exp(x.value)`` also takes about 7 seconds.
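That pure-NumPy baseline can be timed with a loop along these lines (a sketch; the array size and iteration count match the constants used earlier, and here `x` is a plain NumPy array rather than a Theano shared variable):

```python
import time
import numpy

vlen = 10 * 30 * 768  # 10 x #cores x #threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = numpy.asarray(rng.rand(vlen), dtype=numpy.float64)

t0 = time.time()
for _ in range(iters):
    r = numpy.exp(x)
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
```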
_logger.warning(
    "cuda_ndarray was loaded from %s. This is not expected, as Theano"
    " should compile it automatically for you. Do you have a directory"
    " called cuda_ndarray in your LD_LIBRARY_PATH environment variable?"
    " If so, please remove it, as it is outdated.",
    cuda_ndarray.cuda_ndarray.__file__)