New update! I have integrated the required changes and
re-run the tests. All tests still pass, as before, and
some are now faster. The biggest gain on my computer is
for theano/tensor/nnet/tests/test_corr3d.py, which goes from
687 seconds to 259 seconds. For the other tests, the gain is
between 3 and 20 seconds.
There is now no copy and no memory allocation
(apart from the NumPy wrapper structures) when BETA == 0.
I rewrote the OP(matrix) function so that it no longer returns
newly allocated data. Instead, it just creates a
PyArrayObject wrapper around the matrix pointer with the right
layout: F-contiguous (nrow * ncol) by default, or
C-contiguous (ncol * nrow) if the matrix needs to be transposed.
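
To illustrate why no copy is needed, here is a rough plain-Python sketch (not the C code from this change). `MatrixView` and its fields are hypothetical names, and strides are counted in elements rather than bytes. It shows that the same flat buffer can be read either as an F-contiguous (nrow * ncol) matrix or as its transpose, a C-contiguous (ncol * nrow) matrix, just by changing shape and strides.

```python
class MatrixView:
    """Hypothetical no-copy wrapper: shape and strides over a shared flat buffer."""

    def __init__(self, buffer, nrow, ncol, transposed=False):
        # Keep a reference to the caller's buffer; nothing is copied.
        self.buffer = buffer
        if transposed:
            # C-contiguous (ncol * nrow): each row of the view is one
            # column of the stored matrix.
            self.shape = (ncol, nrow)
            self.strides = (nrow, 1)
        else:
            # F-contiguous (nrow * ncol): consecutive elements run down
            # a column.
            self.shape = (nrow, ncol)
            self.strides = (1, nrow)

    def _offset(self, index):
        i, j = index
        if not (0 <= i < self.shape[0] and 0 <= j < self.shape[1]):
            raise IndexError("index out of range")
        return i * self.strides[0] + j * self.strides[1]

    def __getitem__(self, index):
        return self.buffer[self._offset(index)]

    def __setitem__(self, index, value):
        self.buffer[self._offset(index)] = value


# Column-major storage of the 2 x 3 matrix [[1, 3, 5], [2, 4, 6]].
data = [1, 2, 3, 4, 5, 6]
m = MatrixView(data, nrow=2, ncol=3)                     # F-contiguous view
mt = MatrixView(data, nrow=2, ncol=3, transposed=True)   # transposed view
```

Both views share `data`, so a write through `m` is visible through `mt` and in the buffer itself.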
I also rewrote the matrix sum function so that it takes a
scalar multiplier for each matrix passed.
The function now computes: B = alpha*A + beta*B
where alpha and beta are the scalars (both set to 1 if we just want
B = A + B). Thus, there is now only one pass over A and B,
in which A and B are each read once and B is written once.
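
The single-pass update can be sketched in plain Python as follows. This is only an illustration under my own assumptions, not the actual C implementation: `scaled_sum_inplace` is a hypothetical name, and the matrices are represented as flat lists of equal length.

```python
def scaled_sum_inplace(alpha, a, beta, b):
    """Compute b = alpha*a + beta*b in place, in a single pass.

    Hypothetical helper: `a` and `b` are flat lists of equal length.
    Each element of `a` and `b` is read once, and each element of `b`
    is written once. No temporary list is allocated.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of elements")
    for i, a_i in enumerate(a):
        b[i] = alpha * a_i + beta * b[i]
    return b


a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
scaled_sum_inplace(1.0, a, 1.0, b)  # plain sum: b becomes a + b
```

With alpha = beta = 1 this reduces to B = A + B, and the same loop covers the scaled case without a separate multiplication pass.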