Committed by notoraptor
Re-ran the tests. All tests pass as before, and some are now faster. The biggest gain on my computer is for theano/tensor/nnet/tests/test_corr3d.py, which goes from 687 seconds before to 259 seconds now. For the other tests, the gain is between 3 and 20 seconds.

There is now no copy and no memory allocation (apart from NumPy wrapping structures) when BETA == 0.

I rewrote the OP(matrix) function so that it no longer returns newly allocated data. Instead, it just creates a PyArrayObject wrapper around the matrix pointer with the right format: F-contiguous (nrow * ncol) by default, or C-contiguous (ncol * nrow) if the matrix needs to be transposed.

I also rewrote the matrix sum function so that it takes a scalar multiplier for each matrix passed to it. The function now computes B = alpha*A + beta*B, with alpha and beta as the scalars (both set to 1 if we just want B = A + B). Thus, there is now only one iteration over A and B, in which A and B are each read once and B is modified once.
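
As an illustration only, the single-pass update B = alpha*A + beta*B described above can be sketched in plain Python. This is not the code from this commit: the function name `scaled_matrix_sum` and the flat-list representation of the matrices are assumptions made for the example.

```python
def scaled_matrix_sum(alpha, a, beta, b):
    """Update b in place so that b = alpha*a + beta*b.

    Hypothetical sketch: a and b are flat lists holding the elements of
    two matrices of the same shape, stored in the same order.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of elements")
    # Single loop: each element of a and b is read once,
    # and each element of b is written once.
    for i in range(len(b)):
        b[i] = alpha * a[i] + beta * b[i]
    return b


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
# With alpha = beta = 1 this reduces to the plain sum B = A + B.
scaled_matrix_sum(1.0, a, 1.0, b)
```

Folding the two scalings into the same loop as the addition avoids separate passes to scale A and B and needs no temporary buffers, which is the point made in the message above.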
24f96fa8