• notoraptor committed · 24f96fa8
    New update! I have integrated the required changes and
    re-run the tests. All tests pass as before, and some tests
    are now faster. The biggest gain on my computer is for
    theano/tensor/nnet/tests/test_corr3d.py, which goes from
    687 seconds before to 259 seconds now. For other tests, the
    difference is between 3 and 20 seconds.
    
    There is now neither a copy nor a memory allocation
    (apart from NumPy wrapping structures) when BETA == 0.
    
    I rewrote the OP(matrix) function so that it no longer returns
    newly allocated data. Instead, it just creates a
    PyArrayObject wrapper around the matrix pointer with the right
    format: F-contiguous (nrow * ncol) by default, or
    C-contiguous (ncol * nrow) if the matrix needs to be transposed.
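
    The no-copy transposition described above can be illustrated with a
    minimal sketch. It is not the code from this commit: matrix_view and
    its helper functions are hypothetical stand-ins for the PyArrayObject
    wrapper, and the buffer is assumed to hold doubles in column-major
    order. The point is only that reading the same pointer as
    C-contiguous (ncol * nrow) yields the transpose without touching
    the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PyArrayObject wrapper: it borrows the
 * buffer and only records the shape and the element strides. */
typedef struct {
    const double *data;  /* borrowed pointer, never copied or freed here */
    size_t rows, cols;   /* logical shape */
    size_t row_stride;   /* elements to skip to reach the next row */
    size_t col_stride;   /* elements to skip to reach the next column */
} matrix_view;

/* View a column-major buffer as an F-contiguous (nrow x ncol) matrix. */
static matrix_view view_f_contiguous(const double *data, size_t nrow, size_t ncol) {
    assert(data != NULL);
    matrix_view v = { data, nrow, ncol, 1, nrow };
    return v;
}

/* View the same buffer as a C-contiguous (ncol x nrow) matrix.
 * This is the transpose of the F-contiguous view, with no copy. */
static matrix_view view_transposed(const double *data, size_t nrow, size_t ncol) {
    assert(data != NULL);
    matrix_view v = { data, ncol, nrow, nrow, 1 };
    return v;
}

/* Read element (i, j) of a view through its strides. */
static double view_get(matrix_view v, size_t i, size_t j) {
    assert(i < v.rows && j < v.cols);
    return v.data[i * v.row_stride + j * v.col_stride];
}
```

    Both views share the original pointer, so building either one costs
    only the small wrapper structure.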
    
    I also rewrote the matrix sum function so that it takes
    scalars that multiply each passed matrix before the addition.
    The function now computes: B = alpha*A + beta*B
    with alpha and beta as the scalars (both set to 1 if we just want
    B = A + B). Thus, there is now only one iteration over A and B,
    in which A and B are each read once, and B is written once.
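
    As a rough illustration of the single-pass sum, here is a sketch
    under simplifying assumptions: scaled_sum is a hypothetical name,
    and both matrices are treated as flat arrays of doubles with the
    same layout. The function in this commit works on the wrapped
    arrays, so its signature differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: b[i] = alpha * a[i] + beta * b[i] over n elements.
 * One loop: each a[i] and b[i] is read once, and b[i] is written once. */
static void scaled_sum(size_t n, double alpha, const double *a,
                       double beta, double *b) {
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        b[i] = alpha * a[i] + beta * b[i];
    }
}
```

    Calling scaled_sum(n, 1.0, a, 1.0, b) gives the plain B = A + B case
    mentioned above.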
alt_gemm_common.c 822 Bytes