notoraptor committed · 24f96fa8
    New update! I have integrated the required changes and
    re-run the tests. All tests pass as before, and now
    some tests are faster. The biggest gain on my computer is
    for theano/tensor/nnet/tests/test_corr3d.py, which drops from
    687 seconds to 259 seconds. For other tests, the gain is
    between 3 and 20 seconds.
    
    There is now neither a copy nor a memory allocation
    (apart from NumPy wrapping structures) when BETA == 0.
    
    I rewrote the OP(matrix) function so that it no longer returns
    newly allocated data. Instead, it just creates a
    PyArrayObject wrapper around the matrix pointer with the right
    format: F-contiguous (nrow * ncol) by default, or
    C-contiguous (ncol * nrow) if the matrix needs to be transposed.
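
The wrapping idea above can be sketched in plain Python. This is only an illustration of the index arithmetic, assuming a flat column-major buffer; the `MatrixView` class and its `transposed` flag are hypothetical names, not the commit's C code, which builds a PyArrayObject instead.

```python
from array import array


class MatrixView:
    """Index a flat buffer as a 2-D matrix without copying it.

    Hypothetical helper for illustration only; it is not Theano's code.
    """

    def __init__(self, buffer, nrow, ncol, transposed=False):
        self.buffer = buffer  # shared reference, never copied
        if transposed:
            # C-contiguous (ncol x nrow): element (i, j) sits at i * nrow + j.
            self.shape = (ncol, nrow)
            self.strides = (nrow, 1)
        else:
            # F-contiguous (nrow x ncol): element (i, j) sits at i + j * nrow.
            self.shape = (nrow, ncol)
            self.strides = (1, nrow)

    def _offset(self, i, j):
        return i * self.strides[0] + j * self.strides[1]

    def __getitem__(self, index):
        i, j = index
        return self.buffer[self._offset(i, j)]

    def __setitem__(self, index, value):
        i, j = index
        self.buffer[self._offset(i, j)] = value


# A 2 x 3 matrix stored column by column: [[1, 3, 5], [2, 4, 6]].
data = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
plain = MatrixView(data, nrow=2, ncol=3)
flipped = MatrixView(data, nrow=2, ncol=3, transposed=True)

flipped[2, 0] = 50.0  # writes through to the shared buffer
```

Both views point at the same `data` object, so the write through `flipped` is visible through `plain` at the mirrored index. Only the shape and strides differ between the two layouts.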
    
    I also rewrote the matrix sum function so that it takes
    scalars that multiply each passed matrix before addition.
    The function now computes B = alpha*A + beta*B,
    with alpha and beta as the scalars (both set to 1 if we just want
    B = A + B). Thus, there is now only one iteration over A and B,
    in which A and B are each read once and B is modified once.
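
A minimal sketch of the single-pass update described above, written over flat Python sequences. The function name `scaled_sum_inplace` and its defaults are assumptions made for illustration; the commit's actual function is C code that is not shown here.

```python
def scaled_sum_inplace(a, b, alpha=1.0, beta=1.0):
    """Update b in place to alpha * a + beta * b in a single pass."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    for k in range(len(b)):
        # One read of a[k], one read of b[k], one write of b[k].
        b[k] = alpha * a[k] + beta * b[k]
    return b


a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
scaled_sum_inplace(a, b, alpha=2.0, beta=0.5)  # b is now [7.0, 14.0, 21.0]
```

Leaving `alpha` and `beta` at their default of 1 gives the plain sum B = A + B, and no temporary sequence is created in either case.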
Name
..
nnet
signal
tests
__init__.py
alt_gemm_common.c
alt_gemm_template.c
basic.py
blas.py
blas_c.py
blas_headers.py
blas_scipy.py
elemwise.py
elemwise_cgen.py
extra_ops.py
fft.py
fourier.py
inplace.py
io.py
nlinalg.py
opt.py
opt_uncanonicalize.py
raw_random.py
shared_randomstreams.py
sharedvar.py
slinalg.py
sort.py
subtensor.py
type.py
type_other.py
utils.py
var.py
xlogx.py