Commit a92d2350 authored by Frederic Bastien

Speed up gemv by working around scipy gemv slowness when the matrix is in C order (the default)

Parent 9f06dc38
@@ -2,7 +2,7 @@ Trunk sin last release
 ------
 * Sparse type is now supported by the shape op and the ShapeFeature optimizer work correctly with them.
 * fuse GpuElemwise more often(in the case where their is too many inputs that fusing all of them would bust the 256 bytes limits of parameter to gpu function)
+* Speed up gemv by a work around scipy gemv slowness when the matrix is in c order(the default)
 Theano 0.3 (2010-11-23)
 -----------------------
......
@@ -85,10 +85,15 @@ class Gemv(Op):
     def perform(self, node, inputs, out_storage):
         y, alpha, A, x, beta = inputs
         if _have_fblas:
-            if not self.inplace:
-                y = y.copy()
             gemv = _blas_gemv_fns[y.dtype]
-            out_storage[0][0] = gemv(alpha, A, x, beta, y, overwrite_y=self.inplace)
+            #Here I suppose that A is in c order. If we don't make it explicitly
+            # as fortran order, scipy 0.7.2 seam to create a copy in fortran
+            # order instead of just reshaping it and using the trans flag.
+            #If A is already in fortran order, make it in c order and using the
+            # trans flag don't seam to cause slowdown.
+            #out_storage[0][0] = gemv(alpha, A, x, beta, y, overwrite_y=self.inplace)
+            out_storage[0][0] = gemv(alpha, A.T, x, beta, y, overwrite_y=self.inplace, trans=True)
         else:
             out_storage[0][0] = numpy.asarray(
                     beta * y + alpha * numpy.dot(A, x)
......
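The workaround in the hunk above relies on the identity that, for a C-ordered `A`, the view `A.T` is already Fortran-contiguous, so calling gemv on `A.T` with the transpose flag computes the same `beta * y + alpha * A.dot(x)` without the wrapper needing to reorder the matrix. The following is a minimal sketch of that equivalence, not the Theano code itself: it assumes a current SciPy where `scipy.linalg.blas.dgemv` is available, and the array shapes and scalar values are made up for illustration. It checks correctness only and makes no timing claim.

```python
import numpy as np
from scipy.linalg.blas import dgemv

# Illustrative inputs (hypothetical sizes and values).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # NumPy default: C (row-major) order
x = rng.standard_normal(3)
y = rng.standard_normal(4)
alpha, beta = 2.0, 0.5

# Reference result, same formula as the non-BLAS fallback branch in the diff.
expected = beta * y + alpha * np.dot(A, x)

# Direct call: the wrapper expects Fortran order, so a C-ordered A
# may be copied before BLAS is invoked.
direct = dgemv(alpha, A, x, beta, y)

# Workaround from the commit: A.T is a Fortran-contiguous view of the same
# memory, and trans=1 asks BLAS to compute with (A.T)^T, which is A.
via_trans = dgemv(alpha, A.T, x, beta, y, trans=1)

# A.T shares A's buffer and is F-contiguous, so no reordering is required.
is_view_f_contiguous = A.T.flags["F_CONTIGUOUS"]
```

Both calls leave `y` untouched here because `overwrite_y` defaults to off; the commit passes `overwrite_y=self.inplace` to control that behaviour.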