testgroup / pytensor / Commits
Commit 954b07ab
authored May 05, 2011 by Ian Goodfellow
merged; parents: e06a39ea 5d1217d7
Showing 2 changed files with 27 additions and 15 deletions (+27 −15)
theano/misc/check_blas.py: +24 −14
theano/sandbox/cuda/cuda_ndarray.cu: +3 −1
theano/misc/check_blas.py
@@ -91,38 +91,48 @@ if __name__ == "__main__":
         Cpu tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB), Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
                     Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB), Xeon X5560(2.8Ghz, 12M L2 cache, 6.4GT/s QPI, hyper-threads enabled?)
                     Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled), Core i7 950(3.07GHz, hyper-threads enabled)
+                    Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)
         Lib tested:
             * numpy with ATLAS from distribution(FC9) package (1 thread)
             * manually compiled numpy and ATLAS with 2 threads
             * goto 1.26 with 1, 2, 4 and 8 threads.
+            * goto2 1.13 compiled with multiple thread enabled.

-                           Xeon   Xeon   Xeon  Core2  i7     i7      Xeon
-        lib/nb threads     E5345  E5430  E5450 E8500  930    950     X5560
+                           Xeon   Xeon   Xeon  Core2  i7     i7      Xeon   Xeon
+        lib/nb threads     E5345  E5430  E5450 E8500  930    950     X5560  X5550
+        numpy 1.3.0 blas                                                    775.92s
         numpy_FC9_atlas/1  39.2s  35.0s  30.7s 29.6s  21.5s  19.60s
         goto/1             18.7s  16.1s  14.2s 13.7s  16.1s  14.67s
         numpy_MAN_atlas/2  12.0s  11.6s  10.2s  9.2s   9.0s
         goto/2              9.5s   8.1s   7.1s  7.3s   8.1s   7.4s
         goto/4              4.9s   4.4s   3.7s    -    4.1s   3.8s
         goto/8              2.7s   2.4s   2.0s    -    4.1s   3.8s
         openblas/1         14.04s
         openblas/2          7.16s
         openblas/4          3.71s
         openblas/8          3.70s
-        mkl 11.0.083/1      7.97s
         mkl 10.2.2.025/1   13.7s
         mkl 10.2.2.025/2    7.6s
         mkl 10.2.2.025/4    4.0s
         mkl 10.2.2.025/8    2.0s
+        mkl 11.0.083/1      7.97s
+        goto2 1.13/1       14.37s
+        goto2 1.13/2        7.26s
+        goto2 1.13/4        3.70s
+        goto2 1.13/8        1.94s
+        goto2 1.13/16       3.16s

         Test time in float32 with cuda 3.0.14
         (cuda version 3.2RC and up are supposed to have faster gemm on the GTX4?? card)
-        cpu/cuda version
+        gpu/cuda version
         GTX580/3.2         0.20s
         GTX480/3.2         0.24s
         GTX480/3.0         0.27s
         GTX470/3.2         0.29s
+        M2070/3.2          0.32s
         GTX470/3.0         0.34s
         GTX285/3.0         0.40s
         GT220/3.2RC        5.15s
theano/sandbox/cuda/cuda_ndarray.cu
@@ -636,7 +636,9 @@ PyObject * CudaNdarray_Reshape(CudaNdarray * self, PyObject * shape)
     }
     if (rval_size==0)
     {
-        return CudaNdarray_NewDims(rval_nd, rval_dims);
+        PyObject * rval = CudaNdarray_NewDims(rval_nd, rval_dims);
+        free(rval_dims);
+        return rval;
     }
     if(CudaNdarray_is_c_contiguous(self))