Added gemm timming and small change to the printing to make it more readable.

上级 ed10bc14
......@@ -91,15 +91,19 @@ if __name__ == "__main__":
Cpu tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB), Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB), Xeon X5560(2.8Ghz, 12M L2 cache, 6.4GT/s QPI, hyper-threads enabled?)
Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled), Core i7 950(3.07GHz, hyper-threads enabled)
Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)
Lib tested:
* numpy with ATLAS from distribution(FC9) package (1 thread)
* manually compiled numpy and ATLAS with 2 threads
* goto 1.26 with 1, 2, 4 and 8 threads.
Xeon Xeon Xeon Core2 i7 i7 Xeon
lib/nb threads E5345 E5430 E5450 E8500 930 950 X5560
* goto2 1.13 compiled with multiple thread enabled.
Xeon Xeon Xeon Core2 i7 i7 Xeon Xeon
lib/nb threads E5345 E5430 E5450 E8500 930 950 X5560 X5550
numpy 1.3.0 blas 775.92s
numpy_FC9_atlas/1 39.2s 35.0s 30.7s 29.6s 21.5s 19.60s
goto/1 18.7s 16.1s 14.2s 13.7s 16.1s 14.67s
numpy_MAN_atlas/2 12.0s 11.6s 10.2s 9.2s 9.0s
......@@ -110,19 +114,25 @@ if __name__ == "__main__":
openblas/2 7.16s
openblas/4 3.71s
openblas/8 3.70s
mkl 11.0.083/1 7.97s
mkl 10.2.2.025/1 13.7s
mkl 10.2.2.025/2 7.6s
mkl 10.2.2.025/4 4.0s
mkl 10.2.2.025/8 2.0s
mkl 11.0.083/1 7.97s
goto2 1.13/1 14.37s
goto2 1.13/2 7.26s
goto2 1.13/4 3.70s
goto2 1.13/8 1.94s
goto2 1.13/16 3.16s
Test time in float32 with cuda 3.0.14
(cuda version 3.2RC and up are supposed to have faster gemm on the GTX4?? card)
cpu/cuda version
gpu/cuda version
GTX580/3.2 0.20s
GTX480/3.2 0.24s
GTX480/3.0 0.27s
GTX470/3.2 0.29s
M2070/3.2 0.32s
GTX470/3.0 0.34s
GTX285/3.0 0.40s
GT220/3.2RC 5.15s
......
Markdown 格式
0%
您添加了 0 到此讨论。请谨慎行事。
请先完成此评论的编辑!
注册 或者 后发表评论