testgroup / pytensor / Commits / 1192e645

Commit 1192e645 authored Jul 31, 2011 by James Bergstra

merge - conflict in cifarSC2011/introduction.txt

Parents: a60e4df2, 489b14c4

Showing 1 changed file with 120 additions and 81 deletions:
doc/cifarSC2011/introduction.txt (+120, −81)
@@ -9,29 +9,29 @@ Introduction

Background Questionnaire
------------------------
* Who has used Theano before?

  * What did you do with it?

* Who has used Python? numpy? scipy? matplotlib?
* Who has used IPython?

  * Who has used it as a distributed computing engine?

* Who has done C/C++ programming?
* Who has organized computation around a particular physical memory layout?
* Who has used a multidimensional array of >2 dimensions?
* Who has written a Python module in C before?
* Who has written a program to *generate* Python modules in C?
* Who has used a templating engine?
* Who has programmed a GPU before?

  * Using OpenGL / shaders?
@@ -43,7 +43,7 @@ Background Questionnaire

* Other?
* Who has used Cython?
Python in one slide
-------------------

@@ -51,15 +51,15 @@ Python in one slide

Features:

* General-purpose high-level OO interpreted language
* Emphasizes code readability
* Comprehensive standard library
* Dynamic type and memory management
* builtin types: int, float, str, list, dict, tuple, object

Syntax sample:
@@ -71,22 +71,22 @@ Syntax sample:

    def foo(b, c=3):        # function w default param c
        return a + b + c    # note scoping, indentation

    print b[1:3]            # slicing syntax

* List comprehension: ``[i+3 for i in range(10)]``
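The fragments above can be made self-contained; a minimal runnable sketch (Python 3 print syntax; ``a = 10`` is an illustrative value standing in for the part of the slide elided by the diff):

```python
a = 10                               # module-level name, visible inside foo

def foo(b, c=3):                     # c has a default value
    return a + b + c                 # a is found in the enclosing scope

b = [1, 2, 3, 4]
print(b[1:3])                        # slicing -> [2, 3]
b_squared = [b_i ** 2 for b_i in b]  # list comprehension
print(b_squared)                     # [1, 4, 9, 16]
print(foo(5))                        # 10 + 5 + 3 = 18
```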
Numpy in one slide
------------------

* Python floats are full-fledged objects on the heap

  * Not suitable for high-performance computing!

* Numpy provides an N-dimensional numeric array in Python

  * Perfect for high-performance computing.

* Numpy provides:

  * elementwise computations
@@ -94,7 +94,7 @@ Numpy in one slide

* pseudorandom numbers from many distributions
* Scipy provides lots more, including:

  * more linear algebra
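The elementwise and pseudorandom facilities mentioned above fit in a few lines; a minimal sketch (array shape and seed are arbitrary, assuming numpy is installed):

```python
import numpy as np

rng = np.random.RandomState(0)           # seeded pseudorandom generator
x = rng.uniform(-1.0, 1.0, size=(3, 4))  # draws from a uniform distribution

y = np.tanh(x) + x ** 2                  # elementwise: applied entry-by-entry
print(y.shape)                           # (3, 4)
```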
@@ -161,11 +161,11 @@ Training an MNIST-ready classification neural network in pure numpy might look l
What's missing?
---------------

* Non-lazy evaluation (required by Python) hurts performance
* Numpy is bound to the CPU
* Numpy lacks symbolic or automatic differentiation

Here's how the algorithm above looks in Theano, and it runs 15 times faster if
you have a GPU (I'm skipping some dtype details, which we'll come back to):
@@ -210,43 +210,52 @@ you have GPU (I'm skipping some dtype-details which we'll come back to):
Theano in one slide
-------------------

* High-level domain-specific language tailored to numeric computation
* Compiles most common expressions to C for CPU and GPU.
* Limited expressivity means lots of opportunities for expression-level optimizations

  * No function call -> global optimization
  * Strongly typed -> compiles to machine instructions
  * Array oriented -> parallelizable across cores

* Support for looping and branching in expressions
* Expression substitution optimizations automatically draw
  on many backend technologies for best performance.

  * FFTW, MKL, ATLAS, Scipy, Cython, CUDA
  * Slower fallbacks always available

* Automatic differentiation
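Automatic differentiation is the point worth dwelling on: in Theano, ``T.grad`` derives gradients symbolically. As a plain-numpy illustration of what that replaces (a hand-written gradient checked against finite differences; the function and names here are illustrative, not Theano's API):

```python
import numpy as np

def f(w):                     # scalar loss: sum of tanh(w_i)^2
    return np.sum(np.tanh(w) ** 2)

def grad_f(w):                # hand-derived gradient: 2*tanh(w)*(1 - tanh(w)^2)
    t = np.tanh(w)
    return 2.0 * t * (1.0 - t ** 2)

w = np.array([0.1, -0.5, 2.0])
eps = 1e-6
# central finite differences along each coordinate
numeric = np.array([(f(w + eps * e) - f(w - eps * e)) / (2 * eps)
                    for e in np.eye(len(w))])
print(np.allclose(grad_f(w), numeric, atol=1e-5))  # True
```

With symbolic differentiation, the ``grad_f`` step (and the chance to get it wrong) disappears.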
Project status
--------------

* Mature: theano has been developed and used since January 2008 (3.5 yrs old)
* Driven over 40 research papers in the last few years
* Core technology for a funded Silicon-Valley startup
* Good user documentation
* Active mailing list with participants from outside our lab
* Many contributors (some from outside our lab)
* Used to teach IFT6266 for two years
* Used for research at Google and Yahoo.
* Unofficial RPMs for Mandriva
* Downloads (on June 8 2011, since last January): PyPI 780, MLOSS: 483, Assembla (`bleeding edge` repository): unknown
Why scripting for GPUs?

@@ -254,18 +263,23 @@ Why scripting for GPUs ?
They *Complement each other*:

* GPUs are everything that scripting/high level languages are not

  * Highly parallel
  * Very architecture-sensitive
  * Built for maximum FP/memory throughput
  * So hard to program that meta-programming is easier.

* CPU: largely restricted to control

  * Optimized for sequential code and low latency (rather than high throughput)
  * Tasks (1000/sec)
  * Scripting fast enough

Best of both: scripted CPU invokes JIT-compiled kernels on GPU.
@@ -273,28 +287,41 @@ Best of both: scripted CPU invokes JIT-compiled kernels on GPU.

How Fast are GPUs?
------------------
* Theory

  * Intel Core i7 980 XE (107 Gf/s float64) 6 cores
  * NVIDIA C2050 (515 Gf/s float64, 1 Tf/s float32) 480 cores
  * NVIDIA GTX580 (1.5 Tf/s float32) 512 cores
  * GPUs are faster, cheaper, more power-efficient

* Practice (our experience)

  * Depends on algorithm and implementation!
  * Reported speed improvements over CPU in lit. vary *widely* (.01x to 1000x)
  * Matrix-matrix multiply speedup: usually about 10-20x.
  * Convolution speedup: usually about 15x.
  * Elemwise speedup: slower or up to 100x (depending on operation and layout)
  * Sum: can be faster or slower depending on layout.
  * Benchmarking is delicate work...

    * How to control quality of implementation?
    * How much time was spent optimizing CPU vs GPU code?

  * Theano goes up to 100x faster on GPU, partly because it uses only one CPU core

    * Theano can be linked with multi-core capable BLAS (GEMM and GEMV)

  * If you see speedup > 100x, the benchmark is probably not fair.
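The layout point is easy to see even on the CPU; a minimal sketch with numpy and ``timeit`` (sizes and repeat counts are arbitrary, and absolute timings vary by machine, which is exactly why benchmarking is delicate):

```python
import numpy as np
import timeit

n = 1000
a_c = np.ones((n, n), order='C')   # row-major (C) layout
a_f = np.asfortranarray(a_c)       # same values, column-major (Fortran) layout

# Summing down columns strides through memory differently in each layout,
# so the same logical operation can run at noticeably different speeds.
t_c = timeit.timeit(lambda: a_c.sum(axis=0), number=50)
t_f = timeit.timeit(lambda: a_f.sum(axis=0), number=50)

print(np.allclose(a_c.sum(axis=0), a_f.sum(axis=0)))  # True: identical results
print("C order: %.4fs  F order: %.4fs" % (t_c, t_f))
```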
Software for Directly Programming a GPU

@@ -302,15 +329,27 @@ Software for Directly Programming a GPU

Theano is a meta-programmer, so it doesn't really count.
* CUDA: C extension by NVIDIA

  * Vendor-specific
  * Numeric libraries (BLAS, RNG, FFT) maturing.

* OpenCL: multi-vendor version of CUDA

  * More general, standardized
  * Fewer libraries, less adoption.

* PyCUDA: Python bindings to the CUDA driver interface

  * Python interface to CUDA
  * Memory management of GPU objects
  * Compilation of code for the low-level driver
  * Makes it easy to do GPU meta-programming from within Python

* PyOpenCL: PyCUDA for OpenCL