testgroup / pytensor · Commits

Commit 54bc197e
Authored July 29, 2011 by James Bergstra
Parent: add20871

revising cifar10SC intro

Showing 2 changed files with 212 additions and 106 deletions:

* doc/cifarSC2011/boot_camp_overview.txt (+25 -18)
* doc/cifarSC2011/introduction.txt (+187 -88)
doc/cifarSC2011/boot_camp_overview.txt
@@ -15,19 +15,18 @@ Day 1

* Show of hands - what is your background?
* Overview/Motivation
* python/numpy crash course
* Beginning Theano
* Example with recent ML models (DLT)

.. :
    day 1:
    I think that I could cover these 2 pages:

    * http://deeplearning.net/software/theano/hpcs2011_tutorial/introduction.html
    * http://deeplearning.net/software/theano/hpcs2011_tutorial/theano.html

    That includes:

    simple example
    linear regression example with shared var
    theano flags
@@ -39,27 +38,35 @@ That includes:

Day 2
-----

* Loop/Condition in Theano (10-20 minutes)
* Propose/discuss projects
* Form groups and start projects!

Day 3
-----

* Advanced Theano (30 minutes)
* Debugging, profiling, compilation pipeline
* Projects / General hacking / code-sprinting.

Day 4
-----

* *You choose* (we can split the group)
* Extending Theano

  * How to write an Op
  * How to use pycuda code in Theano

* Projects / General hacking / code-sprinting.

Note - the schedule here is a guideline.
We can adapt it in response to developments in the hands-on work.
The point is for you to learn something about the practice of machine
learning.
doc/cifarSC2011/introduction.txt
@@ -51,18 +51,19 @@ Python in one slide

Features:

* General-purpose high-level OO interpreted language
* Emphasizes code readability
* Comprehensive standard library
* Dynamic type and memory management
* Builtin types: int, float, str, list, dict, tuple, object

Syntax sample:

.. code-block:: python

    a = {'a': 5, 'b': None} # dictionary of two elements
    b = [1,2,3] # list of three int literals
@@ -71,112 +72,183 @@ Language things:

        return a + b + c # note scoping, indentation

* List comprehension: ``[i+3 for i in range(10)]``
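A few more one-liners covering the bullets above (an illustrative sketch added
for this revision, not part of the original slides):

.. code-block:: python

    point = (1.5, 2.5)                       # tuple
    name = 'Theano'                          # str
    squares = [i*i for i in range(10)]       # list comprehension
    lookup = {'var1': 'value1', 'var2': 42}  # dict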
Numpy in one slide
------------------

* Python floats are full-fledged objects on the heap

  * Not suitable for high-performance computing!

* Numpy provides an N-dimensional numeric array in Python

  * Perfect for high-performance computing.

* Numpy provides:

  * elementwise computations
  * linear algebra, Fourier transforms
  * pseudorandom numbers from many distributions

* Scipy provides lots more, including:

  * more linear algebra
  * solvers and optimization algorithms
  * matlab-compatible I/O
  * I/O and signal processing for images and audio

Here are the properties of numpy arrays that you really need to know.

.. code-block:: python

    import numpy as np
    a = np.random.rand(3, 4, 5)
    a32 = a.astype('float32')
    a.ndim    # int: 3
    a.shape   # tuple: (3, 4, 5)
    a.size    # int: 60
    a.dtype   # np.dtype object: 'float64'
    a32.dtype # np.dtype object: 'float32'

These arrays can be combined with numeric operators and standard mathematical
functions. Numpy has XXX great documentation XXX.
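As a minimal sketch of what "elementwise computations" and broadcasting look
like in practice (array names here are illustrative, not from the slides):

.. code-block:: python

    x = np.random.rand(4, 5)  # 4x5 matrix
    y = np.random.rand(1, 5)  # row vector

    z = np.tanh(x * y + 1)    # elementwise; y is broadcast across the rows
    z.shape                   # (4, 5)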
Training an MNIST-ready classification neural network in pure numpy might look like this:

.. code-block:: python

    x = np.load('data_x.npy')
    y = np.load('data_y.npy')
    batchsize = 100  # minibatch size (value assumed)
    lr = 0.01        # learning rate (value assumed)
    w = np.random.normal(loc=0, scale=.1, size=(784, 500))
    b = np.zeros(500)
    v = np.zeros((500, 10))
    c = np.zeros(10)

    for i in xrange(1000):
        x_i = x[i*batchsize:(i+1)*batchsize]
        y_i = y[i*batchsize:(i+1)*batchsize]

        # forward pass
        hidin = np.dot(x_i, w) + b
        hidout = np.tanh(hidin)
        outin = np.dot(hidout, v) + c
        outout = (np.tanh(outin) + 1) / 2.0

        # backward pass: gradients by hand
        g_outout = outout - y_i
        err = 0.5 * np.sum(g_outout**2)
        g_outin = g_outout * outout * (1.0 - outout)
        g_hidout = np.dot(g_outin, v.T)
        g_hidin = g_hidout * (1 - hidout**2)

        # gradient-descent updates
        b -= lr * np.sum(g_hidin, axis=0)
        c -= lr * np.sum(g_outin, axis=0)
        w -= lr * np.dot(x_i.T, g_hidin)
        v -= lr * np.dot(hidout.T, g_outin)
What's missing?
---------------

* Non-lazy evaluation (required by Python) hurts performance
* Numpy is bound to the CPU
* Numpy lacks symbolic or automatic differentiation

Here's how the algorithm above looks in Theano, and it runs 15 times faster if
you have a GPU (I'm skipping some dtype details which we'll come back to):

.. code-block:: python

    import theano as T
    import theano.tensor as TT
    from theano import function

    x = np.load('data_x.npy')
    y = np.load('data_y.npy')

    # symbol declarations
    sx = TT.matrix()
    sy = TT.matrix()
    w = T.shared(np.random.normal(loc=0, scale=.1, size=(784, 500)))
    b = T.shared(np.zeros(500))
    v = T.shared(np.zeros((500, 10)))
    c = T.shared(np.zeros(10))

    # symbolic expression-building
    outout = TT.tanh(TT.dot(TT.tanh(TT.dot(sx, w) + b), v) + c)
    err = 0.5 * TT.sum((outout - sy)**2)
    gw, gb, gv, gc = TT.grad(err, [w, b, v, c])

    # compile a fast training function
    train = function([sx, sy], err,
        updates={
            w: w - lr * gw,
            b: b - lr * gb,
            v: v - lr * gv,
            c: c - lr * gc})

    # now do the computations
    for i in xrange(1000):
        x_i = x[i*batchsize:(i+1)*batchsize]
        y_i = y[i*batchsize:(i+1)*batchsize]
        err_i = train(x_i, y_i)
Theano in one slide
-------------------

* High-level domain-specific language tailored to numeric computation
* Compiles most common expressions to C for CPU and GPU.
* Limited expressivity means lots of opportunities for expression-level optimizations

  * No function call -> global optimization
  * Strongly typed -> compiles to machine instructions
  * Array oriented -> parallelizable across cores

* Expression substitution optimizations automatically draw
  on many backend technologies for best performance

  * FFTW, MKL, ATLAS, Scipy, Cython, CUDA
  * Slower fallbacks always available

* It used to have no/poor support for internal looping and conditional
  expressions, but these are now quite usable; see the sketch below.
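As a small sketch of those looping and conditional constructs (written for
this revision, not taken from the slides; assumes a Theano recent enough to
have ``scan``):

.. code-block:: python

    import theano
    import theano.tensor as TT

    v = TT.vector()

    # elementwise conditional: clip negative entries to zero
    clip = theano.function([v], TT.switch(v < 0, 0, v))

    # symbolic loop: running sum over the elements of v
    sums, updates = theano.scan(
        fn=lambda x_t, acc: acc + x_t,  # one step: add element to accumulator
        sequences=v,
        outputs_info=TT.zeros_like(v[0]))
    running_sum = theano.function([v], sums)

    clip([-1., 2., -3.])       # [ 0., 2., 0.]
    running_sum([1., 2., 3.])  # [ 1., 3., 6.]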
Project status
--------------

* Mature: Theano has been developed and used since January 2008 (3.5 yrs old)
* Driven over 40 research papers in the last few years
* Core technology for a funded Silicon-Valley startup
* Good user documentation
* Active mailing list with participants from outside our lab
* Many contributors (some from outside our lab)
* Used to teach IFT6266 for two years
* Used for research at Google and Yahoo.
* Unofficial RPMs for Mandriva
* Downloads (on June 8, 2011, since last January): PyPI 780, MLOSS: 483, Assembla (`bleeding edge` repository): unknown

Why scripting for GPUs?
-----------------------

They *complement each other*:
@@ -185,31 +257,58 @@ They *Complement each other*:
- Highly parallel
- Very architecture-sensitive
- Built for maximum FP/memory throughput
- So hard to program that meta-programming is easier.

- CPU: largely restricted to control

  - Optimized for sequential code and low latency (rather than high throughput)
  - Tasks (1000/sec)
  - Scripting fast enough
Best of both: scripted CPU invokes JIT-compiled kernels on GPU.

How Fast are GPUs?
------------------

- Theory:

  - Intel Core i7 980 XE (107 Gf/s float64) 6 cores
  - NVIDIA C2050 (515 Gf/s float64, 1 Tf/s float32) 480 cores
  - NVIDIA GTX580 (1.5 Tf/s float32) 512 cores
  - GPUs are faster, cheaper, more power-efficient

- Practice:

  - Depends on algorithm and implementation!
  - Reported speed improvements over CPU in the literature vary *widely* (.01x to 1000x)
  - Matrix-matrix multiply speedup: usually about 10-20x
  - Convolution speedup: usually about 15x
  - Elemwise speedup: slower or up to 100x (depending on operation and layout)
  - Sum: can be faster or slower depending on layout

- Benchmarking is delicate work...

  - How to control quality of implementation?
  - How much time was spent optimizing CPU vs GPU code?
  - Theano goes up to 100x faster on GPU because it uses only one CPU core

    - Theano can be linked with multi-core capable BLAS (GEMM and GEMV)

  - If you see speedup > 100x, the benchmark is probably not fair.

Software for Directly Programming a GPU
---------------------------------------

Theano is a meta-programmer, so it doesn't really count.

- CUDA: C extension by NVIDIA

  - Vendor-specific
  - Numeric libraries (BLAS, RNG, FFT) maturing

- OpenCL: multi-vendor version of CUDA

  - More general, standardized
  - Fewer libraries, less adoption

- PyCUDA: python bindings to the CUDA driver interface

  - Python interface to CUDA
  - Memory management of GPU objects
  - Compilation of code for the low-level driver
  - Makes it easy to do GPU meta-programming from within Python

- PyOpenCL: PyCUDA for OpenCL
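For a feel of what "scripted CPU invokes JIT-compiled kernels on GPU" looks
like, here is a minimal sketch in the spirit of PyCUDA's introductory example
(added for this revision; the kernel and names are illustrative):

.. code-block:: python

    import numpy as np
    import pycuda.autoinit            # initializes the CUDA driver
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    # compile a small CUDA kernel at runtime, from Python
    mod = SourceModule("""
    __global__ void scale(float *a, float k)
    {
        int i = threadIdx.x + blockIdx.x * blockDim.x;
        a[i] *= k;
    }
    """)
    scale = mod.get_function("scale")

    a = np.random.randn(256).astype(np.float32)
    # drv.InOut copies the array to the GPU and back after the launch
    scale(drv.InOut(a), np.float32(2.0), block=(256, 1, 1), grid=(1, 1))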