Commit 489b14c4 authored by James Bergstra

cifarSC - changed indentation levels in rst

Parent ffb29872
Background Questionnaire
------------------------
* Who has used Theano before?

  * What did you do with it?

* Who has used Python? numpy? scipy? matplotlib?
* Who has used iPython?

  * Who has used it as a distributed computing engine?

* Who has done C/C++ programming?
* Who has organized computation around a particular physical memory layout?
* Who has used a multidimensional array of >2 dimensions?
* Who has written a Python module in C before?

  * Who has written a program to *generate* Python modules in C?

* Who has used a templating engine?
* Who has programmed a GPU before?

  * Using OpenGL / shaders?
  * Other?

* Who has used Cython?
Python in one slide
-------------------
Features:

* General-purpose high-level OO interpreted language
* Emphasizes code readability
* Comprehensive standard library
* Dynamic type and memory management
* builtin types: int, float, str, list, dict, tuple, object
Syntax sample::

    a = 1                              # module-level variable

    def foo(b, c=3):                   # function w/ default param c
        return a + b + c               # note scoping, indentation

    b = [1, 2, 3, 4]
    b_squared = [b_i**2 for b_i in b]  # list comprehension
    print(b[1:3])                      # slicing syntax
Numpy in one slide
------------------
* Python floats are full-fledged objects on the heap

  * Not suitable for high-performance computing!

* Numpy provides an N-dimensional numeric array in Python

  * Perfect for high-performance computing.

* Numpy provides:

  * elementwise computations
  * pseudorandom numbers from many distributions

* Scipy provides lots more, including:

  * more linear algebra
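The bullets above can be made concrete in a few lines of numpy (a minimal illustration, not from the original slides):

```python
import numpy as np

# Elementwise math, linear algebra, and pseudorandom numbers on N-d arrays.
a = np.arange(6, dtype=np.float64).reshape(2, 3)   # 2x3 array
c = a * 2.0 + 1.0                                  # elementwise: no Python-level loop
m = a @ a.T                                        # matrix product -> 2x2
r = np.random.default_rng(0).normal(size=(2, 3))   # pseudorandom draws

print(c.sum(), m.shape, r.shape)
```

Every operation above dispatches to compiled loops over contiguous memory, which is what makes numpy suitable for numeric work where plain Python floats are not.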
What's missing?
---------------
* Non-lazy evaluation (required by Python) hurts performance
* Numpy is bound to the CPU
* Numpy lacks symbolic or automatic differentiation
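To see why the last point matters, here is a toy logistic-regression fit in pure numpy (an illustrative sketch, not the tutorial's MNIST code). The gradient line has to be derived on paper; numpy cannot produce it for us:

```python
import numpy as np

# Hand-derived gradient descent for logistic regression in pure numpy.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                      # toy inputs
true_w = np.array([1.0, -2.0, 0.5, 0.0, 1.0])
y = (X @ true_w > 0).astype(float)                 # toy binary labels

w = np.zeros(5)
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))             # sigmoid predictions
    grad = X.T @ (p - y) / len(y)                  # d(cross-entropy)/dw, by hand
    w -= lr * grad                                 # numpy cannot derive this for us

p = 1.0 / (1.0 + np.exp(-(X @ w)))
print(((p > 0.5) == y).mean())                     # training accuracy
```

Every change to the model means re-deriving and re-coding `grad` by hand, which is exactly the gap symbolic differentiation fills.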
Here's how the algorithm above looks in Theano; it runs 15 times faster if
you have a GPU (I'm skipping some dtype details, which we'll come back to):
Theano in one slide
-------------------
* High-level domain-specific language tailored to numeric computation
* Compiles most common expressions to C for CPU and GPU.
* Limited expressivity means lots of opportunities for expression-level optimizations

  * No function call -> global optimization
  * Strongly typed -> compiles to machine instructions
  * Array oriented -> parallelizable across cores

* Expression substitution optimizations automatically draw on many
  backend technologies for best performance.

  * FFTW, MKL, ATLAS, Scipy, Cython, CUDA
  * Slower fallbacks always available

* It used to have no/poor support for internal looping and conditional
  expressions, but these are now quite usable.
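The idea behind expression substitution can be sketched in a few lines of plain Python (a toy, nothing like Theano's real machinery): build an expression graph first, rewrite it, and only then evaluate.

```python
# Toy symbolic expression graph with one rewrite rule: x * 1 -> x.
class Expr:
    def __add__(self, other):
        return Add(self, to_expr(other))
    def __mul__(self, other):
        return Mul(self, to_expr(other))

class Const(Expr):
    def __init__(self, value):
        self.value = value
    def eval(self, env):
        return self.value

class Var(Expr):
    def __init__(self, name):
        self.name = name
    def eval(self, env):
        return env[self.name]

class Add(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def eval(self, env):
        return self.a.eval(env) + self.b.eval(env)

class Mul(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def eval(self, env):
        return self.a.eval(env) * self.b.eval(env)

def to_expr(v):
    return v if isinstance(v, Expr) else Const(v)

def simplify(e):
    # Apply x * 1 -> x bottom-up; one example of an expression-level rewrite.
    if isinstance(e, (Add, Mul)):
        a, b = simplify(e.a), simplify(e.b)
        if isinstance(e, Mul):
            if isinstance(b, Const) and b.value == 1:
                return a
            if isinstance(a, Const) and a.value == 1:
                return b
        return type(e)(a, b)
    return e

x = Var("x")
graph = x * 1 + 2            # building the graph computes nothing yet
optimized = simplify(graph)  # rewrite before evaluation
print(optimized.eval({"x": 5}))  # 7
```

Because the whole computation is visible as a graph before anything runs, rewrites like this can be applied globally, which is what the "limited expressivity" bullet is trading for.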
Project status
--------------
* Mature: Theano has been developed and used since January 2008 (3.5 yrs old)
* Has driven over 40 research papers in the last few years
* Core technology for a funded Silicon Valley startup
* Good user documentation
* Active mailing list with participants from outside our lab
* Many contributors (some from outside our lab)
* Used to teach IFT6266 for two years
* Used for research at Google and Yahoo
* Unofficial RPMs for Mandriva
* Downloads (on June 8 2011, since last January): PyPI 780, MLOSS: 483, Assembla (`bleeding edge` repository): unknown
Why scripting for GPUs?
-----------------------

They *complement each other*:
* GPUs are everything that scripting/high-level languages are not

  * Highly parallel
  * Very architecture-sensitive
  * Built for maximum FP/memory throughput
  * So hard to program that meta-programming is easier.

* CPU: largely restricted to control

  * Optimized for sequential code and low latency (rather than high throughput)
  * Tasks (1000/sec)
  * Scripting fast enough
Best of both: scripted CPU invokes JIT-compiled kernels on GPU.
How Fast are GPUs?
------------------
* Theory

  * Intel Core i7 980 XE (107 Gf/s float64), 6 cores
  * NVIDIA C2050 (515 Gf/s float64, 1 Tf/s float32), 480 cores
  * NVIDIA GTX580 (1.5 Tf/s float32), 512 cores
  * GPUs are faster, cheaper, more power-efficient

* Practice

  * Depends on algorithm and implementation!
  * Reported speed improvements over CPU in the literature vary *widely* (0.01x to 1000x)
  * Matrix-matrix multiply speedup: usually about 10-20x
  * Convolution speedup: usually about 15x
  * Elemwise speedup: slower, or up to 100x (depending on operation and layout)
  * Sum: can be faster or slower depending on layout
  * Benchmarking is delicate work...

    * How to control quality of implementation?
    * How much time was spent optimizing CPU vs GPU code?
    * Theano goes up to 100x faster on GPU because it uses only one CPU core
    * Theano can be linked with multi-core capable BLAS (GEMM and GEMV)
    * If you see speedup > 100x, the benchmark is probably not fair.
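As a small illustration of why benchmarking is delicate, here is a minimal CPU-side timing sketch (an illustration only; warm-up, averaging, dtype, and BLAS threading all move the resulting number):

```python
import time
import numpy as np

# Time a float32 matrix-matrix multiply and convert to Gf/s.
n = 256
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b                                  # warm-up: BLAS init, caches
t0 = time.perf_counter()
reps = 10
for _ in range(reps):
    c = a @ b
dt = (time.perf_counter() - t0) / reps
gflops = 2 * n ** 3 / dt / 1e9         # ~2*n^3 flops per multiply
print(f"{gflops:.1f} Gf/s float32 on this CPU")
```

Drop the warm-up call or run a single repetition and the reported figure changes substantially, which is exactly why published CPU-vs-GPU speedups vary so widely.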
Software for Directly Programming a GPU
---------------------------------------

Theano is a meta-programmer, so it doesn't really count.
* CUDA: C extension by NVIDIA

  * Vendor-specific
  * Numeric libraries (BLAS, RNG, FFT) maturing

* OpenCL: multi-vendor version of CUDA

  * More general, standardized
  * Fewer libraries, less adoption

* PyCUDA: Python bindings to the CUDA driver interface

  * Python interface to CUDA
  * Memory management of GPU objects
  * Compilation of code for the low-level driver
  * Makes it easy to do GPU meta-programming from within Python

* PyOpenCL: PyCUDA for OpenCL