.. _omlw2014_libgpuarray:
***********
libgpuarray
***********
Why a common GPU ndarray?
-------------------------
- Currently there are at least 4 different GPU array data structures in use by Python packages
- CudaNdarray (Theano), GPUArray (PyCUDA), CUDAMatrix (cudamat), GPUArray (PyOpenCL), ...
- There are even more if we include other languages
- All of them are a subset of the functionality of ``numpy.ndarray`` on the GPU
- Lots of duplicated effort
- GPU code is harder/slower to write **correctly** and **fast** than CPU/Python code
- Lack of a common array API makes it harder to port/reuse code
- Also harder to find/distribute code
- Divides development work
Design Goals
------------
- Make it VERY similar to ``numpy.ndarray``
- Be compatible with both CUDA and OpenCL
- Have the base object accessible from C to allow collaboration with more projects, across high-level languages
- We want people from C, C++, Lua, Ruby, R, ... to all use the same base GPU N-dimensional array
Final Note
----------
- Usable directly, but not all functionality is implemented yet.
- It is the next GPU array container for Theano and is already working (not all functionality is available yet)
- Mailing list: http://lists.tiker.net/listinfo/gpundarray
.. _omlw2014_index:
======================================================
Theano, Pylearn2, libgpuarray Presentation @ OMLW 2014
======================================================
August 22, 2014, New York University, US.
By Frédéric Bastien and Bart van Merriënboer. University of Montréal, Canada.
Theano, Pylearn2 and libgpuarray form a software stack for machine learning.
It complements the Python numeric/scientific software stack (e.g. NumPy, SciPy,
scikits, matplotlib, PIL.)
Theano
======
Theano is software for evaluating and manipulating complicated array
expressions.
What does it do?
* aggressive expression optimizations,
* automatic GPU use,
* automatic symbolic differentiation, Jacobian and Hessian computation,
  and R/L operators (for Hessian-free optimization).
The design and feature set have been driven by machine learning research
at the University of Montreal (the groups of Yoshua Bengio, Pascal Vincent,
Aaron Courville and Roland Memisevic).
The result is a very good library for doing research in deep
learning and neural network training, and a flexible framework for
many other models and algorithms in machine learning more generally.
It has proven to be useful for implementing:
- linear and nonlinear neural network classifiers
- including Maxout, Dropout
- convolutional models
- Energy models: RBM, DBN, GRBM, ssRBM, AIS
- Auto-encoders: DAE, CAE
- GP regression
- sparse coding
- recurrent neural networks, echo state, (HMM?) TODO
- online and batch learning and optimization
- Even SVM!
As people's needs change this list will grow, but Theano is built
around vector, matrix, and tensor expressions. It also supports sparse matrices.
Pylearn2
========
Pylearn2 is undergoing rapid development. Don't expect a clean
road without bumps! It is made for machine learning
practitioners and researchers first.
Pylearn2 is a machine learning library. Most of its functionality is
built on top of Theano. This means you can write Pylearn2 plugins (new
models, algorithms, etc) using mathematical expressions, and Theano
will optimize and stabilize those expressions for you, and compile
them to a backend of your choice (CPU or GPU).
Pylearn2 Vision
---------------
TODO: Should we split this in two parts: what is done, and what is the vision not yet done?
* Researchers **add features as they need them**. We avoid getting bogged down by
too much top-down planning in advance.
* A machine learning toolbox for **easy scientific experimentation**.
* All models/algorithms published by the LISA lab should have reference
implementations in Pylearn2. TODO REMOVE???
* Pylearn2 **may wrap other libraries** such as scikits.learn when this is practical
* Pylearn2 **differs from scikits.learn** in that Pylearn2 aims to provide great
flexibility and make it possible for a researcher to do almost anything,
while **scikits.learn aims to work as a "black box"**.
* **Dataset interface** for vectors, images, video, ... TODO (DO WE HAVE VIDEO?)
* A small framework covering everything needed for typical MLP/RBM/SDA/convolution
  experiments. (TODO: I think I would remove this)
* **Easy reuse of sub-components** of Pylearn2.
* Using one sub-component of the library does not force you to use / learn to
  use all of the other sub-components. TODO remove?
* Support cross-platform serialization of learned models. (TODO: I think this isn't done)
* Remain approachable enough to be used in the classroom
libgpuarray
===========
Make a common GPU ndarray (vector, matrix or n-dimensional array) that can be
reused by all projects. It supports CUDA and OpenCL.
Motivation
----------
* Currently there are at least 6 different GPU arrays in Python
* CudaNdarray (Theano), GPUArray (PyCUDA), CUDAMatrix (cudamat), GPUArray (PyOpenCL), Clyther, Copperhead, ...
* There are even more if we include other languages.
* They are incompatible
* None have the same properties and interface.
* All of them are a subset of ``numpy.ndarray`` on the GPU!
Design Goals
------------
* Have the base object in C to allow collaboration with more projects.
* We want people from C, C++, Ruby, R, ... to all use the same base GPU ndarray.
* Be compatible with both CUDA and OpenCL.
* Not too simple (don't support only matrices).
* But still make it easy to write new code that supports only a few memory layouts.
* This eases the development of new code.
Contents
========
.. toctree::

   introduction
   theano
   pylearn2
   gpundarray
   sharing
.. _omlw2014_Introduction:
************
Introduction
************
Python in one slide
-------------------
* General-purpose high-level **OO interpreted language**
* Emphasizes **code readability**
* Comprehensive standard library
* Dynamic typing and memory management
* Built-in types: int, float, str, list, dict, tuple, object
* Slow execution
* Popular in **web-dev** and **scientific communities**
NumPy in one slide
------------------
* Python floats are full-fledged objects on the heap
* Not suitable for high-performance computing!
* NumPy provides an N-dimensional numeric array in Python
* Perfect for high-performance computing.
* Slices return views (no copy)
* NumPy provides
* elementwise computations
* linear algebra, Fourier transforms
* pseudorandom numbers from many distributions
* SciPy provides lots more, including
* more linear algebra
* solvers and optimization algorithms
* matlab-compatible I/O
* I/O and signal processing for images and audio
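A few of these features in action (a quick NumPy-only sketch; the matrix values are arbitrary):

```python
import numpy as np

rng = np.random.RandomState(42)      # pseudorandom numbers, many distributions
A = rng.rand(3, 3) + 3 * np.eye(3)   # a well-conditioned random matrix
b = rng.rand(3)

x = np.linalg.solve(A, b)            # linear algebra: solve A @ x == b
spectrum = np.fft.fft(np.ones(4))    # Fourier transform of a constant signal

assert np.allclose(A.dot(x), b)      # the solver really inverts the system
print(spectrum.real)                 # all energy in the DC bin: [4. 0. 0. 0.]
```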
.. code-block:: python

    ##############################
    # Properties of NumPy arrays
    # that you really need to know
    ##############################
    import numpy as np           # import can rename
    a = np.random.rand(3, 4, 5)  # random generators
    a32 = a.astype('float32')    # arrays are strongly typed
    a.ndim                       # int: 3
    a.shape                      # tuple: (3, 4, 5)
    a.size                       # int: 60
    a.dtype                      # np.dtype object: 'float64'
    a32.dtype                    # np.dtype object: 'float32'
    b = a[1]                     # indexing returns a view, not a copy,
    b[1, 1] = 10                 # so assigning to it changes the
    assert a[1, 1, 1] == 10      # original array
Arrays can be combined with numeric operators, standard mathematical
functions. NumPy has great `documentation <http://docs.scipy.org/doc/numpy/reference/>`_.
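For instance, operators and standard functions apply elementwise and broadcast across compatible shapes (a small illustrative sketch):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]
b = np.array([10., 20., 30.])     # shape (3,)

# Elementwise operators broadcast b across each row of a
c = a + b                         # shape (2, 3)
d = np.sin(a) * 2                 # math functions also work elementwise

print(c[0])                       # [10. 21. 32.]
print(c[1])                       # [13. 24. 35.]
```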
What's missing?
---------------
* Non-lazy evaluation (required by Python) hurts performance
* NumPy is bound to the CPU
* NumPy lacks symbolic or automatic differentiation
A quick look at a small example:
.. code-block:: python

    #########################
    # Theano for Training a
    # Neural Network on MNIST
    #########################
    import numpy as np
    import theano
    import theano.tensor as tensor

    x = np.load('data_x.npy')
    y = np.load('data_y.npy')

    # symbol declarations
    sx = tensor.matrix()
    sy = tensor.matrix()
    w = theano.shared(np.random.normal(loc=0, scale=.1,
                                       size=(784, 500)))
    b = theano.shared(np.zeros(500))
    v = theano.shared(np.zeros((500, 10)))
    c = theano.shared(np.zeros(10))

    # symbolic expression-building
    hid = tensor.tanh(tensor.dot(sx, w) + b)
    out = tensor.tanh(tensor.dot(hid, v) + c)
    err = 0.5 * tensor.sum((out - sy) ** 2)
    gw, gb, gv, gc = tensor.grad(err, [w, b, v, c])

    # compile a fast training function
    lr = 0.01  # learning rate
    train = theano.function([sx, sy], err,
                            updates={
                                w: w - lr * gw,
                                b: b - lr * gb,
                                v: v - lr * gv,
                                c: c - lr * gc})

    # now do the computations
    batchsize = 100
    for i in range(1000):
        x_i = x[i * batchsize: (i + 1) * batchsize]
        y_i = y[i * batchsize: (i + 1) * batchsize]
        err_i = train(x_i, y_i)
Theano in one slide
-------------------
* High-level domain-specific language tailored to numeric computation
* Compiles most common expressions to C for CPU and GPU.
* Limited expressivity means lots of opportunities for expression-level optimizations
* No function call -> global optimization
* Strongly typed -> compiles to machine instructions
* Array oriented -> easy parallelism
* Support for looping and branching in expressions
* Expression substitution optimizations automatically draw
on many backend technologies for best performance.
* BLAS, SciPy, Cython, CUDA
* Slower fallbacks always available
* Automatic differentiation and R op
* Sparse matrices
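As a rough illustration of what ``tensor.grad`` does, here is a minimal reverse-mode differentiation sketch in plain Python. The ``Var`` class and its methods are a hypothetical toy, not Theano's API; Theano works on whole expression graphs and returns symbolic gradients rather than numbers.

```python
class Var:
    """A scalar node in an expression graph (toy sketch, not Theano's API)."""
    def __init__(self, value, parents=()):
        self.value = value       # forward value
        self.parents = parents   # [(parent_var, local_gradient), ...]
        self.grad = 0.0

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        # d(a+b)/da = d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, seed=1.0):
        # Accumulate d(output)/d(self) into each ancestor by the chain rule
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```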
Project status
--------------
* Mature: Theano has been developed and used since January 2008 (6.5 yrs old)
* Has driven over 100 research papers
* Good user documentation
* Active mailing list with participants from outside our lab
* Core technology for a few Silicon-Valley startups
* Many contributors (some from outside our lab)
* Used to teach many university classes
* Used for research at Google and Yahoo. (TODO, should we remove? I think so)
Pylearn2 in one slide
---------------------
TODO
Other global information
------------------------
Theano's basic operations are small ones, not layers:

* Easy reuse
* No need to reimplement the gradient for each variation of a layer

This could cause slowness (more small operations), but the optimizer fixes that.
Pylearn2 wraps the small operations into layers like other
projects do:

* There is no overhead to this extra layer, thanks to the
  compilation of the function by Theano.
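As a rough sketch of the idea, a "layer" is just a few small operations composed together (toy NumPy code; ``tanh_layer`` is a hypothetical name, not a Pylearn2 API, and Pylearn2's real layers wrap Theano expressions rather than NumPy calls):

```python
import numpy as np

def tanh_layer(x, w, b):
    # Three small ops (dot, add, tanh) composed into one reusable "layer"
    return np.tanh(np.dot(x, w) + b)

rng = np.random.RandomState(0)
x = rng.rand(5, 784)                     # a batch of 5 inputs
w = rng.normal(0, .1, size=(784, 500))   # layer parameters
b = np.zeros(500)

h = tanh_layer(x, w, b)
print(h.shape)  # (5, 500)
```

Because the gradient of each small op is known, the gradient of the whole layer (and of any variation of it) comes for free by the chain rule.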
Why scripting for GPUs?
-----------------------
They *complement each other*:
* GPUs are everything that scripting/high level languages are not
* Highly parallel
* Very architecture-sensitive
* Built for maximum FP/memory throughput
* So hard to program that meta-programming is easier.
* CPU: largely restricted to control
* Optimized for sequential code and low latency (rather than high throughput)
* Tasks (1000/sec)
* Scripting fast enough
Best of both: scripted CPU invokes JIT-compiled kernels on GPU.
.. _omlw2014_pylearn2:
********
Pylearn2
********
Pointers
--------
TODO:
* http://deeplearning.net/software/pylearn2/
* User mailing list: http://groups.google.com/group/pylearn-users
* Dev mailing list: http://groups.google.com/group/pylearn-dev
* Installation: http://deeplearning.net/software/pylearn2/index.html#download-and-installation
Description
-----------
TODO:
* ...
Simple example
--------------
(logistic regression?) TODO
Real example
------------
(maxout?) TODO
Known limitations
-----------------
TODO
* It is getting stabilized, but is still heavily modified.
.. _omlw2014_sharing:
************
Sharing code
************
* License (BSD 3-clause suggested; don't forget to add the license info in the code)
* Common base object? libgpuarray.
* If not, an important implementation that uses raw pointers/shapes? Document that interface.
* Important: an *acknowledgement section on the web site* (citation-like) AND *in papers* about the software we reuse (and use)!
*************
Theano future
*************