Commit 24bcbf36 by Frederic Bastien

First version of Open Machine Learning workshop presentation.

Parent: 93be9cb8
.. _omlw2014_libgpundarray:

*************
libGpuNdArray
*************
Why a common GPU ndarray?
-------------------------

- Currently there are at least 4 different GPU array data structures in use by Python packages

  - CudaNdarray (Theano), GPUArray (PyCUDA), CUDAMatrix (cudamat), GPUArray (PyOpenCL), ...
  - There are even more if we include other languages

- All of them implement a subset of the functionality of ``numpy.ndarray`` on the GPU
- Lots of duplicated effort

  - GPU code is harder/slower to get **correct** and **fast** than CPU/Python code

- The lack of a common array API makes it harder to port/reuse code

  - It also makes code harder to find/distribute
  - It divides development effort
Design Goals
------------

- Make it VERY similar to ``numpy.ndarray``
- Be compatible with both CUDA and OpenCL
- Have the base object accessible from C to allow collaboration with more projects, across high-level languages
- We want people using C, C++, Lua, Ruby, R, ... to all share the same base GPU N-dimensional array
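None of these goals depend on the GPU itself: the bookkeeping that makes a numpy-like container N-dimensional is just a shape and strides over a flat buffer, which is what lets one base object serve many languages. A minimal pure-Python sketch of that addressing scheme (illustrative only, not the libGpuNdArray API; the helper names are made up here):

```python
def element_offset(index, strides):
    """Byte offset of a multi-dimensional index into a flat buffer."""
    return sum(i * s for i, s in zip(index, strides))

def c_contiguous_strides(shape, itemsize):
    """Strides (in bytes) for a C-contiguous layout, as NumPy computes them."""
    strides = []
    acc = itemsize
    for dim in reversed(shape):   # innermost dimension varies fastest
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

# A (3, 4, 5) float32 array: same strides NumPy reports for it.
strides = c_contiguous_strides((3, 4, 5), itemsize=4)
assert strides == (80, 20, 4)
assert element_offset((1, 1, 1), strides) == 104
```

The same (shape, strides, buffer) triple describes a host array or a device array; only the memory allocator and the kernels that touch the buffer differ.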
Final Note
----------

TODO: update

- Usable, but under development
- It is the next GPU array container for Theano
- Mailing list: http://lists.tiker.net/listinfo/gpundarray
.. _omlw2014_index:

===========================
Theano Tutorial @ OMLW 2014
===========================

August 22, 2014, New York University, US.

This presentation covers Theano and Pylearn2, a software stack for
machine learning.
It complements the Python numeric/scientific software stack (e.g. NumPy, SciPy,
scikits, matplotlib, PIL).
Theano
======

Theano is software for evaluating and manipulating complicated array
expressions.

What does it do?

* aggressive expression optimizations,
* automatic GPU use,
* automatic symbolic differentiation, including Jacobian and Hessian
  computation and the R/L operators (for Hessian-free optimization).
Its design and feature set have been driven by machine learning research
at the University of Montreal (the groups of Yoshua Bengio, Pascal Vincent,
Aaron Courville and Roland Memisevic).

The result is a very good library for doing research in deep
learning and neural network training, and a flexible framework for
many other models and algorithms in machine learning more generally.
.. TODO: update
It has proven to be useful for implementing:

- linear and nonlinear neural network classifiers
- convolutional models
- energy models: RBM, DBN, GRBM, ssRBM, AIS
- auto-encoders: DAE, CAE
- GP regression
- sparse coding
- recurrent neural networks, echo state networks (HMMs?)
- online and batch learning and optimization
- even SVMs!

As people's needs change this list will grow, but Theano is built
around vector, matrix, and tensor expressions; there is little reason
to use it for calculations on other data structures. There is
also sparse matrix support.
Pylearn2
========

Pylearn2 is still undergoing rapid development. Don't expect a clean
road without bumps! It is made for machine learning
practitioners/researchers first.

Pylearn2 is a machine learning library. Most of its functionality is
built on top of Theano. This means you can write Pylearn2 plugins (new
models, algorithms, etc.) using mathematical expressions, and Theano
will optimize and stabilize those expressions for you, and compile
them to a backend of your choice (CPU or GPU).
Pylearn2 Vision
---------------

* Researchers add features as they need them. We avoid getting bogged down by
  too much top-down planning in advance.
* A machine learning toolbox for easy scientific experimentation.
* All models/algorithms published by the LISA lab should have reference
  implementations in Pylearn2.
* Pylearn2 may wrap other libraries such as scikits.learn when this is practical.
* Pylearn2 differs from scikits.learn in that Pylearn2 aims to provide great
  flexibility and make it possible for a researcher to do almost anything,
  while scikits.learn aims to work as a "black box" that can produce good
  results even if the user does not understand the implementation.
* Dataset interfaces for vectors, images, video, ...
* A small framework providing everything needed for typical
  MLP/RBM/SDA/convolution experiments.
* *Easy reuse* of Pylearn2 sub-components.
* Using one sub-component of the library does not force you to use / learn to
  use all of the other sub-components.
* Support cross-platform serialization of learned models.
* Remain approachable enough to be used in the classroom.
Contents
========

The structured part of these lab sessions will be a walk-through of the following
material. Interleaved with this structured part will be blocks of time for
individual or group work. The idea is that you can try out Theano and get help
from gurus on hand if you get stuck.

.. toctree::

   introduction
   theano
   pylearn2
   gpundarray
.. _omlw2014_Introduction:

************
Introduction
************

Python in one slide
-------------------

* General-purpose, high-level, object-oriented, interpreted language
* Emphasizes code readability
* Comprehensive standard library
* Dynamic typing and memory management
* Built-in types: int, float, str, list, dict, tuple, object
* Slow execution
* Popular in *web-dev* and *scientific communities*
NumPy in one slide
------------------

* Python floats are full-fledged objects on the heap

  * Not suitable for high-performance computing!

* NumPy provides an N-dimensional numeric array for Python

  * Well suited to high-performance computing
  * Slices return views (no copy)

* NumPy provides

  * elementwise computations
  * linear algebra, Fourier transforms
  * pseudorandom numbers from many distributions

* SciPy provides lots more, including

  * more linear algebra
  * solvers and optimization algorithms
  * MATLAB-compatible I/O
  * I/O and signal processing for images and audio
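The "slices return views" point above is worth a concrete illustration (a small aside, not from the original slides): writing through a slice modifies the original array, while an explicit copy is independent.

```python
import numpy as np

a = np.arange(10)
view = a[2:5]         # slicing returns a view, not a copy
view[0] = 99          # writing through the view ...
assert a[2] == 99     # ... modifies the original array

copy = a[2:5].copy()  # an explicit copy is independent
copy[0] = 0
assert a[2] == 99     # the original is untouched
```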
.. code-block:: python

    ##############################
    # Properties of NumPy arrays
    # that you really need to know
    ##############################
    import numpy as np           # import can rename

    a = np.random.rand(3, 4, 5)  # random generators
    a32 = a.astype('float32')    # arrays are strongly typed

    a.ndim        # int: 3
    a.shape       # tuple: (3, 4, 5)
    a.size        # int: 60
    a.dtype       # np.dtype object: 'float64'
    a32.dtype     # np.dtype object: 'float32'

    assert a[1, 1, 1] != 10  # indexing reaches into the
    a[1, 1, 1] = 10          # original buffer, so assigning
    assert a[1, 1, 1] == 10  # to it changes the original array
Arrays can be combined with numeric operators and standard mathematical
functions. NumPy has great `documentation <http://docs.scipy.org/doc/numpy/reference/>`_.
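For instance, elementwise operators and ufuncs apply to whole arrays at once, with broadcasting aligning mismatched shapes (a small illustrative example, not from the original slides):

```python
import numpy as np

a = np.arange(6.).reshape(2, 3)   # shape (2, 3)
b = np.array([10., 20., 30.])     # shape (3,)

# Operators and ufuncs are elementwise; broadcasting stretches
# b across each row of a, so the result has shape (2, 3).
c = a * 2 + np.sin(b)
assert c.shape == (2, 3)
assert np.allclose(c[0], 2 * a[0] + np.sin(b))
```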
Training an MNIST-ready classification neural network in pure NumPy might look like this:
.. code-block:: python

    #########################
    # NumPy for Training a
    # Neural Network on MNIST
    #########################
    x = np.load('data_x.npy')
    y = np.load('data_y.npy')

    w = np.random.normal(
            loc=0,
            scale=.1,
            size=(784, 500))
    b = np.zeros((500,))
    v = np.zeros((500, 10))
    c = np.zeros((10,))

    lr = 0.01        # learning rate
    batchsize = 100

    for i in range(1000):
        x_i = x[i * batchsize: (i + 1) * batchsize]
        y_i = y[i * batchsize: (i + 1) * batchsize]

        # forward pass
        hidin = np.dot(x_i, w) + b
        hidout = np.tanh(hidin)
        outin = np.dot(hidout, v) + c
        outout = (np.tanh(outin) + 1) / 2.0

        # backward pass (hand-derived gradients)
        g_outout = outout - y_i
        err = 0.5 * np.sum(g_outout ** 2)
        g_outin = g_outout * outout * (1.0 - outout)
        g_hidout = np.dot(g_outin, v.T)
        g_hidin = g_hidout * (1 - hidout ** 2)

        # gradient descent update
        b -= lr * np.sum(g_hidin, axis=0)
        c -= lr * np.sum(g_outin, axis=0)
        w -= lr * np.dot(x_i.T, g_hidin)
        v -= lr * np.dot(hidout.T, g_outin)
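Hand-derived backpropagation formulas like the ones in the loop above are easy to get subtly wrong. A standard sanity check (an illustrative aside, not part of the original code) compares the analytic gradient against finite differences on a tiny tanh network:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(5, 3)
y = rng.randn(5, 2)
w = rng.randn(3, 2)

def loss(w):
    out = np.tanh(x.dot(w))
    return 0.5 * np.sum((out - y) ** 2)

# Analytic gradient, derived the same way as in the loop above.
out = np.tanh(x.dot(w))
g_out = out - y
g_in = g_out * (1 - out ** 2)   # tanh'(z) = 1 - tanh(z)**2
g_w = x.T.dot(g_in)

# Finite-difference approximation of the same gradient.
eps = 1e-6
g_num = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        wp = w.copy(); wp[i, j] += eps
        wm = w.copy(); wm[i, j] -= eps
        g_num[i, j] = (loss(wp) - loss(wm)) / (2 * eps)

assert np.allclose(g_w, g_num, atol=1e-5)
```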
What's missing?
---------------

* Non-lazy evaluation (required by Python) hurts performance
* NumPy is bound to the CPU
* NumPy lacks symbolic and automatic differentiation

Now let's have a look at the same algorithm in Theano, which runs about 15 times
faster if you have a GPU (I'm skipping some dtype details which we'll come back to).
.. code-block:: python

    #########################
    # Theano for Training a
    # Neural Network on MNIST
    #########################
    import numpy as np
    import theano
    import theano.tensor as tensor

    x = np.load('data_x.npy')
    y = np.load('data_y.npy')

    # symbol declarations
    sx = tensor.matrix()
    sy = tensor.matrix()
    w = theano.shared(np.random.normal(loc=0, scale=.1,
                                       size=(784, 500)))
    b = theano.shared(np.zeros(500))
    v = theano.shared(np.zeros((500, 10)))
    c = theano.shared(np.zeros(10))

    # symbolic expression-building
    hid = tensor.tanh(tensor.dot(sx, w) + b)
    out = tensor.tanh(tensor.dot(hid, v) + c)
    err = 0.5 * tensor.sum((out - sy) ** 2)
    gw, gb, gv, gc = tensor.grad(err, [w, b, v, c])

    # compile a fast training function
    lr = 0.01  # learning rate
    train = theano.function([sx, sy], err,
                            updates={
                                w: w - lr * gw,
                                b: b - lr * gb,
                                v: v - lr * gv,
                                c: c - lr * gc})

    # now do the computations
    batchsize = 100
    for i in range(1000):
        x_i = x[i * batchsize: (i + 1) * batchsize]
        y_i = y[i * batchsize: (i + 1) * batchsize]
        err_i = train(x_i, y_i)
Theano in one slide
-------------------

* High-level domain-specific language tailored to numeric computation
* Compiles most common expressions to C for CPU and GPU
* Limited expressivity means lots of opportunities for expression-level optimizations

  * No function calls -> global optimization
  * Strongly typed -> compiles to machine instructions
  * Array oriented -> parallelizable across cores

* Support for looping and branching in expressions
* Expression substitution optimizations automatically draw
  on many backend technologies for best performance

  * FFTW, MKL, ATLAS, SciPy, Cython, CUDA
  * Slower fallbacks are always available

* Automatic differentiation and the R operator
* Sparse matrices
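To make "expression substitution optimizations" concrete, here is a toy sketch in plain Python (not Theano's actual internals) that rewrites ``x * 1`` to ``x`` and folds constant subexpressions. This is the same flavor of graph rewriting Theano performs on your symbolic expressions before compiling them.

```python
# Expressions are either atoms ('x', 6) or ('op', left, right) tuples.

def simplify(expr):
    """Recursively apply two rewrite rules: x * 1 -> x, constant folding."""
    if isinstance(expr, tuple):
        op, a, b = expr
        a, b = simplify(a), simplify(b)
        if op == '*' and b == 1:
            return a                                  # x * 1 -> x
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            return a * b if op == '*' else a + b      # fold constants
        return (op, a, b)
    return expr

expr = ('+', ('*', 'x', 1), ('*', 2, 3))   # represents x * 1 + 2 * 3
assert simplify(expr) == ('+', 'x', 6)
```

Theano's real optimizer works the same way at heart, but on typed array expressions, and its rewrites also swap in faster backend implementations (BLAS, CUDA kernels, ...) rather than just simplifying algebra.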
Project status
--------------

* Mature: Theano has been developed and used since January 2008 (6.5 years old)
* Has driven over 100 research papers
* Good user documentation
* Active mailing list with participants from outside our lab
* Core technology for a few funded Silicon Valley startups
* Many contributors (some from outside our lab)
* Used to teach many university classes
* Used for research at Google and Yahoo
* Downloads

  * PyPI (August 18th 2014, the latest release): 255 in the last day, 2140 in the last week, 9145 in the last month
  * GitHub (the `bleeding edge` repository, the recommended one): unknown
  * GitHub stats?????
Why scripting for GPUs?
-----------------------

They *complement each other*:

* GPUs are everything that scripting/high-level languages are not

  * Highly parallel
  * Very architecture-sensitive
  * Built for maximum FP/memory throughput
  * So hard to program that meta-programming is easier

* CPU: largely restricted to control

  * Optimized for sequential code and low latency (rather than high throughput)
  * Tasks (1000/sec)
  * Scripting is fast enough

Best of both: a scripted CPU invokes JIT-compiled kernels on the GPU.
.. code-block:: python

    import numpy
    import theano
    import theano.tensor as tt

    rng = numpy.random

    N = 400
    feats = 784
    D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
    training_steps = 10000

    # Declare Theano symbolic variables
    x = tt.matrix("x")
    y = tt.vector("y")
    w = theano.shared(rng.randn(feats), name="w")
    b = theano.shared(0., name="b")
    print("Initial model:")
    print(w.get_value(), b.get_value())

    # Construct Theano expression graph
    p_1 = 1 / (1 + tt.exp(-tt.dot(x, w) - b))            # Probability that target = 1
    prediction = p_1 > 0.5                               # The prediction thresholded
    xent = -y * tt.log(p_1) - (1 - y) * tt.log(1 - p_1)  # Cross-entropy loss
    cost = xent.mean() + 0.01 * (w ** 2).sum()           # The cost to minimize
    gw, gb = tt.grad(cost, [w, b])

    # Compile
    train = theano.function(
        inputs=[x, y],
        outputs=[prediction, xent],
        updates=[(w, w - 0.1 * gw),
                 (b, b - 0.1 * gb)],
        name='train')
    predict = theano.function(inputs=[x], outputs=prediction,
                              name='predict')

    # Train
    for i in range(training_steps):
        pred, err = train(D[0], D[1])

    print("Final model:")
    print(w.get_value(), b.get_value())
    print("target values for D:", D[1])
    print("prediction on D:", predict(D[0]))
.. _omlw2014_pylearn2:

********
Pylearn2
********

Pointers
--------

TODO:

* http://deeplearning.net/software/pylearn2/
* User mailing list: http://groups.google.com/group/pylearn-users
* Dev mailing list: http://groups.google.com/group/pylearn-dev
* Installation: http://deeplearning.net/software/pylearn2/index.html#download-and-installation

Description
-----------

TODO:

* ...

Simple example
--------------

(logistic regression?) TODO

Real example
------------

(maxout?) TODO

Known limitations
-----------------

TODO

* It is getting stabilized, but is still being heavily modified.