Commit 24bcbf36 by Frederic Bastien

First version of Open Machine Learning workshop presentation.

Parent: 93be9cb8
.. _omlw2014_libgpundarray:

*************
libGpuNdArray
*************
Why a common GPU ndarray?
-------------------------

- Currently there are at least 4 different GPU array data structures in use by Python packages

  - CudaNdarray (Theano), GPUArray (PyCUDA), CUDAMatrix (cudamat), GPUArray (PyOpenCL), ...
  - There are even more if we include other languages

- All of them implement a subset of the functionality of ``numpy.ndarray`` on the GPU
- Lots of duplicated effort

  - GPU code is harder/slower to get **correct** and **fast** than CPU/Python code

- The lack of a common array API makes it harder to port/reuse code

  - It also makes code harder to find/distribute
  - It divides development effort
Design Goals
------------

- Make it VERY similar to ``numpy.ndarray``
- Be compatible with both CUDA and OpenCL
- Have the base object accessible from C to allow collaboration with more projects, across high-level languages
- We want people using C, C++, Lua, Ruby, R, ... to all share the same base GPU N-dimensional array
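None of these goals depend on the GPU itself: the bookkeeping that makes a numpy-like container N-dimensional is just a shape and strides over a flat buffer, which is what lets one base object serve many languages. A minimal pure-Python sketch of that addressing scheme (illustrative only, not the libGpuNdArray API; the helper names are made up here):

```python
def element_offset(index, strides):
    """Byte offset of a multi-dimensional index into a flat buffer."""
    return sum(i * s for i, s in zip(index, strides))

def c_contiguous_strides(shape, itemsize):
    """Strides (in bytes) for a C-contiguous layout, as NumPy computes them."""
    strides = []
    acc = itemsize
    for dim in reversed(shape):   # innermost dimension varies fastest
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

# A (3, 4, 5) float32 array: same strides NumPy reports for it.
strides = c_contiguous_strides((3, 4, 5), itemsize=4)
assert strides == (80, 20, 4)
assert element_offset((1, 1, 1), strides) == 104
```

The same (shape, strides, buffer) triple describes a host array or a device array; only the memory allocator and the kernels that touch the buffer differ.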
Final Note
----------

TODO: update

- Usable, but under development
- It is the next GPU array container for Theano
- Mailing list: http://lists.tiker.net/listinfo/gpundarray
.. _omlw2014_index:

===========================
Theano Tutorial @ OMLW 2014
===========================

August 22, 2014, New York University, US.

This presentation covers Theano and Pylearn2, a software stack for
machine learning.
It complements the Python numeric/scientific software stack (e.g. NumPy, SciPy,
scikits, matplotlib, PIL).
Theano
======

Theano is software for evaluating and manipulating complicated array
expressions.

What does it do?

* aggressive expression optimizations,
* automatic GPU use,
* automatic symbolic differentiation, including Jacobian and Hessian
  computation and the R/L operators (for Hessian-free optimization).
Its design and feature set have been driven by machine learning research
at the University of Montreal (the groups of Yoshua Bengio, Pascal Vincent,
Aaron Courville and Roland Memisevic).

The result is a very good library for doing research in deep
learning and neural network training, and a flexible framework for
many other models and algorithms in machine learning more generally.
.. TODO: update
It has proven to be useful for implementing:

- linear and nonlinear neural network classifiers
- convolutional models
- energy models: RBM, DBN, GRBM, ssRBM, AIS
- auto-encoders: DAE, CAE
- GP regression
- sparse coding
- recurrent neural networks, echo state networks (HMMs?)
- online and batch learning and optimization
- even SVMs!

As people's needs change this list will grow, but Theano is built
around vector, matrix, and tensor expressions; there is little reason
to use it for calculations on other data structures. There is
also sparse matrix support.
Pylearn2
========

Pylearn2 is still undergoing rapid development. Don't expect a clean
road without bumps! It is made for machine learning
practitioners/researchers first.

Pylearn2 is a machine learning library. Most of its functionality is
built on top of Theano. This means you can write Pylearn2 plugins (new
models, algorithms, etc.) using mathematical expressions, and Theano
will optimize and stabilize those expressions for you, and compile
them to a backend of your choice (CPU or GPU).
Pylearn2 Vision
---------------

* Researchers add features as they need them. We avoid getting bogged down by
  too much top-down planning in advance.
* A machine learning toolbox for easy scientific experimentation.
* All models/algorithms published by the LISA lab should have reference
  implementations in Pylearn2.
* Pylearn2 may wrap other libraries such as scikits.learn when this is practical.
* Pylearn2 differs from scikits.learn in that Pylearn2 aims to provide great
  flexibility and make it possible for a researcher to do almost anything,
  while scikits.learn aims to work as a "black box" that can produce good
  results even if the user does not understand the implementation.
* Dataset interfaces for vectors, images, video, ...
* A small framework providing everything needed for typical
  MLP/RBM/SDA/convolution experiments.
* *Easy reuse* of Pylearn2 sub-components.
* Using one sub-component of the library does not force you to use / learn to
  use all of the other sub-components.
* Support cross-platform serialization of learned models.
* Remain approachable enough to be used in the classroom.
Contents
========

The structured part of these lab sessions will be a walk-through of the following
material. Interleaved with this structured part will be blocks of time for
individual or group work. The idea is that you can try out Theano and get help
from gurus on hand if you get stuck.

.. toctree::

   introduction
   theano
   pylearn2
   gpundarray
.. _omlw2014_Introduction:

************
Introduction
************

Python in one slide
-------------------

* General-purpose, high-level, object-oriented, interpreted language
* Emphasizes code readability
* Comprehensive standard library
* Dynamic typing and memory management
* Built-in types: int, float, str, list, dict, tuple, object
* Slow execution
* Popular in *web-dev* and *scientific communities*
NumPy in one slide
------------------

* Python floats are full-fledged objects on the heap

  * Not suitable for high-performance computing!

* NumPy provides an N-dimensional numeric array for Python

  * Well suited to high-performance computing
  * Slices return views (no copy)

* NumPy provides

  * elementwise computations
  * linear algebra, Fourier transforms
  * pseudorandom numbers from many distributions

* SciPy provides lots more, including

  * more linear algebra
  * solvers and optimization algorithms
  * MATLAB-compatible I/O
  * I/O and signal processing for images and audio
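The "slices return views" point above is worth a concrete illustration (a small aside, not from the original slides): writing through a slice modifies the original array, while an explicit copy is independent.

```python
import numpy as np

a = np.arange(10)
view = a[2:5]         # slicing returns a view, not a copy
view[0] = 99          # writing through the view ...
assert a[2] == 99     # ... modifies the original array

copy = a[2:5].copy()  # an explicit copy is independent
copy[0] = 0
assert a[2] == 99     # the original is untouched
```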
.. code-block:: python

    ##############################
    # Properties of NumPy arrays
    # that you really need to know
    ##############################
    import numpy as np           # import can rename

    a = np.random.rand(3, 4, 5)  # random generators
    a32 = a.astype('float32')    # arrays are strongly typed

    a.ndim        # int: 3
    a.shape       # tuple: (3, 4, 5)
    a.size        # int: 60
    a.dtype       # np.dtype object: 'float64'
    a32.dtype     # np.dtype object: 'float32'

    assert a[1, 1, 1] != 10  # indexing reaches into the
    a[1, 1, 1] = 10          # original buffer, so assigning
    assert a[1, 1, 1] == 10  # to it changes the original array
Arrays can be combined with numeric operators and standard mathematical
functions. NumPy has great `documentation <http://docs.scipy.org/doc/numpy/reference/>`_.
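For instance, elementwise operators and ufuncs apply to whole arrays at once, with broadcasting aligning mismatched shapes (a small illustrative example, not from the original slides):

```python
import numpy as np

a = np.arange(6.).reshape(2, 3)   # shape (2, 3)
b = np.array([10., 20., 30.])     # shape (3,)

# Operators and ufuncs are elementwise; broadcasting stretches
# b across each row of a, so the result has shape (2, 3).
c = a * 2 + np.sin(b)
assert c.shape == (2, 3)
assert np.allclose(c[0], 2 * a[0] + np.sin(b))
```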
Training an MNIST-ready classification neural network in pure NumPy might look like this:
.. code-block:: python

    #########################
    # NumPy for Training a
    # Neural Network on MNIST
    #########################
    x = np.load('data_x.npy')
    y = np.load('data_y.npy')

    w = np.random.normal(
            loc=0,
            scale=.1,
            size=(784, 500))
    b = np.zeros((500,))
    v = np.zeros((500, 10))
    c = np.zeros((10,))

    lr = 0.01        # learning rate
    batchsize = 100

    for i in range(1000):
        x_i = x[i * batchsize: (i + 1) * batchsize]
        y_i = y[i * batchsize: (i + 1) * batchsize]

        # forward pass
        hidin = np.dot(x_i, w) + b
        hidout = np.tanh(hidin)
        outin = np.dot(hidout, v) + c
        outout = (np.tanh(outin) + 1) / 2.0

        # backward pass (hand-derived gradients)
        g_outout = outout - y_i
        err = 0.5 * np.sum(g_outout ** 2)
        g_outin = g_outout * outout * (1.0 - outout)
        g_hidout = np.dot(g_outin, v.T)
        g_hidin = g_hidout * (1 - hidout ** 2)

        # gradient descent update
        b -= lr * np.sum(g_hidin, axis=0)
        c -= lr * np.sum(g_outin, axis=0)
        w -= lr * np.dot(x_i.T, g_hidin)
        v -= lr * np.dot(hidout.T, g_outin)
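Hand-derived backpropagation formulas like the ones in the loop above are easy to get subtly wrong. A standard sanity check (an illustrative aside, not part of the original code) compares the analytic gradient against finite differences on a tiny tanh network:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(5, 3)
y = rng.randn(5, 2)
w = rng.randn(3, 2)

def loss(w):
    out = np.tanh(x.dot(w))
    return 0.5 * np.sum((out - y) ** 2)

# Analytic gradient, derived the same way as in the loop above.
out = np.tanh(x.dot(w))
g_out = out - y
g_in = g_out * (1 - out ** 2)   # tanh'(z) = 1 - tanh(z)**2
g_w = x.T.dot(g_in)

# Finite-difference approximation of the same gradient.
eps = 1e-6
g_num = np.zeros_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        wp = w.copy(); wp[i, j] += eps
        wm = w.copy(); wm[i, j] -= eps
        g_num[i, j] = (loss(wp) - loss(wm)) / (2 * eps)

assert np.allclose(g_w, g_num, atol=1e-5)
```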
What's missing?
---------------

* Non-lazy evaluation (required by Python) hurts performance
* NumPy is bound to the CPU
* NumPy lacks symbolic and automatic differentiation

Now let's have a look at the same algorithm in Theano, which runs about 15 times
faster if you have a GPU (I'm skipping some dtype details which we'll come back to).
.. code-block:: python

    #########################
    # Theano for Training a
    # Neural Network on MNIST
    #########################
    import numpy as np
    import theano
    import theano.tensor as tensor

    x = np.load('data_x.npy')
    y = np.load('data_y.npy')

    # symbol declarations
    sx = tensor.matrix()
    sy = tensor.matrix()
    w = theano.shared(np.random.normal(loc=0, scale=.1,
                                       size=(784, 500)))
    b = theano.shared(np.zeros(500))
    v = theano.shared(np.zeros((500, 10)))
    c = theano.shared(np.zeros(10))

    # symbolic expression-building
    hid = tensor.tanh(tensor.dot(sx, w) + b)
    out = tensor.tanh(tensor.dot(hid, v) + c)
    err = 0.5 * tensor.sum((out - sy) ** 2)
    gw, gb, gv, gc = tensor.grad(err, [w, b, v, c])

    # compile a fast training function
    lr = 0.01  # learning rate
    train = theano.function([sx, sy], err,
                            updates={
                                w: w - lr * gw,
                                b: b - lr * gb,
                                v: v - lr * gv,
                                c: c - lr * gc})

    # now do the computations
    batchsize = 100
    for i in range(1000):
        x_i = x[i * batchsize: (i + 1) * batchsize]
        y_i = y[i * batchsize: (i + 1) * batchsize]
        err_i = train(x_i, y_i)
Theano in one slide
-------------------

* High-level domain-specific language tailored to numeric computation
* Compiles most common expressions to C for CPU and GPU
* Limited expressivity means lots of opportunities for expression-level optimizations

  * No function calls -> global optimization
  * Strongly typed -> compiles to machine instructions
  * Array oriented -> parallelizable across cores

* Support for looping and branching in expressions
* Expression substitution optimizations automatically draw
  on many backend technologies for best performance

  * FFTW, MKL, ATLAS, SciPy, Cython, CUDA
  * Slower fallbacks are always available

* Automatic differentiation and the R operator
* Sparse matrices
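To make "expression substitution optimizations" concrete, here is a toy sketch in plain Python (not Theano's actual internals) that rewrites ``x * 1`` to ``x`` and folds constant subexpressions. This is the same flavor of graph rewriting Theano performs on your symbolic expressions before compiling them.

```python
# Expressions are either atoms ('x', 6) or ('op', left, right) tuples.

def simplify(expr):
    """Recursively apply two rewrite rules: x * 1 -> x, constant folding."""
    if isinstance(expr, tuple):
        op, a, b = expr
        a, b = simplify(a), simplify(b)
        if op == '*' and b == 1:
            return a                                  # x * 1 -> x
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            return a * b if op == '*' else a + b      # fold constants
        return (op, a, b)
    return expr

expr = ('+', ('*', 'x', 1), ('*', 2, 3))   # represents x * 1 + 2 * 3
assert simplify(expr) == ('+', 'x', 6)
```

Theano's real optimizer works the same way at heart, but on typed array expressions, and its rewrites also swap in faster backend implementations (BLAS, CUDA kernels, ...) rather than just simplifying algebra.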
Project status
--------------

* Mature: Theano has been developed and used since January 2008 (6.5 years old)
* Has driven over 100 research papers
* Good user documentation
* Active mailing list with participants from outside our lab
* Core technology for a few funded Silicon Valley startups
* Many contributors (some from outside our lab)
* Used to teach many university classes
* Used for research at Google and Yahoo
* Downloads

  * PyPI (August 18th 2014, the latest release): 255 in the last day, 2140 in the last week, 9145 in the last month
  * GitHub (the `bleeding edge` repository, the recommended one): unknown
  * GitHub stats?????
Why scripting for GPUs?
-----------------------

They *complement each other*:

* GPUs are everything that scripting/high-level languages are not

  * Highly parallel
  * Very architecture-sensitive
  * Built for maximum FP/memory throughput
  * So hard to program that meta-programming is easier

* CPU: largely restricted to control

  * Optimized for sequential code and low latency (rather than high throughput)
  * Tasks (1000/sec)
  * Scripting is fast enough

Best of both: a scripted CPU invokes JIT-compiled kernels on the GPU.
.. code-block:: python

    import numpy
    import theano
    import theano.tensor as tt

    rng = numpy.random

    N = 400
    feats = 784
    D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
    training_steps = 10000

    # Declare Theano symbolic variables
    x = tt.matrix("x")
    y = tt.vector("y")
    w = theano.shared(rng.randn(feats), name="w")
    b = theano.shared(0., name="b")
    print("Initial model:")
    print(w.get_value(), b.get_value())

    # Construct Theano expression graph
    p_1 = 1 / (1 + tt.exp(-tt.dot(x, w) - b))            # Probability that target = 1
    prediction = p_1 > 0.5                               # The prediction thresholded
    xent = -y * tt.log(p_1) - (1 - y) * tt.log(1 - p_1)  # Cross-entropy loss
    cost = xent.mean() + 0.01 * (w ** 2).sum()           # The cost to minimize
    gw, gb = tt.grad(cost, [w, b])

    # Compile
    train = theano.function(
        inputs=[x, y],
        outputs=[prediction, xent],
        updates=[(w, w - 0.1 * gw),
                 (b, b - 0.1 * gb)],
        name='train')
    predict = theano.function(inputs=[x], outputs=prediction,
                              name='predict')

    # Train
    for i in range(training_steps):
        pred, err = train(D[0], D[1])

    print("Final model:")
    print(w.get_value(), b.get_value())
    print("target values for D:", D[1])
    print("prediction on D:", predict(D[0]))
.. _omlw2014_pylearn2:

********
Pylearn2
********

Pointers
--------

TODO:

* http://deeplearning.net/software/pylearn2/
* User mailing list: http://groups.google.com/group/pylearn-users
* Dev mailing list: http://groups.google.com/group/pylearn-dev
* Installation: http://deeplearning.net/software/pylearn2/index.html#download-and-installation

Description
-----------

TODO:

* ...

Simple example
--------------

(logistic regression?) TODO

Real example
------------

(maxout?) TODO

Known limitations
-----------------

TODO

* It is getting stabilized, but is still being heavily modified.