Commit 0ff016bc authored by lamblin

Merge pull request #1392 from nouiz/sigmoid

Add hard_sigmoid: faster, but more approximate.
@@ -15,59 +15,92 @@
:Parameters: *x* - symbolic Tensor (or compatible)
:Return type: same as x
:Returns: element-wise sigmoid: :math:`sigmoid(x) = \frac{1}{1 + \exp(-x)}`.
:note: see :func:`ultra_fast_sigmoid` or :func:`hard_sigmoid` for faster versions.
Speed comparison for 100M float64 elements on a Core2 Duo @ 3.16 GHz:

- hard_sigmoid: 1.1s
- ultra_fast_sigmoid: 1.4s
- sigmoid (with amdlibm): 2.3s
- sigmoid (without amdlibm): 3.7s

Precision: sigmoid (with or without amdlibm) > ultra_fast_sigmoid > hard_sigmoid.

.. image:: sigmoid_prec.png

Example:

.. code-block:: python

    x, y, b = T.dvectors('x', 'y', 'b')
    W = T.dmatrix('W')
    y = T.nnet.sigmoid(T.dot(W, x) + b)

.. note:: The underlying code will return an exact 0 or 1 if an
   element of x is too small or too big.
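A rough sketch of how such a timing comparison could be reproduced (the array size, the warm-up call and the hardware are assumptions, not the original benchmark script):

.. code-block:: python

    import time
    import numpy
    import theano
    import theano.tensor as T

    x = T.dvector('x')
    data = numpy.random.rand(10000000)  # 10M elements to keep the run short

    for name, op in [('sigmoid', T.nnet.sigmoid),
                     ('ultra_fast_sigmoid', T.nnet.ultra_fast_sigmoid),
                     ('hard_sigmoid', T.nnet.hard_sigmoid)]:
        f = theano.function([x], op(x))
        f(data)  # warm-up call, so compilation time is not measured
        t0 = time.time()
        f(data)
        print name, "took", time.time() - t0, "seconds"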
.. function:: ultra_fast_sigmoid(x)
Returns the *approximated* standard :func:`sigmoid` nonlinearity applied to x.
:Parameters: *x* - symbolic Tensor (or compatible)
:Return type: same as x
:Returns: approximated element-wise sigmoid: :math:`sigmoid(x) = \frac{1}{1 + \exp(-x)}`.
:note: To automatically change all :func:`sigmoid` ops to this version, use
    the Theano optimization ``local_ultra_fast_sigmoid``. This can be done
    with the Theano flag ``optimizer_including=local_ultra_fast_sigmoid``.
    This optimization is applied late, so it should not affect the
    stabilization optimizations.
.. note:: The underlying code will return 0.00247262315663 as the
   minimum value and 0.997527376843 as the maximum value. So it
   never returns an exact 0 or 1.
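A minimal sketch (the setup below is an assumption) showing how the substitution can be requested per compiled function through a Mode, instead of through the Theano flag:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    x = T.dmatrix('x')
    y = T.nnet.sigmoid(x)
    mode = theano.compile.get_default_mode().including('local_ultra_fast_sigmoid')
    f = theano.function([x], y, mode=mode)
    # Even for extreme inputs the output stays strictly inside (0, 1).
    print f(numpy.array([[-50., 0., 50.]]))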
.. function:: hard_sigmoid(x)
Returns the *approximated* standard :func:`sigmoid` nonlinearity applied to x.
:Parameters: *x* - symbolic Tensor (or compatible)
:Return type: same as x
:Returns: approximated element-wise sigmoid: :math:`sigmoid(x) = \frac{1}{1 + \exp(-x)}`.
:note: To automatically change all :func:`sigmoid` ops to this version, use
    the Theano optimization ``local_hard_sigmoid``. This can be done
    with the Theano flag ``optimizer_including=local_hard_sigmoid``.
    This optimization is applied late, so it should not affect the
    stabilization optimizations.
.. note:: The underlying code will return an exact 0 or 1 if an
   element of x is too small or too big.
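A small sketch (assumed setup) comparing hard_sigmoid with the exact sigmoid; the optimization can be enabled per function in the same way as shown above for ultra_fast_sigmoid, only with ``local_hard_sigmoid``:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    x = T.dvector('x')
    f_exact = theano.function([x], T.nnet.sigmoid(x))
    f_hard = theano.function([x], T.nnet.hard_sigmoid(x))
    pts = numpy.array([-6., -1., 0., 1., 6.])
    print f_exact(pts)
    # hard_sigmoid is clipped: exactly 0 below -2.5 and exactly 1 above 2.5.
    print f_hard(pts)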
.. function:: softplus(x)
Returns the softplus nonlinearity applied to x

:Parameter: *x* - symbolic Tensor (or compatible)
:Return type: same as x
:Returns: elementwise softplus: :math:`softplus(x) = \log_e{\left(1 + \exp(x)\right)}`.

.. note:: The underlying code will return an exact 0 if an element of x is too small.
.. code-block:: python

    x, y, b = T.dvectors('x', 'y', 'b')
    W = T.dmatrix('W')
    y = T.nnet.softplus(T.dot(W, x) + b)
.. function:: softmax(x)

Returns the softmax function of x:

:Parameter: *x* symbolic **2D** Tensor (or compatible).
:Return type: same as x
:Returns: a symbolic 2D tensor whose ijth element is :math:`softmax_{ij}(x) = \frac{\exp{x_{ij}}}{\sum_k\exp(x_{ik})}`.

The softmax function will, when applied to a matrix, compute the softmax values row-wise.
.. code-block:: python

    x, y, b = T.dvectors('x', 'y', 'b')
    W = T.dmatrix('W')
    y = T.nnet.softmax(T.dot(W, x) + b)
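A quick check sketch (the input values are assumptions): because the softmax is computed row-wise, every row of the output sums to 1.

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    x = T.dmatrix('x')
    f = theano.function([x], T.nnet.softmax(x))
    out = f(numpy.array([[1., 2., 3.], [0., 0., 0.]]))
    print out.sum(axis=1)  # approximately [ 1.  1.]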
.. function:: binary_crossentropy(output,target)
@@ -78,27 +111,27 @@ The softmax function will, when applied to a matrix, compute the softmax values
* *output* - symbolic Tensor (or compatible)

:Return type: same as target
:Returns: a symbolic tensor, where the following is applied elementwise :math:`crossentropy(t,o) = -(t\cdot log(o) + (1 - t) \cdot log(1 - o))`.
The following block implements a simple auto-associator with a
sigmoid nonlinearity and a reconstruction error which corresponds
to the binary cross-entropy (note that this assumes that x will
contain values between 0 and 1):
.. code-block:: python

    x, y, b = T.dvectors('x', 'y', 'b')
    W = T.dmatrix('W')
    # V and c are the decoder weights and bias
    V = T.dmatrix('V')
    c = T.dvector('c')
    h = T.nnet.sigmoid(T.dot(W, x) + b)
    x_recons = T.nnet.sigmoid(T.dot(V, h) + c)
    recon_cost = T.nnet.binary_crossentropy(x_recons, x).mean()
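One way the auto-associator above could be trained; this is a hedged sketch, and the shared-variable setup, sizes and learning rate are assumptions rather than part of the original snippet:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    n_vis, n_hid = 784, 100
    rng = numpy.random.RandomState(0)
    # Encoder and decoder parameters as shared variables (an assumption).
    W = theano.shared(rng.uniform(-0.1, 0.1, (n_hid, n_vis)), name='W')
    V = theano.shared(rng.uniform(-0.1, 0.1, (n_vis, n_hid)), name='V')
    b = theano.shared(numpy.zeros(n_hid), name='b')
    c = theano.shared(numpy.zeros(n_vis), name='c')

    x = T.dvector('x')
    h = T.nnet.sigmoid(T.dot(W, x) + b)
    x_recons = T.nnet.sigmoid(T.dot(V, h) + c)
    recon_cost = T.nnet.binary_crossentropy(x_recons, x).mean()

    lr = 0.1  # assumed learning rate
    params = [W, V, b, c]
    updates = [(p, p - lr * T.grad(recon_cost, p)) for p in params]
    train = theano.function([x], recon_cost, updates=updates)
    print train(rng.uniform(0, 1, n_vis))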
.. function:: categorical_crossentropy(coding_dist,true_dist)

Return the cross-entropy between an approximating distribution and a true distribution.

The cross entropy between two probability distributions measures the average number of bits
needed to identify an event from a set of possibilities, if a coding scheme is used based
on a given probability distribution q, rather than the "true" distribution p. Mathematically, this
function computes :math:`H(p,q) = - \sum_x p(x) \log(q(x))`, where
p=true_dist and q=coding_dist.
@@ -112,15 +145,17 @@ cross-entropy (note that this assumes that x will contain values between 0 and
:Return type: tensor of rank one-less-than `coding_dist`
.. note:: An application of the scenario where *true_dist* has a
   1-of-N representation is in classification with softmax
   outputs. If `coding_dist` is the output of the softmax and
   `true_dist` is a vector of correct labels, then the function
   will compute ``y_i = - \log(coding_dist[i, one_of_n[i]])``,
   which corresponds to computing the neg-log-probability of the
   correct class (which is typically the training criterion in
   classification settings).
.. code-block:: python

    y = T.nnet.softmax(T.dot(W, x) + b)
    cost = T.nnet.categorical_crossentropy(y, o)
    # o is either the above-mentioned 1-of-N vector or 2D tensor
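A self-contained sketch of the classification case (the shapes, the zero initialization and the integer-label format are assumptions): the per-example losses are averaged into a scalar cost that can be differentiated with respect to the parameters.

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    x = T.dmatrix('x')   # one example per row
    o = T.ivector('o')   # the correct class index for each row
    W = theano.shared(numpy.zeros((5, 3)), name='W')
    b = theano.shared(numpy.zeros(3), name='b')
    y = T.nnet.softmax(T.dot(x, W) + b)
    cost = T.nnet.categorical_crossentropy(y, o).mean()
    f = theano.function([x, o], cost)
    # With W and b at zero the prediction is uniform, so the cost is log(3).
    print f(numpy.ones((4, 5)), numpy.array([0, 1, 2, 0], dtype='int32'))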
@@ -4,4 +4,5 @@ from Conv3D import *
from ConvGrad3D import *
from ConvTransp3D import *
from sigm import (softplus, sigmoid, sigmoid_inplace,
                  scalar_sigmoid, ultra_fast_sigmoid,
                  hard_sigmoid)
@@ -112,6 +112,42 @@ for i in xrange(750):
""" % locals() """ % locals()
raise theano.gof.utils.MethodNotDefined() raise theano.gof.utils.MethodNotDefined()
    @staticmethod
    def gen_graph():
        """
        This method was used to generate the graph sigmoid_prec.png in the doc.
        """
        import matplotlib
        data = numpy.arange(-15, 15, .1)
        val = 1 / (1 + numpy.exp(-data))

        def hard_sigmoid(x):
            return theano.tensor.nnet.hard_sigmoid(x)

        def ultra_fast_sigmoid(x):
            return theano.tensor.nnet.ultra_fast_sigmoid(x)

        val_hard = hard_sigmoid(data).eval()
        val_ultra = ultra_fast_sigmoid(data).eval()

        import matplotlib.pyplot as plt
        import os
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.plot(data, val)  # , 'o-')
        ax.plot(data, val_ultra)  # , '-')
        ax.plot(data, val_hard)  # , '-')
        ax.grid(True)
        ax.legend(("sigmoid", "ultra_fast", "hard"), "upper left")
        fname = os.path.join(os.path.dirname(theano.__file__), '..',
                             'doc', 'library', 'tensor', 'nnet',
                             'sigmoid_prec.png')
        plt.savefig(fname)
        print "New picture saved at", fname
        print val_ultra.max()
        print val_ultra.min()
scalar_sigmoid = ScalarSigmoid(scalar.upgrade_to_float, name='scalar_sigmoid')
sigmoid = elemwise.Elemwise(scalar_sigmoid, name='sigmoid')
@@ -210,7 +246,6 @@ def local_ultra_fast_sigmoid(node):
    if (isinstance(node.op, tensor.Elemwise) and
            node.op.scalar_op == scalar_sigmoid):
        out = ultra_fast_sigmoid(node.inputs[0])

        def values_eq_approx_remove_low_prec(a, b):
            # atol is found by trial/error.
@@ -223,6 +258,41 @@ theano.compile.optdb['uncanonicalize'].register("local_ultra_fast_sigmoid",
                                                 local_ultra_fast_sigmoid)
def hard_sigmoid(x):
    """An approximation of sigmoid.

    More approximate and faster than ultra_fast_sigmoid.

    Approximated in 3 parts: 0, scaled linear, 1.

    Removing the slope and shift does not make it faster.
    """
    slope = 0.2
    shift = 0.5
    x = (x * slope) + shift
    x = tensor.clip(x, 0, 1)
    return x
#@opt.register_uncanonicalize
@gof.local_optimizer([sigmoid])
def local_hard_sigmoid(node):
    if (isinstance(node.op, tensor.Elemwise) and
            node.op.scalar_op == scalar_sigmoid):
        out = hard_sigmoid(node.inputs[0])

        def values_eq_approx_remove_low_prec(a, b):
            # atol is found by trial/error.
            # Other tests could fail without good reason.
            return tensor.TensorType.values_eq_approx(a, b, atol=0.1)
        # Let DebugMode know that this opt approximates the values.
        out.values_eq_approx = values_eq_approx_remove_low_prec
        return [out]

theano.compile.optdb['uncanonicalize'].register("local_hard_sigmoid",
                                                local_hard_sigmoid)
class ScalarSoftplus(scalar.UnaryScalarOp):
    @staticmethod
    def static_impl(x):
...
@@ -9,7 +9,7 @@ from theano import tensor as T
from theano import config
from theano.tests import unittest_tools as utt
from theano.tensor.nnet import (sigmoid, sigmoid_inplace,
                                softplus, ultra_fast_sigmoid, hard_sigmoid)
from theano.tensor.nnet.sigm import (
    compute_mul, is_1pexp, parse_mul_tree, perform_sigm_times_exp,
    register_local_1msigmoid, simplify_mul,
@@ -46,6 +46,16 @@ UltraFastSigmoidTester = makeBroadcastTester(
    # This is an approx of the sigmoid. That is why we raise eps
    eps=5e-2)
HardSigmoidTester = makeBroadcastTester(
    op=hard_sigmoid,
    expected=lambda inputs: check_floatX(
        inputs, 1 / (1 + numpy.exp(-inputs))),
    good=_good_broadcast_unary_normal_no_complex,
    # grad=_grad_broadcast_unary_normal,
    name='HardSigmoidTester',
    # This is an approx of the sigmoid. That is why we raise eps
    eps=1e-1)
SoftplusTester = makeBroadcastTester(
    op=softplus,
@@ -295,11 +305,32 @@ class T_sigmoid_opts(unittest.TestCase):
        mode = self.get_mode('local_ultra_fast_sigmoid')
        f = theano.function([x], s, mode=mode)
        topo = f.maker.fgraph.toposort()
        assert len(topo) == 1
        assert topo[0].op == sigmoid

        mode = self.get_mode().including('local_ultra_fast_sigmoid')
        f = theano.function([x], s, mode=mode)
        topo = f.maker.fgraph.toposort()
        assert topo[0].op == ultra_fast_sigmoid
        assert len(topo) == 1
        ux_v = f([[-50, -10, -4, -1, 0, 1, 4, 10, 50]])
    def test_local_hard_sigmoid(self):
        x = tensor.matrix('x')
        s = sigmoid(x)

        mode = self.get_mode('local_hard_sigmoid')
        f = theano.function([x], s, mode=mode)
        topo = f.maker.fgraph.toposort()
        assert topo[0].op == sigmoid
        assert len(topo) == 1

        mode = self.get_mode().including('local_hard_sigmoid')
        f = theano.function([x], s, mode=mode)
        topo = f.maker.fgraph.toposort()
        assert len(topo) > 1
        assert not any([n.op == sigmoid for n in topo])
        ux_v = f([[-50, -10, -4, -1, 0, 1, 4, 10, 50]])
...