Commit c8bc4549 authored by AdeB

Force the user to specify both n_classes and n_outputs_per_class in the h_softmax

Parent cb59e785
@@ -2053,7 +2053,7 @@ def relu(x, alpha=0):
 def h_softmax(x, batch_size, n_outputs, W1, b1, W2, b2,
-              n_classes=None, n_outputs_per_class=None, target=None):
+              n_classes, n_outputs_per_class, target=None):
     """ Two-level hierarchical softmax.
     Outputs are grouped in sqrt(n_outputs) classes.
@@ -2081,15 +2081,12 @@ def h_softmax(x, batch_size, n_outputs, W1, b1, W2, b2,
         the number of outputs.
     n_classes: int
-        (optional, default None)
         the number of classes of the two-layer hierarchical softmax. It
-        corresponds to the number of outputs of the first softmax. It can be
-        set to None, see the note at the end of the docstring.
+        corresponds to the number of outputs of the first softmax. See note at
+        the end.
     n_outputs_per_class: int
-        (optional, default None)
-        the number of outputs per class. It can be set to None, see the note
-        at the end of the docstring.
+        the number of outputs per class. See note at the end.
     W1: tensor of shape (number of features of the input x, number of classes)
         the weight matrix of the first softmax, which maps the input x to the
@@ -2115,25 +2112,15 @@ def h_softmax(x, batch_size, n_outputs, W1, b1, W2, b2,
     Notes
     -----
-    n_outputs_per_class and n_classes do not need to be defined. If
-    both are not defined, then they are set to the square root of the
-    number of outputs, which is the most computational efficient
-    configuration. If only one is defined, the other is inferred so that
-    their product equals the number of outputs n_outputs (more precisely it is
-    the smallest integer such that their product is greater or equal to
-    n_outputs).
+    The product of n_outputs_per_class and n_classes has to be greater or equal
+    to n_outputs. If it is strictly greater, then the irrelevant outputs will
+    be ignored.
+    n_outputs_per_class and n_classes have to be the same as the corresponding
+    dimensions of the tensors of W1, b1, W2 and b2.
+    The most computational efficient configuration is when n_outputs_per_class
+    and n_classes are equal to the square root of n_outputs.
     """
-    # In case one or both of n_outputs_per_class and n_classes are not defined
-    if not n_outputs_per_class and not n_classes:
-        n_outputs_per_class = numpy.ceil(numpy.sqrt(n_outputs))
-        n_classes = numpy.ceil(n_outputs / n_outputs_per_class)
-    elif n_outputs_per_class and not n_classes:
-        n_classes = numpy.ceil(n_outputs / n_outputs_per_class)
-    elif n_classes and not n_outputs_per_class:
-        n_outputs_per_class = numpy.ceil(n_outputs / n_classes)
     # First softmax that computes the probabilities of belonging to each class
     class_probs = theano.tensor.nnet.softmax(tensor.dot(x, W1) + b1)
......
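Since this change removes the automatic inference of n_classes and n_outputs_per_class, callers now have to compute both values themselves before building W1, b1, W2 and b2. The following is a minimal sketch of how that could be done on the caller side, following the square-root configuration the docstring describes as most efficient. The helper names `h_softmax_sizes` and `check_h_softmax_sizes` are hypothetical and not part of Theano; the sketch uses integer arithmetic instead of the removed `numpy.ceil` calls so that the results are plain ints.

```python
import math


def h_softmax_sizes(n_outputs):
    """Hypothetical helper (not part of Theano).

    Returns (n_classes, n_outputs_per_class) close to sqrt(n_outputs),
    such that n_classes * n_outputs_per_class >= n_outputs.
    """
    if n_outputs < 1:
        raise ValueError("n_outputs must be a positive integer")
    # ceil(sqrt(n_outputs)) computed with integers only
    n_outputs_per_class = math.isqrt(n_outputs - 1) + 1
    # ceil(n_outputs / n_outputs_per_class) computed with integers only
    n_classes = -(-n_outputs // n_outputs_per_class)
    return n_classes, n_outputs_per_class


def check_h_softmax_sizes(n_outputs, n_classes, n_outputs_per_class):
    """Hypothetical check for the constraint stated in the docstring note."""
    if n_classes * n_outputs_per_class < n_outputs:
        raise ValueError(
            "n_classes * n_outputs_per_class must be >= n_outputs"
        )


n_classes, n_outputs_per_class = h_softmax_sizes(10)
check_h_softmax_sizes(10, n_classes, n_outputs_per_class)
```

With n_outputs = 10 this gives 3 classes of 4 outputs each, so the product (12) is strictly greater than n_outputs and, per the docstring note, the 2 surplus outputs are ignored.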