testgroup / pytensor · Commits

Commit e6a3b009
authored Oct 24, 2016 by Alexandre de Brébisson
committed by Mathieu Germain on Oct 24, 2016
Improve docstring of h_softmax and add basic example (#5095)
Parent: bfde042c

Showing 1 changed file with 86 additions and 24 deletions:

theano/tensor/nnet/nnet.py (+86, -24)
@@ -2235,16 +2235,25 @@ def h_softmax(x, batch_size, n_outputs, n_classes, n_outputs_per_class,
               W1, b1, W2, b2, target=None):
     """ Two-level hierarchical softmax.
 
-    The architecture is composed of two softmax layers: the first predicts the
-    class of the input x while the second predicts the output of the input x in
-    the predicted class.
-    More explanations can be found in the original paper [1]_.
+    This function implements a two-layer hierarchical softmax. It is commonly
+    used as an alternative of the softmax when the number of outputs is
+    important (it is common to use it for millions of outputs). See
+    reference [1]_ for more information about the computational gains.
 
-    If target is specified, it will only compute the outputs of the
-    corresponding targets. Otherwise, if target is None, it will compute all
-    the outputs.
+    The `n_outputs` outputs are organized in `n_classes` classes, each class
+    containing the same number `n_outputs_per_class` of outputs.
+    For an input `x` (last hidden activation), the first softmax layer predicts
+    its class and the second softmax layer predicts its output among its class.
 
-    The outputs are grouped in the same order as they are initially defined.
+    If `target` is specified, it will only compute the outputs of the
+    corresponding targets. Otherwise, if `target` is `None`, it will compute
+    all the outputs.
+
+    The outputs are grouped in classes in the same order as they are initially
+    defined: if `n_outputs=10` and `n_classes=2`, then the first class is
+    composed of the outputs labeled `{0,1,2,3,4}` while the second class is
+    composed of `{5,6,7,8,9}`. If you need to change the classes, you have to
+    re-label your outputs.
 
     .. versionadded:: 0.7.1
@@ -2267,7 +2276,8 @@ def h_softmax(x, batch_size, n_outputs, n_classes, n_outputs_per_class,
         probabilities of the classes.
     b1: tensor of shape (n_classes,)
         the bias vector of the first softmax layer.
-    W2: tensor of shape (n_classes, number of features of the input x, n_outputs_per_class)
+    W2: tensor of shape (n_classes, number of features of the input x,
+        n_outputs_per_class)
         the weight matrix of the second softmax, which maps the input x to
         the probabilities of the outputs.
     b2: tensor of shape (n_classes, n_outputs_per_class)
@@ -2281,22 +2291,74 @@ def h_softmax(x, batch_size, n_outputs, n_classes, n_outputs_per_class,
     Returns
     -------
-    output_probs: tensor of shape (batch_size, n_outputs) or (batch_size, 1)
-        Output of the two-layer hierarchical softmax for input x. If target is
-        not specified (None), then all the outputs are computed and the
-        returned tensor has shape (batch_size, n_outputs). Otherwise, when
-        target is specified, only the corresponding outputs are computed and
-        the returned tensor has thus shape (batch_size, 1).
+    tensor of shape (`batch_size`, `n_outputs`) or (`batch_size`, 1)
+        Output tensor of the two-layer hierarchical softmax for input `x`.
+        Depending on argument `target`, it can have two different shapes.
+        If `target` is not specified (`None`), then all the outputs are
+        computed and the returned tensor has shape (`batch_size`, `n_outputs`).
+        Otherwise, when `target` is specified, only the corresponding outputs
+        are computed and the returned tensor has thus shape (`batch_size`, 1).
 
     Notes
     -----
-    The product of n_outputs_per_class and n_classes has to be greater or equal
-    to n_outputs. If it is strictly greater, then the irrelevant outputs will
-    be ignored.
+    The product of `n_outputs_per_class` and `n_classes` has to be greater or
+    equal to `n_outputs`. If it is strictly greater, then the irrelevant
+    outputs will be ignored.
 
-    n_outputs_per_class and n_classes have to be the same as the corresponding
-    dimensions of the tensors of W1, b1, W2 and b2.
+    `n_outputs_per_class` and `n_classes` have to be the same as the
+    corresponding dimensions of the tensors of `W1`, `b1`, `W2` and `b2`.
 
-    The most computational efficient configuration is when n_outputs_per_class
-    and n_classes are equal to the square root of n_outputs.
+    The most computational efficient configuration is when
+    `n_outputs_per_class` and `n_classes` are equal to the square root of
+    `n_outputs`.
+
+    Examples
+    --------
+    The following example builds a simple hierarchical softmax layer.
+
+    >>> import numpy as np
+    >>> import theano
+    >>> from theano import tensor
+    >>> from theano.tensor.nnet import h_softmax
+    >>>
+    >>> # Parameters
+    >>> batch_size = 32
+    >>> n_outputs = 100
+    >>> dim_x = 10  # dimension of the input
+    >>> n_classes = int(np.ceil(np.sqrt(n_outputs)))
+    >>> n_outputs_per_class = n_classes
+    >>> output_size = n_outputs_per_class * n_outputs_per_class
+    >>>
+    >>> # First level of h_softmax
+    >>> W1 = theano.shared(np.asarray(
+    ...     np.random.normal(0, 0.001, (dim_x, n_classes))))
+    >>> b1 = theano.shared(np.asarray(np.zeros((n_classes,))))
+    >>>
+    >>> # Second level of h_softmax
+    >>> W2 = np.asarray(np.random.normal(0, 0.001,
+    ...     size=(n_classes, dim_x, n_outputs_per_class)))
+    >>> W2 = theano.shared(W2)
+    >>> b2 = theano.shared(
+    ...     np.asarray(np.zeros((n_classes, n_outputs_per_class))))
+    >>>
+    >>> # We can now build the graph to compute a loss function, typically the
+    >>> # negative log-likelihood:
+    >>>
+    >>> x = tensor.imatrix('x')
+    >>> target = tensor.imatrix('target')
+    >>>
+    >>> # This only computes the output corresponding to the target.
+    >>> # The complexity is O(n_classes + n_outputs_per_class).
+    >>> y_hat_tg = h_softmax(x, batch_size, output_size, n_classes,
+    ...                      n_outputs_per_class, W1, b1, W2, b2, target)
+    >>>
+    >>> negll = -tensor.mean(tensor.log(y_hat_tg))
+    >>>
+    >>> # We may need to compute all the outputs (at test time usually):
+    >>>
+    >>> # This computes all the outputs.
+    >>> # The complexity is O(n_classes * n_outputs_per_class).
+    >>> output = h_softmax(x, batch_size, output_size, n_classes,
+    ...                    n_outputs_per_class, W1, b1, W2, b2)
 
     References
     ----------
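The docstring's own example requires Theano to run. To illustrate just the two-level factorization it describes, P(output) = P(class) * P(output | class), and the `target` shortcut, here is a minimal NumPy-only sketch; all sizes and variable names below are ours for illustration, not from the commit:

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, dim_x = 4, 10
n_classes = n_outputs_per_class = 5          # n_outputs = 25
W1 = rng.normal(0, 0.001, (dim_x, n_classes))
b1 = np.zeros(n_classes)
W2 = rng.normal(0, 0.001, (n_classes, dim_x, n_outputs_per_class))
b2 = np.zeros((n_classes, n_outputs_per_class))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = rng.normal(size=(batch_size, dim_x))

# First level: class probabilities, shape (batch_size, n_classes).
class_probs = softmax(x @ W1 + b1)
# Second level: per-class output probabilities,
# shape (batch_size, n_classes, n_outputs_per_class).
within_probs = softmax(np.einsum('bd,cdo->bco', x, W2) + b2)
# Full output: P(output) = P(class) * P(output | class), flattened so that
# outputs {0..4} form class 0, {5..9} class 1, etc., as in the docstring.
output_probs = (class_probs[:, :, None] * within_probs).reshape(batch_size, -1)
assert np.allclose(output_probs.sum(axis=1), 1.0)  # rows are distributions

# With a target, only one class column and one entry within it are needed:
# complexity O(n_classes + n_outputs_per_class) rather than their product.
target = np.array([3, 7, 12, 24])
tgt_class, tgt_within = divmod(target, n_outputs_per_class)
rows = np.arange(batch_size)
tgt_probs = class_probs[rows, tgt_class] * within_probs[rows, tgt_class, tgt_within]
assert np.allclose(tgt_probs, output_probs[rows, target])
```

This mirrors the two code paths in the docstring's complexity comments: computing every output touches all `n_classes * n_outputs_per_class` entries, while the targeted path only evaluates one class score and one within-class score per example.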