pytensor

Commit 1e768a58 authored Sep 03, 2015 by AdeB
Two-layer hierarchical softmax
Parent commit: 7415e2f0
Showing 1 changed file with 124 additions and 0 deletions

theano/tensor/nnet/nnet.py  +124  -0
@@ -29,6 +29,7 @@ from theano.gof import Apply
from theano.tensor.nnet.sigm import sigmoid, softplus
from theano.gradient import DisconnectedType
from theano.gradient import grad_not_implemented
from theano.sandbox.blocksparse import sparse_block_dot
from theano.tensor.type import values_eq_approx_remove_nan
@@ -2049,3 +2050,126 @@ def relu(x, alpha=0):
    f1 = 0.5 * (1 + alpha)
    f2 = 0.5 * (1 - alpha)
    return f1 * x + f2 * abs(x)


def h_softmax(x, batch_size, n_outputs, W1, b1, W2, b2,
              n_classes=None, n_outputs_per_class=None, target=None):
""" Two-level hierarchical softmax.
Outputs are grouped in sqrt(n_outputs) classes.
The architecture is composed of two softmax layers: the first predicts the
class of the input x while the second predicts the output of the input x in
the predicted class.
More explanations can be found in the original paper:
http://arxiv.org/abs/cs/0108006.
If target is specified, it will only compute the outputs of the
corresponding targets. Otherwise, if target is None, it will compute all
the outputs.
The outputs are grouped in the same order as they are initially defined.
Arguments:
----------
x: tensor of shape (batch_size, number of features)
the minibatch input of the two-layer hierarchical softmax.
batch_size: int
the size of the minibatch input x.
n_outputs: int
the number of outputs.
n_classes: int
(optional, default None)
the number of classes of the two-layer hierarchical softmax. It
corresponds to the number of outputs of the first softmax. It can be
set to None, see the note at the end of the docstring.
n_outputs_per_class: int
(optional, default None)
the number of outputs per class. It can be set to None, see the note
at the end of the docstring.
W1: tensor of shape (number of features of the input x, number of classes)
the weight matrix of the first softmax, which maps the input x to the
probabilities of the classes.
b1: tensor of shape (number of classes,)
the bias vector of the first softmax layer.
W2: tensor of shape (number of classes, number of features of the input x,
number of outputs per class)
the weight matrix of the second softmax, which maps the input x to
the probabilities of the outputs.
b2: tensor of shape (number of classes, number of outputs per class)
the bias vector of the second softmax layer.
target: tensor of shape either (batch_size,) or (batch_size, 1)
(optional, default None)
contains the indices of the targets for the minibatch
input x. For each input, the function computes the output for its
corresponding target. If target is None, then all the outputs are
computed for each input.
:note: n_outputs_per_class and n_classes do not need to be defined. If
both are not defined, then they are set to the square root of the
number of outputs, which is the most computational efficient
configuration. If only one is defined
"""
    # In case one or both of n_outputs_per_class and n_classes are not defined
    if not n_outputs_per_class and not n_classes:
        n_outputs_per_class = numpy.ceil(numpy.sqrt(n_outputs))
        n_classes = numpy.ceil(n_outputs / n_outputs_per_class)
    elif n_outputs_per_class and not n_classes:
        n_classes = numpy.ceil(n_outputs / n_outputs_per_class)
    elif n_classes and not n_outputs_per_class:
        n_outputs_per_class = numpy.ceil(n_outputs / n_classes)
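    # For illustration: with n_outputs = 10000 and neither n_classes nor
    # n_outputs_per_class given, n_outputs_per_class becomes ceil(sqrt(10000)) = 100
    # and n_classes becomes ceil(10000 / 100) = 100, so each example goes through
    # two softmaxes of about 100 units each instead of one softmax over 10000 outputs.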
    # First softmax that computes the probabilities of belonging to each class
    class_probs = theano.tensor.nnet.softmax(tensor.dot(x, W1) + b1)
    if target is None:
        # Computes the probabilities of all the outputs
        class_ids = tensor.tile(tensor.arange(n_classes, dtype="int32")[None, :],
                                (batch_size, 1))

        # Second softmax that computes the output probabilities
        activations = sparse_block_dot(
            W2[None, :, :, :], x[:, None, :],
            tensor.zeros((batch_size, 1), dtype='int32'), b2, class_ids)

        output_probs = theano.tensor.nnet.softmax(
            activations.reshape((-1, n_outputs_per_class)))
        output_probs = output_probs.reshape((batch_size, n_classes, -1))
        output_probs = class_probs[:, :, None] * output_probs
        output_probs = output_probs.reshape((batch_size, -1))
        output_probs = output_probs[:, :n_outputs]
    else:
        # Computes the probabilities of the outputs specified by the targets
        target = target.flatten()

        # Class to which each target belongs
        target_classes = target // n_outputs_per_class

        # Index of each target within its class
        target_outputs_in_class = target % n_outputs_per_class
        # Second softmax that computes the output probabilities
        activations = sparse_block_dot(
            W2[None, :, :, :], x[:, None, :],
            tensor.zeros((batch_size, 1), dtype='int32'), b2,
            target_classes[:, None])

        output_probs = theano.tensor.nnet.softmax(activations[:, 0, :])
        target_class_probs = class_probs[tensor.arange(batch_size),
                                         target_classes]
        output_probs = output_probs[tensor.arange(batch_size),
                                    target_outputs_in_class]
        output_probs = target_class_probs * output_probs

    return output_probs
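For context, the sketch below shows one way the new function could be wired up once this file is in place. The sizes, variable names, and the use of theano.shared / theano.function are illustrative assumptions, not part of the commit; the import path theano.tensor.nnet.nnet is simply the module this diff modifies, and compiling the graphs assumes the active backend supports sparse_block_dot.

import numpy
import theano
import theano.tensor as tensor
from theano.tensor.nnet.nnet import h_softmax

floatX = theano.config.floatX

# Illustrative sizes (assumptions, not taken from the commit)
batch_size, n_features, n_outputs = 32, 100, 10000
n_classes = n_outputs_per_class = 100   # 100 * 100 >= 10000 outputs

# Shared parameters with the shapes described in the docstring
rng = numpy.random.RandomState(0)
W1 = theano.shared(rng.normal(0, 0.01, (n_features, n_classes)).astype(floatX))
b1 = theano.shared(numpy.zeros(n_classes, dtype=floatX))
W2 = theano.shared(rng.normal(
    0, 0.01, (n_classes, n_features, n_outputs_per_class)).astype(floatX))
b2 = theano.shared(numpy.zeros((n_classes, n_outputs_per_class), dtype=floatX))

x = tensor.matrix('x')
y = tensor.ivector('y')

# Training-time graph: probabilities of the target outputs only
p_target = h_softmax(x, batch_size, n_outputs, W1, b1, W2, b2,
                     n_classes, n_outputs_per_class, target=y)

# Scoring graph: probabilities of all n_outputs outputs
p_all = h_softmax(x, batch_size, n_outputs, W1, b1, W2, b2,
                  n_classes, n_outputs_per_class)

f_target = theano.function([x, y], p_target)
f_all = theano.function([x], p_all)

Passing n_classes and n_outputs_per_class explicitly as Python ints keeps the target index arithmetic in integer dtypes and avoids relying on the numpy.ceil defaults.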