Commit 8769382f, authored by Ozan Çağlayan

scan/scan_op: Convert known_grads to OrderedDict

This was probably causing a different order of operations during gradient computation in scan on each run. With this fix I'm finally able to reproduce results on my RNN system.
Parent 5f0cf1c3
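The commit message blames the irreproducibility on operation order. One plausible reason order matters is an assumption, not something the commit states: floating-point addition is not associative. If gradient contributions are accumulated in a different sequence, the result can differ in the last bits, and those differences can compound over many RNN training steps. The minimal sketch below uses a hypothetical `accumulate` helper, not Theano code, to show two summation orders of the same three terms disagreeing.

```python
def accumulate(contributions):
    # Hypothetical helper: sum gradient contributions left to right,
    # the way a simple accumulation loop would.
    total = 0.0
    for c in contributions:
        total += c
    return total


contributions = [0.1, 0.2, 0.3]
forward = accumulate(contributions)                    # (0.1 + 0.2) + 0.3
backward = accumulate(list(reversed(contributions)))   # (0.3 + 0.2) + 0.1
print(forward, backward, forward == backward)          # 0.6000000000000001 0.6 False
```

The two totals are mathematically equal but not bitwise equal. That is enough to make two training runs diverge when nothing else differs.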
@@ -2024,7 +2024,7 @@ class Scan(PureOp):
# it will be the sum of the external gradient signal and the
# gradient obtained by propagating Y's external gradient signal
# to X.
-known_grads = dict([(k.copy(), v) for (k, v) in known_grads.items()])
+known_grads = OrderedDict([(k.copy(), v) for (k, v) in known_grads.items()])
grads = gradient.grad(
cost=None,
@@ -2094,7 +2094,7 @@ class Scan(PureOp):
dC_dXts.append(dC_dXt)
-known_grads = {}
+known_grads = OrderedDict()
dc_dxts_idx = 0
for i in range(len(diff_outputs)):
if i < idx_nitsot_start or i >= idx_nitsot_end:
......
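Both hunks replace a plain dict with an `OrderedDict`. Before Python 3.7, a plain dict gave no ordering guarantee; in CPython 2.x it iterated in an order derived from key hashes. For objects that use Python's default identity-based hash, that order depends on memory addresses, which can change from run to run. Whether Theano variables hash this way is an assumption here. Modern dicts preserve insertion order, so the sketch below uses a `set` to reproduce hash-derived ordering. The `Var` class and both helper functions are hypothetical stand-ins, not Theano code.

```python
from collections import OrderedDict


class Var:
    """Hypothetical stand-in for a graph variable; hashes by identity (the Python default)."""

    def __init__(self, name):
        self.name = name

    def copy(self):
        # Mirrors the `k.copy()` in the diff: a fresh object, hence a fresh id() and hash.
        return Var(self.name)


def order_by_hash(variables):
    # A set iterates in an order derived from each element's hash, i.e. from id(),
    # which depends on memory addresses and may differ between runs.
    return [v.name for v in set(variables)]


def order_by_insertion(grads):
    # OrderedDict iterates in insertion order, independent of hash values.
    return [k.name for k in grads]


originals = [Var("h"), Var("c"), Var("W"), Var("b")]
known_grads = OrderedDict((v.copy(), 0.0) for v in originals)

print(order_by_hash(originals))         # some permutation; not guaranteed stable across runs
print(order_by_insertion(known_grads))  # ['h', 'c', 'W', 'b']
```

With insertion order fixed, any code that later walks `known_grads` visits the entries in the same sequence on every run.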