Represent gradients as sit_sot sequences
The main bug was that gradients were represented as shared variables.
Now we represent them as sit_sot sequences of which only the last step
is used (so the savemem optimization takes care of the memory cleanup).
The advantage is that gradients with respect to sit_sot sequences are
well defined.