-
由 Razvan Pascanu 提交于
The main bug was gradients where represented as shared variables. Now we represent them as sit_sot sequences to which only the last step is used (hence the savemem optimization does the memory clean up). The advantage is that gradients with respect to sitsot are well defined.
b392cc03