Commits · a5010fe7ad1c8a19e822808d2e5ff89909b08739 · testgroup / pytensor

13 Aug, 2014 · 40 commits
- Fix error message for sgemv to say "Sgemv" rather than "Sgemm" · a5010fe7
  Authored by Arnaud Bergeron on Aug 06, 2014
- Limit the total size of blocks to 512 and the size of the grids to 65535. · 5e69ec44
  Authored by Arnaud Bergeron on Aug 05, 2014
```
This should help older GPUs run at all and newer GPUs fit more blocks
on one SM.

With this change the code is cc 2.0+ compatible. But it will only be
fast on cc 3.0+ cards (due to atomicAdd).
```
- Remove the need for an intermediate buffer with a custom SgemvBatched kernel. · b4b6a31e
  Authored by Arnaud Bergeron on Aug 01, 2014
```
Also some small kernel speedups elsewhere.
```
- Fix the stupid scheduling for better performance (should be much faster). · 496cb1c7
  Authored by Arnaud Bergeron on Aug 01, 2014
```
Also address some other issues that came up in code review.
```
- Make a custom ger kernel that uses atomicAdd to do the addition · 5e9c7bce
  Authored by Arnaud Bergeron on Jul 31, 2014
```
Remove the beta parameter since it's always 1 anyway.
```
- Enable debugging of kernels. · 52cd5ee4
  Authored by Arnaud Bergeron on Jul 31, 2014
- Add support for dimensions of size 1 in all cases. · d1f762aa
  Authored by Arnaud Bergeron on Jul 30, 2014
- Remove the python version of these ops as it is laughably slow and forces · 8e23c533
  Authored by Arnaud Bergeron on Jul 30, 2014
```
a dependency on scikits.cuda and pycuda.
```
- Update docs to reflect batches and add some fallback code to add batches of 1 to… · d3088260
  Authored by Arnaud Bergeron on Jul 22, 2014
```
Update docs to reflect batches and add some fallback code to add batches of 1 to non-batched version.
```
- Add batch support to blocksparse. · 42f4cb3e
  Authored by Arnaud Bergeron on Jul 22, 2014
- Use the right spelling for config.unittests.rseed. · 47d59687
  Authored by Arnaud Bergeron on Jul 22, 2014
- Now the opt actually computes the right value and there is a test. · a7329037
  Authored by Arnaud Bergeron on Jul 21, 2014
- Add optimizations to make the gradient update inplace. There are no tests yet. · f1515639
  Authored by Arnaud Bergeron on Jul 21, 2014
- Add infer_shape to the ops. · 2e51a436
  Authored by Arnaud Bergeron on Jul 21, 2014
- Add C code using gemmBatched to SparseBlockDotOuterSS (the gradient). · 57865538
  Authored by Arnaud Bergeron on Jul 17, 2014
- Small forgotten speedup in SparseBlockDotGemvSS. · 98a15fa1
  Authored by Arnaud Bergeron on Jul 17, 2014
- Use gemm_batched from python code in the gradient. · 9841c0db
  Authored by Arnaud Bergeron on Jul 17, 2014
- C code version of the python loop. · 437b1a5f
  Authored by Arnaud Bergeron on Jul 17, 2014
- Remove leftover opt. · 69895eea
  Authored by Arnaud Bergeron on Jul 16, 2014
- C code that uses SgemmBatched and a kernel to initialize the list of stuff. · ed244b6b
  Authored by Arnaud Bergeron on Jul 16, 2014
- Fix memory leak in C code for blocksparse. · c774e32e
  Authored by Arnaud Bergeron on Jul 14, 2014
- Use gemm_batched in the python code. · 8f9c2a12
  Authored by Arnaud Bergeron on Jul 14, 2014
- Fix errors in C code and add a cache version. It passes the tests and works. · 29db8ffb
  Authored by Arnaud Bergeron on Jul 14, 2014
- Add C code to SparseBlockGemvSS · 1519d758
  Authored by Arnaud Bergeron on Jul 14, 2014
- Add support for the fortran order in gemv (and a test for it). · 0bc12fe9
  Authored by Arnaud Bergeron on Jul 07, 2014
- Fix shape error in tests (which also means that we had wrong behavior). · d12f4aea
  Authored by Arnaud Bergeron on Jul 02, 2014
- And make the opt test work. · 7f15e04a
  Authored by Arnaud Bergeron on Jun 25, 2014
- Add test to check that the inplace opts are working. · 6c77f4a6
  Authored by Arnaud Bergeron on Jun 25, 2014
- Don't try to use opt when cuda is not available. · 93879ae4
  Authored by Arnaud Bergeron on Jun 25, 2014
- Add inplace optimizations. · 1c3afdf6
  Authored by Arnaud Bergeron on Jun 23, 2014
- Use the non-scan version by default, since it's faster. · 6b162a92
  Authored by Arnaud Bergeron on Jun 23, 2014
- Fix last bug. · 9432f511
  Authored by Arnaud Bergeron on Jun 23, 2014
- Fix the crash on exit with pycuda. · 4c101c36
  Authored by Arnaud Bergeron on Jun 23, 2014
- Fix the gradient on W. Finally. · 6d327bd5
  Authored by Arnaud Bergeron on Jun 23, 2014
- Add tests for grad shape. · 94fb4d03
  Authored by Arnaud Bergeron on Jun 23, 2014
```
Some improvements to the gradient, but it's still transposed and partially wrong.
```
- Fix shape errors in the gradient (but it's still not the right value). · bcb902e9
  Authored by Arnaud Bergeron on Jun 23, 2014
- Add tests for the grad of the op version (which is broken for now). · 09658800
  Authored by Arnaud Bergeron on Jun 23, 2014
- Add a test for the op version (but not for grad yet). · 493e71a4
  Authored by Arnaud Bergeron on Jun 23, 2014
- Address comments. · fabf1fdf
  Authored by Arnaud Bergeron on Jun 20, 2014
- First step, having something that works. · 4703a2b4
  Authored by Arnaud Bergeron on Jun 19, 2014
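
Commit 5e69ec44 caps the total block size at 512 and the grid size at 65535. As a rough illustration of what such caps imply when sizing a launch, the sketch below picks a block and grid size for `n_items` work items and reports how many items each thread has to cover once both limits are reached. The helper name `launch_config`, the 1-D layout, and the looping strategy are assumptions made for illustration; none of them are taken from the commit itself.

```python
# Hypothetical helper, not the code from commit 5e69ec44.
MAX_THREADS_PER_BLOCK = 512   # cap on total block size named in the commit
MAX_GRID_DIM = 65535          # cap on grid size named in the commit


def launch_config(n_items):
    """Return (grid, block, items_per_thread) for a 1-D launch over n_items."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    block = min(n_items, MAX_THREADS_PER_BLOCK)
    # Blocks needed if every thread handled exactly one item (ceiling division).
    blocks_needed = -(-n_items // block)
    grid = min(blocks_needed, MAX_GRID_DIM)
    # Once the grid is clamped, each thread must loop over several items.
    total_threads = grid * block
    items_per_thread = -(-n_items // total_threads)
    return grid, block, items_per_thread
```

With these assumptions, `launch_config(1000)` gives two blocks of 512 threads with one item per thread, and only inputs larger than 512 × 65535 items force a thread to handle more than one item.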
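
Commit 5e9c7bce replaces ger with a custom kernel that uses atomicAdd for the addition and drops the beta parameter because it is always 1. With beta fixed at 1, the operation is a pure accumulation of a rank-1 update into the output. The plain-Python sketch below shows that arithmetic only; the function name `ger_accumulate` and the list-of-lists layout are assumptions for illustration, and the comment about atomicAdd reflects one plausible reading of the commit, not its actual kernel.

```python
# Hypothetical reference for the arithmetic only, not the CUDA kernel.
def ger_accumulate(alpha, x, y, A):
    """In-place rank-1 update: A[i][j] += alpha * x[i] * y[j] (beta == 1)."""
    if len(A) != len(x) or any(len(row) != len(y) for row in A):
        raise ValueError("A must have shape (len(x), len(y))")
    for i, xi in enumerate(x):
        row = A[i]
        for j, yj in enumerate(y):
            # On a GPU, this += is the step that would need atomicAdd if
            # several threads may update the same element (an assumption).
            row[j] += alpha * xi * yj
    return A
```

Calling the function twice on the same `A` adds both updates together, which is the behavior an accumulate-only kernel has to preserve when updates overlap.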
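
Several commits above (1519d758, 437b1a5f, 42f4cb3e) refer to a block-sparse gemv, a Python loop that was later ported to C, and batch support, but the log never states what the op computes. The sketch below assumes the semantics later documented for Theano's `sparse_block_dot`: for each batch entry and each selected output block, start from the bias block and add the product of every selected input vector with the matching weight block. The function name, argument names, and nested-list layout are all assumptions for illustration.

```python
# Hypothetical reference loop; the semantics are an assumption (see lead-in).
def sparse_block_gemv(W, h, i_idx, b, o_idx):
    """out[n][j] = b[o_idx[n][j]] + sum_i h[n][i] . W[i_idx[n][i]][o_idx[n][j]]

    W[ib][ob] is an (in_size x out_size) matrix, h[n][i] a vector of length
    in_size, and b[ob] a vector of length out_size.
    """
    out = []
    for n in range(len(h)):                      # loop over the batch
        rows = []
        for oj in o_idx[n]:                      # selected output blocks
            acc = list(b[oj])                    # copy of the bias block
            for i, ii in enumerate(i_idx[n]):    # selected input blocks
                block = W[ii][oj]
                for k, hk in enumerate(h[n][i]):
                    for m in range(len(acc)):
                        acc[m] += hk * block[k][m]
            rows.append(acc)
        out.append(rows)
    return out
```

A batch of size 1 is just an outer list with a single entry, which is in the spirit of the "batches of 1" fallback mentioned in commit d3088260.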