testgroup / pytensor / Commits

Commit 72a7214a, authored Aug 21, 2012 by lamblin

Merge pull request #863 from nouiz/mixed2

Mixed2

Parents: 7ebae191, 43b81a93
Showing 9 changed files with 86 additions and 182 deletions (+86 −182)

NEWS.txt                                 +1   −142
bin/theano-nose                          +14  −0
theano/gof/compiledir.py                 +1   −1
theano/sandbox/cuda/basic_ops.py         +0   −0
theano/sandbox/cuda/cuda_ndarray.cu      +35  −23
theano/sandbox/cuda/nvcc_compiler.py     +15  −1
theano/scan_module/tests/test_scan.py    +13  −12
theano/tensor/__init__.py                +2   −0
theano/tensor/extra_ops.py               +5   −3
NEWS.txt
@@ -2,148 +2,7 @@
Updates in the Trunk since the last release:
Bug fixes
https://github.com/Theano/Theano/wiki/Devnews
* Outputs of Scan nodes could contain corrupted values: some parts of the
output would be repeated a second time, instead of the correct values.
It happened randomly, and quite infrequently, but the bug has been present
(both in Python and Cython) since April 2011. (Pascal L.)
* In Sparse sandbox, fix the grad of theano.sparse.sandbox.sp.row_scale.
It did not return the right number of elements. (Frederic B.)
* set_subtensor(x[int vector], new_value) when moved to the GPU
was transformed into inc_subtensor on the GPU. Now we have a correct
(but slow) GPU implementation.
Note 1: set_subtensor(x[slice[,...]], new_value) was working correctly
in all cases as well as inc_subtensor(*, *).
Note 2: If your code was affected by the incorrect behavior, we now print
a warning by default (Frederic B.)
* Fixed an issue whereby config values were used as default arguments,
with those defaults then stuck at old values if the config variables were
changed during program execution. (David W-F)
* Fixed many subtle bugs involving mutable default arguments which may have
led to unexpected behaviour, such as objects sharing instance variables
they were not supposed to share. (David W-F)
* Correctly record the GPU device number used when we let the driver select it.
(Frederic B.)
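The mutable-default-argument bugs fixed above are instances of a classic Python pitfall, unrelated to Theano itself. A minimal illustration (hypothetical function names, not Theano code):

```python
def append_item(item, bucket=[]):  # BUG: the default list is created only once
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] -- state leaked from the previous call

def append_item_fixed(item, bucket=None):
    if bucket is None:
        bucket = []  # a fresh list on every call
    bucket.append(item)
    return bucket

print(append_item_fixed(1))  # [1]
print(append_item_fixed(2))  # [2]
```

Because the default list is evaluated once at function definition, every call without an explicit `bucket` shares the same object, exactly the kind of hidden instance sharing the fix addresses.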
Documentation
* Added documentation to the tutorial on how to extend Theano.
This explains how to make a Theano Op from a Python function.
http://deeplearning.net/software/theano/tutorial/extending_theano.html
(Frédéric B.)
* New installation instructions for Windows using EPD (Pascal L.)
Interface changes
* In 0.5, we removed the deprecated sharedvar.value property.
Now we raise an error if you access it. (Frederic B.)
* theano.function does not accept duplicate inputs, so function([x, x], ...)
does not work anymore. (Pascal L.)
* theano.function now raises an error if some of the provided inputs are
not part of the computational graph needed to compute the output, for
instance, function([x, y], [y]). You can use the kwarg
``on_unused_input={'raise', 'warn', 'ignore'}`` to control this.
(Pascal L.)
* New Theano flag "on_unused_input" that defines the default value for the
previous point. (Frederic B.)
* tensor.alloc() now raises an error at graph-build time
when we try to create fewer dimensions than the number of dimensions
the provided value has. In the past, the error was raised at run time.
(Frederic B.)
Speed up
* Convolution on the GPU now checks the generation of the card to make
it faster in some cases (especially for medium/big output images). (Frédéric B.)
(We hardcoded 512 as the maximum number of threads per block. Newer cards
support up to 1024 threads per block.)
* CPU convolutions are now parallelized (Frédéric B.)
By default, all cores/hyper-threads are used.
To control this, use the OMP_NUM_THREADS=N environment variable.
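OMP_NUM_THREADS is a standard OpenMP variable, not Theano-specific; the cleanest way to cap it is in the environment of the launched process. A small sketch that verifies the cap reaches a child interpreter:

```python
import os
import subprocess
import sys

# Launch a child Python with the thread cap set to 2; when the variable
# is unset, all cores/hyper-threads are used.
env = dict(os.environ, OMP_NUM_THREADS="2")
out = subprocess.check_output(
    [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"],
    env=env)
print(out.decode().strip())  # 2
```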
New Features
* debugprint new param ids=["CHAR", "id", "int", ""].
This makes the printed identifier be the Python id, a unique char, a
unique int, or omitted entirely. We changed the default to "CHAR"
as this is more readable. (Frederic B.)
* debugprint new param stop_on_name=[False, True]. If True, we don't print
anything below an intermediate variable that has a name. Defaults to False.
(Frederic B.)
* debugprint no longer prints the "|" symbol in a column after the last input. (Frederic B.)
* If you use Enthought Python Distribution (EPD) now we use its blas
implementation by default (tested on Linux and Windows)
(Frederic B., Simon McGregor)
* MRG random now raises an error with a clear message when the passed shape
contains dimensions with a bad value like 0. (Frédéric B., reported by Ian G.)
* "CudaNdarray[*] = ndarray" works in more cases (Frederic B.)
* "CudaNdarray[*] += ndarray" works in more cases (Frederic B.)
* We add dimensions to CudaNdarray to automatically broadcast more frequently.
(Frederic B.)
* theano.tensor.argsort that wraps numpy.argsort (Hani Almousli).
* New theano flag cmodule.warn_no_version. Default False. If True,
will print a warning when compiling one or more Op with C code that
can't be cached because there is no c_code_cache_version() function
associated to at least one of those Ops. (Frederic B.)
* CPU alloc now always generates C code (Pascal L.)
* New Theano flag cmodule.warn_no_version=False. When True, warn when an Op
with C code is not versioned (which forces recompiling it every time).
(Frédéric B.)
* Made a few Ops with C code versioned to reduce compilation time.
(Frédéric B, Pascal L.)
* C code reuses preallocated outputs (only done by Scan) (Pascal L.)
* Garbage collection of intermediate results during Theano function calls
for Ops with C code (Pascal L.)
* The Theano flag compiledir_format now supports the parameter numpy_version.
* Theano GPU variables, shared variables and constants now support <, <=,
> and >=, just as those not on the GPU do.
Sparse
* Implement theano.sparse.mul(sparse1, sparse2) when both inputs don't
have the same sparsity pattern. (Frederic B.)
Sparse Sandbox graduate
* Remove0 op: it removes stored elements with value 0. (Frederic B.)
Sparse Sandbox Additions (not reviewed/documented/tested, but used by some people)
* They are all in the theano.sparse.sandbox.sp2 module
* Op class: Cast, Poisson, Multinomial, EliminateZeros, Sum, Binomial
* Op class: SamplingDot, SamplingDotCsr (inserted automatically)
* Op function: structured_sigmoid, structured_exp, structured_pow, structured_minimum
* Op class: StructuredAddSV, StrucutedAddSVCSR (inserted automatically)
* opt: local_sampling_dot_csr, local_structured_add_s_v
Internal changes
* Define new exceptions MissingInputError and UnusedInputError, and use them
in theano.function, instead of TypeError and ValueError. (Pascal L.)
* Better handling of bitwidth and max values of integers and pointers
across platforms (Pascal L.)
Crash Fix
* Do not try to use the BLAS library when blas.ldflags is manually set to an
empty string (Frederic B.)
* When importing theano on a computer without GPU with the Theano
flags 'device' or 'init_gpu_device' set to gpu* (Frederic B., reported by Luo Heng)
* Optimization printed a useless error when scipy was not available. (Frederic B.)
* GPU conv crash/slowdown on newer hardware (James B.)
* Better error handling in GPU conv (Frederic B.)
* GPU optimization that moves element-wise Ops to the GPU. Crash happened in
a particular execution order of this optimization and the
element-wise fusion optimization when upcasting some inputs to
float32 (to compute them on the GPU).
(Frederic B., reported by Sander Dieleman)
* GpuReshape in some particular case when the input is not contiguous
(Frederic B., reported by Sander Dieleman)
* GpuSoftmaxWithBias with shape (0, N) with N > 1.
(Frédéric B., reported by Razvan P.)
* Fix crash under 64-bit Windows, when taking subtensors of the form a[n:]
(Pascal L., reported by Simon McGregor)
* Fixed issue with the MaxAndArgmax Op not properly preserving broadcastable
dimensions, which could typically result in optimization crashes (Olivier D.)
* Fixed crash when concatenating some arrays with specific broadcasting
patterns (Olivier D.)
* Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
* In advanced indexing, if some inputs are constant, no need to call constant(...)
on their value any more. (Pascal L., reported by John Salvatier)
* Fix crash on GPU when GpuSubtensor didn't set the right stride
when the result tensor had a dimension of size 1. (Pascal L.,
reported by Graham T.)
=============
Release Notes
...
bin/theano-nose
@@ -26,6 +26,9 @@ with the option time_profile=True to conduct time-profiling of the tests.
 option will be interpreted as an indication of the number of tests to be run
 between notifications of progress to standard output.
 
+If the '--theano' option is used, it is replaced with the path to theano.
+Useful if you don't know where it was installed.
+
 `run_tests_in_batch.py` will in turn call back this script in another process.
 """
@@ -39,6 +42,12 @@ import sys
 from nose.plugins import Plugin
 
 
 def main():
+    # Handle the --theano arguments
+    if "--theano" in sys.argv:
+        i = sys.argv.index("--theano")
+        import theano
+        sys.argv[i] = theano.__path__[0]
+
     # Handle --batch[=n] arguments
     batch_args = [arg for arg in sys.argv if arg.startswith('--batch')]
     for arg in batch_args:
@@ -137,6 +146,11 @@ def help():
     --without-knownfailure: Do not load the KnownFailure plugin.
 
+    --theano: This parameter is replaced with the path to the theano library.
+              As theano-nose is a wrapper to nosetests, it expects a path to
+              the tests to run. If you don't know where theano is installed,
+              use this option to have it inserted automatically.
+
     The other options will be passed to nosetests, see ``nosetests -h``.
     """
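The --theano handling above is a small argv-rewriting trick: a sentinel flag is swapped for the package's installation directory before the arguments reach nosetests. A self-contained sketch of the same idea, using the stdlib `email` package in place of theano:

```python
import email

def substitute_path_flag(argv, flag, package):
    """Replace `flag` in argv with the directory `package` was imported
    from, mirroring what bin/theano-nose does for --theano."""
    argv = list(argv)
    if flag in argv:
        argv[argv.index(flag)] = package.__path__[0]
    return argv

args = substitute_path_flag(["nosetests", "--theano", "-v"], "--theano", email)
print(args)  # the flag is now a real directory path
```

`__path__[0]` is the package directory, so the user never has to know where the library landed on disk.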
theano/gof/compiledir.py
@@ -37,7 +37,7 @@ compiledir_format_dict = {"platform": platform.platform(),
                           "python_version": platform.python_version(),
                           "theano_version": theano.__version__,
                           "numpy_version": numpy.__version__,
-                          "g++": gcc_version_str.replace(" ", "_"),
+                          "gxx_version": gcc_version_str.replace(" ", "_"),
                           }
 compiledir_format_keys = ", ".join(compiledir_format_dict.keys())
 default_compiledir_format = \
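The dict above feeds Theano's compiledir_format mechanism: a user-configurable %-style format string is expanded against these keys (so the rename from "g++" to "gxx_version" changes the placeholder name users write). A sketch with a placeholder compiler version string, not a real detected value:

```python
import platform

# Expand a compiledir format string against a dict of keys, as
# theano/gof/compiledir.py does; "4.6 20120106" is a made-up version.
compiledir_format_dict = {
    "platform": platform.platform(),
    "python_version": platform.python_version(),
    "gxx_version": "4.6 20120106".replace(" ", "_"),
}
fmt = "compiledir_%(platform)s-%(python_version)s-%(gxx_version)s"
print(fmt % compiledir_format_dict)
```

Any key in the dict can appear as `%(key)s` in the format string, which is why the set of available keys is advertised via `compiledir_format_keys`.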
theano/sandbox/cuda/basic_ops.py
(Diff collapsed.)
theano/sandbox/cuda/cuda_ndarray.cu
@@ -758,8 +758,10 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
     PyObject * axis_obj = Py_None;
     PyObject * out_obj = Py_None;
     PyObject * clipmode_obj = NULL;
-    if (! PyArg_ParseTuple(args, "O|OOO", &indices_obj, &axis_obj,
-                           &out_obj, &clipmode_obj))
+    int max_threads = 1; // max threads per blocks
+    if (! PyArg_ParseTuple(args, "O|OOOi", &indices_obj, &axis_obj,
+                           &out_obj, &clipmode_obj, &max_threads))
         return NULL;
     //Check argument indices
@@ -839,14 +841,14 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
     PyObject * axis_iobj = PyNumber_Long(axis_obj);
     if (! axis_iobj) {
         PyErr_SetString(PyExc_NotImplementedError,
                         "CudaNdarray_TakeFrom: axis must be convertable to a long");
-        Py_DECREF(indices_obj);
+        Py_DECREF(indices);
         return NULL;
     }
     long axis = PyInt_AsLong(axis_iobj);
     Py_DECREF(axis_iobj); axis_iobj = NULL;
     if (axis != 0) {
         PyErr_SetString(PyExc_NotImplementedError,
                         "CudaNdarray_TakeFrom: only axis=0 is currently supported");
-        Py_DECREF(indices_obj);
+        Py_DECREF(indices);
         return NULL;
     }
@@ -869,13 +871,13 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
     if (!out) {
         out = (CudaNdarray*) CudaNdarray_New();
         if (!out){
-            Py_DECREF(indices_obj);
+            Py_DECREF(indices);
             free(dims);
             return NULL;
         }
         if (CudaNdarray_alloc_contiguous(out, self->nd, dims)) {
             Py_DECREF(out);
-            Py_DECREF(indices_obj);
+            Py_DECREF(indices);
             free(dims);
             return NULL;
         }
@@ -887,19 +889,20 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
     if (clipmode_obj) {
         char * clipmode = PyString_AsString(clipmode_obj);
         if (! clipmode){
-            Py_DECREF(indices_obj);
+            Py_DECREF(indices);
             Py_DECREF(out);
             free(dims);
             return NULL;
         }
         if (strcmp(clipmode, "raise") != 0) {
-            PyErr_SetString(PyExc_NotImplementedError,
-                            "CudaNdarray_TakeFrom: only the raise mode is currently supported");
-            Py_DECREF(indices_obj);
+            PyErr_Format(PyExc_NotImplementedError,
+                         "CudaNdarray_TakeFrom: only the raise mode is currently supported. Got '%s'",
+                         clipmode);
+            Py_DECREF(indices);
             Py_DECREF(out);
             free(dims);
             return NULL;
         }
         Py_DECREF(clipmode_obj);
     }
     void (*k3)(const int, const int, const int,
                const npy_int64*, const npy_int64*,
@@ -913,7 +916,7 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
     if (err_var == NULL) {
         err_var = (int*)device_malloc(sizeof(int));
         if (!err_var) { // PyErr set by device_malloc
-            Py_DECREF(indices_obj);
+            Py_DECREF(indices);
             Py_DECREF(out);
             free(dims);
             return NULL;
@@ -928,7 +931,7 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
             PyErr_Format(PyExc_RuntimeError,
                          "Error setting device error code to 0. %s",
                          cudaGetErrorString(err));
-            Py_DECREF(indices_obj);
+            Py_DECREF(indices);
             Py_DECREF(out);
             free(dims);
             return NULL;
@@ -936,13 +939,16 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
     }
     dim3 n_blocks(std::min(CudaNdarray_HOST_DIMS(out)[0], 65535), 1, 1);
     switch (self->nd) {
         case 1:
             {
                 dim3 n_threads(1, 1, 1);
                 if (verbose)
-                    printf("kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
-                           " n_threads.x=%i, n_threads.y=%i)\n",
+                    printf("cudaGetLastError=%d, nd=%d"
+                           " kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
+                           " n_threads.x=%i, n_threads.y=%i)\n",
+                           cudaGetLastError(), self->nd,
                            n_blocks.x, n_blocks.y, n_threads.x, n_threads.y);
                 k3<<<n_blocks, n_threads>>>(
                     dims[0],
@@ -963,11 +969,15 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
             break;
         case 2:
             {
-                dim3 n_threads(std::min(CudaNdarray_HOST_DIMS(out)[1], 512), 1, 1);
+                dim3 n_threads(std::min(CudaNdarray_HOST_DIMS(out)[1], max_threads), 1, 1);
                 if (verbose)
-                    printf("kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
-                           " n_threads.x=%i, n_threads.y=%i)\n",
+                    printf("cudaGetLastError=%d, nd=%d"
+                           " kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
+                           " n_threads.x=%i, n_threads.y=%i)\n",
+                           cudaGetLastError(), self->nd,
                            n_blocks.x, n_blocks.y, n_threads.x, n_threads.y);
                 k3<<<n_blocks, n_threads>>>(
                     dims[0], //dimensions
                     dims[1],
@@ -987,12 +997,14 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
             break;
         case 3:
             {
-                int ty = std::min(CudaNdarray_HOST_DIMS(out)[2], 512);
-                int tx = std::min(CudaNdarray_HOST_DIMS(out)[1], 512 / ty);
+                int ty = std::min(CudaNdarray_HOST_DIMS(out)[2], max_threads);
+                int tx = std::min(CudaNdarray_HOST_DIMS(out)[1], max_threads / ty);
                 dim3 n_threads(tx, ty, 1);
                 if (verbose)
-                    printf("kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
-                           " n_threads.x=%i, n_threads.y=%i)\n",
+                    printf("cudaGetLastError=%d, nd=%d"
+                           " kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
+                           " n_threads.x=%i, n_threads.y=%i)\n",
+                           cudaGetLastError(), self->nd,
                            n_blocks.x, n_blocks.y, n_threads.x, n_threads.y);
                 k3<<<n_blocks, n_threads>>>(
                     dims[0], //dimensions
@@ -1025,7 +1037,7 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
                      "Cuda error: %s: %s.\n", "CudaNdarray_TakeFrom",
                      cudaGetErrorString(err));
-        Py_DECREF(indices_obj);
+        Py_DECREF(indices);
         Py_DECREF(out);
         return NULL;
     }
@@ -1040,7 +1052,7 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
                      "Cuda error: %s: %s when trying to get the error value.\n",
                      "CudaNdarray_TakeFrom", cudaGetErrorString(err));
-        Py_DECREF(indices_obj);
+        Py_DECREF(indices);
         Py_DECREF(out);
         return NULL;
     }
@@ -1055,17 +1067,17 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
         err = cudaMemset((void*)err_var, 0, sizeof(int));
         if (cudaSuccess != err) {
             PyErr_Format(PyExc_MemoryError,
                 "Error setting device error code to 0 after having an index error. %s",
                 cudaGetErrorString(err));
-            Py_DECREF(indices_obj);
+            Py_DECREF(indices);
             Py_DECREF(out);
             return NULL;
         }
-        Py_DECREF(indices_obj);
+        Py_DECREF(indices);
         Py_DECREF(out);
         return NULL;
     }
-    Py_DECREF(indices_obj);
+    Py_DECREF(indices);
     if (verbose) printf("TAKE SUCCEDED\n");
     return (PyObject*)out;
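The thread-block sizing change above replaces the hardcoded 512 with a caller-supplied max_threads (newer cards allow 1024 threads per block). The arithmetic for the 3-d case, sketched in Python with hypothetical dimension values:

```python
def block_dims(dim1, dim2, max_threads=1024):
    """Clamp the fastest-varying dimension first, then give the
    remaining per-block thread budget to the next dimension,
    mirroring the ty/tx computation in CudaNdarray_TakeFrom."""
    ty = min(dim2, max_threads)
    tx = min(dim1, max_threads // ty)
    return tx, ty

print(block_dims(8, 2000))  # (1, 1024): a large dim2 eats the whole budget
print(block_dims(8, 4))     # (8, 4): small dims leave tx room
```

The invariant is that `tx * ty` never exceeds max_threads, whatever the output shape.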
theano/sandbox/cuda/nvcc_compiler.py
@@ -7,6 +7,7 @@ import subprocess
 import sys
 import warnings
 
+import theano
 from theano.gof.cc import hash_from_file
 from theano.gof.cmodule import (std_libs, std_lib_dirs,
                                 std_include_dirs, dlimport,
@@ -119,6 +120,16 @@ class NVCC_compiler(object):
         cuda_ndarray_cuh_hash = hash_from_file(
             os.path.join(os.path.split(__file__)[0], 'cuda_ndarray.cuh'))
         flags.append('-DCUDA_NDARRAY_CUH=' + cuda_ndarray_cuh_hash)
 
+        # We compile cuda_ndarray.cu during import.
+        # We should not add device properties at that time.
+        # As the device is not selected yet!
+        # TODO: compile cuda_ndarray when we bind to a GPU?
+        import theano.sandbox.cuda
+        if hasattr(theano.sandbox, 'cuda'):
+            n = theano.sandbox.cuda.use.device_number
+            p = theano.sandbox.cuda.device_properties(n)
+            flags.append('-arch=sm_' + str(p['major']) + str(p['minor']))
+
         return flags
 
     @staticmethod
@@ -217,7 +228,9 @@ class NVCC_compiler(object):
         # '--gpu-code=compute_13',
         #nvcc argument
         preargs1 = [pa for pa in preargs
-                    if pa.startswith('-O') or pa.startswith('--maxrregcount=')]
+                    if pa.startswith('-O') or pa.startswith('--maxrregcount=')
+                    or pa.startswith('-arch=')]
         preargs2 = [pa for pa in preargs
                     if pa not in preargs1]  # other arguments
@@ -337,6 +350,7 @@ class NVCC_compiler(object):
                 pass
             print >> sys.stderr, l
         print nvcc_stdout
+        print cmd
         raise Exception('nvcc return status', p.returncode,
                         'for cmd', ' '.join(cmd))
     elif config.cmodule.compilation_warning and nvcc_stdout:
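The preargs split above partitions compiler flags by prefix, and the new `-arch=` case routes the architecture flag into the group passed straight to nvcc. A standalone sketch of the partition (the flag values are illustrative):

```python
def split_preargs(preargs):
    # Flags handled directly by nvcc (now including -arch=...).
    direct = [pa for pa in preargs
              if pa.startswith('-O') or pa.startswith('--maxrregcount=')
              or pa.startswith('-arch=')]
    # Everything else is forwarded separately.
    other = [pa for pa in preargs if pa not in direct]
    return direct, other

print(split_preargs(['-O3', '-arch=sm_13', '-fPIC', '--maxrregcount=32']))
```

Both lists preserve the original argument order, so flag precedence is unchanged by the split.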
theano/scan_module/tests/test_scan.py
@@ -410,7 +410,8 @@ class T_Scan(unittest.TestCase):
         for step in xrange(1, 4):
             v_out[step] = v_u[step] * W_in + v_out[step - 1] * W
         theano_values = f2(v_u, v_x0, W_in, W)
-        assert numpy.allclose(theano_values, v_out)
+        assert numpy.allclose(theano_values, v_out), \
+            (theano_values, v_out, theano_values - v_out)
 
         # TO DEL
         topo = f2.maker.fgraph.toposort()
@@ -591,8 +592,8 @@ class T_Scan(unittest.TestCase):
             v_y[i] = numpy.dot(v_x[i - 1], vWout)
         (theano_x, theano_y) = f4(v_u1, v_u2, v_x0, v_y0, vW_in1)
-        assert numpy.allclose(theano_x, v_x)
-        assert numpy.allclose(theano_y, v_y)
+        assert numpy.allclose(theano_x, v_x), (theano_x, v_x, theano_x - v_x)
+        assert numpy.allclose(theano_y, v_y), (theano_y, v_y, theano_y - v_y)
 
     def test_multiple_outs_taps(self):
         l = 5
@@ -683,14 +684,13 @@ class T_Scan(unittest.TestCase):
         ny1[4] = (ny1[3] + ny1[1]) * numpy.dot(ny0[3], vWout)
         ny2[4] = numpy.dot(v_u1[4], vW_in1)
 
     def test_using_taps_sequence(self):
         # this test refers to a bug reported by Nicolas
         # Boulanger-Lewandowski June 6th
         x = theano.tensor.dvector()
         y, updates = theano.scan(lambda x: [x],
                                  sequences=dict(input=x, taps=[-1]),
                                  outputs_info=[None])
         inp = numpy.arange(5).astype('float64')
         rval = theano.function([x], y, updates=updates)(inp)
         assert numpy.all(rval == inp[:-1])
@@ -840,8 +840,10 @@ class T_Scan(unittest.TestCase):
         # equivalent is done
         (theano_x0, theano_x1) = f9(vu0, vu1, vu2, vx0, vx1)
         # assert that theano does what it should
-        assert numpy.allclose(theano_x0, numpy_x0)
+        assert numpy.allclose(theano_x0, numpy_x0), (theano_x0, numpy_x0,
+                                                     theano_x0 - numpy_x0)
         assert numpy.allclose(theano_x1, numpy_x1), (theano_x1, numpy_x1,
                                                      theano_x1 - numpy_x1)
         # assert that it was done in place
         # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@@ -940,11 +942,11 @@ class T_Scan(unittest.TestCase):
         vx1 = asarrayX(rng.uniform())
         x0 = theano.shared(vx0)
         x1 = theano.shared(vx1)
         outputs, updates = theano.scan(lambda x, y: (x + asarrayX(1),
                                                      y + asarrayX(1)),
                                        [],
                                        [x0, x1],
                                        n_steps=3)
         x0 = asarrayX(numpy.zeros((3,)))
         x0[0] = vx0
         x0 = theano.tensor.constant(x0)
@@ -2447,7 +2449,6 @@ class T_Scan(unittest.TestCase):
         v_eW = numpy.array(rng.uniform(size=(5, 5)) - .5, dtype=floatX)
         v_eh0 = numpy.array(rng.uniform(size=(5,)) - .5, dtype=floatX)
 
         def rnn_fn(_u, _y, _W):
             srng = theano.tensor.shared_randomstreams.RandomStreams(seed)
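The pattern these test changes apply is worth noting: a bare `assert cond` fails without saying which values disagreed, while attaching a tuple as the assert message prints the operands and their difference on failure. A minimal demonstration, with a plain-float stand-in for numpy.allclose:

```python
def close(a, b, tol=1e-8):
    # Stand-in for numpy.allclose on scalars.
    return abs(a - b) <= tol

a, b = 1.0, 1.0 + 1e-12
assert close(a, b), (a, b, a - b)  # passes, message never built

try:
    assert close(1.0, 2.0), (1.0, 2.0, 1.0 - 2.0)
except AssertionError as e:
    print(e)  # the failing operands and their difference
```

Since the message expression is only evaluated when the assertion fails, the extra diagnostics cost nothing on the passing path.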
theano/tensor/__init__.py
@@ -55,3 +55,5 @@ from theano.gradient import Rop, Lop, grad, numeric_grad, verify_grad, \
     jacobian, hessian
 from theano.tensor.sort import sort
+from extra_ops import (DiffOp, bincount, squeeze, repeat,
+                       bartlett, fill_diagonal)
theano/tensor/extra_ops.py
@@ -3,8 +3,8 @@ import numpy
 import theano
 import basic
-from theano import gof, tensor, scalar
-from theano.sandbox.linalg.ops import diag
+from theano import gof, scalar
+import basic as tensor
 
 class DiffOp(theano.Op):
@@ -446,7 +446,9 @@ class FillDiagonal(gof.Op):
             raise NotImplementedError('%s: gradient is currently implemented'
                                       ' for matrices only' % self.__class__.__name__)
         wr_a = fill_diagonal(grad, 0)  # valid for any number of dimensions
         # diag is only valid for matrices
-        wr_val = diag(grad).sum()
+        import theano.sandbox.linalg
+        wr_val = theano.sandbox.linalg.ops.diag(grad).sum()
         return [wr_a, wr_val]
 
 fill_diagonal_ = FillDiagonal()
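The extra_ops.py change moves the `diag` import from module level into the gradient method; deferring an import into a function body is the usual way to break a circular dependency between modules, since the import is only resolved on first call. A self-contained sketch, with the stdlib `math` standing in for theano.sandbox.linalg:

```python
def cosine_grad(x):
    # Deferred import: resolved only when the function first runs,
    # so module import order no longer matters.
    import math
    return math.cos(x)

print(cosine_grad(0.0))  # 1.0
```

Python caches modules in sys.modules, so the repeated `import` inside the function costs only a dict lookup after the first call.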