Commit 4360868c authored by Olivier Delalleau

Typo: 'more then' -> 'more than'

Also fixed a few more typos in surrounding code.
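The fix itself is mechanical, so a sweep like this can be scripted. A minimal sketch of such a typo sweep (the `fix_typo` helper and its pattern are illustrative, not part of this commit):

```python
import re

def fix_typo(text, pattern=r"\bmore then\b", replacement="more than"):
    # Return the corrected text and how many replacements were made.
    return re.subn(pattern, replacement, text)

fixed, count = fix_typo("each thread computes more then 1 output pixel")
# fixed == "each thread computes more than 1 output pixel", count == 1
```

The word-boundary pattern only matches the two-word phrase, so identifiers such as `more_then_one` are left for a separate, manual rename.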
Parent c4e0f896
@@ -110,7 +110,7 @@ Deprecation (will be removed in Theano 0.5, warning generated if you use them):
       (list/tuple/TensorVariable).
     * Currently tensor.grad return a type list when the wrt is a list/tuple of
-      more then 1 element.
+      more than 1 element.
 Decrecated in 0.4.0(Reminder, warning generated if you use them):
...
@@ -180,7 +180,7 @@ Here is the state of that vision as of 24 October 2011 (after Theano release
     * Example of use: Determine if we should move computation to the
       GPU or not depending on the input size.
     * Possible implementation note: allow Theano Variable in the env to
-      have more then 1 owner.
+      have more than 1 owner.
 * We have a CUDA backend for tensors of type `float32` only.
 * Efforts have begun towards a generic GPU ndarray (GPU tensor) (started in the
...
@@ -56,13 +56,13 @@ for val in keys.values():
 nbs_mod = {} # nb seen -> how many key
 nbs_mod_to_key = {} #nb seen -> keys
-more_then_one = 0
+more_than_one = 0
 for mod,kk in mods.iteritems():
     val = len(kk)
     nbs_mod.setdefault(val, 0)
     nbs_mod[val]+=1
     if val>1:
-        more_then_one += 1
+        more_than_one += 1
         nbs_mod_to_key[val] = kk
 if DISPLAY_MOST_FREQUENT_DUPLICATE_CCODE:
@@ -87,7 +87,7 @@ uniq = len(mods)
 useless = total - uniq
 print "mod.{cpp,cu} total:", total
 print "mod.{cpp,cu} uniq:", uniq
-print "mod.{cpp,cu} with more then 1 copy:", more_then_one
+print "mod.{cpp,cu} with more than 1 copy:", more_than_one
 print "mod.{cpp,cu} useless:", useless, float(useless)/total*100,"%"
 print "nb directory", len(dirs)
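The duplicate-counting logic in the hunk above can be sketched with `collections.Counter` in modern Python; the sample `mods` mapping below is made up for illustration:

```python
from collections import Counter

# Hypothetical mapping: module hash -> files that share that generated code.
mods = {
    "hash_a": ["mod1.cpp"],
    "hash_b": ["mod2.cpp", "mod3.cpp", "mod4.cpp"],
}

# Like nbs_mod in the script: copy count -> how many hashes had that count.
nbs_mod = Counter(len(files) for files in mods.values())
more_than_one = sum(1 for files in mods.values() if len(files) > 1)

total = sum(len(files) for files in mods.values())
uniq = len(mods)
useless = total - uniq  # every copy beyond the first is a redundant compilation
```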
@@ -846,7 +846,7 @@ CudaNdarray_conv_full(const CudaNdarray *img, const CudaNdarray * kern, CudaNdar
     if(version==-1 && nb_split>1) version=4;
     else if(version==-1) version=3;
-    else if(version==3 && nb_split!=1) version=4;//we force version 4 when we need more then 1 split as to be always execute.
+    else if(version==3 && nb_split!=1) version=4;//we force version 4 when we need more than 1 split, as it can always be executed.
     assert(version!=3 || nb_split==1);
     assert(version!=5 || kern_len>1);
...
@@ -183,7 +183,7 @@ conv_full_patch_stack( float* img, float* kern, float* out,
 /**
  * As conv_patch_stack, but used for the full convolution by padding the image in shared memory.
- * I keep it separated from conv_patch as we take 19-20 register which is more then the 10/16 max for each thread and thus this could lower the occupency.
+ * I keep it separated from conv_patch as we use 19-20 registers, which is more than the 10/16 max per thread and thus could lower the occupancy.
  * Implementation of the valid convolution that keep the full image and the full kernel in shared memory
  * each thread compute only one value for the output if split is true. Otherwise compute ceil((float)out_len/N) pixel.
  * thread block size=out_wid, nb_rows (optimized value is ceil(out_len/N))
@@ -195,7 +195,7 @@ conv_full_patch_stack( float* img, float* kern, float* out,
  * nstack: the size of the stack, used to compute the image to load.
  * template flipped_kern: if true, we "flip" the kernel as in a real convolution, else we don't
  * template c_contiguous: if true, the image and kernel have are c_contiguous.(use less registers)
- * template split: if true, each thread compute more then 1 output pixel.
+ * template split: if true, each thread computes more than 1 output pixel.
  * template low_mem: if true, as split but with use less dynamic shared memory but use more registers.
  * if you set split and low_mem to true, we will use the low_mem version!
  */
...
@@ -204,7 +204,7 @@ __device__ void store_or_accumulate(float& dst,const float value ){
  * nkern: the number of kernel, used to compute the output image to store the result
  * nstack: the size of the stack, used to compute the image to load.
  * template flipped_kern: if true, we "flip" the kernel as in a real convolution, else we don't
- * template split: if true, each thread compute more then 1 output pixel
+ * template split: if true, each thread computes more than 1 output pixel
  *                 When true, allow for output image bigger then 512 pixel.
  *                 Use more registers.
  */
@@ -273,7 +273,7 @@ conv_patch( float* img, float* kern, float* out,
  * As conv_patch, but implement the stack in the kernel.
  * I keep it separated from conv_patch as we take more registers and this could lower the occupency.
  * Implementation of the valid convolution that keep the full image and the full kernel in shared memory
- * each thread compute only one value for the output if split==false else it compute more then 1 values
+ * each thread computes only one value for the output if split==false, else it computes more than 1 value
  * thread block size=out_wid, out_len/X (X is any number, optimized value is ceil(out_len/N)
  * grid block size=batch_id, nkern
  * dynamic shared memory: img_len*img_wid+(preload_full_kern?KERNEL_LEN:1)*kern_wid
@@ -287,7 +287,7 @@ conv_patch( float* img, float* kern, float* out,
  * template KERN_WIDTH: if 0, will work for any kern_wid, else it specialyse to this kern_wid as an optimization
  * template img_c_contiguous_2d: if true, the img have are collon and row contiguous
  * template kern_c_contiguous_2d: if true, the kernel have are collon and row contiguous
- * template split: if true, each thread generate more then 1 output pixel, but use more registers.
+ * template split: if true, each thread generates more than 1 output pixel, but uses more registers.
  * template preload_full_kern: if true, we load the full kernel in shared memory, else, we load 1 row at a time.
  * template subsample: if false, remove some computation needed when dx or dy!=1.
  */
...
@@ -3240,9 +3240,9 @@ int CudaNdarray_sger(float alpha, const CudaNdarray * x, const CudaNdarray * y,
     if(x_strides == 0){
         if(CudaNdarray_HOST_DIMS(x)[0] != 1){
             PyErr_Format(PyExc_RuntimeError,
-                         "CudaNdarray_sger: Invalid input x(should not happen)."
-                         " We received an CudaNdarray vector with a stride of 0"
-                         " that have more then 1 elements!");
+                         "CudaNdarray_sger: Invalid input x (should not happen)."
+                         " We received a CudaNdarray vector with a stride of 0"
+                         " that has more than 1 element!");
             return -1;
         }
         x_strides = 1;
@@ -3256,9 +3256,9 @@ int CudaNdarray_sger(float alpha, const CudaNdarray * x, const CudaNdarray * y,
     if(y_strides == 0){
         if(CudaNdarray_HOST_DIMS(y)[0] != 1){
             PyErr_Format(PyExc_RuntimeError,
-                         "CudaNdarray_sger: Invalid input y(should not happen)."
-                         " We received an CudaNdarray vector with a stride of 0"
-                         " that have more then 1 elements!");
+                         "CudaNdarray_sger: Invalid input y (should not happen)."
+                         " We received a CudaNdarray vector with a stride of 0"
+                         " that has more than 1 element!");
             return -1;
         }
         y_strides = 1;
...
@@ -257,7 +257,8 @@ def test_downsample():
     for ds in (2, 2), (3,2), (1,1):
         if ds[0] > shp[2]: continue
         if ds[1] > shp[3]: continue
-        #GpuDownsampleFactorMax don't having more then 512 columns in the output tensor
+        # GpuDownsampleFactorMax doesn't like having more than 512 columns
+        # in the output tensor.
         if float(shp[3])/ds[1]>512: continue
         for ignore_border in (True, False):
             print 'test_downsample', shp, ds, ignore_border
...
@@ -1000,7 +1000,7 @@ class Scan(PureOp):
             if i < n_steps:
                 # The reason I don't use out[idx][0][:i] is because for
                 # certain outputs (those with multiple taps),
-                # outs[idx][0] has more then n_steps entries, with the
+                # outs[idx][0] has more than n_steps entries, with the
                 # initial state at the begining. When indexing in it I
                 # usually have to do something like
                 # outs[idx][0][i+offset]. To do something similar here,
...
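The offset bookkeeping described in the Scan comment above can be sketched as follows; the buffer contents and the `offset` value are made-up illustrations, not Theano internals:

```python
# A tap buffer stores initial states first, then one output per step.
n_steps = 3
offset = 2  # number of initial-state entries before the real outputs
buf = [10, 11, 100, 101, 102]  # 2 initial states, then n_steps outputs

# outs[idx][0][i + offset] picks the output of step i, skipping the
# initial states, which is why buf[:i] alone would be wrong.
step_outputs = [buf[i + offset] for i in range(n_steps)]
# step_outputs == [100, 101, 102]
```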
@@ -1060,7 +1060,7 @@ static PyObject *__pyx_pf_6theano_11scan_module_12scan_perform_0get_version(PyOb
  */
 static PyObject *__pyx_pf_6theano_11scan_module_12scan_perform_1perform(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
-static char __pyx_doc_6theano_11scan_module_12scan_perform_1perform[] = "\n    Parameters\n    ----------\n    ...\n    vector_seqs: int32 ndarray (can be replaced by a list of bools if better)\n        For each sequence the corresponding entry is either a 1, is the\n        sequence is a vector or 0 if it has more then 1 dimension\n    ...\n    ";
+static char __pyx_doc_6theano_11scan_module_12scan_perform_1perform[] = "\n    Parameters\n    ----------\n    ...\n    vector_seqs: int32 ndarray (can be replaced by a list of bools if better)\n        For each sequence the corresponding entry is either a 1, is the\n        sequence is a vector or 0 if it has more than 1 dimension\n    ...\n    ";
 static PyMethodDef __pyx_mdef_6theano_11scan_module_12scan_perform_1perform = {__Pyx_NAMESTR("perform"), (PyCFunction)__pyx_pf_6theano_11scan_module_12scan_perform_1perform, METH_VARARGS|METH_KEYWORDS, __Pyx_DOCSTR(__pyx_doc_6theano_11scan_module_12scan_perform_1perform)};
 static PyObject *__pyx_pf_6theano_11scan_module_12scan_perform_1perform(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
     unsigned int __pyx_v_n_shared_outs;
...
@@ -125,7 +125,7 @@ def perform(
         each has. For sit_sot this will always be 1.
     vector_seqs: int32 ndarray (can be replaced by a list of bools if better)
         For each sequence the corresponding entry is either a 1, is the
-        sequence is a vector or 0 if it has more then 1 dimension
+        sequence is a vector or 0 if it has more than 1 dimension
     vector_outs: int32 ndarray( can be replaced by list of bools if better)
         For each output ( mit_mot, mit_sot, sit_sot, nit_sot in this order)
         the entry is 1 if the corresponding argument is a 1 dimensional
...
@@ -959,7 +959,7 @@ class GetItemScalar(gof.op.Op):
     Implement a subtensor of a sparse variable that take two scalar as
     index and return a scalar
-    :see: GetItem2d to return more then one element.
+    :see: GetItem2d to return more than one element.
     """
     def __eq__(self, other):
         return (type(self) == type(other))
...
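As a plain-Python analogy (not Theano's actual implementation), the contrast the docstring draws between GetItemScalar and GetItem2d might look like this, using a hypothetical dict-of-keys sparse matrix:

```python
# Hypothetical dict-of-keys sparse matrix: (row, col) -> nonzero value.
sparse = {(0, 1): 3.0, (2, 2): 5.0}

def get_item_scalar(m, i, j):
    # Like GetItemScalar: two scalar indices return one scalar.
    return m.get((i, j), 0.0)

def get_item_2d(m, rows, cols):
    # Like GetItem2d: ranges of indices return more than one element.
    return [[m.get((i, j), 0.0) for j in cols] for i in rows]
```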
@@ -91,14 +91,14 @@ class RopLop_checker(unittest.TestCase):
            (i.e. the tensor with which you multiply the
            Jacobian). It should be a tuple of ints.
-           If the Op have more then 1 input, one of them must be mx, the
-           other must be shared variable/constant. We will test only
-           again the input self.mx, so you must call
-           check_mat_rop_lop/check_rop_lop for the others input.
+           If the Op has more than 1 input, one of them must be mx, while
+           others must be shared variables / constants. We will test only
+           against the input self.mx, so you must call
+           check_mat_rop_lop/check_rop_lop for the other inputs.
            We expect all inputs/outputs have dtype floatX.
-           If you want to test an out with an output matrix, add a sum
+           If you want to test an Op with an output matrix, add a sum
            after the Op you want to test.
            """
        vx = numpy.asarray(self.rng.uniform(size=self.mat_in_shape),
...