* As conv_patch_stack, but used for the full convolution by padding the image in shared memory.
* I keep it separate from conv_patch because it takes 19-20 registers, which is more than the 10/16 max for each thread, and thus could lower the occupancy.
* Implementation of the valid convolution that keeps the full image and the full kernel in shared memory.
* Each thread computes only one value for the output if split is true. Otherwise it computes ceil((float)out_len/N) pixels.
* thread block size=out_wid, nb_rows (optimized value is ceil(out_len/N))
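The block-size note above relies on a ceiling division, ceil(out_len/N). The following is a minimal Python sketch of that arithmetic only, not the kernel itself. The helper names `ceil_div` and `rows_for_thread_row`, and the strided assignment of output rows to thread rows, are assumptions made for illustration; the comments above do not say how the kernel actually distributes rows.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b, without going through floats."""
    return (a + b - 1) // b


def rows_for_thread_row(thread_row, out_len, n_split):
    """Output rows handled by one thread row.

    Hypothetical strided assignment: thread row r takes rows
    r, r + nb_rows, r + 2 * nb_rows, ... below out_len.
    """
    nb_rows = ceil_div(out_len, n_split)
    return list(range(thread_row, out_len, nb_rows))


out_len, n_split = 10, 3                      # illustrative sizes only
nb_rows = ceil_div(out_len, n_split)          # thread rows in the block
assignment = [rows_for_thread_row(r, out_len, n_split) for r in range(nb_rows)]
print(nb_rows, assignment)
```

Under this assumed layout every output row is covered exactly once and no thread row handles more than N rows, which is the property the ceiling is there to guarantee.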
static char __pyx_doc_6theano_11scan_module_12scan_perform_1perform[] = "\n Parameters\n ----------\n n_shared_outs: unsigned int\n Number of arguments that correspond to shared variables with\n updates\n n_mit_mot_outs: unsigned int\n Sum over the number of output taps for each mit_mot sequence\n n_seqs: unsigned int\n Number of sequences provided as input\n n_mit_mot : unsigned int\n Number of mit_mot arguments\n n_mit_sot: unsigned int\n Number of mit_sot arguments\n n_sit_sot: unsigned int\n Number of sit_sot arguments\n n_nit_sot: unsigned int\n Number of nit_sot arguments\n n_steps: unsigned int\n Number of steps to loop over\n mintaps: int32 ndarray (can also be a simple python list if that is better !)\n For any of the mit_mot, mit_sot, sit_sot says which is the furthest\n away input tap from current position. For example, if the taps were [-2,\n -5, -9], the mintap would be -9. For sit_sot this is always -1 since\n that is the only allowed tap.\n tap_array: int32 ndarray (can be replaced by a list of lists in python if better)\n For each of the mit_mot, mit_sot, sit_sot (the first dimension) says\n which are the corresponding input taps. While this is a matrix, not all\n values in a row are needed and tap_array_len is there to say up to\n which entry we are dealing with valid taps (afterwards there are\n just 0s to ensure the fixed format)\n tap_array_len: int32 ndarray (can be replaced by a list if better)\n For each of the mit_mot, mit_sot, sit_sot says how many input taps\n each has. For sit_sot this will always be 1.\n vector_seqs: int32 ndarray (can be replaced by a list of bools if better)\n For each sequence the corresponding entry is either 1, if the\n sequence is a vector, or 0 if it has more than 1 dimension\n vector_outs: int32 ndarray (can be replaced by list of bools if better)\n For each output (mit_mot, mit_sot, si""t_sot, nit_sot in this order)\n the entry is 1 if the corresponding argument is a 1 dimensional\n tensor, 0 otherwise.\n mit_mot_out_slices : int32 ndarray (can be replaced by list of lists)\n Same as tap_array, but for the output taps of mit_mot sequences\n mit_mot_out_nslices: int32 ndarray (can be replaced by a list)\n Same as tap_array_len, but is the number of output taps of the\n mit_mot sequences (i.e. it corresponds to mit_mot_out_slices)\n fn: callable\n This is the linker, i.e. the function that will loop over the\n computational graph and call the perform of each operation. For this\n linker there is a c version in gof/lazy_linker.c that will be the\n starting point of implementing this function in C (we need to take\n all the code around the call of this function and put in C inside\n that code)\n fnct: python object\n Only used to attach some timings for the profile mode (can be\n skipped if we don't care about Theano's profile mode)\n inplace\n Boolean that says if things should be computed inplace or if they\n should not.\n args: list of ndarrays (and random states)\n The inputs of scan in a given order (n_steps, sequences, mit_mot,\n mit_sot, sit_sot, nit_sot, shared_outs, other_args)\n outs: list of 1 element list (or storage objects?)\n This is where we need to copy our outputs (we don't return the\n results, though we can change the code such that we return, and\n figure things out on the outside - python)\n self: python object\n The scan op itself. I only use it to attach to it some timing\n information .. but I don't need to.\n\n ";
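The relationship between `mintaps`, `tap_array` and `tap_array_len` described in the docstring can be sketched in plain Python. It uses lists rather than int32 ndarrays, which the docstring says is acceptable. The helper name `build_tap_arrays` is hypothetical and not part of Theano. Padding every row to the length of the longest tap list is an assumption about what "fixed format" means here.

```python
def build_tap_arrays(taps_per_arg):
    """Pack per-argument tap lists into the padded layout the docstring describes.

    taps_per_arg: one list of integer taps per mit_mot / mit_sot / sit_sot argument.
    Returns (tap_array, tap_array_len, mintaps) as plain Python lists.
    """
    width = max(len(taps) for taps in taps_per_arg)
    # Rows are right-padded with 0s so the matrix is rectangular;
    # only the first tap_array_len[i] entries of row i are valid taps.
    tap_array = [list(taps) + [0] * (width - len(taps)) for taps in taps_per_arg]
    tap_array_len = [len(taps) for taps in taps_per_arg]
    # The furthest-away tap is the smallest one, e.g. -9 for [-2, -5, -9].
    mintaps = [min(taps) for taps in taps_per_arg]
    return tap_array, tap_array_len, mintaps


# One mit_sot argument with three taps and one sit_sot argument (always tap -1).
taps = [[-2, -5, -9], [-1]]
tap_array, tap_array_len, mintaps = build_tap_arrays(taps)
print(tap_array, tap_array_len, mintaps)
```

The padding zeros in `tap_array` carry no meaning on their own, which is why a consumer would need `tap_array_len` to know where the valid taps in each row end.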