Commit 35d1fa4d authored by Frédéric Bastien

Merge pull request #3562 from abergeron/doc_context

Small tutorial for op params
......@@ -43,11 +43,11 @@ There are less methods to define for an Op than for a Type:
that a python exception is set) if your C code needs to
raise an exception.
``sub['context']``
``sub['params']``
(optional) The name of the variable which holds the context
for the node. This will only appear if the op has requested
a context by having a :meth:`get_context()` method that return
a context by having a :meth:`get_params()` method that returns
something other than None.
.. method:: c_code_cleanup(node, name, input_names, output_names, sub)
......@@ -142,11 +142,11 @@ There are less methods to define for an Op than for a Type:
that a python exception is set) if your C code needs to
raise an exception.
``sub['context']``
``sub['params']``
(optional) The name of the variable which holds the context
for the node. This will only appear if the op has requested
a context by having a :meth:`get_context()` method that return
a context by having a :meth:`get_params()` method that returns
something other than None.
.. method:: c_support_code()
......@@ -223,18 +223,18 @@ There are less methods to define for an Op than for a Type:
is high or when theano compilation directory is shared by many
process (like on a network file server on a cluster).
.. method:: get_context(node)
.. method:: get_params(node)
(optional) If defined, should return the runtime context the op
needs. This context will be passed to the C code through the
variable named in `sub['context']`. The variable is also
(optional) If defined, should return the runtime params the op
needs. These parameters will be passed to the C code through the
variable named in `sub['params']`. The variable is also
available for use in the code returned by
:meth:`c_init_code_struct`. If it returns `None` this is
considered the same as if the method was not defined.
If this method is defined and does not return `None`, then the
Op *must* have a `context_type` property with the Type to use
for the context variable.
Op *must* have a `params_type` property with the Type to use
for the params variable.
.. attribute:: _f16_ok
......
......@@ -33,6 +33,7 @@ a C implementation.
other_ops
ctype
cop
using_params
optimization
tips
unittest
......
.. _extending_op_params:
===============
Using Op params
===============
The Op params is a facility to pass some runtime parameters to the
code of an op without modifying it. It enables a single compiled
instance of C code to serve different needs and therefore reduces the
amount of compilation needed.
The code enables you to pass a single object, but it can be a struct
or python object with multiple values if you have more than one value
to pass.
We will first introduce the parts involved in actually using this
functionality and then present a simple working example.
The params type
----------------
You can either reuse an existing type such as :class:`Generic` or
create your own.
Using a python object for your op parameters (:class:`Generic`) can be
annoying to access from C code since you would have to go through the
Python-C API for all accesses.
Making a purpose-built class may require more upfront work, but it
can pay off if you reuse the type for many Ops, since the python
manipulation only has to be written once.
Defining a params type
~~~~~~~~~~~~~~~~~~~~~~
.. note::

   This section is only relevant if you decide to create your own type.
The first thing you need to do is to define a Theano Type for your
params object. It doesn't have to be a complete type because only the
following methods will be used for the type:
- :meth:`filter <PureType.filter>`
- :meth:`__eq__ <PureType.__eq__>`
- :meth:`__hash__ <PureType.__hash__>`
- :meth:`values_eq <PureType.values_eq>`
Additionally, if you want to use your params with C code, you need the
following methods:
- :meth:`c_declare <CLinkerType.c_declare>`
- :meth:`c_init <CLinkerType.c_init>`
- :meth:`c_extract <CLinkerType.c_extract>`
- :meth:`c_cleanup <CLinkerType.c_cleanup>`
You can also define other convenience methods such as
:meth:`c_headers <CLinkerType.c_headers>` if your code has special
requirements.
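As an illustrative, standalone sketch (a real params Type would subclass Theano's base Type class; this hypothetical `FloatParamsType` only shows the four Python-side methods listed above, for a params object wrapping a single float):

```python
# Illustrative sketch only: a real params Type would subclass
# theano.gof.type.Type; this standalone class just shows the four
# Python-side methods the params machinery relies on.
class FloatParamsType(object):
    def filter(self, data, strict=False, allow_downcast=None):
        # Validate/convert an incoming params object to a float.
        if strict and not isinstance(data, float):
            raise TypeError("expected a float")
        return float(data)

    def __eq__(self, other):
        # Instances carry no configuration, so any two are equal.
        return type(self) == type(other)

    def __hash__(self):
        # Must stay consistent with __eq__.
        return hash(type(self))

    def values_eq(self, a, b):
        # Compare two values of this type.
        return a == b
```

Keeping `__eq__` and `__hash__` consistent matters because the type takes part in graph hashing and caching.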
Registering the params with your Op
-----------------------------------
To declare that your Op uses params you have to set the class
attribute :attr:`params_type` to an instance of your params Type.
.. note::

   If you want to have multiple parameters you have to bundle those
   inside a single object and use that as the params type.
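As a sketch of such a bundle (the class and attribute names here are made up for illustration), defining `__eq__` and `__hash__` lets identical params objects compare equal, so they can be shared and deduplicated:

```python
# Hypothetical bundle of two parameters in one params object.
# __eq__/__hash__ make equal bundles interchangeable (params objects
# may be used as dictionary keys internally).
class ConvParams(object):
    def __init__(self, stride, pad):
        self.stride = stride
        self.pad = pad

    def __eq__(self, other):
        return (type(self) == type(other) and
                self.stride == other.stride and
                self.pad == other.pad)

    def __hash__(self):
        return hash((type(self), self.stride, self.pad))
```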
For example, if we decide to use an int as the params, the following
would be appropriate:

.. code-block:: python

    class MyOp(Op):
        params_type = Generic()
After that you need to define a :meth:`get_params` method on your
class with the following signature:
.. code-block:: python

    def get_params(self, node)
This method must return a valid object for your Type (an object that
passes :meth:`filter`). The `node` parameter is the Apply node for
which we want the params. Therefore the params object can depend on
the inputs and outputs of the node.
.. note::

   Due to implementation restrictions, None is not allowed as a
   params object and will be taken to mean that the Op doesn't have
   parameters.

   Since this will change the expected signature of a few methods, it
   is strongly discouraged to have your :meth:`get_params` method
   return None.
Signature changes from having params
------------------------------------
Declaring params for your Op affects the expected signature of
:meth:`perform`. The new expected signature has an extra parameter at
the end which corresponds to the params object.
.. warning::

   If you do not account for this extra parameter, the code will fail
   at runtime if it tries to run the python version.
Also, for the C code, the `sub` dictionary will contain an extra entry
`'params'` which will map to the variable name of the params object.
This is true for all methods that receive a `sub` parameter, so this
means that you can use your params in the :meth:`c_code <Op.c_code>`
and :meth:`c_init_code_struct <Op.c_init_code_struct>` methods.
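To make the substitution concrete, here is a plain-Python expansion of a ``c_code``-style template; the variable names ``V1``, ``V2``, ``V3`` are made up for illustration, as the real names are generated by the linker:

```python
# The C code returned by c_code is a template expanded with Python's
# %-formatting; sub['params'] holds the generated params variable name.
template = "%(z)s = %(x)s * PyFloat_AsDouble(%(p)s);"
sub = {'params': 'V3'}  # hypothetical name supplied by the linker
c_line = template % dict(z='V2', x='V1', p=sub['params'])
print(c_line)  # V2 = V1 * PyFloat_AsDouble(V3);
```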
A simple example
----------------
This is a simple example which uses a params object to pass a value.
This Op will multiply a scalar input by a fixed floating point value.
Since the value in this case is a python float, we chose Generic as
the params type.
.. testcode::

    from theano import Apply, Op
    from theano.gof.type import Generic
    from theano.scalar import as_scalar

    class MulOp(Op):
        params_type = Generic()
        __props__ = ('mul',)

        def __init__(self, mul):
            self.mul = float(mul)

        def get_params(self, node):
            return self.mul

        def make_node(self, inp):
            inp = as_scalar(inp)
            return Apply(self, [inp], [inp.type()])

        def perform(self, node, inputs, output_storage, params):
            # Here params is a python float so this is ok
            output_storage[0][0] = inputs[0] * params

        def c_code(self, node, name, inputs, outputs, sub):
            return ("%(z)s = %(x)s * PyFloat_AsDouble(%(p)s);" %
                    dict(z=outputs[0], x=inputs[0], p=sub['params']))
.. testoutput::
   :hide:
A more complex example
----------------------
This is a more complex example which actually passes multiple values.
It does a linear combination of two values using floating point
weights.
.. testcode::

    from theano import Apply, Op
    from theano.gof.type import Generic
    from theano.scalar import as_scalar

    class ab(object):
        def __init__(self, alpha, beta):
            self.alpha = alpha
            self.beta = beta

    class Mix(Op):
        params_type = Generic()
        __props__ = ('alpha', 'beta')

        def __init__(self, alpha, beta):
            self.alpha = alpha
            self.beta = beta

        def get_params(self, node):
            return ab(alpha=self.alpha, beta=self.beta)

        def make_node(self, x, y):
            x = as_scalar(x)
            y = as_scalar(y)
            return Apply(self, [x, y], [x.type()])

        def c_support_code_struct(self, node, name):
            return """
            double alpha_%(name)s;
            double beta_%(name)s;
            """ % dict(name=name)

        def c_init_code_struct(self, node, name, sub):
            return """{
            PyObject *tmp;
            tmp = PyObject_GetAttrString(%(p)s, "alpha");
            if (tmp == NULL)
                %(fail)s
            alpha_%(name)s = PyFloat_AsDouble(tmp);
            Py_DECREF(tmp);
            if (PyErr_Occurred())
                %(fail)s
            tmp = PyObject_GetAttrString(%(p)s, "beta");
            if (tmp == NULL)
                %(fail)s
            beta_%(name)s = PyFloat_AsDouble(tmp);
            Py_DECREF(tmp);
            if (PyErr_Occurred())
                %(fail)s
            }""" % dict(name=name, p=sub['params'], fail=sub['fail'])

        def c_code(self, node, name, inputs, outputs, sub):
            return """
            %(z)s = alpha_%(name)s * %(x)s + beta_%(name)s * %(y)s;
            """ % dict(name=name, z=outputs[0], x=inputs[0], y=inputs[1])
.. testoutput::
   :hide:
......@@ -924,8 +924,8 @@ In addition to these macros, the ``init_code_struct``, ``code``, and
You can add a semicolon after the macro if it makes your editor
happy.
* ``CONTEXT`` : Name of the context variable for this node. (only
for Ops which have a context, which is discussed elsewhere)
* ``PARAMS`` : Name of the params variable for this node. (only
for Ops which have params, which is discussed elsewhere)
Finally the tags ``code`` and ``code_cleanup`` have macros to
pass the input and output names. These are named ``INPUT_{i}`` and
......
......@@ -585,22 +585,22 @@ class CLinker(link.Linker):
self.variables = [var for var in self.inputs if not len(var.clients)]
self.variables += graph.variables(self.inputs, self.outputs)
# This adds a hidden input which is the context for each node
# This adds a hidden input which is the params for each node
# that needs it
self.contexts = dict()
self.node_params = dict()
for node in self.node_order:
ctx = node.run_context()
if ctx is not graph.NoContext:
params = node.run_params()
if params is not graph.NoParams:
# try to avoid creating more than one variable for the
# same context.
if ctx in self.contexts:
var = self.contexts[ctx]
assert var.type == node.context_type
var.clients.append((node, 'context'))
# same params.
if params in self.node_params:
var = self.node_params[params]
assert var.type == node.params_type
var.clients.append((node, 'params'))
else:
var = graph.Constant(node.context_type, ctx)
var.clients = [(node, 'context')]
self.contexts[ctx] = var
var = graph.Constant(node.params_type, params)
var.clients = [(node, 'params')]
self.node_params[params] = var
self.variables.append(var)
# The orphans field is listified to ensure a consistent order.
......@@ -743,9 +743,9 @@ class CLinker(link.Linker):
sub = dict(failure_var=failure_var)
ctx = node.run_context()
if ctx is not graph.NoContext:
context_var = symbol[self.contexts[ctx]]
params = node.run_params()
if params is not graph.NoParams:
params_var = symbol[self.node_params[params]]
# The placeholder will be replaced by a hash of the entire
# code (module + support code) in DynamicModule.code.
......@@ -761,16 +761,16 @@ class CLinker(link.Linker):
# Make the CodeBlock for c_code
sub['id'] = id
sub['fail'] = failure_code(sub)
if ctx is not graph.NoContext:
sub['context'] = context_var
if params is not graph.NoParams:
sub['params'] = params_var
sub_struct = dict()
sub_struct['id'] = id + 1
sub_struct['fail'] = failure_code_init(sub)
if ctx is not graph.NoContext:
# Since context inputs are always constants they are
if params is not graph.NoParams:
# Since params inputs are always constants they are
# guaranteed to be available in the struct init code.
sub_struct['context'] = context_var
sub_struct['params'] = params_var
struct_support = ""
struct_init = ""
......
......@@ -22,7 +22,7 @@ __docformat__ = "restructuredtext en"
is_same_graph_with_merge = None
equal_computations = None
NoContext = object()
NoParams = object()
class Node(utils.object2):
......@@ -123,14 +123,14 @@ class Apply(Node):
else:
raise TypeError("The 'outputs' argument to Apply must contain Variable instances with no owner, not %s" % output)
def run_context(self):
def run_params(self):
"""
Returns the context for the node, or NoContext if no context is set.
Returns the params for the node, or NoParams if no params is set.
"""
if hasattr(self.op, 'get_context'):
return self.op.get_context(self)
return NoContext
if hasattr(self.op, 'get_params'):
return self.op.get_params(self)
return NoParams
def __getstate__(self):
d = self.__dict__
......@@ -263,7 +263,7 @@ class Apply(Node):
Property: Number of outputs.
"""
context_type = property(lambda self: self.op.context_type, doc='type to use for the context')
params_type = property(lambda self: self.op.params_type, doc='type to use for the params')
class Variable(Node):
......
......@@ -857,9 +857,9 @@ class Op(utils.object2, PureOp, CLinkerOp):
p = node.op.perform
ctx = node.run_context()
params = node.run_params()
if ctx is graph.NoContext:
if params is graph.NoParams:
# default arguments are stored in the closure of `rval`
def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
r = p(n, [x[0] for x in i], o)
......@@ -867,11 +867,11 @@ class Op(utils.object2, PureOp, CLinkerOp):
compute_map[o][0] = True
return r
else:
ctx_val = node.context_type.filter(ctx)
params_val = node.params_type.filter(params)
def rval(p=p, i=node_input_storage, o=node_output_storage, n=node,
ctx=ctx_val):
r = p(n, [x[0] for x in i], o, ctx)
params=params_val):
r = p(n, [x[0] for x in i], o, params)
for o in node.outputs:
compute_map[o][0] = True
return r
......@@ -1403,9 +1403,9 @@ class COp(Op):
define_macros.append("#define FAIL %s" % (
self._lquote_macro(sub['fail']),))
undef_macros.append("#undef FAIL")
if 'context' in sub:
define_macros.append("#define CONTEXT %s" % (sub['context'],))
undef_macros.append("#undef CONTEXT")
if 'params' in sub:
define_macros.append("#define PARAMS %s" % (sub['params'],))
undef_macros.append("#undef PARAMS")
return os.linesep.join(define_macros), os.linesep.join(undef_macros)
......@@ -1442,21 +1442,21 @@ class COp(Op):
define_macros, undef_macros = self.get_c_macros(node, name,
check_input=False)
ctx = ""
if 'context' in sub:
ctx = ", %s" % (sub['context'],)
params = ""
if 'params' in sub:
params = ", %s" % (sub['params'],)
# Generate the C code
return """
%(define_macros)s
{
if (%(func_name)s(%(func_args)s%(ctx)s) != 0) {
if (%(func_name)s(%(func_args)s%(params)s) != 0) {
%(fail)s
}
}
%(undef_macros)s
""" % dict(func_name=self.func_name,
fail=sub['fail'], ctx=ctx,
fail=sub['fail'], params=params,
func_args=self.format_c_function_args(inp, out),
define_macros=define_macros,
undef_macros=undef_macros)
......
......@@ -174,7 +174,7 @@ class Kernel(object):
class GpuKernelBase(object):
context_type = gpu_context_type
params_type = gpu_context_type
def gpu_kernels(self, node, name):
"""
......@@ -219,7 +219,7 @@ class GpuKernelBase(object):
def c_support_code_apply(self, node, name):
kernels = self.gpu_kernels(node, name)
ctx = self.get_context(node)
ctx = self.get_params(node)
bins = '\n'.join(self._generate_kernel_bin(k, ctx) for k in kernels)
codes = '\n'.join(self._generate_kernel_code(k) for k in kernels)
return '\n'.join([bins, codes])
......@@ -253,7 +253,7 @@ class GpuKernelBase(object):
flags=k._get_c_flags(), fail=fail, ctx=ctx)
def c_init_code_struct(self, node, name, sub):
ctx = sub['context']
ctx = sub['params']
kernels = self.gpu_kernels(node, name)
inits_0 = '\n'.join(self._generate_zeros(k) for k in kernels)
inits = '\n'.join(self._generate_kernel_init(k, sub['fail'], ctx)
......@@ -274,7 +274,7 @@ class GpuKernelBase(object):
return (self.c_code_cache_version(), self.kernel_version(node))
def kernel_version(self, node):
return (3, node.get_context().bin_id)
return (3, self.get_params(node).bin_id)
class HostFromGpu(Op):
......@@ -356,7 +356,7 @@ host_from_gpu = HostFromGpu()
class GpuFromHost(Op):
__props__ = ('context_name',)
_f16_ok = True
context_type = gpu_context_type
params_type = gpu_context_type
def __init__(self, context_name):
self.context_name = context_name
......@@ -371,7 +371,7 @@ class GpuFromHost(Op):
context_name=self.context_name,
dtype=x.dtype)()])
def get_context(self, node):
def get_params(self, node):
return get_context(self.context_name)
def perform(self, node, inp, out, ctx):
......@@ -429,7 +429,7 @@ class GpuFromHost(Op):
%(fail)s
}
}
""" % {'name': name, 'inp': inputs[0], 'ctx': sub['context'],
""" % {'name': name, 'inp': inputs[0], 'ctx': sub['params'],
'out': outputs[0], 'fail': sub['fail']}
def c_code_cache_version(self):
......@@ -439,7 +439,7 @@ class GpuFromHost(Op):
class GpuToGpu(Op):
__props__ = ('context_name',)
_f16_ok = True
context_type = gpu_context_type
params_type = gpu_context_type
def __init__(self, context_name):
self.context_name = context_name
......@@ -454,7 +454,7 @@ class GpuToGpu(Op):
context_name=self.context_name,
dtype=x.dtype)()])
def get_context(self, node):
def get_params(self, node):
return get_context(self.context_name)
def perform(self, node, inp, out, ctx):
......@@ -479,7 +479,7 @@ class GpuToGpu(Op):
if (%(out)s == NULL) {
%(fail)s
}
""" % {'inp': inputs[0], 'ctx': sub['context'],
""" % {'inp': inputs[0], 'ctx': sub['params'],
'out': outputs[0], 'fail': sub['fail']}
def c_code_cache_version(self):
......@@ -501,13 +501,13 @@ class GpuAlloc(HideC, Alloc):
__props__ = ('memset_0', 'context_name')
_f16_ok = True
context_type = gpu_context_type
params_type = gpu_context_type
def __init__(self, context_name, memset_0=False):
self.context_name = context_name
self.memset_0 = memset_0
def get_context(self, node):
def get_params(self, node):
return get_context(self.context_name)
def __str__(self):
......@@ -605,7 +605,7 @@ class GpuAlloc(HideC, Alloc):
%(fail)s
}
}
""" % dict(name=name, ndim=ndim, zz=zz, vv=vv, ctx=sub['context'],
""" % dict(name=name, ndim=ndim, zz=zz, vv=vv, ctx=sub['params'],
fail=sub['fail'], memset_0=memset_0)
if config.gpuarray.sync:
......@@ -650,13 +650,13 @@ class GpuAlloc(HideC, Alloc):
class GpuAllocEmpty(HideC, Alloc):
__props__ = ('dtype', 'context_name')
_f16_ok = True
context_type = gpu_context_type
params_type = gpu_context_type
def __init__(self, dtype, context_name):
self.dtype = dtype
self.context_name = context_name
def get_context(self, node):
def get_params(self, node):
return get_context(self.context_name)
def make_node(self, *shape):
......@@ -702,7 +702,7 @@ if (theano_prep_output(&%(zz)s, %(ndim)s, shape, %(type)s, GA_C_ORDER,
%(fail)s
}
""" % dict(zz=zz, ndim=ndim, type=gpuarray.dtype_to_typecode(self.dtype),
fail=fail, ctx=sub['context']))
fail=fail, ctx=sub['params']))
return ''.join(code)
......@@ -909,7 +909,7 @@ class GpuReshape(HideC, tensor.Reshape):
class GpuJoin(HideC, Join):
_f16_ok = True
context_type = gpu_context_type
params_type = gpu_context_type
def make_node(self, axis, *tensors):
node = Join.make_node(self, axis, *tensors)
......@@ -924,7 +924,7 @@ class GpuJoin(HideC, Join):
dtype=node.outputs[0].dtype,
context_name=ctx_name)()])
def get_context(self, node):
def get_params(self, node):
return node.outputs[0].type.context
def perform(self, node, axis_and_tensors, out_, ctx):
......@@ -972,7 +972,7 @@ if (%(out)s == NULL)
%(fail)s
""" % dict(n=len(inputs[1:]), fail=sub['fail'], out=out_[0],
axis=inputs[0], copy_inputs_to_list='\n'.join(copy_to_list),
restype=restype, ctx=sub['context'])
restype=restype, ctx=sub['params'])
gpu_join = GpuJoin()
......@@ -998,7 +998,7 @@ class GpuEye(GpuKernelBase, Op):
self.dtype = dtype
self.context_name = context_name
def get_context(self, node):
def get_params(self, node):
return get_context(self.context_name)
def make_node(self, n, m, k):
......@@ -1043,7 +1043,7 @@ KERNEL void k(GLOBAL_MEM %(ctype)s *a, ga_size n, ga_size m) {
n, m = inp
z, = out
fail = sub['fail']
ctx = sub['context']
ctx = sub['params']
typecode = pygpu.gpuarray.dtype_to_typecode(self.dtype)
sync = bool(config.gpuarray.sync)
kname = self.gpu_kernels(node, name)[0].objvar
......
......@@ -135,7 +135,7 @@ class GpuConv(GpuKernelBase, gof.Op):
out = GpuArrayType(img.dtype, broadcastable, context_name=ctx_name)()
return gof.Apply(self, [img, kern], [out])
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
def flops(self, inputs, outputs):
......
......@@ -133,9 +133,9 @@ class DnnBase(COp):
# dnn does not know about broadcasting, so we do not need to assert
# the input broadcasting pattern.
check_broadcast = False
context_type = gpu_context_type
params_type = gpu_context_type
def get_context(self, node):
def get_params(self, node):
return node.outputs[0].type.context
def __init__(self, files=None, c_func=None):
......
......@@ -107,14 +107,14 @@ cudnnHandle_t APPLY_SPECIFIC(_handle);
#section init_code_struct
{
cuda_enter(CONTEXT->ctx);
cuda_enter(PARAMS->ctx);
cudnnStatus_t err;
APPLY_SPECIFIC(_handle) = NULL;
if ((err = cudnnCreate(&APPLY_SPECIFIC(_handle))) != CUDNN_STATUS_SUCCESS) {
PyErr_Format(PyExc_RuntimeError, "could not create cuDNN handle: %s",
cudnnGetErrorString(err));
cuda_exit(CONTEXT->ctx);
cuda_exit(PARAMS->ctx);
FAIL;
}
cuda_exit(CONTEXT->ctx);
cuda_exit(PARAMS->ctx);
}
......@@ -101,7 +101,7 @@ class GpuElemwise(GpuKernelBase, HideC, Elemwise):
return node
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
def generate_kernel(self, node, nodename):
......@@ -173,7 +173,7 @@ class GpuElemwise(GpuKernelBase, HideC, Elemwise):
("npy_float64", "ga_double"),
]:
kop = kop.replace(npy, ga)
return ElemwiseKernel(self.get_context(node), inps + outs, kop,
return ElemwiseKernel(self.get_params(node), inps + outs, kop,
preamble=support_code)
def c_headers(self):
......@@ -222,7 +222,7 @@ class GpuElemwise(GpuKernelBase, HideC, Elemwise):
fail = sub["fail"]
initial_dims = ','.join('1' for i in xrange(nd))
opname = str(self.scalar_op)
ctx = sub['context']
ctx = sub['params']
# check that all inputs have valid dimensions
emitted_inames = {}
......@@ -650,7 +650,7 @@ class GpuCAReduceCuda(GpuKernelBase, HideC, CAReduceDtype):
ret.outputs[0].type.broadcastable,
context_name=x.type.context_name)()])
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
def perform(self, node, inp, out, ctx):
......@@ -683,7 +683,7 @@ class GpuCAReduceCuda(GpuKernelBase, HideC, CAReduceDtype):
inp = ['fake_input_name_%d' % i for i in xrange(len(inputs))]
out = ['fake_output_name_%d' % i for i in xrange(len(node.outputs))]
sub = {'fail': 'fake failure code', 'context': 'fake context'}
sub = {'fail': 'fake failure code', 'params': 'fake context'}
try:
self.c_code(node, name, inp, out, sub)
......@@ -711,7 +711,7 @@ class GpuCAReduceCuda(GpuKernelBase, HideC, CAReduceDtype):
sio = StringIO()
fail = sub['fail']
ctx = sub['context']
ctx = sub['params']
# check input
print("""
......@@ -2664,7 +2664,7 @@ class GpuCAReduceCPY(GpuKernelBase, HideC, CAReduceDtype):
return Apply(res.op, [input], [otype()])
def get_context(self, node):
def get_params(self, node):
return node.outputs[0].type.context
def make_thunk(self, node, storage_map, compute_map, no_recycling):
......@@ -2776,7 +2776,7 @@ class GpuCAReduceCPY(GpuKernelBase, HideC, CAReduceDtype):
}
}
""" % dict(output=output, nd_out=nd_out, fail=sub['fail'],
ctx=sub['context'],
ctx=sub['params'],
out_type=dtype_to_typecode(node.outputs[0].type.dtype))
else:
code += """
......@@ -2788,7 +2788,7 @@ class GpuCAReduceCPY(GpuKernelBase, HideC, CAReduceDtype):
%(fail)s
}
}
""" % dict(output=output, fail=sub['fail'], ctx=sub['context'],
""" % dict(output=output, fail=sub['fail'], ctx=sub['params'],
out_type=dtype_to_typecode(node.outputs[0].type.dtype))
if acc_dtype != node.outputs[0].type.dtype:
......@@ -2796,7 +2796,7 @@ class GpuCAReduceCPY(GpuKernelBase, HideC, CAReduceDtype):
tmp = pygpu_empty(%(output)s->ga.nd, %(output)s->ga.dimensions,
%(acc_type)s, GA_C_ORDER, %(ctx)s, Py_None);
if (!tmp) %(fail)s
""" % dict(output=output, fail=sub['fail'], ctx=sub['context'],
""" % dict(output=output, fail=sub['fail'], ctx=sub['params'],
acc_type=dtype_to_typecode(acc_dtype))
else:
code += """
......
......@@ -2,7 +2,7 @@
/* Why do we need this? */
size_t dim = 2048 * 32;
rand_buf = pygpu_empty(1, &dim, GA_UINT, GA_C_ORDER, CONTEXT,
rand_buf = pygpu_empty(1, &dim, GA_UINT, GA_C_ORDER, PARAMS,
Py_None);
if (rand_buf == NULL) {
FAIL;
......
......@@ -41,7 +41,7 @@ class GpuImages2Neibs(GpuKernelBase, Images2Neibs, Op):
dtype=ten4.type.dtype,
context_name=ten4.type.context_name)()])
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
def c_code_cache_version(self):
......@@ -250,7 +250,7 @@ class GpuImages2Neibs(GpuKernelBase, Images2Neibs, Op):
ten4, neib_shape, neib_step = inp
z, = out
fail = sub['fail']
ctx = sub['context']
ctx = sub['params']
mode = self.mode
err_check = """
if (err != GA_NO_ERROR) {
......
......@@ -43,7 +43,7 @@ def ensure_float(val, name):
class Gemm16(COp):
__props__ = ('relu', 'inplace')
_f16_ok = True
context_type = gpu_context_type
params_type = gpu_context_type
KERN_NAMES = ('nn_128x128', 'nn_128x64', 'nn_128x32',
'nn_vec_128x128', 'nn_vec_128x64', 'nn_vec_128x32',
'tn_128x128', 'tn_128x64', 'tn_128x32',
......@@ -75,7 +75,7 @@ class Gemm16(COp):
return Apply(self, [C, alpha, A, B, beta], [C.type()])
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
def c_headers(self):
......@@ -128,7 +128,7 @@ if (GpuKernel_init(&k_%(name)s, c->ops, c->ctx, 1, &bcode, &sz,
codel.append("memset(&k_{0}, 0, sizeof(GpuKernel));".format(name))
codel.append("const char *bcode;")
codel.append("size_t sz;")
codel.append("PyGpuContextObject *c = %s;" % (sub['context'],))
codel.append("PyGpuContextObject *c = %s;" % (sub['params'],))
codel.append("int types[13] = {GA_BUFFER, GA_BUFFER, GA_BUFFER, "
"GA_BUFFER, GA_INT, GA_INT, GA_INT, GA_INT, GA_INT, "
"GA_INT, GA_FLOAT, GA_FLOAT, GA_INT};")
......
......@@ -41,7 +41,7 @@ class GpuCrossentropySoftmaxArgmax1HotWithBias(GpuKernelBase, Op):
am = y_idx.type()
return Apply(self, [x, b, y_idx], [nll, sm, am])
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
def c_headers(self):
......@@ -169,7 +169,7 @@ class GpuCrossentropySoftmaxArgmax1HotWithBias(GpuKernelBase, Op):
dtype_am = node.outputs[2].dtype
classname = self.__class__.__name__
fail = sub['fail']
ctx = sub['context']
ctx = sub['params']
k_var = "k_xent_sm_1hot_bias_%(nodename)s" % locals()
err_check = """
if (err != GA_NO_ERROR) {
......@@ -322,7 +322,7 @@ class GpuCrossentropySoftmax1HotWithBiasDx(GpuKernelBase, Op):
y_idx = as_gpuarray_variable(y_idx, ctx_name)
return Apply(self, [dnll, sm, y_idx], [sm.type()])
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
def c_code_cache_version(self):
......@@ -347,7 +347,7 @@ class GpuCrossentropySoftmax1HotWithBiasDx(GpuKernelBase, Op):
dnll, sm, y_idx = inp
dx, = out
fail = sub['fail']
ctx = sub['context']
ctx = sub['params']
k_var = "kCrossEntropySoftmax1HotWithBiasDx_" + nodename
err_check = """
if (err != GA_NO_ERROR) {
......@@ -528,7 +528,7 @@ class GpuSoftmax(GpuKernelBase, Op):
x = as_gpuarray_variable(x, infer_context_name(x))
return Apply(self, [x], [x.type()])
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
def infer_shape(self, node, shape):
......@@ -552,7 +552,7 @@ class GpuSoftmax(GpuKernelBase, Op):
x, = inp
z, = out
fail = sub['fail']
ctx = sub['context']
ctx = sub['params']
err_check = """
if (err != GA_NO_ERROR) {
PyErr_Format(PyExc_RuntimeError, fmt_str, msg);
......@@ -727,7 +727,7 @@ class GpuSoftmaxWithBias(GpuKernelBase, Op):
b = as_gpuarray_variable(b, ctx_name)
return Apply(self, [x, b], [x.type()])
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
def infer_shape(self, node, shape):
......@@ -753,7 +753,7 @@ class GpuSoftmaxWithBias(GpuKernelBase, Op):
x, b = inp
z, = out
fail = sub['fail']
ctx = sub['context']
ctx = sub['params']
err_check = """
if (err != GA_NO_ERROR) {
PyErr_Format(PyExc_RuntimeError, fmt_str, msg);
......
......@@ -202,7 +202,7 @@ class GpuIncSubtensor(GpuKernelBase, IncSubtensor):
op.create_iadd_node(ret)
return ret
def get_context(self, node):
def get_params(self, node):
return node.outputs[0].type.context
def create_iadd_node(self, node):
......@@ -609,7 +609,7 @@ class GpuAdvancedIncSubtensor1_dev20(GpuKernelBase, GpuAdvancedIncSubtensor1):
return gof.Apply(self, [x_, y_, ilist_], [x_.type()])
def get_context(self, node):
def get_params(self, node):
return node.outputs[0].type.context
def perform(self, node, inp, out, ctx):
......@@ -626,7 +626,7 @@ class GpuAdvancedIncSubtensor1_dev20(GpuKernelBase, GpuAdvancedIncSubtensor1):
return [os.path.dirname(__file__)]
def c_code(self, node, name, inputs, outputs, sub):
ctx = self.get_context(node)
ctx = self.get_params(node)
if ctx.kind != 'cuda':
raise NotImplementedError("cuda only")
if (self.set_instead_of_inc or
......
......@@ -771,7 +771,7 @@ class GPUA_mrg_uniform(GpuKernelBase, mrg_uniform_base):
# GpuArray version
_f16_ok = True
def get_context(self, node):
def get_params(self, node):
return node.inputs[0].type.context
@classmethod
......@@ -1014,7 +1014,7 @@ class GPUA_mrg_uniform(GpuKernelBase, mrg_uniform_base):
""" % locals()
def c_code_cache_version(self):
return (7, self.GpuKernelBase_version)
return (7,)
def guess_n_streams(size, warn=False):
......