Commit c7dbca05, authored Sep 22, 2014 by Pierre Luc Carrier (parent: 025d484e).

First version of the C op tutorial.

1 changed file with 460 additions and 0 deletions: doc/tutorial/extending_theano_c.txt (new file, mode 100644).

.. _extending_theano_c:

============================
Extending Theano with a C Op
============================

This tutorial covers how to extend Theano with an op that offers a C
implementation. It does not cover ops that run on a GPU, but it does introduce
many elements and concepts which are relevant for GPU ops. This tutorial is
aimed at readers who already know how to extend Theano (see tutorial
:ref:`extending_theano`) by adding a new op with a Python implementation,
and it only covers the additional knowledge required to also produce ops
with C implementations.

Providing a Theano op with a C implementation requires interacting with
Python's C-API and NumPy's C-API. Thus, the first step of this tutorial is to
introduce both and highlight the features that are most relevant to the task
of implementing a C op. This tutorial then introduces the most important
methods that the op needs to implement in order to provide a usable C
implementation. Finally, it shows how to combine these elements to write a
simple C op that multiplies every element of a vector by a scalar.

Python C-API
============

Python provides a C-API to allow the manipulation of Python objects from
C code. In this API, every Python object is represented by a C struct that
"descends" from PyObject: the struct begins with a PyObject header, which
holds the object's reference count and a pointer to the object's type.
Specific object types extend this header with their own fields. C code can
therefore handle any Python object through a generic PyObject* pointer.

As such, manipulating PyObject pointers is often straightforward, but it is
important to properly manage the reference counts of the objects they point
to. Failing to do so can lead to undesired behavior in the C code.

Reference counting
------------------

Reference counting is a mechanism for keeping track, for an object, of
the number of references to it held by other entities. This mechanism is often
used for garbage collection because it makes it easy to tell whether an
object is still being used by other entities. When the reference count
of an object drops to 0, no one is using it anymore and it can safely be
deleted.

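
In CPython, an object's current reference count can be observed from Python
through ``sys.getrefcount()``. The minimal sketch below is CPython-specific.
Because the absolute numbers vary between interpreter versions, it only checks
differences: the count rises when a second name refers to the object and falls
back when that name is deleted.

.. code-block:: python

    import sys

    obj = object()
    # The reported value includes the temporary reference held by the call itself.
    baseline = sys.getrefcount(obj)

    alias = obj                      # a second reference to the same object
    assert sys.getrefcount(obj) == baseline + 1

    del alias                        # dropping that reference decrements the count
    assert sys.getrefcount(obj) == baseline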
PyObjects implement reference counting, and the Python C-API defines a number
of macros to help manage those reference counts. The definition of these
macros can be found here: `Python C-API Reference Counting
<https://docs.python.org/2/c-api/refcounting.html>`_. Listed below are the
two macros most often used in Theano C ops.

.. method:: void Py_XINCREF(PyObject *o)

   Increments the reference count of object o. Without effect if the object
   is NULL.

.. method:: void Py_XDECREF(PyObject *o)

   Decrements the reference count of object o. If the reference count reaches
   0, it will trigger a call of the object's deallocation function. Without
   effect if the object is NULL.

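
Py_XINCREF() and Py_XDECREF() are C macros. The C-API also exports function
equivalents, ``Py_IncRef()`` and ``Py_DecRef()``, which behave like the X
macros. As a sketch, assuming CPython, ``ctypes`` can call these functions
directly, so their effect can be observed without writing an extension module.
This is meant for experimentation only, not as a pattern for real code.

.. code-block:: python

    import ctypes
    import sys

    # Function versions of Py_XINCREF / Py_XDECREF exported by CPython.
    py_incref = ctypes.pythonapi.Py_IncRef
    py_incref.argtypes = [ctypes.py_object]
    py_incref.restype = None

    py_decref = ctypes.pythonapi.Py_DecRef
    py_decref.argtypes = [ctypes.py_object]
    py_decref.restype = None

    obj = object()
    before = sys.getrefcount(obj)

    py_incref(obj)     # like Py_XINCREF(obj): one extra reference is now owned
    assert sys.getrefcount(obj) == before + 1

    py_decref(obj)     # like Py_XDECREF(obj): that reference is given back
    assert sys.getrefcount(obj) == before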
The general principle, in the reference counting paradigm, is that the owner
of a reference to an object is responsible for disposing of it properly.
This can be done by decrementing the reference count once the reference is no
longer used, or by transferring ownership: passing the reference on to a new
owner, which then becomes responsible for it.

Some functions return "borrowed references". This means that they return a
reference to an object **without** transferring ownership of the reference to
the caller of the function. If you call a function which returns a borrowed
reference, you do not have the burden of properly disposing of that
reference. You should **not** call Py_XDECREF() on a borrowed reference. If
you need to keep a borrowed reference after its owner might release it, call
Py_XINCREF() on it. You then own that new reference and must eventually
release it.

Correctly managing the reference counts is important, as failing to do so can
lead to issues ranging from memory leaks to segmentation faults.

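
The ownership rules can be illustrated with a toy model. The sketch below is
purely illustrative and is *not* how CPython works internally. The hypothetical
``ToyObject`` keeps an explicit counter, and the hypothetical ``ToyContainer``
owns one reference to its item. ``new_reference()`` hands a reference to the
caller, while ``borrowed_reference()`` does not. Decrementing a borrowed
reference, as in the last lines, "frees" the object while the container still
uses it.

.. code-block:: python

    class ToyObject(object):
        """Hypothetical stand-in for a PyObject: a counter and a 'freed' flag."""

        def __init__(self):
            self.refcount = 1          # the creator owns the first reference
            self.freed = False

        def incref(self):
            self.refcount += 1

        def decref(self):
            self.refcount -= 1
            if self.refcount == 0:     # last reference gone: "deallocate"
                self.freed = True


    class ToyContainer(object):
        """Hypothetical owner of one reference to the item it stores."""

        def __init__(self, item):
            self.item = item           # the creator's reference is transferred here

        def new_reference(self):
            self.item.incref()         # the caller receives its own reference...
            return self.item           # ...and must decref it when done

        def borrowed_reference(self):
            return self.item           # no incref: the container still owns it


    container = ToyContainer(ToyObject())

    ref = container.new_reference()
    ref.decref()                       # correct: we owned it, so we release it
    assert not container.item.freed

    borrowed = container.borrowed_reference()
    borrowed.decref()                  # WRONG: releases the container's reference
    assert container.item.freed        # the container now holds a dangling object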
NumPy C-API
===========

The NumPy library provides a C-API to allow users to create, access and
manipulate NumPy arrays from within their own C routines. NumPy's ndarrays
are used extensively inside Theano, so extending Theano with a C op will
require interaction with the NumPy C-API.

This section covers the elements of the API that are most often required to
write code for a Theano C op. The full documentation for the API can be found
here: `NumPy C-API <http://docs.scipy.org/doc/numpy/reference/c-api.html>`_.

NumPy ndarrays
--------------

In the NumPy C-API, NumPy arrays are represented as instances of the
PyArrayObject class, which is a descendant of the PyObject class. This means
that, as for any other Python object that you manipulate from C code, you
need to appropriately manage the reference counts of PyArrayObject instances.

Unlike a standard multidimensional C array, a NumPy array's internal data
representation does not have to occupy a contiguous region in memory. In fact,
it can be C-contiguous, F-contiguous or non-contiguous. C-contiguous means
that the data is not only contiguous in memory but also organized such that
the index of the last dimension changes the fastest. If the following array x

.. code-block:: python

    x = [[1, 2, 3],
         [4, 5, 6]]

is C-contiguous, it means that, in memory, the six values contained in the
array x are stored in the order [1, 2, 3, 4, 5, 6] (the first value is x[0,0],
the second value is x[0,1], the third value is x[0,2], the fourth value is
x[1,0], etc). F-contiguous (or Fortran-contiguous) also means that the data is
contiguous, but organized such that the index of the last dimension changes
the slowest (and the index of the first dimension changes the fastest). If the
array x is F-contiguous, it means that, in memory, the values appear in the
order [1, 4, 2, 5, 3, 6] (the first value is x[0,0], the second value is
x[1,0], the third value is x[0,1], etc).

Finally, the internal data can be non-contiguous. In this case, it occupies
a non-contiguous region in memory, but it is still stored in an organized
fashion: the distance between the element x[i,j] and the element x[i+1,j]
of the array is constant over all valid values of i and j, just as the
distance between the element x[i,j] and the element x[i,j+1] of the array
is constant over all valid values of i and j. This distance between
consecutive elements of an array along a given dimension is called the
stride of that dimension.

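
These layouts can be inspected from Python: ``ndarray.flags`` and
``ndarray.strides`` expose the same information that the C-API reads. The short
sketch below assumes 4-byte ``int32`` elements. ``ravel(order='K')`` returns the
elements in the order they appear in memory. The byte offset of an element
from the array's first element is the sum of each index multiplied by the
stride of its dimension.

.. code-block:: python

    import numpy as np

    x = np.array([[1, 2, 3],
                  [4, 5, 6]], dtype=np.int32)   # C-contiguous by default

    # C-contiguous: the last index changes fastest in memory.
    assert x.flags['C_CONTIGUOUS']
    assert x.strides == (12, 4)                  # 3 * 4 bytes per row, 4 bytes per column
    assert x.ravel(order='K').tolist() == [1, 2, 3, 4, 5, 6]

    # F-contiguous copy: the first index changes fastest in memory.
    f = np.asfortranarray(x)
    assert f.flags['F_CONTIGUOUS'] and not f.flags['C_CONTIGUOUS']
    assert f.strides == (4, 8)
    assert f.ravel(order='K').tolist() == [1, 4, 2, 5, 3, 6]

    # Non-contiguous view (every other column): constant strides, but with gaps.
    v = x[:, ::2]                                # [[1, 3], [4, 6]]
    assert not v.flags['C_CONTIGUOUS'] and not v.flags['F_CONTIGUOUS']
    assert v.strides == (12, 8)


    def byte_offset(strides, index):
        """Offset in bytes of an element from the array's first element."""
        return sum(i * s for i, s in zip(index, strides))


    assert byte_offset(x.strides, (1, 2)) == 20  # x[1, 2] is the 6th int32 (5 * 4 bytes)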
Accessing NumPy ndarrays' data and properties
---------------------------------------------

The following macros serve to access various attributes of NumPy ndarrays.

.. method:: void* PyArray_DATA(PyArrayObject* arr)

   Returns a pointer to the first element of the array's data.

.. method:: int PyArray_NDIM(PyArrayObject* arr)

   Returns the number of dimensions in the array pointed to by arr.

.. method:: npy_intp* PyArray_DIMS(PyArrayObject* arr)

   Returns a pointer to the first element of arr's internal array describing
   its dimensions. This internal array contains as many elements as the
   array arr has dimensions.

   The macro PyArray_SHAPE is a synonym of PyArray_DIMS: it has the same
   effect and is used in an identical way.

.. method:: npy_intp* PyArray_STRIDES(PyArrayObject* arr)

   Returns a pointer to the first element of arr's internal array describing
   the stride of each of its dimensions. This array has as many elements as
   the number of dimensions in arr. In this array, the strides are expressed
   in number of bytes.

.. method:: PyArray_Descr* PyArray_DESCR(PyArrayObject* arr)

   Returns a reference to the object representing the dtype of the array.

   The macro PyArray_DTYPE is a synonym of PyArray_DESCR(): it has the
   same effect and is used in an identical way.

   :note:
       This is a borrowed reference, so you do not need to decrement its
       reference count once you are done with it.

.. method:: int PyArray_TYPE(PyArrayObject* arr)

   Returns the typenum of the elements of the array. Like the dtype, the
   typenum describes the type of the data in the array. However, the two are
   not synonyms and cannot be used in place of each other: the typenum is an
   integer constant (such as NPY_FLOAT64), while the dtype is a pointer to a
   PyArray_Descr object.

.. method:: npy_intp PyArray_SIZE(PyArrayObject* arr)

   Returns the total number of elements in the array.

.. method:: int PyArray_CHKFLAGS(PyArrayObject* arr, int flags)

   Returns true (non-zero) if the array has all of the specified flags. The
   argument flags should either be a NumPy array flag or an integer obtained
   by applying bitwise OR to a combination of flags.

   The flags that can be used with this macro are:
   NPY_ARRAY_C_CONTIGUOUS, NPY_ARRAY_F_CONTIGUOUS, NPY_ARRAY_OWNDATA,
   NPY_ARRAY_ALIGNED, NPY_ARRAY_WRITEABLE, NPY_ARRAY_UPDATEIFCOPY.

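
Most of these macros have a Python-level counterpart on ``numpy.ndarray``,
which is handy for checking what a C op will receive. The correspondence in the
comments below is our own summary, not an official mapping. ``arr.ctypes.data``
is the integer address of the array's first element, which is what
PyArray_DATA() returns in C. The hypothetical helper ``read_float64()`` reads one
element the way C code would, from the data pointer plus the strides. For
simplicity it only handles ``float64``.

.. code-block:: python

    import ctypes
    import numpy as np

    # A non-contiguous view: rows 0 and 2, columns 1 and 3 -> [[1, 3], [9, 11]]
    arr = np.arange(12, dtype=np.float64).reshape(3, 4)[::2, 1::2]

    assert arr.ndim == 2                        # PyArray_NDIM
    assert arr.shape == (2, 2)                  # PyArray_DIMS / PyArray_SHAPE
    assert arr.strides == (64, 16)              # PyArray_STRIDES, in bytes
    assert arr.dtype == np.float64              # PyArray_DESCR / PyArray_DTYPE
    # PyArray_TYPE: an int typenum (12, NPY_DOUBLE), not a dtype object
    assert arr.dtype.num == np.dtype(np.float64).num
    assert arr.size == 4                        # PyArray_SIZE
    assert not arr.flags['C_CONTIGUOUS']        # PyArray_CHKFLAGS(arr, NPY_ARRAY_C_CONTIGUOUS)


    def read_float64(a, index):
        """Read a[index] via the base pointer plus sum(index * stride), as C code does."""
        if a.dtype != np.float64:
            raise TypeError("this sketch only handles float64 arrays")
        offset = sum(i * s for i, s in zip(index, a.strides))
        return ctypes.c_double.from_address(a.ctypes.data + offset).value


    assert read_float64(arr, (1, 1)) == arr[1, 1] == 11.0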
Creating NumPy ndarrays
-----------------------

The following functions allow the creation and copy of NumPy arrays:

.. method:: PyObject* PyArray_Empty(int nd, npy_intp* dims, PyArray_Descr* dtype, int fortran)

   Constructs a new ndarray with the number of dimensions specified by nd,
   shape specified by dims and data type specified by dtype. If fortran is
   equal to 0, the data is organized in a C-contiguous layout, otherwise it
   is organized in an F-contiguous layout. The array elements are not
   initialized in any way.

   This function steals a reference to dtype. Do not pass it a borrowed
   reference (such as the one returned by PyArray_DESCR()) without first
   calling Py_XINCREF() on it.

   The macro PyArray_EMPTY() performs the same function as the function
   PyArray_Empty(), but the data type is given as a typenum instead of a
   pointer to a PyArray_Descr object.

.. method:: PyObject* PyArray_Zeros(int nd, npy_intp* dims, PyArray_Descr* dtype, int fortran)

   Constructs a new ndarray with the number of dimensions specified by nd,
   shape specified by dims and data type specified by dtype. If fortran is
   equal to 0, the data is organized in a C-contiguous layout, otherwise it
   is organized in an F-contiguous layout. Every element in the array is
   initialized to 0.

   As with PyArray_Empty(), this function steals a reference to dtype.

   The macro PyArray_ZEROS() performs the same function as the function
   PyArray_Zeros(), but the data type is given as a typenum instead of a
   pointer to a PyArray_Descr object.

.. method:: PyArrayObject* PyArray_GETCONTIGUOUS(PyObject* op)

   Returns a C-contiguous and well-behaved copy of the array op. If op is
   already C-contiguous and well-behaved, this function simply returns a
   new reference to op.

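
At the Python level, ``numpy.zeros()``, ``numpy.empty()`` and
``numpy.ascontiguousarray()`` play roles similar to PyArray_Zeros(),
PyArray_Empty() and PyArray_GETCONTIGUOUS(). Passing ``order='F'`` corresponds to
a non-zero fortran argument. This sketch is an analogy for experimenting with
layouts. It does not describe how NumPy implements these functions internally.

.. code-block:: python

    import numpy as np

    dims = (2, 3)

    # Like PyArray_Zeros(..., fortran=0) and PyArray_Zeros(..., fortran=1).
    z_c = np.zeros(dims, dtype=np.float64)
    z_f = np.zeros(dims, dtype=np.float64, order='F')
    assert z_c.flags['C_CONTIGUOUS'] and z_f.flags['F_CONTIGUOUS']
    assert not z_c.any()                      # every element initialized to 0

    # Like PyArray_Empty(): right shape and dtype, arbitrary contents.
    e = np.empty(dims, dtype=np.float64)
    assert e.shape == dims and e.dtype == np.float64

    # Like PyArray_GETCONTIGUOUS(): no copy if already C-contiguous...
    x = np.arange(6.0).reshape(dims)
    assert np.ascontiguousarray(x) is x
    # ...but a C-contiguous copy otherwise (x.T is F-contiguous).
    c = np.ascontiguousarray(x.T)
    assert c.flags['C_CONTIGUOUS'] and not np.shares_memory(c, x)
    assert (c == x.T).all()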
Functions the C Op needs to define
==================================

There is a key difference between an op defining a Python implementation for
its computation and one defining a C implementation. In the case of a Python
implementation, the op defines a method perform() which executes the
required Python code to realize the op. In the case of a C implementation,
however, the op does **not** define a function that will execute the C code;
it instead defines methods that will **return** the C code to the caller.

This is because calling C code from Python code comes with a significant
overhead. If every op was responsible for executing its own C code, every
time a Theano function was called, this overhead would occur as many times
as the number of ops with C implementations in the function's computational
graph.

To maximize performance, Theano instead requires the C ops to simply return
the code needed for their execution, and takes upon itself the task of
organizing, linking and compiling the code from the various ops. Through this,
Theano is able to minimize the number of times C code is called from Python
code by maximizing the amount of computation that is done every time C code
is called from Python.

The following is a very crude example to illustrate how it's possible to
obtain performance gains with this process. Suppose you need to execute,
from Python code, 10 different ops, each one having a C implementation. If
each op was responsible for executing its own C code, the overhead of
calling C code from Python code would occur 10 times. Consider now the case
where the ops instead return the C code for their execution. You could get
the C code from each op and then define your own C module that would call
the C code from each op in succession. In this case, the overhead would only
occur once: when calling your custom module itself.

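
This idea can be sketched in a few lines of Python. Everything here is
hypothetical and much simpler than the code Theano actually generates. Each toy
op only *returns* a C statement, and the hypothetical ``fuse()`` driver combines
the statements of all ten ops into a single C function. As a result, Python
would cross into C once instead of ten times.

.. code-block:: python

    class ToyScaleOp(object):
        """Hypothetical op that returns C code instead of executing anything."""

        def __init__(self, factor):
            self.factor = factor

        def c_code(self, input_name, output_name):
            return "%s = %s * %r;" % (output_name, input_name, self.factor)


    def fuse(ops):
        """Concatenate the code of every op into one C function (one entry point)."""
        lines = ["double fused(double v0)", "{"]
        for k, op in enumerate(ops):
            lines.append("    double v%d;" % (k + 1))
            lines.append("    " + op.c_code("v%d" % k, "v%d" % (k + 1)))
        lines.append("    return v%d;" % len(ops))
        lines.append("}")
        return "\n".join(lines)


    ops = [ToyScaleOp(float(k)) for k in range(1, 11)]   # ten ops
    source = fuse(ops)
    print(source)

The printed C function chains ``v1 = v0 * 1.0;`` through ``v10 = v9 * 10.0;``
inside one body, which is the "single call" situation described above.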
Moreover, because Theano compiles the C code itself instead of leaving this
to the individual ops, it can easily cache the compiled C code. Code that has
already been compiled can then be reused instead of compiled again, which
reduces compilation times.

See :ref:`cop` for the full documentation of the various methods of the
class Op that are related to the C implementation. Of particular interest are:

* The methods c_libraries() and c_lib_dirs() to allow your op to use
  external libraries.

* The method c_code_cleanup() to specify how the op should clean up
  what it has allocated during its execution.

* The methods c_init_code() and c_init_code_apply() to specify code
  that should be executed once when the module is initialized, before
  anything else is executed.

* The methods c_compile_args() and c_no_compile_args() to specify
  requirements regarding how the op's C code should be compiled.

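
To show the shape these methods take, here is a plain-Python sketch that reuses
the same method names. It deliberately does not subclass ``gof.Op``, so it runs
without Theano. The return types reflect our understanding of the interface
documented in :ref:`cop`. The library name, directory and flag are hypothetical
placeholders.

.. code-block:: python

    class SketchCOpMethods(object):
        """Hypothetical illustration only; a real op would subclass gof.Op."""

        def c_libraries(self):
            return ['mylib']                   # hypothetical library, linked as -lmylib

        def c_lib_dirs(self):
            return ['/opt/mylib/lib']          # hypothetical directory containing it

        def c_compile_args(self):
            return ['-O3']                     # extra compiler flag

        def c_init_code(self):
            # Snippets run once, when the compiled module is initialized.
            return ['/* one-time initialization goes here */']

        def c_code_cleanup(self, node, name, inputs, outputs, sub):
            # C code run after c_code(); a placeholder comment in this sketch.
            return "/* release resources acquired in c_code() */"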
This section describes the methods c_code(), c_support_code() and
c_code_cache_version() because they are the ones that are most commonly
used.

.. method:: c_code(node, name, input_names, output_names, sub)

   This method returns a string containing the C code to perform the
   computation required by this op.

   The ``node`` argument is an :ref:`apply` node representing an
   application of the current Op on a list of inputs, producing a list of
   outputs. ``name`` is a unique name automatically assigned by Theano.

   ``input_names`` is a sequence of strings which contains as many strings
   as the op has inputs. Each string contains the name of the C variable
   to which the corresponding input has been assigned. For example, the name
   of the C variable representing the first input of the op is given by
   ``input_names[0]``. You should therefore use this name in your C code to
   interact with that variable. ``output_names`` is used identically to
   ``input_names``, but for the op's outputs.

   Finally, ``sub`` is a dictionary of extra parameters to the c_code
   method. Among other things, it contains ``sub['fail']``, which is a string
   of C code that you should execute (after ensuring that a Python exception
   is set) if your C code needs to raise an exception.

   :note:
       Your C code should not return the output of the computation but
       rather put the results in the C variables whose names are contained in
       ``output_names``.

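
Because c_code() only returns a string, it can be exercised without compiling
anything. In the sketch below, the variable names (``V3``, ``V5``) and the
``sub['fail']`` snippet are made-up placeholders, since Theano generates its own.
The sketch shows three things. The op never hard-codes C variable names and only
substitutes the ones it receives. It sets a Python exception before using
``%(fail)s``. It writes into the output variable instead of returning a value.

.. code-block:: python

    def c_code(node, name, input_names, output_names, sub):
        """Hypothetical c_code for an op that copies a 1-d input into its output."""
        x, = input_names
        z, = output_names
        return """
        if (PyArray_NDIM(%(x)s) != 1) {
            PyErr_SetString(PyExc_ValueError, "input must be 1-d");
            %(fail)s
        }
        Py_XDECREF(%(z)s);
        %(z)s = (PyArrayObject*)PyArray_NewCopy(%(x)s, NPY_CORDER);
        if (!%(z)s) {
            %(fail)s
        }
        """ % {'x': x, 'z': z, 'fail': sub['fail']}


    # Placeholder names standing in for the ones Theano would pass in.
    code = c_code(None, 'node_0', ['V3'], ['V5'], {'fail': '{ goto fail_label; }'})
    print(code)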
.. method:: c_support_code()

   Returns a string containing the support C code for this op. This code
   will be included at the global scope level and can be used to define
   functions and structs that will be used by the op's main C code.

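
As a sketch, c_support_code() can define a small C helper once at global scope,
and c_code() can then call it. The helper name ``scale_contiguous_double`` and
the class are hypothetical. The sketch also assumes C-contiguous ``float64``
vectors and an already allocated output. This code lands at global scope in a
module that may also contain other ops' code, so a distinctive helper name
helps avoid clashes.

.. code-block:: python

    class ScaleSupportSketch(object):
        """Hypothetical op fragment: a helper in support code, called from c_code."""

        def c_support_code(self):
            return """
            static void scale_contiguous_double(double* out, const double* in,
                                                npy_intp n, double factor)
            {
                for (npy_intp i = 0; i < n; ++i)
                    out[i] = in[i] * factor;
            }
            """

        def c_code(self, node, name, input_names, output_names, sub):
            x, y = input_names
            z, = output_names
            # Assumes x and z are C-contiguous float64 vectors, z already allocated.
            return """
            scale_contiguous_double((double*)PyArray_DATA(%(z)s),
                                    (double*)PyArray_DATA(%(x)s),
                                    PyArray_SIZE(%(x)s),
                                    ((double*)PyArray_DATA(%(y)s))[0]);
            """ % {'x': x, 'y': y, 'z': z}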
.. method:: c_code_cache_version()

   Returns a tuple of integers representing the version of the C code in this
   op. Example: (1, 4, 0) for version 1.4.0.

   This tuple is used by Theano to cache the compiled C code for this op. As
   such, the return value **MUST be CHANGED** every time the C code is
   altered, or else Theano will disregard the change in the code and simply
   load a previous version of the op from the cache. If you want to avoid
   caching of the C code of this op, return an empty tuple or do not
   implement this method.

   :note:
       Theano can handle tuples of any hashable objects as return values
       for this function but, for greater readability and easier management,
       this function should return a tuple of integers as previously
       described.

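
A toy cache keyed on the op and its version shows why the version must change.
This is a hypothetical model for illustration, not Theano's actual cache
implementation. If the C code changes but the version does not, the stale entry
is reused. An empty tuple disables caching.

.. code-block:: python

    _compiled = {}        # toy cache: (op name, version) -> "compiled" code
    compile_count = 0


    def get_compiled(op_name, version, c_code):
        """Return the cached result for (op_name, version); 'compile' on a miss."""
        global compile_count
        key = (op_name, version)
        if version and key in _compiled:      # an empty tuple disables caching
            return _compiled[key]
        compile_count += 1
        compiled = "compiled<" + c_code + ">"
        if version:
            _compiled[key] = compiled
        return compiled


    v1 = get_compiled('VectorTimesScalar', (1, 0), 'z = x * y;')
    # The C code is edited but the version is not bumped: the old entry comes back.
    stale = get_compiled('VectorTimesScalar', (1, 0), 'z = x * y + 0;')
    assert stale == v1
    # Bumping the version triggers a fresh compilation.
    fresh = get_compiled('VectorTimesScalar', (1, 1), 'z = x * y + 0;')
    assert fresh == 'compiled<z = x * y + 0;>'
    assert compile_count == 2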
Complete C Op example
=====================

In this section, we put together every concept that was covered in this
tutorial to create an op which multiplies every element of a vector by a
scalar.

Notice how the reference count on the output variable is managed. Also take
note of how the new variables required for the op's computation are declared
in a new scope to avoid cross-initialization errors.

:note:
    Given the simple nature of this op, there was no need to use the
    c_support_code() function.

.. code-block:: python

    import numpy
    import theano
    from theano import gof
    import theano.tensor as T


    class VectorTimesScalar(gof.Op):
        __props__ = ()

        def __init__(self, **kwargs):
            gof.Op.__init__(self, **kwargs)

        def make_node(self, x, y):
            # Convert the inputs to Theano variables (this also accepts
            # NumPy arrays and Python numbers as inputs)
            x = T.as_tensor_variable(x)
            y = T.as_tensor_variable(y)

            # Validate the inputs' type
            if x.type.ndim != 1:
                raise TypeError('x must be a 1-d vector')
            if y.type.ndim != 0:
                raise TypeError('y must be a scalar')

            # Create an output variable of the same type as x
            output_var = x.type.make_variable()

            return gof.Apply(self, [x, y], [output_var])

        def __str__(self):
            return self.__class__.__name__

        def c_code_cache_version(self):
            return (1, 0)

        def c_code(self, node, name, inp, out, sub):
            x, y = inp
            z, = out

            dtype_x = node.inputs[0].dtype
            dtype_y = node.inputs[1].dtype
            dtype_z = node.outputs[0].dtype

            itemsize_x = numpy.dtype(dtype_x).itemsize
            itemsize_z = numpy.dtype(dtype_z).itemsize

            fail = sub['fail']

            c_code = """
            // Validate the inputs
            if (PyArray_NDIM(%(x)s) != 1)
            {
                PyErr_SetString(PyExc_ValueError, "x is not a 1d tensor");
                %(fail)s;
            }
            if (PyArray_NDIM(%(y)s) != 0)
            {
                PyErr_SetString(PyExc_ValueError, "y is not a scalar");
                %(fail)s;
            }

            // Validate that the output storage exists and has the same
            // dimension as x.
            if (NULL == %(z)s || PyArray_NDIM(%(z)s) != 1 ||
                PyArray_DIMS(%(x)s)[0] != PyArray_DIMS(%(z)s)[0])
            {
                /* Reference received to invalid output variable.
                Decrease received reference's ref count and allocate new
                output variable */
                Py_XDECREF(%(z)s);
                /* PyArray_EMPTY takes a typenum. PyArray_Empty would steal a
                reference to the borrowed descr returned by PyArray_DESCR. */
                %(z)s = (PyArrayObject*)PyArray_EMPTY(1,
                                                      PyArray_DIMS(%(x)s),
                                                      PyArray_TYPE(%(x)s),
                                                      0);
                if (!%(z)s) {
                    // PyArray_EMPTY has already set a Python exception
                    %(fail)s;
                }
            }

            // Perform the vector multiplication by a scalar
            {
                /* The declaration of the following variables is done in a new
                scope to prevent cross initialization errors */
                npy_%(dtype_x)s* x_data_ptr =
                                (npy_%(dtype_x)s*)PyArray_DATA(%(x)s);
                npy_%(dtype_z)s* z_data_ptr =
                                (npy_%(dtype_z)s*)PyArray_DATA(%(z)s);
                npy_%(dtype_y)s y_value =
                                ((npy_%(dtype_y)s*)PyArray_DATA(%(y)s))[0];

                // Strides are in bytes; convert them to numbers of elements
                npy_intp x_stride = PyArray_STRIDES(%(x)s)[0] / %(itemsize_x)s;
                npy_intp z_stride = PyArray_STRIDES(%(z)s)[0] / %(itemsize_z)s;
                npy_intp x_dim = PyArray_DIMS(%(x)s)[0];

                for (npy_intp i = 0; i < x_dim; i++)
                {
                    z_data_ptr[i * z_stride] = (x_data_ptr[i * x_stride] *
                                                y_value);
                }
            }
            """

            return c_code % locals()