Commit 0091bc3a authored and committed by Brandon T. Willard

Refactor doc.extending

* Fix class, attribute, and function formatting
* Update the relevant glossary terms and references to them
* Rename documentation to better reflect their content
* Remove redundant documentation
* Reorder index so that it starts with the basic, necessary articles
* Grammar and wording updates
Parent a7183bc9
.. _aesara_vs_c:
============
Aesara vs. C
============
We describe some of the patterns in Aesara, and present their closest
analogue in a statically typed language such as C:

=============== ===========================================================
Aesara          C
=============== ===========================================================
Apply           function application / function call
Variable        local function data / variable
Shared Variable global function data / variable
Op              operations carried out in computation / function definition
Type            data types
=============== ===========================================================

For example:

.. code-block:: c

   int d = 0;

   int main(int a) {
       int b = 3;
       int c = f(b);
       d = b + c;
       return g(a, c);
   }

Based on this code snippet, we can relate ``f`` and ``g`` to Ops; ``a``,
``b``, and ``c`` to Variables; ``d`` to a Shared Variable; and ``g(a, c)``,
``f(b)``, and ``d = b + c`` (taken as meaning
the action of computing ``f``, ``g``, or ``+`` on their respective inputs) to
Applies. Lastly, ``int`` could be interpreted as the Aesara Type of the
Variables ``a``, ``b``, ``c``, and ``d``.
.. _cop:
=====================================
Implementing the arithmetic COps in C
=====================================
Now that we have set up our ``double`` type properly to allow C
implementations for operations that work on it, all that remains
is to actually define these operations in C.
How does it work?
=================
Before a C :ref:`COp` is executed, the variables corresponding to each of its
inputs will be declared and filled appropriately, either from
an input provided by the end user (using `c_extract`) or from the
output of another operation. For each of the outputs,
the associated variables will be declared and initialized.
The operation then has to compute its results from the
input variables and store them in the output variables.
What needs to be defined
========================
There are fewer methods to define for a `COp` than for a `Type`:
.. class:: COp
.. method:: c_code(node, name, input_names, output_names, sub)
This must return C code that carries the computation we want to
do.
`sub` is a dictionary of extra parameters passed to the ``c_code``
method. It contains the following values:
``sub['fail']``
A string of code that you should execute (after ensuring
that a python exception is set) if your C code needs to
raise an exception.
``sub['params']``
(optional) The name of the variable which holds the context
for the node. This will only appear if the op has requested
a context by having a :meth:`get_params()` method that returns
something other than ``None``.
.. method:: c_code_cleanup(node, name, input_names, output_names, sub)
This must return C code that cleans up whatever ``c_code``
allocated that must be freed.
*Default:* The default behavior is to do nothing.
.. method:: c_headers([c_compiler])
Returns a list of headers to include in the file. ``Python.h`` is
included by default, so you don't need to specify it. All
of the headers required by the Types involved (inputs and
outputs) will also be included.
The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_header_dirs([c_compiler])
Returns a list of directories to search for headers (arguments
to ``-I``).
The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_libraries([c_compiler])
Returns a list of library names that your op needs to link to
(arguments to ``-l``). All ops are automatically linked with
``python`` and the libraries their types require.
The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_lib_dirs([c_compiler])
Returns a list of directories to search for libraries (arguments
to ``-L``).
The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_compile_args([c_compiler])
Allows you to specify additional arbitrary arguments for the C
compiler. This is not usually required.
The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_no_compile_args([c_compiler])
Returns a list of C compiler arguments that are forbidden when
compiling this Op.
The `c_compiler` [#2v]_ parameter is the C compiler that will
be used to compile the code for the node. You may get multiple
calls with different C compilers.
.. method:: c_init_code()
Allows you to specify code that will be executed once when the
module is initialized, before anything else is executed. This
is for code that will be executed once per Op.
.. method:: c_init_code_apply(node, name)
Allows you to specify code that will be executed once when the
module is initialized, before anything else is executed and is
specialized for a particular `Apply` of an :ref:`Op`.
.. method:: c_init_code_struct(node, name, sub)
Allows you to specify code that will be inserted in the struct
constructor of the Op. This is for code which should be
executed once per thunk (Apply node, more or less).
`sub` is a dictionary of extra parameters passed to the
``c_init_code_struct`` method. It contains the following
values:
``sub['fail']``
A string of code that you should execute (after ensuring
that a python exception is set) if your C code needs to
raise an exception.
``sub['params']``
(optional) The name of the variable which holds the context
for the node. This will only appear if the op has requested
a context by having a :meth:`get_params()` method that returns
something other than ``None``.
.. method:: c_support_code()
Allows you to specify helper functions/structs (as a string or a list of strings) that the
:ref:`op` needs. That code will be reused for each apply of
this op. It will be inserted at global scope.
.. method:: c_support_code_apply(node, name)
Allows you to specify helper functions/structs specialized for
a particular apply of an :ref:`op`. Use :meth:`c_support_code`
if the code is the same for each apply of an op. It will be
inserted at global scope.
.. method:: c_support_code_struct(node, name)
Allows you to specify helper functions of variables that will
be specific to one particular thunk. These are inserted at
struct scope.
:note:
You cannot specify CUDA kernels in the code returned by this
method since that isn't supported by CUDA. You should place your
kernels in :meth:`c_support_code()` or
:meth:`c_support_code_apply()` and call them from this code.
.. method:: c_cleanup_code_struct(node, name)
Allows you to specify code that will be inserted in the struct
destructor of the `Op`. This is for cleaning up allocations and
the like when the thunk is released (i.e., when you free a
compiled function using this op).
.. method:: infer_shape(fgraph, node, (i0_shapes,i1_shapes,...))
Allows optimizations to lift the `Shape` `Op` over this `Op`. An
example of why this is useful: when we only need the shape of a
variable, we can obtain it without computing the
variable itself.
Must return a list where each element is a tuple representing
the shape of one output.
For example, for the matrix-matrix product ``infer_shape`` will
have as inputs ``(fgraph, node, ((x0,x1), (y0,y1)))`` and should return
``[(x0, y1)]``. Both the inputs and the return value may be Aesara
variables.
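As a concrete sketch of the matrix-matrix product case above, written as a plain Python function (not tied to any particular `Op`):

```python
def infer_shape(fgraph, node, input_shapes):
    # For a matrix-matrix product: inputs of shape (x0, x1) and (y0, y1)
    # produce a single output of shape (x0, y1).
    (x0, x1), (y0, y1) = input_shapes
    return [(x0, y1)]
```

Here the shapes are plain tuples for illustration; in a real graph the entries may be Aesara variables.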
.. method:: c_code_cache_version()
Must return a tuple of hashable objects, such as integers. This
specifies the version of the code. It is used to cache the
compiled code. You MUST change the returned tuple for each
change in the code. If you don't want the compiled code to be
cached, return an empty tuple or don't implement this method.
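A minimal sketch of such a version for a hypothetical op (the numbers are arbitrary; what matters is that they change whenever the generated C code changes):

```python
def c_code_cache_version():
    # Was (1,) before the C code was modified; bumped to invalidate
    # the cached compiled module.
    return (2, 0)
```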
.. method:: c_code_cache_version_apply(node)
Overrides :meth:`c_code_cache_version` if defined, but
otherwise has the same contract.
.. method:: get_params(node)
(optional) If defined, should return the runtime params the op
needs. These parameters will be passed to the C code through the
variable named in `sub['params']`. The variable is also
available for use in the code returned by
:meth:`c_init_code_struct`. If it returns `None` this is
considered the same as if the method was not defined.
If this method is defined and does not return `None`, then the
`Op` *must* have a `params_type` property with the `Type` to use
for the params variable.
.. attribute:: _f16_ok
(optional) If this attribute is absent or evaluates to `False`,
C code will be disabled for the op if any of its inputs or
outputs contains float16 data. This check exists to make
sure we don't compute wrong results: since there is no hardware
float16 type, special care must be taken to ensure
operations are done correctly.
If you don't intend to deal with float16 data you can leave
this undefined.
This attribute is internal and may go away at any point during
development if a better solution is found.
The ``name`` argument is currently given an invalid value, so steer
away from it. As was the case with `Type`, ``sub['fail']`` provides
failure code that you *must* use if you want to raise an exception,
after setting the exception message.
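To illustrate, here is a hypothetical ``c_code`` for a division op that sets an exception and then uses ``sub['fail']``; the variable names in the test are invented:

```python
def c_code(node, name, input_names, output_names, sub):
    # Hypothetical sketch: divide two C doubles, raising
    # ZeroDivisionError through sub['fail'] when the denominator is zero.
    x, y = input_names
    (z,) = output_names
    fail = sub["fail"]
    return """
    if (%(y)s == 0.0) {
        PyErr_SetString(PyExc_ZeroDivisionError, "division by zero");
        %(fail)s
    }
    %(z)s = %(x)s / %(y)s;
    """ % locals()
```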
The ``node`` argument is an :ref:`apply` node representing an
application of the current Op on a list of inputs, producing a list of
outputs. ``input_names`` and ``output_names`` arguments contain as
many strings as there are inputs and outputs to the application of the
Op and they correspond to the ``name`` that is passed to the type of
each Variable in these lists. For example, if ``node.inputs[0].type ==
double``, then ``input_names[0]`` is the ``name`` argument passed to
``double.c_declare`` etc. when the first input is processed by Aesara.
In a nutshell, ``input_names`` and ``output_names`` parameterize the
names of the inputs your operation needs to use and the outputs it
needs to put variables into. But this will be clear with the examples.
.. rubric:: Footnotes
.. [#2v] There are actually two versions of this method: one with a
`c_compiler` parameter and one without. The calling code will
try the version with `c_compiler` first and fall back to the
version without it if that does not work. Defining both versions
is pointless, since the one without `c_compiler` will never get
called. Note that these methods are not specific to a single
apply node, so they may get called more than once on the same
object with different values for `c_compiler`.
Defining the methods
====================
We will be defining C code for the multiplication `COp` on doubles.
**c_code**
.. testsetup::

   from aesara.graph.op import COp

   mul = COp()

.. testcode::

   def c_code(node, name, input_names, output_names, sub):
       x_name, y_name = input_names[0], input_names[1]
       output_name = output_names[0]
       return """
       %(output_name)s = %(x_name)s * %(y_name)s;
       """ % locals()

   mul.c_code = c_code

And that's it. As we enter the scope of the C code we are defining in
the method above, many variables are defined for us. Namely, the
variables ``x_name``, ``y_name``, and ``output_name`` are all of the
primitive C ``double`` type, and they were declared using the C code
returned by ``double.c_declare``.
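Concretely, the string returned by ``c_code`` is a template into which Aesara substitutes the generated variable names before compilation. A standalone sketch of that substitution (the names ``V3``, ``V5``, and ``V7`` are invented):

```python
template = """
%(output_name)s = %(x_name)s * %(y_name)s;
"""
# Aesara supplies names like these for the declared C variables.
names = {"x_name": "V3", "y_name": "V5", "output_name": "V7"}
c_snippet = template % names
```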
Implementing multiplication is as simple as multiplying the two input
doubles and setting the output double to what comes out of it. If you
had more than one output, you would just set the variable(s) for
each output to what they should be.
.. warning::
Do *NOT* use C's ``return`` statement to return the variable(s) of
the computations. Set the output variables directly as shown
above. Aesara will pick them up for you.
**c_code_cleanup**
There is nothing to clean up after multiplying two doubles. Typically,
you won't need to define this method unless you ``malloc()`` some
temporary storage (which you would ``free()`` here) or create temporary
Python objects (which you would ``Py_XDECREF()`` here).
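If you did allocate temporary storage, a hypothetical cleanup might look like this (the ``buf_`` naming is invented for illustration):

```python
def c_code_cleanup(node, name, input_names, output_names, sub):
    # Frees a temporary buffer that the matching c_code would have
    # allocated with malloc() under the same apply-specific name.
    return """
    if (buf_%(name)s != NULL) {
        free(buf_%(name)s);
        buf_%(name)s = NULL;
    }
    """ % locals()
```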
Final version
=============
As before, I tried to organize the code in order to minimize
repetition. You can check that ``mul`` produces the same C code in this
version that it produces in the code given above.
.. testcode::

   from aesara.graph.basic import Apply, Constant
   from aesara.graph.op import COp


   class BinaryDoubleOp(COp):
       __props__ = ("name", "fn", "ccode")

       def __init__(self, name, fn, ccode):
           self.name = name
           self.fn = fn
           self.ccode = ccode

       def make_node(self, x, y):
           if isinstance(x, (int, float)):
               x = Constant(double, x)
           if isinstance(y, (int, float)):
               y = Constant(double, y)
           if x.type != double or y.type != double:
               raise TypeError('%s only works on doubles' % self.name)
           return Apply(self, [x, y], [double()])

       def perform(self, node, inp, out):
           x, y = inp
           z, = out
           z[0] = self.fn(x, y)

       def __str__(self):
           return self.name

       def c_code(self, node, name, inp, out, sub):
           x, y = inp
           z, = out
           return self.ccode % locals()


   add = BinaryDoubleOp(name='add',
                        fn=lambda x, y: x + y,
                        ccode="%(z)s = %(x)s + %(y)s;")
   sub = BinaryDoubleOp(name='sub',
                        fn=lambda x, y: x - y,
                        ccode="%(z)s = %(x)s - %(y)s;")
   mul = BinaryDoubleOp(name='mul',
                        fn=lambda x, y: x * y,
                        ccode="%(z)s = %(x)s * %(y)s;")
   div = BinaryDoubleOp(name='div',
                        fn=lambda x, y: x / y,
                        ccode="%(z)s = %(x)s / %(y)s;")
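The ``perform`` methods of these ops mirror their C code in pure Python. Their behaviour can be sketched without Aesara (the lambda stands in for the ``fn`` attributes above):

```python
def perform(fn, inp, out):
    # Mirrors BinaryDoubleOp.perform: unpack the inputs and store the
    # result in the first cell of the output storage.
    x, y = inp
    (z,) = out
    z[0] = fn(x, y)

storage = [None]
perform(lambda x, y: x * y, (3.0, 4.0), (storage,))
# storage[0] now holds 12.0
```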
.. _extending_aesara_c:
.. _creating_a_c_op:
=====================================
Extending Aesara with a C :class:`Op`
=====================================
This tutorial covers how to extend Aesara with an :class:`Op` that offers a C
implementation. It does not cover :class:`Op`\s that run on a GPU but it does introduce
many elements and concepts which are relevant for GPU :class:`Op`\s. This tutorial is
aimed at individuals who already know how to extend Aesara (see tutorial
:ref:`creating_an_op`) by adding a new :class:`Op` with a Python implementation
and will only cover the additional knowledge required to also produce :class:`Op`\s
with C implementations.
Providing an Aesara :class:`Op` with a C implementation requires interacting with
Python's C-API and Numpy's C-API. Thus, the first step of this tutorial is to
introduce both and highlight their features which are most relevant to the
task of implementing a C :class:`Op`. This tutorial then introduces the most important
methods that the :class:`Op` needs to implement in order to provide a usable C
implementation. Finally, it shows how to combine these elements to write a
simple C :class:`Op` for performing the simple task of multiplying every element in a
vector by a scalar.
Python C-API
an object is still being used by other entities. When the reference count
for an object drops to 0, it means it is not used by anyone any longer and can
be safely deleted.
``PyObject``\s implement reference counting and the Python C-API defines a number
of macros to help manage those reference counts. The definition of these
macros can be found here : `Python C-API Reference Counting
<https://docs.python.org/2/c-api/refcounting.html>`_. Listed below are the
two macros most often used in Aesara C :class:`Op`\s.
.. method:: void Py_XINCREF(PyObject *o)
NumPy C-API
===========
The NumPy library provides a C-API to allow users to create, access and
manipulate NumPy arrays from within their own C routines. NumPy's :class:`ndarray`\s
are used extensively inside Aesara and so extending Aesara with a C :class:`Op` will
require interaction with the NumPy C-API.
This section covers the API's elements that are often required to write code
for an Aesara C :class:`Op`. The full documentation for the API can be found here:
`NumPy C-API <http://docs.scipy.org/doc/numpy/reference/c-api.html>`_.
NumPy data types
----------------
To allow portability between platforms, the NumPy C-API defines its own data
types which should be used whenever you are manipulating a NumPy array's
internal data. The data types most commonly used to implement C :class:`Op`\s are the
following : ``npy_int{8,16,32,64}``, ``npy_uint{8,16,32,64}`` and
``npy_float{32,64}``.
The full list of defined data types can be found here: `NumPy C type names
<http://docs.scipy.org/doc/numpy/reference/c-api.dtype.html#c-type-names>`_.
NumPy :class:`ndarray`\s
------------------------
In the NumPy C-API, NumPy arrays are represented as instances of the
``PyArrayObject`` class, which is a descendant of the ``PyObject`` class. This means
This distance between consecutive elements of an array over a given dimension,
is called the stride of that dimension.
Accessing NumPy :class:`ndarray`\s' data and properties
-------------------------------------------------------
The following macros serve to access various attributes of NumPy :class:`ndarray`\s.
.. method:: void* PyArray_DATA(PyArrayObject* arr)
bitwise or to an ensemble of flags.
The flags that can be used with this macro are:
``NPY_ARRAY_C_CONTIGUOUS``, ``NPY_ARRAY_F_CONTIGUOUS``, ``NPY_ARRAY_OWNDATA``,
``NPY_ARRAY_ALIGNED``, ``NPY_ARRAY_WRITEABLE``, ``NPY_ARRAY_UPDATEIFCOPY``.
Creating NumPy :class:`ndarray`\s
---------------------------------
The following functions allow the creation and copy of NumPy arrays :
.. method:: PyObject* PyArray_EMPTY(int nd, npy_intp* dims, typenum dtype,
int fortran)
Constructs a new :class:`ndarray` with the number of dimensions specified by
``nd``, shape specified by ``dims`` and data type specified by ``dtype``.
If ``fortran`` is equal to 0, the data is organized in a C-contiguous
layout, otherwise it is organized in a F-contiguous layout. The array
.. method:: PyObject* PyArray_ZEROS(int nd, npy_intp* dims, typenum dtype,
int fortran)
Constructs a new :class:`ndarray` with the number of dimensions specified by
``nd``, shape specified by ``dims`` and data type specified by ``dtype``.
If ``fortran`` is equal to 0, the data is organized in a C-contiguous
layout, otherwise it is organized in a F-contiguous layout. Every element
.. method:: PyArrayObject* PyArray_GETCONTIGUOUS(PyObject* op)
Returns a C-contiguous and well-behaved copy of the array ``op``. If ``op`` is
already C-contiguous and well-behaved, this function simply returns a
new reference to ``op``.
Methods the C :class:`Op` needs to define
=========================================
There is a key difference between an :class:`Op` defining a Python implementation for
its computation and defining a C implementation. In the case of a Python
implementation, the :class:`Op` defines a function ``perform()`` which executes the
required Python code to realize the :class:`Op`. In the case of a C implementation,
however, the :class:`Op` does **not** define a function that will execute the C code; it
instead defines functions that will **return** the C code to the caller.
This is because calling C code from Python code comes with a significant
overhead. If every :class:`Op` was responsible for executing its own C code, every
time an Aesara function was called, this overhead would occur as many times
as the number of :class:`Op`\s with C implementations in the function's computational
graph.
To maximize performance, Aesara instead requires the C :class:`Op`\s to simply return
the code needed for their execution and takes upon itself the task of
organizing, linking and compiling the code from the various :class:`Op`\s. Through this,
Aesara is able to minimize the number of times C code is called from Python
code.
The following is a very simple example to illustrate how it's possible to
obtain performance gains with this process. Suppose you need to execute,
from Python code, 10 different :class:`Op`\s, each one having a C implementation. If
each :class:`Op` was responsible for executing its own C code, the overhead of
calling C code from Python code would occur 10 times. Consider now the case
where the :class:`Op`\s instead return the C code for their execution. You could get
the C code from each :class:`Op` and then define your own C module that would call
the C code from each :class:`Op` in succession. In this case, the overhead would only
occur once; when calling your custom module itself.
Moreover, the fact that Aesara itself takes care of compiling the C code,
instead of the individual :class:`Op`\s, allows Aesara to easily cache the compiled C
code. This allows for faster compilation times.
The following are some of the various methods of the class :class:`COp` that are
related to the C implementation:
* The methods :meth:`CLinkerObject.c_libraries` and :meth:`CLinkerObject.c_lib_dirs` to allow
your :class:`Op` to use external libraries.
* The method :meth:`CLinkerOp.c_code_cleanup` to specify how the :class:`Op` should
clean up what it has allocated during its execution.
* The methods :meth:`COp.c_init_code` and :meth:`CLinkerOp.c_init_code_apply`
to specify code that should be executed once when the module is
initialized, before anything else is executed.
* The methods :meth:`CLinkerObject.c_compile_args` and
:meth:`CLinkerObject.c_no_compile_args` to specify requirements regarding how
the :class:`Op`'s C code should be compiled.
This section describes the methods :meth:`CLinkerOp.c_code`,
:meth:`CLinkerObject.c_support_code`, :meth:`Op.c_support_code_apply` and
commonly used.
.. method:: c_code_cache_version()
Returns a tuple of integers representing the version of the C code in this
:class:`Op`, e.g., ``(1, 4, 0)`` for version 1.4.0.
This tuple is used by Aesara to cache the compiled C code for this `Op`. As
such, the return value **MUST BE CHANGED** every time the C code is altered
this function should return a tuple of integers as previously
described.
Important restrictions when implementing a :class:`COp`
=======================================================
There are some important restrictions to remember when implementing a `COp`.
Unless your `COp` correctly defines a ``view_map`` attribute, the ``perform`` and ``c_code`` must not
definitely not make a change that would have an impact on ``__eq__``,
or ``c_code``.
Simple :class:`COp` example
===========================
In this section, we put together the concepts that were covered in this
tutorial to generate an :class:`Op` which multiplies every element in a vector
by a scalar and returns the resulting vector. This is intended to be a simple
example so the methods ``c_support_code`` and ``c_support_code_apply`` are
not used because they are not required.
In the C code below notice how the reference count on the output variable is
managed. Also take note of how the new variables required for the :class:`Op`'s
computation are declared in a new scope to avoid cross-initialization errors.
Also, in the C code, it is very important to properly validate the inputs
and outputs storage. Aesara guarantees that the inputs exist and have the
right number of dimensions but it does not guarantee their exact shape. For
instance, if an :class:`Op` computes the sum of two vectors, it needs to validate that
its two inputs have the same shape. In our case, we do not need to validate
the exact shapes of the inputs because we do not require that they match
in any way.
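As an illustration, a hypothetical ``c_code`` for an op summing two vectors could emit validation code like the following before computing (this helper is a sketch, not part of the tutorial's op):

```python
def sum_vectors_c_code(node, name, inp, out, sub):
    # Hypothetical: validate that both inputs are vectors of equal
    # length, failing through sub['fail'] otherwise.
    x, y = inp
    (z,) = out
    fail = sub["fail"]
    return """
    if (PyArray_NDIM(%(x)s) != 1 || PyArray_NDIM(%(y)s) != 1) {
        PyErr_SetString(PyExc_ValueError, "inputs must be vectors");
        %(fail)s
    }
    if (PyArray_DIM(%(x)s, 0) != PyArray_DIM(%(y)s, 0)) {
        PyErr_SetString(PyExc_ValueError, "input shapes must match");
        %(fail)s
    }
    """ % locals()
```

The ``PyArray_NDIM`` and ``PyArray_DIM`` macros are the NumPy C-API accessors described earlier.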
The ``c_code`` method accepts variable names as arguments (``name``, ``inp``,
output. In case of error, the ``%(fail)s`` statement cleans up and returns
properly.
More complex C :class:`Op` example
==================================
This section introduces a new example, slightly more complex than the previous
one, with an :class:`Op` to perform an element-wise multiplication between the elements
of two vectors. This new example differs from the previous one in its use
of the methods ``c_support_code`` and ``c_support_code_apply`` (it does
not `need` to use them but it does so to explain their use) and its capacity
to support inputs of different dtypes.
Recall the method ``c_support_code`` is meant to produce code that will
be used for every apply of the :class:`Op`. This means that the C code in this
method must be valid in every setting your :class:`Op` supports. If the :class:`Op` is meant
to support inputs of various dtypes, the C code in this method should be
generic enough to work with every supported dtype. If the :class:`Op` operates on
inputs that can be vectors or matrices, the C code in this method should
be able to accommodate both kinds of inputs.
In our example, the method ``c_support_code`` is used to declare a C
function to validate that two vectors have the same shape. Because our
:class:`Op` only supports vectors as inputs, this function is allowed to rely
on its inputs being vectors. However, our :class:`Op` should support multiple
dtypes so this function cannot rely on a specific dtype in its inputs.
The method ``c_support_code_apply``, on the other hand, is allowed
to depend on the inputs to the :class:`Op` because it is apply-specific. Therefore, we
use it to define a function to perform the multiplication between two vectors.
Variables or functions defined in the method ``c_support_code_apply`` will
be included at the global scope for every apply of the :class:`Op`. Because of this,
the names of those variables and functions should include the name of the :class:`Op`,
like in the example. Otherwise, using the :class:`Op` twice in the same graph will give
rise to conflicts as some elements will be declared more than once.
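A sketch of such apply-specific naming (the function body and names are invented):

```python
def c_support_code_apply(node, name):
    # Embedding the apply-specific `name` keeps the C symbols of two
    # applies of the same op from clashing in one compiled module.
    return """
    static double mul_elem_%(name)s(double x, double y) {
        return x * y;
    }
    """ % {"name": name}
```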
The last interesting difference occurs in the ``c_code()`` method. Because the
Alternate way of defining C :class:`Op`\s
=========================================
The two previous examples have covered the standard way of implementing C :class:`Op`\s
in Aesara by inheriting from the class :class:`Op`. This process is mostly
simple but it still involves defining many methods as well as mixing, in the
same file, both Python and C code which tends to make the result less
readable.
To help with this, Aesara defines a class, ``ExternalCOp``, from which new C :class:`Op`\s
can inherit. The class ``ExternalCOp`` aims to simplify the process of implementing
C :class:`Op`\s by doing the following:
* It allows you to define the C implementation of your :class:`Op` in a distinct
C code file. This makes it easier to keep your Python and C code
readable and well indented.
provided external C implementation.
To illustrate how much simpler the class ``ExternalCOp`` makes the process of defining
a new :class:`Op` with a C implementation, let's revisit the second example of this
tutorial, the ``VectorTimesVector`` :class:`Op`. In that example, we implemented an :class:`Op`
to perform the task of element-wise vector-vector multiplication. The two
following blocks of code illustrate what the :class:`Op` would look like if it were
implemented using the ``ExternalCOp`` class.
The new :class:`Op` is defined inside a Python file with the following code:
.. testcode::
return Apply(self, [x, y], [output_var])
And the following is the C implementation of the :class:`Op`, defined in an external
C file named ``vectorTimesVector.c``:
.. code-block:: c
* ``DTYPE_INPUT_{i}``: NumPy dtype of the data in the array.
This is the variable type corresponding to the NumPy dtype, not the
string representation of the NumPy dtype. For instance, if the :class:`Op`'s
first input is a float32 :class:`ndarray`, then the macro ``DTYPE_INPUT_0``
corresponds to ``npy_float32`` and can directly be used to declare a
new variable of the same dtype as the data in the array:
macros and also because it calls ``vector_elemwise_mult`` which is
an apply-specific function.
Using GDB to debug :class:`COp`'s C code
========================================
When debugging C code, it can be useful to use GDB for code compiled
by Aesara.
Find the source for the Aesara :class:`Op` you’d like to be supported in JAX,
identify the function signature and return values. These can be determined by
looking at the :meth:`Op.make_node` implementation. In general, one needs to be familiar
with Aesara :class:`Op`\s in order to provide a conversion implementation, so first read
:ref:`creating_an_op` if you are not familiar.
For example, the :class:`Eye`\ :class:`Op` currently has an :meth:`Op.make_node` as follows:
.. _creating_an_op:
Creating a new :class:`Op`: Python implementation
=================================================
has no bugs, and potentially profits from optimizations that have already been
implemented.
However, if you cannot implement an :class:`Op` in terms of an existing :class:`Op`, you have to
write a new one.

As an illustration, this tutorial will demonstrate how a simple Python-based
:class:`Op` that performs operations on ``np.float64``\s is written.
.. note::
an :class:`Op` that returns a view or modifies the values in its inputs. Thus, all
:class:`Op`\s created with the instructions described here MUST return newly
allocated memory or reuse the memory provided in the parameter
``output_storage`` of the :meth:`Op.perform` method. See
:ref:`views_and_inplace` for an explanation on how to do this.
If your :class:`Op` returns a view or changes the value of its inputs
without doing as prescribed in that page, Aesara will run, but will
return correct results for some graphs and wrong results for others.
It is recommended that you run your tests in :class:`DebugMode`, since it
can help verify whether or not your :class:`Op` behaves correctly in this
regard.
Aesara Graphs refresher
-----------------------
Aesara represents symbolic mathematical computations as graphs. Those graphs
are bi-partite graphs (graphs with two types of nodes): they are composed of
interconnected :ref:`apply` and :ref:`variable` nodes.
:class:`Variable` nodes represent data in the graph, either inputs, outputs or
intermediary values. As such, inputs and outputs of a graph are lists of Aesara
An :class:`Op`'s implementation can be defined in other ways, as well.
For instance, it is possible to define a C-implementation via :meth:`COp.c_code`.
Please refer to the tutorial :ref:`creating_a_c_op` for a description of
:meth:`COp.c_code` and other related ``c_**`` methods. Note that an
:class:`Op` can provide both Python and C implementations.
A local optimization, on the other hand, is defined as a function on a
nothing is to be done) or a list of new :class:`Variable`\s that we would like to
replace the node's outputs with. A :ref:`navigator` is a special kind
of global optimization which navigates the computation graph in some
fashion (e.g. in topological order, reverse-topological order, random
order, etc.) and applies one or more local optimizations at each step.
Optimizations which are holistic, meaning that they must take into
Global optimization
-------------------
A global optimization (or optimizer) is an object which defines the following
methods:
.. class:: GlobalOptimizer
.. method:: apply(fgraph)
This method takes a :class:`FunctionGraph` object which contains the computation graph
and does modifications in line with what the optimization is meant
to do. This is one of the main methods of the optimizer.
.. method:: add_requirements(fgraph)
This method takes a :class:`FunctionGraph` object and adds :ref:`features
<libdoc_graph_fgraphfeature>` to it. These features are "plugins" that are needed
for the :meth:`GlobalOptimizer.apply` method to do its job properly.
.. method:: optimize(fgraph)
This is the interface function called by Aesara. It calls
:meth:`GlobalOptimizer.apply` by default.
Local optimization
------------------

A local optimization is an object which defines the following methods:
the list returned.
A simplification rule
=====================
For starters, let's define the following simplification:
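Stated as a rewrite rule (this restates the transformation implemented by the optimizer below):

.. math::

   \frac{x y}{y} \rightarrow x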
class Simplify(GlobalOptimizer):
def add_requirements(self, fgraph):
fgraph.attach_feature(ReplaceValidate())
def apply(self, fgraph):
for node in fgraph.toposort():
if node.op == true_div:
simplify = Simplify()
Here's how it works: first, in :meth:`add_requirements`, we add the
:class:`ReplaceValidate` :class:`Feature` located in
:ref:`libdoc_graph_features`. This feature adds the :meth:`replace_validate`
method to ``fgraph``, which is an enhanced version of :meth:`FunctionGraph.replace` that
does additional checks to ensure that we are not messing up the
computation graph.

In a nutshell, :class:`ReplaceValidate` grants access to :meth:`fgraph.replace_validate`,
and :meth:`fgraph.replace_validate` allows us to replace a :class:`Variable` with
another while respecting certain validation constraints. As an
exercise, try to rewrite :class:`Simplify` using :class:`NodeFinder`. (Hint: you
want to use the method it publishes instead of the call to toposort.)
Then, in :meth:`GlobalOptimizer.apply`, we do the actual job of simplification. We start by
iterating through the graph in topological order. For each node
encountered, we check if it's a ``div`` node. If not, we have nothing
to do here. If so, we put in ``x``, ``y`` and ``z`` the numerator,
so we check for that. If the numerator is a multiplication, we put the
two operands in ``a`` and ``b``, so
we can now say that ``z == (a*b)/y``. If ``y==a`` then ``z==b`` and if
``y==b`` then ``z==a``. When either case applies, we can replace
``z`` by either ``a`` or ``b`` using :meth:`FunctionGraph.replace_validate`; otherwise, we do
nothing.
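The core of this transformation can be sketched outside of Aesara using plain Python tuples as stand-ins for graph nodes. This is a toy model of the rewrite rule only, not Aesara code:

```python
# Toy sketch of the (a*b)/y simplification on nested-tuple expressions.
# Expressions look like ("div", num, den) or ("mul", a, b); leaves are strings.
def simplify(expr):
    if isinstance(expr, tuple) and expr[0] == "div":
        num, den = expr[1], expr[2]
        if isinstance(num, tuple) and num[0] == "mul":
            a, b = num[1], num[2]
            if den == a:   # (a*b)/a -> b
                return b
            if den == b:   # (a*b)/b -> a
                return a
    return expr  # nothing to do

print(simplify(("div", ("mul", "x", "y"), "y")))  # -> x
print(simplify(("div", ("mul", "x", "y"), "x")))  # -> y
print(simplify(("div", "x", "y")))                # unchanged
```

The real optimizer does the same pointer-following on :class:`Apply` and :class:`Variable` nodes, with the added safety of ``replace_validate``.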
Now, we test the optimization:
>>> from aesara.scalar import float64, add, mul, true_div
>>> x = float64('x')
>>> y = float64('y')
>>> z = float64('z')
>>> a = add(z, mul(true_div(mul(y, x), y), true_div(z, x)))
>>> e = aesara.graph.fg.FunctionGraph([x, y, z], [a])
>>> e
FunctionGraph(add(z, mul(true_div(mul(y, x), y), true_div(z, x))))
>>> simplify.optimize(e)
>>> e
FunctionGraph(add(z, mul(x, true_div(z, x))))
You can check what happens if you put many
instances of :math:`\frac{xy}{y}` in the graph. Note that it sometimes
won't work for reasons that have nothing to do with the quality of the
optimization you wrote. For example, consider the following:
>>> y = float64('y')
>>> z = float64('z')
>>> a = true_div(mul(add(y, z), x), add(y, z))
>>> e = aesara.graph.fg.FunctionGraph([x, y, z], [a])
>>> e
FunctionGraph(true_div(mul(add(y, z), x), add(y, z)))
>>> simplify.optimize(e)
>>> e
FunctionGraph(true_div(mul(add(y, z), x), add(y, z)))
Nothing happened here. The reason is that ``add(y, z) != add(y,
z)``: the two expressions build two distinct nodes in the graph, for
efficiency reasons. To fix this problem, we first merge the duplicated
parts of the computation, using the :class:`MergeOptimizer`:
>>> MergeOptimizer().optimize(e) # doctest: +ELLIPSIS
(0, ..., None, None, {}, 1, 0)
>>> e
FunctionGraph(true_div(mul(*1 -> add(y, z), x), *1))
>>> simplify.optimize(e)
>>> e
FunctionGraph(x)
Once the merge is done, both occurrences of ``add(y, z)`` are
collapsed into a single one, which is used as an input in two
.. note::
:class:`FunctionGraph` is an Aesara structure intended for the optimization
phase. It is used internally by :func:`aesara.function` and is rarely
exposed to the end user.
Local Optimization
------------------

The local version of the above code would be the following:
.. testcode::
from aesara.graph.opt import LocalOptimizer

class LocalSimplify(LocalOptimizer):
def transform(self, fgraph, node):
if node.op == true_div:
x, y = node.inputs
elif y == b:
return [a]
return False
def tracks(self):
# This tells certain navigators to only apply this `LocalOptimizer`
# on these kinds of `Op`s
return [true_div]
local_simplify = LocalSimplify()
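To see what ``tracks`` buys a navigator, here is a toy dispatcher in plain Python. It is illustrative only; the real :class:`NavigatorOptimizer` machinery is more involved, and names like ``TinyLocalOpt`` and ``navigate`` are invented for this sketch:

```python
# Toy model: a navigator that walks nodes and applies a local rewriter
# only to nodes whose operation name appears in the rewriter's `tracks`.
class TinyLocalOpt:
    def tracks(self):
        return ["div"]          # only interested in division nodes

    def transform(self, node):
        op, num, den = node
        if isinstance(num, tuple) and num[0] == "mul" and den in num[1:]:
            a, b = num[1], num[2]
            return a if den == b else b
        return False            # no change

def navigate(nodes, local_opt):
    tracked = set(local_opt.tracks())
    out = []
    for node in nodes:
        if node[0] in tracked:
            replacement = local_opt.transform(node)
            out.append(replacement if replacement is not False else node)
        else:
            out.append(node)    # untracked nodes are skipped entirely
    return out

nodes = [("add", "z", "w"), ("div", ("mul", "x", "y"), "y")]
print(navigate(nodes, TinyLocalOpt()))  # the div node simplifies to 'x'
```

Declaring ``tracks`` lets a navigator skip nodes that the optimizer cannot possibly rewrite, which is the point of returning ``[true_div]`` above.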
In this case, the transformation is defined in the
:meth:`LocalOptimizer.transform` method, which is given an explicit
:class:`Apply` node on which to work. The entire graph, as a ``fgraph``, is
also provided, in case global information is needed.

If no changes are to be made, ``False`` must be returned; otherwise, a list of replacements for the node's
outputs is returned. This list must have the same length as
:attr:`node.outputs`. If one of :attr:`node.outputs` doesn't have clients
(e.g. available via ``fgraph.clients``), then it is not used elsewhere in the graph and
you can put ``None`` in the returned list to remove it.
In order to apply the local optimizer we can use it in conjunction
with a :class:`NavigatorOptimizer`. Basically, a :class:`NavigatorOptimizer` is
a global optimizer that loops through all nodes in the graph (or a well-defined
subset of them) and applies one or several local optimizers.
>>> x = float64('x')
>>> y = float64('y')
>>> z = float64('z')
>>> a = add(z, mul(true_div(mul(y, x), y), true_div(z, x)))
>>> e = aesara.graph.fg.FunctionGraph([x, y, z], [a])
>>> e
FunctionGraph(add(z, mul(true_div(mul(y, x), y), true_div(z, x))))
>>> simplify = aesara.graph.opt.TopoOptimizer(local_simplify)
>>> simplify.optimize(e)
(<aesara.graph.opt.TopoOptimizer object at 0x...>, 1, 5, 3, ..., ..., ...)
>>> e
FunctionGraph(add(z, mul(x, true_div(z, x))))
:class:`OpSub`, :class:`OpRemove`, :class:`PatternSub`
++++++++++++++++++++++++++++++++++++++++++++++++++++++
Aesara defines some shortcuts to make :class:`LocalOptimizer`\s:
.. function:: OpSub(op1, op2)
Replaces all uses of ``op1`` by ``op2``. In other
words, it replaces the outputs of all :class:`Apply` nodes using ``op1`` with the
outputs of :class:`Apply` nodes using ``op2`` applied to the same inputs.
.. function:: OpRemove(op)
Removes all uses of ``op`` in the following way:
if ``y = op(x)`` then ``y`` is replaced by ``x``. ``op`` must have as many
outputs as it has inputs. The first output becomes the first input,
the second output becomes the second input, and so on.
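For intuition, the input-for-output substitution that :class:`OpRemove` performs can be mimicked on a toy expression tree. This is plain Python over nested tuples, not the Aesara implementation:

```python
# Toy sketch: strip every node named `op` from a nested-tuple expression,
# replacing each `(op, x)` node by its input `x`, recursively.
def remove_op(expr, op):
    if isinstance(expr, tuple):
        head, *args = expr
        args = [remove_op(a, op) for a in args]
        if head == op and len(args) == 1:
            return args[0]          # y = op(x)  becomes  x
        return (head, *args)
    return expr

expr = ("add", ("identity", "x"), ("identity", ("identity", "y")))
print(remove_op(expr, "identity"))  # -> ('add', 'x', 'y')
```

This is why the removed :class:`Op` must map outputs to inputs one-for-one: each output is spliced out by wiring its consumers directly to the corresponding input.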
Replaces all occurrences of the first pattern by the second pattern.
See :class:`PatternSub`.
.. code::
from aesara.scalar import identity
.. testcode::
from aesara.graph.opt import OpSub, OpRemove, PatternSub
# Replacing `add` by `mul` (this is not recommended for primarily
# mathematical reasons):
add_to_mul = OpSub(add, mul)
# Removing `identity`
remove_identity = OpRemove(identity)
# The "simplify" operation we've been defining in the past few
# sections. Note that we need two patterns to account for the
# permutations of the arguments to `mul`.
local_simplify_1 = PatternSub((true_div, (mul, 'x', 'y'), 'y'), 'x')
local_simplify_2 = PatternSub((true_div, (mul, 'x', 'y'), 'x'), 'y')
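The pattern-based style can be sketched in a few lines of plain Python: match an input pattern against a nested-tuple expression, bind the pattern variables, then emit the output. This toy matcher only supports a bare variable (or constant) as the output pattern, and it is not the Aesara implementation:

```python
# Toy sketch of PatternSub-style rewriting on nested tuples. Pattern
# variables are strings; matching binds them consistently.
def match(pattern, expr, bindings):
    if isinstance(pattern, tuple):
        if not (isinstance(expr, tuple) and len(expr) == len(pattern)
                and expr[0] == pattern[0]):
            return None
        for p, e in zip(pattern[1:], expr[1:]):
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    if pattern in bindings:                 # variable seen before: must agree
        return bindings if bindings[pattern] == expr else None
    return {**bindings, pattern: expr}      # bind a fresh variable

def pattern_sub(in_pattern, out_pattern, expr):
    bindings = match(in_pattern, expr, {})
    if bindings is None:
        return expr                          # no match: leave unchanged
    return bindings.get(out_pattern, out_pattern)

rule_in = ("true_div", ("mul", "x", "y"), "y")
print(pattern_sub(rule_in, "x", ("true_div", ("mul", "a", "b"), "b")))  # -> a
```

Note how the repeated variable ``'y'`` in the input pattern forces the divisor to equal one of the multiplication's operands, which is exactly what the two ``PatternSub`` rules above encode.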
.. note::
:class:`OpSub`, :class:`OpRemove` and :class:`PatternSub` produce local optimizers, which
means that everything we said previously about local optimizers
applies (e.g. they need to be wrapped in a :class:`NavigatorOptimizer`, etc.).
When an optimization can be naturally expressed using :class:`OpSub`, :class:`OpRemove`
or :class:`PatternSub`, it is highly recommended to use them.
.. todo::
More about using :class:`PatternSub` (syntax for the patterns, how to use
constraints, etc. - there's some decent doc at :class:`PatternSub` for those
interested)
.. _optdb:
Graph Structures
================
Debugging or profiling code written in Aesara is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Aesara.
Aesara works by modeling mathematical operations and their outputs using
symbolic placeholders, or :term:`variables <variable>`, which inherit from the class
:class:`Variable`. When writing expressions in Aesara one uses operations like
``+``, ``-``, ``**``, ``sum()``, ``tanh()``. These are represented
internally as :term:`Op`\s. An :class:`Op` represents a computation that is
performed on a set of symbolic inputs and produces a set of symbolic outputs.
These symbolic input and output :class:`Variable`\s carry information about
their types, like their data type (e.g. float, int), the number of dimensions,
etc.
Aesara graphs are composed of interconnected :term:`Apply`, :term:`Variable` and
:class:`Op` nodes. An :class:`Apply` node represents the application of an
:class:`Op` to specific :class:`Variable`\s. It is important to draw the
difference between the definition of a computation represented by an :class:`Op`
and its application to specific inputs, which is represented by the
:class:`Apply` node.
The following illustrates these elements:
**Code**
Arrows represent references to the Python objects pointed at.
The blue box is an :class:`Apply` node. Red boxes are :class:`Variable`\s. Green
circles are :class:`Op`\s. Purple boxes are :class:`Type`\s.
.. TODO
Clarify the 'acyclic' graph and the 'back' pointers or references that
'don't count'.
When we create :class:`Variable`\s and then apply
:class:`Op`\s to them to make more :class:`Variable`\s, we build a
bi-partite, directed, acyclic graph. :class:`Variable`\s point to the :class:`Apply` nodes
representing the function application producing them via their
:attr:`Variable.owner` field. These :class:`Apply` nodes point in turn to their input and
output :class:`Variable`\s via their :attr:`Apply.inputs` and :attr:`Apply.outputs` fields.
The :attr:`Variable.owner` field of both ``x`` and ``y`` points to ``None`` because
they are not the result of another computation. If one of them were the
result of another computation, its :attr:`Variable.owner` field would point to another
blue box like ``z`` does, and so on.
Note that the :class:`Apply` instance's outputs point to
``z``, and ``z.owner`` points back to the :class:`Apply` instance.
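The pointer structure described above can be modeled in a few lines of plain Python. This is a minimal sketch of the idea, not Aesara's actual classes:

```python
# Minimal sketch of the bipartite graph structure: Variables point to the
# Apply node that produced them via `owner`; Apply nodes point to their
# input and output Variables via `inputs` and `outputs`.
class Variable:
    def __init__(self, name):
        self.name = name
        self.owner = None       # None: the variable is a graph input

class Apply:
    def __init__(self, op, inputs, outputs):
        self.op = op
        self.inputs = inputs
        self.outputs = outputs
        for out in outputs:
            out.owner = self    # outputs point back at this Apply node

x, y = Variable("x"), Variable("y")
z = Variable("z")
node = Apply("add", [x, y], [z])

print(x.owner)            # None: `x` is an input
print(z.owner.op)         # 'add': `z` was produced by the add node
print(z.owner is node)    # True
```

Following ``owner``/``inputs`` pointers backward from an output is exactly how graph traversal works in the next section.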
Traversing the graph
====================
Take for example the following code:
>>> y = x * 2.
If you enter ``type(y.owner)`` you get ``<class 'aesara.graph.basic.Apply'>``,
which is the :class:`Apply` node that connects the :class:`Op` and the inputs to get this
output. You can now print the name of the :class:`Op` that is applied to get
``y``:
>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
Hence, an element-wise multiplication is used to compute ``y``. This
multiplication is done between the inputs:
>>> len(y.owner.inputs)
>>> y.owner.inputs[1]
InplaceDimShuffle{x,x}.0
Note that the second input is not ``2`` as we would have expected. This is
because ``2`` was first :term:`broadcasted <broadcasting>` to a matrix of the
same shape as ``x``. This is done by using the :class:`DimShuffle`\ :class:`Op`:
>>> type(y.owner.inputs[1])
<class 'aesara.tensor.var.TensorVariable'>
>>> y.owner.inputs[1].owner.inputs
[TensorConstant{2.0}]
All of the above can be succinctly summarized with the :func:`aesara.dprint`
function:
>>> aesara.dprint(y)
Elemwise{mul,no_inplace} [id A] ''
|x [id B]
|InplaceDimShuffle{x,x} [id C] ''
|TensorConstant{2.0} [id D]
Starting from this graph structure it is easier to understand how
*automatic differentiation* proceeds and how the symbolic relations
can be rewritten for performance or stability.
Graph Structures
================
The following section outlines each type of structure that may be used
in an Aesara-built computation graph.
.. index::
.. _apply:
:class:`Apply`
--------------
An :class:`Apply` node is a type of internal node used to represent a
:term:`computation graph <graph>` in Aesara. Unlike
:class:`Variable`\s, :class:`Apply` nodes are usually not
manipulated directly by the end user. They may be accessed via
the :attr:`Variable.owner` field.
An :class:`Apply` node is typically an instance of the :class:`Apply`
class. It represents the application
of an :class:`Op` on one or more inputs, where each input is a
:class:`Variable`. By convention, each :class:`Op` is responsible for
knowing how to build an :class:`Apply` node from a list of
inputs. Therefore, an :class:`Apply` node may be obtained from an :class:`Op`
and a list of inputs by calling ``Op.make_node(*inputs)``.
Comparing with the Python language, an :class:`Apply` node is
Aesara's version of a function call whereas an :class:`Op` is
Aesara's version of a function definition.
An :class:`Apply` instance has three important fields:
**op**
An :class:`Op` that determines the function/transformation being
applied here.
**inputs**
A list of :class:`Variable`\s that represent the arguments of
the function.
**outputs**
A list of :class:`Variable`\s that represent the return values
of the function.
An :class:`Apply` instance can be created by calling ``graph.basic.Apply(op, inputs, outputs)``.
.. _op:
:class:`Op`
-----------
An :class:`Op` in Aesara defines a certain computation on some types of
inputs, producing some types of outputs. It is equivalent to a
function definition in most programming languages. From a list of
input :ref:`Variables <variable>` and an :class:`Op`, you can build an :ref:`apply`
node representing the application of the :class:`Op` to the inputs.
It is important to understand the distinction between an :class:`Op` (the
definition of a function) and an :class:`Apply` node (the application of a
function). If you were to interpret the Python language using Aesara's
structures, code like ``def f(x): ...`` would produce an :class:`Op` for
``f`` whereas code like ``a = f(x)`` or ``g(f(4), 5)`` would produce an
:class:`Apply` node involving the ``f`` :class:`Op`.
.. index::
.. _type:
:class:`Type`
-------------
A :class:`Type` in Aesara represents a set of constraints on potential
data objects. These constraints allow Aesara to tailor C code to handle
them and to statically optimize the computation graph. For instance,
the :ref:`irow <libdoc_tensor_creation>` type in the :mod:`aesara.tensor` package
gives the following constraints on the data the :class:`Variable`\s of type ``irow``
may contain:
#. Must be an instance of :class:`numpy.ndarray`: ``isinstance(x, numpy.ndarray)``
#. Must be an array of 32-bit integers: ``str(x.dtype) == 'int32'``
#. Must have a shape of 1xN: ``len(x.shape) == 2 and x.shape[0] == 1``
Knowing these restrictions, Aesara may generate C code for addition, etc.
that declares the right data types and that contains the right number
of loops over the dimensions.
Note that an Aesara :class:`Type` is not equivalent to a Python type or
class. Indeed, in Aesara, :ref:`irow <libdoc_tensor_creation>` and :ref:`dmatrix
<libdoc_tensor_creation>` both use :class:`numpy.ndarray` as the underlying type
for doing computations and storing data, yet they are different Aesara
:class:`Type`\s. Indeed, the constraints set by ``dmatrix`` are:
#. Must be an instance of :class:`numpy.ndarray`: ``isinstance(x, numpy.ndarray)``
#. Must be an array of 64-bit floating point numbers: ``str(x.dtype) == 'float64'``
#. Must have a shape of ``MxN``, no restriction on ``M`` or ``N``: ``len(x.shape) == 2``
These restrictions are different from those of ``irow`` which are listed above.
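As a quick sanity check, the two constraint sets can be restated as small predicate functions over a dtype string and a shape tuple. The helper names below are invented for illustration; they are not part of Aesara:

```python
# Hypothetical helpers that restate the irow and dmatrix constraints
# on a (dtype, shape) description of an array.
def satisfies_irow(dtype, shape):
    # int32 data with shape 1xN
    return dtype == "int32" and len(shape) == 2 and shape[0] == 1

def satisfies_dmatrix(dtype, shape):
    # float64 data with any MxN shape
    return dtype == "float64" and len(shape) == 2

print(satisfies_irow("int32", (1, 5)))       # True
print(satisfies_irow("int32", (3, 5)))       # False: first dimension must be 1
print(satisfies_dmatrix("float64", (3, 5)))  # True
print(satisfies_dmatrix("int32", (3, 5)))    # False: wrong dtype
```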
There are cases in which a :class:`Type` can fully correspond to a Python type,
such as the ``double``\ :class:`Type`, which corresponds to
Python's ``float``.
.. index::
single: Variable
.. _variable:
:class:`Variable`
-----------------
A :class:`Variable` is the main data structure you work with when using
Aesara. The symbolic inputs that you operate on are :class:`Variable`\s and what
you get from applying various :class:`Op`\s to these inputs are also
:class:`Variable`\s. For example, if one writes
>>> import aesara
>>> x = aesara.tensor.ivector()
>>> y = -x
``x`` and ``y`` are both :class:`Variable`\s. The :class:`Type` of both ``x`` and
``y`` is `aesara.tensor.ivector`.
Unlike ``x``, ``y`` is a :class:`Variable` produced by a computation (in this
case, it is the negation of ``x``). ``y`` is the :class:`Variable` corresponding to
the output of the computation, while ``x`` is the :class:`Variable`
corresponding to its input. The computation itself is represented by
another type of node, an :class:`Apply` node, and may be accessed
through ``y.owner``.
More specifically, a :class:`Variable` is a basic structure in Aesara that
represents a datum at a certain point in computation. It is typically
an instance of the class :class:`Variable` or
one of its subclasses.
A :class:`Variable` ``r`` contains four important fields:
**type**
a :class:`Type` defining the kind of value this :class:`Variable` can hold in
computation.
**owner**
this is either ``None`` or an :class:`Apply` node of which the :class:`Variable` is
an output.
**index**
the integer such that ``owner.outputs[index] is r`` (ignored if
:attr:`Variable.owner` is ``None``)
**name**
a string to use in pretty-printing and debugging.
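Schematically, these four fields and the ``owner``/``index`` invariant can be sketched in plain Python; the toy classes below are purely illustrative and are not Aesara's actual implementation:

```python
class ToyVariable:
    def __init__(self, type, name=None):
        self.type = type    # the kind of value this variable can hold
        self.owner = None   # the Apply node that produced it, or None
        self.index = None   # position in owner.outputs (ignored if owner is None)
        self.name = name    # used for pretty-printing and debugging

class ToyApply:
    def __init__(self, op, inputs, outputs):
        self.op = op
        self.inputs = inputs
        self.outputs = outputs
        # Link each output back to this node, so that
        # owner.outputs[index] is the output itself.
        for i, out in enumerate(outputs):
            out.owner = self
            out.index = i

# Mirror the ivector/negation example: x is an input (no owner),
# y is produced by an application of "neg".
x = ToyVariable("ivector", name="x")
y = ToyVariable("ivector", name="y")
node = ToyApply("neg", [x], [y])
```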
:class:`Variable` has an important subclass: :ref:`Constant <constant>`.
.. index::
single: Constant
.. _constant:
:class:`Constant`
^^^^^^^^^^^^^^^^^
A :class:`Constant` is a :class:`Variable` with one extra, immutable field:
:attr:`Constant.data`.
When used in a computation graph as the input of an
:class:`Op`'s :class:`Apply` node, it is assumed that said input
will *always* take the value contained in the :class:`Constant`'s data
field. Furthermore, it is assumed that the :class:`Op` will not under
any circumstances modify the input. This means that a :class:`Constant` is
eligible to participate in numerous optimizations: constant in-lining
in C code, constant folding, etc.
A constant does not need to be specified in a :func:`function
<function.function>`'s list
of inputs. In fact, doing so will raise an exception.
Graph Structures Extension
==========================
When we start the compilation of an Aesara function, we compute some
extra information. This section describes a portion of the information
that is made available.
The graph gets cloned at the start of compilation, so modifications done
during compilation won't affect the user graph.
Each variable receives a new field called ``clients``. It is a list with
references to every place in the graph where this variable is used. If
its length is 0, it means the variable isn't used. Each place where it
is used is described by a tuple of 2 elements. There are two types of
pairs:
- The first element is an :class:`Apply` node.
- The first element is the string "output". It means the
function outputs this variable.
In both types of pairs, the second element of the tuple is an index,
such that: ``fgraph.clients[var][*][0].inputs[index]`` or
``fgraph.outputs[index]`` is that variable.
>>> import aesara
>>> v = aesara.tensor.vector()
>>> f = aesara.function([v], (v+1).sum())
>>> aesara.printing.debugprint(f)
Sum{acc_dtype=float64} [id A] '' 1
|Elemwise{add,no_inplace} [id B] '' 0
|TensorConstant{(1,) of 1.0} [id C]
|<TensorType(float64, vector)> [id D]
>>> # Sorted list of all nodes in the compiled graph.
>>> fgraph = f.maker.fgraph
>>> topo = fgraph.toposort()
>>> fgraph.clients[topo[0].outputs[0]]
[(Sum{acc_dtype=float64}(Elemwise{add,no_inplace}.0), 0)]
>>> fgraph.clients[topo[1].outputs[0]]
[('output', 0)]
>>> # An internal variable
>>> var = topo[0].outputs[0]
>>> client = fgraph.clients[var][0]
>>> client
(Sum{acc_dtype=float64}(Elemwise{add,no_inplace}.0), 0)
>>> type(client[0])
<class 'aesara.graph.basic.Apply'>
>>> assert client[0].inputs[client[1]] is var
>>> # An output of the graph
>>> var = topo[1].outputs[0]
>>> client = fgraph.clients[var][0]
>>> client
('output', 0)
>>> assert fgraph.outputs[client[1]] is var
Automatic Differentiation
=========================
Having the graph structure, computing automatic differentiation is
simple. The only thing :func:`aesara.grad` has to do is to traverse the
graph from the outputs back towards the inputs through all :class:`Apply`
nodes. For each such :class:`Apply` node, its :class:`Op` defines
how to compute the gradient of the node's outputs with respect to its
inputs. Note that if an :class:`Op` does not provide this information,
it is assumed that the gradient is not defined.
Using the `chain rule <http://en.wikipedia.org/wiki/Chain_rule>`_,
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.
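Numerically, this composition is just a matrix product. Here is a small sketch with made-up Jacobians (the values are arbitrary and chosen only to illustrate the shapes involved):

```python
import numpy as np

# Suppose C is a scalar, f has 2 components, and x has 3 components.
dC_df = np.array([[1.0, 2.0]])        # dC/df: a 1x2 row vector
df_dx = np.array([[1.0, 0.0, 1.0],
                  [0.0, 3.0, 1.0]])   # df/dx: a 2x3 Jacobian matrix

# Chain rule: dC/dx = dC/df * df/dx, yielding a 1x3 row vector.
dC_dx = dC_df @ df_dx
```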
A following section of this tutorial will examine the topic of
:ref:`differentiation<tutcomputinggrads>` in greater detail.
Optimizations
=============
When compiling an Aesara graph using :func:`aesara.function`, a graph is
necessarily provided. While this graph structure shows how to compute the
output from the input, it also offers the possibility to improve the way this
computation is carried out. The way optimizations work in Aesara is by
identifying and replacing certain patterns in the graph with other specialized
patterns that produce the same results but are either faster or more
stable. Optimizations can also detect identical subgraphs and ensure that the
same values are not computed twice or reformulate parts of the graph to a GPU
specific version.
For example, one (simple) optimization that Aesara uses is to replace
the pattern :math:`\frac{xy}{y}` by :math:`x`.
See :ref:`graph_rewriting` and :ref:`optimizations` for more information.
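As a toy illustration of the idea (this is not Aesara's rewriting machinery), the :math:`\frac{xy}{y} \rightarrow x` pattern can be applied to a nested-tuple expression tree:

```python
def simplify(expr):
    """Rewrite ("div", ("mul", a, b), d) to a or b when d matches, recursively.

    Expressions are nested tuples whose first element is a binary
    operator name; leaves are plain strings.
    """
    if not isinstance(expr, tuple):
        return expr
    # Simplify sub-expressions first (bottom-up traversal).
    expr = tuple(simplify(e) for e in expr)
    if expr[0] == "div" and isinstance(expr[1], tuple) and expr[1][0] == "mul":
        _, (_, a, b), denom = expr
        if b == denom:   # (a * b) / b  ->  a
            return a
        if a == denom:   # (a * b) / a  ->  b
            return b
    return expr
```

A real rewriter must also prove the replacement safe (e.g. that the denominator is nonzero), which is part of why Aesara's machinery is considerably more involved than this sketch.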
**Example**
Consider the following example of optimization:
>>> import aesara
>>> a = aesara.tensor.vector("a") # declare symbolic variable
Extending Aesara
================
This advanced tutorial is for users who want to extend Aesara with new :class:`Type`\s,
new operations (:class:`Op`\s), and new graph optimizations. This first page of the
tutorial mainly focuses on the Python implementation of an :class:`Op` and then
provides an overview of the most important methods that define an :class:`Op`.
The second page of the tutorial (:ref:`creating_a_c_op`) then provides
information on the C implementation of an :class:`Op`. The rest of the tutorial
goes more in depth on advanced topics related to :class:`Op`\s, such as how to write
efficient code for an :class:`Op` and how to write an optimization to speed up the
execution of an :class:`Op`.
Along the way, this tutorial also introduces many aspects of how Aesara works,
so it is also good for you if you are interested in getting more under the hood
with Aesara itself.
Before tackling this more advanced presentation, it is highly recommended
to read the introductory :ref:`Tutorial<tutorial>`, especially the sections
that introduce the Aesara Graphs, as providing a novel Aesara :class:`Op` requires a
basic understanding of Aesara graphs.
See also the :ref:`dev_start_guide` for related information.
.. toctree::

    graphstructures
    graph_rewriting
    creating_an_op
    creating_a_c_op
    creating_a_numba_jax_op
    pipeline
    aesara_vs_c
    type
    op
    ctype
    inplace
    scan
    other_ops
    cop
    using_params
    tips
    unittest
    extending_faq
=========================================
Making arithmetic :class:`Op`\s on double
=========================================
.. testsetup:: *

   from aesara.graph.type import Type


   class Double(Type):

       def filter(self, x, strict=False, allow_downcast=None):
           if strict:
               if isinstance(x, float):
                   return x
               else:
                   raise TypeError('Expected a float!')
           elif allow_downcast:
               return float(x)
           else:   # Covers both the False and None cases.
               x_float = float(x)
               if x_float == x:
                   return x_float
               else:
                   raise TypeError('The double type cannot accurately represent '
                                   'value %s (of type %s): you must explicitly '
                                   'allow downcasting if you want to do this.'
                                   % (x, type(x)))

       def values_eq_approx(self, x, y, tolerance=1e-4):
           return abs(x - y) / (abs(x) + abs(y)) < tolerance

       def __str__(self):
           return "double"

   double = Double()
Now that we have a ``double`` type, we have yet to use it to perform
computations. We'll start by defining multiplication.
.. _op_contract:
=============
:class:`Op`\s
=============
An :class:`Op` is a :ref:`graph object <graphstructures>` that defines and performs computations in a graph.
It has to define the following methods.
.. function:: make_node(*inputs)
operations <views_and_inplace>` before writing a :meth:`Op.perform`
implementation that does either of these things.
Instead of (or in addition to) :meth:`Op.perform`, you can also provide a
:ref:`C implementation <cop>`. For more details, refer to the
documentation for :class:`Op`.
.. function:: __eq__(other)
Optional methods or attributes
------------------------------
Undefined by default.
If you define this function then it will be used instead of C code
or :meth:`Op.perform` to do the computation while debugging (currently
DebugMode, but others may also use it in the future). It has the
same signature and contract as :meth:`Op.perform`.
This enables :class:`Op`\s that cause trouble with DebugMode with their
normal behaviour to adopt a different one when run under that
derivative :math:`\frac{d f}{d x}` of the latter with respect to the
primitive :class:`Variable` (this has to be computed).
In mathematics, the total derivative of a scalar variable :math:`C` with
respect to a vector of scalar variables :math:`x`, i.e. the gradient, is
customarily represented as the row vector of the partial
derivatives, whereas the total derivative of a vector of scalar
variables :math:`f` with respect to another :math:`x`, is customarily
represented by the matrix of the partial derivatives, i.e. the
Jacobian matrix. In this convenient setting, the chain rule
says that the gradient of the final scalar variable :math:`C` with
respect to the primitive scalar variables in :math:`x` through those in
:math:`f` is simply given by the matrix product:
:math:`\frac{d C}{d x} = \frac{d C}{d f} * \frac{d f}{d x}`.
Here, the chain rule must be implemented in a similar but slightly
more complex setting: Aesara provides in the list
``output_gradients`` one gradient for each of the :class:`Variable`\s returned
by the `Op`. Where :math:`f` is one such particular :class:`Variable`, the
corresponding gradient found in ``output_gradients`` and
representing :math:`\frac{d C}{d f}` is provided with a shape
similar to :math:`f` and thus not necessarily as a row vector of scalars.
Furthermore, for each :class:`Variable` :math:`x` of the :class:`Op`'s list of input variables
``inputs``, the returned gradient representing :math:`\frac{d C}{d
x}` must have a shape similar to that of :class:`Variable` :math:`x`.
1) They must be :class:`Variable` instances.
2) When they are types that have dtypes, they must never have an integer dtype.
The output gradients passed *to* :meth:`Op.grad` will also obey these constraints.
Integers are a tricky subject. Integers are the main reason for
having :class:`DisconnectedType`, :class:`NullType` or zero gradient. When you have an
Examples:
1) :math:`f(x,y)` is a dot product between :math:`x` and :math:`y`. :math:`x` and :math:`y` are integers.
Since the output is also an integer, :math:`f` is a step function.
Its gradient is zero almost everywhere, so :meth:`Op.grad` should return
zeros in the shape of :math:`x` and :math:`y`.
2) :math:`f(x,y)` is a dot product between :math:`x` and :math:`y`. :math:`x`
is floating point and :math:`y` is an integer. In this case the output is
floating point. It doesn't matter that :math:`y` is an integer. We
consider :math:`f` to still be defined at :math:`f(x,y+\epsilon)`. The
gradient is exactly the same as if :math:`y` were floating point.
3) :math:`f(x,y)` is the argmax of :math:`x` along axis :math:`y`. The
gradient with respect to :math:`y` is undefined, because :math:`f(x,y)` is
not defined for floating point :math:`y`. How could you take an argmax
along a fractional axis? The gradient with respect to :math:`x` is 0,
because :math:`f(x+\epsilon, y) = f(x)` almost everywhere.
4) :math:`f(x,y)` is a vector with :math:`y` elements, each taking on
the value :math:`x`. The :meth:`Op.grad` method should return
:class:`DisconnectedType` for :math:`y`, because the elements of :math:`f`
don't depend on :math:`y`. Only the shape of :math:`f` depends on
:math:`y`. You probably also want to implement a ``connection_pattern`` method to encode this.
5) :math:`f(x) = int(x)` converts float :math:`x` into an integer. :math:`g(y) = float(y)`
converts an integer :math:`y` into a float. If the final cost :math:`C = 0.5 *
g(y) = 0.5 g(f(x))`, then the gradient with respect to :math:`y` will be 0.5,
even if :math:`y` is an integer. However, the gradient with respect to :math:`x` will be
0, because the output of :math:`f` is integer-valued.
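The behaviour in examples 1 and 5 can be checked numerically with a simple central-difference sketch (this is plain Python for illustration, not Aesara code):

```python
def finite_diff(fn, x, eps=1e-3):
    # Central-difference estimate of the derivative of fn at x.
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

# Example 5: C(x) = 0.5 * g(f(x)) with f = int and g = float.
# The int() step makes C piecewise constant, so dC/dx is 0 almost everywhere...
C_of_x = lambda x: 0.5 * float(int(x))

# ...while for C(y) = 0.5 * g(y), dC/dy is 0.5 regardless of whether the
# value of y happens to be integral.
C_of_y = lambda y: 0.5 * float(y)
```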
.. function:: connection_pattern(node)
elements of ``inputs[input_idx]`` have an effect on the elements of
``outputs[output_idx]``.
The ``node`` parameter is needed to determine the number of inputs. Some
:class:`Op`\s such as :class:`Subtensor` take a variable number of inputs.
If no ``connection_pattern`` is specified, :func:`aesara.gradient.grad` will
assume that all inputs have some elements connected to some
not part of the Aesara graph:
1) Which of the :class:`Op`'s inputs are truly ancestors of each of the
:class:`Op`'s outputs. Suppose an :class:`Op` has two inputs, :math:`x` and :math:`y`, and
outputs :math:`f(x)` and :math:`g(y)`. :math:`y` is not really an ancestor of :math:`f`, but
it appears to be so in the Aesara graph.
2) Whether the actual elements of each input/output are relevant to a
computation.
For example, the shape :class:`Op` does not read its input's elements,
only its shape metadata. :math:`\frac{d shape(x)}{dx}` should thus raise
a disconnected input exception (if these exceptions are enabled).
As another example, the elements of the :class:`Alloc` :class:`Op`'s outputs
are not affected by the shape arguments to the :class:`Alloc` :class:`Op`.
point, namely: :math:`\frac{\partial f}{\partial x} v`.
``inputs`` are the symbolic variables corresponding to the value of
the input where you want to evaluate the Jacobian, and ``eval_points``
are the symbolic variables corresponding to the value you want to
right multiply the Jacobian with.
Same conventions as for the :meth:`Op.grad` method hold. If your :class:`Op`
is not differentiable, you can return ``None``. Note that in contrast to the
into a single vector :math:`x`. You do the same with the evaluation
points (which are as many as the inputs and of the same shape) and obtain
another vector :math:`v`. For each output, you reshape it into a vector,
compute the Jacobian of that vector with respect to :math:`x` and
multiply it by :math:`v`. As a last step you reshape each of these
vectors you obtained for each output (which have the same shapes as
the outputs) back to their corresponding shapes and return them as the
output of the :meth:`Op.R_op` method.
:ref:`List of Ops with R_op support <R_op_list>`.
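Numerically, the quantity :meth:`Op.R_op` computes, i.e. the Jacobian right-multiplied by the evaluation point :math:`v`, can be approximated by a directional finite difference. The following is an illustrative sketch with a made-up function, not Aesara code:

```python
import numpy as np

def jacobian_vector_product(f, x, v, eps=1e-6):
    # Directional derivative of f at x along v: approximately (df/dx) @ v.
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

# f(x) = (x0 * x1, x0 + x1); its Jacobian at x is [[x1, x0], [1, 1]].
f = lambda x: np.array([x[0] * x[1], x[0] + x[1]])
x = np.array([2.0, 3.0])
v = np.array([1.0, 0.0])

# Picking v as a basis vector extracts one column of the Jacobian.
jv = jacobian_vector_product(f, x, v)
```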
Defining an :class:`Op`: ``mul``
================================
We'll define multiplication as a *binary* operation, even though a
multiplication `Op` could take an arbitrary number of arguments.
First, we'll instantiate a ``mul`` :class:`Op`:
.. testcode:: mul

   from aesara.graph.op import Op

   mul = Op()
**make_node**
This function must take as many arguments as the operation we are
defining is supposed to take as inputs---in this example that would be
two. This function ensures that both inputs have the ``double`` type.
Since multiplying two doubles yields a double, this function makes an
:class:`Apply` node with an output :class:`Variable` of type ``double``.
.. testcode:: mul

   from aesara.graph.basic import Apply

   def make_node(x, y):
       if x.type != double or y.type != double:
           raise TypeError('mul only works on doubles')
       return Apply(mul, [x, y], [double()])

   mul.make_node = make_node
The first two lines make sure that both inputs are :class:`Variable`\s of the
``double`` type that we created in the previous section. We would not
want to multiply two arbitrary types, it would not make much sense
(and we'd be screwed when we implement this in C!)
The last line is the meat of the definition. There we create an :class:`Apply`
node representing the application of the `Op` ``mul`` to inputs ``x`` and
``y``, giving a :class:`Variable` instance of type ``double`` as the output.
.. note::

   Aesara relies on the fact that if you call the :meth:`Op.make_node` method
   of :class:`Apply`'s first argument on the inputs passed as the :class:`Apply`'s
   second argument, the call will not fail and the returned :class:`Apply`
   instance will be equivalent. This is how graphs are copied.
**perform**
This code actually computes the function.
In our example, the data in ``inputs`` will be instances of Python's
built-in type ``float`` because this is the type that ``double.filter()``
will always return, per our own definition. ``output_storage`` will
contain a single storage cell for the multiplication's variable.
.. testcode:: mul

   def perform(node, inputs, output_storage):
       x, y = inputs[0], inputs[1]
       z = output_storage[0]
       z[0] = x * y

   mul.perform = perform
Here, ``z`` is a list of one element. By default, ``z == [None]``.
.. note::

   It is possible that ``z`` does not contain ``None``. If it contains
   anything else, Aesara guarantees that whatever it contains is what
   :meth:`Op.perform` put there the last time it was called with this
   particular storage. Furthermore, Aesara gives you permission to do
   whatever you want with ``z``'s contents, chiefly reusing it or the
   memory allocated for it. More information can be found in the
   :class:`Op` documentation.
.. warning::

   We gave ``z`` the Aesara type ``double`` in :meth:`Op.make_node`, which means
   that a Python ``float`` must be put there. You should not put, say, an
   ``int`` in ``z[0]`` because Aesara assumes :class:`Op`\s handle typing properly.
Trying out our new :class:`Op`
==============================
In the following code, we use our new `Op`:
.. doctest:: mul

   >>> import aesara
   >>> x, y = double('x'), double('y')
   >>> z = mul(x, y)
   >>> f = aesara.function([x, y], z)
   >>> f(5, 6)
   30.0
   >>> f(5.6, 6.7)
   37.519999999999996
Note that there is an implicit call to
``double.filter()`` on each argument, so if we give integers as inputs
they are magically cast to the right type. Now, what if we try this?
.. doctest:: mul

   >>> x = double('x')
   >>> z = mul(x, 2)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/u/breuleuo/hg/aesara/aesara/graph/op.py", line 207, in __call__
     File "<stdin>", line 2, in make_node
   AttributeError: 'int' object has no attribute 'type'
Automatic Constant Wrapping
---------------------------
Well, OK. We'd like our `Op` to be a bit more flexible. This can be done
by modifying :meth:`Op.make_node` to accept Python ``int`` or ``float`` as
``x`` and/or ``y``:
.. testcode:: mul

   from aesara.graph.basic import Apply, Constant

   def make_node(x, y):
       if isinstance(x, (int, float)):
           x = Constant(double, x)
       if isinstance(y, (int, float)):
           y = Constant(double, y)
       if x.type != double or y.type != double:
           raise TypeError('mul only works on doubles')
       return Apply(mul, [x, y], [double()])

   mul.make_node = make_node
Whenever we pass a Python ``int`` or ``float`` instead of a :class:`Variable` as ``x`` or
``y``, :meth:`Op.make_node` will convert it to a :class:`Constant` for us. A :class:`Constant`
is a :class:`Variable` whose value we know statically.
.. doctest:: mul

   >>> import numpy
   >>> x = double('x')
   >>> z = mul(x, 2)
   >>> f = aesara.function([x], z)
   >>> f(10)
   20.0
   >>> numpy.allclose(f(3.4), 6.8)
   True
Now the code works the way we want it to.
.. note::

   Most Aesara :class:`Op`\s follow this convention of up-casting literal
   :meth:`Op.make_node` arguments to :class:`Constant`\s.
   This makes typing expressions more natural. If you do
   not want a constant somewhere in your graph, you have to pass a :class:`Variable`
   (like ``double('x')`` here).
Final version
=============
The above example is pedagogical. When you define other basic arithmetic
operations ``add``, ``sub`` and ``div``, code for :meth:`Op.make_node` can be
shared between these :class:`Op`\s. Here is a revised implementation of these four
arithmetic operators:
.. testcode::

   from aesara.graph.basic import Apply, Constant
   from aesara.graph.op import Op


   class BinaryDoubleOp(Op):

       __props__ = ("name", "fn")

       def __init__(self, name, fn):
           self.name = name
           self.fn = fn

       def make_node(self, x, y):
           if isinstance(x, (int, float)):
               x = Constant(double, x)
           if isinstance(y, (int, float)):
               y = Constant(double, y)
           if x.type != double or y.type != double:
               raise TypeError('%s only works on doubles' % self.name)
           return Apply(self, [x, y], [double()])

       def perform(self, node, inp, out):
           x, y = inp
           z, = out
           z[0] = self.fn(x, y)

       def __str__(self):
           return self.name


   add = BinaryDoubleOp(name='add',
                        fn=lambda x, y: x + y)

   sub = BinaryDoubleOp(name='sub',
                        fn=lambda x, y: x - y)

   mul = BinaryDoubleOp(name='mul',
                        fn=lambda x, y: x * y)

   div = BinaryDoubleOp(name='div',
                        fn=lambda x, y: x / y)
Instead of working directly on an instance of `Op`, we create a subclass of
`Op` that we can parametrize. All the operations we define are binary. They
all work on two inputs with type ``double``. They all return a single
:class:`Variable` of type ``double``. Therefore, :meth:`Op.make_node` does the same thing
for all these operations, except for the `Op` reference ``self`` passed
as first argument to :class:`Apply`. We define :meth:`Op.perform` using the function
``fn`` passed in the constructor.
This design is a flexible way to define basic operations without
duplicating code. The same way a `Type` subclass represents a set of
structurally similar types (see previous section), an `Op` subclass
represents a set of structurally similar operations: operations that
have the same input/output types, operations that only differ in one
small detail, etc. If you see common patterns in several :class:`Op`\s that you
want to define, it can be a good idea to abstract out what you can.
Remember that an `Op` is just an object which satisfies the contract
described above on this page and that you should use all the tools at
your disposal to create these objects as efficiently as possible.
**Exercise**: Make a generic ``DoubleOp``, where the number of
arguments can also be given as a parameter.
Overview of the compilation pipeline
====================================
Once one has an Aesara graph, they can use :func:`aesara.function` to compile a
function that will perform the computations modeled by the graph in Python, C,
Numba, or JAX.
More specifically, :func:`aesara.function` takes a list of input and output
:ref:`Variables <variable>` that define the precise sub-graphs that
correspond to the desired computations.
Definition of the computation graph
-----------------------------------
By creating Aesara :ref:`Variables <variable>` using
``aesara.tensor.lscalar`` or ``aesara.tensor.dmatrix`` or by using
Aesara functions such as ``aesara.tensor.sin`` or
``aesara.tensor.log``, the user builds a computation graph. The
structure of that graph and details about its components can be found
in the :ref:`graphstructures` article.
Compilation of the computation graph
------------------------------------
Once the user has built a computation graph, they can use
:func:`aesara.function` in order to make one or more functions that
operate on real data. :func:`aesara.function` takes a list of input :ref:`Variables
<variable>` as well as a list of output :class:`Variable`\s that define a
precise subgraph corresponding to the function(s) we want to define,
compiles that subgraph, and produces a callable.
Here is an overview of the various steps that are taken during the
compilation performed by :func:`aesara.function`.
Step 1 - Create a :class:`FunctionGraph`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The subgraph specified by the end-user is wrapped in a structure called
:class:`FunctionGraph`. This structure defines several callback hooks that fire when
specific changes are made to a :class:`FunctionGraph`, such as adding and
removing nodes, or modifying links between nodes
(e.g. changing an input of an :ref:`apply` node). See :ref:`libdoc_graph_fgraph`.
:class:`FunctionGraph` provides a method to change the input of an :class:`Apply` node from one
:class:`Variable` to another, and a more high-level method to replace a :class:`Variable`
with another. These are the primary means of performing :ref:`graph rewrites <graph_rewriting>`.
Some relevant :ref:`Features <libdoc_graph_fgraphfeature>` are typically added to the
:class:`FunctionGraph` at this stage, namely :class:`Feature`\s that prevent
rewrites from operating in-place on inputs declared as immutable.
Step 2 - Perform graph optimizations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once the :class:`FunctionGraph` is constructed, an :term:`optimizer` is produced by
the :term:`mode` passed to :func:`function` (a :class:`Mode` has two
important fields: :attr:`linker` and :attr:`optimizer`). That optimizer is
applied to the :class:`FunctionGraph` using its :meth:`Optimizer.optimize` method.
Pre-requisites
==============

The following sections assume that the reader is familiar with:
1. Aesara's :ref:`graph structure <graphstructures>` (`Apply` nodes, `Variable` nodes and `Op`\s)
2. The interface and usage of Aesara's :ref:`scan <lib_scan>` function
Tips
====
..
   Reusing outputs
   ===============

   .. todo:: Write this.
Don't define new :class:`Op`\s unless you have to
=================================================

It is usually not useful to define :class:`Op`\s that can be easily
implemented using other already existing :class:`Op`\s. For example, instead of
writing a "sum_square_difference" :class:`Op`, you should probably just write a
simple function:
.. code::

    from aesara import tensor as aet

    def sum_square_difference(a, b):
        return aet.sum((a - b) ** 2)
For instance, :class:`Elemwise` can be used to make element-wise
operations easily, whereas :class:`DimShuffle` can be
used to make transpose-like transformations. These higher order :class:`Op`\s
are mostly tensor-related, as this is Aesara's specialty.
..
   .. _opchecklist:

   :class:`Op` Checklist
   =====================

   Use this list to make sure you haven't forgotten anything when
   defining a new :class:`Op`. It might not be exhaustive, but it covers a lot of
   common mistakes.

   .. todo:: Write a list.
The ``.op`` of an :term:`Apply`, together with its symbolic inputs
fully determines what kind of computation will be carried out for that
:class:`Apply` at run-time. Mathematical functions such as addition
(i.e. :func:`aesara.tensor.add`) and indexing ``x[i]`` are :class:`Op`\s
in Aesara. Much of the library documentation is devoted to describing
the various :class:`Op`\s that are provided with Aesara, but you can add
more.
See also :term:`Variable`, :term:`Type`, and :term:`Apply`,
or read more about :ref:`graphstructures`.
:orphan:
This page has been moved. Please refer to: :ref:`extending_aesara`.
:orphan:
This page has been moved. Please refer to: :ref:`extending_aesara_c`.