"At the core of autodiff is the vector-Jacobian product (VJP), or in PyTensor's archaic terminology, the L-Operator (because the vector is on the left). The Jacobian is the matrix (or tensor) of all first-order partial derivatives. Let us completely ignore what the vector means, and think how do we go about computing the product of a general vector with the Jacobian matrix?"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "35d314e1728148d5",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-06T20:20:19.996660Z",
"start_time": "2025-07-06T20:20:19.594782Z"
}
},
"outputs": [],
"source": [
"import pytensor.tensor as pt\n",
"from pytensor.gradient import Lop, jacobian as jacobian_raw\n",
"The naive way is to create the full Jacobian matrix and then right-multiply it by the vector. Let's look at a concrete example for the Elemtwise operation log(x)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "153da4ec5f8b5a71",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-06T20:20:20.128996Z",
"start_time": "2025-07-06T20:20:20.124242Z"
}
},
"outputs": [],
"source": [
"x = pt.vector(\"x\")\n",
"log_x = pt.log(x)"
]
},
{
"cell_type": "markdown",
"id": "91012e92f0eafee9",
"metadata": {},
"source": [
"If x has length 3, the Jacobian of y with respect to x is a 3x3 matrix, since there are 3 outputs and 3 inputs.\n",
"\n",
"Each entry contains the partial derivative of a one of the outputs (rows) with respect to one of the inputs (columns).\n",
"For the elementwise operation log(x), the Jacobian is a diagonal matrix, as each input affects only the corresponding output. For the log operation the partial derivatives are given by $\\frac{1}{x_i}$, so the Jacobian is:\n",
"\n",
"$$\n",
"J = \\begin{pmatrix}\n",
"\\frac{1}{x_1} & 0 & 0 \\\\\n",
"0 & \\frac{1}{x_2} & 0 \\\\\n",
"0 & 0 & \\frac{1}{x_3}\n",
"\\end{pmatrix}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7c359f386b4cb918",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-06T20:20:20.195333Z",
"start_time": "2025-07-06T20:20:20.172499Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True_div [id A]\n",
" ├─ Eye{dtype='float64'} [id B]\n",
" │ ├─ Shape_i{0} [id C]\n",
" │ │ └─ x [id D]\n",
" │ ├─ Shape_i{0} [id C]\n",
" │ │ └─ ···\n",
" │ └─ 0 [id E]\n",
" └─ ExpandDims{axis=0} [id F]\n",
" └─ x [id D]\n"
]
}
],
"source": [
"J_log = jacobian(log_x, x)\n",
"simplify_print(J_log)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a3441a71c6ddb3ad",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-06T20:20:20.517814Z",
"start_time": "2025-07-06T20:20:20.237690Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1. , 0. , 0. ],\n",
" [0. , 0.5 , 0. ],\n",
" [0. , 0. , 0.33333333]])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"J_log.eval({\"x\": [1.0, 2.0, 3.0]})"
]
},
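{
"cell_type": "markdown",
"id": "vjp-log-numpy-check",
"metadata": {},
"source": [
"Now suppose we want the product of some vector `v` with this Jacobian. Because the Jacobian is diagonal, a quick numpy sketch (with illustrative values; `v` is just an arbitrary vector here) shows that the product collapses to an elementwise division:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"x = np.array([1.0, 2.0, 3.0])\n",
"v = np.array([4.0, 5.0, 6.0])\n",
"\n",
"# Naive VJP: build the full diagonal Jacobian, then left-multiply by v\n",
"J = np.diag(1.0 / x)\n",
"naive = v @ J\n",
"\n",
"# Shortcut: elementwise division, no matrix ever built\n",
"smart = v / x\n",
"\n",
"np.testing.assert_allclose(naive, smart)  # both give [4., 2.5, 2.]\n",
"```"
]
},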
{
"cell_type": "markdown",
"id": "f57772bc5c85c82c",
"metadata": {},
"source": [
"To get the vector-Jacobian product, we will left-multiply the Jacobian by a vector v. In this case, it simplifies to an elementwise division of the vector v by the input vector x:\n",
"It is unnecessary to compute the whole Jacobian matrix, and then perform a vector-matrix multiplication. The `Lop` returns the smart computations directly:"
"A pattern that will become obvious in this notebook is that we can often exploit some property of the Jacobian matrix (and that we want to multiply it by a vector) to compute the VJP cheaply. Let's take a look at the cumulative sum operation."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f1c068fed4abb1a7",
"metadata": {
"ExecuteTime": {
"end_time": "2025-07-06T20:20:21.240090Z",
"start_time": "2025-07-06T20:20:21.227054Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([1., 3., 6.])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = pt.vector(\"x\")\n",
"cumsum_x = pt.cumsum(x)\n",
"cumsum_x.eval({\"x\": [1.0, 2.0, 3.0]})"
]
},
{
"cell_type": "markdown",
"id": "a9b41bc35d7c1716",
"metadata": {},
"source": [
"The jacobian of the cumulative sum operation is a lower triangular matrix of ones, since the first input affects all outputs additively, the second input affects all outputs but the first, and so on, until the last input which only affects the last output. If x has length 3:\n",
"\n",
"$$\n",
"J = \\begin{pmatrix}\n",
"1 & 0 & 0 \\\\\n",
"1 & 1 & 0 \\\\\n",
"1 & 1 & 1 \\\\\n",
"\\end{pmatrix}\n",
"$$"
]
},
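{
"cell_type": "markdown",
"id": "vjp-cumsum-numpy-check",
"metadata": {},
"source": [
"Given this structure, the VJP `v @ J` is just a reversed cumulative sum of `v`, and again no matrix is needed. A numpy sketch with illustrative values:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"v = np.array([1.0, 2.0, 3.0])\n",
"\n",
"# Naive VJP: build the lower-triangular Jacobian of cumsum, then left-multiply\n",
"J = np.tril(np.ones((3, 3)))\n",
"naive = v @ J\n",
"\n",
"# Shortcut: a reversed cumulative sum of v\n",
"smart = np.cumsum(v[::-1])[::-1]\n",
"\n",
"np.testing.assert_allclose(naive, smart)  # both give [6., 5., 3.]\n",
"```"
]
},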
{
"cell_type": "markdown",
"id": "87a87483-e92a-42aa-b42f-509739063888",
"metadata": {},
"source": [
"PyTensor autodiff builds this jacobian in a funny way. Starting from a diagonal matrix, it flips the columns, performs a cumsum across the them and then flips them back. A more direct way would do cumsum along the row of the diagonal matrix, but since a flip is just a view (no copy needed) it doesn't actually cost us much."
"If you're not familiar with convolution, we get those four numbers by padding `x` with zeros and then performing an inner product with the flipped `y`, one pair of values at a time"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "56075ceb-dc1d-4e17-8e95-e43dbc28783a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 1, 1, -2])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x_padded = np.array([0, 0, 1, 2, 0])\n",
"res = np.array([\n",
" x_padded[0:2] @ [-1, 1],\n",
" x_padded[1:3] @ [-1, 1],\n",
" x_padded[2:4] @ [-1, 1],\n",
" x_padded[3:5] @ [-1, 1],\n",
"])\n",
"res"
]
},
{
"cell_type": "markdown",
"id": "bdb3a78a-ca4e-4adc-98f6-65663ffd0ec6",
"metadata": {},
"source": [
"Let's focus on the Jacobian with respect to `y`, as that's the smaller one. If you look at the expression above, you'll see that it implies the following Jacobian:\n",
"\n",
"$$\n",
"J = \\begin{pmatrix}\n",
"x_1 & 0 \\\\\n",
"x_2 & x_1 \\\\\n",
"x_3 & x_2 \\\\\n",
"0 & x_3 \\\\\n",
"\\end{pmatrix}\n",
"$$\n",
"\n",
"The constant zeros come from the padding. Curious how PyTensor builds this sort of jacobian?"
"It performs a batched \"valid\" convolution between eye(4) and the flipped x vector. In a valid convolution, there is no padding, and we only multiply the sub-sequences that match in length."
]
},
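{
"cell_type": "markdown",
"id": "batched-valid-conv-sketch",
"metadata": {},
"source": [
"Here is what that batched valid convolution looks like in numpy (an illustrative sketch, not PyTensor's actual graph):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"x = np.array([0.0, 1.0, 2.0])\n",
"\n",
"# One 'valid' convolution per row of eye(4): each row picks out\n",
"# the entries of x that slide past that output position\n",
"J = np.stack([np.convolve(row, x[::-1], mode='valid') for row in np.eye(4)])\n",
"print(J)  # rows: [0, 0], [1, 0], [2, 1], [0, 2]\n",
"```"
]
},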
{
"cell_type": "code",
"execution_count": 16,
"id": "1ab05d9a-33fd-46a9-abfd-4de25cd6e520",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0., 0.],\n",
" [1., 0.],\n",
" [2., 1.],\n",
" [0., 2.]])"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"J_convolution.eval({\"x\": [0, 1, 2]})"
]
},
{
"cell_type": "markdown",
"id": "05577d83-091a-48fb-a383-854f69c4cab5",
"metadata": {},
"source": [
"Following the theme, is there any special structure in this Jacobian that can be exploited to compute VJP efficiently?"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "89caf646-0213-4aa6-9aca-b7cb0230c8bc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Convolve1d [id A]\n",
" ├─ v [id B]\n",
" ├─ Subtensor{::step} [id C]\n",
" │ ├─ x [id D]\n",
" │ └─ -1 [id E]\n",
" └─ ScalarFromTensor [id F]\n",
" └─ False [id G]\n"
]
}
],
"source": [
"v = pt.vector(\"v\", shape=(4,))\n",
"vjp_convolution = Lop(convolution_xy, y, v)\n",
"simplify_print(vjp_convolution)"
]
},
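{
"cell_type": "markdown",
"id": "toeplitz-vjp-numpy-check",
"metadata": {},
"source": [
"With illustrative numbers (an arbitrary `v`), we can confirm that left-multiplying the Jacobian shown earlier by `v` matches a valid convolution of `v` with `x` flipped:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"x = np.array([0.0, 1.0, 2.0])\n",
"v = np.array([1.0, 2.0, 3.0, 4.0])\n",
"\n",
"# The Jacobian of the convolution wrt y, with x sliding down the diagonals\n",
"J = np.array([\n",
"    [x[0], 0.0],\n",
"    [x[1], x[0]],\n",
"    [x[2], x[1]],\n",
"    [0.0, x[2]],\n",
"])\n",
"\n",
"naive = v @ J\n",
"smart = np.convolve(v, x[::-1], mode='valid')\n",
"\n",
"np.testing.assert_allclose(naive, smart)  # both give [8., 11.]\n",
"```"
]
},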
{
"cell_type": "markdown",
"id": "6b369467-9ceb-4f4f-86f7-486801ff3275",
"metadata": {},
"source": [
"It's just the \"valid\" convolution between `v` and `x` flipped. Our Jacobian has a [Toeplitz structure](https://en.wikipedia.org/wiki/Toeplitz_matrix), and the product of such a matrix with a vector is equivalent to a discrete convolution!"
]
},
{
"cell_type": "markdown",
"id": "transpose-jacobian-note",
"metadata": {},
"source": [
"PyTensor builds this Jacobian with two reshapes and a transpose of an `eye`."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "c65f6740-f809-4fc0-831f-61bd2e09f510",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reshape{2} [id A]\n",
" ├─ Transpose{axes=[0, 2, 1]} [id B]\n",
" │ └─ Reshape{3} [id C]\n",
" │ ├─ Eye{dtype='float64'} [id D]\n",
" │ │ ├─ Mul [id E]\n",
" │ │ │ ├─ 2 [id F]\n",
" │ │ │ └─ Shape_i{1} [id G]\n",
" │ │ │ └─ A [id H]\n",
" │ │ ├─ Mul [id E]\n",
" │ │ │ └─ ···\n",
" │ │ └─ 0 [id I]\n",
" │ └─ MakeVector{dtype='int64'} [id J]\n",
" │ ├─ Mul [id E]\n",
" │ │ └─ ···\n",
" │ ├─ Shape_i{1} [id G]\n",
" │ │ └─ ···\n",
" │ └─ 2 [id F]\n",
" └─ [6 6] [id K]\n"
]
}
],
"source": [
"simplify_print(J_transpose)"
]
},
{
"cell_type": "markdown",
"id": "0e39d268-12c3-463f-9ec7-e8b48bbd95d4",
"metadata": {},
"source": [
"To recreate the outcome of `Lop`, we ravel the `V` matrix, multiply it with the Jacobian defined on the raveled function, and reshape the result to the original input shape."
"Because J is a permutation matrix, the multiplication with it simply rearranges the entries of `V`. \n",
"\n",
"What's more, after the reshape, we end up with a simple transposition of the original `V` matrix!\n",
"\n",
"Unsurprisingly, `Lop` takes the direct shortcut:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "e29d3283-77f6-4539-8d41-851848267cd6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Transpose{axes=[1, 0]} [id A]\n",
" └─ V [id B]\n"
]
},
{
"data": {
"text/plain": [
"<ipykernel.iostream.OutStream at 0x7fe7bb110a60>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Lop(transpose_A, A, V).dprint()"
]
},
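{
"cell_type": "markdown",
"id": "permutation-vjp-numpy-check",
"metadata": {},
"source": [
"To see the permutation-matrix argument concretely, here is a numpy sketch (assuming, for illustration, a 2x3 shape for `A`):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"V = np.arange(6.0).reshape(3, 2)  # same shape as A.T for a 2x3 A\n",
"\n",
"# Jacobian of ravel(A.T) wrt ravel(A): a 6x6 permutation matrix.\n",
"# Output entry (j, i) of A.T reads input entry (i, j) of A.\n",
"J = np.zeros((6, 6))\n",
"for i in range(2):\n",
"    for j in range(3):\n",
"        J[j * 2 + i, i * 3 + j] = 1.0\n",
"\n",
"# Ravel V, multiply by the Jacobian, reshape back to A's shape\n",
"vjp = (V.ravel() @ J).reshape(2, 3)\n",
"\n",
"np.testing.assert_allclose(vjp, V.T)  # just a transposition of V\n",
"```"
]
},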
{
"cell_type": "markdown",
"id": "6de26bd3-4e43-444a-abf9-eee682d201b0",
"metadata": {},
"source": [
"## VJP and auto-diff"
]
},
{
"cell_type": "markdown",
"id": "4f92c6a4-27d2-4854-8b66-038eca62ebb7",
"metadata": {},
"source": [
"It is time to reveal the meaning of the mysterious vector (or reshaped tensor) `v`. In the context ouf auto-diff, it is the vector that accumulates the partial derivatives of intermediate computations. If you chain the VJP for each operation in your graph you obtain reverse-mode autodiff. \n",
"\n",
"Let's look at a simple example with the operations we discussed already:"
"Similarly, forward-mode autodiff makes use of the R-Operator (Rop) or Jacobian-vector product (JVP) to accumulate the partial derivations from inputs to outputs."
]
},
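{
"cell_type": "markdown",
"id": "chained-vjp-numpy-check",
"metadata": {},
"source": [
"As a sketch of that chaining (in numpy, with illustrative values), consider `loss = sum(cumsum(log(x)))`: we seed `v` with the gradient of the loss with respect to the output, then push it backwards through the VJP shortcut of each operation:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"x = np.array([1.0, 2.0, 4.0])\n",
"\n",
"# Reverse pass for loss = sum(cumsum(log(x))): seed v = dloss/dy = ones\n",
"v = np.ones(3)\n",
"v = np.cumsum(v[::-1])[::-1]  # VJP of cumsum: reversed cumulative sum\n",
"v = v / x                     # VJP of log: elementwise division\n",
"\n",
"# Same result via the explicit Jacobians, for comparison\n",
"full = np.ones(3) @ np.tril(np.ones((3, 3))) @ np.diag(1.0 / x)\n",
"np.testing.assert_allclose(v, full)  # gradient wrt x: [3., 1., 0.25]\n",
"```"
]
},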
{
"cell_type": "markdown",
"id": "eff94009-64f2-4c5d-b4d5-5ff36a86b7fa",
"metadata": {},
"source": [
"## Conclusion"
]
},
{
"cell_type": "markdown",
"id": "123a10f2-ef4d-4cd9-86b7-b2b7975a1c31",
"metadata": {},
"source": [
"We hope this sheds some light on how PyTensor (and most auto-diff frameworks) implement vector Jacobian products efficiently, in a way that avoids both having to build the full jacobian and having to multiply with it."