Tensor Calculus Layout Conventions

Introduction

In my previous article “Derivatives”, we discussed the derivatives of vector forms, including $\frac{\partial \mathbf{y}}{\partial x}$, $\frac{\partial y}{\partial \mathbf{x}}$, and $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, where $\mathbf{y}$ and $\mathbf{x}$ are vectors of real numbers. But what if the variable in the numerator or denominator is a matrix, or even a higher-dimensional array, i.e., a tensor? In this case, we need tensor calculus layout conventions, namely the numerator layout and the denominator layout.

In this blog post, I would like to discuss the rules of tensor calculus layout conventions, including the numerator layout and denominator layout, and how these rules can be applied and generalized to tensors of all dimensions.

Tensor Calculus Numerator Layout

Suppose $X \in \mathbb{R}^{n_{1} \times n_{2} \times \cdots \times n_{k}}$ and $Y \in \mathbb{R}^{m_{1} \times m_{2} \times \cdots \times m_{h}}$, where $X$ and $Y$ are tensors of dimensions $k$ and $h$, respectively.

The numerator layout notation for the derivative of $Y$ with respect to $X$, $\frac{\partial Y}{\partial X}$, first iteratively unrolls the tensor $Y$ into vectors in the order from the first dimension to the last dimension, and then iteratively unrolls the tensor $X$ into vectors in the order from the last dimension to the first dimension. The resulting derivative is a tensor $\frac{\partial Y}{\partial X} \in \mathbb{R}^{m_{1} \times m_{2} \times \cdots \times m_{h} \times n_{k} \times n_{k-1} \times \cdots \times n_{1}}$.
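
As a quick sanity check of this rule, here is a minimal Python sketch (my own illustration, not part of the convention itself) that computes the shape a numerator-layout derivative would have from the shapes of $Y$ and $X$.

```python
# Minimal sketch: the numerator-layout derivative keeps Y's dimensions in
# order and appends X's dimensions in reverse order.
def numerator_layout_shape(y_shape, x_shape):
    return tuple(y_shape) + tuple(reversed(x_shape))

# For example, Y of shape 2 x 3 and X of shape 4 x 5 x 6 give a derivative
# of shape 2 x 3 x 6 x 5 x 4.
print(numerator_layout_shape((2, 3), (4, 5, 6)))  # (2, 3, 6, 5, 4)
```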

Let’s see some examples of the numerator layout notations.

Vector-By-Scalar Notation

Suppose $x \in \mathbb{R}$, i.e., $x$ is a scalar or a 0D tensor, and $\mathbf{y} = \{y_{1}, y_{2}, \ldots, y_{m}\} \in \mathbb{R}^{m}$, i.e., $\mathbf{y}$ is a vector of dimension $m$. The derivative $\frac{\partial \mathbf{y}}{\partial x}$ is a vector of the same dimension as $\mathbf{y}$, i.e., $\frac{\partial \mathbf{y}}{\partial x} \in \mathbb{R}^{m}$.

$$
\begin{align}
\frac{\partial \mathbf{y}}{\partial x} &= \left\{\frac{\partial y_{1}}{\partial x}, \frac{\partial y_{2}}{\partial x}, \ldots, \frac{\partial y_{m}}{\partial x}\right\} \\
\end{align}
$$

Note that we never specify whether $\mathbf{y}$ is a row vector or a column vector, as is commonly done elsewhere. This is because a row vector or a column vector is essentially a 2D tensor rather than a 1D vector, which would break the mathematical consistency of our notations from lower-dimensional tensors to higher-dimensional tensors.

Scalar-By-Vector Notation

Suppose $\mathbf{x} = \{x_{1}, x_{2}, \ldots, x_{n}\} \in \mathbb{R}^{n}$, i.e., $\mathbf{x}$ is a vector of dimension $n$, and $y \in \mathbb{R}$, i.e., $y$ is a scalar or a 0D tensor. The derivative $\frac{\partial y}{\partial \mathbf{x}}$ is a vector of the same dimension as $\mathbf{x}$, i.e., $\frac{\partial y}{\partial \mathbf{x}} \in \mathbb{R}^{n}$.

$$
\begin{align}
\frac{\partial y}{\partial \mathbf{x}} &= \left\{\frac{\partial y}{\partial x_{1}}, \frac{\partial y}{\partial x_{2}}, \ldots, \frac{\partial y}{\partial x_{n}}\right\} \\
\end{align}
$$

Similarly, we never mention if $\mathbf{x}$ is a row vector or a column vector.

Vector-By-Vector Notation

Suppose $\mathbf{x} = \{x_{1}, x_{2}, \ldots, x_{n}\} \in \mathbb{R}^{n}$, i.e., $\mathbf{x}$ is a vector of dimension $n$, and $\mathbf{y} = \{y_{1}, y_{2}, \ldots, y_{m}\} \in \mathbb{R}^{m}$, i.e., $\mathbf{y}$ is a vector of dimension $m$. The derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ is a matrix of the size $m \times n$, i.e., $\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{m \times n}$.

$$
\begin{align}
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
&=
\left\{
\frac{\partial y_{1}}{\partial \mathbf{x}}, \frac{\partial y_{2}}{\partial \mathbf{x}}, \ldots, \frac{\partial y_{m}}{\partial \mathbf{x}}
\right\} \\
&=
\left\{
\left\{
\frac{\partial y_{1}}{\partial x_{1}}, \frac{\partial y_{1}}{\partial x_{2}}, \ldots, \frac{\partial y_{1}}{\partial x_{n}}\right\},
\left\{
\frac{\partial y_{2}}{\partial x_{1}}, \frac{\partial y_{2}}{\partial x_{2}}, \ldots, \frac{\partial y_{2}}{\partial x_{n}}\right\},
\ldots,
\left\{
\frac{\partial y_{m}}{\partial x_{1}}, \frac{\partial y_{m}}{\partial x_{2}}, \ldots, \frac{\partial y_{m}}{\partial x_{n}}\right\}
\right\} \\
&=
\begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\
\frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}} & \cdots & \frac{\partial y_{2}}{\partial x_{n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m}}{\partial x_{1}} & \frac{\partial y_{m}}{\partial x_{2}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{bmatrix} \\
\end{align}
$$

Note that a matrix or a 2D tensor is essentially a vector of vectors, and it does not matter whether each vector is a row vector or a column vector.
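
To make the vector-by-vector case concrete, here is a small numerical sketch with NumPy (my own illustration, assuming a linear map $\mathbf{y} = A\mathbf{x}$): the numerator-layout Jacobian approximated by finite differences should recover the $m \times n$ matrix $A$.

```python
import numpy as np

def jacobian_numerator(f, x, eps=1e-6):
    # Row i, column j holds dy_i / dx_j, i.e., the numerator layout.
    y = f(x)
    jac = np.zeros((y.size, x.size))
    for j in range(x.size):
        x_plus = x.copy()
        x_plus[j] += eps
        jac[:, j] = (f(x_plus) - y) / eps
    return jac

m, n = 3, 4
A = np.random.randn(m, n)
x = np.random.randn(n)
# For y = A x, the numerator-layout Jacobian is A itself.
print(np.allclose(jacobian_numerator(lambda v: A @ v, x), A, atol=1e-4))  # True
```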

Matrix-By-Scalar Notation

Suppose $x \in \mathbb{R}$, i.e., $x$ is a scalar or a 0D tensor, and $\mathbf{Y} \in \mathbb{R}^{m_{1} \times m_{2}}$, i.e., $\mathbf{Y}$ is a matrix of size $m_{1} \times m_{2}$.

$$
\begin{align}
\mathbf{Y} &=
\left\{
\mathbf{Y}_{1,:}, \mathbf{Y}_{2,:}, \ldots, \mathbf{Y}_{m_{1},:}
\right\} \\
&=
\left\{
\left\{
y_{11}, y_{12}, \ldots, y_{1m_{2}}\right\},
\left\{
y_{21}, y_{22}, \ldots, y_{2m_{2}}\right\},
\ldots,
\left\{
y_{m_{1}1}, y_{m_{1}2}, \ldots, y_{m_{1}m_{2}}\right\}
\right\} \\
&=
\begin{bmatrix}
y_{11} & y_{12} & \cdots & y_{1m_{2}} \\
y_{21} & y_{22} & \cdots & y_{2m_{2}} \\
\vdots & \vdots & \ddots & \vdots \\
y_{m_{1}1} & y_{m_{1}2} & \cdots & y_{m_{1}m_{2}} \\
\end{bmatrix} \\
\end{align}
$$

The derivative $\frac{\partial \mathbf{Y}}{\partial x}$ is a matrix of the same size as $\mathbf{Y}$, i.e., $\frac{\partial \mathbf{Y}}{\partial x} \in \mathbb{R}^{m_{1} \times m_{2}}$.

$$
\begin{align}
\frac{\partial \mathbf{Y}}{\partial x} &=
\left\{
\frac{\partial \mathbf{Y}_{1,:}}{\partial x}, \frac{\partial \mathbf{Y}_{2,:}}{\partial x}, \ldots, \frac{\partial \mathbf{Y}_{m_{1},:}}{\partial x}
\right\} \\
&=
\left\{
\left\{
\frac{\partial y_{11}}{\partial x}, \frac{\partial y_{12}}{\partial x}, \ldots, \frac{\partial y_{1m_{2}}}{\partial x}\right\},
\left\{
\frac{\partial y_{21}}{\partial x}, \frac{\partial y_{22}}{\partial x}, \ldots, \frac{\partial y_{2m_{2}}}{\partial x}\right\},
\ldots,
\left\{
\frac{\partial y_{m_{1}1}}{\partial x}, \frac{\partial y_{m_{1}2}}{\partial x}, \ldots, \frac{\partial y_{m_{1}m_{2}}}{\partial x}\right\}
\right\} \\
&=
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1m_{2}}}{\partial x} \\
\frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2m_{2}}}{\partial x} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m_{1}1}}{\partial x} & \frac{\partial y_{m_{1}2}}{\partial x} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x}
\end{bmatrix} \\
\end{align}
$$
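
For a quick numerical check of the matrix-by-scalar case (my own sketch, assuming $\mathbf{Y} = x A$ for a fixed matrix $A$), the derivative should be $A$ itself, with the same $m_{1} \times m_{2}$ shape as $\mathbf{Y}$.

```python
import numpy as np

A = np.random.randn(2, 3)   # fixed m1 x m2 matrix
x, eps = 1.5, 1e-6
# Finite-difference derivative of Y = x * A with respect to the scalar x.
dYdx = ((x + eps) * A - x * A) / eps
print(np.allclose(dYdx, A, atol=1e-4), dYdx.shape)  # True (2, 3)
```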

Scalar-By-Matrix Notation

Suppose $\mathbf{X} \in \mathbb{R}^{n_{1} \times n_{2}}$, i.e., $\mathbf{X}$ is a matrix of size $n_{1} \times n_{2}$, and $y \in \mathbb{R}$, i.e., $y$ is a scalar or a 0D tensor.

$$
\begin{align}
\mathbf{X}
&=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n_{2}} \\
x_{21} & x_{22} & \cdots & x_{2n_{2}} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n_{1}1} & x_{n_{1}2} & \cdots & x_{n_{1}n_{2}} \\
\end{bmatrix} \\
\end{align}
$$

The derivative $\frac{\partial y}{\partial \mathbf{X}}$ is a matrix of size $n_{2} \times n_{1}$, i.e., $\frac{\partial y}{\partial \mathbf{X}} \in \mathbb{R}^{n_{2} \times n_{1}}$.

$$
\begin{align}
\frac{\partial y}{\partial \mathbf{X}} &=
\left\{
\frac{\partial y}{\partial \mathbf{X}_{:,1}}, \frac{\partial y}{\partial \mathbf{X}_{:,2}}, \ldots, \frac{\partial y}{\partial \mathbf{X}_{:,n_{2}}}
\right\} \\
&=
\left\{
\left\{
\frac{\partial y}{\partial x_{11}}, \frac{\partial y}{\partial x_{21}}, \ldots, \frac{\partial y}{\partial x_{n_{1}1}}\right\},
\left\{
\frac{\partial y}{\partial x_{12}}, \frac{\partial y}{\partial x_{22}}, \ldots, \frac{\partial y}{\partial x_{n_{1}2}}\right\},
\ldots,
\left\{
\frac{\partial y}{\partial x_{1n_{2}}}, \frac{\partial y}{\partial x_{2n_{2}}}, \ldots, \frac{\partial y}{\partial x_{n_{1}n_{2}}}\right\}
\right\} \\
&=
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{n_{1}1}} \\
\frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y}{\partial x_{1n_{2}}} & \frac{\partial y}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} \\
\end{align}
$$

Note that the order of the dimensions of the derivative $\frac{\partial y}{\partial \mathbf{X}}$ is different from the order of the dimensions of the matrix $\mathbf{X}$. This is because we have unrolled the matrix $\mathbf{X}$ into a vector in the order from the last dimension to the first dimension, according to our rules of the numerator layout.
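
Here is a small numerical illustration of this reversal (my own sketch, assuming the bilinear form $y = \mathbf{a}^{\top} \mathbf{X} \mathbf{b}$): each entry of the derivative is $\frac{\partial y}{\partial x_{ij}} = a_{i} b_{j}$, and the numerator layout arranges these entries into an $n_{2} \times n_{1}$ array, the transpose of $\mathbf{X}$'s shape.

```python
import numpy as np

n1, n2 = 3, 5
a, b = np.random.randn(n1), np.random.randn(n2)

# dy / dx_ij = a_i * b_j for y = a^T X b.
grad_entries = np.outer(a, b)           # shape n1 x n2, entry (i, j) = dy/dx_ij
numerator_layout_grad = grad_entries.T  # shape n2 x n1, as in the layout above
print(numerator_layout_grad.shape)      # (5, 3)
```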

Matrix-By-Vector Notation

Suppose $\mathbf{x} \in \mathbb{R}^{n}$, i.e., $\mathbf{x}$ is a vector of dimension $n$, and $\mathbf{Y} \in \mathbb{R}^{m_{1} \times m_{2}}$, i.e., $\mathbf{Y}$ is a matrix of size $m_{1} \times m_{2}$. The derivative $\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}$ is a tensor of size $m_{1} \times m_{2} \times n$, i.e., $\frac{\partial \mathbf{Y}}{\partial \mathbf{x}} \in \mathbb{R}^{m_{1} \times m_{2} \times n}$.

$$
\begin{align}
\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}
&=
\begin{bmatrix}
\frac{\partial y_{11}}{\partial \mathbf{x}} & \frac{\partial y_{12}}{\partial \mathbf{x}} & \cdots & \frac{\partial y_{1m_{2}}}{\partial \mathbf{x}} \\
\frac{\partial y_{21}}{\partial \mathbf{x}} & \frac{\partial y_{22}}{\partial \mathbf{x}} & \cdots & \frac{\partial y_{2m_{2}}}{\partial \mathbf{x}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m_{1}1}}{\partial \mathbf{x}} & \frac{\partial y_{m_{1}2}}{\partial \mathbf{x}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial \mathbf{x}}
\end{bmatrix} \\
&=
\begin{bmatrix}
\left\{
\frac{\partial y_{11}}{\partial x_{1}}, \frac{\partial y_{11}}{\partial x_{2}}, \ldots, \frac{\partial y_{11}}{\partial x_{n}}\right\} &
\left\{
\frac{\partial y_{12}}{\partial x_{1}}, \frac{\partial y_{12}}{\partial x_{2}}, \ldots, \frac{\partial y_{12}}{\partial x_{n}}\right\} &
\cdots &
\left\{
\frac{\partial y_{1m_{2}}}{\partial x_{1}}, \frac{\partial y_{1m_{2}}}{\partial x_{2}}, \ldots, \frac{\partial y_{1m_{2}}}{\partial x_{n}}\right\} \\
\left\{
\frac{\partial y_{21}}{\partial x_{1}}, \frac{\partial y_{21}}{\partial x_{2}}, \ldots, \frac{\partial y_{21}}{\partial x_{n}}\right\} &
\left\{
\frac{\partial y_{22}}{\partial x_{1}}, \frac{\partial y_{22}}{\partial x_{2}}, \ldots, \frac{\partial y_{22}}{\partial x_{n}}\right\} &
\cdots &
\left\{
\frac{\partial y_{2m_{2}}}{\partial x_{1}}, \frac{\partial y_{2m_{2}}}{\partial x_{2}}, \ldots, \frac{\partial y_{2m_{2}}}{\partial x_{n}}\right\} \\
\vdots & \vdots & \ddots & \vdots \\
\left\{
\frac{\partial y_{m_{1}1}}{\partial x_{1}}, \frac{\partial y_{m_{1}1}}{\partial x_{2}}, \ldots, \frac{\partial y_{m_{1}1}}{\partial x_{n}}\right\} &
\left\{
\frac{\partial y_{m_{1}2}}{\partial x_{1}}, \frac{\partial y_{m_{1}2}}{\partial x_{2}}, \ldots, \frac{\partial y_{m_{1}2}}{\partial x_{n}}\right\} &
\cdots &
\left\{
\frac{\partial y_{m_{1}m_{2}}}{\partial x_{1}}, \frac{\partial y_{m_{1}m_{2}}}{\partial x_{2}}, \ldots, \frac{\partial y_{m_{1}m_{2}}}{\partial x_{n}}\right\}
\end{bmatrix} \\
\end{align}
$$
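
A small concrete example may help here (my own sketch, assuming $\mathbf{Y} = \mathbf{c} \mathbf{x}^{\top}$, an outer product with a fixed vector $\mathbf{c} \in \mathbb{R}^{m_{1}}$ and $m_{2} = n$): analytically, $\frac{\partial y_{ij}}{\partial x_{k}} = c_{i}$ if $j = k$ and $0$ otherwise, and the numerator-layout derivative is an $m_{1} \times m_{2} \times n$ tensor.

```python
import numpy as np

m1, n = 2, 3                              # here m2 == n
c = np.random.randn(m1)
x = np.random.randn(n)
eps = 1e-6

# Finite-difference numerator-layout derivative of Y = c x^T with respect to x.
dYdx = np.zeros((m1, n, n))               # shape m1 x m2 x n
for k in range(n):
    x_plus = x.copy()
    x_plus[k] += eps
    dYdx[:, :, k] = (np.outer(c, x_plus) - np.outer(c, x)) / eps

# Analytically, dY_ij / dx_k = c_i if j == k else 0.
expected = np.einsum('i,jk->ijk', c, np.eye(n))
print(np.allclose(dYdx, expected, atol=1e-4))  # True
```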

Vector-By-Matrix Notation

Suppose $\mathbf{X} \in \mathbb{R}^{n_{1} \times n_{2}}$, i.e., $\mathbf{X}$ is a matrix of size $n_{1} \times n_{2}$, and $\mathbf{y} = \{y_{1}, y_{2}, \ldots, y_{m}\} \in \mathbb{R}^{m}$, i.e., $\mathbf{y}$ is a vector of dimension $m$. The derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{X}}$ is a tensor of size $m \times n_{2} \times n_{1}$, i.e., $\frac{\partial \mathbf{y}}{\partial \mathbf{X}} \in \mathbb{R}^{m \times n_{2} \times n_{1}}$.

$$
\begin{align}
\frac{\partial \mathbf{y}}{\partial \mathbf{X}}
&=
\left\{
\frac{\partial y_{1}}{\partial \mathbf{X}}, \frac{\partial y_{2}}{\partial \mathbf{X}}, \ldots, \frac{\partial y_{m}}{\partial \mathbf{X}}
\right\} \\
&=
\left\{
\begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{11}} & \frac{\partial y_{1}}{\partial x_{21}} & \cdots & \frac{\partial y_{1}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{1}}{\partial x_{12}} & \frac{\partial y_{1}}{\partial x_{22}} & \cdots & \frac{\partial y_{1}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1}}{\partial x_{1n_{2}}} & \frac{\partial y_{1}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{1}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix},
\begin{bmatrix}
\frac{\partial y_{2}}{\partial x_{11}} & \frac{\partial y_{2}}{\partial x_{21}} & \cdots & \frac{\partial y_{2}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{2}}{\partial x_{12}} & \frac{\partial y_{2}}{\partial x_{22}} & \cdots & \frac{\partial y_{2}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{2}}{\partial x_{1n_{2}}} & \frac{\partial y_{2}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{2}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix},
\ldots,
\begin{bmatrix}
\frac{\partial y_{m}}{\partial x_{11}} & \frac{\partial y_{m}}{\partial x_{21}} & \cdots & \frac{\partial y_{m}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{m}}{\partial x_{12}} & \frac{\partial y_{m}}{\partial x_{22}} & \cdots & \frac{\partial y_{m}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m}}{\partial x_{1n_{2}}} & \frac{\partial y_{m}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{m}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix}
\right\} \\
\end{align}
$$

Matrix-By-Matrix Notation

Suppose $\mathbf{X} \in \mathbb{R}^{n_{1} \times n_{2}}$, i.e., $\mathbf{X}$ is a matrix of size $n_{1} \times n_{2}$, and $\mathbf{Y} \in \mathbb{R}^{m_{1} \times m_{2}}$, i.e., $\mathbf{Y}$ is a matrix of size $m_{1} \times m_{2}$. The derivative $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$ is a tensor of size $m_{1} \times m_{2} \times n_{2} \times n_{1}$, i.e., $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}} \in \mathbb{R}^{m_{1} \times m_{2} \times n_{2} \times n_{1}}$.

$$
\begin{align}
\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}
&=
\begin{bmatrix}
\frac{\partial y_{11}}{\partial \mathbf{X}} & \frac{\partial y_{12}}{\partial \mathbf{X}} & \cdots & \frac{\partial y_{1m_{2}}}{\partial \mathbf{X}} \\
\frac{\partial y_{21}}{\partial \mathbf{X}} & \frac{\partial y_{22}}{\partial \mathbf{X}} & \cdots & \frac{\partial y_{2m_{2}}}{\partial \mathbf{X}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m_{1}1}}{\partial \mathbf{X}} & \frac{\partial y_{m_{1}2}}{\partial \mathbf{X}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial \mathbf{X}}
\end{bmatrix} \\
&=
\begin{bmatrix}
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{11}} & \frac{\partial y_{11}}{\partial x_{21}} & \cdots & \frac{\partial y_{11}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{11}}{\partial x_{12}} & \frac{\partial y_{11}}{\partial x_{22}} & \cdots & \frac{\partial y_{11}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{11}}{\partial x_{1n_{2}}} & \frac{\partial y_{11}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{11}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} &
\begin{bmatrix}
\frac{\partial y_{12}}{\partial x_{11}} & \frac{\partial y_{12}}{\partial x_{21}} & \cdots & \frac{\partial y_{12}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{12}}{\partial x_{12}} & \frac{\partial y_{12}}{\partial x_{22}} & \cdots & \frac{\partial y_{12}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{12}}{\partial x_{1n_{2}}} & \frac{\partial y_{12}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{12}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} &
\cdots &
\begin{bmatrix}
\frac{\partial y_{1m_{2}}}{\partial x_{11}} & \frac{\partial y_{1m_{2}}}{\partial x_{21}} & \cdots & \frac{\partial y_{1m_{2}}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{1m_{2}}}{\partial x_{12}} & \frac{\partial y_{1m_{2}}}{\partial x_{22}} & \cdots & \frac{\partial y_{1m_{2}}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{1n_{2}}} & \frac{\partial y_{1m_{2}}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{1m_{2}}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} \\
\begin{bmatrix}
\frac{\partial y_{21}}{\partial x_{11}} & \frac{\partial y_{21}}{\partial x_{21}} & \cdots & \frac{\partial y_{21}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{21}}{\partial x_{12}} & \frac{\partial y_{21}}{\partial x_{22}} & \cdots & \frac{\partial y_{21}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{21}}{\partial x_{1n_{2}}} & \frac{\partial y_{21}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{21}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} &
\begin{bmatrix}
\frac{\partial y_{22}}{\partial x_{11}} & \frac{\partial y_{22}}{\partial x_{21}} & \cdots & \frac{\partial y_{22}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{22}}{\partial x_{12}} & \frac{\partial y_{22}}{\partial x_{22}} & \cdots & \frac{\partial y_{22}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{22}}{\partial x_{1n_{2}}} & \frac{\partial y_{22}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{22}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} &
\cdots &
\begin{bmatrix}
\frac{\partial y_{2m_{2}}}{\partial x_{11}} & \frac{\partial y_{2m_{2}}}{\partial x_{21}} & \cdots & \frac{\partial y_{2m_{2}}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{2m_{2}}}{\partial x_{12}} & \frac{\partial y_{2m_{2}}}{\partial x_{22}} & \cdots & \frac{\partial y_{2m_{2}}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{2m_{2}}}{\partial x_{1n_{2}}} & \frac{\partial y_{2m_{2}}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{2m_{2}}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} \\
\vdots & \vdots & \ddots & \vdots \\
\begin{bmatrix}
\frac{\partial y_{m_{1}1}}{\partial x_{11}} & \frac{\partial y_{m_{1}1}}{\partial x_{21}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{m_{1}1}}{\partial x_{12}} & \frac{\partial y_{m_{1}1}}{\partial x_{22}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m_{1}1}}{\partial x_{1n_{2}}} & \frac{\partial y_{m_{1}1}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} &
\begin{bmatrix}
\frac{\partial y_{m_{1}2}}{\partial x_{11}} & \frac{\partial y_{m_{1}2}}{\partial x_{21}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{m_{1}2}}{\partial x_{12}} & \frac{\partial y_{m_{1}2}}{\partial x_{22}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m_{1}2}}{\partial x_{1n_{2}}} & \frac{\partial y_{m_{1}2}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} &
\cdots &
\begin{bmatrix}
\frac{\partial y_{m_{1}m_{2}}}{\partial x_{11}} & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{21}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{m_{1}m_{2}}}{\partial x_{12}} & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{22}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m_{1}m_{2}}}{\partial x_{1n_{2}}} & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix}
\end{bmatrix} \\
\end{align}
$$
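
The same pattern extends to tensors of any dimensions. Below is a generic finite-difference sketch (my own illustration, not taken from any particular library) that builds a numerator-layout derivative: the result has $Y$'s shape followed by $X$'s shape reversed, matching the rule stated earlier.

```python
import numpy as np

def numerator_layout_derivative(f, X, eps=1e-6):
    Y = f(X)
    dYdX = np.zeros(Y.shape + X.shape[::-1])
    for idx in np.ndindex(X.shape):
        X_plus = X.copy()
        X_plus[idx] += eps
        # The entry indexed by (Y index, reversed X index) holds dY / dX_idx.
        dYdX[(...,) + idx[::-1]] = (f(X_plus) - Y) / eps
    return dYdX

# For Y = A @ X with A of shape 2 x 3 and X of shape 3 x 4, the derivative
# has shape 2 x 4 x 4 x 3.
A = np.random.randn(2, 3)
X = np.random.randn(3, 4)
print(numerator_layout_derivative(lambda M: A @ M, X).shape)  # (2, 4, 4, 3)
```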

Tensor Calculus Denominator Layout

Suppose $X \in \mathbb{R}^{n_{1} \times n_{2} \times \cdots \times n_{k}}$ and $Y \in \mathbb{R}^{m_{1} \times m_{2} \times \cdots \times m_{h}}$, where $X$ and $Y$ are tensors of dimensions $k$ and $h$, respectively.

The denominator layout notation for the derivative of $Y$ with respect to $X$, $\frac{\partial Y}{\partial X}$, first iteratively unrolls the tensor $X$ into vectors in the order from the first dimension to the last dimension, and then iteratively unrolls the tensor $Y$ into vectors in the order from the last dimension to the first dimension. The resulting derivative is a tensor $\frac{\partial Y}{\partial X} \in \mathbb{R}^{n_{1} \times n_{2} \times \cdots \times n_{k} \times m_{h} \times m_{h-1} \times \cdots \times m_{1}}$.
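
As with the numerator layout, here is a minimal Python sketch (my own illustration) of the resulting shape.

```python
# Minimal sketch: the denominator-layout derivative keeps X's dimensions in
# order and appends Y's dimensions in reverse order.
def denominator_layout_shape(x_shape, y_shape):
    return tuple(x_shape) + tuple(reversed(y_shape))

# For example, X of shape 4 x 5 x 6 and Y of shape 2 x 3 give a derivative
# of shape 4 x 5 x 6 x 3 x 2.
print(denominator_layout_shape((4, 5, 6), (2, 3)))  # (4, 5, 6, 3, 2)
```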

Note that the order of the dimensions of the derivative $\frac{\partial Y}{\partial X}$ in the denominator layout notation is just the reverse of the order of the dimensions in the numerator layout notation. In this article, we generalize the matrix transpose operation to tensors, and we define the tensor transpose operation as follows.

Given a tensor $X \in \mathbb{R}^{n_{1} \times n_{2} \times \cdots \times n_{k}}$, the tensor transpose operation $X^{\top}$ is defined as the tensor $X^{\top} \in \mathbb{R}^{n_{k} \times n_{k-1} \times \cdots \times n_{1}}$ such that $X_{i_{1} i_{2} \cdots i_{k}} = X^{\top}_{i_{k} i_{k-1} \cdots i_{1}}$ for all $i_{1}, i_{2}, \ldots, i_{k}$.
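In NumPy terms, this tensor transpose is simply a reversal of all axes, as in the following sketch (my own illustration).

```python
import numpy as np

def tensor_transpose(X):
    # Reverse the order of all axes: X[i1, ..., ik] == X_T[ik, ..., i1].
    return np.transpose(X, axes=tuple(reversed(range(X.ndim))))

X = np.random.randn(2, 3, 4)
X_T = tensor_transpose(X)
print(X_T.shape)                   # (4, 3, 2)
print(X[1, 2, 3] == X_T[3, 2, 1])  # True
```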

Let’s see some examples of the denominator layout notations.

Vector-By-Scalar Notation

Suppose $x \in \mathbb{R}$, i.e., $x$ is a scalar or a 0D tensor, and $\mathbf{y} = \{y_{1}, y_{2}, \ldots, y_{m}\} \in \mathbb{R}^{m}$, i.e., $\mathbf{y}$ is a vector of dimension $m$. The derivative $\frac{\partial \mathbf{y}}{\partial x}$ is a vector of the same dimension as $\mathbf{y}$, i.e., $\frac{\partial \mathbf{y}}{\partial x} \in \mathbb{R}^{m}$.

$$
\begin{align}
\frac{\partial \mathbf{y}}{\partial x} &= \left\{\frac{\partial y_{1}}{\partial x}, \frac{\partial y_{2}}{\partial x}, \ldots, \frac{\partial y_{m}}{\partial x}\right\} \\
\end{align}
$$

Note that this denominator layout notation happens to be the same as the numerator layout notation because the transpose of a vector is itself.

Scalar-By-Vector Notation

Suppose $\mathbf{x} = \{x_{1}, x_{2}, \ldots, x_{n}\} \in \mathbb{R}^{n}$, i.e., $\mathbf{x}$ is a vector of dimension $n$, and $y \in \mathbb{R}$, i.e., $y$ is a scalar or a 0D tensor. The derivative $\frac{\partial y}{\partial \mathbf{x}}$ is a vector of the same dimension as $\mathbf{x}$, i.e., $\frac{\partial y}{\partial \mathbf{x}} \in \mathbb{R}^{n}$.

$$
\begin{align}
\frac{\partial y}{\partial \mathbf{x}} &= \left\{\frac{\partial y}{\partial x_{1}}, \frac{\partial y}{\partial x_{2}}, \ldots, \frac{\partial y}{\partial x_{n}}\right\} \\
\end{align}
$$

Note that this denominator layout notation happens to be the same as the numerator layout notation because the transpose of a vector is itself.

Vector-By-Vector Notation

Suppose $\mathbf{x} = \{x_{1}, x_{2}, \ldots, x_{n}\} \in \mathbb{R}^{n}$, i.e., $\mathbf{x}$ is a vector of dimension $n$, and $\mathbf{y} = \{y_{1}, y_{2}, \ldots, y_{m}\} \in \mathbb{R}^{m}$, i.e., $\mathbf{y}$ is a vector of dimension $m$. The derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ is a matrix of the size $n \times m$, i.e., $\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{n \times m}$.

$$
\begin{align}
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
&=
\left\{
\frac{\partial \mathbf{y}}{\partial x_{1}}, \frac{\partial \mathbf{y}}{\partial x_{2}}, \ldots, \frac{\partial \mathbf{y}}{\partial x_{n}}
\right\} \\
&=
\left\{
\left\{
\frac{\partial y_{1}}{\partial x_{1}}, \frac{\partial y_{2}}{\partial x_{1}}, \ldots, \frac{\partial y_{m}}{\partial x_{1}}\right\},
\left\{
\frac{\partial y_{1}}{\partial x_{2}}, \frac{\partial y_{2}}{\partial x_{2}}, \ldots, \frac{\partial y_{m}}{\partial x_{2}}\right\},
\ldots,
\left\{
\frac{\partial y_{1}}{\partial x_{n}}, \frac{\partial y_{2}}{\partial x_{n}}, \ldots, \frac{\partial y_{m}}{\partial x_{n}}\right\}
\right\} \\
&=
\begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}} \\
\frac{\partial y_{1}}{\partial x_{2}} & \frac{\partial y_{2}}{\partial x_{2}} & \cdots & \frac{\partial y_{m}}{\partial x_{2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1}}{\partial x_{n}} & \frac{\partial y_{2}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{bmatrix} \\
\end{align}
$$
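
A quick numerical check (my own sketch, again assuming $\mathbf{y} = A\mathbf{x}$): the denominator-layout derivative is the $n \times m$ matrix $A^{\top}$, i.e., the transpose of the numerator-layout Jacobian.

```python
import numpy as np

m, n = 3, 4
A = np.random.randn(m, n)
x = np.random.randn(n)
eps = 1e-6

# Row j holds dy / dx_j, i.e., the denominator layout.
jac_denominator = np.zeros((n, m))
for j in range(n):
    x_plus = x.copy()
    x_plus[j] += eps
    jac_denominator[j, :] = (A @ x_plus - A @ x) / eps

print(np.allclose(jac_denominator, A.T, atol=1e-4))  # True
```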

Matrix-By-Scalar Notation

Suppose $x \in \mathbb{R}$, i.e., $x$ is a scalar or a 0D tensor, and $\mathbf{Y} \in \mathbb{R}^{m_{1} \times m_{2}}$, i.e., $\mathbf{Y}$ is a matrix of size $m_{1} \times m_{2}$.

$$
\begin{align}
\mathbf{Y}
&=
\begin{bmatrix}
y_{11} & y_{12} & \cdots & y_{1m_{2}} \\
y_{21} & y_{22} & \cdots & y_{2m_{2}} \\
\vdots & \vdots & \ddots & \vdots \\
y_{m_{1}1} & y_{m_{1}2} & \cdots & y_{m_{1}m_{2}} \\
\end{bmatrix} \\
\end{align}
$$

The derivative $\frac{\partial \mathbf{Y}}{\partial x}$ is a matrix of size $m_{2} \times m_{1}$, i.e., $\frac{\partial \mathbf{Y}}{\partial x} \in \mathbb{R}^{m_{2} \times m_{1}}$.

$$
\begin{align}
\frac{\partial \mathbf{Y}}{\partial x} &=
\left\{
\frac{\partial \mathbf{Y}_{:,1}}{\partial x}, \frac{\partial \mathbf{Y}_{:,2}}{\partial x}, \ldots, \frac{\partial \mathbf{Y}_{:,m_{2}}}{\partial x}
\right\} \\
&=
\left\{
\left\{
\frac{\partial y_{11}}{\partial x}, \frac{\partial y_{21}}{\partial x}, \ldots, \frac{\partial y_{m_{1}1}}{\partial x}\right\},
\left\{
\frac{\partial y_{12}}{\partial x}, \frac{\partial y_{22}}{\partial x}, \ldots, \frac{\partial y_{m_{1}2}}{\partial x}\right\},
\ldots,
\left\{
\frac{\partial y_{1m_{2}}}{\partial x}, \frac{\partial y_{2m_{2}}}{\partial x}, \ldots, \frac{\partial y_{m_{1}m_{2}}}{\partial x}\right\}
\right\} \\
&=
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{21}}{\partial x} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x} \\
\frac{\partial y_{12}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x} & \frac{\partial y_{2m_{2}}}{\partial x} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x}
\end{bmatrix} \\
\end{align}
$$

Scalar-By-Matrix Notation

Suppose $\mathbf{X} \in \mathbb{R}^{n_{1} \times n_{2}}$, i.e., $\mathbf{X}$ is a matrix of size $n_{1} \times n_{2}$, and $y \in \mathbb{R}$, i.e., $y$ is a scalar or a 0D tensor.

$$
\begin{align}
\mathbf{X}
&=
\left\{
\mathbf{X}_{1,:}, \mathbf{X}_{2,:}, \ldots, \mathbf{X}_{n_{1},:}
\right\} \\
&=
\left\{
\left\{
x_{11}, x_{12}, \ldots, x_{1n_{2}}\right\},
\left\{
x_{21}, x_{22}, \ldots, x_{2n_{2}}\right\},
\ldots,
\left\{
x_{n_{1}1}, x_{n_{1}2}, \ldots, x_{n_{1}n_{2}}\right\}
\right\} \\
&=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n_{2}} \\
x_{21} & x_{22} & \cdots & x_{2n_{2}} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n_{1}1} & x_{n_{1}2} & \cdots & x_{n_{1}n_{2}} \\
\end{bmatrix} \\
\end{align}
$$

The derivative $\frac{\partial y}{\partial \mathbf{X}}$ is a matrix of the same size as $\mathbf{X}$, i.e., $\frac{\partial y}{\partial \mathbf{X}} \in \mathbb{R}^{n_{1} \times n_{2}}$.

$$
\begin{align}
\frac{\partial y}{\partial \mathbf{X}} &=
\left\{
\frac{\partial y}{\partial \mathbf{X}_{1,:}}, \frac{\partial y}{\partial \mathbf{X}_{2,:}}, \ldots, \frac{\partial y}{\partial \mathbf{X}_{n_{1},:}}
\right\} \\
&=
\left\{
\left\{
\frac{\partial y}{\partial x_{11}}, \frac{\partial y}{\partial x_{12}}, \ldots, \frac{\partial y}{\partial x_{1n_{2}}}\right\},
\left\{
\frac{\partial y}{\partial x_{21}}, \frac{\partial y}{\partial x_{22}}, \ldots, \frac{\partial y}{\partial x_{2n_{2}}}\right\},
\ldots,
\left\{
\frac{\partial y}{\partial x_{n_{1}1}}, \frac{\partial y}{\partial x_{n_{1}2}}, \ldots, \frac{\partial y}{\partial x_{n_{1}n_{2}}}\right\}
\right\} \\
&=
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1n_{2}}} \\
\frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2n_{2}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y}{\partial x_{n_{1}1}} & \frac{\partial y}{\partial x_{n_{1}2}} & \cdots & \frac{\partial y}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} \\
\end{align}
$$
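
This is the case most familiar from machine learning: the gradient of a scalar loss with respect to a matrix parameter has the same shape as the parameter, which is how deep learning frameworks typically store gradients. A small illustration (my own sketch, assuming $y = \mathbf{a}^{\top} \mathbf{X} \mathbf{b}$):

```python
import numpy as np

n1, n2 = 3, 5
a, b = np.random.randn(n1), np.random.randn(n2)
X = np.random.randn(n1, n2)

# dy / dx_ij = a_i * b_j for y = a^T X b; in the denominator layout these
# entries keep X's own n1 x n2 arrangement.
denominator_layout_grad = np.outer(a, b)
print(denominator_layout_grad.shape == X.shape)  # True
```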

Matrix-By-Vector Notation

Suppose $\mathbf{x} \in \mathbb{R}^{n}$, i.e., $\mathbf{x}$ is a vector of dimension $n$, and $\mathbf{Y} \in \mathbb{R}^{m_{1} \times m_{2}}$, i.e., $\mathbf{Y}$ is a matrix of size $m_{1} \times m_{2}$. The derivative $\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}$ is a tensor of size $n \times m_{2} \times m_{1}$, i.e., $\frac{\partial \mathbf{Y}}{\partial \mathbf{x}} \in \mathbb{R}^{n \times m_{2} \times m_{1}}$.

$$
\begin{align}
\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}
&=
\left\{
\frac{\partial \mathbf{Y}}{\partial x_{1}}, \frac{\partial \mathbf{Y}}{\partial x_{2}}, \ldots, \frac{\partial \mathbf{Y}}{\partial x_{n}}
\right\} \\
&=
\left\{
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{1}} & \frac{\partial y_{21}}{\partial x_{1}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{1}} \\
\frac{\partial y_{12}}{\partial x_{1}} & \frac{\partial y_{22}}{\partial x_{1}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{1}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{1}} & \frac{\partial y_{2m_{2}}}{\partial x_{1}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{1}}
\end{bmatrix},
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{2}} & \frac{\partial y_{21}}{\partial x_{2}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{2}} \\
\frac{\partial y_{12}}{\partial x_{2}} & \frac{\partial y_{22}}{\partial x_{2}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{2}} & \frac{\partial y_{2m_{2}}}{\partial x_{2}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{2}}
\end{bmatrix},
\ldots,
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{n}} & \frac{\partial y_{21}}{\partial x_{n}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{n}} \\
\frac{\partial y_{12}}{\partial x_{n}} & \frac{\partial y_{22}}{\partial x_{n}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{n}} & \frac{\partial y_{2m_{2}}}{\partial x_{n}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{n}}
\end{bmatrix}
\right\} \\
\end{align}
$$

Vector-By-Matrix Notation

Suppose $\mathbf{X} \in \mathbb{R}^{n_{1} \times n_{2}}$, i.e., $\mathbf{X}$ is a matrix of size $n_{1} \times n_{2}$, and $\mathbf{y} = \{y_{1}, y_{2}, \ldots, y_{m}\} \in \mathbb{R}^{m}$, i.e., $\mathbf{y}$ is a vector of dimension $m$. The derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{X}}$ is a tensor of size $n_{1} \times n_{2} \times m$, i.e., $\frac{\partial \mathbf{y}}{\partial \mathbf{X}} \in \mathbb{R}^{n_{1} \times n_{2} \times m}$.

$$
\begin{align}
\frac{\partial \mathbf{y}}{\partial \mathbf{X}}
&=
\left\{
\frac{\partial \mathbf{y}}{\partial \mathbf{X}_{1,:}}, \frac{\partial \mathbf{y}}{\partial \mathbf{X}_{2,:}}, \ldots, \frac{\partial \mathbf{y}}{\partial \mathbf{X}_{n_{1},:}}
\right\} \\
&=
\left\{
\begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{11}} & \frac{\partial y_{2}}{\partial x_{11}} & \cdots & \frac{\partial y_{m}}{\partial x_{11}} \\
\frac{\partial y_{1}}{\partial x_{12}} & \frac{\partial y_{2}}{\partial x_{12}} & \cdots & \frac{\partial y_{m}}{\partial x_{12}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1}}{\partial x_{1n_{2}}} & \frac{\partial y_{2}}{\partial x_{1n_{2}}} & \cdots & \frac{\partial y_{m}}{\partial x_{1n_{2}}}
\end{bmatrix},
\begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{21}} & \frac{\partial y_{2}}{\partial x_{21}} & \cdots & \frac{\partial y_{m}}{\partial x_{21}} \\
\frac{\partial y_{1}}{\partial x_{22}} & \frac{\partial y_{2}}{\partial x_{22}} & \cdots & \frac{\partial y_{m}}{\partial x_{22}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1}}{\partial x_{2n_{2}}} & \frac{\partial y_{2}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{m}}{\partial x_{2n_{2}}}
\end{bmatrix},
\ldots,
\begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{n_{1}1}} & \frac{\partial y_{2}}{\partial x_{n_{1}1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{1}}{\partial x_{n_{1}2}} & \frac{\partial y_{2}}{\partial x_{n_{1}2}} & \cdots & \frac{\partial y_{m}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1}}{\partial x_{n_{1}n_{2}}} & \frac{\partial y_{2}}{\partial x_{n_{1}n_{2}}} & \cdots & \frac{\partial y_{m}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix}
\right\} \\
\end{align}
$$

Matrix-By-Matrix Notation

Suppose $\mathbf{X} \in \mathbb{R}^{n_{1} \times n_{2}}$, i.e., $\mathbf{X}$ is a matrix of size $n_{1} \times n_{2}$, and $\mathbf{Y} \in \mathbb{R}^{m_{1} \times m_{2}}$, i.e., $\mathbf{Y}$ is a matrix of size $m_{1} \times m_{2}$. The derivative $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$ is a tensor of size $n_{1} \times n_{2} \times m_{2} \times m_{1}$, i.e., $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}} \in \mathbb{R}^{n_{1} \times n_{2} \times m_{2} \times m_{1}}$.

$$
\begin{align}
\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}
&=
\begin{bmatrix}
\frac{\partial \mathbf{Y}}{\partial x_{11}} & \frac{\partial \mathbf{Y}}{\partial x_{12}} & \cdots & \frac{\partial \mathbf{Y}}{\partial x_{1n_{2}}} \\
\frac{\partial \mathbf{Y}}{\partial x_{21}} & \frac{\partial \mathbf{Y}}{\partial x_{22}} & \cdots & \frac{\partial \mathbf{Y}}{\partial x_{2n_{2}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial \mathbf{Y}}{\partial x_{n_{1}1}} & \frac{\partial \mathbf{Y}}{\partial x_{n_{1}2}} & \cdots & \frac{\partial \mathbf{Y}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix} \\
&=
\begin{bmatrix}
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{11}} & \frac{\partial y_{21}}{\partial x_{11}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{11}} \\
\frac{\partial y_{12}}{\partial x_{11}} & \frac{\partial y_{22}}{\partial x_{11}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{11}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{11}} & \frac{\partial y_{2m_{2}}}{\partial x_{11}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{11}}
\end{bmatrix} &
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{12}} & \frac{\partial y_{21}}{\partial x_{12}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{12}} \\
\frac{\partial y_{12}}{\partial x_{12}} & \frac{\partial y_{22}}{\partial x_{12}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{12}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{12}} & \frac{\partial y_{2m_{2}}}{\partial x_{12}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{12}}
\end{bmatrix} &
\cdots &
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{1n_{2}}} & \frac{\partial y_{21}}{\partial x_{1n_{2}}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{1n_{2}}} \\
\frac{\partial y_{12}}{\partial x_{1n_{2}}} & \frac{\partial y_{22}}{\partial x_{1n_{2}}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{1n_{2}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{1n_{2}}} & \frac{\partial y_{2m_{2}}}{\partial x_{1n_{2}}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{1n_{2}}}
\end{bmatrix} \\
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{21}} & \frac{\partial y_{21}}{\partial x_{21}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{21}} \\
\frac{\partial y_{12}}{\partial x_{21}} & \frac{\partial y_{22}}{\partial x_{21}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{21}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{21}} & \frac{\partial y_{2m_{2}}}{\partial x_{21}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{21}}
\end{bmatrix} &
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{22}} & \frac{\partial y_{21}}{\partial x_{22}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{22}} \\
\frac{\partial y_{12}}{\partial x_{22}} & \frac{\partial y_{22}}{\partial x_{22}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{22}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{22}} & \frac{\partial y_{2m_{2}}}{\partial x_{22}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{22}}
\end{bmatrix} &
\cdots &
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{2n_{2}}} & \frac{\partial y_{21}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{2n_{2}}} \\
\frac{\partial y_{12}}{\partial x_{2n_{2}}} & \frac{\partial y_{22}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{2n_{2}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{2n_{2}}} & \frac{\partial y_{2m_{2}}}{\partial x_{2n_{2}}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{2n_{2}}}
\end{bmatrix} \\
\vdots & \vdots & \ddots & \vdots \\
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{n_{1}1}} & \frac{\partial y_{21}}{\partial x_{n_{1}1}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{n_{1}1}} \\
\frac{\partial y_{12}}{\partial x_{n_{1}1}} & \frac{\partial y_{22}}{\partial x_{n_{1}1}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{n_{1}1}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{n_{1}1}} & \frac{\partial y_{2m_{2}}}{\partial x_{n_{1}1}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{n_{1}1}} \\
\end{bmatrix} &
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{n_{1}2}} & \frac{\partial y_{21}}{\partial x_{n_{1}2}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{n_{1}2}} \\
\frac{\partial y_{12}}{\partial x_{n_{1}2}} & \frac{\partial y_{22}}{\partial x_{n_{1}2}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{n_{1}2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{n_{1}2}} & \frac{\partial y_{2m_{2}}}{\partial x_{n_{1}2}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{n_{1}2}} \\
\end{bmatrix} &
\cdots &
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x_{n_{1}n_{2}}} & \frac{\partial y_{21}}{\partial x_{n_{1}n_{2}}} & \cdots & \frac{\partial y_{m_{1}1}}{\partial x_{n_{1}n_{2}}} \\
\frac{\partial y_{12}}{\partial x_{n_{1}n_{2}}} & \frac{\partial y_{22}}{\partial x_{n_{1}n_{2}}} & \cdots & \frac{\partial y_{m_{1}2}}{\partial x_{n_{1}n_{2}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{1m_{2}}}{\partial x_{n_{1}n_{2}}} & \frac{\partial y_{2m_{2}}}{\partial x_{n_{1}n_{2}}} & \cdots & \frac{\partial y_{m_{1}m_{2}}}{\partial x_{n_{1}n_{2}}}
\end{bmatrix}
\end{bmatrix} \\
\end{align}
$$
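
Finally, the general denominator-layout derivative can be sketched by finite differences in the same way as before (my own illustration): $X$'s shape comes first, followed by $Y$'s shape reversed.

```python
import numpy as np

def denominator_layout_derivative(f, X, eps=1e-6):
    Y = f(X)
    dYdX = np.zeros(X.shape + Y.shape[::-1])
    for idx in np.ndindex(X.shape):
        X_plus = X.copy()
        X_plus[idx] += eps
        # The block indexed by idx holds the tensor transpose of dY / dX_idx.
        dYdX[idx] = ((f(X_plus) - Y) / eps).T
    return dYdX

# For Y = A @ X with A of shape 2 x 3 and X of shape 3 x 4, the derivative
# has shape 3 x 4 x 4 x 2.
A = np.random.randn(2, 3)
X = np.random.randn(3, 4)
print(denominator_layout_derivative(lambda M: A @ M, X).shape)  # (3, 4, 4, 2)
```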

Miscellaneous

The Wikipedia “Matrix Calculus” article also describes the layout conventions for matrix calculus. In my opinion, however, its description is confusing and does not indicate how the notation generalizes to higher-dimensional tensors. For example, it uses row-vector and column-vector notations, which are essentially 2D matrices, for 1D vectors, and this breaks the mathematical consistency of the notations from lower-dimensional tensors to higher-dimensional tensors. As the reader can see from this article, I have not used the row-vector and column-vector notations for vectors at all, precisely so that the notations extend to higher-dimensional tensors. In addition, the transpose operation is not well defined there for higher-dimensional tensors. Finally, the rules from the Wikipedia article are too complicated to memorize, whereas the rules I have used in this article are simple, intuitive, and consistent for tensors of all dimensions.

References

Matrix Calculus - Wikipedia. https://en.wikipedia.org/wiki/Matrix_calculus