CuTe Inverse Layout
Introduction
In my previous article “CuTe Index To Coordinate”, I have discussed how to compute coordinate from index for a given layout.
In this article, I would like to go one step further and derive the inverse layout given a layout mathematically, which generalizes the index to coordinate mapping.
CuTe Inverse Layout
Suppose we have a compact layout $L = \mathbf{S}:\mathbf{D}$ where $\mathbf{S} = (M_0, M_1, \ldots, M_\alpha)$ and $\mathbf{D} = (d_0, d_1, \ldots, d_\alpha)$ are the respective shape and stride tuples of the layout, and $M = M_0 \cdot M_1 \cdots M_\alpha$. We want to construct an inverse layout $L^{-1} = \mathbf{S}^{-1}:\mathbf{D}^{-1}$, where $\mathbf{S}^{-1} = (M_0^{-1}, M_1^{-1}, \ldots, M_\alpha^{-1})$ and $\mathbf{D}^{-1} = (d_0^{-1}, d_1^{-1}, \ldots, d_\alpha^{-1})$, such that $f_{L^{-1}}(f_L(x)) = x$ for all $x < \text{size}(L)$.
Because it is a compact layout, we have the sorted layout $L^{\prime} = \mathbf{S}^{\prime}:\mathbf{D}^{\prime}$ where $\mathbf{S}^{\prime} = (M_0^{\prime}, M_1^{\prime}, \ldots, M_\alpha^{\prime})$ and $\mathbf{D}^{\prime} = (d_0^{\prime}, d_1^{\prime}, \ldots, d_\alpha^{\prime})$, and $d_i^{\prime} < d_{i+1}^{\prime}$ for all $i < \alpha$. More specifically, we have $d_{0} = 1$, $d_{1} = M_0^{\prime}$, $d_{2} = M_0^{\prime} \cdot M_1^{\prime}$, $\cdots$, $d_{\alpha} = M_0^{\prime} \cdot M_1^{\prime} \cdots M_{\alpha-1}^{\prime}$.
For the layout $L$, given an one-dimensional coordinate mapping to $(\alpha + 1)$-dimensional coordinates $x \leftrightarrow (x_{0}, x_{1}, \ldots, x_{\alpha})$, and further maps to the permutate coordinates for the sorted layout $L^{\prime}$, $x \leftrightarrow (x_{0}, x_{1}, \ldots, x_{\alpha}) \leftrightarrow (x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime})$.
$$
\begin{align}
x &= x_{0} + x_{1} \cdot M_0 + x_{2} \cdot M_0 \cdot M_1 + \cdots + x_{\alpha} \cdot M_0 \cdot M_1 \cdots M_{\alpha-1} \\
\end{align}
$$
$$
\begin{align}
f_{L}(x) &= f_{L}(x_{0}, x_{1}, \ldots, x_{\alpha}) \\
&= x_{0} \cdot d_{0} + x_{1} \cdot d_{1} + \cdots + x_{\alpha} \cdot d_{\alpha} \\
&= x_{0}^{\prime} \cdot d_{0}^{\prime} + x_{1}^{\prime} \cdot d_{1}^{\prime} + \cdots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{\prime} \\
&= x_{0}^{\prime} + x_{1}^{\prime} \cdot M_0^{\prime} + x_{2}^{\prime} \cdot M_0^{\prime} \cdot M_1^{\prime} + \cdots + x_{\alpha}^{\prime} \cdot M_0^{\prime} \cdot M_1^{\prime} \cdots M_{\alpha-1}^{\prime} \\
\end{align}
$$
Suppose the shape of the inverse layout $L^{-1}$ is given by $\mathbf{S}^{-1} = \left(M_0^{\prime}, M_1^{\prime}, \ldots, M_{\alpha - 1}^{\prime}, \frac{M}{M_0^{\prime} \cdot M_1^{\prime} \cdots M_{\alpha-1}^{\prime}}\right)$, the one-dimensional coordinate $f_{L}(x)$ mapping to the $(\alpha + 1)$-dimensional coordinates is given by $f_{L}(x) \leftrightarrow (x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime})$.
We want to construct the inverse layout $L^{-1}$ such that:
$$
\begin{align}
f_{L^{-1}}(f_{L}(x)) &= f_{L^{-1}}(x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime}) \\
&= x_{0}^{\prime} \cdot d_{0}^{-1} + x_{1}^{\prime} \cdot d_{1}^{-1} + \cdots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{-1} \\
&= x \\
&= x_{0} + x_{1} \cdot M_0 + x_{2} \cdot M_0 \cdot M_1 + \cdots + x_{\alpha} \cdot M_0 \cdot M_1 \cdots M_{\alpha-1} \\
\end{align}
$$
Because there is an one-to-one mapping between the coordinate before permutation $(x_{0}, x_{1}, \ldots, x_{\alpha})$ and the coordinate after permutation $(x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime})$, it is straightforward to find the one-to-one mapping between $(d_{0}^{-1}, d_{1}^{-1}, \ldots, d_{\alpha}^{-1})$ and $(1, M_0, M_0 \cdot M_1, \ldots, M_0 \cdot M_1 \cdots M_{\alpha-1})$ such that $f_{L^{-1}}(f_{L}(x)) = x$.
Suppose the sort index for the layout $L$ is given by $\mathbf{I} = (i_0, i_1, \ldots, i_\alpha)$, where $\mathbf{I}$ is a permutation index that maps to the set $\{0, 1, \ldots, \alpha\}$, $M_{0}^{\prime} = M_{i_0}$, $M_{1}^{\prime} = M_{i_1}$, $\ldots$, $M_{\alpha}^{\prime} = M_{i_\alpha}$, and $d_{0}^{\prime} = d_{i_0}$, $d_{1}^{\prime} = d_{i_1}$, $\ldots$, $d_{\alpha}^{\prime} = d_{i_\alpha}$.
We have the sorted layout $L^{\prime} = \mathbf{S}^{\prime}:\mathbf{D}^{\prime}$, where $\mathbf{S}^{\prime} = (M_{i_0}, M_{i_1}, \ldots, M_{i_\alpha})$ and $\mathbf{D}^{\prime} = (d_{i_0}, d_{i_1}, \ldots, d_{i_\alpha})$. More specifically, we have $d_{i_0} = 1$, $d_{i_1} = M_{i_0}$, $d_{i_2} = M_{i_0} \cdot M_{i_1}$, $\cdots$, $d_{i_\alpha} = M_{i_0} \cdot M_{i_1} \cdots M_{i_{\alpha-1}}$.
Because
$$
\begin{align}
f_{L^{-1}}(f_{L}(x)) &= f_{L^{-1}}(x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime}) \\
&= x_{0}^{\prime} \cdot d_{0}^{-1} + x_{1}^{\prime} \cdot d_{1}^{-1} + \cdots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{-1} \\
&= x_{i_0} \cdot d_{0}^{-1} + x_{i_1} \cdot d_{1}^{-1} + \cdots + x_{i_\alpha} \cdot d_{\alpha}^{-1} \\
&= x_{0} + x_{1} \cdot M_0 + x_{2} \cdot M_0 \cdot M_1 + \cdots + x_{\alpha} \cdot M_0 \cdot M_1 \cdots M_{\alpha-1} \\
\end{align}
$$
We must have $d_{0}^{-1} = M_{0} \cdots M_{1} \cdots M_{i_0 - 1}$, $d_{1}^{-1} = M_{0} \cdots M_{1} \cdots M_{i_1 - 1}$, $\cdots$, $d_{\alpha}^{-1} = M_{0} \cdots M_{1} \cdots M_{i_{\alpha}-1}$. This can also be rewritten as $d_{j}^{-1} = \prod_{k=0}^{i_j - 1} M_{k}$.
Therefore, the inverse layout $L^{-1}$ is given by $L^{-1} = \mathbf{S}^{-1}:\mathbf{D}^{-1}$, where $\mathbf{S}^{-1} = \left(M_{i_0}, M_{i_1}, \ldots, M_{i_{\alpha - 1}}, \frac{M}{M_{i_0} \cdot M_{i_1} \cdots M_{i_{\alpha-1}}}\right)$ and $\mathbf{D}^{-1} = \left(\prod_{k=0}^{i_0 - 1} M_{k}, \prod_{k=0}^{i_1 - 1} M_{k}, \ldots, \prod_{k=0}^{i_{\alpha-1} - 1} M_{k}, \prod_{k=0}^{i_{\alpha} - 1} M_{k}\right)$.
Example
Using the above derived formula for inverse layout, we could compute some inverse layouts manually and compare with the inverse layout computed by CuTe.
Suppose we have a layout $L = (8, 16, 4): (64, 1, 16)$, where $M_{0} = 8$, $M_1 = 16$, $M_2 = 4$, and $M = M_{0} \cdot M_1 \cdot M_2 = 8 \cdot 16 \cdot 4 = 512$. To compute the inverse layout $L^{-1}$, we first compute the sorted layout $L^{\prime} = (16, 4, 8): (1, 16, 64)$, where the sort index is given by $\mathbf{I} = (1, 2, 0)$. Thus, we have $i_{0} = 1$, $i_{1} = 2$, and $i_{2} = 0$.
The shape of the inverse layout is given by
$$
\begin{align}
\mathbf{S}^{-1} &= \left(M_{i_0}, M_{i_1}, \frac{M}{M_{i_0} \cdot M_{i_1}}\right) \\
&= \left(M_{1}, M_{2}, \frac{M}{M_{1} \cdot M_{2}}\right) \\
&= \left(16, 4, \frac{512}{16 \cdot 4}\right) \\
&= (16, 4, 8) \\
\end{align}
$$
The stride of the inverse layout is given by
$$
\begin{align}
\mathbf{D}^{-1} &= \left(\prod_{k=0}^{i_0 - 1} M_{k}, \prod_{k=0}^{i_1 - 1} M_{k}, \prod_{k=0}^{i_{2} - 1} M_{k}\right) \\
&= \left(\prod_{k=0}^{1 - 1} M_{k}, \prod_{k=0}^{2 - 1} M_{k}, \prod_{k=0}^{0 - 1} M_{k}\right) \\
&= \left(\prod_{k=0}^{0} M_{k}, \prod_{k=0}^{1} M_{k}, \prod_{k=0}^{-1} M_{k}\right) \\
&= \left(M_{0}, M_{0} \cdot M_{1}, 1\right) \\
&= (8, 8 \cdot 16, 1) \\
&= (8, 128, 1) \\
\end{align}
$$
Therefore, the inverse layout for the layout $L = (8, 16, 4): (64, 1, 16)$ is $L^{-1} = (16, 4, 8): (8, 128, 1)$. The inverse layout can be further simplified using coalesce, resulting in $L^{-1} = (64, 8) : (8, 1)$.
To compute the inverse layout using CuTe, we can use the right_inverse
function as follows:
1 |
|
We could see that the manually computed right inverse layout and the one computed by CuTe are the same, confirming our inverse layout derivation.
1 | $ ./cute_layout_playground |
References
CuTe Inverse Layout