CuTe Inverse Layout

Introduction

In my previous article “CuTe Index To Coordinate”, I have discussed how to compute coordinate from index for a given layout.

In this article, I would like to go one step further and derive the inverse layout given a layout mathematically, which generalizes the index to coordinate mapping.

CuTe Inverse Layout

Suppose we have a compact layout $L = \mathbf{S}:\mathbf{D}$ where $\mathbf{S} = (M_0, M_1, \ldots, M_\alpha)$ and $\mathbf{D} = (d_0, d_1, \ldots, d_\alpha)$ are the respective shape and stride tuples of the layout, and $M = M_0 \cdot M_1 \cdots M_\alpha$. We want to construct an inverse layout $L^{-1} = \mathbf{S}^{-1}:\mathbf{D}^{-1}$, where $\mathbf{S}^{-1} = (M_0^{-1}, M_1^{-1}, \ldots, M_\alpha^{-1})$ and $\mathbf{D}^{-1} = (d_0^{-1}, d_1^{-1}, \ldots, d_\alpha^{-1})$, such that $f_{L^{-1}}(f_L(x)) = x$ for all $x < \text{size}(L)$.

Because it is a compact layout, we have the sorted layout $L^{\prime} = \mathbf{S}^{\prime}:\mathbf{D}^{\prime}$ where $\mathbf{S}^{\prime} = (M_0^{\prime}, M_1^{\prime}, \ldots, M_\alpha^{\prime})$ and $\mathbf{D}^{\prime} = (d_0^{\prime}, d_1^{\prime}, \ldots, d_\alpha^{\prime})$, and $d_i^{\prime} < d_{i+1}^{\prime}$ for all $i < \alpha$. More specifically, we have $d_{0} = 1$, $d_{1} = M_0^{\prime}$, $d_{2} = M_0^{\prime} \cdot M_1^{\prime}$, $\cdots$, $d_{\alpha} = M_0^{\prime} \cdot M_1^{\prime} \cdots M_{\alpha-1}^{\prime}$.

For the layout $L$, given an one-dimensional coordinate mapping to $(\alpha + 1)$-dimensional coordinates $x \leftrightarrow (x_{0}, x_{1}, \ldots, x_{\alpha})$, and further maps to the permutate coordinates for the sorted layout $L^{\prime}$, $x \leftrightarrow (x_{0}, x_{1}, \ldots, x_{\alpha}) \leftrightarrow (x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime})$.

$$
\begin{align}
x &= x_{0} + x_{1} \cdot M_0 + x_{2} \cdot M_0 \cdot M_1 + \cdots + x_{\alpha} \cdot M_0 \cdot M_1 \cdots M_{\alpha-1} \\
\end{align}
$$

$$
\begin{align}
f_{L}(x) &= f_{L}(x_{0}, x_{1}, \ldots, x_{\alpha}) \\
&= x_{0} \cdot d_{0} + x_{1} \cdot d_{1} + \cdots + x_{\alpha} \cdot d_{\alpha} \\
&= x_{0}^{\prime} \cdot d_{0}^{\prime} + x_{1}^{\prime} \cdot d_{1}^{\prime} + \cdots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{\prime} \\
&= x_{0}^{\prime} + x_{1}^{\prime} \cdot M_0^{\prime} + x_{2}^{\prime} \cdot M_0^{\prime} \cdot M_1^{\prime} + \cdots + x_{\alpha}^{\prime} \cdot M_0^{\prime} \cdot M_1^{\prime} \cdots M_{\alpha-1}^{\prime} \\
\end{align}
$$

Suppose the shape of the inverse layout $L^{-1}$ is given by $\mathbf{S}^{-1} = \left(M_0^{\prime}, M_1^{\prime}, \ldots, M_{\alpha - 1}^{\prime}, \frac{M}{M_0^{\prime} \cdot M_1^{\prime} \cdots M_{\alpha-1}^{\prime}}\right)$, the one-dimensional coordinate $f_{L}(x)$ mapping to the $(\alpha + 1)$-dimensional coordinates is given by $f_{L}(x) \leftrightarrow (x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime})$.

We want to construct the inverse layout $L^{-1}$ such that:

$$
\begin{align}
f_{L^{-1}}(f_{L}(x)) &= f_{L^{-1}}(x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime}) \\
&= x_{0}^{\prime} \cdot d_{0}^{-1} + x_{1}^{\prime} \cdot d_{1}^{-1} + \cdots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{-1} \\
&= x \\
&= x_{0} + x_{1} \cdot M_0 + x_{2} \cdot M_0 \cdot M_1 + \cdots + x_{\alpha} \cdot M_0 \cdot M_1 \cdots M_{\alpha-1} \\
\end{align}
$$

Because there is an one-to-one mapping between the coordinate before permutation $(x_{0}, x_{1}, \ldots, x_{\alpha})$ and the coordinate after permutation $(x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime})$, it is straightforward to find the one-to-one mapping between $(d_{0}^{-1}, d_{1}^{-1}, \ldots, d_{\alpha}^{-1})$ and $(1, M_0, M_0 \cdot M_1, \ldots, M_0 \cdot M_1 \cdots M_{\alpha-1})$ such that $f_{L^{-1}}(f_{L}(x)) = x$.

Suppose the sort index for the layout $L$ is given by $\mathbf{I} = (i_0, i_1, \ldots, i_\alpha)$, where $\mathbf{I}$ is a permutation index that maps to the set $\{0, 1, \ldots, \alpha\}$, $M_{0}^{\prime} = M_{i_0}$, $M_{1}^{\prime} = M_{i_1}$, $\ldots$, $M_{\alpha}^{\prime} = M_{i_\alpha}$, and $d_{0}^{\prime} = d_{i_0}$, $d_{1}^{\prime} = d_{i_1}$, $\ldots$, $d_{\alpha}^{\prime} = d_{i_\alpha}$.

We have the sorted layout $L^{\prime} = \mathbf{S}^{\prime}:\mathbf{D}^{\prime}$, where $\mathbf{S}^{\prime} = (M_{i_0}, M_{i_1}, \ldots, M_{i_\alpha})$ and $\mathbf{D}^{\prime} = (d_{i_0}, d_{i_1}, \ldots, d_{i_\alpha})$. More specifically, we have $d_{i_0} = 1$, $d_{i_1} = M_{i_0}$, $d_{i_2} = M_{i_0} \cdot M_{i_1}$, $\cdots$, $d_{i_\alpha} = M_{i_0} \cdot M_{i_1} \cdots M_{i_{\alpha-1}}$.

Because

$$
\begin{align}
f_{L^{-1}}(f_{L}(x)) &= f_{L^{-1}}(x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime}) \\
&= x_{0}^{\prime} \cdot d_{0}^{-1} + x_{1}^{\prime} \cdot d_{1}^{-1} + \cdots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{-1} \\
&= x_{i_0} \cdot d_{0}^{-1} + x_{i_1} \cdot d_{1}^{-1} + \cdots + x_{i_\alpha} \cdot d_{\alpha}^{-1} \\
&= x_{0} + x_{1} \cdot M_0 + x_{2} \cdot M_0 \cdot M_1 + \cdots + x_{\alpha} \cdot M_0 \cdot M_1 \cdots M_{\alpha-1} \\
\end{align}
$$

We must have $d_{0}^{-1} = M_{0} \cdots M_{1} \cdots M_{i_0 - 1}$, $d_{1}^{-1} = M_{0} \cdots M_{1} \cdots M_{i_1 - 1}$, $\cdots$, $d_{\alpha}^{-1} = M_{0} \cdots M_{1} \cdots M_{i_{\alpha}-1}$. This can also be rewritten as $d_{j}^{-1} = \prod_{k=0}^{i_j - 1} M_{k}$.

Therefore, the inverse layout $L^{-1}$ is given by $L^{-1} = \mathbf{S}^{-1}:\mathbf{D}^{-1}$, where $\mathbf{S}^{-1} = \left(M_{i_0}, M_{i_1}, \ldots, M_{i_{\alpha - 1}}, \frac{M}{M_{i_0} \cdot M_{i_1} \cdots M_{i_{\alpha-1}}}\right)$ and $\mathbf{D}^{-1} = \left(\prod_{k=0}^{i_0 - 1} M_{k}, \prod_{k=0}^{i_1 - 1} M_{k}, \ldots, \prod_{k=0}^{i_{\alpha-1} - 1} M_{k}, \prod_{k=0}^{i_{\alpha} - 1} M_{k}\right)$.

Example

Using the above derived formula for inverse layout, we could compute some inverse layouts manually and compare with the inverse layout computed by CuTe.

Suppose we have a layout $L = (8, 16, 4): (64, 1, 16)$, where $M_{0} = 8$, $M_1 = 16$, $M_2 = 4$, and $M = M_{0} \cdot M_1 \cdot M_2 = 8 \cdot 16 \cdot 4 = 512$. To compute the inverse layout $L^{-1}$, we first compute the sorted layout $L^{\prime} = (16, 4, 8): (1, 16, 64)$, where the sort index is given by $\mathbf{I} = (1, 2, 0)$. Thus, we have $i_{0} = 1$, $i_{1} = 2$, and $i_{2} = 0$.

The shape of the inverse layout is given by

$$
\begin{align}
\mathbf{S}^{-1} &= \left(M_{i_0}, M_{i_1}, \frac{M}{M_{i_0} \cdot M_{i_1}}\right) \\
&= \left(M_{1}, M_{2}, \frac{M}{M_{1} \cdot M_{2}}\right) \\
&= \left(16, 4, \frac{512}{16 \cdot 4}\right) \\
&= (16, 4, 8) \\
\end{align}
$$

The stride of the inverse layout is given by

$$
\begin{align}
\mathbf{D}^{-1} &= \left(\prod_{k=0}^{i_0 - 1} M_{k}, \prod_{k=0}^{i_1 - 1} M_{k}, \prod_{k=0}^{i_{2} - 1} M_{k}\right) \\
&= \left(\prod_{k=0}^{1 - 1} M_{k}, \prod_{k=0}^{2 - 1} M_{k}, \prod_{k=0}^{0 - 1} M_{k}\right) \\
&= \left(\prod_{k=0}^{0} M_{k}, \prod_{k=0}^{1} M_{k}, \prod_{k=0}^{-1} M_{k}\right) \\
&= \left(M_{0}, M_{0} \cdot M_{1}, 1\right) \\
&= (8, 8 \cdot 16, 1) \\
&= (8, 128, 1) \\
\end{align}
$$

Therefore, the inverse layout for the layout $L = (8, 16, 4): (64, 1, 16)$ is $L^{-1} = (16, 4, 8): (8, 128, 1)$. The inverse layout can be further simplified using coalesce, resulting in $L^{-1} = (64, 8) : (8, 1)$.

To compute the inverse layout using CuTe, we can use the right_inverse function as follows:

cute_layout_playground.cu
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#include <cassert>
#include <fstream>
#include <iomanip>
#include <iostream>

#include <cute/layout.hpp>
#include <cute/swizzle.hpp>
#include <cute/tensor.hpp>

int main(int argc, const char** argv)
{
auto const layout = cute::make_layout(
cute::make_shape(cute::Int<8>{}, cute::Int<16>{}, cute::Int<4>{}),
cute::make_stride(cute::Int<64>{}, cute::Int<1>{}, cute::Int<16>{}));

auto const right_inverse_layout_manual = cute::make_layout(
cute::make_shape(cute::Int<16>{}, cute::Int<4>{}, cute::Int<8>{}),
cute::make_stride(cute::Int<8>{}, cute::Int<128>{}, cute::Int<1>{}));

auto const right_inverse_layout_manual_coalesced =
cute::coalesce(right_inverse_layout_manual);

auto const right_inverse_layout_computed = cute::right_inverse(layout);

auto const right_inverse_layout_computed_coalesced =
cute::coalesce(right_inverse_layout_computed);

std::cout << "Original Layout: " << std::endl;
cute::print(layout);
std::cout << std::endl;
std::cout << "Right Inverse Layout (Manual): " << std::endl;
cute::print(right_inverse_layout_manual);
std::cout << std::endl;
std::cout << "Right Inverse Layout (Manual Coalesced): " << std::endl;
cute::print(right_inverse_layout_manual_coalesced);
std::cout << std::endl;
std::cout << "Right Inverse Layout (Computed): " << std::endl;
cute::print(right_inverse_layout_computed);
std::cout << std::endl;
std::cout << "Right Inverse Layout (Computed Coalesced): " << std::endl;
cute::print(right_inverse_layout_computed_coalesced);
std::cout << std::endl;

CUTE_STATIC_ASSERT_V(right_inverse_layout_manual_coalesced ==
right_inverse_layout_computed_coalesced,
"Right inverse layout computed and manual coalesced "
"should be equal");
}

We could see that the manually computed right inverse layout and the one computed by CuTe are the same, confirming our inverse layout derivation.

1
2
3
4
5
6
7
8
9
10
11
$ ./cute_layout_playground
Original Layout:
(_8,_16,_4):(_64,_1,_16)
Right Inverse Layout (Manual):
(_16,_4,_8):(_8,_128,_1)
Right Inverse Layout (Manual Coalesced):
(_64,_8):(_8,_1)
Right Inverse Layout (Computed):
(_64,_8):(_8,_1)
Right Inverse Layout (Computed Coalesced):
(_64,_8):(_8,_1)

References

Author

Lei Mao

Posted on

08-13-2025

Updated on

08-13-2025

Licensed under


Comments