CuTe Inverse Layout

08-13-202508-13-2025 blog 9 minutes read (About 1390 words) visits

Introduction

In my previous article “CuTe Index To Coordinate”, I have discussed how to compute coordinate from index for a given layout.

In this article, I would like to go one step further and derive the inverse layout given a layout mathematically, which generalizes the index to coordinate mapping.

CuTe Inverse Layout

Suppose we have a compact layout $L = \mathbf{S}:\mathbf{D}$ where $\mathbf{S} = (M_0, M_1, \ldots, M_\alpha)$ and $\mathbf{D} = (d_0, d_1, \ldots, d_\alpha)$ are the respective shape and stride tuples of the layout, and $M = M_0 \cdot M_1 \cdots M_\alpha$. We want to construct an inverse layout $L^{-1} = \mathbf{S}^{-1}:\mathbf{D}^{-1}$, where $\mathbf{S}^{-1} = (M_0^{-1}, M_1^{-1}, \ldots, M_\alpha^{-1})$ and $\mathbf{D}^{-1} = (d_0^{-1}, d_1^{-1}, \ldots, d_\alpha^{-1})$, such that $f_{L^{-1}}(f_L(x)) = x$ for all $x < \text{size}(L)$.

Because it is a compact layout, we have the sorted layout $L^{\prime} = \mathbf{S}^{\prime}:\mathbf{D}^{\prime}$ where $\mathbf{S}^{\prime} = (M_0^{\prime}, M_1^{\prime}, \ldots, M_\alpha^{\prime})$ and $\mathbf{D}^{\prime} = (d_0^{\prime}, d_1^{\prime}, \ldots, d_\alpha^{\prime})$, and $d_i^{\prime} < d_{i+1}^{\prime}$ for all $i < \alpha$. More specifically, we have $d_{0} = 1$, $d_{1} = M_0^{\prime}$, $d_{2} = M_0^{\prime} \cdot M_1^{\prime}$, $\cdots$, $d_{\alpha} = M_0^{\prime} \cdot M_1^{\prime} \cdots M_{\alpha-1}^{\prime}$.

For the layout $L$, given an one-dimensional coordinate mapping to $(\alpha + 1)$-dimensional coordinates $x \leftrightarrow (x_{0}, x_{1}, \ldots, x_{\alpha})$, and further maps to the permutate coordinates for the sorted layout $L^{\prime}$, $x \leftrightarrow (x_{0}, x_{1}, \ldots, x_{\alpha}) \leftrightarrow (x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime})$.

$$
\begin{align}
x &= x_{0} + x_{1} \cdot M_0 + x_{2} \cdot M_0 \cdot M_1 + \cdots + x_{\alpha} \cdot M_0 \cdot M_1 \cdots M_{\alpha-1} \\
\end{align}
$$

$$
\begin{align}
f_{L}(x) &= f_{L}(x_{0}, x_{1}, \ldots, x_{\alpha}) \\
&= x_{0} \cdot d_{0} + x_{1} \cdot d_{1} + \cdots + x_{\alpha} \cdot d_{\alpha} \\
&= x_{0}^{\prime} \cdot d_{0}^{\prime} + x_{1}^{\prime} \cdot d_{1}^{\prime} + \cdots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{\prime} \\
&= x_{0}^{\prime} + x_{1}^{\prime} \cdot M_0^{\prime} + x_{2}^{\prime} \cdot M_0^{\prime} \cdot M_1^{\prime} + \cdots + x_{\alpha}^{\prime} \cdot M_0^{\prime} \cdot M_1^{\prime} \cdots M_{\alpha-1}^{\prime} \\
\end{align}
$$

Suppose the shape of the inverse layout $L^{-1}$ is given by $\mathbf{S}^{-1} = \left(M_0^{\prime}, M_1^{\prime}, \ldots, M_{\alpha - 1}^{\prime}, \frac{M}{M_0^{\prime} \cdot M_1^{\prime} \cdots M_{\alpha-1}^{\prime}}\right)$, the one-dimensional coordinate $f_{L}(x)$ mapping to the $(\alpha + 1)$-dimensional coordinates is given by $f_{L}(x) \leftrightarrow (x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime})$.

We want to construct the inverse layout $L^{-1}$ such that:

$$
\begin{align}
f_{L^{-1}}(f_{L}(x)) &= f_{L^{-1}}(x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime}) \\
&= x_{0}^{\prime} \cdot d_{0}^{-1} + x_{1}^{\prime} \cdot d_{1}^{-1} + \cdots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{-1} \\
&= x \\
&= x_{0} + x_{1} \cdot M_0 + x_{2} \cdot M_0 \cdot M_1 + \cdots + x_{\alpha} \cdot M_0 \cdot M_1 \cdots M_{\alpha-1} \\
\end{align}
$$

Because there is an one-to-one mapping between the coordinate before permutation $(x_{0}, x_{1}, \ldots, x_{\alpha})$ and the coordinate after permutation $(x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime})$, it is straightforward to find the one-to-one mapping between $(d_{0}^{-1}, d_{1}^{-1}, \ldots, d_{\alpha}^{-1})$ and $(1, M_0, M_0 \cdot M_1, \ldots, M_0 \cdot M_1 \cdots M_{\alpha-1})$ such that $f_{L^{-1}}(f_{L}(x)) = x$.

Suppose the sort index for the layout $L$ is given by $\mathbf{I} = (i_0, i_1, \ldots, i_\alpha)$, where $\mathbf{I}$ is a permutation index that maps to the set $\{0, 1, \ldots, \alpha\}$, $M_{0}^{\prime} = M_{i_0}$, $M_{1}^{\prime} = M_{i_1}$, $\ldots$, $M_{\alpha}^{\prime} = M_{i_\alpha}$, and $d_{0}^{\prime} = d_{i_0}$, $d_{1}^{\prime} = d_{i_1}$, $\ldots$, $d_{\alpha}^{\prime} = d_{i_\alpha}$.

We have the sorted layout $L^{\prime} = \mathbf{S}^{\prime}:\mathbf{D}^{\prime}$, where $\mathbf{S}^{\prime} = (M_{i_0}, M_{i_1}, \ldots, M_{i_\alpha})$ and $\mathbf{D}^{\prime} = (d_{i_0}, d_{i_1}, \ldots, d_{i_\alpha})$. More specifically, we have $d_{i_0} = 1$, $d_{i_1} = M_{i_0}$, $d_{i_2} = M_{i_0} \cdot M_{i_1}$, $\cdots$, $d_{i_\alpha} = M_{i_0} \cdot M_{i_1} \cdots M_{i_{\alpha-1}}$.

Because

$$
\begin{align}
f_{L^{-1}}(f_{L}(x)) &= f_{L^{-1}}(x_{0}^{\prime}, x_{1}^{\prime}, \ldots, x_{\alpha}^{\prime}) \\
&= x_{0}^{\prime} \cdot d_{0}^{-1} + x_{1}^{\prime} \cdot d_{1}^{-1} + \cdots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{-1} \\
&= x_{i_0} \cdot d_{0}^{-1} + x_{i_1} \cdot d_{1}^{-1} + \cdots + x_{i_\alpha} \cdot d_{\alpha}^{-1} \\
&= x_{0} + x_{1} \cdot M_0 + x_{2} \cdot M_0 \cdot M_1 + \cdots + x_{\alpha} \cdot M_0 \cdot M_1 \cdots M_{\alpha-1} \\
\end{align}
$$

We must have $d_{0}^{-1} = M_{0} \cdots M_{1} \cdots M_{i_0 - 1}$, $d_{1}^{-1} = M_{0} \cdots M_{1} \cdots M_{i_1 - 1}$, $\cdots$, $d_{\alpha}^{-1} = M_{0} \cdots M_{1} \cdots M_{i_{\alpha}-1}$. This can also be rewritten as $d_{j}^{-1} = \prod_{k=0}^{i_j - 1} M_{k}$.

Therefore, the inverse layout $L^{-1}$ is given by $L^{-1} = \mathbf{S}^{-1}:\mathbf{D}^{-1}$, where $\mathbf{S}^{-1} = \left(M_{i_0}, M_{i_1}, \ldots, M_{i_{\alpha - 1}}, \frac{M}{M_{i_0} \cdot M_{i_1} \cdots M_{i_{\alpha-1}}}\right)$ and $\mathbf{D}^{-1} = \left(\prod_{k=0}^{i_0 - 1} M_{k}, \prod_{k=0}^{i_1 - 1} M_{k}, \ldots, \prod_{k=0}^{i_{\alpha-1} - 1} M_{k}, \prod_{k=0}^{i_{\alpha} - 1} M_{k}\right)$.

Example

Using the above derived formula for inverse layout, we could compute some inverse layouts manually and compare with the inverse layout computed by CuTe.

Suppose we have a layout $L = (8, 16, 4): (64, 1, 16)$, where $M_{0} = 8$, $M_1 = 16$, $M_2 = 4$, and $M = M_{0} \cdot M_1 \cdot M_2 = 8 \cdot 16 \cdot 4 = 512$. To compute the inverse layout $L^{-1}$, we first compute the sorted layout $L^{\prime} = (16, 4, 8): (1, 16, 64)$, where the sort index is given by $\mathbf{I} = (1, 2, 0)$. Thus, we have $i_{0} = 1$, $i_{1} = 2$, and $i_{2} = 0$.

The shape of the inverse layout is given by

$$
\begin{align}
\mathbf{S}^{-1} &= \left(M_{i_0}, M_{i_1}, \frac{M}{M_{i_0} \cdot M_{i_1}}\right) \\
&= \left(M_{1}, M_{2}, \frac{M}{M_{1} \cdot M_{2}}\right) \\
&= \left(16, 4, \frac{512}{16 \cdot 4}\right) \\
&= (16, 4, 8) \\
\end{align}
$$

The stride of the inverse layout is given by

$$
\begin{align}
\mathbf{D}^{-1} &= \left(\prod_{k=0}^{i_0 - 1} M_{k}, \prod_{k=0}^{i_1 - 1} M_{k}, \prod_{k=0}^{i_{2} - 1} M_{k}\right) \\
&= \left(\prod_{k=0}^{1 - 1} M_{k}, \prod_{k=0}^{2 - 1} M_{k}, \prod_{k=0}^{0 - 1} M_{k}\right) \\
&= \left(\prod_{k=0}^{0} M_{k}, \prod_{k=0}^{1} M_{k}, \prod_{k=0}^{-1} M_{k}\right) \\
&= \left(M_{0}, M_{0} \cdot M_{1}, 1\right) \\
&= (8, 8 \cdot 16, 1) \\
&= (8, 128, 1) \\
\end{align}
$$

Therefore, the inverse layout for the layout $L = (8, 16, 4): (64, 1, 16)$ is $L^{-1} = (16, 4, 8): (8, 128, 1)$. The inverse layout can be further simplified using coalesce, resulting in $L^{-1} = (64, 8) : (8, 1)$.

To compute the inverse layout using CuTe, we can use the right_inverse function as follows:

cute_layout_playground.cu

#include <cassert>
#include <fstream>
#include <iomanip>
#include <iostream>

#include <cute/layout.hpp>
#include <cute/swizzle.hpp>
#include <cute/tensor.hpp>

int main(int argc, const char** argv)
{
    auto const layout = cute::make_layout(
        cute::make_shape(cute::Int<8>{}, cute::Int<16>{}, cute::Int<4>{}),
        cute::make_stride(cute::Int<64>{}, cute::Int<1>{}, cute::Int<16>{}));

    auto const right_inverse_layout_manual = cute::make_layout(
        cute::make_shape(cute::Int<16>{}, cute::Int<4>{}, cute::Int<8>{}),
        cute::make_stride(cute::Int<8>{}, cute::Int<128>{}, cute::Int<1>{}));

    auto const right_inverse_layout_manual_coalesced =
        cute::coalesce(right_inverse_layout_manual);

    auto const right_inverse_layout_computed = cute::right_inverse(layout);

    auto const right_inverse_layout_computed_coalesced =
        cute::coalesce(right_inverse_layout_computed);

    std::cout << "Original Layout: " << std::endl;
    cute::print(layout);
    std::cout << std::endl;
    std::cout << "Right Inverse Layout (Manual): " << std::endl;
    cute::print(right_inverse_layout_manual);
    std::cout << std::endl;
    std::cout << "Right Inverse Layout (Manual Coalesced): " << std::endl;
    cute::print(right_inverse_layout_manual_coalesced);
    std::cout << std::endl;
    std::cout << "Right Inverse Layout (Computed): " << std::endl;
    cute::print(right_inverse_layout_computed);
    std::cout << std::endl;
    std::cout << "Right Inverse Layout (Computed Coalesced): " << std::endl;
    cute::print(right_inverse_layout_computed_coalesced);
    std::cout << std::endl;

    CUTE_STATIC_ASSERT_V(right_inverse_layout_manual_coalesced ==
                             right_inverse_layout_computed_coalesced,
                         "Right inverse layout computed and manual coalesced "
                         "should be equal");
}

We could see that the manually computed right inverse layout and the one computed by CuTe are the same, confirming our inverse layout derivation.

$ ./cute_layout_playground
Original Layout:
(_8,_16,_4):(_64,_1,_16)
Right Inverse Layout (Manual):
(_16,_4,_8):(_8,_128,_1)
Right Inverse Layout (Manual Coalesced):
(_64,_8):(_8,_1)
Right Inverse Layout (Computed):
(_64,_8):(_8,_1)
Right Inverse Layout (Computed Coalesced):
(_64,_8):(_8,_1)

References

CuTe Inverse Layout

https://leimao.github.io/blog/CuTe-Inverse-Layout/

Author

Lei Mao

Posted on

08-13-2025

Updated on

08-13-2025

Licensed under

Mathematics,

Accelerated Computing,

CUDA,

CUTLASS,

CuTe

CuTe Inverse Layout

Introduction

CuTe Inverse Layout

Example

References

Author

Posted on

Updated on

Licensed under

Like this article? Support the author with

Comments

Advertisement

Catalogue