CuTe Index To Coordinate

07-19-202507-19-2025 blog 14 minutes read (About 2040 words) visits

Introduction

In CuTe programming, sometimes we would like to convert an index to a coordinate. This requires an inverse layout function, which is generally not straightforward to derive. However, if we could assume the layout is compact, we can derive the inverse layout function from the coordinate isomorphism.

In this article, I would like to discuss the derivation of the inverse layout function of a compact layout and its analogy to the coordinate isomorphism mathematically.

Prerequisites

To understand how to convert an index to a coordinate in the CuTe, we have to be familiar with the coordinate isomorphism and the layout function, which are two common concepts in the CuTe layout algebra. I have discussed these in my previous article “CuTe Layout Algebra” and I have copied the relevant introductions to these concepts for convenience because they can be read standalone.

Coordinate Isomorphism

Let $\mathbf{S} = (M_0, M_1, \ldots, M_{\alpha})$ and $\mathbf{D} = (d_0, d_1, \ldots, d_{\alpha})$ be the respective shape and stride tuples of $L = \mathbf{S}:\mathbf{D}$. Let $M = M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha}$ be the size of $L$ and let $[0, M) \subset \mathbb{N}$ be the subset of the natural numbers given by ${0, 1, 2, \ldots, M - 1}$. Then we have an isomorphism

$$
\begin{align}
\iota: [0, M) \cong [0, M_0) \times [0, M_1) \times \ldots \times [0, M_{\alpha}) \\
\end{align}
$$

Given any $x \in [0, M)$, the isomorphism $\iota$ maps $x$ to the tuple

$$
\begin{align}
x \mapsto \left(x \mod M_0, \left\lfloor \frac{x}{M_0} \right\rfloor \mod M_1, \ldots, \left\lfloor \frac{x}{M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \mod M_{\alpha}\right)
\end{align}
$$

The isomorphism mapping is bijective. In our case, given any tuple $(x_0, x_1, \ldots, x_{\alpha}) \in [0, M_0) \times [0, M_1) \times \ldots \times [0, M_{\alpha})$, the isomorphism inverse maps the tuple to the integer

$$
\begin{align}
\left(x_0, x_1, \ldots, x_{\alpha}\right) \mapsto x_0 + x_1 \cdot M_0 + x_2 \cdot M_0 \cdot M_1 + \ldots + x_{\alpha} \cdot M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}
\end{align}
$$

It’s straightforward to verify the above isomorphism mapping is valid and proof the above isomorphism mapping is bijective (by contradiction).

One could imagine the isomorphism as a mapping between a one-dimensional coordinate and a multi-dimensional coordinate.

Layout Function

Given a layout $L$, its layout function is the function $f_L: [0, M) \to \mathbb{N}$ is defined to be the composite

$$
\begin{align}
[0, M) \cong [0, M_0) \times [0, M_1) \times \ldots \times [0, M_{\alpha}) \subset \mathbb{N}^{\times (\alpha + 1)} \xrightarrow{\cdot d_0, \cdot d_1, \ldots, \cdot d_{\alpha}} \mathbb{N}^{\times (\alpha + 1)} \xrightarrow{+} \mathbb{N}
\end{align}
$$

In other words, $f_L$ is the composition of the multilinear function

$$
\begin{align}
[0, M_0) \times [0, M_1) \times \ldots \times [0, M_{\alpha}) \to \mathbb{N} \\
(x_0, x_1, \ldots, x_{\alpha}) \mapsto x_0 \cdot d_0 + x_1 \cdot d_1 + \ldots + x_{\alpha} \cdot d_{\alpha}
\end{align}
$$

determined by the stride, with the isomorphism $\iota$, determined by the shape.

Computing the value of a layout function $f_L$ at a point $x \in [0, M)$ can be decomposed into computing the sum of the values of the layout function at multiple points. This is sometimes useful for computing the value of the layout function at a point handily.

Given a layout $L = (M_0, M_1, \ldots, M_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$ and $x \in [0, M)$,

$$
\begin{align}
x \mapsto \left(x_0, x_1, \ldots, x_{\alpha}\right) \mapsto x_0 \cdot d_0 + x_1 \cdot d_1 + \ldots + x_{\alpha} \cdot d_{\alpha}
\end{align}
$$

We also have

$$
\begin{align}
x^{\prime}_{0} &\mapsto \left(x_0, 0, 0, \ldots, 0\right) \mapsto x_0 \cdot d_0 \\
x^{\prime}_{1} &\mapsto \left(0, x_1, 0, \ldots, 0\right) \mapsto x_1 \cdot d_1 \\
&\vdots \\
x^{\prime}_{\alpha} &\mapsto \left(0, 0, 0, \ldots, x_{\alpha}\right) \mapsto x_{\alpha} \cdot d_{\alpha}
\end{align}
$$

Therefore, we have

$$
\begin{align}
f_L(x) = f_L(x^{\prime}_{0}) + f_L(x^{\prime}_{1}) + \ldots + f_L(x^{\prime}_{\alpha})
\end{align}
$$

where

$$
\begin{align}
x^{\prime}_{0} &= x \mod M_0 \\
x^{\prime}_{1} &= \left\lfloor \frac{x}{M_0} \right\rfloor \mod M_1 \cdot M_0 \\
&\vdots \\
x^{\prime}_{\alpha} &= \left\lfloor \frac{x}{M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \mod M_{\alpha} \cdot M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}
\end{align}
$$

For example, given a layout $L = (3, 2):(2, 3)$ and $x = 5$, we have

$$
\begin{align}
f_L(5) &= f_L(5 \mod 3) + f_L\left(\left\lfloor \frac{5}{3} \right\rfloor \mod 2 \cdot 3\right) \\
&= f_L(2) + f_L(3) \\
&= 2 \cdot 2 + \left\lfloor \frac{3}{3} \right\rfloor \cdot 3 \\
&= 4 + 3 \\
&= 7
\end{align}
$$

Inverse Layout Function

The layout function essentially computes a dot product of the coordinate tuple with the stride tuple. The inverse of dot product, i.e., computing the coordinate tuple using the stride tuple and the value of layout function, is not straightforward. However, if we could assume the layout function is “compact” or injective, we can compute the coordinate tuple using the value of layout function and the stride tuple. In this situation, computing coordinate tuple from index is somewhat similar to computing multi-dimensional coordinate from one-dimensional coordinate in the coordinate isomorphism.

Naive Algorithm and Implementation

Let $\mathbf{S} = (M_0, M_1, \ldots, M_{\alpha})$ and $\mathbf{D} = (d_0, d_1, \ldots, d_{\alpha})$ be the respective shape and stride tuples of the layout $L = \mathbf{S}:\mathbf{D}$. Given its coordinate tuple $\mathbf{x} = (x_0, x_1, \ldots, x_{\alpha})$, we have the index mapping $f_L{(\mathbf{x})} = f_L(x_0, x_1, \ldots, x_{\alpha}) = x_0 \cdot d_0 + x_1 \cdot d_1 + \ldots + x_{\alpha} \cdot d_{\alpha}$. We could sort the layout $L$ by the stride tuple $\mathbf{D}$ so that $\mathbf{S}^{\prime} = (M_0^{\prime}, M_1^{\prime}, \ldots, M_{\alpha}^{\prime})$ and $\mathbf{D}^{\prime} = (d_0^{\prime}, d_1^{\prime}, \ldots, d_{\alpha}^{\prime})$ be the respective shape and stride tuples of the sorted $L^{\prime} = \mathbf{S}^{\prime}:\mathbf{D}^{\prime}$, where $d_0^{\prime} < d_1^{\prime} < \ldots < d_{\alpha}^{\prime}$. The permuted index has to be stored so that we could map the permuted tuples back to the original ones. The permuted coordinate tuple becomes $\mathbf{x}^{\prime} = (x_0^{\prime}, x_1^{\prime}, \ldots, x_{\alpha}^{\prime})$. The index mapping value, however, remains the same, i.e., $f_{L^{\prime}}{(\mathbf{x}^{\prime})} = f_{L^{\prime}}{(x_0^{\prime}, x_1^{\prime}, \ldots, x_{\alpha}^{\prime})} = x_0^{\prime} \cdot d_0^{\prime} + x_1^{\prime} \cdot d_1^{\prime} + \ldots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{\prime} = f_L{(\mathbf{x})}$.

Assuming the layout $L$ is compact, we must have $d_0^{\prime} = 1$, $d_1 = M_0^{\prime}$, $d_2 = M_0^{\prime} \cdot M_1^{\prime}$, $\ldots$, $d_{\alpha} = M_0^{\prime} \cdot M_1^{\prime} \cdot \ldots \cdot M_{\alpha - 1}^{\prime}$. Therefore, the index mapping can be rewritten as

$$
\begin{align}
f_L{(\mathbf{x})}
&= x_0^{\prime} \cdot d_0^{\prime} + x_1^{\prime} \cdot d_1^{\prime} + x_2^{\prime} \cdot d_2^{\prime} + \ldots + x_{\alpha}^{\prime} \cdot d_{\alpha}^{\prime} \\
&= x_0^{\prime} + x_1^{\prime} \cdot M_0^{\prime} + x_2^{\prime} \cdot M_0^{\prime} \cdot M_1^{\prime} + \ldots + x_{\alpha}^{\prime} \cdot M_0^{\prime} \cdot M_1^{\prime} \cdot \ldots \cdot M_{\alpha - 1}^{\prime}
\end{align}
$$

This is exactly the same as the coordinate isomorphism we described earlier. Given an index $f_L{(\mathbf{x})}$ and the sorted stride tuple $\mathbf{D}^{\prime} = (d_0^{\prime}, d_1^{\prime}, \ldots, d_{\alpha}^{\prime})$, we can compute the coordinate tuple $\mathbf{x}^{\prime} = (x_0^{\prime}, x_1^{\prime}, \ldots, x_{\alpha}^{\prime})$ as follows:

Once we have computed the permuted coordinate tuple $\mathbf{x}^{\prime}$, because we have stored the sorted index, we can map the permuted coordinate tuple back to the original coordinate tuple $\mathbf{x} = (x_0, x_1, \ldots, x_{\alpha})$ using the sorted index.

An Improved Implementation

The inverse layout function implemented in CuTe, idx2crd, actually requires no sorting of the layout.

/** idx2crd(i,s,d) splits an index into a coordinate within <Shape,Stride>.
 *
 * This is computed as follows:
 *  [index, shape, and stride are all integers => determine 1D coord]
 * op(i, s, d)             => (i / d) % s
 *  [index is integer, shape and stride are tuple => determine component for each mode]
 * op(i, (s,S), (d,D))     => (op(i, s, d), op(i, S, D)...)
 *  [index, shape, and stride are all tuples => consider each mode independently]
 * op((i,I), (s,S), (d,D)) => (op(i, s, d), op((I), (S), (D)))
 *
 * NOTE: This only works for compact shape+stride layouts. A more general version would
 *       apply to all surjective layouts
 */
template <class Index, class Shape, class Stride>
CUTE_HOST_DEVICE constexpr
auto
idx2crd(Index  const& idx,
        Shape  const& shape,
        Stride const& stride)
{
  if constexpr (is_tuple<Index>::value) {
    if constexpr (is_tuple<Shape>::value) {      // tuple tuple tuple
      static_assert(tuple_size<Index>::value == tuple_size< Shape>::value, "Mismatched Ranks");
      static_assert(tuple_size<Index>::value == tuple_size<Stride>::value, "Mismatched Ranks");
      return transform(idx, shape, stride, [](auto const& i, auto const& s, auto const& d){ return idx2crd(i,s,d); });
    } else {                                     // tuple "int" "int"
      static_assert(sizeof(Index) == 0, "Invalid parameters");
    }
  } else {
    if constexpr (is_tuple<Shape>::value) {
      if constexpr (is_tuple<Stride>::value) {   // "int" tuple tuple
        static_assert(tuple_size<Shape>::value == tuple_size<Stride>::value, "Mismatched Ranks");
        return transform(shape, stride, [&](auto const& s, auto const& d){ return idx2crd(idx,s,d); });
      } else {                                   // "int" tuple "int"
        return transform(shape, compact_col_major(shape, stride), [&](auto const& s, auto const& d){ return idx2crd(idx,s,d); });
      }
    } else {                                     // "int" "int" "int"
      if constexpr (is_constant<1, Shape>::value) {
        // Skip potential stride-0 division
        return Int<0>{};
      } else {
        return (idx / stride) % shape;
      }
    }
  }

  CUTE_GCC_UNREACHABLE;
}

/** idx2crd(i,s) splits an index into a coordinate within Shape
 * via a colexicographical enumeration of coordinates in Shape.
 * c0 = (idx / 1) % s0
 * c1 = (idx / s0) % s1
 * c2 = (idx / (s0 * s1)) % s2
 * ...
 */
template <class Index, class Shape>
CUTE_HOST_DEVICE constexpr
auto
idx2crd(Index const& idx,
        Shape const& shape)
{
  if constexpr (is_tuple<Index>::value) {
    if constexpr (is_tuple<Shape>::value) {      // tuple tuple
      static_assert(tuple_size<Index>::value == tuple_size<Shape>::value, "Mismatched Ranks");
      return transform(idx, shape, [](auto const& i, auto const& s) { return idx2crd(i,s); });
    } else {                                     // tuple "int"
      static_assert(sizeof(Index) == 0, "Invalid parameters");
    }
  } else {
    if constexpr (is_tuple<Shape>::value) {      // "int" tuple
      return transform_leaf(as_arithmetic_tuple(crd2idx(idx, shape, make_basis_like(shape))), identity{});
    } else {                                     // "int" "int"
      return idx;
    }
  }

  CUTE_GCC_UNREACHABLE;
}

We notice that the permuted coordinate tuple formula could be rewritten in a way that requires no sorting and product accumulation of the shape tuple, because the product accumulation of the shape tuple is already stored in the stride tuple. The permuted coordinate tuple can be computed as follows:

$$
\begin{align}
\mathbf{x}^{\prime}
&= (x_0^{\prime}, x_1^{\prime}, \ldots, x_{\alpha}^{\prime}) \\
&= \left(f_L{(\mathbf{x})} \mod M_0^{\prime}, \left\lfloor \frac{f_L{(\mathbf{x})}}{M_0^{\prime}} \right\rfloor \mod M_1^{\prime}, \ldots, \left\lfloor \frac{f_L{(\mathbf{x})}}{M_0^{\prime} \cdot M_1^{\prime} \cdot \ldots \cdot M_{\alpha - 1}^{\prime}} \right\rfloor \mod M_{\alpha}^{\prime}\right) \\
&= \left(\left\lfloor \frac{f_L{(\mathbf{x})}}{d_0^{\prime}} \right\rfloor \mod M_0^{\prime}, \left\lfloor \frac{f_L{(\mathbf{x})}}{d_1^{\prime}} \right\rfloor \mod M_1^{\prime}, \ldots, \left\lfloor \frac{f_L{(\mathbf{x})}}{d_{\alpha}^{\prime}} \right\rfloor \mod M_{\alpha}^{\prime}\right) \\
\end{align}
$$

Because of this, there is no need to sort the layout just in order to compute the product accumulation of the shape tuple. The original coordinate tuple can be computed as follows:

$$
\begin{align}
\mathbf{x} &= (x_0, x_1, \ldots, x_{\alpha}) \\
&= \left(\left\lfloor \frac{f_L{(\mathbf{x})}}{d_0} \right\rfloor \mod M_0, \left\lfloor \frac{f_L{(\mathbf{x})}}{d_1} \right\rfloor \mod M_1, \ldots, \left\lfloor \frac{f_L{(\mathbf{x})}}{d_{\alpha}} \right\rfloor \mod M_{\alpha}\right) \\
\end{align}
$$

Conclusions

The inverse layout function of a compact layout can be derived from the coordinate isomorphism.

References

CuTe Layout Algebra

CuTe Index To Coordinate

https://leimao.github.io/blog/CuTe-Index-To-Coordinate/

Author

Lei Mao

Posted on

07-19-2025

Updated on

07-19-2025

Licensed under

Mathematics,

Accelerated Computing,

CUDA,

CUTLASS,

CuTe

CuTe Index To Coordinate

Introduction

Prerequisites

Coordinate Isomorphism

Layout Function

Inverse Layout Function

Naive Algorithm and Implementation

An Improved Implementation

Conclusions

References

Author

Posted on

Updated on

Licensed under

Like this article? Support the author with

Comments

Advertisement

Catalogue