CuTe Layout Algebra

10-20-202407-14-2025 article 2 hours read (About 19874 words) visits

Introduction

CuTe layout algebra is extremely important for understanding and applying CUTLASS for accelerated computing. Despite the fact that CuTe has a documentation for its layout algebra, it cannot be understood completely without first understanding its mathematical foundations. I tried to create some proofs for the CuTe layout algebra on my own and realized that it was a huge amount of work. Gratefully, Jay Shah has created a paper “A Note on the Algebra of CuTe Layouts” that completes the CuTe layout algebra mathematical foundations that I wanted to create.

As my proofreading, I found Jay Shah’s paper mostly error-free, except for a few very minor oversights and typos. However, it does skip some details without which the paper is a little bit hard to understand. In this article, based on Jay Shah’s paper, I would like to provide more proofs and explanations of the CuTe layout algebra, some of which are not present in Jay Shah’s paper. Most of the definitions and annotations will follow Jay Shah’s paper.

This article can be read as a complement to Jay Shah’s paper, but it’s also completely standalone for understanding the CuTe layout algebra.

Layout Algebra Preliminaries

Definition 2.1: Layout

A layout $L$ is a pair of positive integer tuples $\mathbf{S}$ and $\mathbf{D}$ of matching dimensions. We call $\mathbf{S}$ the shape and $\mathbf{D}$ the stride. We write $L = \mathbf{S}:\mathbf{D}$.

A flattened layout means that there is no internal parentheses in the shape and stride. For example, $L = (5, 2, 2):(16, 80, 4)$ is a flattened layout, whereas $L = (5, (2, 2)):(16, (80, 4))$ is not. Flattening a layout will not change the semantics and operations of the layout.

Definition 2.2: Layout Size, Length, and Mode

Let $\alpha \geq 0$ be an integer and $L = \mathbf{S}:\mathbf{D} = (M_0, M_1, \ldots, M_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$ be a layout. Then:

The size of $L$ is the product $M = M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha}$.
The length of $L$ is the integer $\alpha + 1$.
A mode of $L$ is one of the entries $(M_k):(d_k)$ for $0 \leq k \leq \alpha$. We may regard this as a length 1 layout.

Concatenation

Given two layouts $L = \mathbf{S}:\mathbf{D}$ and $L^{\prime} = \mathbf{S}^{\prime}:\mathbf{D}^{\prime}$, let $\mathbf{S}^{\prime\prime}$ and $\mathbf{D}^{\prime\prime}$ be the shape and stride tuples given by (the flattening of) $(\mathbf{S}, \mathbf{S}^{\prime})$ and $(\mathbf{D}, \mathbf{D}^{\prime})$ respectively. Then the concatenation of $L$ and $L^{\prime}$ is given by the layout

$$
(L, L^{\prime}) = \mathbf{S}^{\prime\prime}:\mathbf{D}^{\prime\prime}
$$

and we say that $(L, L^{\prime})$ is decomposed by $L$ and $L^{\prime}$.

Inductively, given layouts $L_0, L_1, \ldots, L_N$, we can then form the concatenation $(L_0, L_1, \ldots, L_N)$. Conversely, given $L$ a layout, $L$ is maximally decomposed by its modes.

Isomorphism

Let $\mathbf{S} = (M_0, M_1, \ldots, M_{\alpha})$ and $\mathbf{D} = (d_0, d_1, \ldots, d_{\alpha})$ be the respective shape and stride tuples of $L = \mathbf{S}:\mathbf{D}$. Let $M = M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha}$ be the size of $L$ and let $[0, M) \subset \mathbb{N}$ be the subset of the natural numbers given by ${0, 1, 2, \ldots, M - 1}$. Then we have an isomorphism

$$
\begin{align}
\iota: [0, M) \cong [0, M_0) \times [0, M_1) \times \ldots \times [0, M_{\alpha}) \\
\end{align}
$$

Given any $x \in [0, M)$, the isomorphism $\iota$ maps $x$ to the tuple

$$
\begin{align}
x \mapsto \left(x \mod M_0, \left\lfloor \frac{x}{M_0} \right\rfloor \mod M_1, \ldots, \left\lfloor \frac{x}{M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \mod M_{\alpha}\right)
\end{align}
$$

The isomorphism mapping is bijective. In our case, given any tuple $(x_0, x_1, \ldots, x_{\alpha}) \in [0, M_0) \times [0, M_1) \times \ldots \times [0, M_{\alpha})$, the isomorphism inverse maps the tuple to the integer

$$
\begin{align}
\left(x_0, x_1, \ldots, x_{\alpha}\right) \mapsto x_0 + x_1 \cdot M_0 + x_2 \cdot M_0 \cdot M_1 + \ldots + x_{\alpha} \cdot M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}
\end{align}
$$

It’s straightforward to verify the above isomorphism mapping is valid and proof the above isomorphism mapping is bijective (by contradiction).

One could imagine the isomorphism as a mapping between a one-dimensional coordinate and a multi-dimensional coordinate.

Definition 2.3: Layout Function

Given a layout $L$, its layout function is the function $f_L: [0, M) \to \mathbb{N}$ is defined to be the composite

$$
\begin{align}
[0, M) \cong [0, M_0) \times [0, M_1) \times \ldots \times [0, M_{\alpha}) \subset \mathbb{N}^{\times (\alpha + 1)} \xrightarrow{\cdot d_0, \cdot d_1, \ldots, \cdot d_{\alpha}} \mathbb{N}^{\times (\alpha + 1)} \xrightarrow{+} \mathbb{N}
\end{align}
$$

In other words, $f_L$ is the composition of the multilinear function

$$
\begin{align}
[0, M_0) \times [0, M_1) \times \ldots \times [0, M_{\alpha}) \to \mathbb{N} \\
(x_0, x_1, \ldots, x_{\alpha}) \mapsto x_0 \cdot d_0 + x_1 \cdot d_1 + \ldots + x_{\alpha} \cdot d_{\alpha}
\end{align}
$$

determined by the stride, with the isomorphism $\iota$, determined by the shape.

Computing the value of a layout function $f_L$ at a point $x \in [0, M)$ can be decomposed into computing the sum of the values of the layout function at multiple points. This is sometimes useful for computing the value of the layout function at a point handily.

Given a layout $L = (M_0, M_1, \ldots, M_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$ and $x \in [0, M)$,

$$
\begin{align}
x \mapsto \left(x_0, x_1, \ldots, x_{\alpha}\right) \mapsto x_0 \cdot d_0 + x_1 \cdot d_1 + \ldots + x_{\alpha} \cdot d_{\alpha}
\end{align}
$$

We also have

$$
\begin{align}
x^{\prime}_{0} &\mapsto \left(x_0, 0, 0, \ldots, 0\right) \mapsto x_0 \cdot d_0 \\
x^{\prime}_{1} &\mapsto \left(0, x_1, 0, \ldots, 0\right) \mapsto x_1 \cdot d_1 \\
&\vdots \\
x^{\prime}_{\alpha} &\mapsto \left(0, 0, 0, \ldots, x_{\alpha}\right) \mapsto x_{\alpha} \cdot d_{\alpha}
\end{align}
$$

Therefore, we have

$$
\begin{align}
f_L(x) = f_L(x^{\prime}_{0}) + f_L(x^{\prime}_{1}) + \ldots + f_L(x^{\prime}_{\alpha})
\end{align}
$$

where

$$
\begin{align}
x^{\prime}_{0} &= x \mod M_0 \\
x^{\prime}_{1} &= \left\lfloor \frac{x}{M_0} \right\rfloor \mod M_1 \cdot M_0 \\
&\vdots \\
x^{\prime}_{\alpha} &= \left\lfloor \frac{x}{M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \mod M_{\alpha} \cdot M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}
\end{align}
$$

For example, given a layout $L = (3, 2):(2, 3)$ and $x = 5$, we have

$$
\begin{align}
f_L(5) &= f_L(5 \mod 3) + f_L\left(\left\lfloor \frac{5}{3} \right\rfloor \mod 2 \cdot 3\right) \\
&= f_L(2) + f_L(3) \\
&= 2 \cdot 2 + \left\lfloor \frac{3}{3} \right\rfloor \cdot 3 \\
&= 4 + 3 \\
&= 7
\end{align}
$$

Extension of Layout Function

Based on the definition of layout function, the extension of the layout function $f_L$ is the function, $\widehat{f}_L: \mathbb{N} \to \mathbb{N}$, defined by replacing $M_{\alpha}$ with $\infty$ in the definition of $f_L$, i.e., the composite

$$
\begin{align}
\mathbb{N} \cong [0, M_0) \times [0, M_1) \times \ldots \times [0, M_{\alpha - 1}) \times \mathbb{N} \subset \mathbb{N}^{\times (\alpha + 1)} \xrightarrow{\cdot d_0, \cdot d_1, \ldots, \cdot d_{\alpha}} \mathbb{N}^{\times (\alpha + 1)} \xrightarrow{+} \mathbb{N}
\end{align}
$$

where the extension of the isomorphism $\iota$, $\widehat{\iota}$, is given by

$$
\begin{align}
x \mapsto \left(x \mod M_0, \left\lfloor \frac{x}{M_0} \right\rfloor \mod M_1, \ldots, \left\lfloor \frac{x}{M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 2}} \right\rfloor \mod M_{\alpha - 1}, \left\lfloor \frac{x}{M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor\right)
\end{align}
$$

The extension of the isomorphism mapping is also bijective. The inverse mapping of the extension of the isomorphism is also given by

$$
\begin{align}
\left(x_0, x_1, \ldots, x_{\alpha - 1}, x_{\alpha}\right) \mapsto x_0 + x_1 \cdot M_0 + x_2 \cdot M_0 \cdot M_1 + \ldots + x_{\alpha} \cdot M_0 \cdot M_1 \cdot \ldots \cdot M_{\alpha - 1}
\end{align}
$$

One could imagine the extension of the isomorphism defines the last dimension of the shape to be a “batch” dimension and the batch size can be infinite.

Coalescence

Coalescence simplifies the layout and does not change the layout function.

Coalescence Rules

Considering a layout with just two integral modes, $A = (N_{0}, N_{1}):(d_{0}, d_{1})$, we have four cases to consider:

$N_{1} = 1$.
$N_{0} = 1$.
$d_{1} = N_{0} d_{0}$.
Anything else.

In the first case, obviously, $A = (N_{0}, 1):(d_{0}, d_{1}) = (N_{0}):(d_{0})$. This can be further flattened to $A = N_{0}:d_{0}$.

In the second case, also obviously, $A = (1, N_{1}):(d_{0}, d_{1}) = (N_{1}):(d_{1})$. This can be further flattened to $A = N_{1}:d_{1}$.

In the third case, we have $A = (N_{0}, N_{1}):(d_{0}, N_{0} d_{0}) = (N_{0} N_{1}):(d_{0})$. This can be further flattened to $A = N_{0} N_{1}:d_{0}$.

In the fourth case, we could do nothing and $A$ remains the same.

There is one case that can often be misunderstood, that is $d_{0} = N_{1} d_{1}$. In this case, we have $A = (N_{0}, N_{1}):(N_{1} d_{1}, d_{1})$. At first glance, it seems that we could coalesce $A$ to $(N_{0} N_{1}):(d_{1})$. However, this is not correct, because it changes the layout function.

Considering a layout with more than two integral modes, we could apply the above rules recursively, each time we could try to coalesce two adjacent integral modes, until no more coalescence is possible. This guarantees that the laytout function remains the same.

Proof

Let $L = (N_0, N_1, \ldots, N_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$ be a layout of the concatenation of layouts $L_{0}$, $L_{1}$, $\ldots$, $L_{\alpha}$, where each $L_{k} = (N_{k}):(d_{k})$ for $k \in [0, \alpha]$. Given a coordinate $x \mapsto (x_0, x_1, \ldots, x_{\alpha})$, we have the layout function for layout $L$ as follows:

$$
\begin{align}
f_{L}(x) &= x_{0} d_{0} + x_{1} d_{1} + x_{2} d_{2} + \ldots + x_{\alpha} d_{\alpha} \\
\end{align}
$$

The layout function for layout $L$ is just the sum of the layout functions of each $L_{k}$, i.e.,

$$
\begin{align}
f_{L}(x) &= f_{L_{0}}(x_{0}) + f_{L_{1}}(x_{1}) + \ldots + f_{L_{\alpha}}(x_{\alpha}) \\
&= x_{0} d_{0} + x_{1} d_{1} + x_{2} d_{2} + \ldots + x_{\alpha} d_{\alpha} \\
\end{align}
$$

Suppose two adjacent integral modes, say $A = (L_{i}, L_{i+1}) = (N_{i}, N_{i + 1}):(d_{i}, d_{i + 1})$, can be coalesced to a new layout $A_{i}^{\prime}$ whose shape is $(N_{i} N_{i + 1})$, which satisfies the first, second, and third coalescence rules. By the definition of coalescence, we must have $f_{A^{\prime}}(x) = f_{A}(x)$.

For any $x_{i} \in [0, N_{i})$ and $x_{i + 1} \in [0, N_{i + 1})$, we have $x^{\prime} = x_{i} + x_{i + 1} \cdot N_{i}$, where $x^{\prime} \in [0, N_{i} N_{i + 1})$.

$$
\begin{align}
f_{A^{\prime}}(x^{\prime}) &= f_{A}(x^{\prime}) \\
&= x_{i} d_{i} + x_{i + 1} d_{i + 1} \\
&= f_{L_{i}}(x_{i}) + f_{L_{i + 1}}(x_{i + 1}) \\
\end{align}
$$

Given any coordinate $x \mapsto (x_0, x_1, \ldots, x_{i-1}, x_{i}, x_{i + 1}, x_{i + 2}, \ldots, x_{\alpha})$ for the layout $L$, after coalescing $A$ to $A^{\prime}$, the coordiante $x \mapsto (x_0, x_1, \ldots, x_{i-1}, x^{\prime}, x_{i + 2}, \ldots, x_{\alpha})$ for the layout $L^{\prime}$, where $L^{\prime}$ is the layout after coalescing $A$ to $A^{\prime}$.

This is easy to verify because of the isomorphism of coordinates.

$$
\begin{align}
x &= x_{0} + x_{1} \cdot N_0 + x_{2} \cdot N_0 N_1 + \ldots + x_{i - 1} \cdot N_0 N_1 \ldots N_{i - 2} + x_{i} \cdot N_0 N_1 \ldots N_{i - 1} + x_{i + 1} \cdot N_0 N_1 \ldots N_{i - 1} N_i + x_{i + 2} \cdot N_0 N_1 \ldots N_{i - 1} N_i N_{i + 1} + \ldots + x_{\alpha} \cdot N_0 N_1 \ldots N_{\alpha - 1} \\
&= x_{0} + x_{1} \cdot N_0 + x_{2} \cdot N_0 N_1 + \ldots + x_{i - 1} \cdot N_0 N_1 \ldots N_{i - 2} + (x_{i} + x_{i + 1} \cdot N_i) \cdot N_0 N_1 \ldots N_{i - 1} + x_{i + 2} \cdot N_0 N_1 \ldots N_{i - 1} N_i N_{i + 1} + \ldots + x_{\alpha} \cdot N_0 N_1 \ldots N_{\alpha - 1} \\
&= x_{0} + x_{1} \cdot N_0 + x_{2} \cdot N_0 N_1 + \ldots + x_{i - 1} \cdot N_0 N_1 \ldots N_{i - 2} + x^{\prime} \cdot N_0 N_1 \ldots N_{i - 1} + x_{i + 2} \cdot N_0 N_1 \ldots N_{i - 1} N_i N_{i + 1} + \ldots + x_{\alpha} \cdot N_0 N_1 \ldots N_{\alpha - 1} \\
\end{align}
$$

Then we have

$$
\begin{align}
f_{L^{\prime}}(x)
&= f_{L_{0}}(x_{0}) + f_{L_{1}}(x_{1}) + \ldots + f_{L_{i - 1}}(x_{i - 1}) + f_{A^{\prime}}(x^{\prime}) + f_{L_{i + 2}}(x_{i + 2}) + \ldots + f_{L_{\alpha}}(x_{\alpha}) \\
&= f_{L_{0}}(x_{0}) + f_{L_{1}}(x_{1}) + \ldots + f_{L_{i - 1}}(x_{i - 1}) + f_{L_{i}}(x_{i}) + f_{L_{i + 1}}(x_{i + 1}) + f_{L_{i + 2}}(x_{i + 2}) + \ldots + f_{L_{\alpha}}(x_{\alpha}) \\
&= f_{L}(x)
\end{align}
$$

Therefore, the layout function remains the same after coalescing $A$ to $A^{\prime}$.

This concludes the proof. $\square$

By-Mode Coalescence

In some cases, when the modes are not completely integral in the layout and we would like to keep the number of modes unchanged, we could perform by-mode coalescence. This can be achieved by disabling the coalescence of adjacent integral modes from two different modes in the coalescence rules for any layout with more than two integral modes.

For example, if we have a layout of two modes $A = ((N_{0}, N_{1}, \ldots, N_{\alpha}), (N_{\alpha + 1}, N_{\alpha + 2}, \ldots, N_{\beta}):((d_{0}, d_{1}, \ldots, d_{\alpha}), (d_{\alpha + 1}, d_{\alpha + 2}, \ldots, d_{\beta})))$, to perform by-mode coalescence, we could coalesce the integral modes $(N_{0}, N_{1}, \ldots, N_{\alpha})$ and $(N_{\alpha + 1}, N_{\alpha + 2}, \ldots, N_{\beta})$ separately until no more coalescence is possible for each one, and no more coalescence will be performed further between the two consequent coalesced modes even if they can be coalesced. This will result in a layout with the same number of modes as before.

Implication of Coalescence

Coalescence simplifies the layout and does not change the layout function. It is a useful operation to reduce the complexity of the layout and the related computation while preserving its functionality. If non-by-mode coalescence is performed on a layout, the layout can be simplified such that each mode is an integral mode, i.e., $A = (N_{0}, N_{1}, \ldots, N_{\alpha}):(d_{0}, d_{1}, \ldots, d_{\alpha})$ where $N_{k}$ is an integer, not a tuple of integers, and $N_{k} > 1$ for $k \in [0, \alpha]$.

The property of coalescence is very important and most of the proofs we will present in the article assumes that the layout is coalesced. This assumption is fine mathematically because the coalescence does not change the layout function.

Complementation

Definition 2.4: Sorted Layout

Let $A = (N_0, N_1, \ldots, N_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$ be a layout. We say that $A$ is sorted if $d_0 \leq d_1 \leq \ldots \leq d_{\alpha}$ and for every $i < j$, if $d_i = d_j$, then $N_i \leq N_j$.

Note that sorting a layout, or more generally, changing the order of modes of a layout, will change the semantics and operations of the layout.

For example, suppose we have a layout $A = (2, 4):(4, 1)$ and a layout $B = (4, 2):(1, 4)$. We could see that $B$ is the sorted version of $A$. We could compute the layout function of $A$ and $B$ as follows using lookup tables:

$$
\begin{align}
f_A(0) &= f_A(0, 0) = 0 \cdot 4 + 0 \cdot 1 = 0 \\
f_A(1) &= f_A(1, 0) = 1 \cdot 4 + 0 \cdot 1 = 4 \\
f_A(2) &= f_A(0, 1) = 0 \cdot 4 + 1 \cdot 1 = 1 \\
f_A(3) &= f_A(1, 1) = 1 \cdot 4 + 1 \cdot 1 = 5 \\
f_A(4) &= f_A(0, 2) = 0 \cdot 4 + 2 \cdot 1 = 2 \\
f_A(5) &= f_A(1, 2) = 1 \cdot 4 + 2 \cdot 1 = 6 \\
f_A(6) &= f_A(0, 3) = 0 \cdot 4 + 3 \cdot 1 = 3 \\
f_A(7) &= f_A(1, 3) = 1 \cdot 4 + 3 \cdot 1 = 7
\end{align}
$$

$$
\begin{align}
f_B(0) &= f_B(0, 0) = 0 \cdot 1 + 0 \cdot 4 = 0 \\
f_B(1) &= f_B(1, 0) = 1 \cdot 1 + 0 \cdot 4 = 1 \\
f_B(2) &= f_B(2, 0) = 2 \cdot 1 + 0 \cdot 4 = 2 \\
f_B(3) &= f_B(3, 0) = 3 \cdot 1 + 0 \cdot 4 = 3 \\
f_B(4) &= f_B(0, 1) = 0 \cdot 1 + 1 \cdot 4 = 4 \\
f_B(5) &= f_B(1, 1) = 1 \cdot 1 + 1 \cdot 4 = 5 \\
f_B(6) &= f_B(2, 1) = 2 \cdot 1 + 1 \cdot 4 = 6 \\
f_B(7) &= f_B(3, 1) = 3 \cdot 1 + 1 \cdot 4 = 7
\end{align}
$$

We could see that the layout $B$ is typically referred as the column-major layout, and the layout $A$ is typically referred as the row-major layout. They are completely different layouts.

More generally, the sorted layout is a just like the “generalization” of the column-major layout.

Definition 2.5: Admission for Complementation

Let $A = (N_0, N_1, \ldots, N_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$ be a layout and $M$ be a positive integer. If $A$ is not sorted then replace $A$ with its sorted version. We say that the pair $\{A, M\}$ is admissible for complementation (or simply admissible) if:

For all $1 \leq i \leq \alpha$, $N_{i - 1} \cdot d_{i - 1}$ divides $d_i$.
$N_{\alpha} \cdot d_{\alpha}$ divides $M$.

That $\{A, M\}$ is admissible for complementation also implies:

For all $1 \leq i \leq \alpha$, $N_{i - 1} \cdot d_{i - 1} \leq d_i$ and $d_{i - 1} \leq d_i$.
$N_{\alpha} \cdot d_{\alpha} \leq M$ and $d_{\alpha} \leq M$.

Definition 2.6: Complementation

Let $A = (N_0, N_1, \ldots, N_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$ be a layout and $M$ be a positive integer. If $\{A, M\}$ is admissible for complementation, then if $A$ is not sorted, replace $A$ with its sorted version. The complement of $\{A, M\}$ is defined to be the layout

$$
\begin{align}
\text{complement}(A, M) = \left(d_0, \frac{d_1}{N_0 d_0}, \frac{d_2}{N_1 d_1}, \ldots, \frac{d_{\alpha}}{N_{\alpha - 1} d_{\alpha - 1}}, \frac{M}{N_{\alpha} d_{\alpha}} \right): \left(1, N_0 d_0, N_1 d_1, \ldots, N_{\alpha} d_{\alpha}\right)
\end{align}
$$

Note that the size of the complement of $\{A, M\}$, $\text{size}(\text{complement}(A, M))$, is $\frac{M}{\text{size}(A)} = \frac{M}{N_0 \cdot N_1 \cdot \ldots \cdot N_{\alpha}}$.

By definition, the complement of $\{A, M\}$ is insensitive to the order of the modes of $A$, since it will always be sorted before complementation.

The complement of $\{A, M\}$ is strictly increasing. This might not be very obvious, so we will show a proof.

Proof

Suppose $B = \text{complement}(A, M)$, to show that the layout function $f_{B}$, whose domain is a set of natural numbers, is strictly increasing, we need to show that for every two adjacent natural numbers $x$ and $x + 1$, $0 \leq x < x + 1 < \text{size}(B)$, we have $f_{B}(x) < f_{B}(x + 1)$.

Because of the isomorphism, suppose the mapping of $x$ is as follows:

$$
\begin{align}
x &\mapsto \left(x_0, x_1, \ldots, x_{\alpha}, x_{\alpha + 1}\right) \\
\end{align}
$$

By definition of the layout function $f_{B}$, we have

$$
\begin{align}
f_{B}(x) &= x_0 + x_1 \cdot N_0 d_0 + x_2 \cdot N_1 d_1 + \ldots + x_{\alpha} \cdot N_{\alpha - 1} d_{\alpha - 1} + x_{\alpha + 1} \cdot N_{\alpha} d_{\alpha} \\
\end{align}
$$

The mapping of $x + 1$ can have many different cases.

In the simplest case,

$$
\begin{align}
x + 1 &\mapsto \left(x_0 + 1, x_1, \ldots, x_{\alpha}, x_{\alpha + 1}\right) \\
\end{align}
$$

Then we have

$$
\begin{align}
f_{B}(x + 1) &= x_0 + 1 + x_1 \cdot N_0 d_0 + x_2 \cdot N_1 d_1 + \ldots + x_{\alpha} \cdot N_{\alpha - 1} d_{\alpha - 1} + x_{\alpha + 1} \cdot N_{\alpha} d_{\alpha} \\
&= f_{B}(x) + 1 \\
&> f_{B}(x)
\end{align}
$$

In a more complicated case, where $x_0 = d_0 - 1$ and $x_1 < \frac{d_1}{N_0 d_0} - 1$, we have

$$
\begin{align}
x + 1 &\mapsto \left(0, x_1 + 1, \ldots, x_{\alpha}, x_{\alpha + 1}\right) \\
\end{align}
$$

Then we have

$$
\begin{align}
f_{B}(x + 1) &= 0 + (x_1 + 1) \cdot N_0 d_0 + x_2 \cdot N_1 d_1 + \ldots + x_{\alpha} \cdot N_{\alpha - 1} d_{\alpha - 1} + x_{\alpha + 1} \cdot N_{\alpha} d_{\alpha} \\
&= f_{B}(x) - x_0 + N_0 d_0 \\
&= f_{B}(x) - (d_0 - 1) + N_0 d_0 \\
&= f_{B}(x) + 1 + (N_0 - 1) d_0 \\
&> f_{B}(x)
\end{align}
$$

Because $N_0 \geq 1$, we have $(N_0 - 1) d_0 \geq 0$, so we have

$$
\begin{align}
f_{B}(x + 1) &> f_{B}(x)
\end{align}
$$

In general, when $x_0 = d_0 - 1$, for some $k \in [1, \alpha - 1]$, $x_i = \frac{d_i}{N_{i - 1} d_{i - 1}} - 1$ for every $i \in [1, k]$, $x_{k + 1} < \frac{d_{k + 1}}{N_k d_k} - 1$, we have

$$
\begin{align}
x + 1 &\mapsto \left(0, 0, \ldots, 0, x_{k + 1} + 1, \ldots, x_{\alpha}, x_{\alpha + 1}\right) \\
\end{align}
$$

Then we have

$$
\begin{align}
f_{B}(x + 1) &= 0 + 0 \cdot N_0 d_0 + \ldots + 0 \cdot N_{k - 1} d_{k - 1} + (x_{k + 1} + 1) \cdot N_k d_k + \ldots + x_{\alpha} \cdot N_{\alpha - 1} d_{\alpha - 1} + x_{\alpha + 1} \cdot N_{\alpha} d_{\alpha} \\
&= f_{B}(x) - x_0 - \left(\sum_{i = 1}^{k} x_i \cdot N_{i - 1} d_{i - 1}\right) + N_k d_k \\
&= f_{B}(x) - (d_0 - 1) - \left(\sum_{i = 1}^{k} \left(\frac{d_i}{N_{i - 1} d_{i - 1}} - 1\right) \cdot N_{i - 1} d_{i - 1}\right) + N_k d_k \\
&= f_{B}(x) - (d_0 - 1) - \left(\sum_{i = 1}^{k} \left(d_i - N_{i - 1} d_{i - 1}\right)\right) + N_k d_k \\
&= f_{B}(x) - (d_0 - 1) + \sum_{i = 1}^{k} N_{i - 1} d_{i - 1} - \sum_{i = 1}^{k} d_i + N_k d_k \\
&= f_{B}(x) + \sum_{i = 0}^{k} \left(N_{i} - 1\right) d_{i} + 1 \\
\end{align}
$$

Because $N_{i} \geq 1$ for every $i$, we have $\left(N_{i} - 1\right) d_{i} \geq 0$ for every $i$, so we have

$$
\begin{align}
f_{B}(x + 1) &> f_{B}(x)
\end{align}
$$

This concludes the proof. $\square$

Similarly, we could also prove that the extension of the complement of $\{A, M\}$ is strictly increasing.

Proposition 2.7

Let $\{A = (N_0, N_1, \ldots, N_{\alpha}):(d_0, d_1, \ldots, d_{\alpha}), M\}$ be admissible for complementation and $B = \text{complement}(A, M)$. Let $C = (A, B)$ be the concatenated layout. Then the size of $C$ is $M$ and $f_C: [0, M) \to \mathbb{N}$ restricts to a bijection $[0, M) \cong [0, M)$.

Proof

Because $\text{size}(A) = \prod_{i = 0}^{\alpha} N_i$ and $\text{size}(B) = \frac{M}{\prod_{i = 0}^{\alpha} N_i}$, we have $\text{size}(C) = \text{size}(A) \cdot \text{size}(B) = M$. Thus the domain of $f_C$ is $[0, M)$.

Note that the image of $f_C$ is the same as that of $f_{C^{\prime}}$ for any permutation $C^{\prime}$ of $C$.

To see this, suppose we have the following layout $C$ and its permutation $C^{\prime}$ in which only one pair of the modes is permuted.

$$
\begin{align}
C &= \left(N_0, N_1, \ldots, N_{i}, \ldots, N_{j}, \ldots, N_{\alpha} \right): \left(d_0, d_1, \ldots, d_{i}, \ldots, d_{j}, \ldots, d_{\alpha}\right) \\
C^{\prime} &= \left(N_0, N_1, \ldots, N_{j}, \ldots, N_{i}, \ldots, N_{\alpha} \right): \left(d_0, d_1, \ldots, d_{j}, \ldots, d_{i}, \ldots, d_{\alpha}\right)
\end{align}
$$

The domains of $f_C$ and $f_C^{\prime}$ are both $[0, M)$. For any $x_C \in [0, M)$, we have

$$
\begin{align}
x_C &\mapsto \left(x_0, x_1, \ldots, x_{i}, \ldots, x_{j}, \ldots, x_{\alpha}\right) \\
x_{C^{\prime}} &\mapsto \left(x_0, x_1, \ldots, x_{j}, \ldots, x_{i}, \ldots, x_{\alpha}\right)
\end{align}
$$

and $x_C$ and $x_{C^{\prime}}$ are bijective.

Because by definition, $f_C(x_C) = f_{C^{\prime}}(x_{C^{\prime}})$, the image of $f_C$ is the same as that of $f_{C^{\prime}}$.

For any permutation $C^{\prime}$ of $C$, it can be obtained by permuting one pair of the modes of $C$ at a time and each time the image of $f_C$ is the same as that of $f_{C^{\prime}}$. Therefore, the image of $f_C$ is the same as that of $f_{C^{\prime}}$ for any permutation $C^{\prime}$ of $C$.

When computing the image of $f_C$ we may sort $C$. Without loss of generality, suppose $A = (N_0, N_1, \ldots, N_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$ is already sorted. After sorting $C$, the sorted $C^{\prime}$ could only be as follows:

$$
\begin{align}
C^{\prime} &= \left(d_0, N_0, \frac{d_1}{N_0 d_0}, N_1, \frac{d_2}{N_1 d_1}, N_2, \ldots, \frac{d_{\alpha}}{N_{\alpha - 1} d_{\alpha - 1}}, N_{\alpha}, \frac{M}{N_{\alpha} d_{\alpha}} \right): \left(1, d_0, N_0 d_0, d_1, N_1 d_1, d_2, \ldots, N_{\alpha - 1} d_{\alpha - 1}, d_{\alpha}, N_{\alpha} d_{\alpha}\right)
\end{align}
$$

Because $d_i \leq N_i d_i$ and $N_i d_i \leq d_{i + 1}$ for every $i$, when $N_i = 1$, $N_i \leq \frac{d_{i + 1}}{N_i d_i}$, when $N_i d_i = d_{i + 1}$, $\frac{d_{i + 1}}{N_i d_i} \leq N_{i + 1}$, thus $C^{\prime}$ is sorted and any permutation of $C^{\prime}$ will make it not sorted.

Then we may rewrite

$$
\begin{align}
C^{\prime} &= \left(r_0, r_1, r_2, \ldots, r_{\beta} \right): \left(1, r_0, r_0 r_1, \ldots, r_0 r_1 \ldots r_{\beta - 1}\right)
\end{align}
$$

where $\beta = 2 \alpha + 1$ and the maximum value that $f_{C^{\prime}}$ attains is computed as follows:

$$
\begin{align}
f_{C^{\prime}}(M - 1) &= f_{C^{\prime}}(r_0 - 1, r_1 - 1, r_2 - 1, \ldots, r_{\beta - 1} - 1, r_{\beta} - 1) \\
&= (r_0 - 1) + (r_1 - 1) \cdot r_0 + (r_2 - 1) \cdot r_0 r_1 + \ldots + (r_{\beta - 1} - 1) \cdot r_0 r_1 \ldots r_{\beta - 2} + (r_{\beta} - 1) \cdot r_0 r_1 \ldots r_{\beta - 1} \\
&= r_0 - 1 + r_0 r_1 - r_0 + r_0 r_1 r_2 - r_0 r_1 + \ldots + r_0 r_1 \ldots r_{\beta - 1} - r_0 r_1 \ldots r_{\beta - 2} + r_0 r_1 \ldots r_{\beta} - r_0 r_1 \ldots r_{\beta - 1} \\
&= r_0 r_1 \ldots r_{\beta} - 1 \\
&= M - 1
\end{align}
$$

Then in this case, to establish the bijectivity assertion, it’s sufficient to just show $f_{C^{\prime}}(x)$ is injective, i.e., for any $x, y \in [0, M)$, if $f_{C^{\prime}}(x) = f_{C^{\prime}}(y)$, then $x = y$.

Suppose the isomorphism mapping of $x$ and $y$ are as follows:

$$
\begin{align}
x &\mapsto \left(x_0, x_1, \ldots, x_{\beta}\right) \\
y &\mapsto \left(y_0, y_1, \ldots, y_{\beta}\right)
\end{align}
$$

Because $f_{C^{\prime}}(x) = f_{C^{\prime}}(y)$, we have

$$
\begin{align}
x_0 + x_1 \cdot r_0 + x_2 \cdot r_0 r_1 + \ldots + x_{\beta} \cdot r_0 r_1 \ldots r_{\beta - 1} = y_0 + y_1 \cdot r_0 + y_2 \cdot r_0 r_1 + \ldots + y_{\beta} \cdot r_0 r_1 \ldots r_{\beta - 1}
\end{align}
$$

We will use strong induction to show that $x_i = y_i$ for every $i \in [0, \beta]$.

Because $f_{C^{\prime}}(x) \mod r_0 = f_{C^{\prime}}(y) \mod r_0$, we have $x_0 = y_0$.

Now suppose by the strong induction that given $i \in (0, \beta]$, for all $j < i$, we have $x_j = y_j$. we have

$$
\begin{align}
x_i \cdot r_0 r_1 \ldots r_{i - 1} + x_{i + 1} \cdot r_0 r_1 \ldots r_{i} + \ldots + x_{\beta} \cdot r_0 r_1 \ldots r_{\beta - 1} = y_i \cdot r_0 r_1 \ldots r_{i - 1} + y_{i + 1} \cdot r_0 r_1 \ldots r_{i} + \ldots + y_{\beta} \cdot r_0 r_1 \ldots r_{\beta - 1}
\end{align}
$$

Because $x_i \in [0, r_i)$ and $y_i \in [0, r_i)$, taking this equation modulo $r_0 r_1 \ldots r_{i}$ and dividing by $r_0 r_1 \ldots r_{i - 1}$, we have $x_i = y_i$.

Because $(x_0, x_1, \ldots, x_{\beta}) = (y_0, y_1, \ldots, y_{\beta})$, and the isomorphism mapping is bijective, we have $x = y$.

Therefore $f_{C^{\prime}}: [0, M) \to \mathbb{N}$ restricts to a bijection $[0, M) \cong [0, M)$. So does $f_C$.

This concludes the proof. $\square$

Corollary 2.8 Complementation Disjointness

The Corollary 2.8 explains what it means of taking a complement of a layout.

In the setting of Proposition 2.7, let $I = [0, \text{size}(A)) = [0, N_0 N_1 \ldots N_{\alpha})$ be the domain of $f_A$. Then

$$
\begin{align}
f_A(I) \cap \widehat{f}_B(I) = \{0\}
\end{align}
$$

In other words, $\widehat{f}_A$ and $\widehat{f}_B$ have disjoint image when restricted to the domain of $f_A$, apart from 0.

Note that in the corollary, $f_A$ and $\widehat{f}_A$ are actually interchangeable, because the function domain is restricted to the domain of $f_A$.

Proof

Let $J = [0, \text{size}(B)) = [0, \frac{M}{N_0 N_1 \ldots N_{\alpha}})$ be the domain of $f_B$. Then by Proposition 2.7, we have

$$
\begin{align}
f_A(I) \cap f_B(J) = \{0\}
\end{align}
$$

To understand this, for any $x_A \in I$ and any $x_B \in J$, because of the isomorphism, we have

$$
\begin{align}
x_A &\mapsto \left(x_{A, 0}, x_{A, 1}, \ldots, x_{A, \alpha} \right) \\
x_B &\mapsto \left(x_{B, 0}, x_{B, 1}, \ldots, x_{B, \alpha}, x_{B, \alpha + 1} \right)
\end{align}
$$

Then we have

$$
\begin{align}
f_A(x_A) &= x_{A, 0} + x_{A, 1} \cdot N_0 + x_{A, 2} \cdot N_0 N_1 + \ldots + x_{A, \alpha} \cdot N_0 N_1 \ldots N_{\alpha - 1} \\
f_B(x_B) &= x_{B, 0} + x_{B, 1} \cdot N_0 d_0 + x_{B, 2} \cdot N_1 d_1 + \ldots + x_{B, \alpha} \cdot N_{\alpha - 1} d_{\alpha - 1} + x_{B, \alpha + 1} \cdot N_{\alpha} d_{\alpha}
\end{align}
$$

We orchestrate new coordinates for layout $C$ as follows:

$$
\begin{align}
x^{\prime}_A &\mapsto \left(0, x_{A, 0}, 0, x_{A, 1}, 0, x_{A, 2}, \ldots, 0, x_{A, \alpha}, 0 \right) \\
x^{\prime}_B &\mapsto \left(x_{B, 0}, 0, x_{B, 1}, 0, x_{B, 2}, \ldots, x_{B, \alpha}, 0, x_{B, \alpha + 1} \right)
\end{align}
$$

Then we have

$$
\begin{align}
f_C(x^{\prime}_A) &= x_{A, 0} + x_{A, 1} \cdot N_0 + x_{A, 2} \cdot N_0 N_1 + \ldots + x_{A, \alpha} \cdot N_0 N_1 \ldots N_{\alpha - 1} \\
&= f_A(x_A) \\
f_C(x^{\prime}_B) &= x_{B, 0} + x_{B, 1} \cdot N_0 d_0 + x_{B, 2} \cdot N_1 d_1 + \ldots + x_{B, \alpha} \cdot N_{\alpha - 1} d_{\alpha - 1} + x_{B, \alpha + 1} \cdot N_{\alpha} d_{\alpha} \\
&= f_B(x_B)
\end{align}
$$

By the Proposition 2.7, we have $f_C: [0, M) \to \mathbb{N}$ restricts to a bijection $[0, M) \cong [0, M)$. If $x^{\prime}_A \neq x^{\prime}_B$, then $f_C(x^{\prime}_A) \neq f_C(x^{\prime}_B)$, and $f_A(x_A) \neq f_B(x_B)$.

Obviously, other than $(0, 0, \ldots, 0)$, for any values of $x_{A, 0}, x_{A, 1}, \ldots, x_{A, \alpha}$ and $x_{B, 0}, x_{B, 1}, \ldots, x_{B, \alpha}, x_{B, \alpha + 1}$, $\left(0, x_{A, 0}, 0, x_{A, 1}, 0, x_{A, 2}, \ldots, 0, x_{A, \alpha}, 0 \right) \neq \left(x_{B, 0}, 0, x_{B, 1}, 0, x_{B, 2}, \ldots, x_{B, \alpha}, 0, x_{B, \alpha + 1} \right)$, $x^{\prime}_A \neq x^{\prime}_B$, $f_C(x^{\prime}_A) \neq f_C(x^{\prime}_B)$, and $f_A(x_A) \neq f_B(x_B)$.

This means, for any $x \in I$ that $x \neq 0$, there is no $y \in J$ such that $f_A(x) = f_B(y)$.

When $x = 0$, we have $f_A(x) = f_B(x) = 0$. Thus we could claim that

$$
\begin{align}
f_A(I) \cap f_B(J) = \{0\}
\end{align}
$$

In the Definition 2.6: Complementation, we have shown that the complement of $\{A, M\}$, $f_B$, as well as its extension $\widehat{f}_B$, are strictly increasing.

In addition, by the extension of the isomorphism, we have

$$
\begin{align}
\text{size}(B) \mapsto \left(0, 0, \ldots, 0, \frac{M}{N_{\alpha} d_{\alpha}} \right) \\
\end{align}
$$

Then we have

$$
\begin{align}
\widehat{f}_{B}(\text{size}(B)) &= 0 + 0 \cdot 1 + 0 \cdot N_0 d_0 + \ldots + 0 \cdot N_{\alpha - 1} d_{\alpha - 1} + \frac{M}{N_{\alpha} d_{\alpha}} \cdot N_{\alpha} d_{\alpha} \\
&= M
\end{align}
$$

The largest value attained by $f_A$ is at $N_0 N_1 \ldots N_{\alpha} - 1$, and $f_A(N_0 N_1 \ldots N_{\alpha} - 1) = (N_0 - 1) d_0 + (N_1 - 1) d_1 + \ldots + (N_{\alpha} - 1) d_{\alpha}$.

Because $(N_0 - 1) d_0 < N_0 d_0$ and $N_i d_i \leq d_{i + 1}$ for every $i \in [0, \alpha - 1]$, $N_{\alpha} d_{\alpha} \leq M$, we have

$$
\begin{align}
f_A(N_0 N_1 \ldots N_{\alpha} - 1) &= (N_0 - 1) d_0 + (N_1 - 1) d_1 + \ldots + (N_{\alpha} - 1) d_{\alpha} \\
&< N_0 d_0 + N_1 d_1 - d_1 + N_2 d_2 - d_2 + \ldots + N_{\alpha} d_{\alpha} - d_{\alpha} \\
&\leq d_1 + N_1 d_1 - d_1 + N_2 d_2 - d_2 + \ldots + N_{\alpha} d_{\alpha} - d_{\alpha} \\
&= N_1 d_1 + N_2 d_2 - d_2 + \ldots + N_{\alpha} d_{\alpha} - d_{\alpha} \\
&\leq d_2 + N_2 d_2 - d_2 + \ldots + N_{\alpha} d_{\alpha} - d_{\alpha} \\
&\vdots \\
&\leq d_{\alpha} + N_{\alpha} d_{\alpha} - d_{\alpha} \\
&= N_{\alpha} d_{\alpha} \\
&\leq M
\end{align}
$$

Thus $f_A(N_0 N_1 \ldots N_{\alpha} - 1) < \widehat{f}_{B}(\text{size}(B))$.

In the case of $I \cap J = I$, i.e., $\text{size}(A) \leq \text{size}(B)$. Then we have

$$
\begin{align}
f_A(I) \cap f_B(I) = \{0\}
\end{align}
$$

Because in this case, $f_B(I) = \widehat{f}_B(I)$, we have

$$
\begin{align}
f_A(I) \cap \widehat{f}_B(I) = \{0\}
\end{align}
$$

In the other case of $I \cap J = J$, i.e., $\text{size}(A) \geq \text{size}(B)$. Because the largest value attained by $f_A$ is $f_A(N_0 N_1 \ldots N_{\alpha} - 1)$, and $f_A(N_0 N_1 \ldots N_{\alpha} - 1) < \widehat{f}_{B}(\text{size}(B))$, for any $x \in I/J$, we have $f_A(x) < \widehat{f}_{B}(\text{size}(B))$.

Thus,

$$
\begin{align}
f_A(I) \cap \widehat{f}_B(I/J) = {\emptyset}
\end{align}
$$

Therefore,

$$
\begin{align}
f_A(I) \cap \widehat{f}_B(I) &= f_A(I) \cap \left(\widehat{f}_B(I) \cup \widehat{f}_B(I/J)\right) \\
&= f_A(I) \cap \left(f_B(I) \cup \widehat{f}_B(I/J)\right) \\
&= \left(f_A(I) \cap f_B(I)\right) \cup \left(f_A(I) \cap \widehat{f}_B(I/J)\right) \\
&= \{0\} \cup {\emptyset} \\
&= \{0\}
\end{align}
$$

Taken together, we have

$$
\begin{align}
f_A(I) \cap \widehat{f}_B(I) = \{0\}
\end{align}
$$

This concludes the proof. $\square$

A short note on the original proof of Corollary 2.8 in the paper is that Jay Shah claimed $f_A(I \cap J) \cap f_B(I \cap J) = \{0\}$, which is insufficient to show the proof. The sufficient statement should be $f_A(I) \cap f_B(J) = \{0\}$.

Remark 2.9 Complementation Disjointness, Ordering, and Boundedness

The complement $B$ of a layout $A$ with respect to an integer $M$ should satisfy three properties:

$A$ and $B$ are disjoint in the sense that $f_A(x) \neq \widehat{f}_B(y)$ for all $x \neq 0$ and $y$ in the domain of $f_A$.
$B$ is ordered in the sense that $f_B(x)$ is a strictly increasing function.
$B$ is bounded by $M$ in the sense that $\text{size}(B) \geq \frac{M}{\text{size}(A)}$ and $\text{cosize}(B) \leq \left\lfloor\frac{M}{\text{cosize}(A)}\right\rfloor \cdot \text{cosize}(A)$. Here the cosize of a layout $L$ is defined as $\text{cosize}(L) = f_L(\text{size}(L) - 1) + 1$.

The property 1 and 2 have been proved in the Corollary 2.8 and Definition 2.6. We will show a proof of the property 3.

Proof

By Definition 2.6, we have $\text{size}(B) = \frac{M}{\text{size}(A)}$.

Because cosize is insensitive to the ordering of the layout, without loss of generality, we sorted $A$ so that $A = (N_0, N_1, \ldots, N_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$.

By the definition of cosize, we have

$$
\begin{align}
\text{cosize}(B) &= f_B(\text{size}(B) - 1) + 1 \\
&= f_B\left(d_0 - 1, \frac{d_1}{N_0 d_0} - 1, \ldots, \frac{d_{\alpha}}{N_{\alpha - 1} d_{\alpha - 1}} - 1, \frac{M}{N_{\alpha} d_{\alpha}} - 1\right) + 1 \\
&= (d_0 - 1) + \left(\frac{d_1}{N_0 d_0} - 1\right) \cdot N_0 d_0 + \ldots + \left(\frac{d_{\alpha}}{N_{\alpha - 1} d_{\alpha - 1}} - 1\right) \cdot N_{\alpha - 1} d_{\alpha - 1} + \left(\frac{M}{N_{\alpha} d_{\alpha}} - 1\right) \cdot N_{\alpha} d_{\alpha} + 1 \\
&= d_0 + d_1 + \ldots + d_{\alpha} - N_0 d_0 - N_1 d_1 - \ldots - N_{\alpha} d_{\alpha} + M \\
&= M - \left(\left(N_0 - 1\right) d_0 + \left(N_1 - 1\right) d_1 + \ldots + \left(N_{\alpha} - 1\right) d_{\alpha}\right) \\
&= M - f_A(\text{size}(A) - 1) \\
&= M - \left( \text{cosize}(A) - 1 \right) \\
\end{align}
$$

To obtain the inequality $\text{cosize}(B) \leq \left\lfloor\frac{M}{\text{cosize}(A)}\right\rfloor \cdot \text{cosize}(A)$, we divide the above equation by $\text{cosize}(A)$.

$$
\begin{align}
\frac{\text{cosize}(B)}{\text{cosize}(A)} &= \frac{M - \left( \text{cosize}(A) - 1 \right)}{\text{cosize}(A)} \\
&= \frac{M}{\text{cosize}(A)} - 1 + \frac{1}{\text{cosize}(A)} \\
\end{align}
$$

and we have to show that

$$
\begin{align}
\frac{M}{\text{cosize}(A)} - 1 + \frac{1}{\text{cosize}(A)} \leq \left\lfloor\frac{M}{\text{cosize}(A)}\right\rfloor
\end{align}
$$

In fact, for any $a, b \in \mathbb{N}$ and $a \geq 1$, we have

$$
\begin{align}
\frac{b}{a} - 1 + \frac{1}{a} \leq \left\lfloor\frac{b}{a}\right\rfloor
\end{align}
$$

To see this, suppose $\frac{b}{a} = \left\lfloor\frac{b}{a}\right\rfloor + c$, where $c = \frac{k}{a}$ and $k$ is an integer such that $0 \leq k < a$. Then we have

$\frac{1}{a} \leq c < 1$ and $1 \leq ac < a$. Then we want to show that

$$
\begin{align}
\frac{b}{a} - 1 + \frac{1}{a} \leq \frac{b}{a} - c \\
\end{align}
$$

$$
\begin{align}
-a + 1 \leq -ac \\
\end{align}
$$

$$
\begin{align}
a - ac \geq 1 \\
\end{align}
$$

$$
\begin{align}
a - k \geq 1 \\
\end{align}
$$

Because $a$ and $k$ are both integers and $0 \leq k < a$, we have $a - k \geq 1$. Thus the inequality holds.

This concludes the proof. $\square$

Non-Integral Layout Complementation

All the properties and proofs of complementation above assumes that the layout being complemented is a layout whose mode is integral, i.e., $A = (N_0, N_1, \ldots, N_{\alpha}):(d_0, d_1, \ldots, d_{\alpha})$. In the case where the layout is non-integral, i.e., some of the modes are not integers, the layout shall be coalesced to an integral layout before the complementation is applied. This is valid because coalescence does not change the layout function.

Implication of Complementation

The complementation of a layout finds a complement layout with a positive integer so that when the two layouts are concatenated, such as $\left(B, \text{complement}(B, M)\right)$, the new layout is a bijection $[0, M) \cong [0, M)$. This is also saying, if the original layout is repeated using the complement layout, the new layout is still a bijection.

Composition

Definition 2.11 Left Divisibility

Let $M, d > 0$ be positive integers and let $M = M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha}$ be a given factorization of $M$ by integers $M_{k} > 1$ for $k \in [0, \alpha]$. Replacing $M_{\alpha}$ by $\infty$, let

$$
\begin{align}
\widehat{M} = M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1} \cdot \infty
\end{align}
$$

and consider $\infty$ to be divisible by every positive integer. We say that $M$ is left divisible by $d$ (implicitly, with respect to the given factorization) if there exists $0 \leq i \leq \alpha$ such that:

$M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1}$ divides $d$.
Suppose the first condition is satisfied. Let $c = \frac{d}{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1}}$. Then if $i < \alpha$, we require in addition that $1 \leq c < M_{i}$.
For the second condition in the case $i < \alpha$, we require in addition that $c$ also divides $M_{i}$.

Here $i$ is necessarily unique if it exists. We could prove this by contradiction.

Proof

Suppose there exists two distinct $i$ and $j$ such that the three conditions are satisfied. Without loss of generality, suppose $i < j$.

There are two cases to consider.

In the case where $j < \alpha$, we will also have $i < \alpha$. Then we have

$$
\begin{align}
d &= c \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1} \\
\end{align}
$$

where $c$ is some positive integer such that $1 \leq c < M_{i}$.

Similarly,

$$
\begin{align}
d &= c^{\prime} \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{j - 1} \\
\end{align}
$$

where $c^{\prime}$ is some positive integer such that $1 \leq c^{\prime} < M_{j}$.

Thus,

$$
\begin{align}
c \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1} &= c^{\prime} \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{j - 1} \\
\end{align}
$$

$$
\begin{align}
c &= c^{\prime} \cdot M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1} \\
\end{align}
$$

To make the above equation valid, we must show

$$
\begin{align}
c^{\prime} \cdot \frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1} = 1
\end{align}
$$

However, because $M_{k} > 1$ for $k \in [0, \alpha]$, $\frac{M_{i}}{c} > 1$, and $c^{\prime} \geq 1$, it is not possible to have the above equation valid. This raises a contradiction. Therefore, $i$ is unique.

In the case where $j = \alpha$, we will also have $i < \alpha$. Then we have

$$
\begin{align}
d &= c \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1} \\
\end{align}
$$

where $c$ is some positive integer such that $1 \leq c < M_{i}$.

Similarly,

$$
\begin{align}
d &= c^{\prime} \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1} \\
\end{align}
$$

where $c^{\prime}$ is some positive integer.

Thus,

$$
\begin{align}
c \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1} &= c^{\prime} \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1} \\
\end{align}
$$

$$
\begin{align}
c &= c^{\prime} \cdot M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 1} \\
\end{align}
$$

To make the above equation valid, we must show

$$
\begin{align}
c^{\prime} \cdot \frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 1} = 1
\end{align}
$$

Taken together, $i$ is unique if it exists.

This concludes the proof. $\square$

If $i$ exists, we will refer to $i$ as the division index and write $\widehat{M} = d \cdot \widehat{M}^{\prime}$, where $\widehat{M}^{\prime}$ is endowed with the following induced factorization:

If $0 \leq i < \alpha$, then $\widehat{M}^{\prime} = \widehat{M}_{0}^{\prime} \cdot \widehat{M}_{1}^{\prime} \cdot \ldots \cdot \widehat{M}_{\alpha - i - 1}^{\prime} \cdot \infty$ with $\widehat{M}_{0}^{\prime} = \frac{M_{i}}{c} > 1$ and $\widehat{M}_{j}^{\prime} = M_{i + j}$ for $0 < j < \alpha - i$.
If $i = \alpha$, then $\widehat{M} = d \cdot \infty$ and we will let $\widehat{M}^{\prime} = \infty$.

To see this, in the case where $0 \leq i < \alpha$, we have

$$
\begin{align}
\widehat{M} &= M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1} \cdot \infty \\
&= M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1} \cdot M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 1} \cdot \infty \\
&= \frac{d}{c} \cdot M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 1} \cdot \infty \\
&= d \cdot \frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 1} \cdot \infty \\
&= d \cdot M_{0}^{\prime} \cdot M_{1}^{\prime} \cdot \ldots \cdot M_{\alpha - i - 1}^{\prime} \cdot \infty \\
&= d \cdot \widehat{M}^{\prime}
\end{align}
$$

where $\widehat{M}^{\prime} = M_{0}^{\prime} \cdot M_{1}^{\prime} \cdot \ldots \cdot M_{\alpha - i - 1}^{\prime} \cdot \infty$ with $M_{0}^{\prime} = \frac{M_{i}}{c} > 1$ and $M_{j}^{\prime} = M_{i + j}$ for $0 < j < \alpha - i$.

In the case where $i = \alpha$, we have

$$
\begin{align}
\widehat{M} &= M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1} \cdot \infty \\
&= \frac{d}{c} \cdot \infty \\
&= d \cdot \infty \\
&= d \cdot \widehat{M}^{\prime}
\end{align}
$$

where $\widehat{M}^{\prime} = \infty$.

Furthermore, we say that $M$ is weakly left divisible by $d$ if there exists $0 \leq i \leq \alpha$ such that the conditions 1 and 2 are satisfied for left divisibility, but not necessarily the condition 3.

Notice that in the proof of the uniqueness of division index $i$, we have never used the condition 3. Therefore, we could still the necessarily unique $i$ the division index for weak left divisibility, but we no longer have the factorization of $\widehat{M}$, because the factorization assumes the condition 3 of left divisibility.

Also notice that $\widehat{M}^{\prime}$ with its induced factorization can itself be considered for left divisibility or weak left divisibility (with the step or replacing the last factor by $\infty$ now being superfluous). More specifically, because $\widehat{M}^{\prime} > 0$, $\widehat{M}_{j}^{\prime} > 1$ for $j \in [0, \alpha - i - 1]$, and $\widehat{M}^{\prime} = \widehat{M}_{0}^{\prime} \cdot \widehat{M}_{1}^{\prime} \cdot \ldots \cdot \widehat{M}_{\alpha - i - 1}^{\prime} \cdot \infty$, given another positive integer $d^{\prime} > 0$, we could completely test whether the properties of left divisibility or weak left divisibility hold for $\widehat{M}^{\prime}$ with respect to $d^{\prime}$. Replacing the last factor by $\infty$ is not necessary as it is already $\infty$.

Definition 2.12 Admission for Composition - Restricted Case

We first consider composition in the restricted case of length 1 layouts for the second layout.

Let $\mathbf{S} = (M_{0}, M_{1}, \ldots, M_{\alpha})$ be a shape tuple, let $M = M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha}$, and let $B = (N):(r)$ be a layout of length 1. Then we say that the pair $\{\mathbf{S}, B\}$ is admissible for composition (or simply admissible) if:

$M$ is left divisible by $r$. Write $\widehat{M} = r \cdot \widehat{M}^{\prime}$.
With respect to its induced factorization, $\widehat{M}^{\prime}$ is weakly left divisible by $N$.

Definition 2.13 Composition - Restricted Case

The idea of admissibility is that the composition $A \circ B$ of layouts will entail “dividing $B$ along the modes of $A$”. More preciously, we have the following:

Suppose that $\mathbf{S} = (M_{0}, M_{1}, \ldots, M_{\alpha})$ is a shape tuple, and $B = (N):(r)$ is a layout of length 1 such that $\{\mathbf{S}, B\}$ is admissible for composition. Let $\mathbf{D} = (d_{0}, d_{1}, \ldots, d_{\alpha})$ be any stride tuple and let $A = (\mathbf{S}:\mathbf{D})$ be a coalesced layout.

Note that in Jay Shah’s original paper, the layout $A$ was not specified to be coalesced. It will result in some compositions not being valid.

As in Definition 2.11, let $M = M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha}$ and $\widehat{M} = r \cdot \widehat{M}^{\prime}$ with division index $0 \leq i \leq \alpha$. We separate the definition of $A \circ B$ into two cases.

First suppose $0 \leq i < \alpha$, so that

$$
\begin{align}
r &= M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1} \cdot c \\
\widehat{M}^{\prime} &= \frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 1} \cdot \infty
\end{align}
$$

Then if $N \leq \frac{M_{i}}{c}$, we let $A \circ B = (N):(cd_{i})$.

Otherwise, there exists a $j \in [i + 1, \alpha]$ such that $N = \frac{M_{i}}{c} \cdot \ldots \cdot M_{j-1} \cdot c^{\prime}$, where $1 \leq c^{\prime} < M_{j}$ if $j \neq \alpha$ (when $j = i + 1$, $N = \frac{M_{i}}{c} \cdot c^{\prime}$).

Note that here is an important fact that $c^{\prime}$ must be an integer because of the second condition for admission for composition, that is, $\widehat{M}^{\prime}$ is weakly left divisible by $N$. We must have $\frac{M_{i}}{c} \cdot \ldots \cdot M_{j-1}$ divides $N$, resulting in $c^{\prime}$ being an integer.

We let

$$
\begin{align}
A \circ B =
\begin{cases}
\left(\frac{M_{i}}{c}, M_{i + 1}, \ldots, M_{j - 1}, c^{\prime} \right) : \left(cd_{i}, d_{i + 1}, \ldots, d_{j - 1}, d_{j} \right) & \text{if } c^{\prime} > 1 \\
\left(\frac{M_{i}}{c}, M_{i + 1}, \ldots, M_{j - 1} \right) : \left(cd_{i}, d_{i + 1}, \ldots, d_{j - 1} \right) & \text{if } c^{\prime} = 1
\end{cases}
\end{align}
$$

If instead $i = \alpha$, then we have $r = M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1} \cdot c$ as before but $\widehat{M}^{\prime} = \infty$, and we let $A \circ B = (N):(cd_{\alpha})$.

Let’s look at this definition more closely.

Essentially, we are taking the one-dimensional coordinates $k \cdot r$ along the layout $A$ where $k \in [0, N - 1]$. Because we have $r = M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1} \cdot c$, and $c$ divides $M_{i}$.

Let first consider the case of $0 \leq i < \alpha$.

If $N \leq \frac{M_{i}}{c}$, then the first mode in the layout $A$ is sufficient for dividing $B$. Consequently, the composition layout $A \circ B = (N):(cd_{i})$.

Otherwise if $N > \frac{M_{i}}{c}$, more modes in the layout $A$ will be involved for dividing $B$, and consequently the composition layout

Let’s then consider the case of $i = \alpha$. We have $r = M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1} \cdot c$ and $\widehat{M}^{\prime} = \infty$. Here $c$ is an positive integer that can be infinitely large. So only the last mode in the layout $A$ is involved for dividing $B$, and consequently the composition layout $A \circ B = (N):(cd_{\alpha})$.

Note that by this definition, $\text{size}(A \circ B) = \text{size}(B)$. This is a critical property which we will use later.

Proposition 2.14

In the situation of Definition 2.13, we have that $f_{A \circ B} = \widehat{f}_A \circ f_B$.

Proof

This more formally proves the intuition we explained to Definition 2.13.

We carry over the notation from Definition 2.13.

Given an index $0 \leq k \leq \alpha$, let $\delta_{k} \in \mathbb{N}^{\times(\alpha + 1)}$ denote the coordinate that is zero everywhere except in the $k$-th position, where it is 1. Concretely,

$$
\begin{align}
\delta_{0} &= \underbrace{(1, 0, 0, \ldots, 0)}_{\alpha + 1} \\
\delta_{1} &= \underbrace{(0, 1, 0, \ldots, 0)}_{\alpha + 1} \\
&\vdots \\
\delta_{\alpha} &= \underbrace{(0, 0, 0, \ldots, 1)}_{\alpha + 1}
\end{align}
$$

With respect to the the isomorphism of the extended layout $A$, we have

$$
\begin{align}
\widehat{\iota}: \mathbb{N} \cong [0, M_{0}) \times [0, M_{1}) \times \ldots \times [0, M_{\alpha - 1}) \times \mathbb{N}
\end{align}
$$

Because $B = (N):(r)$, we have

$$
\begin{align}
f_B(k) &= k \cdot r \\
&= M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1} \cdot k \cdot c \\
\end{align}
$$

where $k \in [0, N - 1]$.

Let first consider the case of $0 \leq i < \alpha$.

If $N \leq \frac{M_{i}}{c}$, i.e. $N \cdot c \leq M_{i}$, then we must have $k \cdot c < M_{i}$ for all $k \in [0, N - 1]$. Because of the isomorphism of the extended layout $A$, we have

$$
\begin{align}
f_B(k) \mapsto \delta_{i} \cdot k \cdot c
\end{align}
$$

Then we have

$$
\begin{align}
\left(\widehat{f}_A \circ f_B \right)(k) &= \widehat{f}_A \left(f_B\left(k\right)\right) \\
&= \widehat{f}_A \left(\delta_{i} \cdot k \cdot c\right) \\
&= k \cdot c \cdot d_{i} \\
\end{align}
$$

According to Definition 2.13, we have

$$
\begin{align}
f_{A \circ B}(k) &= k \cdot c \cdot d_{i} \\
\end{align}
$$

Therefore, $f_{A \circ B} = \widehat{f}_A \circ f_B$.

Otherwise if $N > \frac{M_{i}}{c}$, i.e. $N = \frac{M_{i}}{c} \cdot \ldots \cdot M_{j-1} \cdot c^{\prime}$. Because of the isomorphism of the extended layout $A$, by definition, we have

$$
\begin{align}
f_B(k) &\mapsto \left(f_B(k) \mod M_{0}, \left\lfloor \frac{f_B(k)}{M_{0}} \right\rfloor \mod M_{1}, \left\lfloor \frac{f_B(k)}{M_{0} \cdot M_{1}} \right\rfloor \mod M_{2}, \ldots, \left\lfloor \frac{f_B(k)}{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1}} \right\rfloor \mod M_{i}, \ldots, \left\lfloor \frac{f_B(k)}{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 2}} \right\rfloor \mod M_{\alpha - 1}, \left\lfloor \frac{f_B(k)}{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \right) \\
&= \left(0, 0, \ldots, \left(k \cdot c\right) \mod M_{i}, \left\lfloor \frac{k \cdot c}{M_{i}} \right\rfloor \mod M_{i + 1}, \ldots, \left\lfloor \frac{k \cdot c}{M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod M_{j}, \ldots, \left\lfloor \frac{k \cdot c}{M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 2}} \right\rfloor \mod M_{\alpha - 1}, \left\lfloor \frac{k \cdot c}{M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \right) \\
&= \left(0, 0, \ldots, \left( k \mod \frac{M_{i}}{c} \right) \cdot c, \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1}, \ldots, \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod M_{j}, \ldots, \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 2}} \right\rfloor \mod M_{\alpha - 1}, \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \right) \\
\end{align}
$$

Note that here we used the property $\left(k \cdot c\right) \mod M_{i} = \left( k \mod \frac{M_{i}}{c} \right) \cdot c$, if $c$ divides $M_{i}$.

To see this, suppose $k \cdot c = p \cdot M_{i} + r$, where $0 \leq r < M_{i}$. Then we have $k = \frac{p \cdot M_{i} + r}{c} = p \cdot \frac{M_{i}}{c} + \frac{r}{c}$. Because $c$ divides $M_{i}$, $\frac{r}{c}$ ust be an integer. Thus, we have $\left(k \cdot c\right) \mod M_{i} = r$, and $\left( k \mod \frac{M_{i}}{c} \right) \cdot c = \frac{r}{c} \cdot c = r$. Therefore, $\left(k \cdot c\right) \mod M_{i} = \left( k \mod \frac{M_{i}}{c} \right) \cdot c$.

Further more, because $0 \leq k < N$, we have $0 \leq k \cdot c < N \cdot c = M_{i} \cdot \ldots \cdot M_{j - 1} \cdot c^{\prime}$, where $1 \leq c^{\prime} < M_{j}$. Thus $0 \leq k \cdot c < M_{i} \cdot \ldots \cdot M_{j - 1} \cdot M_{j}$.

When $c^{\prime} > 1$, we have

$$
\begin{align}
\left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j}} \right\rfloor
&= \left\lfloor \frac{k \cdot c}{M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{j}} \right\rfloor \\
&= 0
\end{align}
$$

$$
\begin{align}
\left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j}} \right\rfloor \mod M_{j + 1}
&= 0
\end{align}
$$

and of course for any $l \in [j + 1, \alpha - 1]$, we have

$$
\begin{align}
\left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{l}} \right\rfloor
&= \left\lfloor \frac{k \cdot c}{M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{l}} \right\rfloor \\
&= 0
\end{align}
$$

$$
\begin{align}
\left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{l}} \right\rfloor \mod M_{l + 1}
&= 0
\end{align}
$$

Thus, we have

$$
\begin{align}
f_B(k) &\mapsto \left(0, 0, \ldots, \left( k \mod \frac{M_{i}}{c} \right) \cdot c, \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1}, \ldots, \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod M_{j}, 0, 0, \ldots, 0 \right) \\
\end{align}
$$

What’s more, because $k \leq (N - 1)$ and $\frac{M_{i}}{c} \cdot \ldots \cdot M_{j - 1} = \frac{N}{c^{\prime}}$, we have

$$
\begin{align}
\left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor
&= \left\lfloor \frac{k}{\frac{N}{c^{\prime}}} \right\rfloor \\
&= \left\lfloor \frac{k}{N} \cdot c^{\prime} \right\rfloor \\
&\leq \left\lfloor \frac{N - 1}{N} \cdot c^{\prime} \right\rfloor \\
&\leq \left\lfloor c^{\prime} \right\rfloor \\
&\leq c^{\prime}
\end{align}
$$

Because $c^{\prime} < M_{j}$, we have

$$
\begin{align}
\left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod M_{j}
&= \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod c^{\prime} \\
\end{align}
$$

Thus, we have

Then we have

$$
\begin{align}
\left(\widehat{f}_A \circ f_B \right)(k) &= \widehat{f}_A \left(f_B\left(k\right)\right) \\
&= \widehat{f}_A \left(0, 0, \ldots, \left( k \mod \frac{M_{i}}{c} \right) \cdot c, \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1}, \ldots, \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod c^{\prime}, 0, 0, \ldots, 0 \right) \\
&= 0 \cdot d_{0} + 0 \cdot d_{1} + \ldots + \left( k \mod \frac{M_{i}}{c} \right) \cdot c \cdot d_{i} + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1} \right) \cdot d_{i + 1} + \ldots + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod c^{\prime} \right) \cdot d_{j} + 0 \cdot d_{j + 1} + \ldots + 0 \cdot d_{\alpha} \\
&= \left( k \mod \frac{M_{i}}{c} \right) \cdot c \cdot d_{i} + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1} \right) \cdot d_{i + 1} + \ldots + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod c^{\prime} \right) \cdot d_{j} \\
\end{align}
$$

According to Definition 2.13, we have

$$
\begin{align}
f_{A \circ B}(k) &= \left( k \mod \frac{M_{i}}{c} \right) \cdot c \cdot d_{i} + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1} \right) \cdot d_{i + 1} + \ldots + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod c^{\prime} \right) \cdot d_{j} \\
\end{align}
$$

Therefore, $f_{A \circ B} = \widehat{f}_A \circ f_B$.

When $c^{\prime} = 1$, we have $0 \leq k \cdot c < N \cdot c = M_{i} \cdot \ldots \cdot M_{j - 1} \cdot c^{\prime} = M_{i} \cdot \ldots \cdot M_{j - 1}$.

Thus, we have

$$
\begin{align}
\left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor
&= \left\lfloor \frac{k \cdot c}{M_{i} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \\
&= 0
\end{align}
$$

$$
\begin{align}
\left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 1}} \right\rfloor \mod M_{j}
&= 0
\end{align}
$$

Thus, we have

$$
\begin{align}
f_B(k) &\mapsto \left(0, 0, \ldots, \left( k \mod \frac{M_{i}}{c} \right) \cdot c, \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1}, \ldots, \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 2}} \right\rfloor \mod M_{j - 1}, 0, 0, \ldots, 0 \right) \\
\end{align}
$$

Then we have

$$
\begin{align}
\left(\widehat{f}_A \circ f_B \right)(k) &= \widehat{f}_A \left(f_B\left(k\right)\right) \\
&= \widehat{f}_A \left(0, 0, \ldots, \left( k \mod \frac{M_{i}}{c} \right) \cdot c, \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1}, \ldots, \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 2}} \right\rfloor \mod M_{j - 1}, 0, 0, \ldots, 0 \right) \\
&= 0 \cdot d_{0} + 0 \cdot d_{1} + \ldots + \left( k \mod \frac{M_{i}}{c} \right) \cdot c \cdot d_{i} + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1} \right) \cdot d_{i + 1} + \ldots + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 2}} \right\rfloor \mod M_{j - 1} \right) \cdot d_{j - 1} + 0 \cdot d_{j} + 0 \cdot d_{j + 1} + \ldots + 0 \cdot d_{\alpha} \\
&= \left( k \mod \frac{M_{i}}{c} \right) \cdot c \cdot d_{i} + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1} \right) \cdot d_{i + 1} + \ldots + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 2}} \right\rfloor \mod M_{j - 1} \right) \cdot d_{j - 1} \\
\end{align}
$$

According to Definition 2.13, we have

$$
\begin{align}
f_{A \circ B}(k) &= \left( k \mod \frac{M_{i}}{c} \right) \cdot c \cdot d_{i} + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c}} \right\rfloor \mod M_{i + 1} \right) \cdot d_{i + 1} + \ldots + \left( \left\lfloor \frac{k}{\frac{M_{i}}{c} \cdot M_{i + 1} \cdot \ldots \cdot M_{j - 2}} \right\rfloor \mod M_{j - 1} \right) \cdot d_{j - 1} \\
\end{align}
$$

Therefore, $f_{A \circ B} = \widehat{f}_A \circ f_B$.

Let’s then consider the case of $i = \alpha$.

$$
\begin{align}
f_B(k) &= k \cdot r \\
&= k \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1} \cdot c \\
\end{align}
$$

where $k \in [0, N - 1]$.

Because of the isomorphism of the extended layout $A$, we have

$$
\begin{align}
f_B(k) &\mapsto \delta_{\alpha} \cdot k \cdot c \\
\end{align}
$$

Then we have

$$
\begin{align}
\left(\widehat{f}_A \circ f_B \right)(k) &= \widehat{f}_A \left(f_B\left(k\right)\right) \\
&= \widehat{f}_A \left(\delta_{\alpha} \cdot k \cdot c\right) \\
&= k \cdot c \cdot d_{\alpha} \\
\end{align}
$$

According to Definition 2.13, we have

$$
\begin{align}
f_{A \circ B}(k) &= k \cdot c \cdot d_{\alpha} \\
\end{align}
$$

Therefore, $f_{A \circ B} = \widehat{f}_A \circ f_B$.

Taken together, we have $f_{A \circ B} = \widehat{f}_A \circ f_B$ for all the cases in Definition 2.13.

This concludes the proof. $\square$

One might ask why the second condition for admission for composition is necessary. If we don’t have it, $c^{\prime}$ can be fractional and we can still define $A \circ B$ to be

$$
\begin{align}
A \circ B =
\begin{cases}
\left(\frac{M_{i}}{c}, M_{i + 1}, \ldots, M_{j - 1}, \left\lceil c^{\prime} \right\rceil \right) : \left(cd_{i}, d_{i + 1}, \ldots, d_{j - 1}, d_{j} \right) & \text{if } c^{\prime} > 1 \\
\left(\frac{M_{i}}{c}, M_{i + 1}, \ldots, M_{j - 1} \right) : \left(cd_{i}, d_{i + 1}, \ldots, d_{j - 1} \right) & \text{if } c^{\prime} = 1
\end{cases}
\end{align}
$$

It’s not too difficult to show that we still have $f_{A \circ B} = \widehat{f}_A \circ f_B$ for the domain of $f_B$ when the length of $B$ is 1.

However, the critical property $\text{size}(A \circ B) = \text{size}(B)$ will not hold in this case. As we will see later, without having this property, $f_{A \circ B} = \widehat{f}_A \circ f_B$ cannot be true when $B$ is multi-modal, i.e., the length of $B$ is greater than 1.

Definition 2.16 Interval of Definition

In the situation of Definition 2.12, where layout $B$ is of length 1, let $f_B: [0, N) \to \mathbb{N}$ be the layout function, and let $I = [r, r(N - 1)]$ be the interval given by the convex closure of the image $f_B([1, N))$. Let $M^{\prime} = M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1}$ and $J = I \cap [1, M^{\prime})$ (so $J = \emptyset$ if $\alpha = 0$). Then the interval of definition for $\{\mathbf{S}, B\}$ is $J$.

Definition 2.17 Composition - General Case

Let $\mathbf{S} = (M_{0}, M_{1}, \ldots, M_{\alpha})$ be a shape tuple, and let $B = (N_{0}, N_{1}, \ldots, N_{\beta}) : (r_{0}, r_{1}, \ldots, r_{\beta})$ be a layout, let $B_{k} = (N_{k}) : (r_{k})$ for $0 \leq k \leq \beta$. Then we say that the pair $\{\mathbf{S}, B\}$ is admissible for composition if:

For all $0 \leq k \leq \beta$, the pair $\{\mathbf{S}, B_{k}\}$ is admissible for composition in the sense of Definition 2.12.
The interval of definition for the pairs $\{\mathbf{S}, B_{k}\}_{0 \leq k \leq \beta}$ are disjoint.

In this case, if $\mathbf{D} = (d_{0}, d_{1}, \ldots, d_{\alpha})$ is a stride tuple and $A = \mathbf{S} : \mathbf{D}$, then we define the composition $A \circ B$ to be the concatenated layout

$$
\begin{align}
A \circ B := \left(A \circ B_{0}, A \circ B_{1}, \ldots, A \circ B_{\beta}\right)
\end{align}
$$

where each $A \circ B_{k}$ is defined as in Definition 2.13.

Theorem 2.18 Composition - General Case

In the situation of Definition 2.17, we have that $f_{A \circ B} = \widehat{f}_A \circ f_B$.

Proof

By Definition 2.13, $\text{size}(A \circ B_{k}) = \text{size}(B_{k}) = N_{k}$ for all $0 \leq k \leq \beta$. We have the following isomorphism for both the layout $A \circ B$ or the layout $B$.

$$
\begin{align}
\iota: [0, N_{0} \cdot N_{1} \cdot \ldots \cdot N_{\beta}) \cong [0, N_{0}) \times [0, N_{1}) \times \ldots \times [0, N_{\beta})
\end{align}
$$

Given any $x \in [0, N_{0} \cdot N_{1} \cdot \ldots \cdot N_{\beta})$, because of the isomorphism $\iota$, we have

$$
\begin{align}
x &\mapsto \left(x_{0}, x_{1}, \ldots, x_{\beta}\right)
\end{align}
$$

By Lemma 2.19, we have

$$
\begin{align}
\widehat{f}_A \circ f_B(x) &= \widehat{f}_A \left(f_B(x)\right) \\
&= \widehat{f}_A \left(f_{B_{0}}(x_{0}) + f_{B_{1}}(x_{1}) + \ldots + f_{B_{\beta}}(x_{\beta})\right) \\
\end{align}
$$

By Definition 2.17, Lemma 2.19, and Definition 2.13, we have

$$
\begin{align}
f_{A \circ B}(x) &= f_{A \circ B_{0}}(x_{0}) + f_{A \circ B_{1}}(x_{1}) + \ldots + f_{A \circ B_{\beta}}(x_{\beta}) \\
&= \widehat{f}_A \circ f_{B_{0}}(x_{0}) + \widehat{f}_A \circ f_{B_{1}}(x_{1}) + \ldots + \widehat{f}_A \circ f_{B_{\beta}}(x_{\beta}) \\
&= \widehat{f}_A \left(f_{B_{0}}(x_{0})\right) + \widehat{f}_A \left(f_{B_{1}}(x_{1})\right) + \ldots + \widehat{f}_A \left(f_{B_{\beta}}(x_{\beta})\right) \\
\end{align}
$$

Normally, we don’t have $\widehat{f}_A \left(x_{A, 0} + x_{A, 1} + \ldots + x_{A, \beta}\right) = \widehat{f}_A \left(x_{A, 0}\right) + \widehat{f}_A \left(x_{A, 1}\right) + \ldots + \widehat{f}_A \left(x_{A, \beta}\right)$, because the layout function $\widehat{f}_A$ is not linear. For example, suppose $A = (2, 3) : (1, 4)$, and we have $\widehat{f}_A(1) = 1$ and $\widehat{f}_A(3) = 5$. $\widehat{f}_A(3) = \widehat{f}_A(1 + 1 +1) \neq \widehat{f}_A(1) + \widehat{f}_A(1) + \widehat{f}_A(1)$.

However, there are some special cases where the above equation holds. For example, for simplicity, suppose $\beta = \alpha$, if we have

$$
\begin{align}
x_{A, 0} &\in [0, M_{0}) \\
x_{A, 1} &\in \{0, 1 \cdot M_{0}, 2 \cdot M_{0}, \ldots, \infty \cdot M_{0}\} \cap [0, M_{1}) \\
x_{A, 2} &\in \{0, 1 \cdot M_{0} \cdot M_{1}, 2 \cdot M_{0} \cdot M_{1}, \ldots, \infty \cdot M_{0} \cdot M_{1}\} \cap [0, M_{2}) \\
&\vdots \\
x_{A, \alpha} &\in \{0, 1 \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1}, 2 \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1}, \ldots, \infty \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1}\} \cap [0, M_{\alpha}) \\
\end{align}
$$

By definition,

$$
\begin{align}
\widehat{f}_A \left(x\right)
&= \left( x \mod M_{0} \right) \cdot d_{0} + \left( \left\lfloor \frac{x}{M_{0}} \right\rfloor \mod M_{1} \right) \cdot d_{1} + \ldots + \left( \left\lfloor \frac{x}{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \mod M_{\alpha} \right) \cdot d_{\alpha} \\
\end{align}
$$

So in our case, we have

$$
\begin{align}
\widehat{f}_A \left(x_{A, 0} + x_{A, 1} + \ldots + x_{A, \beta}\right)
&= \left( \left( x_{A, 0} + x_{A, 1} + \ldots + x_{A, \beta} \right) \mod M_{0} \right) \cdot d_{0} + \left( \left\lfloor \frac{x_{A, 0} + x_{A, 1} + \ldots + x_{A, \beta}}{M_{0}} \right\rfloor \mod M_{1} \right) \cdot d_{1} + \ldots + \left( \left\lfloor \frac{x_{A, 0} + x_{A, 1} + \ldots + x_{A, \beta}}{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \mod M_{\alpha} \right) \cdot d_{\alpha} \\
&= \left( x_{A, 0} \mod M_{0} \right) \cdot d_{0} + \left( \left\lfloor \frac{x_{A, 1}}{M_{0}} \right\rfloor \mod M_{1} \right) \cdot d_{1} + \ldots + \left( \left\lfloor \frac{x_{A, \beta}}{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{\alpha - 1}} \right\rfloor \mod M_{\alpha} \right) \cdot d_{\alpha} \\
&= \widehat{f}_A \left(x_{A, 0}\right) + \widehat{f}_A \left(x_{A, 1}\right) + \ldots + \widehat{f}_A \left(x_{A, \beta}\right) \\
\end{align}
$$

The idea of having the second condition for admission for composition, i.e., the interval of definition for the pairs $\{\mathbf{S}, B_{k}\}_{0 \leq k \leq \beta}$ are disjoint, are exactly the same.

Because $r_{k} = M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i_{k} - 1} \cdot c$, for $x_{k} \in [0, N_{k})$, we have

$$
\begin{align}
f_{B_{k}}(x_{k}) &\in [0, 1 \cdot r, 2 \cdot r, \ldots, (N_{k} - 1) \cdot r] \\
&= [0, M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i_{k} - 1} \cdot c, 2 \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i_{k} - 1} \cdot c, \ldots, (N_{k} - 1) \cdot M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i_{k} - 1} \cdot c] \\
\end{align}
$$

Because of the isomorphism of the layout $A$, we have

$$
\begin{align}
f_{B_{k}}(x_{k}) &\mapsto \left(x_{A, 0}, x_{A, 1}, \ldots, x_{A, \alpha}\right) \\
\end{align}
$$

where

$$
\begin{align}
x_{A, i} = \left\lfloor \frac{f_{B_{k}}(x_{k})}{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i - 1}} \right\rfloor \mod M_{i}
\end{align}
$$

Then we must have some integer $p_{k}$ and $q_{k}$, $p_{k} < q_{k}$, where $x_{A, i} = 0$ for $i < p_{k}$ and $i > q_{k}$.

The second condition for admission for composition ensures that $[p_{k}, q_{k}]$ are disjoint for all $0 \leq k \leq \beta$. Therefore, we can have the equation:

$$
\begin{align}
\widehat{f}_A \left(x_{A, 0} + x_{A, 1} + \ldots + x_{A, \beta}\right) = \widehat{f}_A \left(x_{A, 0}\right) + \widehat{f}_A \left(x_{A, 1}\right) + \ldots + \widehat{f}_A \left(x_{A, \beta}\right)
\end{align}
$$

This concludes the proof. $\square$

Going back to the discussion why the second condition for admission for composition in the restricted case in Proposition 2.14 is necessary.

Because the critical property $\text{size}(A \circ B) = \text{size}(B)$ will not hold, we will have two completely different isomorphisms for the layout $A \circ B$ and the layout $B$, respectively.

Lemma 2.19 Concatenation of Layouts

Let $C = (C_{0}, C_{1}, \ldots, C_{\gamma})$ be a concatenated layout. Let

$$
\begin{align}
\iota: [0, \text{size}(C)) \cong [0, \text{size}(C_{0})) \times \cdots \times [0, \text{size}(C_{\gamma}))
\end{align}
$$

be the usual isomorphism (as in Definition 2.3). Then the following diagram commutes:

$$
\begin{equation}
\begin{CD}
[0, \text{size}(C)) @>{\iota}>{\cong}> [0, \text{size}(C_0)) \times \dots \times [0, \text{size}(C_\gamma)) \\
@V{f_C}VV @VV{(f_{C_0}, \dots, f_{C_\gamma})}V \\
\mathbb{N} @<+<< \mathbb{N} \times \dots \times \mathbb{N}
\end{CD}
\end{equation}
$$

Proof

If $C_{0}, \ldots, C_{\gamma}$ are all length 1 layouts, then this is immediate from Definition 2.3.

Concretely, suppose $C_{k} = (M_{k}) : (d_{k})$ for all $0 \leq k \leq \gamma$. The concatenated layout becomes

$$
\begin{align}
C &= (C_{0}, C_{1}, \ldots, C_{\gamma}) \\
&= (M_{0}:d_{0}, M_{1}:d_{1}, \ldots, M_{\gamma}:d_{\gamma}) \\
&= (M_{0}, M_{1}, \ldots, M_{\gamma}) : (d_{0}, d_{1}, \ldots, d_{\gamma})
\end{align}
$$

Because of the isomorphism of the layout $C$, we have

$$
\begin{align}
x &\mapsto \left(x_{0}, x_{1}, \ldots, x_{\gamma}\right) \\
\end{align}
$$

Then by definition, the concatenated layout function is

$$
\begin{align}
f_C(x) &= x_{0} \cdot d_{0} + x_{1} \cdot d_{1} + \ldots + x_{\gamma} \cdot d_{\gamma} \\
\end{align}
$$

For each of the length 1 layouts $C_{k}$, by definition, the layout function is

$$
\begin{align}
f_{C_{k}}(x_{k}) &= x_{k} \cdot d_{k} \\
\end{align}
$$

Therefore, we have

$$
\begin{align}
f_C(x) &= f_{C_{0}}(x_{0}) + f_{C_{1}}(x_{1}) + \ldots + f_{C_{\gamma}}(x_{\gamma}) \\
\end{align}
$$

In the case where some of the layouts $C_{k}$ are not length 1, we can apply the same argument to each of the sublayouts $C_{k}$, and the result follows by induction.

Concretely, suppose $C_{k}$ are not length 1 and $C_{k} = (C_{k, 0}, C_{k, 1}, \ldots, C_{k, \gamma_{k}})$, where $C_{k, 0}, \ldots, C_{k, \gamma_{k}}$ are length 1 layouts. Based on what we have proved above, we have

$$
\begin{align}
f_{C_{k}}(x_{k}) &= f_{C_{k, 0}}(x_{k, 0}) + f_{C_{k, 1}}(x_{k, 1}) + \ldots + f_{C_{k, \gamma_{k}}}(x_{k, \gamma_{k}}) \\
\end{align}
$$

where

$$
\begin{align}
x_{k} &\mapsto \left(x_{k, 0}, x_{k, 1}, \ldots, x_{k, \gamma_{k}}\right) \\
\end{align}
$$

Suppose the layout $C$ can be maximally decomposed into layouts of length 1.

$$
\begin{align}
C &= (C_{0}, C_{1}, \ldots, C_{\gamma}) \\
&= (C_{0, 0}, C_{0, 1}, \ldots, C_{0, \gamma_{0}}, C_{1, 0}, C_{1, 1}, \ldots, C_{1, \gamma_{1}}, \ldots, C_{\gamma, 0}, C_{\gamma, 1}, \ldots, C_{\gamma, \gamma_{\gamma}}) \\
\end{align}
$$

Then we have

$$
\begin{align}
f_C(x) &= f_{C_{0,0}}(x_{0,0}) + f_{C_{0,1}}(x_{0,1}) + \ldots + f_{C_{0,\gamma_{0}}}(x_{0,\gamma_{0}}) + f_{C_{1,0}}(x_{1,0}) + f_{C_{1,1}}(x_{1,1}) + \ldots + f_{C_{1,\gamma_{1}}}(x_{1,\gamma_{1}}) + \ldots + f_{C_{\gamma,0}}(x_{\gamma,0}) + f_{C_{\gamma,1}}(x_{\gamma,1}) + \ldots + f_{C_{\gamma,\gamma_{\gamma}}}(x_{\gamma,\gamma_{\gamma}}) \\
&= f_{C_{0}}(x_{0}) + f_{C_{1}}(x_{1}) + \ldots + f_{C_{\gamma}}(x_{\gamma}) \\
\end{align}
$$

where

$$
\begin{align}
x &\mapsto \left(x_{0}, x_{1}, \ldots, x_{\gamma}\right) \\
\end{align}
$$

This concludes the proof. $\square$

Definition 2.21 CUTLASS Admission for Composition - Restricted Case

The CUTLASS admission for composition in the restricted case is more restrictive.

$M$ is left divisible by $r$. Write $\widehat{M} = r \cdot \widehat{M}^{\prime}$.
With respect to its induced factorization, $\widehat{M}^{\prime}$ is left divisible by $N$.

Note that the second condition is the left divisibility, instead of the weak left divisibility in Definition 2.12.

For example, suppose $A = (8, 6, 8) : (1, 16, 108)$ and $B = (8) : (4)$. According to Definition 2.12, $A \circ B = (2, 4) : (4, 16)$. However, if we run composition for $A$ and $B$ in CUTLASS, we will encounter an error because CUTLASS requires left divisibility for the second condition.

More specifically, in the CUTLASS composition layout algebra implementation, we have

void shape_div(int* shapeA, int N, int& strideB) {
   for (int i = 0; i < N; ++i) {
      assert(shapeA[i] %   strideB == 0 or
               strideB % shapeA[i] == 0);
      int new_shape  = ceil_div(shapeA[i], strideB);
      int new_stride = ceil_div(strideB, shapeA[i]);
      shapeA[i] = new_shape;
      strideB   = new_stride;
   }
}

void shape_mod(int* shapeA, int N, int& shapeB) {
   for (int i = 0; i < N; ++i) {
      assert(shapeA[i] %    shapeB == 0 or
                shapeB % shapeA[i] == 0);
      int new_shapeA =      min(shapeA[i], shapeB);
      int new_shapeB = ceil_div(shapeB, shapeA[i]);
      shapeA[i] = new_shapeA;
      shapeB    = new_shapeB;
   }
}

The reason why CUTLASS enforces this is because of the logical division operation. Without this restriction, the logical division operation will not be defined in some cases.

By-Mode Composition

In some cases, we would like to perform composition for each mode of the layout separately.

Let the layout $A$ be a concatenation of layouts $A = (A_{0}, A_{1}, \ldots, A_{\alpha})$, and $B$ be a tile of layouts $B = \langle B_{0}, B_{1}, \ldots, B_{\alpha} \rangle$. Note that $B$, a tiler, is just a tuple of layouts instead of a concatenated layout. A tiler may consists of one or more than one layouts. Then we define the by-mode composition $A \circ B$ to be the layout

$$
\begin{align}
A \circ B := \left(A_{0} \circ B_{0}, A_{1} \circ B_{1}, \ldots, A_{\alpha} \circ B_{\alpha}\right)
\end{align}
$$

In some cases, we also have the layout $A$ be a concatenation of layouts $A = (A_{0}, A_{1}, \ldots, A_{\alpha})$, and $B$ be a tile of layouts $B = \langle B_{0}, B_{1}, \ldots, B_{\beta} \rangle$, where $\beta \geq \alpha$. In this case, we define the by-mode composition $A \circ B$ to be the layout

$$
\begin{align}
A \circ B := \left(A_{0} \circ B_{0}, A_{1} \circ B_{1}, \ldots, A_{\alpha} \circ B_{\alpha}\right)
\end{align}
$$

Non-Integral Layout Composition

Similar to the non-integral layout complementation, the non-integral layout composition can be derived by coalescing the the layout so that the layout becomes integral. All the properties of the integral layout composition derived above can then be used.

Implication of Composition

The composition operation is usually used for selecting a sublayout from a layout. The sublayout can be strided even for each mode if the by-mode composition is performed.

Logical Division

Definition 2.22 Logical Division

Let $A = \mathbf{S} : \mathbf{D}$ and $B$ be layouts, and let $M$ be the size of $A$. Suppose that the pairs $\{B, M\}$ and $\{\mathbf{S}, B\}$ are admissible (for complementation and composition, respectively). Then we define the logical division $A / B$ to be the layout

$$
\begin{align}
A / B := A \circ \left(B, \text{complement}(B, M)\right)
\end{align}
$$

Note that here the conditions of admission for composition follows Definition 2.21 rather than Definition 2.12.

Implicitly Lemma 2.23 is used in Definition 2.22.

Lemma 2.23 Logical Division Implication

Suppose $A = \mathbf{S} : \mathbf{D}$, $M = \text{size}(A)$, and $B$ are as in Definition 2.22. Then $\{\mathbf{S}, \left(B, \text{complement}(B, M)\right)\}$ is admissible for composition.

Proof

We denote $A = \mathbf{S} : \mathbf{D} = (M_{0}, M_{1}, \ldots, M_{\alpha}) : (d_{0}, d_{1}, \ldots, d_{\alpha})$, and $B = (N_{0}, N_{1}, \ldots, N_{\beta}) : (r_{0}, r_{1}, \ldots, r_{\beta})$. Let

$$
\begin{align}
\varphi: [0, \beta] \xrightarrow{\cong} [0, \beta]
\end{align}
$$

be the automorphism such that $B^{\varphi} := (N_{\varphi(0)}, N_{\varphi(1)}, \ldots, N_{\varphi(\beta)}) : (r_{\varphi(0)}, r_{\varphi(1)}, \ldots, r_{\varphi(\beta)})$ is sorted.

Then by Definition 2.6, we have

$$
\begin{align}
B^{\prime}
&= \text{complement}(B, M) \\
&= \left(r_{\varphi(0)}, \frac{r_{\varphi(1)}}{N_{\varphi(0)}r_{\varphi(0)}}, \frac{r_{\varphi(2)}}{N_{\varphi(1)}r_{\varphi(1)}}, \ldots, \frac{r_{\varphi(\beta)}}{N_{\varphi(\beta - 1)}r_{\varphi(\beta - 1)}}, \frac{M}{N_{\varphi(\beta)}r_{\varphi(\beta)}}\right) : (1, N_{\varphi(0)}r_{\varphi(0)}, N_{\varphi(1)}r_{\varphi(1)}, \ldots, N_{\varphi(\beta - 1)}r_{\varphi(\beta - 1)}, N_{\varphi(\beta)}r_{\varphi(\beta)})
\end{align}
$$

Now we denote each mode of $B^{\prime}$ as

$$
\begin{align}
B^{\prime}_{k}
&=
\begin{cases}
\left(r_{\varphi(0)}\right) : (1) & \text{if } k = 0 \\
\left(\frac{r_{\varphi(k)}}{N_{\varphi(k - 1)}r_{\varphi(k - 1)}}\right) : (N_{\varphi(k - 1)}r_{\varphi(k - 1)}) & \text{if } 1 \leq k \leq \beta \\
\left(\frac{M}{N_{\varphi(\beta)}r_{\varphi(\beta)}}\right) : (N_{\varphi(\beta)}r_{\varphi(\beta)}) & \text{if } k = \beta + 1
\end{cases}
\end{align}
$$

for $k \in [0, \beta + 1]$.

Because the pair $\{\mathbf{S}, B\}$ is admissible for composition, for each mode in $B$, $B_{k} = (N_{k}) : (r_{k})$ for $k \in [0, \beta]$, by Definition 2.17, the pair $\{\mathbf{S}, B_{k}\}$ is admissible for composition. Therefore, by Definition 2.12, $M$ is left divisible by $r_{k}$ and the quotient $\frac{M}{r_{k}}$ is left divisible (not weakly left divisible) by $N_{k}$ for all $k \in [0, \beta]$.

It is trivial to see $M$ is left divisible by $1$. Let’s see if $M$ is also left divisible by $N_{\varphi(k - 1)}r_{\varphi(k - 1)}$ for all $k \in [1, \beta + 1]$.

Suppose $\varphi(k - 1) = h$ and Because $M$ is left divisible by $r_{h}$, we have

$$
\begin{align}
r_{\varphi(k - 1)} &= r_{h} \\
&= M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i_{k - 1} - 1} \cdot c_{k - 1}
\end{align}
$$

where $c_{k}$ divides $M_{i_{k}}$.

$$
\begin{align}
N_{\varphi(k - 1)} &= N_{h} \\
&= \frac{M_{i_{k}}}{c_{k - 1}} \cdot M_{i_{k - 1} + 1} \cdot \ldots \cdot M_{j_{k - 1} - 1} \cdot c_{k - 1}^{\prime}
\end{align}
$$

where $c_{k - 1}^{\prime}$ divides $M_{j_{k - 1}}$.

Thus, $M$ is also left divisible by $N_{\varphi(k - 1)}r_{\varphi(k - 1)}$, because

$$
\begin{align}
N_{\varphi(k - 1)}r_{\varphi(k - 1)} &= M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i_{k - 1} - 1} \cdot c_{k - 1} \cdot \frac{M_{i_{k}}}{c_{k - 1}} \cdot M_{i_{k} + 1} \cdot \ldots \cdot M_{j_{k - 1} - 1} \cdot c_{k - 1}^{\prime} \\
&= M_{0} \cdot M_{1} \cdot \ldots \cdot M_{j_{k - 1} - 1} \cdot c_{k - 1}^{\prime}
\end{align}
$$

where $c_{k - 1}^{\prime}$ divides $M_{j_{k - 1}}$.

Next, we will have to show $M$ is left divisible by $\frac{r_{\varphi(k)}}{N_{\varphi(k - 1)}r_{\varphi(k - 1)}}$ for all $k \in [1, \beta]$.

$$
\begin{align}
r_{\varphi(k)} &= M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i_{k} - 1} \cdot c_{k} \\
\end{align}
$$

Because $r_{\varphi(k)} \geq N_{\varphi(k - 1)}r_{\varphi(k - 1)}$, we must have $i_{k} \geq j_{k - 1}$. Thus,

$$
\begin{align}
\frac{r_{\varphi(k)}}{N_{\varphi(k - 1)}r_{\varphi(k - 1)}}
&= \frac{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{i_{k} - 1} \cdot c_{k}}{M_{0} \cdot M_{1} \cdot \ldots \cdot M_{j_{k - 1} - 1} \cdot c_{k - 1}^{\prime}} \\
&= \frac{M_{j_{k - 1}} \cdot M_{j_{k - 1} + 1} \cdot \ldots \cdot M_{i_{k} - 1} \cdot c_{k}}{c_{k - 1}^{\prime}} \\
&= \frac{M_{j_{k - 1}}}{c_{k - 1}^{\prime}} \cdot M_{j_{k - 1} + 1} \cdot \ldots \cdot M_{i_{k} - 1} \cdot c_{k} \\
\end{align}
$$

Thus, $M$ is left divisible by $\frac{r_{\varphi(k)}}{N_{\varphi(k - 1)}r_{\varphi(k - 1)}}$ for all $k \in [1, \beta]$.

It is trivial to see $M$ is left divisible by $\frac{M}{N_{\varphi(\beta)}r_{\varphi(\beta)}}$.

Therefore, the pair $\{\mathbf{S}, B^{\prime}_{k}\}$ is admissible for composition for all $k \in [0, \beta + 1]$.

By Definition 2.17, in order to show $\{\mathbf{S}, (B, \text{complement}(B, M))\}$ is admissible for composition, we also have to show the interval of definition for the pairs $\{\mathbf{S}, B_{k}\}_{0 \leq k \leq \beta}$ and $\{\mathbf{S}, B^{\prime}_{k}\}_{0 \leq k \leq \beta + 1}$ are disjoint.

By Proposition 2.7, the concatenated layout $(B, \text{complement}(B, M))$ is automatically satisfied with the disjoint argument.

Therefore, $\{\mathbf{S}, (B, \text{complement}(B, M))\}$ is admissible for composition.

This concludes the proof. $\square$

Note that in Definition 2.22, if the conditions of admission for composition follows Definition 2.12, our proof above will not be valid. That’s why CUTLASS enforces the conditions of admission for composition follows Definition 2.21.

By-Mode Logical Division

In some cases, we would like to perform logical division for each mode of the layout separately.

Let the layout $A$ be a concatenation of layouts $A = (A_{0}, A_{1}, \ldots, A_{\alpha})$, and $B$ be a tile of layouts $B = \langle B_{0}, B_{1}, \ldots, B_{\alpha} \rangle$. Note that $B$ is just a tuple of layouts instead of a concatenated layout. Then we define the by-mode logical division $A / B$ to be the layout

$$
\begin{align}
A / B &:= \left(A_{0} / B_{0}, A_{1} / B_{1}, \ldots, A_{\alpha} / B_{\alpha}\right) \\
&= \left(A_{0} \circ \left(B_{0}, \text{complement}(B_{0}, M)\right), A_{1} \circ \left(B_{1}, \text{complement}(B_{1}, M)\right), \ldots, A_{\alpha} \circ \left(B_{\alpha}, \text{complement}(B_{\alpha}, M)\right)\right) \\
&= \left(\left(A_{0} \circ B_{0}, A_{0} \circ \text{complement}(B_{0}, M)\right), \left(A_{1} \circ B_{1}, A_{1} \circ \text{complement}(B_{1}, M)\right), \ldots, \left(A_{\alpha} \circ B_{\alpha}, A_{\alpha} \circ \text{complement}(B_{\alpha}, M)\right)\right) \\
\end{align}
$$

In some cases, we also have $A$ be a concatenation of layouts $A = (A_{0}, A_{1}, \ldots, A_{\alpha})$, and $B$ be a tile of layouts $B = \langle B_{0}, B_{1}, \ldots, B_{\beta} \rangle$, where $\alpha \geq \beta$. In this case, we define the by-mode logical division $A / B$ to be the layout

$$
\begin{align}
A / B
&:= \left(A_{0} / B_{0}, A_{1} / B_{1}, \ldots, A_{\beta} / B_{\beta}, A_{\beta + 1}, \ldots, A_{\alpha}\right) \\
&= \left(A_{0} \circ \left(B_{0}, \text{complement}(B_{0}, M)\right), A_{1} \circ \left(B_{1}, \text{complement}(B_{1}, M)\right), \ldots, A_{\beta} \circ \left(B_{\beta}, \text{complement}(B_{\beta}, M)\right), A_{\beta + 1}, \ldots, A_{\alpha}\right) \\
&= \left(\left(A_{0} \circ B_{0}, A_{0} \circ \text{complement}(B_{0}, M)\right), \left(A_{1} \circ B_{1}, A_{1} \circ \text{complement}(B_{1}, M)\right), \ldots, \left(A_{\beta} \circ B_{\beta}, A_{\beta} \circ \text{complement}(B_{\beta}, M)\right), A_{\beta + 1}, \ldots, A_{\alpha}\right) \\
\end{align}
$$

Logical Division Variants

In by-mode logical division, it is inconvenient to select a multi-dimensional tile. Given logical division $A / B$,

$$
\begin{align}
A / B
&= \left(\left(A_{0} \circ B_{0}, A_{0} \circ \text{complement}(B_{0}, M)\right), \left(A_{1} \circ B_{1}, A_{1} \circ \text{complement}(B_{1}, M)\right), \ldots, \left(A_{\beta} \circ B_{\beta}, A_{\beta} \circ \text{complement}(B_{\beta}, M)\right), A_{\beta + 1}, \ldots, A_{\alpha}\right) \\
\end{align}
$$

We would like to iterate through each multi-dimensional sublayout $\left(A_{0} \circ B_{0}, A_{1} \circ B_{1}, \ldots, A_{\beta} \circ B_{\beta}\right)$ from $A / B$ by indexing.

So zipped division is introduced by rearranging the layout $A / B$ to

$$
\begin{align}
\text{zipped_division}(A, B)
&= \left(\left(A_{0} \circ B_{0}, A_{1} \circ B_{1}, \ldots, A_{\beta} \circ B_{\beta}\right), \left(A_{0} \circ \text{complement}(B_{0}, M), A_{1} \circ \text{complement}(B_{1}, M), \ldots, A_{\beta} \circ \text{complement}(B_{\beta}, M)\right), A_{\beta + 1}, \ldots, A_{\alpha}\right) \\
&= \left(A \circ B, \left(A_{0} \circ \text{complement}(B_{0}, M), A_{1} \circ \text{complement}(B_{1}, M), \ldots, A_{\beta} \circ \text{complement}(B_{\beta}, M)\right), A_{\beta + 1}, \ldots, A_{\alpha}\right) \\
\end{align}
$$

In this way, we can iterate through each multi-dimensional sublayout $A \circ B = \left(A_{0} \circ B_{0}, A_{1} \circ B_{1}, \ldots, A_{\beta} \circ B_{\beta}\right)$ from $\text{zipped_division}(A, B)$ by indexing on the second mode. Note because $A_{0} \circ B_{0}$ has the same domain as $B_{0}$, $A_{1} \circ B_{1}$ has the same domain as $B_{1}$, and so on, the domain or shape of $A \circ B$ becomes very predictable, which is $(\text{shape}(B_{0}), \text{shape}(B_{1}), \ldots, \text{shape}(B_{\beta}))$. The shape of $ \left(A_{0} \circ \text{complement}(B_{0}, M), A_{1} \circ \text{complement}(B_{1}, M), \ldots, A_{\beta} \circ \text{complement}(B_{\beta}, M)\right)$ is less predictable, but its 1D size is predictable, which is $\left(\frac{M}{\text{size}(B_{0})}, \frac{M}{\text{size}(B_{1})}, \ldots, \frac{M}{\text{size}(B_{\beta})}\right)$.

CuTe implemented four logical division variants, including logical divide, zipped divide, tiled divide, and flat divide. Assuming the tiler is a tuple of two layouts, the logical division variants are defined as follows:

Layout Shape : (M, N, L, ...)
Tiler Shape  : <TileM, TileN>

logical_divide : ((TileM,RestM), (TileN,RestN), L, ...)
zipped_divide  : ((TileM,TileN), (RestM,RestN,L,...))
tiled_divide   : ((TileM,TileN), RestM, RestN, L, ...)
flat_divide    : (TileM, TileN, RestM, RestN, L, ...)

Implication of Logical Division

Because we have previously proved that $\left(B, \text{complement}(B, M)\right)$ is a layout that is a bijection $[0, M) \cong [0, M)$, the logical division $A / B := A \circ \left(B, \text{complement}(B, M)\right)$, where $M$ is the size of $A$, is a layout that has the same domain as $A$ and consequently also the same codomain as $A$. This means that the way the original layout $A$ maps from the coordinates $[0, M)$ to the integers is scrambled or permutated to a new layout, i.e., the logical division $A / B$.

Because of the definition of composition,

$$
\begin{align}
A / B
&:= A \circ \left(B, \text{complement}(B, M)\right) \\
&= \left( A \circ B, A \circ \text{complement}(B, M) \right) \\
\end{align}
$$

$A \circ B$ is the layout that selects a sublayout, i.e., a tile, from $A$ based on the layout $B$, and $A \circ \text{complement}(B, M)$ that repeats the $A \circ B$ layout to fill the domain and codomain of $A$.

Logical division informs us how to extract a tile using the tiler $B$ from a layout $A$, and how to repeat the tile to fill the domain and codomain of $A$, or how to select a tile from the consequent repeated layout using indexing.

Logical Product

Definition 4.1 Logical Product

Given a tiler and a layout, we could compute how the repeat layout that repeats the tiler to fill the domain and codomain of the layout using logical division. Similarly, given a tiler and a repeat layout, we could compute the resulting layout that is a tile of the repeat layout using logical product.

Let $A$ and $B$ be layouts, $M = \text{size}(A) \text{cosize}(B)$. Suppose that the pairs $\{A, M\}$ and $\{\text{complement}(A, M), B\}$ are admissible (for complementation and composition, respectively). Then we define the logical product $A \times B$ to be the layout

$$
\begin{align}
A \times B := \left(A , \text{complement}(A, M) \circ B\right)
\end{align}
$$

The complementation of a layout $A$ finds a complement layout $\text{complement}(A, M)$ with a positive integer $M$ so that when the two layouts are concatenated, such as $\left(A , \text{complement}(A, M)\right)$, the new layout is a bijection $[0, M) \cong [0, M)$. However, the codomain of the resulting layout we want to create might not need to be $[0, M)$. The layout $B$ can have $\text{cosize}(B)$ that is much larger than $\text{size}(B)$. Consequently, after repeating the layout $A$ using $\text{complement}(A, M) \circ B$, the resulting layout $A \times B$ will fill up to the codomain $M$ but potentially strided. In addition, even if $\text{size}(B) = \text{cosize}(B)$, we could permute $\text{complement}(A, M)$ by using the composition operation, because there can be multiple layouts whose cosize is $\text{cosize}(B)$.

Because $\text{complement}(A, M) \circ B$ is a sublayout of $\text{complement}(A, M)$, the layout concatenation $\left(A , \text{complement}(A, M) \circ B\right)$ remains a valid layout.

By-Mode Logical Product

In some cases, we would like to perform logical division for each mode of the layout separately. For example, if we would like to tile a 2D layout with a 2D tiler, we would like to perform logical product for each mode of the layout separately.

Let the layout $A$ be a concatenation of layouts $A = (A_{0}, A_{1}, \ldots, A_{\alpha})$, and $B$ be a tile of layouts $B = \langle B_{0}, B_{1}, \ldots, B_{\alpha} \rangle$. Note that $B$ is just a tuple of layouts instead of a concatenated layout. Then we define the by-mode logical product $A \times B$ to be the layout

$$
\begin{align}
A \times B &:= \left(A_{0} \times B_{0}, A_{1} \times B_{1}, \ldots, A_{\alpha} \times B_{\alpha}\right) \\
&= \left(A_{0} , \text{complement}(A_{0}, M) \circ B_{0}, A_{1} , \text{complement}(A_{1}, M) \circ B_{1}, \ldots, A_{\alpha} , \text{complement}(A_{\alpha}, M) \circ B_{\alpha}\right) \\
&= \left(\left(A_{0} , \text{complement}(A_{0}, M) \circ B_{0}\right), \left(A_{1} , \text{complement}(A_{1}, M) \circ B_{1}\right), \ldots, \left(A_{\alpha} , \text{complement}(A_{\alpha}, M) \circ B_{\alpha}\right)\right) \\
\end{align}
$$

Logical Product Variants

Similar to logical division, there are also variants of logical product, including logical product, zipped product, tiled product, and flat product, for the convenience of tile selection and iteration. Assuming the tiler is a tuple of two layouts, the logical product variants are defined as follows:

Layout Shape : (M, N, L, ...)
Tiler Shape  : <TileM, TileN>

logical_product : ((M,TileM), (N,TileN), L, ...)
zipped_product  : ((M,N), (TileM,TileN,L,...))
tiled_product   : ((M,N), TileM, TileN, L, ...)
flat_product    : (M, N, TileM, TileN, L, ...)

Implication of Logical Product

Similar to logical division, the logical product $A \times B$ informs the us what the layout is after repeating the original layout using the tiler.

Permutation Expressible As Layout Functions

This section explains how to retrieve all permutations that are expressible as layout functions in a structured way. This is important because some permutation algebra used in CUTLASS and CuTe, such as swizzle, does not seem to be expressed as layout function. The basic language of category theory is used to describe the process.

Definition 3.1 Ordered Factorization

We define the set $\text{ob}(\textbf{Fact})$ of ordered factorizations to consists of all expressions $[p_1 \ldots p_k]$ where $k \geq 0$ and the $p_{i}$ are primes (not necessarily distinct). The case $k = 0$ corresponds to the empty factorization, which we denote as $[\ ]$.

For example, the set $\text{ob}(\textbf{Fact})$ includes expressions such as $[\ ]$, $[2]$, $[3]$, $[22]$, $[23]$, $[32]$, $[232]$, etc.

Notation 3.3

Let $\underline{k}$ denote the set $\{1, 2, \ldots, k\}$ consisting of $k$ elements. (If $k = 0$, then $\underline{0} = \emptyset$ is the empty set.)

Definition 3.4 Category of Ordered Factorizations

We define the category $\textbf{Fact}$ of ordered factorizations as follows:

$\text{ob}(\textbf{Fact})$ is the set of objects of $\textbf{Fact}$.
For every expression $E = [p_1 \ldots p_k]$ in $\text{ob}(\textbf{Fact})$ and every morphism of finite sets $\alpha: \underline{n} \to \underline{k}$, we have a morphism

$$
\begin{align}
E^{\alpha} = [p_{\alpha(1)} \ldots p_{\alpha(n)}] \xrightarrow{\alpha_{E}} E = [p_1 \ldots p_k]
\end{align}
$$

in $\textbf{Fact}$. This defines the set of all morphisms with codomain $E$, and ranging over all $E$ thus defines the set of all morphisms in $\textbf{Fact}$.

The composition of morphisms is defined as follows. Suppose we have morphisms of finite sets $\alpha: \underline{n} \to \underline{k}$ and $\beta: \underline{m} \to \underline{n}$, and expressions $E = [p_1 \ldots p_k]$. Write

$$
\begin{align}
E^{\alpha} = [p_{\alpha(1)} \ldots p_{\alpha(n)}] = [q_1 \ldots q_n]
\end{align}
$$

Let $\gamma = \alpha \circ \beta: \underline{m} \to \underline{k}$. Then the composition of morphisms

$$
\begin{align}
\alpha_{E}: E^{\alpha} = [p_{\alpha(1)} \ldots p_{\alpha(n)}] \xrightarrow{} E = [p_1 \ldots p_k]
\end{align}
$$

$$
\begin{align}
\beta_{E^{\alpha}}: E^{\beta} = [q_{\beta(1)} \ldots q_{\beta(m)}] \xrightarrow{} E^{\alpha} = [q_1 \ldots q_n]
\end{align}
$$

is given by $\gamma_{E}: E^{\gamma} \xrightarrow{} E$, where we used that $[q_{\beta(1)} \ldots q_{\beta(m)}] = [p_{\gamma(1)} \ldots p_{\gamma(m)}]$.

It’s easy to check that the composition of morphisms in $\textbf{Fact}$ is associative and has identities, which are the two axioms that composition in a category must satisfy, so Definition 3.4 really does define a category.

To see why the composition of morphisms is associative, suppose we have morphisms of finite sets $\alpha: \underline{n} \to \underline{k}$, $\beta: \underline{m} \to \underline{n}$, and $\gamma: \underline{l} \to \underline{m}$. Then we have

$$
\begin{align}
\alpha \circ (\beta \circ \gamma) = (\alpha \circ \beta) \circ \gamma
\end{align}
$$

To see why the composition of morphisms has identities, suppose for every $n$ we have a morphism of finite sets $\text{id}_{\underline{n}}: \underline{n} \to \underline{n}$, such that $\text{id}_{\underline{n}}(i) = i$ for all $i \in \underline{n}$. Then we have

$$
\begin{align}
E^{\text{id}_{\underline{n}}} = [p_{\text{id}_{\underline{n}}(1)} \ldots p_{\text{id}_{\underline{n}}(n)}] \xrightarrow{} E = [p_{1} \ldots p_{n}]
\end{align}
$$

For every morphism $\alpha: \underline{n} \to \underline{k}$, we have

$$
\begin{align}
\alpha \circ \text{id}_{\underline{n}} = \text{id}_{\underline{k}} \circ \alpha = \alpha
\end{align}
$$

Therefore, $\text{id}_{\underline{n}}$ is the identity morphism for $\underline{n}$.

Notation 3.5

Let $\Sigma_{k}$ denote the symmetric group on $k$ letters. Given an element $\varphi \in \Sigma_{k}$, we also denote the associated automorphism of $\underline{k}$ by $\varphi$.

In mathematics, a group is a set with an operation that associates an element of the set to every pair of elements of the set (as does every binary operation) and satisfies the following constraints: the operation is associative, it has an identity element, and every element of the set has an inverse element.

In this sense, the symmetric group is a set of all permutations of a set of $k$ elements with an operation of composition of permutations (applying one permutation after another).

Suppose $E = [222]$. Then every permutation $\varphi \in \Sigma_{3}$ defines an automorphism $E^{\varphi} = E \xrightarrow{} E$ in $\textbf{Fact}$.

Suppose $E = [232]$. Then the transposition $\sigma = (13) \in \Sigma_{3}$ defines an automorphism $E^{\sigma} = E \xrightarrow{} E$ in $\textbf{Fact}$. On the other hand, the transposition $\tau = (12) \in \Sigma_{3}$ defines a morphism $E^{\tau} = [322] \xrightarrow{} E = [232]$ in $\textbf{Fact}$.

Remark 3.7

Let $\textbf{FinSet}$ denote the category of finite sets (or rather a skeleton, with objects given by the sets $\underline{n}$ for $n \geq 0$). Given an object $\underline{k} \in \textbf{FinSet}$, let $\textbf{FinSet}^{/\underline{k}}$ denote the overcategory, whose objects are morphisms $[\alpha: \underline{n} \to \underline{k}]$ and whose morphisms are commuting triangles. Recall that this category has a final object given by the identity morphism $[\text{id}_{\underline{k}}]$.

Then for every expression $E = [p_1 \ldots p_k]$ of length $k$, we have a functor

$$
\begin{align}
F_{E}: \textbf{FinSet}^{/\underline{k}} \to \textbf{Fact}
\end{align}
$$

that sends the object $[\alpha: \underline{n} \to \underline{k}]$ to the expression $E^{\alpha}$ and the unique morphism $[\alpha] \xrightarrow{} [\text{id}_{\underline{k}}]$ to $\alpha_{E}: E^{\alpha} \xrightarrow{} E$. This functor has every morphism in $\textbf{Fact}$ with codomain $E$ in its image.

Suppose we have an object of morphism $[\alpha: \underline{n} \to \underline{k}]$ and another object of morphism $[\beta: \underline{m} \to \underline{k}]$ in $\textbf{FinSet}^{/\underline{k}}$. Then the morphism of the overcategory is a commuting triangle, whose remaining morphism is $[\gamma: \underline{m} \to \underline{n}]$, that maps $[\alpha: \underline{n} \to \underline{k}]$ to $[\beta: \underline{m} \to \underline{k}]$.

The identity morphism $[\text{id}_{\underline{k}}]$ is the final object of $\textbf{FinSet}^{/\underline{k}}$ because every other object of morphism in $\textbf{FinSet}^{/\underline{k}}$ has a unique morphism to $[\text{id}_{\underline{k}}]$.

In this case, given an object of morphism $[\alpha: \underline{n} \to \underline{k}]$, the commuting triangle has a remaining morphism $[\gamma: \underline{k} \to \underline{n}]$, and we will have to show that this remaining morphism in the commuting triangle is unique.

Given $\underline{k} = \{1, 2, \ldots, k\}$, the identity morphism maps $i \in \underline{k}$ to $i \in \underline{k}$. Given a morphism $[\alpha: \underline{n} \to \underline{k}]$, in which without loss of generality we assume $\alpha(i) = i$ for all $i \in [1, k]$, the remaining morphism $[\gamma: \underline{k} \to \underline{n}]$ must have $i = \alpha(i)$ for all $i \in [1, k]$, so that we have a commuting triangle that $i \in \underline{k} \xrightarrow{\gamma} \xrightarrow{\alpha} i \in \underline{k}$. Otherwise, for the remaining morphism $[\gamma: \underline{k} \to \underline{n}]$, if there exists an $i \in \underline{k}$ such that $i \neq \alpha(i)$, then the commuting triangle will not be valid because $i \in \underline{k} \xrightarrow{\gamma} \xrightarrow{\alpha} j \in \underline{k}$ and $i \neq j$. Therefore, the morphism of commuting triangle is unique.

By definition, let $C$ and $D$ be categories. A functor $F$ from $C$ to $D$ is a mapping that

associates each object $X$ in $C$ to an object $F(X)$ in $D$,
associates each morphism $f: X \to Y$ in $C$ to a morphism $F(f): F(X) \to F(Y)$ in $D$ such that
- $F(\text{id}_{X}) = \text{id}_{F(X)}$ for every object $X$ in $C$, and
- $F(g \circ f) = F(g) \circ F(f)$ for all morphisms $f: X \to Y$ and $g: Y \to Z$ in $C$.

In the category $\textbf{FinSet}^{/\underline{k}}$, we have $X = [\alpha: \underline{n} \to \underline{k}]$ and $Y = [\gamma: \underline{m} \to \underline{k}]$, $Z = [\text{id}_{\underline{k}}]$, and the morphisms are commuting triangles $f = [\alpha] \xrightarrow{} [\gamma]$, $g = [\gamma] \xrightarrow{} [\text{id}_{\underline{k}}]$, and $g \circ f = [\alpha] \xrightarrow{} [\text{id}_{\underline{k}}]$.

In the category $\textbf{Fact}$, by the functor $F_{E}$, we have $F_{E}(X) = E^{\alpha} = [p_{\alpha(1)} \ldots p_{\alpha(n)}]$, $F_{E}(Y) = E^{\gamma} = [p_{\gamma(1)} \ldots p_{\gamma(m)}]$, $F_{E}(Z) = E = [p_1 \ldots p_k]$, and the morphisms are $F_{E}(f) = \alpha_{E}: E^{\alpha} \xrightarrow{} E^{\gamma}$, $F_{E}(g) = \gamma_{E}: E^{\gamma} \xrightarrow{} E$, and $F_{E}(g) \circ F_{E}(f) = \alpha_{E}: E^{\alpha} \xrightarrow{} E$.

Remark 3.8

In fact, we can identify $\textbf{Fact}$ itself as a certain overcategory (or rather, a full subcategory thereof). Namely, let $\mathcal{P}$ denote the the infinite set of primes $\{2, 3, 5 \ldots\}$, let $\textbf{Set}$ be the category of sets, and let $\textbf{FinSet}^{/\mathcal{P}}$ be the full subcategory of $\textbf{Set}^{/\mathcal{P}}$ on those morphisms $X \xrightarrow{} \mathcal{P}$ where $X$ is a finite set. Then we have an equivalence of categories

$$
\begin{align}
\textbf{Fact} \simeq \textbf{FinSet}^{/\mathcal{P}}
\end{align}
$$

that sends an expression $E = [p_1 \ldots p_k]$ to the morphism $E_{\bullet}: k \xrightarrow{} \mathcal{P}$ given by $i \mapsto p_i$.

Under this equivalence, the functor $F_{E}$ of Remark 3.7 identifies with the functor

$$
\begin{align}
\textbf{FinSet}^{/k} \simeq \left(\textbf{FinSet}^{/\mathcal{P}}\right)^{/E_{\bullet}} \xrightarrow{} \textbf{FinSet}^{/\mathcal{P}}
\end{align}
$$

that forgets the map to $E_{\bullet}$.

To understand this, let’s consider the following example.

Suppose we have an object $E = [232]$ from $\textbf{Fact}$. Then we have a morphism $E_{\bullet}: \underline{3} \xrightarrow{} \mathcal{P}$ given by

$$
\begin{align}
i = 1 &\mapsto p_{1} = 2 \\
i = 2 &\mapsto p_{2} = 3 \\
i = 3 &\mapsto p_{3} = 2
\end{align}
$$

Every object of morphisms in $\textbf{FinSet}^{/\mathcal{P}}$ can be mapped from an object from $\textbf{Fact}$ and thus we have an equivalence of categories $\textbf{Fact} \simeq \textbf{FinSet}^{/\mathcal{P}}$.

Because of the functor $F_{E}$ of Remark 3.7, with the equivalence above, we have

$$
\begin{align}
\textbf{FinSet}^{/\underline{k}} \to \textbf{FinSet}^{/\mathcal{P}}
\end{align}
$$

Definition 3.9

Suppose $E = [p_1 \ldots p_k]$ and $\alpha: \underline{n} \to \underline{k}$. We define a layout $L_{(E, \alpha)}$ as follows:

Its shape tuple is $(p_{\alpha(1)}, p_{\alpha(2)}, \ldots, p_{\alpha(n)})$.
Its stride tuple is $(d_{1}, d_{2}, \ldots, d_{n})$, where $d_{i} = \prod_{j < \alpha(i)} p_{j}$.

We also let $f_{(E, \alpha)}$ denote the associated layout function.

Suppose $E = [23]$ and $\varphi = (12) \in \Sigma_{2}$ is the nontrivial transposition. Then $L_{(E, \varphi)} = (3, 2) : (2, 1)$.

Suppose $E = [23]$, $\varphi(1) = 2$, $\varphi(2) = 1$, $\varphi(3) = 2$. Then $L_{(E, \varphi)} = (3, 2, 3) : (2, 1, 2)$. This layout seems strange because its layout function is not an injection mapping. However, it is still a valid layout by Definition 2.2.

Suppose $E = [222]$ and $\varphi = (231) \in \Sigma_{3}$, so $\varphi$ is a cycle of order $3$ with $\varphi(1) = 2$, $\varphi(2) = 3$, and $\varphi(3) = 1$. Then $L_{(E, \varphi)} = (2, 2, 2) : (2, 4, 1)$.

We could now see why $p_{i}$ is prime number. It’s used for constructing the stride tuple of any kind for the layout, because any natural number can be uniquely factored into a product of prime numbers.

Remark 3.11

Let $E = [p_1 \ldots p_k]$ and $\alpha: \underline{n} \to \underline{k}$. Let $N = p_{1} \cdot p_{2} \cdot \ldots \cdot p_{k}$ and $N^{\alpha} = p_{\alpha(1)} \cdot p_{\alpha(2)} \cdot \ldots \cdot p_{\alpha(n)}$. In what follows, consider the canonical isomorphisms

Then the associated layout function $f_{(E, \alpha)}: [0, N^{\alpha}) \to [0, N) \subset \mathbb{N}$ can be described as the multilinear function

$$
\begin{align}
[0, p_{\alpha(1)}) \times [0, p_{\alpha(2)}) \times \ldots \times [0, p_{\alpha(n)}) \xrightarrow{} [0, p_1) \times [0, p_2) \times \ldots \times [0, p_k)
\end{align}
$$

that sends the basis vector $\delta_{i}$ of one vector space to the basis vector $\beta_{\alpha(i)}$ of the other for $i \in [1, n]$.

In particular, if $\alpha$ is itself a bijection, then $f_{(E, \alpha)}$ restricts to an automorphism of $[0, N)$.

Proof

We noticed that $f_{(E, \alpha)}: [0, N^{\alpha}) \to [0, N) \subset \mathbb{N}$. The domain of $f_{(E, \alpha)}$ is $[0, N^{\alpha})$ and it is obvious. The codomain of $f_{(E, \alpha)}$ is $[0, N) \subset \mathbb{N}$, however, is less obvious.

$$
\begin{align}
f_{(E, \alpha)} = x_{\alpha_{(1)}} d_{1} + x_{\alpha_{(2)}} d_{2} + \ldots + x_{\alpha_{(n)}} d_{n}
\end{align}
$$

where $x_{\alpha_{(i)}} \in [0, p_{\alpha(i)})$ and $d_{i} = \prod_{j < \alpha(i)} p_{j}$.

So we have

$$
\begin{align}
\max \left( f_{(E, \alpha)} \right) &= (p_{\alpha(1)} - 1) d_{1} + (p_{\alpha(2)} - 1) d_{2} + \ldots + (p_{\alpha(n)} - 1) d_{n} \\
&= (p_{\alpha(1)} - 1) \prod_{j < \alpha(1)} p_{j} + (p_{\alpha(2)} - 1) \prod_{j < \alpha(2)} p_{j} + \ldots + (p_{\alpha(n)} - 1) \prod_{j < \alpha(n)} p_{j} \\
\end{align}
$$

Without losing generality, we assume $p_{\alpha(1)} \leq p_{\alpha(2)} \leq \ldots \leq p_{\alpha(n)}$. Then we have

$$
\begin{align}
\max \left( f_{(E, \alpha)} \right) &= (p_{\alpha(1)} - 1) \prod_{j < \alpha(1)} p_{j} + (p_{\alpha(2)} - 1) \prod_{j < \alpha(2)} p_{j} + \ldots + (p_{\alpha(n)} - 1) \prod_{j < \alpha(n)} p_{j} \\
&\leq p_{\alpha(1)} \prod_{j < \alpha(1)} p_{j} + (p_{\alpha(2)} - 1) \prod_{j < \alpha(2)} p_{j} + \ldots + (p_{\alpha(n)} - 1) \prod_{j < \alpha(n)} p_{j} \\
&= \prod_{j < \alpha(2)} p_{j} + (p_{\alpha(2)} - 1) \prod_{j < \alpha(2)} p_{j} + \ldots + (p_{\alpha(n)} - 1) \prod_{j < \alpha(n)} p_{j} \\
&= p_{\alpha(2)} \prod_{j < \alpha(2)} p_{j} + \ldots + (p_{\alpha(n)} - 1) \prod_{j < \alpha(n)} p_{j} \\
&\leq \prod_{j < \alpha(3)} p_{j} + \ldots + (p_{\alpha(n)} - 1) \prod_{j < \alpha(n)} p_{j} \\
&\leq \ldots \\
&\leq p_{\alpha(n)} \prod_{j < \alpha(n - 1)} p_{j} + (p_{\alpha(n)} - 1) \prod_{j < \alpha(n)} p_{j} \\
&= \prod_{j < \alpha(n)} p_{j} + (p_{\alpha(n)} - 1) \prod_{j < \alpha(n)} p_{j} \\
&= p_{\alpha(n)} \prod_{j < \alpha(n)} p_{j} \\
&= \prod_{j \leq \alpha(n)} p_{j} \\
&\leq \prod_{j \leq k} p_{j} \\
&= N
\end{align}
$$

Thus, we have $f_{(E, \alpha)}: [0, N^{\alpha}) \to [0, N) \subset \mathbb{N}$.

Because $f_{(E, \alpha)}$ is a multilinear function, and because of the canonical isomorphisms, $f_{(E, \alpha)}$ can be described as the multilinear function

$$
\begin{align}
[0, p_{\alpha(1)}) \times [0, p_{\alpha(2)}) \times \ldots \times [0, p_{\alpha(n)}) \xrightarrow{} [0, p_1) \times [0, p_2) \times \ldots \times [0, p_k)
\end{align}
$$

We denote a vector space $V = [0, p_{\alpha(1)}) \times [0, p_{\alpha(2)}) \times \ldots \times [0, p_{\alpha(n)})$ and a vector space $W = [0, p_1) \times [0, p_2) \times \ldots \times [0, p_k)$. Then the layout function $f_{(E, \alpha)}$ is a linear map $V \xrightarrow{} W$.

Suppose $v_{1}, v_{2}, av_{1}, bv_{2}, av_{1} + bv_{2} \in V$, $f_{(E, \alpha)}(v_{1}) = w_{1}$, and $f_{(E, \alpha)}(v_{2}) = w_{2}$. Then we have

$$
\begin{align}
f_{(E, \alpha)}(v_{1}) &= v_{1, 1} d_{1} + v_{1, 2} d_{2} + \ldots + v_{1, n} d_{n} \\
f_{(E, \alpha)}(v_{2}) &= v_{2, 1} d_{1} + v_{2, 2} d_{2} + \ldots + v_{2, n} d_{n} \\
f_{(E, \alpha)}(av_{1}) &= av_{1, 1} d_{1} + av_{1, 2} d_{2} + \ldots + av_{1, n} d_{n} \\
&= af_{(E, \alpha)}(v_{1}) \\
f_{(E, \alpha)}(bv_{2}) &= bv_{2, 1} d_{1} + bv_{2, 2} d_{2} + \ldots + bv_{2, n} d_{n} \\
&= bf_{(E, \alpha)}(v_{2}) \\
f_{(E, \alpha)}(av_{1} + bv_{2}) &= (av_{1} + bv_{2})_{1} d_{1} + (av_{1} + bv_{2})_{2} d_{2} + \ldots + (av_{1} + bv_{2})_{n} d_{n} \\
&= af_{(E, \alpha)}(v_{1}) + bf_{(E, \alpha)}(v_{2})
\end{align}
$$

So $f_{(E, \alpha)}: V \xrightarrow{} W$ is indeed a linear (multilinear) map.

Given an index $1 \leq i \leq \alpha$, let $\delta_{i} \in \mathbb{N}^{\times \alpha}$ denote the coordinate that is zero everywhere except in the $k$-th position, where it is 1. Note here the indexing is 1-based, instead of the similar one used in Proposition 2.14 which is 0-based. $\delta_{i}$ is the basis vector of the vector space $V$ for $1 \leq i \leq \alpha$.

We send $\delta_{i}$ to $f_{(E, \alpha)}$ for $1 \leq i \leq \alpha$. Then we have

$$
\begin{align}
f_{(E, \alpha)}(\delta_{i}) &= d_{i} \\
&= \prod_{j < \alpha(i)} p_{j}
\end{align}
$$

Given the canonical isomorphism $[0, N) \cong [0, p_1) \times [0, p_2) \times \ldots \times [0, p_k)$, we have the multilinear function $g: W \xrightarrow{} \mathbb{N}$

$$
\begin{align}
g(w) = w_{1} + w_{2} p_{1} + w_{3} p_{1} p_{2} + \ldots + w_{k} \prod_{j < k} p_{j}
\end{align}
$$

Given an index $1 \leq i \leq k$, let $\beta_{i} \in \mathbb{N}^{\times k}$ denote the coordinate that is zero everywhere except in the $k$-th position, where it is 1. $\beta_{i}$ is the basis vector of the vector space $W$ for $1 \leq i \leq k$.

Thus, we have

$$
\begin{align}
f_{(E, \alpha)}(\delta_{i})
&= \prod_{j < \alpha(i)} p_{j} \\
&= g(\beta_{\alpha(i)})
\end{align}
$$

This means the basis vector $\delta_{i}$ in the vector space $V$ is sent to the basis vector $\beta_{\alpha(i)}$ in the vector space $W$ by the multilinear function.

Suppose $v = c_{1} \delta_{1} + c_{2} \delta_{2} + \ldots + c_{\alpha} \delta_{\alpha} \in V$. Then we have

$$
\begin{align}
f_{(E, \alpha)}(v) &= f_{(E, \alpha)}(c_{1} \delta_{1} + c_{2} \delta_{2} + \ldots + c_{\alpha} \delta_{\alpha}) \\
&= (c_{1} \delta_{1} + c_{2} \delta_{2} + \ldots + c_{\alpha} \delta_{\alpha})_{1} d_{1} + (c_{1} \delta_{1} + c_{2} \delta_{2} + \ldots + c_{\alpha} \delta_{\alpha})_{2} d_{2} + \ldots + (c_{1} \delta_{1} + c_{2} \delta_{2} + \ldots + c_{\alpha} \delta_{\alpha})_{\alpha} d_{\alpha} \\
&= c_{1} d_{1} + c_{2} d_{2} + \ldots + c_{\alpha} d_{\alpha} \\
&= c_{1} f_{(E, \alpha)}(\delta_{1}) + c_{2} f_{(E, \alpha)}(\delta_{2}) + \ldots + c_{\alpha} f_{(E, \alpha)}(\delta_{\alpha}) \\
&= c_{1} g(\beta_{\alpha(1)}) + c_{2} g(\beta_{\alpha(2)}) + \ldots + c_{\alpha} g(\beta_{\alpha(\alpha)}) \\
&= g(c_{1} \beta_{\alpha(1)} + c_{2} \beta_{\alpha(2)} + \ldots + c_{\alpha} \beta_{\alpha(\alpha)}) \\
\end{align}
$$

Therefore, we have set up the basis vector mapping for the multilinear function $f_{(E, \alpha)}: V \xrightarrow{} W$. Given $v = c_{1} \delta_{1} + c_{2} \delta_{2} + \ldots + c_{\alpha} \delta_{\alpha} \in V$, it maps to $w = c_{1} \beta_{\alpha(1)} + c_{2} \beta_{\alpha(2)} + \ldots + c_{\alpha} \beta_{\alpha(\alpha)} \in W$.

This concludes the proof. $\square$

Lemma 3.12

Elaborating on Remark 3.11, we have the following lemma, which indicates that composition in the category $\textbf{Fact}$ is compatible with the composition of layout functions.

Suppose we have morphisms of finite sets $\alpha: \underline{n} \to \underline{k}$, $\beta: \underline{m} \to \underline{n}$, and an expression $E = [p_1 p_2 \ldots p_k]$. Write $\gamma = \alpha \circ \beta$. Consider the composition

$$
\begin{align}
\gamma_{E}: E^{\gamma} = (E^{\alpha})^{\beta} \xrightarrow{\beta_{E^{\alpha}}} E^{\alpha} \xrightarrow{\alpha_{E}} E
\end{align}
$$

in $\textbf{Fact}$. Then the associated layout functions satisfy the composition equality

$$
\begin{align}
f_{(E, \gamma)} = f_{(E, \alpha)} \circ f_{(E^{\alpha}, \beta)}
\end{align}
$$

Proof

Let $N = p_{1} \cdot p_{2} \cdot \ldots \cdot p_{k}$, $N^{\alpha} = p_{\alpha(1)} \cdot p_{\alpha(2)} \cdot \ldots \cdot p_{\alpha(n)}$, and $N^{\gamma} = p_{\gamma(1)} \cdot p_{\gamma(2)} \cdot \ldots \cdot p_{\gamma(m)}$. We use the canonical isomorphisms

$$
\begin{align}
[0, N) &\cong [0, p_1) \times [0, p_2) \times \ldots \times [0, p_k) \\
[0, N^{\alpha}) &\cong [0, p_{\alpha(1)}) \times [0, p_{\alpha(2)}) \times \ldots \times [0, p_{\alpha(n)}) \\
[0, N^{\gamma}) &\cong [0, p_{\gamma(1)}) \times [0, p_{\gamma(2)}) \times \ldots \times [0, p_{\gamma(m)})
\end{align}
$$

to write the domains and codomains of the layout functions in question.

More specifically, we have

$$
\begin{align}
f_{(E, \alpha)}: [0, N^{\alpha}) &\to [0, N) \\
f_{(E^{\alpha}, \beta)}: [0, N^{\gamma}) &\to [0, N^{\alpha}) \\
f_{(E, \gamma)}: [0, N^{\gamma}) &\to [0, N)
\end{align}
$$

We are trying to equate the multilinear function

$$
\begin{align}
f_{(E, \gamma)}: [0, p_{\gamma(1)}) \times [0, p_{\gamma(2)}) \times \ldots \times [0, p_{\gamma(m)}) \xrightarrow{} [0, p_1) \times [0, p_2) \times \ldots \times [0, p_k)
\end{align}
$$

with the composition of the two multilinear functions

$$
\begin{align}
f_{(E, \alpha)}&: [0, p_{\alpha(1)}) \times [0, p_{\alpha(2)}) \times \ldots \times [0, p_{\alpha(n)}) \xrightarrow{} [0, p_1) \times [0, p_2) \times \ldots \times [0, p_k) \\
f_{(E^{\alpha}, \beta)}&: [0, p_{\gamma(1)}) \times [0, p_{\gamma(2)}) \times \ldots \times [0, p_{\gamma(m)}) \xrightarrow{} [0, p_{\alpha(1)}) \times [0, p_{\alpha(2)}) \times \ldots \times [0, p_{\alpha(n)})
\end{align}
$$

We denote a vector space $V = [0, p_{\alpha(1)}) \times [0, p_{\alpha(2)}) \times \ldots \times [0, p_{\alpha(n)})$, a vector space $W = [0, p_1) \times [0, p_2) \times \ldots \times [0, p_k)$, and a vector space $U = [0, p_{\gamma(1)}) \times [0, p_{\gamma(2)}) \times \ldots \times [0, p_{\gamma(m)})$. The basis vectors of $V$, $W$, and $U$ are $\delta_{i}$, $\sigma_{j}$, and $\tau_{l}$ for $1 \leq i \leq n$, $1 \leq j \leq k$, and $1 \leq l \leq m$.

Based on the basis vector mapping by Remark 3.11, given $u = c_{1} \tau_{1} + c_{2} \tau_{2} + \ldots + c_{m} \tau_{m} \in U$, by $f_{(E^{\alpha}, \beta)}$, it maps to $v = c_{1} \delta_{\beta(1)} + c_{2} \delta_{\beta(2)} + \ldots + c_{m} \delta_{\beta(m)} \in V$. Then by $f_{(E, \alpha)}$, it maps to $w = c_{1} \sigma_{\alpha(\beta(1))} + c_{2} \sigma_{\alpha(\beta(2))} + \ldots + c_{m} \sigma_{\alpha(\beta(m))} \in W$.

Given $u = c_{1} \tau_{1} + c_{2} \tau_{2} + \ldots + c_{m} \tau_{m} \in U$, by $f_{(E, \gamma)}$, because $\gamma = \alpha \circ \beta$, $\gamma(i) = \alpha(\beta(i))$, it maps to $w^{\prime} = c_{1} \sigma_{\gamma(1)} + c_{2} \sigma_{\gamma(2)} + \ldots + c_{m} \sigma_{\gamma(m)} = c_{1} \sigma_{\alpha(\beta(1))} + c_{2} \sigma_{\alpha(\beta(2))} + \ldots + c_{m} \sigma_{\alpha(\beta(m))} \in W$.

Because $w = w^{\prime}$, we have $f_{(E, \gamma)} = f_{(E, \alpha)} \circ f_{(E^{\alpha}, \beta)}$.

This concludes the proof. $\square$

In Lemma 3.12, the per-mode condition of admissibility for composition (Definition 2.12) is satisfied. To see this, we have

$$
\begin{align}
E &= [p_1 p_2 \ldots p_k] \\
E^{\alpha} &= [p_{\alpha(1)} p_{\alpha(2)} \ldots p_{\alpha(n)}] \\
E^{\gamma} &= [p_{\gamma(1)} p_{\gamma(2)} \ldots p_{\gamma(m)}]
\end{align}
$$

$$
\begin{align}
L_{(E, \alpha)} &= \left(p_{\alpha(1)}, p_{\alpha(2)}, \ldots, p_{\alpha(n)}\right) : \left(d_{1}, d_{2}, \ldots, d_{n}\right) \\
\end{align}
$$

where $d_{i} = \prod_{j < \alpha(i)} p_{j}$.

$$
\begin{align}
L_{(E^{\alpha}, \beta)} &= \left(p_{\alpha(\beta(1))}, p_{\alpha(\beta(2))}, \ldots, p_{\alpha(\beta(m))}\right) : \left(d^{\prime}_{1}, d^{\prime}_{2}, \ldots, d^{\prime}_{m}\right) \\
\end{align}
$$

where $d^{\prime}_{i} = \prod_{j < \beta(i)} p_{\alpha(j)}$.

Because

$$
\begin{align}
M &= p_{\alpha(1)} \cdot p_{\alpha(2)} \cdot \ldots \cdot p_{\alpha(n)} \\
&= \left( \prod_{j < \beta(i)} p_{\alpha(j)} \right) \cdot p_{\alpha(\beta(i))} \cdot p_{\alpha(\beta(i) + 1)} \cdot \ldots \cdot p_{\alpha(n)} \\
&= \left( \prod_{j < \beta(i)} p_{\alpha(j)} \right) \cdot M^{\prime}
\end{align}
$$

Thus, $M$ is left divisible by $d^{\prime}_{i}$, $M^{\prime}$ is weakly left divisible and also left divisible by $p_{\alpha(\beta(i))}$, and the per-mode condition of admissibility for composition is satisfied.

The disjointness condition in Definition 2.17 is satisfied when $\beta: \underline{m} \to \underline{n}$ is an injective function and may be violated when it is not.

When $\beta: \underline{m} \to \underline{n}$ is injective, we have $m \leq n$, and $\beta{(i)} \neq \beta{(j)}$ for $i \neq j$. By Definition 2.16, for each mode $i \in [1, m]$, we have $N_{i} = p_{\alpha(\beta(i))}$, $I_{i} = [d^{\prime}_{i}, d^{\prime}_{i} (N_{i} - 1)]$. $M^{\prime} = p_{\alpha(1)} \cdot p_{\alpha(2)} \cdot \ldots \cdot p_{\alpha(n - 1)}$. So the interval of definition is $J_{i} = I_{i} \cap [1, M^{\prime})$. Because $d^{\prime}_{i} = \prod_{j < \beta(i)} p_{\alpha(j)} \geq 1$, $d^{\prime}_{i}(N_{i} - 1) = \prod_{j < \beta(i)} p_{\alpha(j)} \cdot (p_{\alpha(\beta(i))} - 1) = \prod_{j \leq \beta(i)} p_{\alpha(j)} - \prod_{j < \beta(i)} p_{\alpha(j)} < M^{\prime}$. Thus, $J_{i} = I_{i} = [\prod_{j < \beta(i)} p_{\alpha(j)}, \prod_{j \leq \beta(i)} p_{\alpha(j)} - \prod_{j < \beta(i)} p_{\alpha(j)}]$. Suppose we have a different mode $k$, $k \neq i$. Then $J_{k} = I_{k} = [\prod_{j < \beta(k)} p_{\alpha(j)}, \prod_{j \leq \beta(k)} p_{\alpha(j)} - \prod_{j < \beta(k)} p_{\alpha(j)}]$. Because $\beta(i) \neq \beta(k)$, without losing generality, we assume $\beta(i) < \beta(k)$. Then we have

$$
\begin{align}
\prod_{j < \beta(k)} p_{\alpha(j)} - \left( \prod_{j \leq \beta(i)} p_{\alpha(j)} - \prod_{j < \beta(i)} p_{\alpha(j)} \right)
&= \prod_{j < \beta(k)} p_{\alpha(j)} - \prod_{j \leq \beta(i)} p_{\alpha(j)} + \prod_{j < \beta(i)} p_{\alpha(j)} \\
&> 0
\end{align}
$$

Thus, $J_{i} \cap J_{k} = \emptyset$ for any $i \neq k$. The disjointness condition is satisfied.

When $\beta: \underline{m} \to \underline{n}$ is not injective, we don’t have $\beta{(i)} \neq \beta{(j)}$ for $i \neq j$. The disjointness condition may be violated.

So Lemma 3.12 actually proves Theorem 2.18 for layouts that has any arbitrary strides (not yet any arbitrary shapes) of the second layout that satisfies Definition 2.17.

Definition 3.14

We now define a “realization” functor from the category $\textbf{Fact}$ to the category $\textbf{FinSet}$ that sends morphisms of ordered factorizations to their associated layout functions.

Let $R: \textbf{Fact} \to \textbf{FinSet}$ be the functor defined as follows:

Let $E = [p_{1} p_{2} \ldots p_{k}]$ be an object of $\textbf{Fact}$ and let $N = p_{1} \cdot p_{2} \cdot \ldots \cdot p_{k}$. Then $R(E) = [0, N)$.
For every morphism $\alpha_{E}: E^{\alpha} \to E$, let $R(\alpha_{E}) = f_{(E, \alpha)}: [0, N^{\alpha}) \to [0, N)$ be as in Definition 3.9.

By Lemma 3.12, $R: \textbf{Fact} \to \textbf{FinSet}$ does indeed define a functor since it respects the composition of morphisms and identities as well.

We note that, as mentioned previously, $R$ does not contain every possible function expressible as a layout function in its image. However, it does contain every automorphism of $[0, N) \xrightarrow{\cong} [0, N)$ expressible as a layout function in its image.

Proposition 3.15

Let $N > 0$ be a positive integer and let $f: [0, N) \to [0, N)$ be an automorphism such that there exists a layout $L$ of size $N$ with $f = f_{L}$. Then $f_{L}$ is in the image of the realization functor $R$.

Proof

Without loss of generality, we may suppose that the shape tuple of $L$ is given by $(p_{1}, p_{2}, \ldots, p_{k})$ where the $p_{i}$ are all prime numbers and $N = p_{1} \cdot p_{2} \cdot \ldots \cdot p_{k}$.

In order for $f_{L}$ to be an automorphism of $[0, N)$, the sorted $L$, $L^{\varphi}$, must be of the form

$$
\begin{align}
L^{\varphi} = \left(p_{\varphi(1)}, p_{\varphi(2)}, \ldots, p_{\varphi(k)}\right) : \left(1, p_{\varphi(1)}, p_{\varphi(1)} p_{\varphi(2)}, \ldots, \prod_{1 \leq i < k} p_{\varphi(i)}\right)
\end{align}
$$

for some permutation $\varphi \in \Sigma_{k}$. This means that if we let $\psi = \varphi^{-1}$ be the inverse permutation, then

$$
\begin{align}
\psi_{E}: E^{\psi} = [p_{1} p_{2} \ldots p_{k}] = [p_{\psi(\varphi(1))} p_{\psi(\varphi(2))} \ldots p_{\psi(\varphi(k))}] \xrightarrow{} E = [p_{\varphi(1)} p_{\varphi(2)} \ldots p_{\varphi(k)}]
\end{align}
$$

is a morphism in $\textbf{Fact}$ that $R(\psi_{E}) = f_{L}$.

This concludes the proof. $\square$

Remark 3.16

One way to interpret Proposition 3.15 is that if we take the maximal subgroupoid $\textbf{Fact}^{\simeq}$ inside $\textbf{Fact}$, i.e., the subcategory of all invertible morphisms, then

$$
\begin{align}
R: \textbf{Fact}^{\simeq} \to \textbf{FinSet}
\end{align}
$$

carves out exactly those permutations expressible as layouts. Our motivation for this description is that for a fixed integer $N > 0$, the subset $\Sigma^{L}_{N}$ of $\Sigma_{N}$ on those automorphisms expressible as layout functions is typically not a subgroup (being not generally closed under the group multiplication, i.e., composition).

Instead, if we let

$$
\begin{align}
\textbf{Fact}^{\simeq}_{N} \subset \textbf{Fact}^{\simeq}
\end{align}
$$

be the full subgroupoid of those objects $[p_{1} p_{2} \ldots p_{k}]$ with $N = p_{1} \cdot p_{2} \cdot \ldots \cdot p_{k}$, then the $\Sigma^{L}_{N}$ consists of those morphisms in the image of $R$ on $\textbf{Fact}^{\simeq}_{N}$. However, we see that $\Sigma^{L}_{N}$ is closed under the operation of taking the group inverse (the objects taking permutations to their inverses are also in $\textbf{Fact}^{\simeq}_{N}$). Moreover, in the special case that $N$ is a prime power $p^{k}$, then $\Sigma^{L}_{N}$ is in fact a subgroup and is isomorphic to the symmetric group $\Sigma_{k}$. This corresponds to $\textbf{Fact}^{\simeq}_{p^{k}}$ being a groupoid with a single object $[pp \ldots p]$, i.e., a group.

References

CuTe Layout Algebra

https://leimao.github.io/article/CuTe-Layout-Algebra/

Author

Lei Mao

Posted on

10-20-2024

Updated on

07-14-2025

Licensed under

Mathematics,

Accelerated Computing,

CUDA,

CUTLASS,

CuTe,

Category Theory

CuTe Layout Algebra

Introduction

Layout Algebra Preliminaries

Definition 2.1: Layout

Definition 2.2: Layout Size, Length, and Mode

Concatenation

Isomorphism

Definition 2.3: Layout Function

Extension of Layout Function

Coalescence

Coalescence Rules

By-Mode Coalescence

Implication of Coalescence

Complementation

Definition 2.4: Sorted Layout

Definition 2.5: Admission for Complementation

Definition 2.6: Complementation

Proposition 2.7

Corollary 2.8 Complementation Disjointness

Remark 2.9 Complementation Disjointness, Ordering, and Boundedness

Non-Integral Layout Complementation

Implication of Complementation

Composition

Definition 2.11 Left Divisibility

Definition 2.12 Admission for Composition - Restricted Case

Definition 2.13 Composition - Restricted Case

Proposition 2.14

Definition 2.16 Interval of Definition

Definition 2.17 Composition - General Case

Theorem 2.18 Composition - General Case

Lemma 2.19 Concatenation of Layouts

Definition 2.21 CUTLASS Admission for Composition - Restricted Case

By-Mode Composition

Non-Integral Layout Composition

Implication of Composition

Logical Division

Definition 2.22 Logical Division

Lemma 2.23 Logical Division Implication

By-Mode Logical Division

Logical Division Variants

Implication of Logical Division

Logical Product

Definition 4.1 Logical Product

By-Mode Logical Product

Logical Product Variants

Implication of Logical Product

Permutation Expressible As Layout Functions

Definition 3.1 Ordered Factorization

Notation 3.3

Definition 3.4 Category of Ordered Factorizations

Notation 3.5

Remark 3.7

Remark 3.8

Definition 3.9

Remark 3.11

Lemma 3.12

Definition 3.14

Proposition 3.15

Remark 3.16

References

Author

Posted on

Updated on

Licensed under

Like this article? Support the author with

Comments

Advertisement

Catalogue