cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices
Introduction
The cuBLAS GEMM API assumes that the input and output matrices are all stored in column-major format. If they are, the API can be used straightforwardly. If some of the matrices are stored in row-major format, however, choosing the right parameters for the cuBLAS GEMM API becomes error-prone.
In this blog post, we will discuss the relationship between the transpose and the column-major storage of matrices, and how the cuBLAS GEMM API should be used in each case.
cuBLAS GEMM
cuBLAS GEMM API
The cuBLAS single-precision GEMM API is declared as follows.
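```c
cublasStatus_t cublasSgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const float *alpha,
                           const float *A, int lda,
                           const float *B, int ldb,
                           const float *beta,
                           float *C, int ldc);
```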
This function performs the general matrix-matrix multiplication
$$
\begin{align}
C = \alpha \text{op}(A) \text{op}(B) + \beta C
\end{align}
$$
where $\alpha$ and $\beta$ are scalars, and $A$, $B$, and $C$ are matrices stored in column-major format, with $\text{op}(A)$ of dimensions $m \times k$, $\text{op}(B)$ of dimensions $k \times n$, and $C$ of dimensions $m \times n$. For matrix $A$ (and analogously for matrix $B$ with transb),
$$
\begin{align}
\text{op}(A) =
\begin{cases}
A & \text{if transa = CUBLAS_OP_N} \\
A^{\top} & \text{if transa = CUBLAS_OP_T} \\
A^{\dagger} & \text{if transa = CUBLAS_OP_C} \\
\end{cases}
\end{align}
$$
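To make the formula and the role of `transa` and `transb` concrete, the following is a minimal CPU sketch of the computation. It is not cuBLAS code: `Op` and `ref_sgemm` are hypothetical names introduced here, the scalars are passed by value instead of by pointer, there is no handle, and only the non-transposed and transposed cases are handled. What it does keep is the argument order and the column-major indexing, where element $(r, c)$ of a matrix with leading dimension `ld` sits at offset `r + c * ld`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cublasOperation_t (conjugate transpose omitted).
enum class Op { N, T };

// CPU sketch of C = alpha * op(A) * op(B) + beta * C with the cublasSgemm
// argument order. Every buffer is read as column-major.
void ref_sgemm(Op transa, Op transb, int m, int n, int k, float alpha,
               const float* A, int lda, const float* B, int ldb, float beta,
               float* C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                // op(A)(i, p) and op(B)(p, j), using offset = row + col * ld.
                const float a = (transa == Op::N) ? A[i + p * lda] : A[p + i * lda];
                const float b = (transb == Op::N) ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

// A = [[1, 2, 3], [4, 5, 6]] and B = [[7, 8], [9, 10], [11, 12]],
// both stored column by column.
std::vector<float> all_column_major_example()
{
    const std::vector<float> A{1, 4, 2, 5, 3, 6};     // 2 x 3, lda = 2
    const std::vector<float> B{7, 9, 11, 8, 10, 12};  // 3 x 2, ldb = 3
    std::vector<float> C{1, 1, 1, 1};                 // 2 x 2, ldc = 2
    ref_sgemm(Op::N, Op::N, 2, 2, 3, 2.0f, A.data(), 2, B.data(), 3, 1.0f,
              C.data(), 2);
    return C;  // column-major 2 * (A * B) + 1 = {117, 279, 129, 309}
}
```

Calling `all_column_major_example()` multiplies a $2 \times 3$ matrix by a $3 \times 2$ matrix with $\alpha = 2$ and $\beta = 1$, and returns the $2 \times 2$ result in column-major order.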
cuBLAS GEMM and Row-Major Matrices
But what if some of the matrices are stored in row-major format? Let’s see a few examples.
Suppose the $m^{\prime} \times k^{\prime}$ matrix $A^{\prime}$ is stored in row-major format, and the $k^{\prime} \times n^{\prime}$ matrix $B^{\prime}$ and the $m^{\prime} \times n^{\prime}$ matrix $C^{\prime}$ are stored in column-major format. The buffer that holds $A^{\prime}$ in row-major format is identical to the buffer that holds its transpose, the $k^{\prime} \times m^{\prime}$ matrix $A^{\prime\top}$, in column-major format. Because cuBLAS reads every buffer as column-major, it sees $A^{\prime\top}$, which has to be transposed back to $A^{\prime}$ to perform the general matrix-matrix multiplication. In this case, `transa = CUBLAS_OP_T`, `transb = CUBLAS_OP_N`, `m = m'`, `n = n'`, `k = k'`, `A = A'`, `B = B'`, and `C = C'`.
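This first case can be checked on the CPU with a small sketch. `Op` and `ref_sgemm` are hypothetical stand-ins that only mimic the cublasSgemm argument order and its column-major indexing; they are not part of cuBLAS. One detail that is an addition to the text above is the leading dimension: assuming the row-major buffer of $A^{\prime}$ has no padding, cuBLAS sees a $k^{\prime} \times m^{\prime}$ column-major matrix, so `lda = k'`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cublasOperation_t (conjugate transpose omitted).
enum class Op { N, T };

// CPU sketch of C = alpha * op(A) * op(B) + beta * C with the cublasSgemm
// argument order. Every buffer is read as column-major.
void ref_sgemm(Op transa, Op transb, int m, int n, int k, float alpha,
               const float* A, int lda, const float* B, int ldb, float beta,
               float* C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                const float a = (transa == Op::N) ? A[i + p * lda] : A[p + i * lda];
                const float b = (transb == Op::N) ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

// A' = [[1, 2, 3], [4, 5, 6]] stored row by row,
// B' = [[7, 8], [9, 10], [11, 12]] stored column by column.
std::vector<float> row_major_a_example()
{
    const int m = 2, n = 2, k = 3;                    // m', n', k'
    const std::vector<float> A{1, 2, 3, 4, 5, 6};     // row-major 2 x 3
    const std::vector<float> B{7, 9, 11, 8, 10, 12};  // column-major 3 x 2
    std::vector<float> C(m * n, 0.0f);                // column-major 2 x 2
    // transa = T undoes the implicit transpose of the row-major buffer.
    ref_sgemm(Op::T, Op::N, m, n, k, 1.0f, A.data(), /*lda=*/k, B.data(),
              /*ldb=*/k, 0.0f, C.data(), /*ldc=*/m);
    return C;  // column-major A' * B' = {58, 139, 64, 154}
}
```

The result is the column-major layout of $A^{\prime} B^{\prime} = \begin{pmatrix} 58 & 64 \\ 139 & 154 \end{pmatrix}$, which is what we would get if $A^{\prime}$ had been stored in column-major format and passed with the non-transposed flag.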
Suppose the $m^{\prime} \times k^{\prime}$ matrix $A^{\prime}$ and the $k^{\prime} \times n^{\prime}$ matrix $B^{\prime}$ are stored in column-major format, and the $m^{\prime} \times n^{\prime}$ matrix $C^{\prime}$ is stored in row-major format. In this case, there is no way to transpose $C^{\prime}$ via the cuBLAS API, because the API takes transa and transb but has no corresponding flag for $C$.
However, we can transpose both sides of the formula before performing the general matrix-matrix multiplication.
$$
\begin{align}
C^{\top} &= \alpha \left(\text{op}(A) \text{op}(B)\right)^{\top} + \beta C^{\top} \\
&= \alpha \text{op}(B)^{\top} \text{op}(A)^{\top} + \beta C^{\top} \\
&= \alpha \text{op}(B^{\top}) \text{op}(A^{\top}) + \beta C^{\top}
\end{align}
$$
So if $B^{\top}$, $A^{\top}$, and $C^{\top}$ are stored in column-major format, we could still perform the general matrix-matrix multiplication using the existing cuBLAS API.
In this case, the buffer that holds $C^{\prime}$ in row-major format is identical to the buffer that holds its transpose, the $n^{\prime} \times m^{\prime}$ matrix $C^{\prime\top}$, in column-major format. cuBLAS therefore has to compute $C^{\prime\top} = \alpha B^{\prime\top} A^{\prime\top} + \beta C^{\prime\top}$, with $B^{\prime}$ as the first operand and $A^{\prime}$ as the second. Since $A^{\prime}$ and $B^{\prime}$ are stored in column-major format, cuBLAS sees them as they are, and both have to be transposed to obtain $A^{\prime\top}$ and $B^{\prime\top}$. In this case, `transa = CUBLAS_OP_T`, `transb = CUBLAS_OP_T`, `m = n'`, `n = m'`, `k = k'`, `A = B'`, `B = A'`, and `C = C'`.
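The swapped call can also be checked with a small CPU sketch. As before, `Op` and `ref_sgemm` are hypothetical stand-ins that only follow the cublasSgemm argument order and its column-major indexing. The leading dimensions are an addition to the text above and assume unpadded buffers: the first operand is the column-major $k^{\prime} \times n^{\prime}$ buffer of $B^{\prime}$ (`lda = k'`), the second is the column-major $m^{\prime} \times k^{\prime}$ buffer of $A^{\prime}$ (`ldb = m'`), and the output is seen as the $n^{\prime} \times m^{\prime}$ matrix $C^{\prime\top}$ (`ldc = n'`).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for cublasOperation_t (conjugate transpose omitted).
enum class Op { N, T };

// CPU sketch of C = alpha * op(A) * op(B) + beta * C with the cublasSgemm
// argument order. Every buffer is read as column-major.
void ref_sgemm(Op transa, Op transb, int m, int n, int k, float alpha,
               const float* A, int lda, const float* B, int ldb, float beta,
               float* C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                const float a = (transa == Op::N) ? A[i + p * lda] : A[p + i * lda];
                const float b = (transb == Op::N) ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

// A' = [[1, 2, 3], [4, 5, 6]] and
// B' = [[1, 0, 2, 1], [0, 1, 1, 2], [1, 1, 0, 3]], both column-major.
// C' = A' * B' is written in row-major order.
std::vector<float> row_major_c_example()
{
    const int m = 2, n = 4, k = 3;                 // m', n', k'
    const std::vector<float> A{1, 4, 2, 5, 3, 6};  // column-major 2 x 3
    const std::vector<float> B{1, 0, 1, 0, 1, 1, 2, 1, 0, 1, 2, 3};  // 3 x 4
    std::vector<float> C(m * n, 0.0f);             // row-major 2 x 4
    // Operands are swapped and both transposed; m and n trade places.
    ref_sgemm(Op::T, Op::T, /*m=*/n, /*n=*/m, k, 1.0f, B.data(), /*lda=*/k,
              A.data(), /*ldb=*/m, 0.0f, C.data(), /*ldc=*/n);
    return C;  // row-major A' * B' = {4, 5, 4, 14, 10, 11, 13, 32}
}
```

Read row by row, the returned buffer is $A^{\prime} B^{\prime} = \begin{pmatrix} 4 & 5 & 4 & 14 \\ 10 & 11 & 13 & 32 \end{pmatrix}$, so no separate transpose of the output is needed.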
Conclusions
Suppose we want to perform the matrix multiplication $C^{\prime} = \alpha A^{\prime} B^{\prime} + \beta C^{\prime}$ using the cuBLAS API, where $A^{\prime}$, $B^{\prime}$, and $C^{\prime}$ are matrices of shapes $m^{\prime} \times k^{\prime}$, $k^{\prime} \times n^{\prime}$, and $m^{\prime} \times n^{\prime}$, respectively. The following table summarizes, for each combination of storage formats of $A^{\prime}$, $B^{\prime}$, and $C^{\prime}$, how the cuBLAS API parameters should be set.
Storage of $A^{\prime}$ ($m^{\prime} \times k^{\prime}$) | Storage of $B^{\prime}$ ($k^{\prime} \times n^{\prime}$) | Storage of $C^{\prime}$ ($m^{\prime} \times n^{\prime}$) | transa | transb | m | n | k | A | B | C |
---|---|---|---|---|---|---|---|---|---|---|
Column Major | Column Major | Column Major | CUBLAS_OP_N | CUBLAS_OP_N | $m^{\prime}$ | $n^{\prime}$ | $k^{\prime}$ | $A^{\prime}$ | $B^{\prime}$ | $C^{\prime}$ |
Row Major | Column Major | Column Major | CUBLAS_OP_T | CUBLAS_OP_N | $m^{\prime}$ | $n^{\prime}$ | $k^{\prime}$ | $A^{\prime}$ | $B^{\prime}$ | $C^{\prime}$ |
Column Major | Row Major | Column Major | CUBLAS_OP_N | CUBLAS_OP_T | $m^{\prime}$ | $n^{\prime}$ | $k^{\prime}$ | $A^{\prime}$ | $B^{\prime}$ | $C^{\prime}$ |
Row Major | Row Major | Column Major | CUBLAS_OP_T | CUBLAS_OP_T | $m^{\prime}$ | $n^{\prime}$ | $k^{\prime}$ | $A^{\prime}$ | $B^{\prime}$ | $C^{\prime}$ |
Column Major | Column Major | Row Major | CUBLAS_OP_T | CUBLAS_OP_T | $n^{\prime}$ | $m^{\prime}$ | $k^{\prime}$ | $B^{\prime}$ | $A^{\prime}$ | $C^{\prime}$ |
Row Major | Column Major | Row Major | CUBLAS_OP_T | CUBLAS_OP_N | $n^{\prime}$ | $m^{\prime}$ | $k^{\prime}$ | $B^{\prime}$ | $A^{\prime}$ | $C^{\prime}$ |
Column Major | Row Major | Row Major | CUBLAS_OP_N | CUBLAS_OP_T | $n^{\prime}$ | $m^{\prime}$ | $k^{\prime}$ | $B^{\prime}$ | $A^{\prime}$ | $C^{\prime}$ |
Row Major | Row Major | Row Major | CUBLAS_OP_N | CUBLAS_OP_N | $n^{\prime}$ | $m^{\prime}$ | $k^{\prime}$ | $B^{\prime}$ | $A^{\prime}$ | $C^{\prime}$ |
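The table can be condensed into two rules: if $C^{\prime}$ is column-major, pass the operands in order and transpose each row-major one; if $C^{\prime}$ is row-major, swap the operands, swap `m` and `n`, and transpose each column-major one. The sketch below encodes these rules and checks all eight rows against a direct triple loop on the CPU. All names (`Op`, `ref_sgemm`, `leading_dim`, `gemm_any_layout`, `offset`, `all_layouts_agree`) are hypothetical, `ref_sgemm` is only a stand-in that follows the cublasSgemm argument order, and the leading dimensions assume unpadded buffers.

```cpp
#include <cassert>
#include <vector>

enum class Op { N, T };  // hypothetical stand-in for cublasOperation_t

// CPU sketch of C = alpha * op(A) * op(B) + beta * C, column-major buffers.
void ref_sgemm(Op transa, Op transb, int m, int n, int k, float alpha,
               const float* A, int lda, const float* B, int ldb, float beta,
               float* C, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                const float a = (transa == Op::N) ? A[i + p * lda] : A[p + i * lda];
                const float b = (transb == Op::N) ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
}

// Leading dimension of an unpadded rows x cols matrix in the given layout.
int leading_dim(bool row_major, int rows, int cols) { return row_major ? cols : rows; }

// C' = alpha * A' * B' + beta * C' for any mix of layouts, following the table.
void gemm_any_layout(bool a_row, bool b_row, bool c_row, int m, int n, int k,
                     float alpha, const float* A, const float* B, float beta,
                     float* C)
{
    const int lda = leading_dim(a_row, m, k);
    const int ldb = leading_dim(b_row, k, n);
    const int ldc = leading_dim(c_row, m, n);
    if (!c_row) {
        // Rows 1 to 4: operands in order, transpose the row-major ones.
        ref_sgemm(a_row ? Op::T : Op::N, b_row ? Op::T : Op::N, m, n, k, alpha,
                  A, lda, B, ldb, beta, C, ldc);
    } else {
        // Rows 5 to 8: operands swapped, transpose the column-major ones.
        ref_sgemm(b_row ? Op::N : Op::T, a_row ? Op::N : Op::T, n, m, k, alpha,
                  B, ldb, A, lda, beta, C, ldc);
    }
}

// Offset of logical element (r, c) in an unpadded rows x cols buffer.
int offset(bool row_major, int rows, int cols, int r, int c)
{
    return row_major ? r * cols + c : r + c * rows;
}

// Runs all eight layout combinations and compares with a direct triple loop.
bool all_layouts_agree()
{
    const int m = 2, n = 4, k = 3;
    for (int mask = 0; mask < 8; ++mask) {
        const bool a_row = (mask & 1) != 0;
        const bool b_row = (mask & 2) != 0;
        const bool c_row = (mask & 4) != 0;
        std::vector<float> A(m * k), B(k * n), C(m * n, 1.0f);
        for (int i = 0; i < m; ++i)
            for (int p = 0; p < k; ++p)
                A[offset(a_row, m, k, i, p)] = float(i * k + p + 1);
        for (int p = 0; p < k; ++p)
            for (int j = 0; j < n; ++j)
                B[offset(b_row, k, n, p, j)] = float(p - j);
        gemm_any_layout(a_row, b_row, c_row, m, n, k, 2.0f, A.data(), B.data(),
                        0.5f, C.data());
        for (int i = 0; i < m; ++i)
            for (int j = 0; j < n; ++j) {
                float expected = 0.5f;  // beta * initial C
                for (int p = 0; p < k; ++p)
                    expected += 2.0f * float(i * k + p + 1) * float(p - j);
                if (C[offset(c_row, m, n, i, j)] != expected) return false;
            }
    }
    return true;
}
```

The test values are small integers and halves, so the comparison with `!=` is exact here; with arbitrary data a tolerance would be the safer choice.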
References
cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices
https://leimao.github.io/blog/cuBLAS-Transpose-Column-Major-Relationship/