cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices

Introduction

The cuBLAS GEMM API assumes that all input and output matrices are stored in column-major format. If every matrix is stored in column-major format, the API can be used straightforwardly. If some of the matrices are stored in row-major format, however, choosing the right parameters for the call is error-prone.

In this blog post, we will discuss how the transpose of a matrix relates to its column-major and row-major storage, and how the cuBLAS GEMM API should be called in each case.

cuBLAS GEMM

cuBLAS GEMM API

The cuBLAS single-precision GEMM API is declared as follows.

cublasStatus_t cublasSgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const float *alpha,
                           const float *A, int lda,
                           const float *B, int ldb,
                           const float *beta,
                           float *C, int ldc)

This function performs the general matrix-matrix multiplication

$$
\begin{align}
C = \alpha \text{op}(A) \text{op}(B) + \beta C
\end{align}
$$

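where $\alpha$ and $\beta$ are scalars, and $A$, $B$, and $C$ are matrices stored in column-major format, with $\text{op}(A)$ of dimensions $m \times k$, $\text{op}(B)$ of dimensions $k \times n$, and $C$ of dimensions $m \times n$. The arguments lda, ldb, and ldc are the leading dimensions of the arrays that store $A$, $B$, and $C$, that is, the number of elements between the starts of two consecutive columns. For matrix $A$, the operation is selected by transa as follows, and $\text{op}(B)$ is selected by transb in the same way.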

$$
\begin{align}
\text{op}(A) =
\begin{cases}
A & \text{if transa = CUBLAS_OP_N} \\
A^{\top} & \text{if transa = CUBLAS_OP_T} \\
A^{\dagger} & \text{if transa = CUBLAS_OP_C} \\
\end{cases}
\end{align}
$$
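
To make the column-major convention and the role of the leading dimensions concrete, here is a minimal CPU sketch of the same computation. The names `ref_sgemm`, `Op`, and `demo_column_major` are hypothetical and introduced only for illustration; they are not part of cuBLAS. The handle is omitted, and so is the conjugate transpose, which coincides with the plain transpose for real `float` data.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUBLAS_OP_N / CUBLAS_OP_T.
enum class Op { N, T };

// CPU reference for C = alpha * op(A) * op(B) + beta * C.
// Every buffer is column-major: element (i, j) lives at index i + j * ld.
void ref_sgemm(Op transa, Op transb, int m, int n, int k, float alpha,
               const float* A, int lda, const float* B, int ldb, float beta,
               float* C, int ldc)
{
    // Each leading dimension must cover the rows of the stored matrix.
    assert(lda >= ((transa == Op::N) ? m : k));
    assert(ldb >= ((transb == Op::N) ? k : n));
    assert(ldc >= m);
    for (int j = 0; j < n; ++j)
    {
        for (int i = 0; i < m; ++i)
        {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
            {
                // op(A)(i, p) is A(i, p) for Op::N and A(p, i) for Op::T.
                const float a = (transa == Op::N) ? A[i + p * lda] : A[p + i * lda];
                // op(B)(p, j) is B(p, j) for Op::N and B(j, p) for Op::T.
                const float b = (transb == Op::N) ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

// Multiplies a 2 x 3 matrix by a 3 x 2 matrix, both column-major.
std::vector<float> demo_column_major()
{
    const std::vector<float> A{1, 4, 2, 5, 3, 6};     // [[1, 2, 3], [4, 5, 6]]
    const std::vector<float> B{7, 9, 11, 8, 10, 12};  // [[7, 8], [9, 10], [11, 12]]
    std::vector<float> C(4, 0.0f);
    ref_sgemm(Op::N, Op::N, 2, 2, 3, 1.0f, A.data(), 2, B.data(), 3, 0.0f,
              C.data(), 2);
    return C;  // column-major [[58, 64], [139, 154]]
}
```

Calling `demo_column_major()` returns the column-major buffer of the $2 \times 2$ product. With cuBLAS, the same argument pattern would be passed to cublasSgemm together with a handle, device pointers, and pointers to the scalars.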

cuBLAS GEMM and Row-Major Matrices

But what if some of the matrices are stored in row-major format? Let’s see a few examples.

Suppose the $m^{\prime} \times k^{\prime}$ matrix $A^{\prime}$ is stored in row-major format, and the $k^{\prime} \times n^{\prime}$ matrix $B^{\prime}$ and the $m^{\prime} \times n^{\prime}$ matrix $C^{\prime}$ are stored in column-major format. The buffer that holds $A^{\prime}$ in row-major format is identical to the buffer that holds its transpose, the $k^{\prime} \times m^{\prime}$ matrix $A^{\prime\top}$, in column-major format. cuBLAS therefore sees $A^{\prime\top}$, which has to be transposed back to $A^{\prime}$ to perform the general matrix-matrix multiplication. In this case, transa = CUBLAS_OP_T, transb = CUBLAS_OP_N, m = m', n = n', k = k', A = A', B = B', and C = C'. If $A^{\prime}$ is stored contiguously without padding, lda = k'.
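
The equivalence between a row-major matrix and its column-major transpose can be checked with a few lines of index arithmetic. The sketch below assumes densely packed buffers without padding; the helper names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (i, j) in a row-major matrix with `cols` columns.
std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t cols)
{
    return i * cols + j;
}

// Offset of element (i, j) in a column-major matrix with `rows` rows.
std::size_t col_major_offset(std::size_t i, std::size_t j, std::size_t rows)
{
    return i + j * rows;
}

// Fills an m x k matrix A' in row-major order, then reads the same buffer as
// a k x m column-major matrix and checks that view(j, i) equals A'(i, j).
bool row_major_buffer_is_transposed_view(std::size_t m, std::size_t k)
{
    assert(m > 0 && k > 0);
    std::vector<float> buffer(m * k);
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < k; ++j)
        {
            buffer[row_major_offset(i, j, k)] = static_cast<float>(10 * i + j);
        }
    }
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < k; ++j)
        {
            // Element (j, i) of the k x m column-major view, leading dimension k.
            if (buffer[col_major_offset(j, i, k)] != static_cast<float>(10 * i + j))
            {
                return false;
            }
        }
    }
    return true;
}
```

The two offset functions agree whenever the row and column indices are swapped, which is why no data movement is needed: only the interpretation of the buffer changes.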

Suppose the $m^{\prime} \times k^{\prime}$ matrix $A^{\prime}$ and the $k^{\prime} \times n^{\prime}$ matrix $B^{\prime}$ are stored in column-major format, and the $m^{\prime} \times n^{\prime}$ matrix $C^{\prime}$ is stored in row-major format. The cuBLAS GEMM API has no transpose option for the output matrix, so $C^{\prime}$ cannot be handled in the same way as the inputs.

Instead, we can transpose both sides of the general matrix-matrix multiplication formula.

$$
\begin{align}
C^{\top} &= \alpha \left(\text{op}(A) \text{op}(B)\right)^{\top} + \beta C^{\top} \\
&= \alpha \text{op}(B)^{\top} \text{op}(A)^{\top} + \beta C^{\top} \\
&= \alpha \text{op}(B^{\top}) \text{op}(A^{\top}) + \beta C^{\top}
\end{align}
$$

So if $B^{\top}$, $A^{\top}$, and $C^{\top}$ are available in column-major format, we can still perform the general matrix-matrix multiplication using the existing cuBLAS API, with the two inputs passed in swapped order.
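
As a quick check on the shapes (a restatement of the transposed formula with dimensions annotated, not an additional result), the transposed product matches what cuBLAS expects once the roles of $m$ and $n$ are swapped:

$$
\begin{align}
\underbrace{C^{\top}}_{n \times m} = \alpha \underbrace{\text{op}(B)^{\top}}_{n \times k} \underbrace{\text{op}(A)^{\top}}_{k \times m} + \beta \underbrace{C^{\top}}_{n \times m}
\end{align}
$$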

In this case, the buffer that holds $C^{\prime}$ in row-major format is identical to the buffer that holds its transpose, the $n^{\prime} \times m^{\prime}$ matrix $C^{\prime\top}$, in column-major format, so cuBLAS has to compute $C^{\prime\top} = \alpha B^{\prime\top} A^{\prime\top} + \beta C^{\prime\top}$. The inputs $B^{\prime\top}$ and $A^{\prime\top}$ are not available in column-major format, however. The original $A^{\prime}$ stored in column-major format is equivalent to its transpose, the $k^{\prime} \times m^{\prime}$ matrix $A^{\prime\top}$, stored in row-major format. Likewise, the original $B^{\prime}$ stored in column-major format is equivalent to its transpose, the $n^{\prime} \times k^{\prime}$ matrix $B^{\prime\top}$, stored in row-major format. Both matrices therefore have to be transposed by cuBLAS, and transa = CUBLAS_OP_T, transb = CUBLAS_OP_T, m = n', n = m', k = k', A = B', B = A', and C = C'.
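
The row-major output case can be sketched on the CPU as follows. `ref_sgemm` is a hypothetical reference function with cublasSgemm-like column-major semantics, written here only so that the snippet compiles on its own; the leading dimensions assume densely packed buffers, which is an assumption not stated above.

```cpp
#include <cassert>
#include <vector>

enum class Op { N, T };  // hypothetical stand-in for CUBLAS_OP_N / CUBLAS_OP_T

// Hypothetical CPU reference: C = alpha * op(A) * op(B) + beta * C,
// with all buffers interpreted as column-major.
void ref_sgemm(Op transa, Op transb, int m, int n, int k, float alpha,
               const float* A, int lda, const float* B, int ldb, float beta,
               float* C, int ldc)
{
    assert(lda >= ((transa == Op::N) ? m : k));
    assert(ldb >= ((transb == Op::N) ? k : n));
    assert(ldc >= m);
    for (int j = 0; j < n; ++j)
    {
        for (int i = 0; i < m; ++i)
        {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
            {
                const float a = (transa == Op::N) ? A[i + p * lda] : A[p + i * lda];
                const float b = (transb == Op::N) ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}

// A' (2 x 2) and B' (2 x 3) are column-major; C' (2 x 3) is row-major.
std::vector<float> demo_row_major_output()
{
    const int m_p = 2, n_p = 3, k_p = 2;
    const std::vector<float> a_prime{1, 3, 2, 4};        // [[1, 2], [3, 4]]
    const std::vector<float> b_prime{1, 0, 0, 1, 2, 3};  // [[1, 0, 2], [0, 1, 3]]
    std::vector<float> c_prime(m_p * n_p, 0.0f);
    // Compute C'^T = B'^T A'^T: swap the operands and swap m with n.
    ref_sgemm(Op::T, Op::T, /*m=*/n_p, /*n=*/m_p, /*k=*/k_p, 1.0f,
              /*A=*/b_prime.data(), /*lda=*/k_p,
              /*B=*/a_prime.data(), /*ldb=*/m_p, 0.0f,
              /*C=*/c_prime.data(), /*ldc=*/n_p);
    return c_prime;  // row-major [[1, 2, 8], [3, 4, 18]]
}
```

The returned buffer reads as $C^{\prime} = A^{\prime} B^{\prime}$ in row-major order, even though the reference function only ever writes a column-major $n^{\prime} \times m^{\prime}$ matrix.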

Conclusions

Suppose we want to perform the matrix multiplication $C^{\prime} = \alpha A^{\prime} B^{\prime} + \beta C^{\prime}$ using the cuBLAS API, where $A^{\prime}$, $B^{\prime}$, and $C^{\prime}$ are matrices of shapes $m^{\prime} \times k^{\prime}$, $k^{\prime} \times n^{\prime}$, and $m^{\prime} \times n^{\prime}$, respectively. The following table summarizes, for each combination of storage formats of $A^{\prime}$, $B^{\prime}$, and $C^{\prime}$, how the cuBLAS API parameters should be set.

| $m^{\prime} \times k^{\prime}$ matrix $A^{\prime}$ | $k^{\prime} \times n^{\prime}$ matrix $B^{\prime}$ | $m^{\prime} \times n^{\prime}$ matrix $C^{\prime}$ | transa | transb | m | n | k | A | B | C |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Column Major | Column Major | Column Major | CUBLAS_OP_N | CUBLAS_OP_N | $m^{\prime}$ | $n^{\prime}$ | $k^{\prime}$ | $A^{\prime}$ | $B^{\prime}$ | $C^{\prime}$ |
| Row Major | Column Major | Column Major | CUBLAS_OP_T | CUBLAS_OP_N | $m^{\prime}$ | $n^{\prime}$ | $k^{\prime}$ | $A^{\prime}$ | $B^{\prime}$ | $C^{\prime}$ |
| Column Major | Row Major | Column Major | CUBLAS_OP_N | CUBLAS_OP_T | $m^{\prime}$ | $n^{\prime}$ | $k^{\prime}$ | $A^{\prime}$ | $B^{\prime}$ | $C^{\prime}$ |
| Row Major | Row Major | Column Major | CUBLAS_OP_T | CUBLAS_OP_T | $m^{\prime}$ | $n^{\prime}$ | $k^{\prime}$ | $A^{\prime}$ | $B^{\prime}$ | $C^{\prime}$ |
| Column Major | Column Major | Row Major | CUBLAS_OP_T | CUBLAS_OP_T | $n^{\prime}$ | $m^{\prime}$ | $k^{\prime}$ | $B^{\prime}$ | $A^{\prime}$ | $C^{\prime}$ |
| Row Major | Column Major | Row Major | CUBLAS_OP_T | CUBLAS_OP_N | $n^{\prime}$ | $m^{\prime}$ | $k^{\prime}$ | $B^{\prime}$ | $A^{\prime}$ | $C^{\prime}$ |
| Column Major | Row Major | Row Major | CUBLAS_OP_N | CUBLAS_OP_T | $n^{\prime}$ | $m^{\prime}$ | $k^{\prime}$ | $B^{\prime}$ | $A^{\prime}$ | $C^{\prime}$ |
| Row Major | Row Major | Row Major | CUBLAS_OP_N | CUBLAS_OP_N | $n^{\prime}$ | $m^{\prime}$ | $k^{\prime}$ | $B^{\prime}$ | $A^{\prime}$ | $C^{\prime}$ |
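
The eight rows of the table follow one rule, which can be written as a small helper. This is a sketch under stated assumptions: `Layout`, `Op`, `GemmPlan`, and `plan_gemm` are hypothetical names that do not exist in cuBLAS, and the leading dimensions it returns assume densely packed buffers, which the table itself does not cover.

```cpp
#include <cassert>

enum class Layout { ColumnMajor, RowMajor };
enum class Op { N, T };  // hypothetical stand-in for CUBLAS_OP_N / CUBLAS_OP_T

// Parameters to pass to a cublasSgemm-style call.
struct GemmPlan
{
    Op transa;
    Op transb;
    int m;
    int n;
    int k;
    bool swap_operands;  // true: pass B' as A and A' as B
    int lda;
    int ldb;
    int ldc;
};

// Maps the storage formats of A' (m' x k'), B' (k' x n'), and C' (m' x n')
// to GEMM parameters, assuming packed buffers.
GemmPlan plan_gemm(Layout a, Layout b, Layout c, int m_p, int n_p, int k_p)
{
    assert(m_p > 0 && n_p > 0 && k_p > 0);
    const bool a_row = (a == Layout::RowMajor);
    const bool b_row = (b == Layout::RowMajor);
    // Leading dimension of each input buffer when read as column-major.
    const int ld_a = a_row ? k_p : m_p;
    const int ld_b = b_row ? n_p : k_p;
    if (c == Layout::ColumnMajor)
    {
        // Compute C' directly; a row-major input needs a transpose.
        return GemmPlan{a_row ? Op::T : Op::N, b_row ? Op::T : Op::N,
                        m_p, n_p, k_p, false, ld_a, ld_b, m_p};
    }
    // Compute C'^T = B'^T A'^T; a column-major input needs a transpose.
    return GemmPlan{b_row ? Op::N : Op::T, a_row ? Op::N : Op::T,
                    n_p, m_p, k_p, true, ld_b, ld_a, n_p};
}
```

For example, `plan_gemm(Layout::RowMajor, Layout::RowMajor, Layout::RowMajor, 2, 3, 4)` reproduces the last row of the table: no transposes, swapped operands, and m = n', n = m'.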

References

cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices

https://leimao.github.io/blog/cuBLAS-Transpose-Column-Major-Relationship/

Author

Lei Mao

Posted on

12-12-2024

Updated on

12-12-2024
