cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices

12-12-2024

Introduction

The cuBLAS GEMM API assumes that all input and output matrices are stored in column-major format. If all the matrices are stored in column-major format, the cuBLAS GEMM API can be used straightforwardly. But if some of the matrices are stored in row-major format, setting the parameters of the cuBLAS GEMM API for such matrix multiplications can be error-prone.

In this blog post, we will discuss the relationship between the transpose and the column-major storage of matrices, and how the cuBLAS GEMM API should be used for each combination of storage formats.

cuBLAS GEMM

cuBLAS GEMM API

The cuBLAS single-precision GEMM API is declared as follows.

cublasStatus_t cublasSgemm(cublasHandle_t handle,
                           cublasOperation_t transa, cublasOperation_t transb,
                           int m, int n, int k,
                           const float *alpha,
                           const float *A, int lda,
                           const float *B, int ldb,
                           const float *beta,
                           float *C, int ldc)

This function performs the general matrix-matrix multiplication

$$
\begin{align}
C = \alpha \text{op}(A) \text{op}(B) + \beta C
\end{align}
$$

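where $\alpha$ and $\beta$ are scalars, and $A$, $B$, and $C$ are matrices stored in column-major format with dimensions of $\text{op}(A)$ being $m \times k$, $\text{op}(B)$ being $k \times n$, and $C$ being $m \times n$, respectively. The parameters lda, ldb, and ldc are the leading dimensions of the buffers of $A$, $B$, and $C$ as they are stored. For matrix $A$, $\text{op}(A)$ is defined as follows, and $\text{op}(B)$ is defined analogously using transb.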

$$
\begin{align}
\text{op}(A) =
\begin{cases}
A & \text{if transa = CUBLAS_OP_N} \\
A^{\top} & \text{if transa = CUBLAS_OP_T} \\
A^{\dagger} & \text{if transa = CUBLAS_OP_C} \\
\end{cases}
\end{align}
$$
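
To make these semantics concrete, the following is a minimal CPU sketch of the same computation for real-valued matrices. It is not cuBLAS code: `ref_sgemm` and `Op` are hypothetical names introduced here for illustration, the handle is dropped, and $\alpha$ and $\beta$ are passed by value instead of by pointer. CUBLAS_OP_C is omitted because it coincides with CUBLAS_OP_T for real-valued matrices.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cublasOperation_t (real-valued case only).
enum class Op { N, T };

// CPU reference for C = alpha * op(A) * op(B) + beta * C.
// Every buffer is interpreted as column-major: element (row, col) of a matrix
// with leading dimension ld lives at offset col * ld + row.
void ref_sgemm(Op transa, Op transb, int m, int n, int k, float alpha,
               const float* A, int lda, const float* B, int ldb,
               float beta, float* C, int ldc) {
    // Each leading dimension must cover the rows of the matrix as stored.
    assert(lda >= (transa == Op::N ? m : k));
    assert(ldb >= (transb == Op::N ? k : n));
    assert(ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                // op(A)(i, p) is A(i, p) for Op::N and A(p, i) for Op::T.
                const std::size_t ia = (transa == Op::N)
                    ? static_cast<std::size_t>(p) * lda + i
                    : static_cast<std::size_t>(i) * lda + p;
                // op(B)(p, j) is B(p, j) for Op::N and B(j, p) for Op::T.
                const std::size_t ib = (transb == Op::N)
                    ? static_cast<std::size_t>(j) * ldb + p
                    : static_cast<std::size_t>(p) * ldb + j;
                acc += A[ia] * B[ib];
            }
            const std::size_t ic = static_cast<std::size_t>(j) * ldc + i;
            // With beta equal to zero, the old content of C is ignored.
            C[ic] = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * C[ic];
        }
    }
}
```

The only place where the storage format enters is the offset computation `col * ld + row`, which is why a row-major buffer cannot be passed as is without adjusting the other parameters.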

cuBLAS GEMM and Row-Major Matrices

But what if some of the matrices are stored in row-major format? Let’s see a few examples.

Suppose the $m^{\prime} \times k^{\prime}$ matrix $A^{\prime}$ is stored in row-major format, and the $k^{\prime} \times n^{\prime}$ matrix $B^{\prime}$ and the $m^{\prime} \times n^{\prime}$ matrix $C^{\prime}$ are stored in column-major format. The transpose of $A^{\prime}$, the $k^{\prime} \times m^{\prime}$ matrix $A^{\prime\top}$, stored in column-major format, occupies memory in exactly the same way as the original $A^{\prime}$ stored in row-major format. Because cuBLAS interprets every buffer as column-major, it sees $A^{\prime\top}$ when it is given the buffer of $A^{\prime}$. So in order to perform the general matrix-matrix multiplication using cuBLAS, $A^{\prime\top}$ has to be transposed back to $A^{\prime}$. In this case, transa = CUBLAS_OP_T, transb = CUBLAS_OP_N, m = m', n = n', k = k', A = A', B = B', and C = C'.
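
The layout equivalence used in this case can be checked with plain index arithmetic. The sketch below assumes densely packed buffers without padding, and all function names are hypothetical helpers rather than cuBLAS functions.

```cpp
#include <cassert>
#include <cstddef>

// Offset of element (row, col) in a densely packed row-major matrix
// that has `cols` columns.
inline std::size_t row_major_offset(int row, int col, int cols) {
    assert(row >= 0 && col >= 0 && col < cols);
    return static_cast<std::size_t>(row) * cols + col;
}

// Offset of element (row, col) in a densely packed column-major matrix
// that has `rows` rows.
inline std::size_t col_major_offset(int row, int col, int rows) {
    assert(row >= 0 && col >= 0 && row < rows);
    return static_cast<std::size_t>(col) * rows + row;
}

// Checks that A'(i, j) in a row-major m' x k' buffer and A'^T(j, i) in a
// column-major k' x m' buffer always map to the same offset.
bool row_major_is_transposed_col_major(int mp, int kp) {
    for (int i = 0; i < mp; ++i) {
        for (int j = 0; j < kp; ++j) {
            if (row_major_offset(i, j, kp) != col_major_offset(j, i, kp)) {
                return false;
            }
        }
    }
    return true;
}
```

Under the same dense-packing assumption, the column-major matrix that cuBLAS sees has $k^{\prime}$ rows, so the leading dimension passed for the row-major $A^{\prime}$ would be lda = k'.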

Suppose the $m^{\prime} \times k^{\prime}$ matrix $A^{\prime}$ and the $k^{\prime} \times n^{\prime}$ matrix $B^{\prime}$ are stored in column-major format, and the $m^{\prime} \times n^{\prime}$ matrix $C^{\prime}$ is stored in row-major format. In this case, there is no way to transpose $C^{\prime}$ via the cuBLAS GEMM API, because it provides transa and transb for $A$ and $B$ but no corresponding option for $C$.

We notice that we could instead transpose both sides of the formula before performing the general matrix-matrix multiplication.

$$
\begin{align}
C^{\top} &= \alpha \left(\text{op}(A) \text{op}(B)\right)^{\top} + \beta C^{\top} \\
&= \alpha \text{op}(B)^{\top} \text{op}(A)^{\top} + \beta C^{\top} \\
&= \alpha \text{op}(B^{\top}) \text{op}(A^{\top}) + \beta C^{\top}
\end{align}
$$

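So if $B^{\top}$, $A^{\top}$, and $C^{\top}$ are stored in column-major format, which is the same as $B$, $A$, and $C$ being stored in row-major format, we could still perform the general matrix-matrix multiplication using the existing cuBLAS API by swapping the roles of $A$ and $B$.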

In this case, the transpose of $C^{\prime}$, the $n^{\prime} \times m^{\prime}$ matrix $C^{\prime\top}$, stored in column-major format, is equivalent to the original $C^{\prime}$ stored in row-major format, so cuBLAS can compute $C^{\prime\top} = \alpha B^{\prime\top} A^{\prime\top} + \beta C^{\prime\top}$ directly into the buffer of $C^{\prime}$. The matrices $A^{\prime}$ and $B^{\prime}$ therefore have to be transposed as well. The transpose of $A^{\prime}$, the $k^{\prime} \times m^{\prime}$ matrix $A^{\prime\top}$, stored in row-major format, is equivalent to the original $A^{\prime}$ stored in column-major format, and the transpose of $B^{\prime}$, the $n^{\prime} \times k^{\prime}$ matrix $B^{\prime\top}$, stored in row-major format, is equivalent to the original $B^{\prime}$ stored in column-major format. Because cuBLAS reads both buffers as column-major, it sees $A^{\prime}$ and $B^{\prime}$ and has to transpose them to obtain $A^{\prime\top}$ and $B^{\prime\top}$. In this case, transa = CUBLAS_OP_T, transb = CUBLAS_OP_T, m = n', n = m', k = k', A = B', B = A', and C = C'.
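
The parameter choice for this case can be sanity-checked on the CPU. The sketch below uses a compact stand-in for cublasSgemm (`ref_sgemm` and `Op` are hypothetical names, with scalars passed by value and no handle) and a hypothetical wrapper `gemm_col_col_row` that applies the swap. The leading dimensions are an assumption that all buffers are densely packed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { N, T };  // hypothetical stand-in for cublasOperation_t

// Compact CPU stand-in for cublasSgemm: all buffers are read as column-major.
void ref_sgemm(Op transa, Op transb, int m, int n, int k, float alpha,
               const float* A, int lda, const float* B, int ldb,
               float beta, float* C, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                const float a = (transa == Op::N) ? A[p * lda + i] : A[i * lda + p];
                const float b = (transb == Op::N) ? B[j * ldb + p] : B[p * ldb + j];
                acc += a * b;
            }
            C[j * ldc + i] = alpha * acc + beta * C[j * ldc + i];
        }
    }
}

// C' (m' x n', row-major) = alpha * A' * B' + beta * C',
// with A' (m' x k') and B' (k' x n') both stored in column-major format.
void gemm_col_col_row(int mp, int np, int kp, float alpha,
                      const std::vector<float>& A_prime,
                      const std::vector<float>& B_prime, float beta,
                      std::vector<float>& C_prime) {
    assert(A_prime.size() == static_cast<std::size_t>(mp) * kp);
    assert(B_prime.size() == static_cast<std::size_t>(kp) * np);
    assert(C_prime.size() == static_cast<std::size_t>(mp) * np);
    // Compute C'^T = alpha * B'^T * A'^T + beta * C'^T in column-major terms:
    // A and B swap roles, and so do m and n.
    ref_sgemm(Op::T, Op::T, np, mp, kp, alpha,
              B_prime.data(), kp,   // B' is k' x n' column-major: ld = k'
              A_prime.data(), mp,   // A' is m' x k' column-major: ld = m'
              beta,
              C_prime.data(), np);  // C'^T is n' x m' column-major: ld = n'
}
```

After the call, `C_prime` holds $C^{\prime}$ in row-major order even though the stand-in only ever writes column-major output.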

Conclusions

Suppose we want to perform the matrix multiplication $C^{\prime} = \alpha A^{\prime} B^{\prime} + \beta C^{\prime}$, where $A^{\prime}$, $B^{\prime}$, and $C^{\prime}$ are matrices of shapes $m^{\prime} \times k^{\prime}$, $k^{\prime} \times n^{\prime}$, and $m^{\prime} \times n^{\prime}$, respectively, using the cuBLAS API. The following table summarizes, for each combination of storage formats of $A^{\prime}$, $B^{\prime}$, and $C^{\prime}$, how the cuBLAS GEMM parameters should be set. The columns A, B, and C give the buffers passed as the cuBLAS arguments A, B, and C.

Storage of $A^{\prime}$ ($m^{\prime} \times k^{\prime}$)	Storage of $B^{\prime}$ ($k^{\prime} \times n^{\prime}$)	Storage of $C^{\prime}$ ($m^{\prime} \times n^{\prime}$)	transa	transb	m	n	k	A	B	C
Column Major	Column Major	Column Major	CUBLAS_OP_N	CUBLAS_OP_N	$m^{\prime}$	$n^{\prime}$	$k^{\prime}$	$A^{\prime}$	$B^{\prime}$	$C^{\prime}$
Row Major	Column Major	Column Major	CUBLAS_OP_T	CUBLAS_OP_N	$m^{\prime}$	$n^{\prime}$	$k^{\prime}$	$A^{\prime}$	$B^{\prime}$	$C^{\prime}$
Column Major	Row Major	Column Major	CUBLAS_OP_N	CUBLAS_OP_T	$m^{\prime}$	$n^{\prime}$	$k^{\prime}$	$A^{\prime}$	$B^{\prime}$	$C^{\prime}$
Row Major	Row Major	Column Major	CUBLAS_OP_T	CUBLAS_OP_T	$m^{\prime}$	$n^{\prime}$	$k^{\prime}$	$A^{\prime}$	$B^{\prime}$	$C^{\prime}$
Column Major	Column Major	Row Major	CUBLAS_OP_T	CUBLAS_OP_T	$n^{\prime}$	$m^{\prime}$	$k^{\prime}$	$B^{\prime}$	$A^{\prime}$	$C^{\prime}$
Row Major	Column Major	Row Major	CUBLAS_OP_T	CUBLAS_OP_N	$n^{\prime}$	$m^{\prime}$	$k^{\prime}$	$B^{\prime}$	$A^{\prime}$	$C^{\prime}$
Column Major	Row Major	Row Major	CUBLAS_OP_N	CUBLAS_OP_T	$n^{\prime}$	$m^{\prime}$	$k^{\prime}$	$B^{\prime}$	$A^{\prime}$	$C^{\prime}$
Row Major	Row Major	Row Major	CUBLAS_OP_N	CUBLAS_OP_N	$n^{\prime}$	$m^{\prime}$	$k^{\prime}$	$B^{\prime}$	$A^{\prime}$	$C^{\prime}$
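
The table can also be read as a small decision rule. The following sketch encodes it as a function. `Layout`, `Op`, `GemmArgs`, and `select_gemm_args` are hypothetical names, not part of cuBLAS, and the leading dimensions are not in the table: they are an added assumption that every buffer is densely packed.

```cpp
#include <cassert>

enum class Layout { ColMajor, RowMajor };
enum class Op { N, T };  // hypothetical stand-in for cublasOperation_t

struct GemmArgs {
    Op transa, transb;   // operations applied to the cuBLAS arguments A and B
    int m, n, k;
    bool swap_ab;        // true: pass B' as cuBLAS A and A' as cuBLAS B
    int lda, ldb, ldc;   // leading dimensions, assuming densely packed buffers
};

// Parameters for C' = alpha * A' * B' + beta * C' with A' (m' x k'),
// B' (k' x n') and C' (m' x n') in the given storage formats.
GemmArgs select_gemm_args(Layout a, Layout b, Layout c, int mp, int np, int kp) {
    assert(mp > 0 && np > 0 && kp > 0);
    // Leading dimension of each buffer when read as column-major.
    const int ld_a = (a == Layout::ColMajor) ? mp : kp;
    const int ld_b = (b == Layout::ColMajor) ? kp : np;
    GemmArgs g{};
    if (c == Layout::ColMajor) {
        // Compute C' directly; a row-major input has to be transposed back.
        g.swap_ab = false;
        g.transa = (a == Layout::RowMajor) ? Op::T : Op::N;
        g.transb = (b == Layout::RowMajor) ? Op::T : Op::N;
        g.m = mp; g.n = np; g.k = kp;
        g.lda = ld_a; g.ldb = ld_b; g.ldc = mp;
    } else {
        // Compute C'^T = alpha * B'^T * A'^T + beta * C'^T instead;
        // a column-major input now has to be transposed.
        g.swap_ab = true;
        g.transa = (b == Layout::ColMajor) ? Op::T : Op::N;
        g.transb = (a == Layout::ColMajor) ? Op::T : Op::N;
        g.m = np; g.n = mp; g.k = kp;
        g.lda = ld_b; g.ldb = ld_a; g.ldc = np;
    }
    return g;
}
```

For example, `select_gemm_args(Layout::RowMajor, Layout::RowMajor, Layout::RowMajor, mp, np, kp)` reproduces the last row of the table: no transposes, with the operands and the dimensions m and n swapped.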

References

cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices

https://leimao.github.io/blog/cuBLAS-Transpose-Column-Major-Relationship/

Author

Lei Mao

Posted on

12-12-2024

Updated on

12-12-2024

Accelerated Computing,

CUDA,

cuBLAS
