Use Shared Memory in Templated Kernels in CUDA Programming
Introduction
Using shared memory in CUDA can improve the performance of your program. However, when I tried to use dynamically sized shared memory in templated CUDA kernels, I got confusing errors from the compiler. It turns out that declaring an extern __shared__ array with a template type directly in a templated kernel breaks as soon as the kernel is instantiated for more than one type. After searching for a while, I found the reason behind this restriction and some ways to get around it.
Problem
Let’s say we want to use shared memory in the following templated kernel function.
```cuda
template <typename T>
```
When the kernel is instantiated for more than one type (for example, int and float), compilation fails with an error like the following.
```
$ make
```
Problem Causes
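The root cause is actually simple. To use dynamically sized shared memory, whose size is only specified when the kernel is launched, we have to declare the array with the extern keyword (extern __shared__) in our kernel function. Such a declaration does not create a new local array; it refers to a variable defined outside the current scope, and every extern __shared__ array starts at the beginning of the block's dynamic shared memory. This causes no problem when the kernel function is not templated. However, a templated kernel is often instantiated with several types, and each instantiation then declares the same extern variable with a different type, so the declarations conflict. Therefore, an extern __shared__ array cannot be declared with the template type directly in the kernel function. Statically sized shared memory, such as __shared__ T buffer[256];, does not use extern and is not affected.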
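As a minimal sketch of the difference, assuming a simple in-place array reversal performed by a single thread block (the names reverseFloat and reverseOnDevice are hypothetical, and the array is limited to 1024 elements), the non-templated kernel below declares its extern __shared__ array with a concrete type and compiles without trouble. The dynamic shared memory size is passed as the third launch parameter. The templated version that triggers the conflict appears only as a comment, since it would not compile once instantiated for two types.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Non-templated kernel: a single extern __shared__ float declaration is fine.
// Its size is not known at compile time; it is supplied at launch as the
// third execution-configuration argument (bytes of dynamic shared memory).
__global__ void reverseFloat(float* d, int n) {
  extern __shared__ float s[];
  int t = threadIdx.x;
  if (t < n) s[t] = d[t];
  __syncthreads();  // all writes to s must finish before any thread reads
  if (t < n) d[t] = s[n - 1 - t];
}

// The templated equivalent below is rejected once it is instantiated for two
// different types, e.g. reverse<int> and reverse<float>: both instantiations
// declare the same extern symbol s, once as int[] and once as float[].
//
//   template <typename T>
//   __global__ void reverse(T* d, int n) { extern __shared__ T s[]; /* ... */ }

// Hypothetical host helper: reverses v in place on the GPU with one block.
bool reverseOnDevice(std::vector<float>& v) {
  int n = static_cast<int>(v.size());
  assert(n > 0 && n <= 1024);  // one block, at most 1024 threads
  size_t bytes = v.size() * sizeof(float);
  float* d = nullptr;
  if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
  cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice);
  reverseFloat<<<1, n, bytes>>>(d, n);  // third argument: dynamic shared memory size
  cudaError_t err = cudaGetLastError();  // catches launch-configuration errors
  if (err == cudaSuccess)
    err = cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d);
  return err == cudaSuccess;
}
```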
Solutions
Use CUDPP Header
One solution is to use the SharedMemory struct defined in the open-source CUDPP library. You can simply copy the sharedmem.h file into your source directory and declare shared memory through this struct instead of declaring an extern __shared__ T array directly. Then everything compiles!
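A minimal sketch of the wrapper approach, under a few assumptions: so that the block compiles on its own, it defines a small stand-in for SharedMemory covering only int and float, modeled on the template specialization technique this post describes, rather than including CUDPP's sharedmem.h (the real header supports more types and may differ in details). The kernel and helper names (reverseKernel, reverseOnDevice) are hypothetical, and a single block of at most 1024 threads is assumed.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Stand-in for the CUDPP wrapper (assumption: only int and float covered).
// The primary template is declared but never defined, so using an
// unsupported type fails at compile time.
template <typename T>
struct SharedMemory;

template <>
struct SharedMemory<int> {
  __device__ int* getPointer() {
    extern __shared__ int s_int[];  // distinct name for int
    return s_int;
  }
};

template <>
struct SharedMemory<float> {
  __device__ float* getPointer() {
    extern __shared__ float s_float[];  // distinct name for float
    return s_float;
  }
};

// Templated kernel: reverses n elements of d through dynamic shared memory,
// without ever writing extern __shared__ T itself.
template <typename T>
__global__ void reverseKernel(T* d, int n) {
  SharedMemory<T> smem;
  T* s = smem.getPointer();
  int t = threadIdx.x;
  if (t < n) s[t] = d[t];
  __syncthreads();
  if (t < n) d[t] = s[n - 1 - t];
}

// Hypothetical host helper: reverses v in place on the GPU with one block.
template <typename T>
bool reverseOnDevice(std::vector<T>& v) {
  int n = static_cast<int>(v.size());
  assert(n > 0 && n <= 1024);  // one block, at most 1024 threads
  size_t bytes = v.size() * sizeof(T);
  T* d = nullptr;
  if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
  cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice);
  reverseKernel<T><<<1, n, bytes>>>(d, n);  // bytes of dynamic shared memory
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess)
    err = cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d);
  return err == cudaSuccess;
}
```

Because reverseKernel<int> and reverseKernel<float> obtain their pointers through different specializations, the two instantiations never declare the same extern symbol with different types.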
How does it work? Let us check the source code.
```cuda
template <typename T>
```
The source code shows that each specialization of SharedMemory declares its extern shared array under a different variable name, one per type, so there are no conflicts anymore. This is implemented with C++ template specialization: the struct is explicitly specialized for every supported type, and each specialization can therefore use its own variable name.
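To make the specialization mechanism concrete, here is a sketch that generates one specialization per type with a macro. DEFINE_SHARED_MEMORY, blockSum, and sumOnDevice are hypothetical names, not part of CUDPP. The reduction assumes a single block of 256 threads (a power of two) and at most 256 input elements. Leaving the primary template undefined is a design choice of this sketch, so that an unsupported type is rejected at compile time.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Primary template: declared only, so unsupported types do not compile.
template <typename T>
struct SharedMemory;

// Hypothetical helper macro: stamps out one explicit specialization per type,
// each declaring its extern shared array under a distinct NAME.
#define DEFINE_SHARED_MEMORY(TYPE, NAME) \
  template <>                            \
  struct SharedMemory<TYPE> {            \
    __device__ TYPE* getPointer() {      \
      extern __shared__ TYPE NAME[];     \
      return NAME;                       \
    }                                    \
  };

DEFINE_SHARED_MEMORY(int, s_int)
DEFINE_SHARED_MEMORY(double, s_double)

// Block-wide sum of up to blockDim.x elements; blockDim.x must be a power of two.
template <typename T>
__global__ void blockSum(const T* in, T* out, int n) {
  T* s = SharedMemory<T>().getPointer();
  unsigned int tid = threadIdx.x;
  s[tid] = (static_cast<int>(tid) < n) ? in[tid] : T(0);  // pad with zeros
  __syncthreads();
  // Tree reduction: halve the number of active threads each step.
  for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
    if (tid < stride) s[tid] += s[tid + stride];
    __syncthreads();
  }
  if (tid == 0) *out = s[0];
}

// Hypothetical host helper: sums v on the GPU with one block of 256 threads.
template <typename T>
bool sumOnDevice(const std::vector<T>& v, T& result) {
  const int threads = 256;  // power of two
  int n = static_cast<int>(v.size());
  assert(n > 0 && n <= threads);
  T* d_in = nullptr;
  T* d_out = nullptr;
  if (cudaMalloc(&d_in, v.size() * sizeof(T)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, sizeof(T)) != cudaSuccess) { cudaFree(d_in); return false; }
  cudaMemcpy(d_in, v.data(), v.size() * sizeof(T), cudaMemcpyHostToDevice);
  blockSum<T><<<1, threads, threads * sizeof(T)>>>(d_in, d_out, n);
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess)
    err = cudaMemcpy(&result, d_out, sizeof(T), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return err == cudaSuccess;
}
```

Adding support for another type only takes one more DEFINE_SHARED_MEMORY line with a fresh array name.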
Use Pointer Casting
The other simple solution is to use pointer typecasting.
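This solution declares the extern shared array once, under a single variable name and with a fixed element type, so every instantiation of the kernel sees the same declaration; the pointer is then cast to the template type T inside the kernel.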
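A minimal sketch of the pointer-casting approach, assuming a single-block, in-place array reversal (reverseKernel, reverseOnDevice, and smem_raw are hypothetical names). The raw array is declared as unsigned char, and the __align__(16) qualifier is an assumption added so that the cast pointer is suitably aligned for built-in types up to 16 bytes.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Pointer-casting approach: the extern shared array always has the same
// declared type (unsigned char), so every instantiation of the kernel sees
// an identical declaration and nothing conflicts.
template <typename T>
__global__ void reverseKernel(T* d, int n) {
  // Assumption: 16-byte alignment covers built-in types up to 16 bytes.
  extern __shared__ __align__(16) unsigned char smem_raw[];
  T* s = reinterpret_cast<T*>(smem_raw);  // reinterpret the raw bytes as T
  int t = threadIdx.x;
  if (t < n) s[t] = d[t];
  __syncthreads();
  if (t < n) d[t] = s[n - 1 - t];
}

// Hypothetical host helper: reverses v in place on the GPU with one block.
template <typename T>
bool reverseOnDevice(std::vector<T>& v) {
  int n = static_cast<int>(v.size());
  assert(n > 0 && n <= 1024);  // one block, at most 1024 threads
  size_t bytes = v.size() * sizeof(T);
  T* d = nullptr;
  if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
  cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice);
  reverseKernel<T><<<1, n, bytes>>>(d, n);  // bytes of dynamic shared memory
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess)
    err = cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d);
  return err == cudaSuccess;
}
```

Unlike the SharedMemory wrapper, this needs no per-type specialization, so any element type works; the trade-off is that the declaration no longer carries the element type, so alignment has to be handled explicitly.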
References
Use Shared Memory in Templated Kernels in CUDA Programming
https://leimao.github.io/blog/CUDA-Shared-Memory-Templated-Kernel/