Pass Function Pointers to Kernels in CUDA Programming
Introduction
Ever since I started learning CUDA, my impression of CUDA kernels has been that they are very isolated pieces of code in a program, subject to many restrictions. Because of this, I used to write CUDA kernel functions that duplicated code and did similar jobs. Today, let us take a look at how to use C++ templates and function pointers in CUDA kernels to reduce code duplication.
It should be noted that, to the best of my knowledge, there is no similar tutorial on this. I experimented a lot and have made the final program available to the public.
Tutorial
Code
The following code computes the sum and the product of two values by passing different function pointers to the same CUDA kernel. It also uses C++ templates extensively. The code is also available on my GitHub Gist.
The key to passing function pointers to a CUDA kernel is to define static device-side pointers to the device functions and then copy those pointers to the host side, where they can be passed as kernel arguments. Otherwise, I am sure you will get different kinds of weird errors.
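As a rough illustration of the approach described above, here is a minimal sketch. It is not the original program from the Gist. All names (`binary_op_t`, `add_op`, `mul_op`, `d_add_float`, `d_mul_float`, `apply_kernel`, `apply_on_device`, `CHECK_CUDA`) are hypothetical, and it assumes a single `float` instantiation for brevity. The device-side pointer variables are initialized with the addresses of the device functions, and `cudaMemcpyFromSymbol` copies those addresses to the host so they can be passed to the kernel.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK_CUDA(call)                                                 \
    do {                                                                 \
        cudaError_t const err = (call);                                  \
        if (err != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,  \
                         __LINE__, cudaGetErrorString(err));             \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Signature shared by every binary operation the kernel may call.
template <typename T>
using binary_op_t = T (*)(T, T);

template <typename T>
__device__ T add_op(T x, T y) { return x + y; }

template <typename T>
__device__ T mul_op(T x, T y) { return x * y; }

// Static device-side variables that hold the device addresses of the
// functions above (assumption: only the float versions are needed here).
static __device__ binary_op_t<float> d_add_float = add_op<float>;
static __device__ binary_op_t<float> d_mul_float = mul_op<float>;

// One kernel serves every operation: the operation arrives as an argument.
template <typename T>
__global__ void apply_kernel(binary_op_t<T> op, T x, T y, T* out)
{
    *out = op(x, y);
}

// Launch the kernel with a function pointer previously copied from the device.
template <typename T>
T apply_on_device(binary_op_t<T> h_op, T x, T y)
{
    T* d_out = nullptr;
    CHECK_CUDA(cudaMalloc(&d_out, sizeof(T)));
    apply_kernel<T><<<1, 1>>>(h_op, x, y, d_out);
    CHECK_CUDA(cudaGetLastError());
    T h_out{};
    // cudaMemcpy on the default stream waits for the kernel to finish.
    CHECK_CUDA(cudaMemcpy(&h_out, d_out, sizeof(T), cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(d_out));
    return h_out;
}

int main()
{
    // Host-side copies of the device function addresses.
    binary_op_t<float> h_add = nullptr;
    binary_op_t<float> h_mul = nullptr;
    CHECK_CUDA(cudaMemcpyFromSymbol(&h_add, d_add_float, sizeof(h_add)));
    CHECK_CUDA(cudaMemcpyFromSymbol(&h_mul, d_mul_float, sizeof(h_mul)));

    float const x = 2.0f;
    float const y = 3.0f;
    std::printf("sum: %f\n", apply_on_device<float>(h_add, x, y));
    std::printf("product: %f\n", apply_on_device<float>(h_mul, x, y));
    return 0;
}
```

The host never calls `h_add` or `h_mul` itself. It only carries the addresses from the device symbols to the kernel launch, which is why the sketch goes through `cudaMemcpyFromSymbol` instead of writing `&add_op<float>` in host code.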
Compile
To compile the program, use nvcc.
$ nvcc main.cu -o main
Run
If the program compiles successfully, you should be able to see the following message when you run the program.
$ ./main
References
Pass Function Pointers to Kernels in CUDA Programming
https://leimao.github.io/blog/Pass-Function-Pointers-to-Kernels-CUDA/