CUDA Shared Memory Capacity 07-04-2022 07-04-2022 blog 12 minutes read (About 1863 words) Use Large Shared Memory for CUDA Kernel Optimization CUDA
CUDA Occupancy Calculation 06-25-2022 06-25-2022 blog 4 minutes read (About 567 words) Ensuring High CUDA Occupancy for Performance CUDA
CUDA Shared Memory Bank 06-22-2022 06-22-2022 blog 9 minutes read (About 1349 words) Avoiding CUDA Shared Memory Bank Conflicts CUDA
CUDA Kernel Execution Overlap 06-10-2022 06-10-2022 blog 7 minutes read (About 1038 words) CUDA Computation Resources, CUDA Implicit Synchronization, and CUDA Kernel Execution CUDA
Nsight Systems in Docker 06-01-2022 06-01-2022 blog 4 minutes read (About 658 words) Portable Nsight Systems Docker, CUDA
Proper CUDA Error Checking 05-25-2022 05-25-2022 blog 7 minutes read (About 1079 words) Best Practice for CUDA Error Checking CUDA
CUDA Compilation Architecture Macro 05-01-2022 05-01-2022 blog 10 minutes read (About 1439 words) Compilation Control Flow for Different GPU Architectures GPU, CUDA
CUDA Compilation 04-28-2022 04-28-2022 blog 6 minutes read (About 846 words) GPU Compilation and Compatibility GPU, CUDA
Function Binding and Performance Measurement 04-07-2022 05-12-2022 blog 6 minutes read (About 938 words) Creating Helper Functions for Performance Measurement in C++, CUDA and Python CPP, Python, CUDA
CUDA Matrix Multiplication 03-21-2022 03-28-2022 blog 32 minutes read (About 4801 words) Implement Matrix Multiplication and Batched Matrix Multiplication Using CUDA CPP, CUDA
PyTorch Benchmark 12-13-2021 12-13-2021 blog 9 minutes read (About 1289 words) Equivalence of the Exponential Function Definitions CUDA, PyTorch
Multi-Thread Single-Stream VS Single-Thread Multi-Stream CUDA 10-18-2021 05-12-2022 blog 13 minutes read (About 1944 words) CUDA Programming Choices for CUDA Stream Deep Learning, Mathematics, CUDA, High Performance Computing, Computer Architecture, Parallel Computing
Page-Locked Host Memory for Data Transfer 06-26-2021 06-26-2021 blog 6 minutes read (About 966 words) Faster Data Transfer Between Host and CUDA Device CUDA, Operating System, CUDA Programming
CUDA Driver VS CUDA Runtime 10-01-2020 10-30-2020 blog 4 minutes read (About 593 words) libcuda.so VS libcudart.so Software Engineering, CUDA
CUDA Stream 02-02-2020 06-12-2022 blog 8 minutes read (About 1261 words) Understand CUDA Stream Based Concurrency from High Level CUDA
Use Shared Memory in Templated Kernels in CUDA Programming 05-04-2019 05-04-2019 blog 5 minutes read (About 702 words) A Trick to Work Around CPP, CUDA, C
Pass Function Pointers to Kernels in CUDA Programming 04-28-2019 04-28-2019 blog 4 minutes read (About 547 words) Some Alchemy in CUDA Programming CPP, CUDA, C
CUDA Block and Grid 03-12-2019 03-12-2019 blog 4 minutes read (About 588 words) Understand the Concept of Block and Grid in CUDA Parallel Computing CUDA