CuTe Matrix Transpose 11-20-2024 11-20-2024 article an hour read (About 8775 words) Matrix Transpose CUDA Kernel Implementation Using CuTe Mathematics, CUDA, Accelerated Computing, CUTLASS, CuTe
Build and Develop CUTLASS CUDA Kernels 11-12-2024 11-17-2024 blog 7 minutes read (About 1029 words) Employing CUTLASS for Accelerated Computing Docker, CUDA, CMake, Accelerated Computing, CUTLASS
CuTe Layout Algebra 10-20-2024 10-20-2024 article 2 hours read (About 16932 words) Mathematical Fundamentals to CUTLASS Computing Mathematics, CUDA, Accelerated Computing, CUTLASS, CuTe, Category Theory
PyTorch Eager Mode Quantization TensorRT Acceleration 05-24-2024 05-24-2024 blog 7 minutes read (About 1051 words) TensorRT Acceleration for PyTorch Native Eager Mode Quantization Models Deep Learning, Python, Inference, TensorRT, PyTorch, Quantization, NVIDIA, Accelerated Computing, GPU
CUDA Matrix Multiplication Optimization 01-20-2024 01-20-2024 article 2 hours read (About 19282 words) General Matrix Multiplication CUDA Performance Optimization CPP, CUDA, NVIDIA, Accelerated Computing
CUDA Tensor Layouts for Convolution 06-04-2023 06-04-2023 blog 13 minutes read (About 1960 words) Motivations for Different Tensor Layouts CUDA, Accelerated Computing
NVIDIA Tensor Core Programming 05-18-2023 12-27-2023 blog 28 minutes read (About 4243 words) Fast Matrix Multiplication and Accumulation on GPU CPP, CUDA, NVIDIA, Accelerated Computing
Moore's Law 04-10-2023 04-10-2023 blog 7 minutes read (About 1085 words) Moore's Law Is Dead. What's Next? Accelerated Computing, GPU, CPU
Transformer Autoregressive Inference Optimization 04-06-2023 04-06-2023 article 27 minutes read (About 4084 words) Principles for Faster Transformer Inference Deep Learning, Inference, Natural Language Processing, Transformer, Optimization, Accelerated Computing
Strassen Algorithm 01-13-2023 01-13-2023 blog 7 minutes read (About 1016 words) Asymptotically Faster Matrix Multiplication Algorithm Algorithm, Accelerated Computing, Computer Science