Build and Develop CUTLASS CUDA Kernels 11-12-2024 11-17-2024 blog 7 minutes read (About 1029 words) Employing CUTLASS for Accelerated Computing Accelerated Computing, CUDA, CUTLASS, Docker, CMake
CuTe Layout Algebra 10-20-2024 07-14-2025 article 2 hours read (About 19874 words) Mathematical Fundamentals to CUTLASS Computing Mathematics, Accelerated Computing, CUDA, CUTLASS, CuTe, Category Theory
PyTorch Eager Mode Quantization TensorRT Acceleration 05-24-2024 05-24-2024 blog 7 minutes read (About 1051 words) TensorRT Acceleration for PyTorch Native Eager Mode Quantization Models Deep Learning, Python, Inference, Quantization, Accelerated Computing, NVIDIA, TensorRT, PyTorch, GPU
CUDA Matrix Multiplication Optimization 01-20-2024 01-20-2024 article 2 hours read (About 19282 words) General Matrix Multiplication CUDA Performance Optimization CPP, Accelerated Computing, CUDA, NVIDIA
CUDA Tensor Layouts for Convolution 06-04-2023 06-04-2023 blog 13 minutes read (About 1960 words) Motivations for Different Tensor Layouts Accelerated Computing, CUDA
NVIDIA Tensor Core Programming 05-18-2023 12-27-2023 blog 28 minutes read (About 4243 words) Fast Matrix Multiplication and Accumulation on GPU CPP, Accelerated Computing, CUDA, NVIDIA
Moore's Law 04-10-2023 04-10-2023 blog 7 minutes read (About 1085 words) Moore's Law Is Dead. What's Next? Accelerated Computing, GPU, CPU
Transformer Autoregressive Inference Optimization 04-06-2023 04-06-2023 article 27 minutes read (About 4084 words) Principles for Faster Transformer Inference Deep Learning, Inference, Natural Language Processing, Optimization, Transformer, Accelerated Computing
Strassen Algorithm 01-13-2023 01-13-2023 blog 7 minutes read (About 1016 words) Asymptotically Faster Matrix Multiplication Algorithm Computer Science, Accelerated Computing, Algorithm
CSR Sparse Matrix Multiplication 12-21-2022 12-21-2022 blog 13 minutes read (About 1886 words) Accelerate Sparse Matrix Multiplication Using CSR Format Accelerated Computing
CUDA Matrix Multiplication 03-21-2022 03-04-2023 blog 32 minutes read (About 4792 words) Implement Matrix Multiplication and Batched Matrix Multiplication Using CUDA CPP, Accelerated Computing, CUDA