Lei Mao's Log Book
  • Tags
  • Accelerated Computing

CuTe Tiled MMA

 01-09-2025 10-19-2025 blog 30 minutes read (About 4482 words)
Understanding CuTe Tiled MMA Using an Example

 
CUTLASS, CUDA, CuTe, Accelerated Computing

AWQ: Activation-Aware Weight Quantization

 01-01-2025 01-01-2025 blog 18 minutes read (About 2738 words)
Same Performance as Group-Wise Weight-Only Quantization But with Better Accuracy

 
Deep Learning, Mathematics, Quantization, CUDA, Accelerated Computing

cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices

 12-12-2024 12-12-2024 blog 7 minutes read (About 1012 words)
Calling cuBLAS GEMM API Correctly

 
CUDA, Accelerated Computing, cuBLAS

CuTe Swizzle

 12-01-2024 10-01-2025 blog 19 minutes read (About 2909 words)
CuTe Shared Memory Swizzling Abstractions

 
Mathematics, CUTLASS, CUDA, CuTe, Accelerated Computing

CuTe Matrix Transpose

 11-20-2024 09-30-2025 article an hour read (About 10892 words)
Matrix Transpose CUDA Kernel Implementation Using CuTe

 
Mathematics, CUTLASS, CUDA, CuTe, Accelerated Computing

Build and Develop CUTLASS CUDA Kernels

 11-12-2024 11-17-2024 blog 7 minutes read (About 1029 words)
Employing CUTLASS for Accelerated Computing

 
CUTLASS, CUDA, Accelerated Computing, Docker, CMake

CuTe Layout Algebra

 10-20-2024 07-14-2025 article 2 hours read (About 19874 words)
Mathematical Fundamentals to CUTLASS Computing

 
Mathematics, CUTLASS, CUDA, CuTe, Accelerated Computing, Category Theory

PyTorch Eager Mode Quantization TensorRT Acceleration

 05-24-2024 05-24-2024 blog 7 minutes read (About 1051 words)
TensorRT Acceleration for PyTorch Native Eager Mode Quantization Models

 
Deep Learning, Python, Inference, Quantization, Accelerated Computing, NVIDIA, TensorRT, PyTorch, GPU

CUDA Matrix Multiplication Optimization

 01-20-2024 01-20-2024 article 2 hours read (About 19282 words)
General Matrix Multiplication CUDA Performance Optimization

 
CPP, CUDA, Accelerated Computing, NVIDIA

CUDA Tensor Layouts for Convolution

 06-04-2023 06-04-2023 blog 13 minutes read (About 1960 words)
Motivations for Different Tensor Layouts

 
CUDA, Accelerated Computing

NVIDIA Tensor Core Programming

 05-18-2023 12-27-2023 blog 28 minutes read (About 4243 words)
Fast Matrix Multiplication and Accumulation on GPU

 
CPP, CUDA, Accelerated Computing, NVIDIA

Moore's Law

 04-10-2023 04-10-2023 blog 7 minutes read (About 1085 words)
Moore's Law Is Dead. What's Next?

 
Accelerated Computing, GPU, CPU
Lei Mao

© 2017-2025 Lei Mao  Powered by Hexo & Icarus