Lei Mao's Log Book
  • Tags
  • Accelerated Computing

CuTe ldmatrix

 10-03-2025 10-03-2025 blog 22 minutes read (About 3357 words)
CUDA PTX ldmatrix Instruction and Its CuTe Wrapper

 
Mathematics, Accelerated Computing, CUDA, CUTLASS, CuTe

CuTe Tilers

 09-15-2025 09-15-2025 blog 10 minutes read (About 1524 words)
Designing Tilers for Data Access

 
Mathematics, Accelerated Computing, CUDA, CUTLASS, CuTe

CuTe Inverse Layout

 08-13-2025 08-13-2025 blog 9 minutes read (About 1390 words)
Deriving Inverse Layout Mathematically

 
Mathematics, Accelerated Computing, CUDA, CUTLASS, CuTe

CuTe Blocked and Raked Products

 08-07-2025 08-07-2025 blog 9 minutes read (About 1283 words)
Creating Tiled Layouts Using Blocked Product and Raked Product

 
Mathematics, Accelerated Computing, CUDA, CUTLASS, CuTe

CuTe Local Tile

 08-01-2025 08-01-2025 blog 6 minutes read (About 865 words)
Elucidating CuTe Inner Partition and Local Tile

 
Mathematics, Accelerated Computing, CUDA, CUTLASS, CuTe

CuTe Local Partition

 07-25-2025 08-01-2025 blog 15 minutes read (About 2291 words)
Elucidating CuTe Outer Partition and Local Partition

 
Mathematics, Accelerated Computing, CUDA, CUTLASS, CuTe

CuTe Index To Coordinate

 07-19-2025 07-19-2025 blog 14 minutes read (About 2040 words)
Inverse Layout Function

 
Mathematics, Accelerated Computing, CUDA, CUTLASS, CuTe

Online Safe Softmax

 06-23-2025 06-23-2025 blog 5 minutes read (About 741 words)
Safe and Efficient Online Softmax Calculation

 
Deep Learning, Mathematics, Accelerated Computing

Roofline Performance Model

 03-26-2025 03-26-2025 blog 7 minutes read (About 1078 words)
Understand the Performance Limitations and Gaps

 
Accelerated Computing, High Performance Computing, Computer Architecture, Performance

CuTe Tiled MMA

 01-09-2025 10-03-2025 blog 30 minutes read (About 4482 words)
Understanding CuTe Tiled MMA Using an Example

 
Accelerated Computing, CUDA, CUTLASS, CuTe

AWQ: Activation-Aware Weight Quantization

 01-01-2025 01-01-2025 blog 18 minutes read (About 2738 words)
Same Performance as Group-Wise Weight-Only Quantization But with Better Accuracy

 
Deep Learning, Mathematics, Quantization, Accelerated Computing, CUDA

cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices

 12-12-2024 12-12-2024 blog 7 minutes read (About 1012 words)
Calling cuBLAS GEMM API Correctly

 
Accelerated Computing, CUDA, cuBLAS
© 2017-2025 Lei Mao  Powered by Hexo & Icarus