Lei Mao's Log Book
  • Tags
  • CUDA

CUDA Performance Hot VS Cold Measurement

 03-12-2025 03-12-2025 blog 8 minutes read (About 1200 words)
Flushing GPU L2 Cache

 
CPP, CUDA, NVIDIA, GPU, Nsight Compute

CuTe Tiled MMA

 01-09-2025 10-19-2025 blog 30 minutes read (About 4482 words)
Understanding CuTe Tiled MMA Using an Example

 
CUDA, Accelerated Computing, CUTLASS, CuTe

NVIDIA GPU Compute Capability

 01-02-2025 01-22-2026 blog 15 minutes read (About 2202 words)
A Table of NVIDIA GPUs and Their Compute Capabilities

 
CUDA, NVIDIA, GPU

AWQ: Activation-Aware Weight Quantization

 01-01-2025 01-01-2025 blog 18 minutes read (About 2738 words)
Same Performance as Group-Wise Weight-Only Quantization But with Better Accuracy

 
Deep Learning, Mathematics, Quantization, CUDA, Accelerated Computing

cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices

 12-12-2024 12-12-2024 blog 7 minutes read (About 1012 words)
Calling cuBLAS GEMM API Correctly

 
CUDA, Accelerated Computing, cuBLAS

SMPlayer GPU Acceleration

 12-06-2024 12-07-2024 blog 2 minutes read (About 328 words)
Playing Videos with GPU Acceleration in SMPlayer

 
CUDA, Linux, GPU, SMPlayer

CuTe Swizzle

 12-01-2024 10-01-2025 blog 19 minutes read (About 2909 words)
CuTe Shared Memory Swizzling Abstractions

 
Mathematics, CUDA, Accelerated Computing, CUTLASS, CuTe

CuTe Matrix Transpose

 11-20-2024 09-30-2025 article an hour read (About 10892 words)
Matrix Transpose CUDA Kernel Implementation Using CuTe

 
Mathematics, CUDA, Accelerated Computing, CUTLASS, CuTe

Build and Develop CUTLASS CUDA Kernels

 11-12-2024 11-17-2024 blog 7 minutes read (About 1029 words)
Employing CUTLASS for Accelerated Computing

 
CUDA, Accelerated Computing, CUTLASS, Docker, CMake

CuTe Layout Algebra

 10-20-2024 07-14-2025 article 2 hours read (About 19874 words)
Mathematical Fundamentals to CUTLASS Computing

 
Mathematics, CUDA, Accelerated Computing, CUTLASS, CuTe, Category Theory

CUDA Cooperative Groups

 08-06-2024 08-06-2024 blog 20 minutes read (About 3073 words)
CUDA Reduction Using Cooperative Groups As An Example

 
CPP, CUDA, NVIDIA

CUDA Reduction

 07-30-2024 07-30-2024 blog 15 minutes read (About 2214 words)
Parallel Reduction CUDA Implementations

 
CPP, CUDA, NVIDIA
© 2017-2026 Lei Mao  Powered by Hexo & Icarus