Lei Mao's Log Book

Perfetto GPU Flow Artifacts

02-20-2026 blog 6 minutes read (About 952 words)
Understanding and Resolving Flow Artifacts in Perfetto GPU Profiling Traces

 
GPU, 
Perfetto  

CUDA Shared Memory Bank Conflict-Free Vectorized Access

02-13-2026 blog 14 minutes read (About 2060 words)
Instruction-Level Phase-Based Bank Conflict-Free Execution

 
CUDA, 
NVIDIA, 
Parallel Computing, 
GPU  

C++ Latch and Barrier

02-06-2026 blog 8 minutes read (About 1154 words)
Scheduling and Synchronizing Threads Using std::latch and std::barrier

 
CPP, 
Multithreading, 
Parallel Programming  

CUDA Rendezvous Stream

01-26-2026 blog 11 minutes read (About 1690 words)
Simplifying Synchronization Complexities Using CUDA Rendezvous Streams

 
CUDA, 
NVIDIA, 
Parallel Computing, 
GPU  

Randomized SVD

01-19-2026 blog 12 minutes read (About 1749 words)
Efficient Approximation of Singular Value Decomposition Using Random Projections

 
Linear Algebra, 
SVD, 
Randomized SVD  

PyTorch CUDA Graph Capture

01-12-2026 blog 23 minutes read (About 3454 words)
Using PyTorch CUDA Graph APIs

 
CUDA, 
PyTorch, 
CUDA Graph, 
Perfetto  

Disqus Affiliate Links URL Hijacking

01-06-2026 blog 3 minutes read (About 407 words)
URL Hijacking Caused by a Third-Party Service

 
Disqus, 
Web Security  

Inspecting and Visualizing Torch FX Graph

12-31-2025 blog 13 minutes read (About 1882 words)
Torch FxGraphDrawer

 
Python, 
PyTorch, 
Torch FX  

NVIDIA NVML GPU Statistics

12-25-2025 blog 15 minutes read (About 2214 words)
Mimicking nvidia-smi dmon Using NVIDIA NVML

 
CPP, 
CUDA, 
NVIDIA, 
GPU, 
NVML  

Radix Sort

12-18-2025 blog 19 minutes read (About 2808 words)
A Non-Comparative Sorting Algorithm

 
CPP, 
Python, 
Algorithm  

Install NVIDIA RTX 5080

12-10-2025 blog 5 minutes read (About 703 words)
Installing NVIDIA RTX 5080 on an Old Desktop

 
NVIDIA, 
Ubuntu, 
GPU  

NVIDIA Tensor Core TN Layout MMA Instruction

12-06-2025 blog 16 minutes read (About 2389 words)
GEMM Layout, History, Performance, and Implementation

 
CPP, 
CUDA, 
NVIDIA, 
CUTLASS, 
CuTe, 
MMA, 
GEMM, 
Tensor Core  
Lei Mao

Artificial Intelligence, Machine Learning, Computer Science

Menlo Park, California

Posts: 1354

Categories: 8

Tags: 812

Categories

  • article (21)
  • blog (572)
  • essay (344)
  • life (314)
  • miscellaneous (2)
  • photography (73)
  • project (20)
  • reading (8)

Recents

05-10-2026

PyTorch Custom Operation

blog

05-10-2026

Burger King The Mandalorian and Grogu Combo Meal

essay

05-09-2026

Visiting Tilden Regional Parks Botanic Garden

life

05-09-2026

Tilden Regional Park

photography

05-09-2026

Tilden Regional Parks Botanic Garden

photography

Archives

  • May 2026 (9)
  • April 2026 (18)
  • March 2026 (18)
  • February 2026 (17)
  • January 2026 (16)

Tags

Outdoors (319)
California (250)
Hiking (241)
CPP (122)
Mathematics (102)
Deep Learning (87)
Photography (87)
CUDA (75)
Running (71)
Wildlife (64)
Bird (58)
Racing (47)
Movie (38)
Python (37)
Software Engineering (36)
Machine Learning (35)
China (32)
Linux (32)
NVIDIA (32)
Statistics (32)
© 2017-2026 Lei Mao  Powered by Hexo & Icarus