Lei Mao's Log Book
  • Tags
  • CUDA

How Is FARS, The Fully Automated Research System?

04-22-2026 blog 5 minutes read (About 740 words)
The AI Just Tried To Fool People

Artificial Intelligence, CUDA, Research, CUDA Graphs, Mixture-of-Experts

Page Table for Page-Locked Host Memory

04-12-2026 blog 17 minutes read (About 2541 words)
Page Table GPU Memory Overhead and Sharing Page-Locked Host Memory Across Processes

CUDA, Computer Architecture, NVIDIA, GPU, Memory Management

CUDA_LAUNCH_BLOCKING=1

03-20-2026 blog 4 minutes read (About 599 words)
Debugging CUDA Applications

CUDA, Debug

CUDA Shared Memory Bank Conflict-Free Vectorized Access

02-13-2026 blog 14 minutes read (About 2060 words)
Instruction-Level Phase Based Bank Conflict-Free Execution

CUDA, NVIDIA, Parallel Computing, GPU

CUDA Rendezvous Stream

01-26-2026 blog 11 minutes read (About 1690 words)
Simplifying Synchronization Complexities Using CUDA Rendezvous Streams

CUDA, NVIDIA, Parallel Computing, GPU

PyTorch CUDA Graph Capture

01-12-2026 blog 23 minutes read (About 3454 words)
Using PyTorch CUDA Graph APIs

CUDA, PyTorch, CUDA Graph, Perfetto

NVIDIA NVML GPU Statistics

12-25-2025 blog 15 minutes read (About 2214 words)
Mimicking nvidia-smi dmon Using NVIDIA NVML

CPP, CUDA, NVIDIA, GPU, NVML

NVIDIA Tensor Core TN Layout MMA Instruction

12-06-2025 blog 16 minutes read (About 2389 words)
GEMM Layout, History, Performance, and Implementation

CPP, CUDA, NVIDIA, CUTLASS, CuTe, MMA, GEMM, Tensor Core

Benchmarking NVIDIA Tensor Core MMA Instruction Peak Performances

11-26-2025 blog 11 minutes read (About 1646 words)
Reproducing NVIDIA Advertised GPU AI Peak Performances Using CUTLASS and CuTe

CPP, CUDA, NVIDIA, CUTLASS, CuTe, MMA, GEMM, Tensor Core

Nsight Streamer

11-04-2025 blog 3 minutes read (About 515 words)
Nsight Systems and Nsight Compute GUIs In a Web Browser

CUDA, NVIDIA, Nsight Compute, Nsight Systems, Nsight Streamer

CuTe Arithmetic Tuple Tensor

10-20-2025 blog 16 minutes read (About 2388 words)
The Tensor Coordinate Generator In CuTe

Mathematics, CUDA, Accelerated Computing, CUTLASS, CuTe

CuTe Tiled Copy

10-16-2025 blog 28 minutes read (About 4216 words)
Understanding CuTe Tiled Copy

Mathematics, CUDA, Accelerated Computing, CUTLASS, CuTe
© 2017-2026 Lei Mao  Powered by Hexo & Icarus