Lei Mao's Log Book
  • Tags
  • CPP

PyTorch Custom Operation

 05-10-2026 05-10-2026 blog 23 minutes read (About 3501 words)
Implementing PyTorch Custom Operations In C++ and CUDA Using torch.library

 
CPP, Python, CUDA, PyTorch, AOTInductor

PyTorch Export

 03-31-2026 04-01-2026 blog 6 minutes read (About 857 words)
Exporting Graph-Representable PyTorch Models for Inference

 
CPP, Inference, PyTorch

C++ Latch and Barrier

 02-06-2026 02-06-2026 blog 8 minutes read (About 1154 words)
Scheduling and Synchronizing Threads Using std::latch and std::barrier

 
CPP, Multithreading, Parallel Programming

NVIDIA NVML GPU Statistics

 12-25-2025 12-25-2025 blog 15 minutes read (About 2214 words)
Mimicking nvidia-smi dmon Using NVIDIA NVML

 
CPP, CUDA, NVIDIA, GPU, NVML

Radix Sort

 12-18-2025 12-18-2025 blog 19 minutes read (About 2808 words)
A Non-Comparative Sorting Algorithm

 
CPP, Python, Algorithm

NVIDIA Tensor Core TN Layout MMA Instruction

 12-06-2025 12-06-2025 blog 16 minutes read (About 2389 words)
GEMM Layout, History, Performance, and Implementation

 
CPP, CUDA, NVIDIA, CUTLASS, CuTe, MMA, GEMM, Tensor Core

Benchmarking NVIDIA Tensor Core MMA Instruction Peak Performances

 11-26-2025 11-26-2025 blog 11 minutes read (About 1646 words)
Reproducing NVIDIA Advertised GPU AI Peak Performances Using CUTLASS and CuTe

 
CPP, CUDA, NVIDIA, CUTLASS, CuTe, MMA, GEMM, Tensor Core

Core Dump and GDB

 11-15-2025 11-15-2025 blog 7 minutes read (About 1029 words)
Analyzing Core Dump Files Using GDB

 
CPP, GDB, Core Dump

AddressSanitizer

 09-27-2025 09-27-2025 blog 21 minutes read (About 3161 words)
Compile-Time Instrumentation for Detecting Memory Errors

 
CPP, CMake, GCC, Memory Error

Illegal Memory Access and Segmentation Fault

 08-27-2025 08-27-2025 blog 9 minutes read (About 1381 words)
Memory Access Boundary Checking

 
CPP, Operating System, Memory Management, Memory Safety

Floating Point Constant Values In C++, CUDA, and Python

 08-22-2025 08-22-2025 blog 6 minutes read (About 889 words)
Essential Constants for Numerical Algorithms and Scientific Computations

 
CPP, Python, CUDA

Load CUDA Kernel at Runtime Using CUDA Driver APIs

 06-30-2025 06-30-2025 blog an hour read (About 11131 words)
Dynamically Loading CUDA Kernels

 
CPP, CUDA
© 2017-2026 Lei Mao