Lei Mao's Log Book
  • Tags
  • Accelerated Computing

Build and Develop CUTLASS CUDA Kernels

 11-12-2024 11-17-2024 blog 7 minutes read (About 1029 words)
Employing CUTLASS for Accelerated Computing

 
Accelerated Computing, CUDA, CUTLASS, Docker, CMake

CuTe Layout Algebra

 10-20-2024 07-14-2025 article 2 hours read (About 19874 words)
Mathematical Fundamentals to CUTLASS Computing

 
Mathematics, Accelerated Computing, CUDA, CUTLASS, CuTe, Category Theory

PyTorch Eager Mode Quantization TensorRT Acceleration

 05-24-2024 05-24-2024 blog 7 minutes read (About 1051 words)
TensorRT Acceleration for PyTorch Native Eager Mode Quantization Models

 
Deep Learning, Python, Inference, Quantization, Accelerated Computing, NVIDIA, TensorRT, PyTorch, GPU

CUDA Matrix Multiplication Optimization

 01-20-2024 01-20-2024 article 2 hours read (About 19282 words)
General Matrix Multiplication CUDA Performance Optimization

 
CPP, Accelerated Computing, CUDA, NVIDIA

CUDA Tensor Layouts for Convolution

 06-04-2023 06-04-2023 blog 13 minutes read (About 1960 words)
Motivations for Different Tensor Layouts

 
Accelerated Computing, CUDA

NVIDIA Tensor Core Programming

 05-18-2023 12-27-2023 blog 28 minutes read (About 4243 words)
Fast Matrix Multiplication and Accumulation on GPU

 
CPP, Accelerated Computing, CUDA, NVIDIA

Moore's Law

 04-10-2023 04-10-2023 blog 7 minutes read (About 1085 words)
Moore's Law Is Dead. What's Next?

 
Accelerated Computing, GPU, CPU

Transformer Autoregressive Inference Optimization

 04-06-2023 04-06-2023 article 27 minutes read (About 4084 words)
Principles for Faster Transformer Inference

 
Deep Learning, Inference, Natural Language Processing, Optimization, Transformer, Accelerated Computing

Strassen Algorithm

 01-13-2023 01-13-2023 blog 7 minutes read (About 1016 words)
Asymptotically Faster Matrix Multiplication Algorithm

 
Computer Science, Accelerated Computing, Algorithm

CSR Sparse Matrix Multiplication

 12-21-2022 12-21-2022 blog 13 minutes read (About 1886 words)
Accelerate Sparse Matrix Multiplication Using CSR Format

 
Accelerated Computing  

CUDA Matrix Multiplication

 03-21-2022 03-04-2023 blog 32 minutes read (About 4792 words)
Implement Matrix Multiplication and Batched Matrix Multiplication Using CUDA

 
CPP, Accelerated Computing, CUDA
© 2017-2025 Lei Mao  Powered by Hexo & Icarus