Grouped Query Attention Performance Theoretical Analysis 02-03-2025 03-02-2025 blog 11 minutes read (About 1612 words) Sharing Key and Value Tensors for a Group of Query Tensors to Mitigate Transformer Attention Layer Performance Bottleneck Deep Learning, Neural Network, Transformer, Computer Architecture, Performance Optimization, Large Language Model
Fix NVIDIA Driver After Ubuntu Unattended Upgrade 01-30-2025 01-30-2025 blog 2 minutes read (About 303 words) A Quick and Safe Log for Fixing NVIDIA Driver NVIDIA, Ubuntu, Driver
Transformer Vanilla Attention Performance Theoretical Analysis 01-27-2025 03-02-2025 blog 9 minutes read (About 1275 words) Performance Bottleneck for Serving Transformer Models Deep Learning, Neural Network, Transformer, Computer Architecture, Performance Optimization, Large Language Model
iPad Battery Health 01-22-2025 01-22-2025 blog 3 minutes read (About 498 words) Check iPad Battery Remaining Capacity Apple, iPad
CS2 Mouse Fix 01-14-2025 01-14-2025 blog 4 minutes read (About 599 words) Making the Mouse Work Again in Counter-Strike 2 Game, Counter-Strike, CS, CS2, Mouse, Monitor, Steam
CuTe Tiled MMA 01-09-2025 10-19-2025 blog 30 minutes read (About 4482 words) Understanding CuTe Tiled MMA Using an Example Accelerated Computing, CUDA, CUTLASS, CuTe
NVIDIA GPU Compute Capability 01-02-2025 03-21-2025 blog 15 minutes read (About 2230 words) A Table of NVIDIA GPUs and Their Compute Capabilities CUDA, NVIDIA, GPU
AWQ: Activation-Aware Weight Quantization 01-01-2025 01-01-2025 blog 18 minutes read (About 2738 words) Same Performance as Group-Wise Weight-Only Quantization But with Better Accuracy Deep Learning, Mathematics, Quantization, Accelerated Computing, CUDA
NeurIPS 2024 Area Chair Experience 12-26-2024 12-26-2024 blog 9 minutes read (About 1389 words) First Time Serving as NeurIPS Area Chair Deep Learning, NeurIPS, Conference
C++ Compile-Time Type Map 12-22-2024 12-22-2025 blog 6 minutes read (About 921 words) C++ Select Types Based On Template Types CPP, CPP17, Metaprogramming
Ubuntu 24.04 LTS GUI File Operation Slowness 12-14-2024 12-14-2024 blog a minute read (About 217 words) Ubuntu 24.04 LTS GUI Severe Issue Workaround Ubuntu
cuBLAS GEMM API Usages for Column-Major and Row-Major Matrices 12-12-2024 12-12-2024 blog 7 minutes read (About 1012 words) Calling cuBLAS GEMM API Correctly Accelerated Computing, CUDA, cuBLAS