TensorRT Static Plugin VS Dynamic Plugin 06-05-2025 06-05-2025 blog 13 minutes read (About 1941 words) Managing The Lifetime and Registration of TensorRT Plugins Deep Learning, CPP, TensorRT
Computing Hessian Matrix Via Automatic Differentiation 05-22-2025 05-22-2025 blog 18 minutes read (About 2664 words) Computing Higher-Order Derivatives Using Automatic Differentiation Deep Learning, Mathematics, Neural Networks, Calculus, Automatic Differentiation
Optimal Brain Surgeon 05-14-2025 05-14-2025 blog 40 minutes read (About 6047 words) Derivation and Extension of The Classical Optimal Brain Surgeon Algorithm Deep Learning, Mathematics, Neural Networks, Calculus, Neural Network Pruning
ICML 2025 Area Chair Experience 05-03-2025 05-03-2025 blog 8 minutes read (About 1181 words) First Time Serving as ICML Area Chair Machine Learning, Deep Learning, ICML, Conference
TensorRT Implicit Weight Quantization 04-29-2025 04-29-2025 blog 8 minutes read (About 1265 words) TensorRT Implicit Weight Quantization Caveats and Tricks Deep Learning, Mathematics, Quantization, TensorRT
Automatic Differentiation Revisited 04-12-2025 04-12-2025 blog 15 minutes read (About 2198 words) Jacobian Matrix, De Novo Chain Rule Expression, Jacobian-Vector Product, Vector-Jacobian Product Deep Learning, Mathematics, Linear Algebra, Calculus, Automatic Differentiation
Grouped Query Attention Performance Theoretical Analysis 02-03-2025 03-02-2025 blog 11 minutes read (About 1612 words) Sharing Key and Value Tensors for a Group of Query Tensors to Mitigate Transformer Attention Layer Performance Bottleneck Deep Learning, Neural Network, Transformer, Computer Architecture, Performance Optimization, Large Language Model
Transformer Vanilla Attention Performance Theoretical Analysis 01-27-2025 03-02-2025 blog 9 minutes read (About 1275 words) Performance Bottleneck for Serving Transformer Models Deep Learning, Neural Network, Transformer, Computer Architecture, Performance Optimization, Large Language Model
AWQ: Activation-Aware Weight Quantization 01-01-2025 01-01-2025 blog 18 minutes read (About 2738 words) Same Performance as Group-Wise Weight-Only Quantization But with Better Accuracy Deep Learning, Mathematics, Quantization, CUDA, Accelerated Computing
NeurIPS 2024 Area Chair Experience 12-26-2024 12-26-2024 blog 9 minutes read (About 1389 words) First Time Serving as NeurIPS Area Chair Deep Learning, Conference, NeurIPS
Neural Radiance Fields 07-24-2024 07-24-2024 blog 6 minutes read (About 826 words) Scene Representation and Differentiable Rendering with Neural Radiance Fields Deep Learning, Computer Vision, Neural Network, NeRF, Neural Radiance Fields
LoRA and LoRAPrune 07-11-2024 07-11-2024 blog 11 minutes read (About 1664 words) Fine-Tuning and Pruning of Large Language Models Using Low-Rank Adaptation Deep Learning, Neural Network, Neural Network Pruning, LoRA, LoRAPrune