TensorRT Acceleration for PyTorch Native Eager Mode Quantization Models | blog | 05-24-2024 | 7 minutes read (about 1051 words) | Tags: Deep Learning, Python, Inference, TensorRT, PyTorch, Quantization, NVIDIA, Accelerated Computing, GPU
TensorRT Python Inference Example | blog | 05-18-2024 | 12 minutes read (about 1843 words) | Tags: Deep Learning, Python, Inference, TensorRT, NVIDIA, GPU
Principles for Faster Transformer Inference | article | 04-06-2023 | 27 minutes read (about 4084 words) | Tags: Deep Learning, Inference, Natural Language Processing, Transformer, Optimization, Accelerated Computing
Front-End Neural Network Inference | blog | 11-28-2022 | 16 minutes read (about 2458 words) | Tags: Artificial Intelligence, Deep Learning, Inference, ONNX, Neural Networks
Running Machine Learning Inference as Service from Scratch | project | 12-30-2020 | 7 minutes read (about 979 words) | Tags: Machine Learning, Deep Learning, Python, Inference