Perfetto GPU Flow Artifacts

02-20-2026 blog 6 minutes read (About 952 words)

Understanding and Resolving Flow Artifacts in Perfetto GPU Profiling Traces

GPU, Perfetto

CUDA Shared Memory Bank Conflict-Free Vectorized Access

02-13-2026 blog 14 minutes read (About 2060 words)

Instruction-Level Phase Based Bank Conflict-Free Execution

CUDA, NVIDIA, Parallel Computing, GPU

CUDA Rendezvous Stream

01-26-2026 blog 11 minutes read (About 1690 words)

Simplifying Synchronization Complexities Using CUDA Rendezvous Streams

CUDA, NVIDIA, Parallel Computing, GPU

NVIDIA NVML GPU Statistics

12-25-2025 blog 15 minutes read (About 2214 words)

Mimicking nvidia-smi dmon Using NVIDIA NVML

CPP, CUDA, NVIDIA, GPU, NVML

Install NVIDIA RTX 5080

12-10-2025 blog 5 minutes read (About 703 words)

Installing NVIDIA RTX 5080 on an Old Desktop

NVIDIA, Ubuntu, GPU

Setting Up Environment Variables In SSH Sessions Over TCP On Runpod

10-10-2025 blog 12 minutes read (About 1785 words)

Fixing an Environment Variables Issue for Runpod

CUDA, NVIDIA, Docker, GPU, Cloud Computing, Runpod, IDE, SSH

Setting Up Remote Development Using Custom Template On Runpod

10-08-2025 (updated 10-13-2025) blog 12 minutes read (About 1814 words)

Custom Remote Development Using GPUs on Runpod

CUDA, NVIDIA, Docker, GPU, Cloud Computing, Runpod, IDE, SSH

CUDA Local Memory

03-19-2025 blog 12 minutes read (About 1835 words)

Is Local Array Placed In Registers or In Local Memory?

CUDA, GPU

CUDA Performance Hot VS Cold Measurement

03-12-2025 blog 8 minutes read (About 1200 words)

Flushing GPU L2 Cache

CPP, CUDA, NVIDIA, GPU, Nsight Compute

NVIDIA GPU Compute Capability

01-02-2025 (updated 01-22-2026) blog 15 minutes read (About 2202 words)

A Table of NVIDIA GPUs and Their Compute Capabilities

CUDA, NVIDIA, GPU

SMPlayer GPU Acceleration

12-06-2024 (updated 12-07-2024) blog 2 minutes read (About 328 words)

Playing Videos with GPU Acceleration in SMPlayer

CUDA, Linux, GPU, SMPlayer

PyTorch Eager Mode Quantization TensorRT Acceleration

05-24-2024 blog 7 minutes read (About 1051 words)

TensorRT Acceleration for PyTorch Native Eager Mode Quantization Models

Deep Learning, Python, Inference, Quantization, Accelerated Computing, NVIDIA, TensorRT, PyTorch, GPU
