Learn to Understand a Modern Graphic Card

01-26-2019

Introduction

As a new NVIDIA employee with limited knowledge of computer hardware and systems, I would like to catch up quickly. The first thing I need to learn about is graphics cards: I use them in my daily work without fully understanding how they work. The minimum goal is to understand the specifications of a graphics card and to know what can be done with it.

This blog post might contain stupid errors since I am still learning, so please correct me if I have made any mistakes.

Graphic Cards

NVIDIA RTX 2080 Ti and AMD VEGA 64, which are currently the best consumer-focused graphic cards from NVIDIA and AMD gaming platforms respectively, were used for the analysis and comparison in this blog post.

Specifications

The specifications were pulled from NVIDIA and AMD official websites, and TechPowerUp.

Specs	NVIDIA RTX 2080 Ti	AMD VEGA 64
Architecture	Turing	Vega
Release Date	9/20/2018	8/7/2017
Price	~$1200	~$500
Cores / Processors	4352	4096
Base Clock	1350 MHz	1247 MHz
Boost Clock	1635 MHz (OC)	1546 MHz
Process Size	12 nm	14 nm
Memory Type	GDDR6	HBM2
Memory Interface	352-bit	2048-bit
Memory Bandwidth	616 GB/s	484 GB/s
Memory Size	11 GB	8 GB
Texture Mapping Units	272	256
Render Output Processors	88	64
Tensor Cores	544	None
Ray Tracing Cores	68	None
Single Precision Performance	13.4 TFLOPS	12.5 TFLOPS
Texture Rate	420.2 GT/s	393.2 GT/s
Pixel Rate	136.0 GP/s	98.30 GP/s

GPU Engines

CUDA Cores / Stream Processors

CUDA (Compute Unified Device Architecture) cores on NVIDIA GPUs, which correspond to “Stream Processors” on AMD GPUs, are the processing units of the GPU. Multiple CUDA cores work in parallel on a task. For GPUs of the same generation or architecture, more CUDA cores usually means higher computation performance. This relationship does not hold across GPU generations or architectures, because the internal implementation of the CUDA cores can differ. For the same reason, the number of CUDA cores on an NVIDIA GPU is not directly comparable to the number of stream processors on an AMD GPU.

Base Clock / Boost Clock

The base clock, or base frequency, is analogous to the CPU frequency: the higher the frequency, the faster the GPU processes a task. The boost clock is analogous to the turbo frequency on Intel CPUs. Basically, when the GPU is under heavy load and has power and thermal headroom, it automatically raises its frequency in order to process the task faster.

Process Size

When we look at CPUs, we find that the process size becomes smaller and smaller with each new generation. A smaller process size means that more transistors fit in a given area. While the basic functionality is preserved, energy consumption and production cost also drop. It should be noted that the quantum tunnelling effect is ignored in this discussion.

GPU Memory

GPU memory is also a critical part of a graphics card, because the GPU communicates with it directly.

Memory Type

NVIDIA uses GDDR6 memory for RTX 2080 Ti, while AMD uses HBM2 memory for VEGA 64.

GDDR stands for “Graphics Double Data Rate”, and HBM stands for “High Bandwidth Memory”. GDDR is usually cheaper and easier to manufacture than HBM, while HBM offers higher maximum bandwidth and lower power consumption. One GDDR6 chip usually has a 32-bit bus. One HBM2 stack consists of at most 8 stacked DRAM dies and exposes a 1024-bit interface made up of eight 128-bit channels, so the 2048-bit interface of VEGA 64 corresponds to two stacks.

It should be noted that NVIDIA does use HBM2 memory in its high-end data center cards such as Tesla V100.

Memory Interface

The width of the memory interface (memory bus) essentially determines how many memory chips can be connected to the GPU. If the memory interface is 352-bit and each memory chip has a 32-bit interface, the GPU can connect to at most 352 / 32 = 11 memory chips. So I infer that the RTX 2080 Ti uses eleven 1 GB GDDR6 memory chips.
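The chip count is just the bus width divided by the per-chip width. A minimal Python sketch of that arithmetic follows; the function name `max_memory_chips` is made up for illustration, and the 32-bit default is the typical GDDR6 chip width assumed in the text, not a value read from a datasheet.

```python
def max_memory_chips(bus_width_bits: int, chip_width_bits: int = 32) -> int:
    """Upper bound on the number of memory chips, one chip per slice of the bus."""
    if bus_width_bits % chip_width_bits != 0:
        raise ValueError("bus width is not a multiple of the chip width")
    return bus_width_bits // chip_width_bits


chips = max_memory_chips(352)  # RTX 2080 Ti memory interface from the table
gb_per_chip = 11 / chips       # 11 GB total memory from the table
print(chips, gb_per_chip)      # prints: 11 1.0
```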

Memory Bandwidth

Although HBM provides a much higher maximum bandwidth (bandwidth ceiling) than GDDR, the actual memory bandwidth per GB of memory on the AMD VEGA 64 (484 / 8 = 60.5 GB/s per GB) is not much better than that of the RTX 2080 Ti (616 / 11 = 56 GB/s per GB). So I think using expensive HBM2 memory on the AMD VEGA 64 might be overkill, and the bandwidth HBM2 actually delivers really depends on its implementation.
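The comparison can be reproduced from the specification table alone. The sketch below computes bandwidth per GB and, as an extra sanity check, the data rate per bus pin (bandwidth × 8 / bus width). The dictionary layout and function names are only for illustration.

```python
# Values copied from the specification table above.
specs = {
    "NVIDIA RTX 2080 Ti": {"bandwidth_gb_s": 616, "memory_gb": 11, "bus_bits": 352},
    "AMD VEGA 64": {"bandwidth_gb_s": 484, "memory_gb": 8, "bus_bits": 2048},
}


def bandwidth_per_gb(bandwidth_gb_s: float, memory_gb: float) -> float:
    """Memory bandwidth normalised by memory size, in GB/s per GB."""
    return bandwidth_gb_s / memory_gb


def gbit_per_pin(bandwidth_gb_s: float, bus_bits: int) -> float:
    """Data rate carried by each bus bit, in Gbit/s."""
    return bandwidth_gb_s * 8 / bus_bits


for name, s in specs.items():
    per_gb = bandwidth_per_gb(s["bandwidth_gb_s"], s["memory_gb"])
    per_pin = gbit_per_pin(s["bandwidth_gb_s"], s["bus_bits"])
    print(f"{name}: {per_gb:.1f} GB/s per GB, {per_pin:.2f} Gbit/s per pin")
```

With these numbers, the GDDR6 bus runs each pin much faster, while the HBM2 bus is much wider, which is why the two cards end up with similar bandwidth per GB.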

Other Modules

Texture Mapping Units

A texture mapping unit (TMU) is able to rotate, resize, and distort a bitmap image (performing texture sampling). It is reasonable to assume that a card with more TMUs will be faster at processing texture information.

Render Output Processors

The render output processors (ROPs), also known as raster operation processors, are responsible for writing pixel data to memory. The speed at which this is done is known as the fill rate. While the job of the ROPs is important, it is no longer the performance bottleneck it once was, so the ROP count is not a very useful indicator of relative performance today.

Tensor Cores

Tensor cores, invented by NVIDIA, are basically programmable matrix-multiply-and-accumulate units that accelerate deep learning training and inference. According to NVIDIA, they provide up to 500 trillion tensor operations per second. Tensor cores used to be available only on NVIDIA's high-end Tesla V100 and Titan V. In the Turing architecture, however, all GeForce RTX 20 series graphics cards have tensor cores, including the RTX 2060.
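To make "matrix-multiply-and-accumulate" concrete, the operation is D = A × B + C. The pure-Python sketch below only shows the arithmetic; it is not how tensor cores are programmed. Real tensor cores work on small fixed-size tiles in hardware, typically in reduced or mixed precision. The function name `mma` is made up for illustration.

```python
def mma(a, b, c):
    """Return D = A x B + C for matrices given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    shapes_ok = (
        all(len(row) == k for row in a)
        and all(len(row) == m for row in b)
        and len(c) == n
        and all(len(row) == m for row in c)
    )
    if not shapes_ok:
        raise ValueError("incompatible matrix shapes")
    # Each output element is a dot product plus the matching accumulator element.
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) + c[i][j] for j in range(m)]
        for i in range(n)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
d = mma(a, b, c)
print(d)  # prints: [[20, 23], [44, 51]]
```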

AMD graphic cards do not have such or similar modules, to the best of my knowledge.

Ray Tracing Cores

The most exciting feature of the NVIDIA RTX 20 series graphics cards is ray tracing. The ray tracing cores accelerate the computations of real-time ray tracing algorithms.

Evaluation Metrics

Single Precision Performance

Single precision performance indicates how fast a graphics card can perform single precision (float32) operations. When I use TensorFlow, the data type I use for tensors is usually single precision.
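A common back-of-the-envelope estimate for peak single precision throughput is cores × 2 × clock. This rests on an assumption that is mine, not the vendors' tables: each core retires one fused multiply-add per cycle, counted as two floating point operations. The function name below is made up for illustration.

```python
def peak_fp32_tflops(cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak in TFLOPS, assuming one FMA (2 FLOPs) per core per cycle."""
    # cores * FLOPs per cycle * MHz gives MFLOPS; divide by 1e6 for TFLOPS.
    return cores * flops_per_cycle * clock_mhz / 1e6


# Core counts and boost clocks from the specification table above.
rtx_2080_ti = peak_fp32_tflops(4352, 1635)
vega_64 = peak_fp32_tflops(4096, 1546)
print(f"RTX 2080 Ti: {rtx_2080_ti:.2f} TFLOPS")  # prints: RTX 2080 Ti: 14.23 TFLOPS
print(f"VEGA 64: {vega_64:.2f} TFLOPS")          # prints: VEGA 64: 12.66 TFLOPS
```

Both estimates land slightly above the quoted 13.4 and 12.5 TFLOPS, presumably because the quoted figures were computed at a somewhat lower clock than the boost clocks listed in the table.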

Texture Rate

Texture rate is the maximum number of texture map elements (texels) that can be processed per second. The higher the texture rate, the more fluently the card renders demanding games. I think this is an indication of whether the graphics card can support high refresh rate monitors.

Pixel Rate

Pixel rate is the maximum number of pixels the GPU can write to its local memory in one second. The higher the pixel rate, the higher the screen resolution the GPU can support. I think this is an indication of whether the graphics card can support high-resolution monitors.
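Both rates are usually quoted as theoretical peaks: the number of units times the clock. Assuming one texel per TMU and one pixel per ROP per cycle (my assumption, not something stated by the sources above), the table can be checked for consistency by backing out the clock each rate implies. The names below are made up for illustration.

```python
def implied_clock_mhz(rate_giga_per_s: float, units: int) -> float:
    """Clock in MHz implied by rate = units * clock, with the rate in G/s."""
    return rate_giga_per_s * 1000 / units


# TMU/ROP counts and rates copied from the specification table above.
table = {
    "NVIDIA RTX 2080 Ti": {"tmus": 272, "rops": 88, "texture_rate": 420.2, "pixel_rate": 136.0},
    "AMD VEGA 64": {"tmus": 256, "rops": 64, "texture_rate": 393.2, "pixel_rate": 98.30},
}

for name, s in table.items():
    from_texture = implied_clock_mhz(s["texture_rate"], s["tmus"])
    from_pixel = implied_clock_mhz(s["pixel_rate"], s["rops"])
    print(f"{name}: {from_texture:.0f} MHz from texture rate, {from_pixel:.0f} MHz from pixel rate")
```

For each card, the two rates imply nearly the same clock, roughly 1545 MHz for the RTX 2080 Ti and 1536 MHz for the VEGA 64. This suggests the figures are indeed units × clock, evaluated at a clock close to, but not identical to, the boost clocks listed in the table.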

Author

Lei Mao

Posted on

01-26-2019

Updated on

01-26-2019
