Amdahl's Law vs. Gustafson's Law

Introduction

Computer applications can generally be divided into strong scaling and weak scaling applications, which can be accelerated by parallel computing to different extents. How much strong scaling and weak scaling applications can be accelerated by parallel computing is described by Amdahl’s law and Gustafson’s law, respectively.

In this blog post, I would like to discuss Amdahl’s law and strong scaling, as well as Gustafson’s law and weak scaling.

Amdahl’s Law and Strong Scaling

Strong scaling is a measure of how, for a fixed overall problem size, the time to solution decreases as more processors are added to a system. This also implies a reduced workload per processor. Strong scaling is usually described by Amdahl’s law.

Amdahl’s law can be formulated as

$$
S = \frac{1}{(1-p) + \frac{p}{N}}
$$

where

$S$ is the theoretical speedup of the execution of the whole task;
$N$ is the speedup of the part of the task that benefits from improved system resources, typically the number of processors;
$p$ is the proportion of execution time that the part benefiting from improved resources originally occupied.

As $p \rightarrow 1$, the function becomes more and more linear, i.e., $S \rightarrow N$. As $N \rightarrow \infty$, we get the upper bound of the speedup, $S = \frac{1}{1-p}$.
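
As a quick numerical sanity check, here is a minimal Python sketch (the parallel fraction $p = 0.95$ is a made-up value for illustration) that evaluates Amdahl’s law for increasing $N$ and shows the speedup saturating at the $\frac{1}{1 - p} = 20$ upper bound.

```python
def amdahl_speedup(p: float, n: int) -> float:
    # S = 1 / ((1 - p) + p / N)
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    p = 0.95  # Hypothetical parallel fraction of the fixed-size workload.
    for n in [1, 2, 4, 8, 16, 1024]:
        print(f"N = {n:>4}: S = {amdahl_speedup(p, n):.2f}")
    # As N -> infinity, the speedup is bounded by 1 / (1 - p).
    print(f"Upper bound: {1.0 / (1.0 - p):.2f}")
```

Even with $N = 1024$ processors, the speedup stays below $20$ because of the $5\%$ serial fraction.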

Strong scaling is mostly used for long-running math-bound applications, to find a parallel setup that results in a reasonable runtime with moderate resource costs. Such a math-bound application could be the multiplication of two large matrices.

Gustafson’s Law and Weak Scaling

Weak scaling is a measure of how the time to solution changes as more processors are added to a system with a fixed problem size per processor; i.e., the overall problem size increases as the number of processors is increased. This also implies a constant workload per processor. Weak scaling is usually described by Gustafson’s law.

Gustafson’s law can be formulated as

$$
\begin{align}
S &= \frac{(1-p) + Np}{(1-p) + p} \\
&= (1-p) + Np
\end{align}
$$

where

$S$ is the theoretical speedup of the execution of the whole task;
$N$ is the speedup of the part of the task that benefits from improved system resources, typically the number of processors;
$p$ is the proportion of the parallel execution time spent on the part that benefits from improved resources.

The function is linear in $N$, and the slope is $p$. As $N \rightarrow \infty$, the intercept becomes negligible, i.e., $S \approx Np$.
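
Analogously, here is a minimal sketch (again with a hypothetical $p = 0.95$) that evaluates Gustafson’s law. In contrast to Amdahl’s law, the speedup is unbounded and grows linearly in $N$.

```python
def gustafson_speedup(p: float, n: int) -> float:
    # S = (1 - p) + N * p
    return (1.0 - p) + n * p


if __name__ == "__main__":
    p = 0.95  # Hypothetical parallel fraction of the scaled workload.
    for n in [1, 2, 4, 8, 16, 1024]:
        print(f"N = {n:>4}: S = {gustafson_speedup(p, n):.2f}")
```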

Weak scaling is mostly used for large memory-bound applications where the required memory scales with the overall problem size. Such a memory-bound application could be using more molecules in a molecular dynamics simulation, using more particles in particle filtering, or using a larger population in evolutionary search.
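
To make the contrast between the two laws concrete, the following sketch evaluates both speedups side by side at the same hypothetical $p = 0.95$. Amdahl’s law saturates near $20$, while Gustafson’s law keeps growing linearly with $N$.

```python
p = 0.95  # Hypothetical parallel fraction, same as in the sketches above.
for n in [4, 64, 1024]:
    s_amdahl = 1.0 / ((1.0 - p) + p / n)  # Fixed overall problem size.
    s_gustafson = (1.0 - p) + n * p  # Fixed problem size per processor.
    print(f"N = {n:>4}: Amdahl S = {s_amdahl:6.2f}, "
          f"Gustafson S = {s_gustafson:7.2f}")
```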

Caveats

The terms strong and weak themselves do not give any information whatsoever on how well an application actually scales.
