Translations:Scalability/4/en

Secondly, parallelizing a program normally requires a certain amount of communication and synchronization among the parallel processes, and the cost of this "parallel overhead" increases with the number of processes working together, typically as a power of the number of cores, $T_{c}\propto n^{\alpha}$ with $\alpha>1$. Suppose that the scientific part of the program's run time is divided equally among the $n$ cores, apart from a residual serial part, so that $T_{s}=A+B/n$. The total duration of the program is then $T=T_{s}+T_{c}=A+B/n+Cn^{\alpha}$, where $A$, $B$ and $C$ are positive real numbers whose values depend on the particular cluster, program and test problem. As $n\to\infty$, this total will ultimately be dominated by the final parallel-overhead term. When $A$ and $B$ are much larger than $C$, plotting the run time against the number of CPU cores gives a curve like the one in the accompanying figure.
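Within this model, the position of the minimum of the run-time curve can be worked out directly. The short derivation below treats $n$ as a continuous variable, which is an idealization since real core counts are integers. It sets the derivative of $T=A+B/n+Cn^{\alpha}$ to zero:

$$\frac{dT}{dn}=-\frac{B}{n^{2}}+\alpha C n^{\alpha-1}=0 \quad\Longrightarrow\quad n^{*}=\left(\frac{B}{\alpha C}\right)^{1/(\alpha+1)}$$

Because $\alpha>1$, the second derivative $2B/n^{3}+\alpha(\alpha-1)Cn^{\alpha-2}$ is positive for every $n>0$, so $n^{*}$ is indeed the minimum. Note that the serial constant $A$ shifts the whole curve up or down but does not move the optimum.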

The most important point to note about this curve is that the run time first falls as cores are added, then reaches a minimum at a certain number of cores ($n\approx 22$ in the figure). Beyond that point the program's duration starts to increase as we add more processes: too many cooks spoil the broth, as the proverb says. When you use a parallel program, it is crucial to carry out such a scalability analysis. It tells you the optimal number of CPU cores for the nature and size of the problem you are working on and the cluster you are using: is it 4, 128, 1024, or some other figure?
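A minimal Python sketch of such an analysis, assuming the three-term model $T=A+B/n+Cn^{\alpha}$ holds exactly, is shown below. It is an illustration, not a benchmarking tool. The helpers `run_time`, `best_core_count` and `analytic_optimum` are hypothetical names introduced here. The parameter values ($A=100$, $B=10\,000$, $C=0.5$, $\alpha=2$) are made up and were not fitted to any real cluster, program or measurement. In practice you would time the program at several core counts and look for the minimum in the measured data.

```python
"""Sketch of the run-time model T(n) = A + B/n + C*n**alpha.

The parameter values used below are hypothetical, chosen only for
illustration; real values of A, B, C and alpha depend on the cluster,
program and test problem and would have to be fitted to measurements.
"""


def run_time(n: int, A: float, B: float, C: float, alpha: float) -> float:
    """Modelled run time on n cores: serial part + divided part + overhead."""
    return A + B / n + C * n ** alpha


def best_core_count(max_cores: int, A: float, B: float, C: float,
                    alpha: float) -> int:
    """Brute-force scan: core count in 1..max_cores with the smallest T(n)."""
    return min(range(1, max_cores + 1),
               key=lambda n: run_time(n, A, B, C, alpha))


def analytic_optimum(B: float, C: float, alpha: float) -> float:
    """Continuous minimiser n* = (B / (alpha*C))**(1/(alpha+1)); A drops out."""
    return (B / (alpha * C)) ** (1.0 / (alpha + 1.0))


if __name__ == "__main__":
    # Hypothetical parameters: A and B much larger than C, alpha > 1.
    A, B, C, alpha = 100.0, 10_000.0, 0.5, 2.0
    for n in (1, 4, 16, 22, 64, 128):
        print(f"n = {n:4d}  T = {run_time(n, A, B, C, alpha):10.1f}")
    print("best integer n:", best_core_count(1024, A, B, C, alpha))
    print("analytic n*:", round(analytic_optimum(B, C, alpha), 2))
```

With these made-up values the brute-force scan and the closed form $n^{*}=(B/(\alpha C))^{1/(\alpha+1)}$ agree. The continuous optimum is about 21.54, and the best integer core count is 22, since $T(22)\approx 796.5$ is slightly lower than $T(21)\approx 796.7$.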
