Scalability/en: Difference between revisions

Updating to match new version of source page
(Updating to match new version of source page)
 
(Updating to match new version of source page)
Line 8: Line 8:
Secondly, the parallelization of the program normally requires a certain amount of communication and synchronization among the parallel processes and the cost of this "parallel overhead" will increase with the number of processes working together, typically as a power of the number of cores, <math>T_c \propto n^\alpha</math> where <math>\alpha > 1</math>.  If we now suppose that the scientific part of the program's run time is divided equally among the number of cores apart from a residual serial part, so <math>T_s = A + B/n</math>, the total duration of the program <math>T = T_s + T_c = A + B/n + C n^\alpha</math> (with <math>A</math>, <math>B</math> and <math>C</math> positive real numbers whose value depends on the particular cluster, program and test problem) will ultimately be dominated by this final parallel overhead factor as <math>n\to\infty</math>. In the case where <math>A</math> and <math>B</math> are much larger than <math>C</math>, when we plot the curve of the run time versus the number of CPU cores we will obtain something that looks like the accompanying figure.  
Secondly, the parallelization of the program normally requires a certain amount of communication and synchronization among the parallel processes and the cost of this "parallel overhead" will increase with the number of processes working together, typically as a power of the number of cores, <math>T_c \propto n^\alpha</math> where <math>\alpha > 1</math>.  If we now suppose that the scientific part of the program's run time is divided equally among the number of cores apart from a residual serial part, so <math>T_s = A + B/n</math>, the total duration of the program <math>T = T_s + T_c = A + B/n + C n^\alpha</math> (with <math>A</math>, <math>B</math> and <math>C</math> positive real numbers whose value depends on the particular cluster, program and test problem) will ultimately be dominated by this final parallel overhead factor as <math>n\to\infty</math>. In the case where <math>A</math> and <math>B</math> are much larger than <math>C</math>, when we plot the curve of the run time versus the number of CPU cores we will obtain something that looks like the accompanying figure.  
[[File:Scaling plot.png|thumb]]
[[File:Scaling plot.png|thumb]]
The most important point to note about this curve is that while for smaller numbers of cores the run time falls, at a certain number of cores a minimum is reached (for <math>n\approx 22</math>), and after that the program duration starts to ''increase'' as we add more processors: too many cooks spoil the broth, according to the proverb. When you are using a parallel program, it's crucial to carry out such a scalability analysis in order to know, for the nature and size of problem you're working on and the cluster you're using, what is the optimal choice of the number of CPU cores: 4, 128, 1024, or some other figure?
The most important point to note about this curve is that while for smaller numbers of cores the run time falls, at a certain number of cores a minimum is reached (for <math>n\approx 22</math>), and after that the program duration starts to ''increase'' as we add more processes: too many cooks spoil the broth, according to the proverb. When you are using a parallel program, it's crucial to carry out such a scalability analysis in order to know, for the nature and size of problem you're working on and the cluster you're using, what is the optimal choice of the number of CPU cores: 4, 128, 1024, or some other figure?


It is up to you to choose a test problem for the scalability analysis. You want a problem that is relatively small so that these tests can be carried out quickly, but not so small as to be completely unrepresentative of a production run. A test problem that requires 30 to 60 minutes to finish on one or two cores is probably a good choice. One which runs in under ten minutes is almost certainly too short to be of value. In certain contexts, such as an analysis of the program's behaviour under weak scaling (see below), you also want to have a test problem whose size can be easily increased, ideally in a fairly continuous manner.   
It is up to you to choose a test problem for the scalability analysis. You want a problem that is relatively small so that these tests can be carried out quickly, but not so small as to be completely unrepresentative of a production run. A test problem that requires 30 to 60 minutes to finish on one or two cores is probably a good choice. One which runs in under ten minutes is almost certainly too short to be of value. In certain contexts, such as an analysis of the program's behaviour under weak scaling (see below), you also want to have a test problem whose size can be easily increased, ideally in a fairly continuous manner.   
38,897

edits