== A Primer on Parallel Programming ==
{{quote|To pull a bigger wagon it is easier to add more oxen than to find (or build) a bigger ox.|Gropp, Lusk & Skjellum|Using MPI}}
To build a house as quickly as possible, we do not look for the fastest person to do all the work but instead we hire many people and spread the work among them so that tasks are performed at the same time --- "in parallel". Computational problems are conceptually similar. Since there is a limit to how fast a single machine can perform, we attempt to divide up the computational problem and assign work to be completed in parallel to multiple computers.
The most significant concept to master in designing and building parallel applications is ''communication''. Complexity arises due to communication requirements. In order for multiple workers to accomplish a task in parallel, they need to be able to communicate with one another. In the context of software, we have many processes each working on part of a solution, needing values that were computed---or are yet to be computed!---by other processes.
There are two major models of computational parallelism: shared memory, and distributed memory.
In shared memory parallelism (commonly and casually abbreviated SMP), all processors see the same memory image, or to put it another way, all memory is globally addressable and all the processes can ultimately access it. Communication between processes on an SMP machine is implicit --- any process can read and write values to memory that can be subsequently accessed and manipulated directly by others. The challenge in writing these kinds of programs is data consistency: one should take care to ensure data is not modified by more than one process at a time.
[[Image:Smp.png|frame|center|'''Figure 1''': ''A conceptual picture of a shared memory architecture'']]
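To make the idea of implicit communication concrete, here is a minimal sketch of the shared memory model using POSIX threads rather than MPI: every thread reads and writes the same variable <code>counter</code>, and the mutex is what keeps that shared data consistent. The thread and iteration counts are arbitrary values chosen for illustration.
<syntaxhighlight lang="c">
/* A sketch of shared memory parallelism with POSIX threads (not MPI):
 * all threads update the same variable "counter", so access is
 * serialized with a mutex to keep the data consistent. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITER    100000

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *work(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        pthread_mutex_lock(&lock);    /* only one thread may update at a time */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, work, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    /* Without the mutex the final value would be unpredictable. */
    printf("counter = %ld (expected %d)\n", counter, NTHREADS * NITER);
    return 0;
}
</syntaxhighlight>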
Distributed memory parallelism is equivalent to a collection of workstations linked by a dedicated network for communication: a cluster. In this model, processes each have their own private memory, and may run on physically distinct machines. When processes need to communicate, they do so by sending ''messages''. A process typically invokes a function to send data and the destination process invokes a function to receive it. A major challenge in distributed memory programming is how to minimize communication overhead. Networks, even the fastest dedicated hardware interconnects, transmit data orders of magnitude slower than within a single machine. Memory access times are typically measured in ones to hundreds of nanoseconds, while network latency is typically expressed in microseconds.
[[Image:Cluster.png|frame|center|'''Figure 2''': ''A conceptual picture of a cluster architecture'']]
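The sketch below, written in C with MPI (introduced in the next section), shows the message passing style: each process has its own private copy of the variable <code>value</code>, and the only way for rank 1 to learn what rank 0 holds is for rank 0 to send a message that rank 1 explicitly receives. The value 42 and the two-process layout are arbitrary choices for illustration.
<syntaxhighlight lang="c">
/* A minimal sketch of explicit message passing: rank 0 sends an
 * integer to rank 1, which receives it into its own private memory.
 * Run with at least two processes. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                       /* data known only to rank 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>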
== What is MPI? ==
The Message Passing Interface (MPI) is, strictly speaking, a ''standard'' describing a set of subroutines, functions, objects, ''etc.'', with which one can write parallel programs in a distributed memory environment. Many different ''implementations'' of the standard have been produced, such as Open MPI, MPICH, and MVAPICH. The standard describes how MPI should be called from Fortran, C, and C++, but unofficial "bindings" can be found for several other languages.
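As a first impression of what calling MPI from C looks like, here is a minimal "hello world" sketch. Because every implementation provides the interface described by the standard, it should build unchanged with Open MPI, MPICH or MVAPICH.
<syntaxhighlight lang="c">
/* A minimal MPI program in C: each process reports its rank and the
 * total number of processes in the default communicator. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* start the MPI environment      */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's identifier      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes      */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                          /* shut down the MPI environment  */
    return 0;
}
</syntaxhighlight>
Such a program is typically compiled with an implementation's wrapper compiler (for example <code>mpicc</code>) and launched with a tool such as <code>mpirun</code>; the exact launch command depends on the implementation and on the cluster's job scheduler.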
MPI is an open, non-proprietary standard, so an MPI program can easily be ported to many different computers. Applications that use it can be run on a large number of processors at once, often with good efficiency (called "scalability"). And because memory is local to each process, some aspects of debugging are simplified --- it isn't possible for one process to interfere with the memory of another, and if a program generates a segmentation fault the resulting core file can be processed by standard serial debugging tools. However, due to the need to manage communication and synchronization explicitly, MPI programs may appear more complex than programs written with tools that support implicit communication. Furthermore, in designing an MPI program one should take care to minimize communication overhead so that it does not overwhelm the speed-up gained from parallel computation.