== Communication ==
MPI provides a large number of functions for sending and receiving data of almost any composition in a variety of communication patterns (one-to-one, one-to-many, many-to-one, and many-to-many). The simplest to understand, however, are the functions that send a sequence of one or more instances of an atomic data type from one process to one other process: <tt>MPI_Send</tt> and <tt>MPI_Recv</tt>.


A process sends data by calling the <tt>MPI_Send</tt> function. Referring to the following function prototypes, <tt>MPI_Send</tt> can be summarized as sending <tt>count</tt> contiguous instances of <tt>datatype</tt> to the process with the specified <tt>rank</tt>; the data is in the buffer pointed to by <tt>message</tt>. <tt>tag</tt> is a programmer-specified identifier that becomes associated with the message and can be used to organize the communication process (for example, to distinguish two distinct streams of interleaved data). Our examples do not require this, so we will pass in the value 0 for the <tt>tag</tt>. <tt>comm</tt> is the communicator described above, and we will continue to use <tt>MPI_COMM_WORLD</tt>.


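The C prototype takes the following form. The argument names used here are only illustrative, since the MPI standard fixes the order and types of the arguments but not their names; the Fortran subroutine <tt>MPI_SEND</tt> takes the same arguments in the same order, followed by an integer error-code argument.

<source lang="c">
int MPI_Send
(
    void *message,           /* reference to the data to be sent */
    int count,               /* number of items to send */
    MPI_Datatype datatype,   /* type of each item */
    int dest,                /* rank of the destination process */
    int tag,                 /* programmer-specified identifier */
    MPI_Comm comm            /* communicator */
);
</source>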


Note that the <tt>datatype</tt> argument, specifying the type of data contained in the <tt>message</tt> buffer, is a variable. This is intended to provide a layer of compatibility between processes that could be running on architectures for which the native format for these types differs. It is possible to register new data types, but for this tutorial we will only use the predefined types provided by MPI. There is an MPI type predefined for all atomic data types in the source language (for C: <tt>MPI_CHAR</tt>, <tt>MPI_FLOAT</tt>, <tt>MPI_SHORT</tt>, <tt>MPI_INT</tt>, etc.; for Fortran: <tt>MPI_CHARACTER</tt>, <tt>MPI_INTEGER</tt>, <tt>MPI_REAL</tt>, etc.). You can find a full list of these types in the references provided below.
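
As a small illustration of how <tt>count</tt> and <tt>datatype</tt> together describe the contents of the buffer, the following sketch (the array name and its length are made up for this example) sends ten integers to the process with rank 1:

<source lang="c">
int values[10];                                        /* buffer holding the data to be sent */
/* ... fill values[0] through values[9] ... */
MPI_Send(values, 10, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* 10 items of type MPI_INT, to rank 1, with tag 0 */
</source>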


<tt>MPI_Recv</tt> works in much the same way as <tt>MPI_Send</tt>. Referring to the function prototypes below, <tt>message</tt> is now a pointer to an allocated buffer of sufficient size to receive <tt>count</tt> instances of <tt>datatype</tt>, to be received from process <tt>rank</tt>. <tt>MPI_Recv</tt> takes one additional argument, <tt>status</tt>, which in C should be a reference to an allocated <tt>MPI_Status</tt> structure and in Fortran an array of <tt>MPI_STATUS_SIZE</tt> integers. Upon return it will contain some information about the received message. We will not make use of it in this tutorial, but the argument must be present.


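As before, the C prototype is shown with illustrative argument names; the Fortran subroutine <tt>MPI_RECV</tt> takes the same arguments, with <tt>status</tt> declared as an integer array of size <tt>MPI_STATUS_SIZE</tt>, followed by an integer error-code argument.

<source lang="c">
int MPI_Recv
(
    void *message,           /* buffer in which to place the received data */
    int count,               /* maximum number of items to receive */
    MPI_Datatype datatype,   /* type of each item */
    int source,              /* rank of the sending process */
    int tag,                 /* programmer-specified identifier */
    MPI_Comm comm,           /* communicator */
    MPI_Status *status       /* filled in with information about the received message */
);
</source>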


With this simple use of <tt>MPI_Send</tt> and <tt>MPI_Recv</tt>, the sending process must know the rank of the receiving process, and the receiving process must know the rank of the sending process. In our example, the following arithmetic is useful:
* <tt>(rank + 1) % size</tt> is the process to send to, and
* <tt>(rank + size - 1) % size</tt> is the process to receive from.
For example, with 4 processes, the process with rank 3 sends to rank 0 and receives from rank 2. We can now make the required modifications to our parallel "Hello, world!" program.


{| border="0" cellpadding="5" cellspacing="0" align="center"
! C
|-
|
     sendto = (rank + 1) % size;
     recvfrom = (rank + size - 1) % size;
     MPI_Send(outbuf, BUFMAX, MPI_CHAR, sendto, 0, MPI_COMM_WORLD);
|-
! Fortran
|-
|
     sendto = mod((rank + 1), num_procs)
     recvfrom = mod((rank + num_procs - 1), num_procs)
     call MPI_SEND(outbuf, BUFMAX, MPI_CHARACTER, sendto, 0, MPI_COMM_WORLD, ierr)
|}
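
For concreteness, here is a minimal sketch of what the complete modified C program could look like. It reuses the names that appear above (<tt>phello2.c</tt>, <tt>BUFMAX</tt>, <tt>outbuf</tt>, <tt>sendto</tt>, <tt>recvfrom</tt>); the receive buffer <tt>inbuf</tt> and the exact messages printed are only illustrative.

<source lang="c">
#include <stdio.h>
#include <mpi.h>

#define BUFMAX 81

int main(int argc, char *argv[])
{
    char outbuf[BUFMAX], inbuf[BUFMAX];
    int rank, size;
    int sendto, recvfrom;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Compose this process's greeting. */
    snprintf(outbuf, BUFMAX, "Hello, world! from process %d of %d", rank, size);

    /* Each process sends to its "right" neighbour and receives from its "left" one. */
    sendto = (rank + 1) % size;
    recvfrom = (rank + size - 1) % size;

    MPI_Send(outbuf, BUFMAX, MPI_CHAR, sendto, 0, MPI_COMM_WORLD);
    MPI_Recv(inbuf, BUFMAX, MPI_CHAR, recvfrom, 0, MPI_COMM_WORLD, &status);

    printf("[Process %d] received from process %d: %s\n", rank, recvfrom, inbuf);

    MPI_Finalize();
    return 0;
}
</source>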


Compile and run this program on 2, 4 and 8 processors. While it certainly seems to be working as intended, there is a hidden problem here. The MPI standard does not ''guarantee'' that <tt>MPI_Send</tt> returns before the message has been delivered. Most implementations ''buffer'' the data from <tt>MPI_Send</tt> and return without waiting for it to be delivered. But if it were not buffered, the code we've written would deadlock: each process would call <tt>MPI_Send</tt> and then wait for its neighbour process to call <tt>MPI_Recv</tt>. Since the neighbour would also be waiting at the <tt>MPI_Send</tt> stage, they would all wait forever. Clearly there ''is'' buffering in the libraries on our systems, since the code did not deadlock, but it is poor design to rely on this. The code could fail if used on a system in which the library provides no buffering, and even where buffering is provided, the call might still block if the buffer fills up.


  [orc-login2 ~]$ mpicc -Wall phello2.c -o phello2
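
One common way to avoid relying on buffering (a sketch only, not part of the program above, using the same variables as in the example) is to order the calls so that one side of every send/receive pair posts its receive first. For instance, even-ranked processes can send before receiving while odd-ranked processes receive before sending:

<source lang="c">
if (rank % 2 == 0) {
    /* Even ranks: send first, then receive. */
    MPI_Send(outbuf, BUFMAX, MPI_CHAR, sendto, 0, MPI_COMM_WORLD);
    MPI_Recv(inbuf, BUFMAX, MPI_CHAR, recvfrom, 0, MPI_COMM_WORLD, &status);
} else {
    /* Odd ranks: receive first, then send. */
    MPI_Recv(inbuf, BUFMAX, MPI_CHAR, recvfrom, 0, MPI_COMM_WORLD, &status);
    MPI_Send(outbuf, BUFMAX, MPI_CHAR, sendto, 0, MPI_COMM_WORLD);
}
</source>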