[Beowulf] Parallel Programming Question

Fri Jun 26 23:30:20 PDT 2009

amjad ali wrote:
> Hello all,
> 
> In an mpi parallel code which of the following two is a better way:
> 
> 1)      Read the input data from input data files only by the master process
> and then broadcast it to the other processes.
> 
> 2)      All the processes read the input data directly from input data files
> (no need of broadcast from the master process). Is it possible?

Either can work; which is better depends on the details.  How big are the
input files?  Does each client need all of the data, or just its own fraction?
If the clients read the input files themselves, are the files local to each
client or read from a shared file system?
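
If each client only needs its own fraction and the file sits on a shared file
system, option 2 can be as simple as each rank seeking to its own byte range.
A minimal sketch in Python, assuming a flat binary file split evenly by bytes.
The helper names rank_byte_range and read_my_slice are hypothetical, and in a
real MPI job rank and nprocs would come from the MPI library rather than being
passed in by hand:

```python
import os


def rank_byte_range(file_size, nprocs, rank):
    """Return (offset, count) of the slice owned by `rank`.

    The first `file_size % nprocs` ranks get one extra byte, so the slices
    cover the file exactly once with no overlap.
    """
    base, extra = divmod(file_size, nprocs)
    count = base + (1 if rank < extra else 0)
    offset = rank * base + min(rank, extra)
    return offset, count


def read_my_slice(path, nprocs, rank):
    """Read only this rank's byte range from a file every rank can see."""
    file_size = os.path.getsize(path)
    offset, count = rank_byte_range(file_size, nprocs, rank)
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(count)
```

Real input formats usually need the boundaries aligned to whole records rather
than raw bytes, so treat the even split as a starting point only.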

What does your network look like?

Keep in mind that many (not all) MPI implementations do not implement a
broadcast as a true network-layer broadcast; they build it out of
point-to-point messages.  Also, in most situations network uplinks are
distinct from the downlinks, so sending and receiving do not compete for
bandwidth (except for the ACKs).

If all clients need all input files you can achieve good performance either
with a BitTorrent-like approach (send 1/N of the file to each of N clients,
then have them re-share it), or even with just a simple chain:
Head -> node A -> node B -> node C.  The chain works better than you might
think, since node A can start forwarding to node B as soon as its first data
arrives, and the upload bandwidth doesn't compete with the download bandwidth
(well, not much usually).
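
To get a feel for why the chain and the split-and-reshare schemes help, here
is a rough cost model in Python.  It is a sketch under strong assumptions that
are mine, not measurements: every link has the same bandwidth, links are full
duplex, latency and disk limits are ignored, and the chunk size divides the
file evenly.  The function names are hypothetical.

```python
def head_sends_to_each(size_mb, n_clients, bw_mb_s):
    """Head sends the whole file to each client, one after another."""
    return n_clients * size_mb / bw_mb_s


def chain(size_mb, n_clients, bw_mb_s, chunk_mb):
    """Head -> A -> B -> ...: each node forwards a chunk as soon as it has it.

    The last node finishes after the full file has crossed one link, plus one
    chunk time for each extra hop the final chunk must travel.
    """
    return (size_mb + (n_clients - 1) * chunk_mb) / bw_mb_s


def split_and_reshare(size_mb, n_clients, bw_mb_s):
    """Head sends 1/N to each client, then clients exchange their pieces.

    Phase 1: the head uploads size_mb in total.  Phase 2: every client uploads
    its piece to the other N-1 clients while downloading theirs.  The phases
    are not overlapped here, so this is a pessimistic figure.
    """
    piece = size_mb / n_clients
    return size_mb / bw_mb_s + (n_clients - 1) * piece / bw_mb_s
```

With illustrative figures of a 1024MB file, 8 clients, 100MB/s links and 1MB
chunks, the model gives about 82 s for the head sending to each client in
turn, about 10 s for the chain, and about 19 s for split-and-reshare without
overlapping the two phases.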

In the typical case where each of 8 nodes needs only its own 128MB of a 1GB
input, an MPI broadcast of the whole 1GB wouldn't be worth it.  Instead just
send 128MB to each client with MPI_Send.  In general I see a higher percentage
of peak bandwidth with MPI than I do with NFS, but NFS can be tuned to reach a
reasonably high fraction of wire speed as well.
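
The arithmetic behind that, as a small Python sketch.  It assumes the 1GB
input splits evenly across the 8 nodes and uses binary units so that
1GB / 8 = 128MB; per_node_bytes is a hypothetical helper name.

```python
MB = 1024 * 1024


def per_node_bytes(total_bytes, n_nodes, broadcast):
    """Bytes each node must receive: the whole input if it is broadcast,
    or only an equal share if the head sends each node its own part."""
    return total_bytes if broadcast else total_bytes // n_nodes


total = 1024 * MB   # the 1GB input
nodes = 8

bcast_each = per_node_bytes(total, nodes, broadcast=True)
send_each = per_node_bytes(total, nodes, broadcast=False)

print(bcast_each // MB, "MB received per node with a broadcast")
print(send_each // MB, "MB received per node with individual sends")
print(bcast_each * nodes // MB, "MB delivered in total with a broadcast")
print(send_each * nodes // MB, "MB delivered in total with individual sends")
```

So the broadcast makes every node receive 8 times more data than it will use.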

Keep in mind that it's not hard to become disk limited on the head node.  You
might want to take a look at how you are reading the files and the disk
bandwidth available before you go optimizing the network layer.
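
One quick way to check the disk side is to time a plain sequential read of an
input file on the head node.  A minimal sketch in Python; the function name
and the 4MB block size are my own choices, not anything tuned:

```python
import time


def read_throughput_mb_s(path, block_size=4 * 1024 * 1024):
    """Read `path` sequentially and return (bytes_read, MB per second)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    mb_s = (total / (1024 * 1024)) / elapsed if elapsed > 0 else float("inf")
    return total, mb_s
```

A file that was read or written recently may be served from the page cache, so
for a real disk figure use a file that is not cached (for example one larger
than RAM).  If the result is well below what the network can carry, the disk
is the thing to fix first.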