[Beowulf] using two separate networks for different data streams

William Gropp gropp at mcs.anl.gov
Sat Jan 28 12:01:03 PST 2006

At 06:28 PM 1/27/2006, Dan Stromberg wrote:
>On Fri, 2006-01-27 at 19:57 +0100, Daniel Pfenniger wrote:
> >
> > Ricardo Reis wrote:
> > >
> > > First, hi all, and thanks for your answers. They were truly useful. Which
> > > brings me to...
> > >
> > > On Fri, 27 Jan 2006, Mark Hahn wrote:
> > >
> > >> I wonder whether anyone has critically evaluated whether this is
> > >> important.
> > >> cluster people I talk to like to say fuzzy things like "separate networks
> > >> make the cluster breathe better".
> > >>
> > >> as much as I admire car analogies, I observe that when apps are doing IO,
> > >> they tend not to be doing MPI.  if your workload is like that, bonding
> > >> rather than partitioning would actually improve performance.  I wonder
> > >> whether the partitioning approach might actually reflect other constraints,
> > >> such as using half-duplex hubs, or low-bisection networks.
> >
> > The network for MPI should in many cases have low latency, so it is expensive
> > (Myrinet, InfiniBand, etc.) compared with Ethernet.  The I/O, NFS and
> > system network does not need low latency, and so can be added at bargain
> > cost, with the added benefit that it provides a control network to
> > tweak the nodes remotely when the expensive low latency network is down.
>That leads to a question for the compute cluster we're currently
>planning to buy here at UCI:
>Is there a way of characterizing in what proportion a given application
>relies on OpenMP, and how much the application depends on MPI (and hence
>MPI network latency) - other than speaking with application developers
>to get their intuitive feel, that is?  :)
>We're looking to buy a Gigabit Ethernet network for the MPI on this, but
>if that's obscenely high latency, and the primary application the
>cluster's being purchased for is heavily dependent on MPI, then we might
>want to be ignoring the GigE and going for something else.
>Any thoughts?

If you can get your users to relink (but not recompile) their MPI 
applications, there are a number of tools that you can use to understand 
the communication needs of those applications.  For example, FPMPI2 
(www.mcs.anl.gov/fpmpi) collects summary information about each MPI 
communication routine and separates the information into messages of 
various sizes; this lets you see how much time you are spending on short 
(latency-sensitive) messages.  There are more sophisticated tools that can 
allow you to estimate the performance of those applications under changes 
in the latency and bandwidth of the MPI implementation.
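
As a rough illustration of that kind of estimate (this is not how FPMPI2 or any particular tool works), a per-size message summary can be fed into the simple cost model t(n) = latency + n / bandwidth. The sketch below is in Python; the function name comm_time, the message histogram, and the two sets of network parameters are all made-up placeholders, not measurements of any real application or interconnect.

```python
# Sketch: estimate point-to-point communication time from a message-size
# histogram using the simple model  t(n) = latency + n / bandwidth.
# All numbers below are illustrative placeholders, not measurements.

def comm_time(histogram, latency_s, bandwidth_bytes_per_s):
    """Return (total_seconds, latency_seconds) for a {size_bytes: count} histogram."""
    # Every message pays the fixed per-message latency once.
    latency_part = sum(count * latency_s for count in histogram.values())
    # The payload of every message is moved at the given bandwidth.
    transfer_part = sum(count * size / bandwidth_bytes_per_s
                        for size, count in histogram.items())
    return latency_part + transfer_part, latency_part

# Hypothetical per-run message counts by size (bytes -> number of messages).
histogram = {64: 1_000_000, 8_192: 50_000, 1_048_576: 200}

# Hypothetical network parameters: (latency in seconds, bandwidth in bytes/s).
networks = {
    "gige-like": (50e-6, 100e6),
    "low-latency": (5e-6, 900e6),
}

for name, (lat, bw) in networks.items():
    total, lat_part = comm_time(histogram, lat, bw)
    share = 100 * lat_part / total
    print(f"{name}: {total:.2f} s total, {share:.0f}% spent on per-message latency")
```

If most of the estimated time comes from the per-message latency term, the application is dominated by short messages and is likely to be sensitive to the choice of interconnect. The model ignores overlap of communication with computation, collectives, and contention, so treat the output as a first guess only.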



William Gropp
