[Beowulf] Performance characterising a HPC application

stephen mulcahy smulcahy at aplpi.com
Wed Mar 21 05:42:20 PDT 2007


Hi,

Mark Hahn wrote:
> well, if the node is compute-bound, nearly all time will be user-time.
> if interconnect-bound, much time will be system or idle.  if system time
> dominates, then cpu or memory is too slow.  if there is idle time, you
> bottleneck is probably latency (perhaps network, but possibly also of
> whoever you're communicating with - compute node or fileserver.)

Thanks - it's starting to look like latency is the issue here alright.
There is plenty of idle time on the processors. Switching to jumbo
frames gave an overall speed-up of the model of about 30%, which
suggests the total number of packets being sent was reduced, and that
per-packet latency is the current bottleneck.
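As a back-of-the-envelope check on the packet-count theory (the 256 KiB
message size and the header-overhead figure below are assumptions for
illustration, not measurements from this cluster):

```python
# Rough estimate: packets needed per message at standard vs jumbo MTU.
# Message size and header overhead are hypothetical placeholders.

ETH_IP_TCP_OVERHEAD = 40          # IPv4 + TCP headers, no options (bytes)

def packets_per_message(message_bytes, mtu):
    """Number of MTU-sized packets needed to carry one message."""
    payload = mtu - ETH_IP_TCP_OVERHEAD
    return -(-message_bytes // payload)   # ceiling division

message = 256 * 1024                      # assume a 256 KiB exchange
std = packets_per_message(message, 1500)
jumbo = packets_per_message(message, 9000)
print(std, jumbo, std / jumbo)            # ~6x fewer packets with jumbo frames
```

If per-packet latency dominates, roughly 6x fewer packets per exchange is
consistent with the sort of speed-up we saw.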

>>>> 4. headnode bound.
>>>
>>> do you mean for NFS traffic?
>>
>> More in terms of managing the responses from the compute nodes.
> 
> just job start/completes?  that's normally pretty trivial, though some
> queueing systems make a complete hash of it...

I don't have a good understanding of the model code itself, but it seems
to periodically send data to/from the head-node, presumably for
aggregation and storage ... I wasn't sure whether the head-node's
ability to process this data was a bottleneck, but since there is
plenty of idle time and spare memory on that system too, it keeps
coming back to latency.

> if the net is a bandwidth bottleneck, then you'd see lots of back-to-back
> packets, adding up to near wire-speed.  if latency is the issue, you'll see
> relatively long delays between request and response (in NFS, for instance).
> my real point is simply that tcpdump allows you to see the unadorned truth
> about what's going on.  obviously, tcpdump will let you see the rate and
> scale of your flows, and between which nodes...

My concern is that I won't see the wood for the trees if I'm looking at
raw tcpdump data - I don't mind a little adorning of my truth if
possible :) But I might give it a shot now that I have some idea of what
I'm looking for. Do you eyeball raw tcpdump output or browse it in
Wireshark?
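If I do go the tcpdump route, I'm picturing something like the sketch
below for pulling request/reply gaps out of a capture. The sample lines
are made up, but real `tcpdump -tt -n` output has the same
timestamp-first shape:

```python
# Sketch: measure request->reply delays from tcpdump text output.
# Sample lines are illustrative, not a real capture.

sample = """\
1174479000.000123 IP 10.0.0.5.800 > 10.0.0.1.2049: NFS request
1174479000.004511 IP 10.0.0.1.2049 > 10.0.0.5.800: NFS reply
1174479000.010002 IP 10.0.0.5.800 > 10.0.0.1.2049: NFS request
1174479000.010450 IP 10.0.0.1.2049 > 10.0.0.5.800: NFS reply
"""

def reply_gaps(lines):
    """Yield the delay in seconds between each request and its reply."""
    pending = None
    for line in lines:
        ts = float(line.split()[0])   # tcpdump -tt: epoch timestamp first
        if "request" in line:
            pending = ts
        elif "reply" in line and pending is not None:
            yield ts - pending
            pending = None

gaps = list(reply_gaps(sample.splitlines()))
print(["%.1f ms" % (g * 1000) for g in gaps])
```

Long, consistent gaps here would point at latency rather than bandwidth,
per your earlier comment.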

> well, maybe.  it's a big jump from 1x Gb to IB or 10GE - I wish it were
> easier to advocate Myri 2G as an intermediate step, since I actually don't
> see a lot of apps showing signs of dissatisfaction with ~250 MB/s
> interconnect - and IB/10GE don't have much advantage, if any, in latency.

How does Myrinet compare price-wise to IB/10GE? How does it compare in
terms of reliability?

> http://www.sharcnet.ca/~hahn/m-g.C is a benchmark I'm working on.  it's
> mainly set up to just probe bw and latency for every pair of nodes in a
> cluster (obviously diagnostic).  I have some simple scripts to turn the
> results into some decent images.  it's obviously a work in progress, but
> has some nice properties.  I'm thinking of collecting at least a low-res
> histogram for each measure, rather than just min/avg/max, since the
> lat/bw distributions might be quite interesting.

Kewl - sounds nice and straightforward - I'm happy to give this a shot
if you want. I'll be onsite again tomorrow.
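For what it's worth, here's how I'd picture the low-res histogram idea -
log-spaced bins for the latency samples. The bin edges and the sample
data below are my own invention, nothing taken from m-g.C:

```python
# Sketch of a "low-res histogram" for per-pair latency samples:
# fixed log-spaced bins capture the distribution's shape far better
# than min/avg/max alone. Bin edges are an arbitrary choice.

import math

def log_histogram(samples_us, base=2.0, nbins=12):
    """Count samples (microseconds) into log-spaced bins [base^i, base^(i+1))."""
    counts = [0] * nbins
    for s in samples_us:
        i = min(max(int(math.log(s, base)), 0), nbins - 1)
        counts[i] += 1
    return counts

# Fake bimodal data: mostly fast hops, plus a couple of slow outliers.
samples = [45, 50, 52, 60, 48, 51, 900, 1100]
print(log_histogram(samples))
```

A bimodal shape like that would be invisible in min/avg/max but obvious
in even a dozen bins.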

> I'm guessing you're simply bandwidth-limited, though it's unclear
> whether this is a simple bottleneck at the server, or affects "basal"
> intra-node communication as well.

The peaks of bandwidth usage are too small to suggest that it's
bandwidth-limited, so I'm still leaning towards latency.

> I think that's a reasonably good switch.  one interesting thing about it
> is that it supports up to 2 10G ports.  if it turns out that your nodes
> are frequently waiting on your server, adding a 10G module, XFP and NIC
> might be a very nice tune-up.  that assumes that the server can _do_
> something at much greater than 1x Gb speeds, of course!

Interesting suggestion - although I had a bit of hair-pulling getting
the switch to handle jumbo frames smoothly, so I'm wondering how much
hassle a 10G module would be :)

I'll do some research on this though.

Thanks again for your comments - most helpful.

-stephen

-- 
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
   GMIT, Dublin Rd, Galway, Ireland.      http://www.aplpi.com
