[Beowulf] The move to gigabit - technical questions

Wed Mar 16 08:10:33 PST 2005

At 10:52 AM 3/16/2005 -0500, Robert G. Brown wrote:
>On Wed, 16 Mar 2005, Robert G. Brown wrote:
>
>> On Tue, 15 Mar 2005, Vincent Diepeveen wrote:
>> 
>> > At 05:41 PM 3/14/2005 -0500, Glen Gardner wrote:
>> > >Gigabit will be a little faster than 100Mbit on a small cluster, but not
>> > >a lot.
>> > 
>> > What is 'not a lot'?
>> > 
>> > I would guess it's factor 10 faster in bandwidth?
>
>I hate to reply to myself, but I should have noted that the below
>applies to BANDWIDTH, not latency, dominated communications.  It was

Well Robert, it's obvious you understood me correctly. I was talking about
bandwidth, and I find a factor of 8-9 in bandwidth from moving from
100mbit to 1gbit a *considerable* jump forward, especially because such
network cards are priced so that everyone can afford them.

For latency and even more bandwidth we all know there are high-end network
cards like Dolphin, Quadrics, Myrinet and hopefully also InfiniBand. (I see
a lot of postings regarding InfiniBand on the linux kernel list. Which are
the actual *brands* that sell a concrete card right now that can be bought
stand-alone? If there are any, URL?)

For distributed shared memory (DSM), which is what supercomputers need,
there is Quadrics. Is there any other high-end network card where one can
access the RAM on the cards directly? (Like with the Cray shmem library
that Quadrics has; I can find it on their homepage.)

Obviously for any latency issue one doesn't buy cheapo gigabit cards. So
the remaining interesting thing is what kind of bandwidth they can give us.

On my 100mbit LANs I measured that I could effectively push roughly 60 mbit
of throughput (bidirectional, so 60 mbit in total of sends and receives).
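
A rough way to get a number like this is to time a bulk transfer over TCP.
The sketch below is only an assumption about how such a test could look, not
the tool behind the 60 mbit figure. The names `_sink` and
`measure_throughput` are hypothetical, it times one direction only, and as
written it runs over loopback, so the receiving side would have to run on a
second machine before the result says anything about a LAN.

```python
import socket
import threading
import time


def _sink(server_sock, total_bytes, result):
    """Accept one connection and count bytes until total_bytes have arrived."""
    conn, _ = server_sock.accept()
    with conn:
        received = 0
        while received < total_bytes:
            chunk = conn.recv(65536)
            if not chunk:          # peer closed early
                break
            received += len(chunk)
        result["received"] = received
        conn.sendall(b"k")         # one-byte ack so the sender times delivery


def measure_throughput(host="127.0.0.1", total_bytes=8 * 1024 * 1024):
    """Send total_bytes over TCP and return (bytes_received, Mbit/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))         # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink,
                                args=(server, total_bytes, result))
    receiver.start()

    payload = b"\0" * total_bytes
    with socket.create_connection((host, port)) as client:
        start = time.perf_counter()
        client.sendall(payload)
        client.recv(1)             # wait for the ack from the sink
        elapsed = time.perf_counter() - start

    receiver.join()
    server.close()
    mbit_per_s = result["received"] * 8 / elapsed / 1e6
    return result["received"], mbit_per_s
```

Calling `measure_throughput()` with the default 8MB payload returns the byte
count and the rate in Mbit/s. Dedicated tools such as netperf or netpipe
remain the better choice for real measurements.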

An 8 to 9 times higher bandwidth would give roughly 480 to 540 mbit,
which is about 60 to 67 MB/s. Obviously there will be theoretical test
setups getting more.

That is not so interesting when discussing how LANs work in practice in
operational systems.
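
The arithmetic behind this estimate is easy to check. A minimal sketch,
assuming the 60 mbit measured on 100mbit simply scales by the speedup
factor; the function names and the choice of factors are illustrative only.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0


def scaled_throughput(measured_mbit, factor):
    """Scale a measured throughput by an assumed speedup factor."""
    return measured_mbit * factor


measured_fast_ethernet = 60.0    # Mbit/s, the figure measured on 100mbit

for factor in (8.0, 8.5, 9.0):   # assumed speedup factors for gigabit
    mbit = scaled_throughput(measured_fast_ethernet, factor)
    print(f"factor {factor}: {mbit:.0f} Mbit/s = {mbit_to_mbyte(mbit):.2f} MB/s")
```

A factor of 8 gives 480 Mbit/s, which is 60 MB/s; a factor of 9 gives
540 Mbit/s, which is 67.5 MB/s.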

Actually 60MB/s would be too little for my chess program.

lmbench is not so impressive as a benchmark. It doesn't measure TLB
thrashing of main memory.

Over the past years I've had so many discussions with professors who do not
understand the difference between the bandwidth and latency that lmbench
reports and what their actual application does, which is just TLB
thrashing, something lmbench doesn't measure very accurately.

An example: on my dual K7s, a single read of 8 bytes from a 400MB buffer
takes about 400 ns on average, using my own simplistic test set. Such
researchers however work with something like 60 ns on paper, as lmbench
gives it to them.

I feel the word latency has been overused in that respect. The word
'bandwidth' however is very clear in this context.

>implicit from Vincent's reply, but I should have made it explicit.  For
>lots of small packets gigabit's advantage probably won't be 10x, and
>this is another case where a higher-end network is indicated.  However,
>the latency probably won't change a lot with different switches or
>switch arrangements, either, except for the worse along paths with
>multiple switch hops in between.
>
>I should also have pointed out to the original poster that there are
>nice tools (e.g. netperf, netpipe, lmbench) that will help him analyze
>his raw network performance outside of a particular application that
>might well have poor "networking" performance for reasons that have
>nothing to do with the actual network.  There are also lots of articles
>out there both in the list archives, in Cluster World magazine back
>issues, in linux magazine back issues, and on various websites
>(including mine and brahma's) that can really help one understand just
>what ethernet is and how it works and what its numbers should be.  It is
>the most widely implemented and widely understood network, good, bad,
>and ugly features notwithstanding.
>
>   rgb
>
>> 
>> (Maybe, you don't get QUITE 100% of the raw clock advantage in all
>> applications on all hardware, Vincent;-).  However, for most
>> applications on most hardware you >>should<< get a significant advantage
>> -- 80-95% of 10x, or 8-9.5x.  Not just "a little".
>> 
>> A really, really cheap switch might have problems with bisection
>> bandwidth and chop this down for simultaneous flat-out bidirectional
>> data streams, but relatively few parallel applications engage in
>> flat-out bidirectional communications.  Even if it does, your problem is
>> more likely to be with resource contention (e.g. two hosts trying to
>> talk to a third at the same time) than it is with actual bandwidth
>> oversubscription.  This is what Vincent is suggesting that you look into
>> (or let us look into:-) below.
>> 
>> If your particular usage pattern does create resource contention, then
>> you might well need to either hand-optimize the pattern to avoid
>> saturating your cheap hardware, create a network with cheap components
>> that effectively breaks up the pathological communications pattern
>> (which it sounds like is what you actually did) or buy better hardware
>> (either better gigE switches or a "real" HPC network).
>> 
>> However you shouldn't really trash gigE itself -- it isn't at fault and
>> your results aren't typical.
>> 
>>     rgb
>> 
>> > 
>> > >I ended up using 5 cheap gigabit switches to make a gigabit concentrator
>> > >for my 12 node cluster.
>> > >It eliminated the tendency for the network to saturate under a heavy load.
>> > 
>> > Very interesting, can you post a connection scheme and routing table?
>> > 
>> > >It also let me use gigabit network cards in my I/O node and controlling 
>> > >node with a small improvement in file I/O.
>> > 
>> > Streaming i/o or random access?
>> > 
>> > cheapo disk arrays get what is it, 400MB/s handsdown or so?
>> > 
>> > that's raid5 readspeed, plenty security at a raid5 array.
>> > 
>> > >The compute nodes remained with 100 Mbit to conserve power. The setup
>> > >works rather nicely.
>> > 
>> > what type of software do you run at it,
>> > embarrassingly parallel software?
>> > 
>> > Vincent
>> > 
>> > >Glen
>> > >
>> > >Vincent Diepeveen wrote:
>> > >
>> > >>Good evening,
>> > >>
>> > >>It's interesting to investigate what gigabit can do for small home
>> > >>clusters.
>> > >>
>> > >>Any latency oriented approach is doomed to fail obviously at gigabit. But
>> > >>they're cheap. For 40 euro i see several getting offered already.
>> > >>
>> > >>First important question is of course how much system time those NIC's eat
>> > >>when fully loading their bandwidth.
>> > >>
>> > >>Example, i have an old dual k7 here with pci 2.2 (32 bits 33Mhz).
>> > >>Suppose i put a gigabit card in it.
>> > >>
>> > >>In say 6 messages a second i ship 8MB data at a time. Ship and send in turn.
>> > >>
>> > >>So it ships a packet of 8MB, then receives a packet of 8MB.
>> > >>
>> > >>Other than the cost of the thread to store the packet to RAM, does such a
>> > >>card in any way stop or block the cpu's which are 100% loaded with
>> > >>searching software (my chessprogram diep in this case)?
>> > >>
>> > >>What penalty other than that thread handling the message is there in terms
>> > >>of system time reduction to the 2 processes searching?
>> > >>
>> > >>Oh btw, i assume that gigabit can handle 48MB/s user data a second?
>> > >>
>> > >>Vincent
>> > >>
>> > >
>> > >-- 
>> > >Glen E. Gardner, Jr.
>> > >AA8C
>> > >AMSAT MEMBER 10593
>> > >Glen.Gardner at verizon.net
>> > >
>> > >
>> > >http://members.bellatlantic.net/~vze24qhw/index.html
>
>-- 
>Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
>Duke University Dept. of Physics, Box 90305
>Durham, N.C. 27708-0305
>Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>