[Beowulf] Selection from processor choices; Requesting Guidance

Scott Atchley atchley at myri.com
Fri Jun 16 05:40:08 PDT 2006


On Jun 16, 2006, at 1:43 AM, <laytonjb at charter.net> wrote:

>>>> Initially, we are deciding to use a Gigabit Ethernet switch and
>>>> 1 GB of RAM at each node.
>>
>> that seems like an odd choice.  it's not much ram, and gigabit is
>> extremely slow (relative to alternatives, or in comparison to
>> on-board memory access.)
>
> This is a common misconception that people make (and Mark is one of
> the best on this list). I'm not directing my comments at Mark, but
> using his comment as a platform for my soapbox :) The misconception
> is that you need a low-latency network for CFD codes because of the
> message sizes.
>
> Let me spew some benchmarks I've seen.
>
> - One CFD code that I've benchmarked got over 80% scaling at 200 CPUs
> (2 CPUs per node) with plain GigE.
> - This same code only got about 12% faster performance on Myrinet 2G
> than GigE. It also only got a few percent better than Myrinet 2G with
> IB. Infinipath was about the same as IB but maybe a bit faster.

Was this with GM or MX? On Fluent and StarCD, I see anywhere from 4%  
to 15% speedup on the _same_ Myrinet 2G hardware when moving from GM  
to MX. Since the hardware did not change (i.e. no additional  
bandwidth), the performance increase was due to the much lower  
latency of MX compared to GM.
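
If you want to see the latency difference directly, a small ping-pong
test run over each MPI/interconnect combination on the same pair of
nodes will show it. The sketch below is just my illustration (not a
benchmark from this thread); build and launch details depend on your
MPI stack.

/* Minimal ping-pong latency sketch: run with exactly 2 ranks on two
 * nodes. Reports the average one-way latency for small messages; no
 * warm-up iterations, so treat it as a rough comparison only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char buf[8] = {0};            /* small message: latency-dominated */
    int rank, i, iters = 10000;
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, sizeof(buf), MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg one-way latency: %.2f usec\n",
               (t1 - t0) / (2.0 * iters) * 1.0e6);

    MPI_Finalize();
    return 0;
}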

Also, when using MX, if the code uses mostly large messages, you can
set an environment variable, MX_RCACHE=1, which enables a registration
cache that can improve large-message performance (i.e. reach maximum
bandwidth sooner). With both Fluent and StarCD, I do not see any
benefit from using this option.
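
For codes that do benefit, you would typically just export MX_RCACHE=1
in the job environment (how your mpirun propagates it to the ranks
depends on the MPI stack). If you would rather set it from inside the
application, a sketch like the one below should work, under the
assumption that MX reads the variable when it is brought up inside
MPI_Init; check the MX docs for your stack.

/* Hedged sketch: set MX_RCACHE=1 programmatically, before MPI_Init.
 * Assumption: the MX library picks up the variable when it initializes
 * inside MPI_Init; exporting it from the shell or job script is the
 * more usual route. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Enable the MX registration cache for large-message codes. */
    setenv("MX_RCACHE", "1", 1);    /* must happen before MPI_Init() */

    MPI_Init(&argc, &argv);

    /* ... application code using mostly large messages ... */

    MPI_Finalize();
    return 0;
}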

> - The same code only lost about 1% in performance in switching to
> dual-core compared to single-core CPUs out to about 16 or 32 CPUs
> total (that was the limit of the testing). This isn't a
> network-related benchmark, but I thought I would throw it out for
> fun :)
> - On Overflow2, we've seen IB and IP to be about the same (at least
> well within the noise) for the problem sizes we've tested (fairly
> large) and the range of CPU counts.
>
>    We have some experience with other CFD codes such as Star-CD,
> Fluent, CFD++, CFL3D, Overflow2, etc., and they all show about the
> same general trends, although there are some differences. Many of the
> differences have to do with the algorithms used and the
> implementations of those algorithms. For example, is the code
> structured or unstructured? Does it do overlapping (chimera) grids?
> How "overlapped" are the grids? How load-balanced are the problem and
> the algorithm? Are you doing node-centered or cell-centered (in the
> case of unstructured)? There are many things to consider. In many
> cases GigE is good enough and you can't beat the price. Level 5 looks
> very attractive and may be the price/performance king. GAMMA is
> pretty cool as well, although I don't have any benchmarks (yet).
> Myrinet 2G is also good and is close to the price/performance king.
> IB, IP, and Quadrics are all good too, but they may not be the best
> in terms of price/performance (I even have one benchmark where IB is
> slower than GigE. I'm still trying to explain that one :) ).
>
> Here are some other observations:
>
> - The network/MPI combination is fairly critical to good performance
> and to price/performance. I have done some benchmarks where the right
> MPI library on GigE produces faster results than a bad MPI library on
> Myrinet. Seems counter-intuitive, but I've seen it.
> - The MPI library is pretty important to good performance,
> particularly on GigE. One benchmark I did showed over a 2X
> improvement in performance with LAM compared to MPICH1. I know MPICH1
> is really old and we have a shiny new MPICH2, but you would be
> surprised how many people start with MPICH1 and how many people stick
> with it.
>
> So, I'm not picking on Mark, but I wanted to throw out some random
> observations I've made over the years. Not that I'm an expert, but
> I've got a few CFD/cluster bruises and thought I would show people
> why I got them :)
>
> Jeff



