[Beowulf] torus versus (fat) tree topologies

Mark Hahn hahn at physics.mcmaster.ca
Sun Nov 14 12:52:45 PST 2004

>   I guess I haven't mentioned it yet but I'm a PhD student in the
> Mechanical and Aerospace Engineering department at Syracuse University
> in upstate New York.  Prior to my arrival here I only had superficial
> knowledge of clustering and have subsequently spent the last year
> researching, reading, configuring, testing, etc ...  all of this while
> working on my PhD research (CFD).  So I'm essentially the administrator
> _and_ major user of it. I have to admit it's kinda nice to have almost
> exclusive use of that much horsepower (64 Opteron 242's) for my work!

that's quite reasonable if you're planning to use off-the-shelf hardware.
that is, if you're an engineer doing research using HPC.  if you're 
actually doing research into the implementation of CFD using HPC,
then you should probably look a bit closer at adaptivity, for instance,
which winds up making FEM much less nearest-neighbor...

> benefit to torus topology for this case it might be an option.  BTW, a
> managed HP Pro/Curve (forget model) 36-port gigabit switch is currently
> used, which possibly may also be hindering performance.

the port count indicates that's an older-generation switch, probably
with poorer bandwidth than current models.

> support for Fluent with their product.  Dolphin has been _extremely_
> helpful in this respect, providing an SCI cluster for me to test Fluent
> and offering suggestions for running it (thanks Simen).

I'd be astonished if all of the tier-1 vendors didn't have a test cluster
available for the asking, probably with fluent installed.

> simply because there are not enough users of it.  As a consequence, I am
> looking to find the "best" interconnect solution which will allow a few
> people use of most or all of the CPUs for the jobs we run.

there's always a danger of over-benchmarking, but you should probably
see if you can get access to an IB cluster.  for CFD, I'm a little 
surprised you appear to care so much about latency, since I'd expect
your workload to have the usual volume/surface-area scaling, and 
thus to do most of its work within a single node, needing only moderate
bursts of bandwidth for nontrivial problem sizes.
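to make the volume/surface-area point concrete, here's a back-of-envelope
sketch: for a cubic subdomain of n cells per side, compute scales with the
n^3 interior cells but halo exchange scales with the 6*n^2 face cells, so
the communication-to-computation ratio falls off as 6/n (the cell counts
below are illustrative, not from any particular CFD code):

```python
# toy model of halo-exchange overhead for a cubic subdomain
# (cells_per_side values are hypothetical, chosen just to show the trend)

def comm_to_compute_ratio(cells_per_side: int) -> float:
    """Surface cells exchanged per step divided by volume cells computed."""
    volume = cells_per_side ** 3        # work scales with volume
    surface = 6 * cells_per_side ** 2   # halo exchange scales with surface area
    return surface / volume             # simplifies to 6 / cells_per_side

for n in (10, 50, 100):
    print(f"{n}^3 subdomain: comm/compute ~ {comm_to_compute_ratio(n):.3f}")
```

so the bigger the per-node subdomain, the less the interconnect's latency
matters relative to raw compute.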

from looking at list prices on the web, Myrinet, IB and Dolphinics have
similar per-port prices which are noticeably lower than Quadrics
but also dramatically higher than gigabit.  I suspect most people 
would agree that Quadrics is a latency specialist, at least for
applications that aren't purely nearest-neighbor.  OTOH, for cheap nodes, you
should probably consider whether spending 50% of the node price 
makes sense for the performance boost.  (I see 242-based servers 
starting at around $2k list, and your total gigabit cost would be 
less than $100/port.)
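the arithmetic behind that 50% figure, using the ballpark numbers above (the
$1000/port figure for the fast interconnects is my assumption, implied by the
"50% of node price" remark rather than a quoted price):

```python
# rough cost-fraction sketch using the list prices mentioned in the post
# (all figures are ballpark assumptions, not vendor quotes)

node_price = 2000          # ~list price of a 242-based server, USD
gigabit_per_port = 100     # upper bound on gigabit cost per port
fast_ic_per_port = 1000    # assumed per-port price for Myrinet/IB/SCI

def interconnect_fraction(port_cost: float, node: float = node_price) -> float:
    """Interconnect cost per node as a fraction of the node price."""
    return port_cost / node

print(f"gigabit:   {interconnect_fraction(gigabit_per_port):.0%} of node price")
print(f"fast IC:   {interconnect_fraction(fast_ic_per_port):.0%} of node price")
```

the question is then whether that extra ~45% of node price buys a
proportionate speedup for your workload.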

More information about the Beowulf mailing list