[Beowulf] cluster building for teaching (on the cheap)

Douglas Eadline deadline at clustermonkey.net
Sun Aug 27 18:30:27 PDT 2006


> I've never done this before and I'd appreciate your collective input:
> (1) Does Linux/MPICH/gcc/g95 work pretty well with dual core opteron

From the OS perspective the extra core looks like just another CPU, so
it works with MPI (one would assume MPI messages passed through shared
memory will be faster than same-host TCP transfers, but this deserves testing).
gfortran (and g95) have no special provisions for the extra core, but none are
really needed if you are using MPI. Some commercial compilers offer
auto-parallelization switches, but how effective these are
depends on what you are doing (i.e., is it better to treat
each core as an "MPI node," or to run inner loops as
parallel threads and handle the outer loops with MPI calls?).
There is no easy answer at this point.
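The shared-memory vs. same-host-TCP question above really is worth measuring
rather than assuming. Here is a minimal (non-MPI) sketch of that kind of test:
it copies a buffer through memory and then pushes the same bytes through a
loopback TCP socket. The 64 MB payload and 64 KB chunk size are arbitrary
illustration choices, not anything from the original post.

```python
# Compare an in-memory buffer copy against a loopback TCP transfer
# of the same payload. Purely illustrative; real MPI shared-memory
# transports behave differently, so treat this as a ballpark only.
import socket
import threading
import time

SIZE = 64 * 1024 * 1024          # 64 MB payload (arbitrary)
CHUNK = 64 * 1024                # receive chunk size (arbitrary)
payload = bytes(SIZE)

# In-memory "transfer": just copy the buffer once.
t0 = time.perf_counter()
copy = bytes(payload)
mem_s = time.perf_counter() - t0

# Loopback TCP transfer of the same payload.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
received = []

def reader():
    conn, _ = srv.accept()
    got = 0
    while got < SIZE:
        buf = conn.recv(CHUNK)
        if not buf:
            break
        got += len(buf)
    received.append(got)
    conn.close()

t = threading.Thread(target=reader)
t.start()
cli = socket.create_connection(("127.0.0.1", port))
t0 = time.perf_counter()
cli.sendall(payload)
cli.close()
t.join()                          # wait until the full payload arrives
tcp_s = time.perf_counter() - t0
srv.close()

print(f"memory copy:  {SIZE / mem_s / 1e9:.2f} GB/s")
print(f"loopback TCP: {SIZE / tcp_s / 1e9:.2f} GB/s")
```

On most machines the memory copy wins by a wide margin, which is why MPI
implementations use shared memory for ranks on the same host when they can.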

> (2) Am I better off buying 8 of the cheapest Dells I can find and
> networking those together?

Depends. If you look at this recent post:


and download the white paper, you will see that
for some applications your money may be better spent on
"desktop" hardware rather than "server" hardware.
(Hint: if you have an aversion to signing up for the white paper,
a link to the raw data is in the text box on the second page.)

As a point of reference, the small 8-node Sempron (1.7 GHz)
cluster that cost $2500 was faster than a quad-Opteron
(2 GHz) system on one of the Gromacs benchmarks (dppc).
Of course, as we all know, YMMV and it all depends on the application.
And this data point does not mean 8 Semprons are faster than
four Opterons in all cases...

The value cluster, while a bit dated, still tells a good story. Jeff Layton
and I were able to get a price-to-performance of $171/GFLOPS for HPL
(14.5 GFLOPS total).
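For anyone checking the arithmetic, price-to-performance is just total cost
divided by sustained GFLOPS. Working backward from the two figures quoted
above gives the implied cluster cost (this is a back-calculation, not a
number from the original article):

```python
# Price-to-performance = total cost / sustained GFLOPS.
# Back-calculate the implied total cost from the quoted figures.
price_per_gflops = 171.0   # dollars per GFLOPS, from the post
total_gflops = 14.5        # sustained HPL GFLOPS, from the post

implied_cost = price_per_gflops * total_gflops
print(f"implied total cost: ${implied_cost:.0f}")   # about $2480
```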


> (2.5) Do you pay a premium for a 1-u or 2-u enclosure?


> (3) In general (processor type, peripherals held constant), is it
> cheaper to buy 2x standard processor boxes, 1 dual processor box, or
> half of a dual processor, dual core box?

There is no hard and fast rule; running your code helps determine what is best.
Here are some things to consider, however. Fast nodes can be built
for a low cost, but some Ethernet choices may throttle the available
performance. I'm partial to Intel Ethernet chip-sets and SMC switches
because I have been able to coax them into performing well.
And there are some parameters to play with (interrupt mitigation,
MTU size, MPI library, compiler, for instance) that can make a big
difference. I have found that there are no hard and fast rules for these
choices either, but a day of testing does wonders.
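One of the knobs mentioned above, MTU size, is easy to inspect before you
start tuning. A minimal sketch for Linux, assuming the usual
/sys/class/net layout (interface names will vary from machine to machine):

```python
# List each network interface and its current MTU on Linux by
# reading the standard /sys/class/net/<iface>/mtu files.
from pathlib import Path

def interface_mtus(sys_net=Path("/sys/class/net")):
    """Return {interface_name: mtu} for all interfaces, or {} if
    /sys/class/net is unavailable (i.e. not a Linux system)."""
    mtus = {}
    if not sys_net.is_dir():
        return mtus
    for iface in sorted(sys_net.iterdir()):
        mtu_file = iface / "mtu"
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus

for name, mtu in interface_mtus().items():
    print(f"{name}: {mtu}")
```

Changing the MTU (e.g. jumbo frames) needs root and a switch that supports
it, so check what your hardware actually negotiates before benchmarking.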


More information about the Beowulf mailing list