[Beowulf] compute node hardware selection

tomislav_maric@gmx.com tomislav.maric at gmx.com
Sat Apr 17 06:19:06 PDT 2010

Hi everyone, 

I have built and played with my home Beowulf cluster running Rocks for a while, and
now I would like to construct a bigger one. Being a noob at HPC, I bought a book called
Building Clustered Linux Systems, which is most excellent. The cluster is to be
used for CFD and FEM computations. A side note: I have an MSc in computational
science, so I know my applications in detail, and the numerics behind the calculations, but
on the HPC side I'm kind of a noob (I've been learning it for only one of the required
25 years ;) ).

I have some questions regarding the choice of hardware architecture for the compute nodes.

Since I am on a low budget, I would like to build a 16-node cluster out of COTS
electronics (for my budget, Xeons and Opterons are definitely not COTS), and I am having
trouble deciding between dual-core and quad-core, and between Intel and AMD processors,
for the compute nodes.

gcc is used for compilation and the AMD hardware is cheaper, so I'm inclined toward AMD,
but I would appreciate any advice on this: I have been told before that icc
can give up to a 20% speed increase on Intel processors.

Another consideration: since I'm running coarse-grained CFD/FEM simulations, there should
be little benefit from the larger caches of server-class processors like the Opteron and
Xeon, or am I wrong? The working set is really huge, so not much of it can stay resident
in cache long enough to be reused.

I have read that with multi-core processors the system bus can get saturated, so I
am running benchmarks on two standalone machines: one with a dual-core processor and the
other with a quad-core.

The idea is to run a benchmark case of my choice (transient, incompressible multiphase
flow), increasing the case size and the number of processes, to see at what point parallel
scaling is limited by traffic on the system bus, and to estimate the largest simulation
case a single compute node can handle.

How can I keep I/O speeds from skewing my IPC estimate for a single slice? I will set up
RAID 0 on the machine, and with no networking involved, I am not sure there is anything
else I can do to take the I/O impact out of the single-slice benchmark.

I am describing this in detail because I really want to avoid spending the budget the
wrong way. I will appreciate any advice you can spare.
