[Beowulf] Small form computers as cluster nodes - any comments about the Shuttle brand ?

Bogdan Costescu Bogdan.Costescu at iwr.uni-heidelberg.de
Sun Aug 9 06:50:59 PDT 2009


On Fri, 7 Aug 2009, David Ramirez wrote:

> Due to space constraints I am considering implementing a 8-node (+ master)
> HPC cluster project using small form computers. Knowing that Shuttle is a
> reputable brand, with several years in the market, I wonder if any of you
> out there has already used them on clusters and how has been your experience
> (performance, reliability etc.)

I built a cluster of 80 nodes, which will turn 5 this month. It uses 
the Shuttle SB75G2, which supports ECC, has GigE on board (Broadcom) 
and has a power supply that is more than enough for the CPU (PIV 
Northwood 3.2GHz), one SATA HDD, a low-power, low-performance graphics 
card (unfortunately there is no on-board graphics) and an extra GigE 
card (Intel E1000). The extra NIC was added not because of problems 
with the Broadcom chip, but simply to have dedicated networks; the 
Broadcom does PXE just fine, and this is how these nodes have booted 
since they were set up.

I was pleasantly surprised by the reliability of these computers. 
Because they are so tightly packed, they require attention and good 
skills when building them, e.g. using good quality thermal paste to 
avoid local hot spots and routing cables so that they don't obstruct 
the airflow that carries the heat out. About 70 of the 80 are still 
running well today. Most of the failed ones stopped working correctly 
after the 3 years of warranty had expired, so I didn't make much 
effort to find out what was wrong - the main symptom was instability 
under combined CPU and I/O load. Of course, when RAM or HDDs failed 
and were easy to identify as the cause, they were replaced as needed.

As I wrote earlier on this list, the main disadvantage of such SFFs is 
the lack of IPMI support. There is no serial console support in the 
BIOS either, so changing BIOS settings is a pain. Power control can be 
achieved with a PDU, but I didn't go this way because I knew that the 
nodes would always be up and I wouldn't have to press the power 
buttons too often ;-) Another thing to keep in mind is that, because 
they are so compact, they are quite sensitive to the ambient 
temperature - if the A/C fails, expect a sharp rise in internal 
temperature, so setting up monitoring, both environmental and of the 
built-in sensors, is recommended.
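
To illustrate the built-in sensor part, here is a minimal sketch in 
Python. It is not what runs on my nodes; it assumes a Linux node whose 
sensor driver exposes readings through the standard hwmon sysfs files 
(/sys/class/hwmon/hwmon*/temp*_input, in millidegrees Celsius). The 
function names and the 70 C limit are made up for the example, and on 
some kernels the files sit one level deeper, so the path may need 
adjusting.

```python
import glob
import os


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN/tempM": degrees_C} for each readable sensor file."""
    temps = {}
    pattern = os.path.join(root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric sensor: skip it
        chip = os.path.basename(os.path.dirname(path))
        sensor = os.path.basename(path)[:-len("_input")]
        temps[f"{chip}/{sensor}"] = millideg / 1000.0
    return temps


def overheated(temps, limit_c=70.0):
    """Return only the sensors at or above limit_c (an assumed limit)."""
    return {name: t for name, t in temps.items() if t >= limit_c}


if __name__ == "__main__":
    # Print one warning line per hot sensor; cron or a monitoring
    # agent can pick this up and raise an alarm.
    for name, t in sorted(overheated(read_hwmon_temps()).items()):
        print(f"WARNING {name}: {t:.1f} C")
```

Run from cron on each node, something like this gives an early warning 
when the A/C fails; the environmental side still needs a separate 
sensor in the room.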

Good luck!

-- 
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de


