[Beowulf] Tyan S2882

Bernd Schubert bernd-schubert at gmx.de
Thu Sep 28 07:04:02 PDT 2006

On Wednesday 27 September 2006 11:20, Gebhardt Thomas wrote:
> Hi,
> > We are currently deploying Tyan S2882 Dual Opteron Boards, and we have
> > found the system to be quite unstable. After BIOS updates and kernel
> > changes we still get random kernel panics when under load.
> Me too :-(
> We've got a 85 Node Dual Opteron Cluster. I've documented most of the
> crashes on
> http://clust-doc.hrz.uni-marburg.de/index.php/Hardware_Bulletin .

Gosh, good that we didn't buy our cluster from your vendor; they made us an
offer, too. We bought from Transtec. There were also some memory-related
problems during the first few weeks, but all affected nodes were replaced
without fuss, and ever since everything has been running almost perfectly. (A
small exception was the SIL3114 SATA controller: at least the driver in
2.6.11 caused problems under heavy load, but this seems to be fixed in
newer kernel versions.) It's only a 16-node cluster (Tyan S2881 boards with
4GB and 8GB memory), but given your failure numbers, we would also have
expected to see many crashes during the last 2 years.

In the past our main fileserver was also a Tyan S2882 system. It sometimes
locked up entirely at random (without any load) and without any log messages
(monitored over a serial cable). Sometimes it ran stably for months,
sometimes it crashed once a week. We had to replace the entire system, since
it was not suitable as a high-availability node. We were additionally
monitoring the memory using bluesmoke; it never logged any problems.

Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
