[Beowulf] Tyan S2882
Bernd Schubert
bernd-schubert at gmx.de
Thu Sep 28 07:04:02 PDT 2006
On Wednesday 27 September 2006 11:20, Gebhardt Thomas wrote:
> Hi,
>
> > We are currently deploying Tyan S2882 Dual Opteron Boards, and we have
> > found the system to be quite unstable. After BIOS updates and kernel
> > changes we still get random kernel panics when under load.
>
> Me too :-(
>
> We've got a 85 Node Dual Opteron Cluster. I've documented most of the
> crashes on
> http://clust-doc.hrz.uni-marburg.de/index.php/Hardware_Bulletin .
Gosh, good that we didn't buy our cluster from your vendor; they made us an
offer, too. We bought from Transtec. There were also some memory-related
problems during the first few weeks, but all the affected nodes were replaced
smoothly, and ever since everything has been running almost perfectly (one
small exception was the SIL3114 SATA controller: at least the driver in
2.6.11 caused problems under heavy load, but this seems to be fixed in
newer kernel versions). It's only a 16-node cluster (Tyan S2881 boards with
4GB and 8GB memory), but given your failure numbers, we should also have seen
many crashes during the last 2 years.
In the past our main fileserver was also a Tyan S2882 system. It sometimes
locked up entirely at random (without any load) and without any log messages
(monitored over a serial cable). Sometimes it ran stably for months,
sometimes it crashed once a week. We had to replace the entire system, since
it was not suitable as a high-availability node. We are additionally
monitoring the memory using bluesmoke; it has never logged any problems.
--
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg