[Beowulf] Tyan S2882

Krugger merc4krugger at gmail.com
Tue Sep 26 04:30:38 PDT 2006


We are currently deploying Tyan S2882 dual-Opteron boards, and we have
found the systems to be quite unstable. Even after BIOS updates and kernel
changes, we still get random kernel panics under load.

Has anyone who runs these boards found a solution? I have mailed other
users of this board, and they also reported random kernel panics and
an unusual number of hardware problems.

So far we have solved:
- a broken BIOS, fixed by updating to the most recent BIOS
- problems caused by some power supplies
- FS corruption due to a firmware problem in a hardware RAID board
- MCE chipkill errors (non-fatal) due to apparently bad RAM

To be solved:
- random kernel panics that leave nothing in the logs, even with all
debug flags set in the kernel, because the kernel fails to sync the
disk during the panic.
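
Since the panic output never reaches the local disk, one option might be
to send kernel messages over the network to a second machine (for
example with the kernel's netconsole module) and log them there. Below is
a minimal sketch of such a receiver in Python. The port 6666, the log
file name, and the function names are assumptions for illustration only.

```python
import os
import socket
import time


def format_record(payload: bytes, sender: str, now: float) -> str:
    """Turn one UDP datagram into a single timestamped log line (UTC)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    return f"{stamp} {sender} {text}\n"


def serve(sock: socket.socket, log_path: str, max_packets=None) -> int:
    """Append every datagram received on `sock` to `log_path`.

    Each record is flushed and fsync'ed right away, so the last lines
    before a crash of the *sending* node are already on the receiver's disk.
    Returns the number of datagrams handled.
    """
    count = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while max_packets is None or count < max_packets:
            payload, (host, _port) = sock.recvfrom(65535)
            log.write(format_record(payload, host, time.time()))
            log.flush()
            os.fsync(log.fileno())
            count += 1
    return count


if __name__ == "__main__":
    # Port 6666 is an assumption; use whatever port the sender targets.
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("0.0.0.0", 6666))
    serve(listener, "panic-messages.log")
```

Run on a stable machine on the same network, this would keep whatever
the failing node manages to send before it dies. It is UDP, so messages
can be lost, and it captures nothing if the network driver itself is
what fails.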

