Dual-Athlon Cluster Problems

Erwan Velu erwan at mandrakesoft.com
Thu Jan 23 11:15:29 PST 2003


On Thu 23/01/2003 at 19:52, Martin Siegert wrote:
> > 4/ Are there any other outstanding issues with these machines 
> >    under constant heavy load ?
> In 99% of all the crashes I have seen on my cluster (and I have seen
> a lot) the reason was bad memory. If you did not buy memory certified by
> the company that sold you the motherboard, exchange it and your problems
> will go away.
Agreed. You should boot each node with memtest86
(http://www.memtest86.com/memtest86-3.0.tar.gz), which is written in
assembly code and runs at boot time, so it does not depend on any
operating system.
This is the best test I know of for making sure that the memory is good.
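
For intuition only, the write-then-verify idea behind this kind of memory
tester can be sketched in a few lines of Python. This is an illustration
under my own assumptions, not memtest86's code: the names pattern_test and
find_mismatches and the chosen bit patterns are hypothetical, and since it
runs under an operating system on a small buffer it cannot replace a
boot-time test of the whole RAM.

```python
def find_mismatches(buf, value):
    """Return (offset, expected, actual) for every byte that differs from value."""
    return [(i, value, b) for i, b in enumerate(buf) if b != value]


def pattern_test(buf_size=4096, patterns=(0x00, 0x55)):
    """Fill a buffer with each pattern and its complement, then read it back.

    Hypothetical helper for illustration; an empty result only means that no
    error was observed in this small, OS-managed buffer.
    """
    buf = bytearray(buf_size)
    errors = []
    for p in patterns:
        # Test the pattern and its bitwise complement (0x55 -> 0xAA, etc.)
        for value in (p, p ^ 0xFF):
            buf[:] = bytes([value]) * buf_size         # write pass
            errors.extend(find_mismatches(buf, value))  # verify pass
    return errors


if __name__ == "__main__":
    found = pattern_test()
    print("no mismatch observed" if not found else found)
```

A flipped bit would show up as an (offset, expected, actual) entry in the
returned list, which is the sort of failure a real tester reports by address.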
-- 
Erwan Velu
Linux Cluster Distribution Project Manager
MandrakeSoft
43 rue d'aboukir 75002 Paris
Phone Number : +33 (0) 1 40 41 17 94
Fax Number   : +33 (0) 1 40 41 92 00
Web site     : http://www.mandrakesoft.com
OpenPGP key  : http://www.mandrakesecure.net/cks/ 