[Beowulf] cheap PCs this christmas

Douglas Eadline
Wed Nov 23 07:14:15 PST 2005

> more to the point, if you're going to network $300 PCs, ECC should almost
> certainly not be on your xmas list...

Here are some of my experiences working with non-ECC memory on $300 nodes.
The value cluster (http://clustermonkey.net/content/view/41/29/) that Jeff
Layton and I built uses non-ECC PC2700 memory. Aware that not all memory
is the same, I purchased Infineon memory and ran Memtest86
(http://www.memtest86.com) on each node for at least 4 hours. Memtest86
found no problems with any of the memory.

I have run the system quite hard on several occasions. I have
run the NAS suite (which is self-checking) and never had an error. I also
ran HPL a "whole lot" and never had an issue with bad residuals (while HPL
is not self-checking, a memory problem might cause a bad residual). So I
have a certain level of confidence that the memory on Kronos (the value
cluster) is sound.
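To illustrate why a memory error tends to surface as a bad residual, here
is a small Python sketch (not HPL itself). It solves a random dense system
by Gaussian elimination with partial pivoting, computes a scaled residual
of roughly the form HPL 2.x reports,
||Ax-b||_oo / (eps * (||A||_oo * ||x||_oo + ||b||_oo) * N), and then flips a
single bit in the solution to mimic a memory error. The matrix size, seed,
bit position, and helper names are illustrative assumptions. HPL flags a
run when this value exceeds a threshold set in HPL.dat (commonly 16.0).

```python
import random
import struct


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [A | b]
    for k in range(n):
        # Pick the row with the largest magnitude in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def scaled_residual(a, x, b):
    """||Ax - b||_inf / (eps * (||A||_inf * ||x||_inf + ||b||_inf) * n)."""
    n = len(a)
    eps = 2.0 ** -52  # double-precision machine epsilon
    r = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    norm_a = max(sum(abs(v) for v in row) for row in a)
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    return r / (eps * (norm_a * norm_x + norm_b) * n)


def flip_bit(value, bit):
    """Return value with one bit of its IEEE-754 double representation flipped."""
    (u,) = struct.unpack("<Q", struct.pack("<d", value))
    (out,) = struct.unpack("<d", struct.pack("<Q", u ^ (1 << bit)))
    return out


# Hypothetical test problem: a 50x50 random system, fixed seed for repeatability.
rng = random.Random(2005)
n = 50
a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
b = [rng.uniform(-0.5, 0.5) for _ in range(n)]

x = solve(a, b)
good = scaled_residual(a, x, b)

# Simulate a single-bit memory error in a high mantissa bit (bit 40 of 0..51)
# of the largest solution component.
k = max(range(n), key=lambda i: abs(x[i]))
x_bad = x[:]
x_bad[k] = flip_bit(x_bad[k], 40)
bad = scaled_residual(a, x_bad, b)

print(f"clean run:   scaled residual = {good:.3g}")
print(f"one bit off: scaled residual = {bad:.3g}")
```

A clean solve should stay comfortably under the threshold, while the single
flipped bit pushes the value up by several orders of magnitude. A flip in a
low-order mantissa bit, or in memory the solver never reads back, could go
unnoticed, so a clean HPL run is evidence rather than proof.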

I also know that errors are possible and that without ECC you are
living a bit on the dangerous side, but I think with any cluster you
develop a certain feel for when things are not right.

I have also found that leading-edge memory seems to be the least
stable, and if you wait a while (6 months?) the memory seems to get more
stable. Plus, there are DIMMs and then there are DIMMs. Companies can make
"junk" DIMMs and sell them in the Windows market because the desktop
normally does not push memory the way a cluster does.


