[Beowulf] ECC support on motherboards?


Håkon Bugge Hakon.Bugge at scali.com
Tue May 13 14:16:17 PDT 2008


At 19:17 13.05.2008, Perry E. Metzger wrote:
>So another question is, how can you reliably test any of this stuff?
>It isn't like you can reliably induce single bit errors and see if the
>hardware catches them. (A special memory module that let you test
>would be a wonderful thing, but I've never even heard of such a thing.)

Well, you can trust the HW vs. the firmware.
Further, for some chipsets it is possible to
simply stop the memory refresh for some time
(~1 minute) while the system is idle. After
this, you enable it again, and you should see
single and/or double bit errors. The refresh can
be disabled and re-enabled through setpci or a
similar tool. If you do not see errors after
this, you can try to explain why...
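
One way to check whether errors actually showed up after such an experiment is to compare the kernel's error counters before and after. The following is a minimal sketch, assuming a Linux system with an EDAC driver loaded for the memory controller, so that the corrected (ce_count) and uncorrected (ue_count) counters appear under /sys/devices/system/edac/mc/. The function names are made up for illustration, and the sketch deliberately does not touch the chipset: the register that controls refresh is chipset specific and is not shown here.

```python
import glob
import os


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (ce_count, ue_count)} for each mc* directory.

    Assumes the Linux EDAC sysfs layout; a missing or unreadable
    counter is reported as 0 rather than raising.
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(edac_root, "mc[0-9]*"))):
        values = []
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc_dir, name)) as fh:
                    values.append(int(fh.read().strip()))
            except (OSError, ValueError):
                values.append(0)
        counts[os.path.basename(mc_dir)] = tuple(values)
    return counts


def count_deltas(before, after):
    """Return {controller: (new corrected, new uncorrected)} errors."""
    deltas = {}
    for mc, (ce, ue) in after.items():
        old_ce, old_ue = before.get(mc, (0, 0))
        deltas[mc] = (ce - old_ce, ue - old_ue)
    return deltas


if __name__ == "__main__":
    # Prints an empty dict if no EDAC driver is loaded on this machine.
    print(read_edac_counts())
```

Take one snapshot with read_edac_counts() before stopping the refresh and one after re-enabling it, then pass both to count_deltas(). If every delta is zero, either nothing decayed or the errors were not reported, which is exactly the question worth asking.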

I once wrote a tool which examined all the settings of
a particular chipset. That raised numerous questions for the vendor.


Hakon


>I'm doing the planning for a new cluster and the whole thing is
>remarkably bothersome. You can't easily figure out what motherboards
>will even pretend to do ECC that easily, you can't easily check once
>you have a sample motherboard in hand. It isn't even easy to get ECC
>memory for more modern standards. I'm starting to wonder if doing all
>calculations twice, once on each of two machines, isn't easier, but it
>seems utterly wrong to do that...
>
>Perry

--
Håkon Bugge
CTO
mob. +47 92 48 45 14
off. +47 92 44 81 11
fax. +47 22 23 36 66
Hakon.Bugge at scali.com
Skype: hakon_bugge

Scali - http://www.scali.com
Higher Performance Computing




