[Beowulf] Advanced Clustering's Breakin
Prentice Bisbal prentice at ias.edu
Wed Oct 1 08:44:01 PDT 2008
- Previous message: [Beowulf] Has DDR IB gone the way of the Dodo?
- Next message: [Beowulf] Compute Node OS on Local Disk vs. Ram Disk
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> We have a tool on our website called "breakin" that is Linux 2.6.25.9
> patched with K8 and K10f Opteron EDAC reporting facilities. It can
> usually find and identify failed RAM in fifteen minutes (two hours at
> most). The EDAC patches to the kernel aren't that great about naming
> the correct memory rank, though.
>
> Make sure you have multibit (sometimes says 4-bit) ECC enabled in your BIOS.
>
> http://www.advancedclustering.com/software/breakin.html

I've been using breakin on my new cluster for the past week or two, and
some of its results seem inconsistent. For example, on one node I get
this:

    Test     | Pass | Fail | Last Message
    ------------------------------------------
    hdhealth | 315  | 0    | No disk devices found

Then, in the log section:

    00h 57m 40s: Disabling burnin test 'hdhealth'

If I reboot and restart the testing, it sees the hard disk. Why doesn't
breakin always see the disk?

I've also tried to dump the logs to a USB drive, but breakin refuses to
mount the correct partition on it (it picks /dev/sdb instead of
/dev/sdb1, or vice versa).

I sent an e-mail to Advanced Clustering about these issues but didn't get
a response, so I'm hoping for better luck here.

--
Prentice
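Breakin's own EDAC code isn't shown here, but the kernel exposes the same per-rank error counters through its EDAC sysfs interface. The following is a minimal sketch, assuming the csrow-based layout under /sys/devices/system/edac/mc used by kernels of this era (newer kernels may organize things differently); the function names are hypothetical. It lists every chip-select row whose corrected (ce_count) or uncorrected (ue_count) counter is nonzero. As the quoted note says, a csrow number may not map cleanly to a physical DIMM slot.

```python
import os
import re

# Standard EDAC sysfs root; csrow layout assumed (kernel-version dependent).
EDAC_MC = "/sys/devices/system/edac/mc"


def read_count(path):
    """Read one integer counter file; a missing file counts as 0."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return 0


def edac_error_counts(edac_mc=EDAC_MC):
    """Map (controller, csrow) -> (corrected, uncorrected) for rows with errors."""
    errors = {}
    if not os.path.isdir(edac_mc):
        return errors  # EDAC driver not loaded, or no memory controllers found
    for mc in sorted(os.listdir(edac_mc)):
        if not re.fullmatch(r"mc\d+", mc):
            continue  # skip anything that is not a memory controller directory
        mc_dir = os.path.join(edac_mc, mc)
        for row in sorted(os.listdir(mc_dir)):
            if not re.fullmatch(r"csrow\d+", row):
                continue  # skip controller-wide files such as mcN/ce_count
            row_dir = os.path.join(mc_dir, row)
            ce = read_count(os.path.join(row_dir, "ce_count"))
            ue = read_count(os.path.join(row_dir, "ue_count"))
            if ce or ue:
                errors[(mc, row)] = (ce, ue)
    return errors


if __name__ == "__main__":
    for (mc, row), (ce, ue) in sorted(edac_error_counts().items()):
        print(f"{mc}/{row}: {ce} corrected, {ue} uncorrected")
```

Running it before and after a burn-in shows which rows gained errors; a climbing ue_count is the more serious sign.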
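One way to check by hand whether the kernel sees a disk at all, and whether a USB stick should be mounted as the whole device or as its first partition, is to look under /sys/block. This is a minimal sketch, not how breakin itself decides. It assumes the standard sysfs layout, in which each partition appears as a subdirectory of its disk containing a `partition` file. The helper names and the list of skipped virtual devices are hypothetical. A stick formatted without a partition table has no sdb1, so the whole device /dev/sdb is the mount target.

```python
import os

# Name prefixes of virtual block devices that are not real disks
# (an assumed list; adjust for your system).
VIRTUAL_PREFIXES = ("loop", "ram", "zram", "dm-", "md", "sr")


def list_disks(sys_block="/sys/block"):
    """Return names of block devices under sys_block that look like disks."""
    if not os.path.isdir(sys_block):
        return []
    return sorted(
        name for name in os.listdir(sys_block)
        if not name.startswith(VIRTUAL_PREFIXES)
    )


def partitions(disk, sys_block="/sys/block"):
    """Return partition names of `disk`, e.g. ['sdb1'] for 'sdb'.

    Each partition is a subdirectory of the disk's sysfs directory
    that contains a 'partition' attribute file.
    """
    disk_dir = os.path.join(sys_block, disk)
    return sorted(
        entry for entry in os.listdir(disk_dir)
        if os.path.isfile(os.path.join(disk_dir, entry, "partition"))
    )


def mount_target(disk, sys_block="/sys/block"):
    """Pick the node to mount: first partition if any, else the whole disk."""
    parts = partitions(disk, sys_block)
    return "/dev/" + (parts[0] if parts else disk)


if __name__ == "__main__":
    for disk in list_disks():
        print(disk, "->", mount_target(disk))
```

For example, `mount_target("sdb")` returns "/dev/sdb1" for a partitioned stick and "/dev/sdb" for an unpartitioned one. If `list_disks()` comes back empty right after boot, the disk driver may simply not have finished probing yet.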
More information about the Beowulf mailing list
