Problem with Alpha and strange message.

Greg Lindahl lindahl at conservativecomputer.com
Mon Oct 1 09:12:46 PDT 2001


On Mon, Oct 01, 2001 at 09:35:07AM -0600, Carlos Lopez wrote:

> Hello, we have a 4 node cluster with Alphas, and lately I've been 
> receiving the following messages from the console of the master node:
> 
> Sep 18 15:56:47 master kernel: TSUNAMI machine check: vector=0x630
> pc=0xfffffc0000333410 code=0x100000086

This is an Alpha question, not really related to Beowulf. Here's a
table that describes what the machine check vectors mean:

Vector (hex)  Reason                    Example or Common Cause
============  ======                    =======================

620           System Correctable        correctable errors in the memory subsystem,
                                        e.g. single-bit ECC errors, detected
                                        asynchronously to processor execution

630           Processor Correctable     correctable cache and TLB errors, detected
                                        internally by the processor

660           System Uncorrectable      unrecoverable memory errors

670           Processor Uncorrectable   unrecoverable cache or TLB errors, or a
                                        read of a non-existent I/O space location

If you frequently get 630s, I'd advise replacing the CPU, since those
are errors detected inside the processor itself.
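To see how often each kind of machine check is actually occurring, you can tally them by vector from the kernel log. The following is a minimal Python sketch, assuming the log lines look like the one quoted above (`... machine check: vector=0x630 ...`). The regex, the `MCHK_VECTORS` dictionary, and the function names `classify` and `count_vectors` are hypothetical illustrations, not part of any Alpha or Linux tool; the vector meanings are transcribed from the table.

```python
import re
from collections import Counter

# Vector -> reason, transcribed from the table above.
MCHK_VECTORS = {
    0x620: "System Correctable",
    0x630: "Processor Correctable",
    0x660: "System Uncorrectable",
    0x670: "Processor Uncorrectable",
}

# Hypothetical pattern matching the message format quoted above; other
# kernels or platforms may word the line differently.
MCHK_RE = re.compile(r"machine check: vector=(0x[0-9a-fA-F]+)")


def classify(line):
    """Return (vector, reason) for a machine check line, or None if no match."""
    m = MCHK_RE.search(line)
    if m is None:
        return None
    vector = int(m.group(1), 16)
    return vector, MCHK_VECTORS.get(vector, "Unknown vector")


def count_vectors(lines):
    """Count machine checks per vector across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result[0]] += 1
    return counts
```

For example, `count_vectors(open(path))` on your syslog file (the path varies by distribution) returns a `Counter` keyed by vector; a steadily growing count for `0x630` is the pattern described above. What counts as "frequent" is still a judgment call.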

g




More information about the Beowulf mailing list