[Beowulf] RAID question

Joe Landman landman at scalableinformatics.com
Sat Mar 14 08:48:59 PDT 2015

On 03/13/2015 08:52 PM, mathog wrote:
> A bit off topic, but some of you may have run into something similar.


> Anybody run into a hardware/software glitch with symptoms like this on 
> a similar system???
> Is there some way on these sorts of Dells to run per-disk diagnostics 
> from BIOS or UEFI even if they are already grouped into a virtual disk 
> by the controller?  I suspect that the disk which is /dev/sdb may 
> really be on its way out, but I couldn't get smartctl to work off the 
> DVD or from the copy on disk.   (The smartctl commands used were 
> tested on the twin machine, and they worked there.)  The BIOS showed 
> that SMART was disabled on all of the disks.  Web searches for 
> diagnostics for this controller all referenced software that requires 
> a running OS, nothing built into the BIOS/UEFI.  (It is set to use BIOS.)

System Rescue CD is your friend.

I've seen a number of RAID cards hiccup and blow away data (and sadly 
the software-only RAIDs are only minimally better, mostly because you 
can see their code; they all have bugs).

These days, we tend to boot everything stateless (pure PXE/ramboot).  
This rather completely solves the problem of RAID OS disks going away, 
as we pull our config from our database (replicated/distributed, as are 
the PXE boot images).  For stateful installs, I'd recommend getting a 
copy of System Rescue CD as noted above.  It has lots of tools and a 
full environment, including an incredible "boot off OS on drive" option 
that works even if grub is blown away.

Unfortunately, you can't infer much from the drive naming (sda/sdb 
etc.), as those names are not tied to a particular physical disk, so 
you'd need to see if your RAID became split-brained somehow.  I've seen 
that a few times, actually with MD RAID more than with hardware RAID.
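
For the MD RAID case, one hint that members have diverged is that their 
superblocks report different event counts.  What follows is only a rough 
sketch, assuming v1.x metadata as printed by "mdadm --examine <member>" 
(the "Array UUID" and "Events" fields).  The function names and the 
sample reports are made up for illustration, and none of this applies to 
disks hidden behind a hardware RAID controller.

```python
def parse_examine(text):
    """Pull (array UUID, event count) out of one `mdadm --examine` report.

    Assumes v1.x superblock output, where fields look like "Key : value".
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields["Array UUID"], int(fields["Events"])


def find_stale_members(reports):
    """Map each array UUID to the members whose event count lags the newest.

    `reports` is a dict of device name -> text of its examine report.
    An empty result means every member of every array agrees.
    """
    by_array = {}
    for dev, text in reports.items():
        uuid, events = parse_examine(text)
        by_array.setdefault(uuid, {})[dev] = events

    stale = {}
    for uuid, members in by_array.items():
        newest = max(members.values())
        lagging = sorted(d for d, e in members.items() if e < newest)
        if lagging:
            stale[uuid] = lagging
    return stale


# Hypothetical sample reports, trimmed to the two fields used above.
SAMPLE = {
    "/dev/sda1": "/dev/sda1:\n     Array UUID : aaaa:bbbb:cccc:dddd\n         Events : 4521\n",
    "/dev/sdb1": "/dev/sdb1:\n     Array UUID : aaaa:bbbb:cccc:dddd\n         Events : 4388\n",
}

if __name__ == "__main__":
    print(find_stale_members(SAMPLE))
```

In practice you would fill the dict by running "mdadm --examine" as root 
against each member partition from the rescue environment.  A lagging 
event count only says that a member stopped receiving updates at some 
point; it does not say why.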

