[Beowulf] RAID question

mathog mathog at caltech.edu
Wed Mar 18 14:00:15 PDT 2015


Jörg Saßmannshausen <j.sassmannshausen at ucl.ac.uk> wrote:

> $ smartctl -i /dev/sda -d megaraid,X

Right.

The issues have been resolved.  If anybody is still curious, this is 
what happened.

The disappearing files/directories were the result of a script that was 
run as root which moved /boot and /bin to an obscure subdirectory 
belonging to that user.

The disk errors were a red herring. The system had a Seagate USB disk 
plugged into it which I was not aware of.  (It was less than obvious 
because of the rat's nest of cables behind it.)  This disk's partition 
table was marked bootable - even though there was nothing on that disk 
which would have supported a boot.  This was the disk that was showing 
up as /dev/sdb.  When CentOS booted normally it was automatically 
mounting this disk, which is why there was no mention of it in 
/etc/fstab.  However, nothing was using this disk.  It looks like the OS 
"pinged" the device at 30-minute intervals to see if it was still there, 
and the enclosure/disk did not fully support whatever command was being 
used for this operation, resulting in the sense error messages in the 
log files.  When the rescue DVD was used it saw this device, created 
/dev/sda for it (yes, device names were exchanged in the two 
environments) and didn't mount it.
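
For anyone chasing a similar mystery disk, here is a minimal sketch of one 
way to spot USB-attached disks without tracing cables.  It assumes a Linux 
sysfs layout in which a USB disk's resolved /sys/block entry passes through 
a "usb" bus directory.  The function name list_usb_disks and its optional 
argument are made up for this illustration; this is not what I actually 
ran at the time.

```shell
#!/bin/sh
# Print /dev names of sd* block devices whose sysfs path goes through a USB bus.
# The optional first argument (default /sys/block) exists only so the function
# can be pointed at a fake directory tree for testing; it is an assumption of
# this sketch, not a standard interface.
list_usb_disks() {
    sysfs_block="${1:-/sys/block}"
    for entry in "$sysfs_block"/sd*; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$entry" ] || continue
        # Resolve the symlink to the full device path under /sys/devices.
        target=$(readlink -f "$entry")
        case "$target" in
            */usb*) echo "/dev/$(basename "$entry")" ;;
        esac
    done
}

list_usb_disks
```

Since the sdX names can differ between the installed OS and a rescue 
environment, as they did here, the output is only meaningful for the 
environment it was run in.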

Long SMART tests have now been run on each of the internal disks using 
smartctl commands like the one above, and all the disks are fine.  
megacli also comes up clean.  The USB disk is no longer plugged in, 
which solved the issue of sense error messages going to 
/var/log/messages.
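
A rough sketch of how the long tests can be started on several disks behind 
a MegaRAID controller follows.  The device IDs 0 through 3 and the /dev/sda 
block device are placeholders, not the values from this system, and the 
function name smart_long_test_cmds is invented for the example.  It only 
prints the smartctl commands so they can be reviewed first; pipe the output 
to sh as root to actually run them.

```shell
#!/bin/sh
# Print one "smartctl -t long" command per MegaRAID device ID.
# $1 is the block device the controller presents; the remaining arguments
# are the controller's device IDs (hypothetical values in the call below).
smart_long_test_cmds() {
    block_dev="$1"
    shift
    for id in "$@"; do
        echo "smartctl -t long -d megaraid,$id $block_dev"
    done
}

# Placeholder IDs; substitute the ones your controller reports.
smart_long_test_cmds /dev/sda 0 1 2 3
```

Once the tests have had time to finish, the self-test log for each disk can 
be read with "smartctl -l selftest -d megaraid,N /dev/sda", with N replaced 
by the device ID.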

Thanks for all of the suggestions,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

