[Beowulf] Slow RAID reads, no errors logged, why?

David Mathog mathog at caltech.edu
Mon Mar 19 14:36:18 PDT 2018


On 19-Mar-2018 14:21, Alex Chekholko wrote:
> Normally I would suggest to do a diagnostic read dd from each disk, but you
> may not be able to do that with your RAID controller since it hides the
> individual disks.

I run full smartctl scans on the disks once a week.  In fact those were 
running at the time (on both machines).  Nothing turns up other than 
what was posted.

> 
> My next recommendation would be a full AC cycle; can you power the host off
> for a few minutes? It's a bit cargo cult-y but sometimes it works. It may
> also help (or not) for you to spin around 3 times while the machine is off.

Did a reboot, but not a full AC cycle.  There is a battery backup on the 
RAID controller, so power down is not really ever power off for that 
board.  When it came back up the IO rate was unchanged.  Perhaps a few 
minutes with the power off might clear some stuck bits elsewhere in the 
system.

Checked the speed on a third similar machine, H370P controller and again SAS
disks.  This one is all SEAGATE ST4000NM0005.  It was as fast as the 'A'
machine (more or less.)
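
For anyone wanting to repeat this kind of speed comparison across machines,
here is a minimal sketch of a sequential read timing, roughly what a dd read
test does.  It is not the method used for the numbers in this thread; the
function name read_throughput and its parameters are made up for
illustration.

```python
import time


def read_throughput(path, block_size=1024 * 1024, max_bytes=None):
    """Read `path` sequentially; return (bytes_read, seconds, MB_per_s).

    Hypothetical helper for illustration only. `path` may be a large file
    or, with sufficient privileges, a block device.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering only; the OS page cache
    # can still serve repeated reads, which would inflate the result.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            want = block_size
            if max_bytes is not None:
                want = min(block_size, max_bytes - total)
            chunk = f.read(want)
            if not chunk:  # end of file or device
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Decimal megabytes per second, as dd reports.
    rate = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate
```

Calling read_throughput(path, max_bytes=...) on the same large file or
volume on each machine gives comparable numbers only if the data is not
already cached, so read more than the amount of RAM or use a file that has
not been touched since boot.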

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

