[Beowulf] Re: how can I know that a hard disk died? (Dimitri Antoniou)

Steve Cousins cousins at limpet.umeoce.maine.edu
Fri Aug 12 06:08:07 PDT 2005

On Fri, 12 Aug 2005 Dimitri Antoniou wrote:

>  Hi,
>  We have a 16-node HP LC1000 cluster, with 3 hard disks
>  managed by hardware RAID.
>  Recently a hard disk died, and we only found out
>  when we went to the room where the cluster is kept
>  and noticed a failure light on the disk.
>  This room is in a separate building on campus,
>  and we can't really travel there daily to check the disks.
>  Is there a way to check from the command line
>  whether all 3 disks are working?
>  As I said above, this is hardware RAID.
>  When the disk died, the system didn't notify us,
>  and we haven't found any message in log files,
>  at least not anything obvious.

What brand is the controller, and what OS are you running? Every RAID card
I have run into has some sort of command-line interface, which lets you
write a cron script that checks for failed drives and emails you if
something is wrong. For instance, our Dell systems use afacli (Adaptec PERC
card) and megamgr (AMI PERC card), and our 3Ware systems use tw_cli.
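The cron-script approach can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tool from any vendor: the script name check_raid.py, the keyword list in BAD_WORDS, and the choice to pass the vendor CLI command as arguments are all hypothetical. Rather than sending mail itself, it relies on the standard behavior of most cron implementations, which mail any output a job produces to MAILTO (or to the crontab owner). It therefore prints nothing when the array looks healthy.

```python
#!/usr/bin/env python3
"""Minimal RAID health check meant to be run from cron (a sketch).

Assumptions: the vendor RAID CLI command is passed on the command line,
and BAD_WORDS is a guess that must be checked against what your
controller's CLI actually prints for failed or degraded units.
"""
import subprocess
import sys

# Hypothetical status words that indicate trouble; adjust for your card.
BAD_WORDS = ("DEGRADED", "FAILED", "REBUILDING", "OFFLINE", "MISSING")


def find_problems(output, bad_words=BAD_WORDS):
    """Return stripped lines of CLI output containing any bad word (case-insensitive)."""
    problems = []
    for line in output.splitlines():
        upper = line.upper()
        if any(word.upper() in upper for word in bad_words):
            problems.append(line.strip())
    return problems


def main(command):
    """Run the RAID CLI command, print any problem lines, and return an exit code."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Being unable to query the controller is itself worth reporting.
        print(f"could not run {' '.join(command)}: {exc}")
        return 2
    if result.returncode != 0:
        print(f"{command[0]} exited with status {result.returncode}")
        print(result.stderr.strip())
        return 2
    problems = find_problems(result.stdout)
    for line in problems:
        print(line)
    # No output on success, so cron sends no mail.
    return 1 if problems else 0


if __name__ == "__main__":
    if len(sys.argv) > 1:
        sys.exit(main(sys.argv[1:]))
    print("usage: check_raid.py RAID-CLI-COMMAND [ARGS...]", file=sys.stderr)
```

For example, a crontab containing `MAILTO=you@example.org` and the entry `0 * * * * /usr/local/sbin/check_raid.py /usr/sbin/tw_cli /c0 show` would run the check hourly. The install paths are hypothetical. Cron's PATH is minimal, so give the CLI's full path. `tw_cli /c0 show` is assumed to be the 3ware 9000-series syntax, so verify it against the manual for your card and CLI version, and confirm what its status column actually prints, before trusting the keyword list.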

Good luck,

 Steve Cousins, Ocean Modeling Group    Email: cousins at umit.maine.edu
 Marine Sciences, 208 Libby Hall        http://rocky.umeoce.maine.edu
 Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302
