[Beowulf] Re: how can I know that a hard disk died? (Dimitri Antoniou)
Steve Cousins
cousins at limpet.umeoce.maine.edu
Fri Aug 12 06:08:07 PDT 2005
On Fri, 12 Aug 2005 Dimitri Antoniou wrote:
> Hi,
>
> We have a 16-node HP LC1000 cluster, with 3 hard disks
> managed by hardware RAID.
>
> Recently, a hard disk died, and we only found out
> when we went to the room where the cluster is housed
> and noticed a failure light on the disk.
>
> Now, this room is in a separate building on campus
> and we can't really travel there daily to check the disks.
> Is there a way to check from the command line
> whether all 3 disks are working?
> As I said above, this is hardware RAID.
>
> When the disk died, the system didn't notify us,
> and we haven't found any message in log files,
> at least not anything obvious.
What brand is the controller, and what OS are you running? Every RAID
card I have run into has some sort of command-line interface, so you can
write a cron script that checks for failed drives and emails you if
something is wrong. For instance, our Dell systems use afacli (Adaptec
PERC card) and megamgr (AMI PERC card), and our 3Ware systems use tw_cli.
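
As a rough illustration of such a cron script, here is a minimal Python
sketch. It is not tied to any particular controller: the status command is
passed in as arguments, and the keyword list, the mail addresses, and the
local SMTP server are all assumptions that would need to be adapted to
whatever your controller's tool actually prints and to your mail setup.

```python
#!/usr/bin/env python3
"""Cron-friendly RAID status check: a sketch, not a vendor-specific tool."""
import smtplib
import subprocess
import sys
from email.message import EmailMessage

# Assumptions: adjust these to match your controller's output and your site.
BAD_KEYWORDS = ("DEGRADED", "FAILED", "OFFLINE")
MAIL_FROM = "raid-check@example.com"   # hypothetical address
MAIL_TO = "admin@example.com"          # hypothetical address
SMTP_HOST = "localhost"                # assumes a local mail server


def find_problem_lines(output, keywords=BAD_KEYWORDS):
    """Return the lines of the status output that mention any keyword."""
    hits = []
    for line in output.splitlines():
        upper = line.upper()
        if any(word in upper for word in keywords):
            hits.append(line.strip())
    return hits


def send_alert(lines):
    """Mail the suspicious lines to the administrator."""
    msg = EmailMessage()
    msg["Subject"] = "RAID check: possible disk problem"
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg.set_content("\n".join(lines))
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)


def main(status_command):
    """Run the controller's status command and alert on anything suspicious."""
    try:
        result = subprocess.run(status_command, capture_output=True, text=True)
    except OSError as exc:
        # The tool could not be started at all; that is worth an alert too.
        send_alert([f"could not run {status_command!r}: {exc}"])
        return 2
    problems = find_problem_lines(result.stdout)
    if result.returncode != 0:
        problems.append(f"status command exited with code {result.returncode}")
    if problems:
        send_alert(problems)
        return 1
    return 0


# Runs only when a status command is given as arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

The script would be called from cron with the controller's own status
command appended as arguments. Run that command by hand first, ideally
once with a known-bad array, to see which words it prints for a failed or
degraded disk, and adjust BAD_KEYWORDS to match.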
Good luck,
Steve
______________________________________________________________________
Steve Cousins, Ocean Modeling Group Email: cousins at umit.maine.edu
Marine Sciences, 208 Libby Hall http://rocky.umeoce.maine.edu
Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302