[Beowulf] Need some advice: Sun storage's management server hangs repeatedly

Reuti reuti at staff.uni-marburg.de
Mon Jan 18 03:30:03 PST 2010


Hi,

On 18.01.2010 at 10:13, Sangamesh B wrote:

> Hello all,
>
>      Thanks for your suggestions.
>      But we lost access to the cluster because of the delay.

But access to the service processor should still be available, and I
think Skylar was referring to the ILOM interface.
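
In case it helps next time, here is a minimal sketch of querying the service processor out-of-band while the host itself is hung. It assumes that IPMI-over-LAN is enabled on the ILOM, that ipmitool is installed on the machine you run it from, and that the password is exported in the IPMI_PASSWORD environment variable. The function names and the host name in the usage note are hypothetical, not anything from your setup.

```python
import shutil
import subprocess


def build_ipmi_command(host, user, *subcommand):
    """Return the argv list for an out-of-band ipmitool call.

    -I lanplus selects IPMI v2.0 over LAN; -E tells ipmitool to read the
    password from the IPMI_PASSWORD environment variable.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *subcommand]


def query_service_processor(host, user, *subcommand, timeout=30):
    """Run one ipmitool subcommand against the service processor.

    Returns the command's standard output as text. Raises if ipmitool is
    missing, the call times out, or ipmitool exits with an error.
    """
    if shutil.which("ipmitool") is None:
        raise RuntimeError("ipmitool is not installed on this machine")
    result = subprocess.run(
        build_ipmi_command(host, user, *subcommand),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

For example, query_service_processor("ilom-host.example", "root", "sel", "elist") would fetch the system event log, and passing "chassis", "status" instead would report the power state. Both can be run from any machine that can reach the ILOM's network port, even when the server no longer responds.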

-- Reuti


>
>     But I got useful information to debug next time.
>
> Thanks,
> Sangamesh
> On Thu, Jan 14, 2010 at 10:38 AM, Skylar Thompson  
> <skylar at cs.earlham.edu> wrote:
> Sangamesh B wrote:
> > Hi HPC experts,
> >
> >      I am seeking your advice on resolving a storage (NAS) server's
> > repeated hanging problem.
> >
> >      We have a 23-node Rocks-5.1 HPC cluster. The Sun storage, with a
> > capacity of 12 TB, is connected to a management server, a Sun Fire X4150
> > running RHEL 5.3, and this server is connected to a Gigabit
> > switch which provides the cluster's private network. The home directories on
> > the cluster are NFS-mounted from storage partitions across all nodes,
> > including the master.
> >
> >    This server hangs repeatedly. As an initial troubleshooting step,
> > we installed Ganglia to check network utilization, but it is normal.
> > We do not know how to troubleshoot this further and resolve the problem.
> > Can anybody help us resolve this issue?
> Is there anything amiss according to the service processor?
>
> --
> -- Skylar Thompson (skylar at cs.earlham.edu)
> -- http://www.cs.earlham.edu/~skylar/
>
>
>



