[Beowulf] Need some advise: Sun storage' management server hangs repeatedly

Sangamesh B forum.san at gmail.com
Mon Jan 11 23:19:47 PST 2010


Hi HPC experts,

     I seek your advise/suggestion to resolve a storage(NAS) server'
repeated hanging problem.

     We've a 23 nodes Rocks-5.1 HPC cluster. The Sun storage of capacity 12
TB is connected to a management server Sun Fire X4150 installed with RHEL
5.3 and this server is connected to a Gigabit switch which provides cluster
private network. The home directories on the cluster are NFS mounted from
storage partitions across all nodes including the master.

   This server gets hanged repeatedly. As an initial troubleshooting we
installed Ganglia, to check network utilization. But its normal. We're not
getting how to troubleshoot it and resolve the problem. Can anybode help us
resolve this issue?

Thanks,
Sangamesh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.beowulf.org/pipermail/beowulf/attachments/20100112/3854d372/attachment.html>


More information about the Beowulf mailing list