[Beowulf] Problems with a JS21 - Ah, the networking...

John Hearns john.hearns at streamline-computing.com
Sat Sep 29 02:22:47 PDT 2007


On Fri, 2007-09-28 at 17:43 -0300, Ivan Paganini wrote:
> Hello everybody,
> 
> I am beginning to take care of an IBM JS21. The cluster consists of

> The Myrinet connection was working right, but sometimes a user program
> just got stuck - one of the processes was sleeping, and all the others
> were running. Then the program hung.
> 
> Any suggestions? 

Contact Myricom support? 

BTW, if you are doing the debugging yourself, start from the bottom.
Take two machines and run mx_info, mx_endpoint (which should show nothing
if no programs are running) and mx_counters.
Then run your pingpong and the further stress tests described in the README.
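
If it helps, the first step could be scripted roughly like this. It is only
a sketch under a few assumptions: the hostnames node01 and node02 are
placeholders, passwordless ssh to the nodes works, the three tools are on
the remote PATH, and they can be run with no arguments. It defaults to a
dry run that only prints the commands.

```python
import subprocess

# Tool names as given above; running them with no arguments is an assumption.
MX_TOOLS = ["mx_info", "mx_endpoint", "mx_counters"]


def build_commands(hosts, tools=MX_TOOLS):
    """Return one ssh argv list per (host, tool) pair, grouped by host."""
    return [["ssh", host, tool] for host in hosts for tool in tools]


def run_checks(hosts, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    results = []
    for argv in build_commands(hosts):
        print("$", " ".join(argv))
        if dry_run:
            continue
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((argv, proc.returncode, proc.stdout))
        print(proc.stdout)
    return results


if __name__ == "__main__":
    # node01 and node02 are hypothetical hostnames; substitute two real nodes.
    run_checks(["node01", "node02"])
```

Call run_checks(..., dry_run=False) once the printed commands look right,
and compare the output from the two nodes before moving on to the pingpong
test.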



