Mpich 1.2.3 first run problem

Jim Matthews beowulf at cfdlab.larc.nasa.gov
Mon Sep 16 22:00:40 PDT 2002


It looks like this is a universal problem.  I am surprised that there 
is no mention of it on the mpich web page.  LAM can work for some of our 
users, but as far as I know LAM does not support the same user starting 
multiple jobs from a single machine.  Our clusters have front-end 
systems that launch jobs on several subclusters, and it is quite common 
for users to want to start more than one job across multiple 
subclusters.  I guess I will try submitting a bug report to the mpich 
people and see what happens.

Thanks,


--JIM


Mark Hartner wrote:

>>7.0 through 7.3.  All of these systems exhibit the same problem with 
>>mpich 1.2.3 upon reboot.  Mpich 1.2.1 and LAM MPI do not exhibit this 
>>behavior.  Has anyone experienced this problem, or does anyone know 
>>what could be causing it?
>>    
>>
>
>We saw the exact same problem on our cluster, even with a simple
>'hello world' program. Our solution was to switch to LAM MPI. We had a
>little trouble getting LAM MPI and MPE working, but eventually
>succeeded. We can send you the bug fix if you want to use LAM and MPE.
>
>Mark
>
