<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<title></title>
</head>
<body>
It's looking like this is a universal problem... I am surprised that there
is no mention of it on the MPICH web page. LAM can work for some of our
users, but as far as I know LAM does not support launching multiple
jobs by the same user from a single machine. Our clusters have front-end
systems that support job runs on several subclusters, and it is quite common
for users to want to start more than one job on multiple subclusters.
I guess I will try submitting a bug report to the MPICH developers and see what
happens...<br>
<br>
Thanks,<br>
<br>
<br>
--JIM<br>
<br>
<br>
Mark Hartner wrote:<br>
<blockquote type="cite"
cite="midPine.LNX.4.21.0209161700280.16006-100000@famine.cs.utah.edu">
<blockquote type="cite">
<pre wrap="">7.0 through 7.3. All of these systems exhibit the same problem with
MPICH 1.2.3, upon reboot. MPICH 1.2.1 and LAM MPI do not exhibit this
behavior. Has anyone experienced this problem or know what could be
causing it?
</pre>
</blockquote>
<pre wrap="">
We saw the exact same problem on our cluster. We even saw it with a simple
'hello world' program. Our solution was to switch to LAM MPI. We had a
little trouble getting LAM MPI and MPE to work together, but eventually got them
working. We can send you the bug fix if you want to use LAM and MPE.
Mark
_______________________________________________
Beowulf mailing list, <a class="moz-txt-link-abbreviated" href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a>
To change your subscription (digest mode or unsubscribe) visit <a class="moz-txt-link-freetext" href="http://www.beowulf.org/mailman/listinfo/beowulf">http://www.beowulf.org/mailman/listinfo/beowulf</a>
</pre>
</blockquote>
<br>
</body>
</html>