OpenPBS under RH7?
Galen Arnold
arnoldg at ncsa.uiuc.edu
Thu Apr 26 08:08:50 PDT 2001
Roger,
Define "pretty good sized". I may have a patch for you to try if it's the
same problem I ran into. See the NCSA Scaling Patch and README at:
http://www-unix.mcs.anl.gov/openpbs/
We're testing another patch to PBS that also helps with scaling--it may be
added to that site soon.
-Galen
On Thu, 26 Apr 2001, Roger L. Smith wrote:
>
> Hello Folks,
>
> We've got a pretty good sized Linux cluster that we are finally about to
> release to our users. During all of our testing and benchmarking, we've
> been using RedHat 7.0 and the 2.4.2 kernel. We had worked out nearly all of
> the problems until we installed OpenPBS (v2.3.12).
>
> Our test users have been experiencing a problem where the pbs_mom daemon
> on the first node on a given job will die. It typically seems to die
> either when the job finishes, or when the user tries to qdel the job.
> It's not consistent, however. One of our users estimates that it fails 2
> out of 5 times. The corrective action for this includes deleting
> everything out of /var/spool/PBS/mom_priv/jobs/<jobnum*> and restarting
> the daemon on the node.
>
> This is becoming a very serious issue for us, and may require that I
> downgrade the entire cluster to RedHat 6.2 and/or a 2.2 kernel. I'm not
> anxious to do this for several reasons.
>
> Is anyone on this list running a similar configuration, or have any
> experience with this problem?
>
> _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_
> | Roger L. Smith Phone:662-325-3625 roger at ERC.MsState.Edu |
> | Systems Administrator FAX: 662-325-7692 WWW.ERC.MsState.Edu/~roger |
> |-------------------------------------------------------------------------|
> | Mississippi State University/National Science Foundation |
> |______Engineering Research Center for Computational Field Simulation_____|
>
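The cleanup step Roger describes above (clearing a job's leftover entries from mom_priv/jobs before restarting pbs_mom) could be scripted roughly as follows. This is only a sketch under assumptions: the function name, the dry-run default, and the example job number are hypothetical, and restarting the daemon is left out because the right command depends on how PBS was installed.

```python
import glob
import os
import shutil


def clear_stale_job_files(jobs_dir, job_prefix, dry_run=True):
    """Remove entries in jobs_dir whose names start with job_prefix.

    Returns the sorted list of matched paths. With dry_run=True nothing
    is deleted, so the matches can be reviewed first.
    """
    # Escape both parts so only the trailing "*" acts as a wildcard.
    pattern = os.path.join(glob.escape(jobs_dir), glob.escape(job_prefix) + "*")
    matches = sorted(glob.glob(pattern))
    if not dry_run:
        for path in matches:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)   # per-job directory
            else:
                os.remove(path)       # regular file or symlink
    return matches


if __name__ == "__main__":
    # Directory taken from the message above; "1234" is a made-up job number.
    for path in clear_stale_job_files("/var/spool/PBS/mom_priv/jobs", "1234"):
        print("would remove:", path)
```

Run as shown, it only lists what would be removed; passing dry_run=False performs the deletion, after which pbs_mom still has to be restarted by hand on that node.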
--
Galen Arnold, system engineer--systems group arnoldg at ncsa.uiuc.edu
National Center for Supercomputing Applications (217) 244-3473
152 Computer Applications Bldg., 605 E. Spfld. Ave., Champaign, IL 61820