OpenPBS under RH7?

Galen Arnold arnoldg at ncsa.uiuc.edu
Thu Apr 26 08:08:50 PDT 2001


Roger,

Define "pretty good sized".  I may have a patch for you to try if it's the
same problem I ran into.  See the NCSA Scaling Patch and README at:

	http://www-unix.mcs.anl.gov/openpbs/

We're testing another patch to PBS that also helps with scaling--it may be
added to that site soon.

-Galen

On Thu, 26 Apr 2001, Roger L. Smith wrote:

>
> Hello Folks,
>
> We've got a pretty good sized Linux cluster that we are finally about to
> release to our users.  During all of our testing and benchmarking, we've
> been using RedHat 7.0 and the 2.4.2 kernel.  We had worked out almost all
> of the problems until we installed OpenPBS (v2.3.12).
>
> Our test users have been experiencing a problem where the pbs_mom daemon
> on the first node of a given job dies.  It typically seems to die
> either when the job finishes or when the user tries to qdel the job.
> It's not consistent, however.  One of our users estimates that it fails 2
> out of 5 times.  The corrective action for this includes deleting
> everything out of /var/spool/PBS/mom_priv/jobs/<jobnum*> and restarting
> the daemon on the node.
>
> This is becoming a very serious issue for us, and may require that I
> downgrade the entire cluster to RedHat 6.2 and/or a 2.2 kernel.  I'm not
> anxious to do this for several reasons.
>
> Is anyone on this list running a similar configuration, or have any
> experience with this problem?
>
>  _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_
> | Roger L. Smith            Phone:662-325-3625      roger at ERC.MsState.Edu |
> | Systems Administrator     FAX:  662-325-7692 WWW.ERC.MsState.Edu/~roger |
> |-------------------------------------------------------------------------|
> |         Mississippi State University/National Science Foundation        |
> |______Engineering Research Center for Computational Field Simulation_____|
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
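For reference, the manual recovery Roger describes (clearing the job's entries out of /var/spool/PBS/mom_priv/jobs and restarting pbs_mom on the node) could be scripted along these lines.  This is a hypothetical Python sketch, not something shipped with OpenPBS: the function name clear_stale_job_files is made up, it assumes every file for job N is named "N.<something>" (so clearing job 12 does not also remove job 123's files), and the restart command is whatever your site uses, passed in by the caller.  Check the actual file naming on your own nodes before pointing it at a live spool directory.

```python
import glob
import os
import shutil
import subprocess
from typing import List, Optional, Sequence

# Spool path quoted in the message above; adjust if your PBS_HOME differs.
DEFAULT_JOBS_DIR = "/var/spool/PBS/mom_priv/jobs"


def clear_stale_job_files(jobnum: str,
                          jobs_dir: str = DEFAULT_JOBS_DIR,
                          restart_cmd: Optional[Sequence[str]] = None) -> List[str]:
    """Remove every entry in jobs_dir belonging to job `jobnum`, then
    optionally run `restart_cmd` to bring pbs_mom back up.

    ASSUMPTION: job files are named "<jobnum>.<rest>", so we match
    "<jobnum>.*" rather than a bare "<jobnum>*" prefix.

    Returns the sorted list of paths that were removed.
    """
    # glob.escape guards against glob metacharacters in the job id.
    pattern = os.path.join(jobs_dir, glob.escape(jobnum) + ".*")
    removed = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isdir(path) and not os.path.islink(path):
            # Per-job subdirectories are removed recursively.
            shutil.rmtree(path)
        else:
            # Regular files and symlinks are unlinked directly.
            os.remove(path)
        removed.append(path)

    if restart_cmd is not None:
        # Hypothetical hook: the caller supplies the site's own restart
        # command; check=True raises if that command fails.
        subprocess.run(list(restart_cmd), check=True)
    return removed
```

Called as clear_stale_job_files("1234") it only deletes files; passing restart_cmd as an argument list (for example your site's pbs_mom init script plus "restart") also restarts the daemon.  Keeping the restart step optional makes it possible to do a dry cleanup first and restart by hand.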

-- 
+
Galen Arnold, system engineer--systems group       arnoldg at ncsa.uiuc.edu
National Center for Supercomputing Applications           (217) 244-3473
152 Computer Applications Bldg., 605 E. Spfld. Ave., Champaign, IL 61820
