Lost cycles due to PBS (was Re: Uptime data/studies/anecdotes)


Ron Chen ron_chen_123 at yahoo.com
Thu Apr 11 00:53:52 PDT 2002


--- Chris Black <cblack at eragen.com> wrote:
> On Tue, Apr 02, 2002 at 12:46:07PM -0600, Roger L. Smith wrote:
> > On Tue, 2 Apr 2002, Richard Walsh wrote:
> [stuff deleted]
> > PBS is our leading cause of cycle loss.  We now run a cron job on the
> > headnode that checks every 15 minutes to see if the PBS daemons have died,
> > and if so, it automatically restarts them.  About 75% of the time that I
> > have a node fail to accept jobs, it is because its pbs_mom has died, not
> > because there is anything wrong with the node.
> >
>
> We used to have the same problem with PBS, especially when many jobs were
> in the queue. At that point sometimes the pbs master died as well.
> Since we've switched to SGE/GridEngine/CODINE I've been MUCH happier.
> Plus there are lots of nifty things you can do with the expandibility of
> writing your own load monitors via shell scripts and such.
> The whole point of this post is:
> GNQS < PBS < Sun Gridengine :)
>
> Chris (who tried two other batch schedulers until settling on SGE)
>
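
For anyone who wants to try the cron-based watchdog Roger describes
above, here is a rough sketch of the idea, not his actual script. Only
the daemon name pbs_mom comes from the thread. The use of pgrep, the
restart path /usr/local/sbin/, and the function names are assumptions
for illustration and will differ per site. It checks the local machine
only, so it would have to run on each node or be wrapped in rsh/ssh
from the head node.

```python
import subprocess

# pbs_mom is named in the thread; everything else here is assumed.
WATCHED = ["pbs_mom"]


def is_running(name):
    """Return True if a process with exactly this name exists (pgrep -x)."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def restart(name):
    """Hypothetical restart hook; the install path is a site-specific guess."""
    subprocess.run(["/usr/local/sbin/" + name], check=False)


def check_and_restart(names, is_running=is_running, restart=restart):
    """Restart each daemon in names that is not running.

    Returns the list of daemon names that were restarted.
    """
    restarted = []
    for name in names:
        if not is_running(name):
            restart(name)
            restarted.append(name)
    return restarted
```

A small script that calls check_and_restart(WATCHED) could then be run
from cron every 15 minutes, as in the quoted message, with a crontab
schedule field of */15 * * * *.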

I have had a similar experience. I tried PBS: it is hard
to install, and although it does not offer many scheduling
policies, it is still hard to configure.

Then I read the news about SGE, and since it does not
require root access to install or run, I gave it a try. I
ran an experiment a few weeks ago, submitting over
30,000 "sleep jobs" to SGE, and it did not die! If
the master host goes down, another machine takes over,
so there is no loss of computing power.
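
A stress test of this kind can be sketched roughly as follows. This is
an illustration under assumptions, not the exact procedure used: it
assumes qsub is on the PATH and accepts a job script on standard
input, and the function names and the 10-second sleep are made up for
the example.

```python
import subprocess


def qsub_from_stdin(script):
    """Submit a job script by piping it to qsub on standard input."""
    subprocess.run(["qsub"], input=script, text=True, check=True)


def submit_sleep_jobs(count, seconds=10, submit=qsub_from_stdin):
    """Submit `count` jobs that only sleep; return how many were accepted.

    Stops at the first failed submission so a dead master is noticed
    right away instead of after thousands of errors.
    """
    script = "#!/bin/sh\nsleep {}\n".format(seconds)
    accepted = 0
    for _ in range(count):
        try:
            submit(script)
            accepted += 1
        except (OSError, subprocess.CalledProcessError):
            break
    return accepted
```

Calling submit_sleep_jobs(30000) would attempt a run of the size
mentioned above. The submit argument can be replaced with a stub to
try the loop without a scheduler.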

I think SGE 5.3 is better than anything else available. I
have tried commercial DRM systems and other open source
packages, but so far SGE is by far the best.

BTW, Chris, how many nodes are there in your cluster?

-Ron

P.S. I'm doing a port of SGE to FreeBSD; I hope people
find it useful.




More information about the Beowulf mailing list