PBS Scheduler


Joseph Landman landman at scalableinformatics.com
Fri Sep 27 07:45:47 PDT 2002


On Fri, 2002-09-27 at 07:27, Ivan Oleynik wrote:
> Hi,
> 
> I have a problem with the PBS scheduler: every time I run an I/O-intensive
> series of jobs, it goes down. As a result, the whole PBS queue, including
> the other jobs, becomes suspended.
> 
> I could not find any useful information in the sched_logs and server_logs
> files except for uninformative messages like this one:
> 
> 0001;PBS_Server;Svr;PBS_Server;Connection refused (111) in contact_sched,Could not contact Scheduler

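This is actually quite informative: error 111 is ECONNREFUSED on Linux,
meaning PBS_Server's attempt to open a connection to the scheduler was
rejected.  What I have seen in the past with PBS under heavy NFS load is
that the cluster head node runs out of TCP/UDP connection slots, as limited
in /etc/inetd.conf or /etc/xinetd.conf (for xinetd, for example, the
instances and cps settings).  Depending on which one you use, you will
need to raise those limits somewhat.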
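One way to check which side of that error you are on is to probe the scheduler's TCP port directly from the server host while the I/O-heavy jobs are running. The sketch below is a minimal Python check, not part of PBS; probe_port is a hypothetical helper, and port 15004 is an assumption (it is the commonly used OpenPBS default for pbs_sched, but check /etc/services or your PBS configuration). A "refused" result corresponds to the (111) in the log line. Note that the probe only tells you whether something accepted the connection, not why it was refused.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt a TCP connect to host:port and classify the outcome.

    Returns "open" if something accepted the connection, "refused" if the
    kernel answered with ECONNREFUSED (errno 111 on Linux, as in the
    PBS_Server log line), and "unreachable" for timeouts or other errors.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts and other network errors (e.g. no route to host).
        return "unreachable"


if __name__ == "__main__":
    # Assumed scheduler port; adjust to match your site's PBS setup.
    print(probe_port("localhost", 15004))
```

Running it in a loop during the test batch would show whether the scheduler port stops answering only under load.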

> For this particular test I ran a batch of MPICH jobs, each requesting just
> one processor, and the number of submitted jobs was six times the number
> of available nodes. Each job does intensive I/O via NFS running over
> Myrinet (writing files of ~300 Mb each).
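To put that test load in numbers, here is a back-of-the-envelope sketch. Only the 6x jobs-per-node ratio, one processor per job, and ~300 per file come from the message; "Mb" is assumed to mean megabytes, and the node count (16) and CPUs per node (2) in the example are hypothetical. All of this data ends up on whichever machine exports the NFS filesystem.

```python
def batch_load(nodes: int, cpus_per_node: int, jobs_per_node: int = 6,
               mb_per_job: float = 300.0) -> dict:
    """Estimate job counts and NFS write volume for the test batch.

    Assumes one processor per job, so at most nodes * cpus_per_node jobs
    run at once while the rest wait in the queue. Volumes are reported
    in gigabytes (1 GB = 1000 MB).
    """
    jobs = nodes * jobs_per_node
    concurrent = min(jobs, nodes * cpus_per_node)
    return {
        "jobs": jobs,
        "concurrent": concurrent,
        # Data being written at the same time by the running jobs.
        "concurrent_write_gb": concurrent * mb_per_job / 1000.0,
        # Data written over NFS by the whole batch.
        "total_write_gb": jobs * mb_per_job / 1000.0,
    }


if __name__ == "__main__":
    # Hypothetical cluster: 16 dual-CPU nodes.
    print(batch_load(16, 2))
```

For the hypothetical 16-node cluster this gives 96 jobs, 32 running at a time, roughly 9.6 GB being written concurrently and 28.8 GB in total.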

[...]

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615
