PBS Scheduler

Fri Sep 27 07:45:47 PDT 2002

On Fri, 2002-09-27 at 07:27, Ivan Oleynik wrote:
> Hi,
> 
> I have a problem with the PBS scheduler: every time I run an I/O-intensive
> series of jobs, it goes down. As a result, the whole PBS queue, including
> other jobs, becomes suspended.
> 
> I could not find any useful information in the sched_logs and server_logs
> files except for uninformative messages like this one:
> 
> 0001;PBS_Server;Svr;PBS_Server;Connection refused (111) in contact_sched,Could not contact Scheduler

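This is actually quite informative.  Error 111 is ECONNREFUSED on Linux:
the connection attempt reached the scheduler's host, but nothing there
was accepting connections on the scheduler's port.  In the past, with
PBS under heavy NFS load, I have seen the cluster head node run out of
the TCP/UDP connection slots allowed in /etc/inetd.conf or
/etc/xinetd.conf.  Depending on which of the two your system uses, you
will need to raise those limits somewhat.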
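As a rough aid for finding those limits, here is a minimal Python sketch (not part of PBS or either inetd) that reads inetd.conf-style and xinetd-style text and reports the per-service limits. It assumes Linux netkit inetd semantics, where an optional ".N" suffix on the wait/nowait field caps how many servers are spawned per 60 seconds (default 40), and xinetd's "instances" and "cps" attributes. The names inetd_limits, xinetd_limits and INETD_DEFAULT_MAX are hypothetical; check your own inetd(8) and xinetd.conf(5) man pages before changing anything.

```python
import re

# Assumption: Linux netkit inetd semantics, where "nowait.N" (or "wait.N")
# caps spawns of a service at N per 60 seconds and the default is 40.
INETD_DEFAULT_MAX = 40


def inetd_limits(text):
    """Map each active inetd.conf service to its spawn limit per minute."""
    limits = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete service entry
        service, wait_field = fields[0], fields[3]
        # Field 4 is "wait" or "nowait", optionally followed by ".N".
        _, _, suffix = wait_field.partition(".")
        limits[service] = int(suffix) if suffix.isdigit() else INETD_DEFAULT_MAX
    return limits


def xinetd_limits(text):
    """Map each xinetd 'defaults' or 'service NAME' block to its instances/cps."""
    limits = {}
    block = re.compile(r"^\s*(defaults|service\s+\S+)\s*\{(.*?)\}", re.M | re.S)
    for match in block.finditer(text):
        name = match.group(1).split()[-1]  # "defaults" or the service name
        attrs = {}
        for line in match.group(2).splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip() in ("instances", "cps"):
                attrs[key.strip()] = value.split()  # keep raw tokens, e.g. "UNLIMITED"
        limits[name] = attrs
    return limits


if __name__ == "__main__":
    sample_inetd = """
# service type   proto wait       user server         args
ftp       stream tcp   nowait     root /usr/sbin/tcpd in.ftpd
rsh       stream tcp   nowait.400 root /usr/sbin/tcpd in.rshd
"""
    sample_xinetd = """
defaults
{
        instances = 60
        cps       = 25 30
}
"""
    print(inetd_limits(sample_inetd))    # {'ftp': 40, 'rsh': 400}
    print(xinetd_limits(sample_xinetd))  # {'defaults': {'instances': ['60'], 'cps': ['25', '30']}}
```

Raising a limit would then be an edit such as changing "nowait" to "nowait.400" on the affected inetd entry, or increasing "instances" or "cps" in the xinetd configuration, followed by restarting (x)inetd.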

> For this particular test I ran a batch of MPICH jobs, each requesting just
> 1 processor, and submitted 6 times as many jobs as there are available
> nodes. Each job does intensive I/O via NFS running over Myrinet (writing
> files of ~300 MB each).

[...]
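To confirm what the "Connection refused (111)" log line is saying, a quick probe of the scheduler's TCP port can help. This minimal Python sketch assumes the scheduler listens on TCP port 15004 (the usual OpenPBS default for pbs_sched; check /etc/services or your PBS build), and the probe() helper and SCHED_PORT constant are hypothetical. On Linux a refused connection raises ConnectionRefusedError with errno 111 (ECONNREFUSED), the same code that appears in the server log.

```python
import errno
import socket
import sys

# Assumption: pbs_sched listens on TCP port 15004 (the usual OpenPBS
# default); adjust if your installation uses a different port.
SCHED_PORT = 15004


def probe(host, port=SCHED_PORT, timeout=5.0):
    """Attempt one TCP connection to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError as exc:
        # The host answered but nothing accepted the connection; on Linux
        # this is errno 111 (ECONNREFUSED), as in the PBS server log.
        name = errno.errorcode.get(exc.errno, "?")
        return "refused (errno %d, %s)" % (exc.errno, name)
    except socket.timeout:
        # No answer at all within the timeout (host down or packets dropped).
        return "timed out"
    except OSError as exc:
        # Anything else, e.g. an unresolvable host name.
        return "error: %s" % exc


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print(target, SCHED_PORT, probe(target))
```

Run it on the node where pbs_server lives, passing the scheduler's host name as the argument. A "refused" result while the I/O-heavy jobs are running means the host is reachable but the scheduler port is not accepting connections, which fits either a dead scheduler or exhausted connection limits.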

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615