[Beowulf] Strange SGE scheduling problem

Reuti reuti at staff.uni-marburg.de
Wed Jul 23 14:35:25 PDT 2008


On 22.07.2008 at 23:54, Schoenefeld, Keith wrote:

> My cluster has 8 slots (cores) per node in the form of two quad-core
> processors. Only recently have we started running jobs on it that
> require 12 slots. We've noticed significant speed problems when running
> multiple 12-slot jobs, and quickly discovered that a node running 4
> slots of one job and 4 slots of another job was running both jobs on
> the same processor cores (i.e. both job1 and job2 were running on CPUs
> #0-#3, while CPUs #4-#7 were left idle). The result is that the jobs
> were competing for time on half of the available processors.

How did you check this? With `top`? Do you have one queue with 8 slots
per machine?
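
One way to check this on a Linux node without relying on the `top` display is to read the per-process files under /proc. The following is only a minimal sketch in Python: it assumes a Linux kernel that exposes the "processor" field (field 39) in /proc/<pid>/stat and the Cpus_allowed_list line in /proc/<pid>/status, and the helper names (last_cpu_from_stat, allowed_cpus_from_status, report) are made up for illustration.

```python
import os


def last_cpu_from_stat(stat_line: str) -> int:
    """Return the CPU a task last ran on (field 39 of /proc/<pid>/stat)."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 is rest[36].
    return int(rest[36])


def allowed_cpus_from_status(status_text: str) -> str:
    """Return the Cpus_allowed_list value from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return line.split(":", 1)[1].strip()
    # Older kernels may not provide this line (assumption).
    return "unknown"


def report(pid: int) -> str:
    """Summarise where a process last ran and where it is allowed to run."""
    with open(f"/proc/{pid}/stat") as f:
        last = last_cpu_from_stat(f.read())
    with open(f"/proc/{pid}/status") as f:
        allowed = allowed_cpus_from_status(f.read())
    return f"pid {pid}: last ran on CPU {last}, allowed CPUs {allowed}"


# Example (Linux only): print(report(os.getpid()))
```

Running report() on the PIDs of the MPI ranks of both jobs on one node would show whether their allowed CPU lists overlap (for example both "0-3"), which would point to the processes being pinned rather than to the scheduler's slot assignment.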

-- Reuti

> In addition, a 4-slot job started well after the 12-slot job has
> ramped up results in the same problem (both the 12-slot job and the
> 4-slot job get assigned to the same slots on a given node).
> Any insight as to what is occurring here and how I could prevent it
> from happening? We are using SGE + mvapich 1.0 and a PE that has the
> $fill_up allocation rule.
> I have also posted this question to the hpc_training-l at georgetown.edu
> mailing list, so my apologies to people who get this email multiple
> times.
> Any help is appreciated.
> -- KS
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf
