[Beowulf] Torque Error: multi-req PBS jobs not allowed
Rahul Nabar
rpnabar at gmail.com
Mon Apr 20 15:29:55 PDT 2009
On Mon, Apr 20, 2009 at 5:11 PM, Greg Lindahl <lindahl at pbm.com> wrote:
> On Mon, Apr 20, 2009 at 04:59:31PM -0500, Rahul Nabar wrote:
>
>> Why would PBS-Torque not allow this and my previous threads
>> "JOBNODEMATCHPOLICY EXACTNODE" by default? Are there any reasons not
>> to use them? The compromise is not obvious to me.
>
> In general it's most efficient to have a job use whole nodes. If you
> use a partial node, jobs will likely interfere with each other,
> reducing overall performance.
>
> Now if your code only runs on N**2 nodes, using whole nodes can be
> painful. But with M*N nodes, you're usually OK.
My code (depending on the specific job at hand) parallelizes well
when the process count is a multiple of a small integer: mostly
(a) multiples of 4 or (b) multiples of 9.
Since I have 8 CPUs per server, job type (a) is great, but job type (b)
requires a trade-off. If I wanted to request whole nodes I'd have to
shoot for 8x9=72 processes, but that degree of parallelization is
overkill: at that scale the parallelization is no longer efficient.
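
To make the arithmetic concrete, here is a rough Python sketch (the function name is made up for illustration). It assumes 8 CPUs per node and finds the smallest process count that is both a multiple of the code's preferred factor and fills whole nodes, which is just the least common multiple.

```python
from math import gcd

CPUS_PER_NODE = 8  # from the post: 8 CPUs per server


def smallest_whole_node_job(multiple, cpus_per_node=CPUS_PER_NODE):
    """Return (processes, nodes) for the smallest job that is a multiple of
    `multiple` and still occupies only whole nodes."""
    # Least common multiple of the preferred factor and the node size.
    procs = multiple * cpus_per_node // gcd(multiple, cpus_per_node)
    return procs, procs // cpus_per_node


print(smallest_whole_node_job(4))  # job type (a)
print(smallest_whole_node_job(9))  # job type (b)
```

For type (a) this gives 8 processes on 1 node; for type (b) it gives 72 processes on 9 nodes, which is the 8x9=72 case.
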
The ideal situation for our cluster, it seems to me, is to request full
nodes plus a fragment. If I could allow these partial-node fragments
only on a specific pool of nodes, I could minimize my overall cluster
fragmentation.
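
As a sketch of what "full nodes + a fragment" would look like as a request, the Python below splits a process count into whole 8-CPU nodes plus a remainder and builds a Torque-style `nodes=` string using the `count:ppn=N` and `+` syntax. The helper name and the `frag` node property are hypothetical. Whether the scheduler accepts such a multi-part request at all is exactly the open question in this thread.

```python
def nodes_spec(procs, cpus_per_node=8, fragment_property=None):
    """Build a 'nodes=' resource string: whole nodes plus one partial node.

    `fragment_property` is an optional (hypothetical) node property that
    would restrict the partial-node piece to a designated pool of nodes.
    """
    if procs <= 0:
        raise ValueError("procs must be positive")
    full, rest = divmod(procs, cpus_per_node)
    parts = []
    if full:
        parts.append(f"{full}:ppn={cpus_per_node}")
    if rest:
        frag = f"1:ppn={rest}"
        if fragment_property:
            frag += f":{fragment_property}"
        parts.append(frag)
    return "nodes=" + "+".join(parts)


print(nodes_spec(9))                             # one full node + 1 CPU
print(nodes_spec(18, fragment_property="frag"))  # two full nodes + 2 CPUs from the pool
```

A 9-process job would then ask for `nodes=1:ppn=8+1:ppn=1` instead of 72 CPUs, and an 18-process job restricted to the pool would ask for `nodes=2:ppn=8+1:ppn=2:frag`.
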
But I've no clue how to implement this. Has anybody tried such a solution?
--
Rahul