[Beowulf] Please help to setup Beowulf

Wed Feb 18 07:30:37 PST 2009

>> what's wrong with 500k job submissions?  to me, the existence of "array
>> jobs" is an admission that the job/queueing system is inefficient.
>
> When I compare this e.g. to C:
>
> - a loop like:
>
>   for (i=1;i<=100000;i++)
>       printf("Hello from run %d.\n", i);
>
> - and you can guess: 100,000 times:
>
>   printf("Hello from run %d.\n", i++);
>
>
> While the execution time is nearly the same, compilation of the first one is 
> faster by far.

this implies that the compiler's lexer sucks, which is as close to my
point as this strained analogy can get.

> that it can't compile sequential code very well and generates a big 
> executable?

big executables are fine (this one would be a whole 2 MB!).
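
rather than argue about compile time and executable size, anyone curious
can measure them.  the following is a minimal sketch, not part of the
original exchange, that generates the "unrolled" variant of the quoted
program so it can be compiled and compared against the loop version.  the
function name write_unrolled and its return convention are made up for this
illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Write a C program containing n consecutive printf statements (the
   "unrolled" variant from the quoted example) to out.
   Returns the number of bytes written, or -1 on a write error.
   write_unrolled is a hypothetical helper, named here for illustration. */
long write_unrolled(FILE *out, int n)
{
    long total = 0;
    int r;

    assert(out != NULL && n >= 0);

    /* Program prologue: declare the counter used by every statement. */
    r = fprintf(out, "#include <stdio.h>\n\nint main(void)\n{\n    int i = 1;\n");
    if (r < 0)
        return -1;
    total += r;

    /* One printf statement per run, exactly as in the quoted text. */
    for (int k = 0; k < n; k++) {
        r = fprintf(out, "    printf(\"Hello from run %%d.\\n\", i++);\n");
        if (r < 0)
            return -1;
        total += r;
    }

    /* Program epilogue. */
    r = fprintf(out, "    return 0;\n}\n");
    if (r < 0)
        return -1;
    total += r;

    return total;
}
```

calling write_unrolled(f, 100000) on a file opened for writing produces the
source to feed to the compiler; timing that compile and checking the size
of the result settles the question for a given compiler.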

> The compiler must read the source, and SGE has to read the job 
> requirements for every job again and again and store it.

if SGE has to keep re-reading user files, this suggests to me that its 
design is poor.  it's obvious to me that a scheduler should be based on 
a production-quality DB, for instance, and should clearly give some 
thought to performance when there are many runnable jobs.

> (PS: not to mention, that a for-loop/array-job is easier to handle for the 
> user)

is it?  an array job, in my experience at least, is just syntactic sugar
for submission.  all the internals of job scheduling still have to operate,
though as you point out, the scheduler may take advantage of the fact that
each sub-job (SJ) has identical resource requirements.  it still needs to
dispatch each SJ separately, each SJ may fail in its own way, etc.  in the
end, the submission and resource-matching code of the scheduler gets a
break, but everything else works just as hard.  consider, for instance, that
a flat naming scheme (one ID per job) is conceptually simpler, and array jobs
break it by adding a per-task index.  remember, too, that the user still has
to write some sort of script to check whether each SJ worked and to recover
from the failures.
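
to make that last point concrete, here is a minimal sketch of the
bookkeeping the user ends up doing after an array job finishes: walk the
per-SJ results and collect the ones that need to be rerun.  the struct
subjob_result, the function collect_failed, and the convention that an exit
status of 0 means success are all assumptions for this illustration; how
the results are actually gathered depends on the scheduler and is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of how one sub-job of an array job ended. */
struct subjob_result {
    int task_id;     /* index of the sub-job within the array job */
    int exit_status; /* assumed: 0 = success, anything else = failure */
};

/* Copy the task ids of all failed sub-jobs into failed[] (capacity cap).
   Returns how many sub-jobs failed; this may exceed cap, in which case
   only the first cap ids were stored. */
size_t collect_failed(const struct subjob_result *res, size_t n,
                      int *failed, size_t cap)
{
    size_t count = 0;

    assert(res != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (res[i].exit_status != 0) {
            if (count < cap)
                failed[count] = res[i].task_id;
            count++;
        }
    }
    return count;
}
```

the ids left in failed[] are what would have to be resubmitted, one way or
another, whether the original submission was a single array job or many
individual jobs.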