[Beowulf] sun grid engine on Scyld beowulf cluster

Bill Knebel billk01 at metrumrg.com
Sun Feb 20 15:41:19 PST 2005


Chris,

I was able to get Grid Engine to run on the Scyld cluster by setting 
the master (head) node as the submit, admin, and execute host. 
Unfortunately, starting a set of jobs on the cluster has one of two 
outcomes. If only Grid Engine commands are used, all jobs run on the 
head node. If I combine the Grid Engine "qsub" command with some of the 
Scyld tools, the jobs are started and then migrated (to a point) across 
the cluster. Even then I am still running into problems, because all of 
the Grid Engine queueing variables read the head node's information. 
Since all jobs run on the compute nodes, the head node always appears 
to be free, so all jobs are started at once. This is not ideal.
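The "head node always looks free" effect can be shown with a toy model. This is a minimal sketch assuming a dispatcher that starts jobs while the load it can observe stays under a threshold; the function name, the threshold of 4.0, and the per-job load values are hypothetical, and this is not Grid Engine's actual scheduling logic. When started jobs migrate to compute nodes, they add nothing to the observed load, so the check never stops dispatch.

```python
def dispatch_by_load(pending, observed_load, threshold, load_per_job):
    """Toy load-threshold dispatcher (hypothetical, not SGE's real scheduler).

    Jobs are started in order while the *observed* load stays below
    `threshold`. `load_per_job` is how much each started job adds to the
    load the scheduler can see.
    """
    started = []
    load = observed_load
    for job in pending:
        if load >= threshold:
            break  # the host looks busy, so stop dispatching
        started.append(job)
        load += load_per_job
    return started


jobs = [f"job{i}" for i in range(100)]

# Jobs that stay on the head node raise its load, so dispatch stops early.
print(len(dispatch_by_load(jobs, 0.0, 4.0, load_per_job=1.0)))  # 4

# Jobs migrated to compute nodes add no head-node load, so everything starts.
print(len(dispatch_by_load(jobs, 0.0, 4.0, load_per_job=0.0)))  # 100
```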

I am waiting on some feedback from Scyld/Penguin computing on some 
related issues that will hopefully solve some of these problems. 

Bill
Chris Dagdigian wrote:

>
> I know Grid Engine well but not Scyld, so forgive my ignorance if I say 
> something stupid; given the level of expertise on this list, I'm 
> quite certain I'm about to make a fool of myself :)
>
> If Scyld is presenting you with a single system image (i.e. a single 
> Linux server that can farm out tasks to all those nodes), then you 
> would install SGE the same way you would install it on a big 
> SMP box:
>
> 1. Install the SGE qmaster and scheduler on the master node
> 2. Install the execution host on the master node as well
>
> You will only have 1 execd per queue, but each queue can be configured 
> with N "job slots", which control how many jobs can 
> run at the same time on the same machine.
>
> Try setting the number of job slots in your single SGE queue to the 
> number of nodes in your cluster. This is similar to what you would do 
> on a big SMP machine: a small number of queues, each supporting a 
> decent job slot count.
>
> Then submit a bunch of jobs and see if SGE causes the master node to 
> fall over under load. If not then Scyld is doing its thing behind the 
> scenes to migrate stuff around to the other nodes.
>
> -Chris
>
>
>
> billk01 wrote:
>
>> I am in the process of installing SGE on a Scyld Beowulf cluster. As
>> most people are aware, a Scyld cluster runs a complete OS (Linux) only
>> on the master node, and the compute nodes are used only for execution.
>> The SGE install requires adding the compute nodes as execution
>> hosts. I do not understand how to do this given the setup of a
>> Scyld cluster, since you can't log in to the nodes to run the
>> install script. The script does exist in an NFS-shared directory
>> (cluster-wide). Has anybody else run into this problem?
>>
>
>
>
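Chris's suggestion of setting the queue's slot count to the number of compute nodes caps concurrency regardless of what the head node's load looks like. The toy model below is a sketch of that idea, assuming each job holds exactly one slot until it finishes and that all jobs are submitted at once; the function name, the 8-node cluster, and the one-hour job lengths are hypothetical, and whether each job actually lands on a separate node still depends on Scyld's migration, which the model does not cover.

```python
import heapq


def run_with_slots(durations, slots):
    """Toy slot-limited queue (hypothetical model, not SGE's scheduler).

    All jobs are submitted at time 0 and start in order as soon as one of
    `slots` slots is free. Returns (makespan, peak number running at once).
    """
    running = []  # min-heap of finish times of jobs currently holding a slot
    now = 0.0
    peak = 0
    for d in durations:
        if len(running) == slots:
            # Every slot is busy: wait for the earliest job to finish.
            now = heapq.heappop(running)
        heapq.heappush(running, now + d)
        peak = max(peak, len(running))
    makespan = max(running) if running else 0.0
    return makespan, peak


# Hypothetical cluster of 8 compute nodes, so 8 slots; 32 one-hour jobs.
print(run_with_slots([1.0] * 32, slots=8))  # (4.0, 8)
```

Unlike the load check, the slot limit does not depend on any load reading, so at most 8 jobs run at once even if the head node never looks busy.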


