[Beowulf] sun grid engine on Scyld beowulf cluster
billk01 at metrumrg.com
Sun Feb 20 15:41:19 PST 2005
I was able to get grid engine to run on the Scyld cluster using the
approach of setting the master (head) node as the submit, admin, and
execute host. Unfortunately, starting a set of jobs on the cluster
results in all jobs running on the head node only (if only grid engine
commands are used). Alternatively, I can integrate the grid engine "qsub"
command with some of the Scyld tools to get jobs started and then
migrated (to a point) across the cluster. However, I am still running
into problems because all of the queueing variables for grid engine read
the head node info and, since all jobs run on the compute nodes, the head
node appears to be
always free which results in all jobs being started at once. This is not
I am waiting on some feedback from Scyld/Penguin computing on some
related issues that will hopefully solve some of these problems.
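
As a rough illustration of the "qsub plus Scyld tools" approach described
above, here is a minimal sketch of a job wrapper. It assumes the Scyld tool
involved is bpsh (which runs a command on a numbered compute node) and that
jobs are submitted as SGE array jobs so that SGE_TASK_ID is set. The
function name, the round-robin node choice, and the job command are
hypothetical, and this does nothing about the load-reporting problem.

```python
import os


def bpsh_command(job_argv, node_count, task_id=None):
    """Build a bpsh command line that runs job_argv on one compute node.

    The node is picked by plain round-robin on the SGE array task id.
    That policy is an assumption for illustration only; neither SGE nor
    Scyld does this on its own.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if task_id is None:
        # SGE sets SGE_TASK_ID to a number for array jobs and to the
        # string "undefined" otherwise; fall back to task 1 in that case.
        raw = os.environ.get("SGE_TASK_ID", "1")
        task_id = int(raw) if raw.isdigit() else 1
    # Scyld compute nodes are numbered from 0; SGE task ids start at 1.
    node = (task_id - 1) % node_count
    return ["bpsh", str(node)] + list(job_argv)


if __name__ == "__main__":
    # Hypothetical job command; print instead of executing.
    print(" ".join(bpsh_command(["./sim", "input.dat"], node_count=4, task_id=6)))
```

The wrapper only builds the command line, so it can be checked on a machine
without Scyld installed before being handed to subprocess.run inside the
job script.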
Chris Dagdigian wrote:
> I know Grid Engine well but not Scyld so forgive my ignorance if I say
> something stupid and given the level of expertise on this list I'm
> quite certain I'm about to make a fool of myself :)
> If Scyld is presenting you with a single system image (i.e., a single
> linux server that can farm out tasks to all those nodes) then you
> would install SGE in the same way that you would install it on a big
> SMP box:
> 1. Install the SGE qmaster and scheduler on the master node
> 2. Install the execution host on the master node as well
> You will only have 1 execd per queue but each queue can be configured
> with N number of "job slots" which actually control how many jobs can
> run at the same time on the same machine.
> Try setting your # of job slots within your single SGE queue to the
> number of nodes in your cluster. This is similar to what you would do
> on a big SMP machine -- small number of queues each supporting a
> decent jobslot count.
> Then submit a bunch of jobs and see if SGE causes the master node to
> fall over under load. If not then Scyld is doing its thing behind the
> scenes to migrate stuff around to the other nodes.
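
The slot-count suggestion above can be sketched as follows. This is a
minimal example, assuming a grid engine version whose qconf supports the
-mattr option for modifying a single queue attribute; the queue name
"all.q" and the node count are placeholders, not values from this cluster.

```python
def set_slots_command(queue_name, node_count):
    """Build the qconf command that sets a queue's job slots.

    Follows the advice of one slot per compute node. Whether that is
    the right number for a given cluster is an assumption to verify.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return ["qconf", "-mattr", "queue", "slots", str(node_count), queue_name]


if __name__ == "__main__":
    # Placeholder values; print the command rather than running it.
    print(" ".join(set_slots_command("all.q", 16)))
```

Running the printed command on the master node as a grid engine admin user
would apply the change. Note that this only caps how many jobs SGE starts
at once; it does not place them on particular compute nodes.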
> billk01 wrote:
>> I am in the process of installing SGE on a Scyld beowulf cluster. As
>> most people are aware, the Scyld cluster runs a complete OS (Linux) only
>> on the master node and the compute nodes are simply for executing.
>> During the SGE install, it requires adding the compute nodes as execute
>> hosts. I do not understand how to do this given the current setup of a
>> Scyld cluster since you can't "log in" to the nodes to execute the
>> install script. The script does exist on an NFS shared directory
>> (cluster wide). Has anybody else run into this problem?