[Beowulf] Dual head or service node related question ...
reuti at staff.uni-marburg.de
Fri Dec 4 03:13:08 PST 2009
On 04.12.2009 at 10:24, Hearns, John wrote:
> What is viewed as the best practice (or what are people doing) on
> something like an SGI ICE system with multiple service or head nodes?
> Does one service node generally assume the same role as the
> head node above (serving NFS, logins, and running services like
> PBS pro)? Or ... if NFS is used, is it perhaps served from another
> service node and mounted both on the login node and the compute
I don't know about the original system you mentioned. We use SGE (not
PBSpro) and I prefer putting its qmaster on the fileserver as well (the
additional load from the fileserver is easier to predict than the
varying work of interactive users). Then you can have as many
login/submission machines as you like or need - there is no daemon
running at all on them (though it might be different for PBSpro). The
submission machines just need read access to /usr/sge, or wherever
it's installed, to source the settings file and have access to the
commands. Nevertheless, it could be installed w/o NFS access at all -
even the nodes could do without NFS, but you would lose some
functionality and need some kind of file staging for the job files.
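As a rough illustration of the "read access only" requirement, here is a small Python sketch that checks whether a would-be submission host can read the SGE settings file. The function name check_submit_host is made up for this example, and the defaults (/usr/sge as the install root, "default" as the cell name, common/settings.sh as the settings file) are assumptions about a typical installation; adjust them to your site.

```python
import os


def check_submit_host(sge_root="/usr/sge", cell="default"):
    """Return a list of problems that would stop this host from submitting.

    Hypothetical helper: the default root and cell name are assumptions,
    not something every installation uses.
    """
    problems = []
    settings = os.path.join(sge_root, cell, "common", "settings.sh")
    if not os.path.isdir(sge_root):
        # The install tree is not mounted (or not copied) on this host.
        problems.append(f"{sge_root} is not mounted or does not exist")
    elif not os.access(settings, os.R_OK):
        # The tree is there, but the settings file cannot be sourced.
        problems.append(f"cannot read {settings}")
    return problems


if __name__ == "__main__":
    for problem in check_submit_host():
        print("problem:", problem)
```

An empty list means the host can at least read the settings file. It says nothing about whether the qmaster accepts it as a submit host.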
SGE's options regarding NFS are explained here:
http://gridengine.sunsource.net/howto/nfsreduce.html
The option with just local spool directories fits my needs best. Maybe PBSpro has similar
How does PBSpro do its spooling - do they have some kind of database
Is anyone putting the qmaster(s) in separate virtual machine(s) on
the file server for failover? I got this idea recently.
> Two service nodes which act as login/batch submission nodes.
> PBSpro configured to fail over between them (ie one is the PBS
> primary server).
> Separate server for storage – SGI connect these storage servers via
> the Infiniband fabric,
> and use multiple Infiniband ports to spread the load – you can
> easily configure this at cluster install time,
> ie. every nth node connects to a different Infiniband port on the
> storage server.
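The "every nth node" scheme quoted above amounts to a round-robin assignment. The Python sketch below only illustrates that idea and is not how the SGI install tools do it; the function name assign_ports, the node names, and the port labels are all made up for the example.

```python
def assign_ports(nodes, ports):
    """Map each node to a storage-server port, cycling through the ports.

    With n ports, nodes 0, n, 2n, ... share the first port, nodes
    1, n+1, 2n+1, ... share the second, and so on.
    """
    if not ports:
        raise ValueError("need at least one port")
    return {node: ports[i % len(ports)] for i, node in enumerate(nodes)}


if __name__ == "__main__":
    nodes = [f"node{i:03d}" for i in range(8)]  # hypothetical node names
    ports = ["ib0", "ib1", "ib2", "ib3"]        # hypothetical port labels
    for node, port in assign_ports(nodes, ports).items():
        print(node, "->", port)
```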