[Beowulf] Dual head or service node related question ...

Reuti reuti at staff.uni-marburg.de
Fri Dec 4 03:13:08 PST 2009


On 04.12.2009 at 10:24, Hearns, John wrote:

> What is viewed as the best practice (or what are people doing) on
> something like an SGI ICE system with multiple service or head nodes?
> Does one service node generally assume the same role as the
> head node above (serving NFS, logins, and running services like
> PBS pro)?  Or ... if NFS is used, is it perhaps served from another
> service node and mounted both on the login node and  the compute
> nodes?

I don't know about the original system you mentioned. We use SGE (not
PBSpro), and I prefer to put its qmaster on the fileserver as well (the
additional load from the fileserver is easier to predict than the
varying work of interactive users). Then you can have as many
login/submission machines as you like or need; no daemon runs on
them at all (though it might be different for PBSpro). The
submission machines just need read access to /usr/sge, or wherever
it's installed, to source the settings file and have access to the
commands. Nevertheless, it could be installed without NFS access at
all. Even the nodes could do without NFS, but you would lose some
functionality and need some kind of file staging for the job files.
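
As a rough illustration of what a login/submission machine needs, here
is a minimal Python sketch that checks for read access to the SGE tree
and the settings file. The function name check_submit_host is
hypothetical, and the layout <root>/<cell>/common/settings.sh with a
cell called "default" is an assumption about a typical SGE install;
adjust both to your site.

```python
import os
import shutil


def check_submit_host(sge_root="/usr/sge", cell="default"):
    """Report whether this host can act as an SGE submission machine.

    Assumption: the settings file lives at
    <sge_root>/<cell>/common/settings.sh (typical layout, not verified
    for any particular installation).
    """
    settings = os.path.join(sge_root, cell, "common", "settings.sh")
    return {
        # The tree must be readable and traversable (e.g. mounted via NFS).
        "sge_root_readable": os.access(sge_root, os.R_OK | os.X_OK),
        # The settings file is what a login shell would source.
        "settings_readable": os.access(settings, os.R_OK),
        # qsub is only found here if the settings were already sourced
        # in the calling environment, so False is not necessarily an error.
        "qsub_on_path": shutil.which("qsub") is not None,
    }


if __name__ == "__main__":
    for check, ok in check_submit_host().items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```

Nothing in this check requires a daemon on the submission machine,
which is the point: read access to the installation is enough.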

SGE's options regarding NFS are explained here:
http://gridengine.sunsource.net/howto/nfsreduce.html
The option with just local spool directories fits my needs best.
Maybe PBSpro has similar
How is PBSpro doing its spooling - do they have some kind of database  
like SGE?

Is anyone putting the qmaster(s) in separate virtual machine(s) on
the file server for failover? I got this idea recently.

-- Reuti

> Two service nodes which act as login/batch submission nodes.
> PBSpro configured to fail over between them (ie one is the PBS  
> primary server).
> Separate server for storage – SGI connect these storage servers via  
> the Infiniband fabric,
> and use multiple Infiniband ports to spread the load – you can  
> easily configure this at cluster install time,
> ie. every nth node connects to a different Infiniband port on the  
> storage server.
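
The "every nth node connects to a different port" scheme quoted above
amounts to a round-robin assignment. A minimal Python sketch of the
idea follows; the node names, port names, and the functions
assign_ports and nodes_per_port are hypothetical, and this is not how
the SGI install tooling actually implements it.

```python
from collections import defaultdict


def assign_ports(nodes, ports):
    """Map each node to a storage-server port in round-robin order."""
    if not ports:
        raise ValueError("at least one port is required")
    return {node: ports[i % len(ports)] for i, node in enumerate(nodes)}


def nodes_per_port(assignment):
    """Group the node names by the port they were assigned to."""
    grouped = defaultdict(list)
    for node, port in assignment.items():
        grouped[port].append(node)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical cluster: 8 compute nodes, 4 InfiniBand ports on the
    # storage server.
    nodes = [f"node{i:03d}" for i in range(1, 9)]
    ports = ["ib0", "ib1", "ib2", "ib3"]
    for port, members in nodes_per_port(assign_ports(nodes, ports)).items():
        print(port, members)
```

With N ports, node i lands on port i mod N, so consecutive nodes never
share a port and the per-port node count differs by at most one.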
