[Beowulf] What services do you run on your cluster nodes?

Eric Thibodeau kyron at neuralbs.com
Mon Sep 22 12:44:24 PDT 2008


Ashley Pittman wrote:
> On Mon, 2008-09-22 at 14:56 -0400, Eric Thibodeau wrote:
>   
>>> My question is this: how extreme do you go in disabling non-essential
>>> services on your cluster nodes? Do you turn off *everything* that's not
>>> absolutely necessary, or do you leave some things running to make
>>> administration easier?
>>>       
>
> If it were up to me I'd turn *everything* possible off except sshd and
> ntp.  The problem, however, is the maintenance cost of doing this. It's
> fine if you've only got one cluster and one app, but as soon as you try
> to support multiple users on multiple distributions the cost of ensuring
> everything is shut down on all of them skyrockets and it becomes easier
> to stick with the status quo :(
>   
O_o...you mean you're still using local OS installations ... ew!
>   
>> Everything is turned off and, most of the time, a quick glance at 
>> ganglia brings out problems. Simple scripts can be built to perform 
>> cyclic checks on the nodes and would be less disruptive IMHO.
>>     
>>> I'm curious to see how everyone else has their cluster(s) configured
Well, while we're at it, here are my node's services (I built this one 
3 years ago; the new images are different now):

thinkbig1 ~ # rc-status
Runlevel: unionfs
 ntp-client          [  started  ]
 ntpd                [  started  ]
 sshd                [  started  ]
 acpid               [  started  ]
 gmond               [  started  ]
 portmap             [  started  ]
 autofs              [  started  ]
 nfsmount            [  started  ]
 netmount            [  started  ]
 vixie-cron          [  started  ]
 local               [  started  ]
Runlevel: UNASSIGNED
 fsck                [  started  ]
 rpc.statd           [  started  ]
 udev-postmount      [  started  ]
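
For what it's worth, a listing like the one above can be checked against an allowlist with a few lines of Python. This is only a rough sketch, not anything from my actual images: the ALLOWED set is an assumption copied from the services shown above, and the function names and SAMPLE text (including the cupsd entry) are made up for illustration. It only parses text that looks like rc-status output; it does not run the command itself.

```python
import re

# Hypothetical allowlist: the services from the listing above.
# Adjust to whatever your own node image is supposed to run.
ALLOWED = {
    "ntp-client", "ntpd", "sshd", "acpid", "gmond", "portmap", "autofs",
    "nfsmount", "netmount", "vixie-cron", "local",
    "fsck", "rpc.statd", "udev-postmount",
}

# A service name followed by a bracketed state. Whitespace between the two
# may include a line break, so output wrapped by a mail client still parses.
SERVICE_RE = re.compile(r"^[ \t]*(\S+)\s*\[\s*(\w+)\s*\]", re.MULTILINE)


def started_services(rc_status_text):
    """Return the set of service names reported as 'started'."""
    return {
        name
        for name, state in SERVICE_RE.findall(rc_status_text)
        if state == "started"
    }


def unexpected_services(rc_status_text, allowed=ALLOWED):
    """Return a sorted list of started services that are not allowed."""
    return sorted(started_services(rc_status_text) - set(allowed))


# Made-up sample text in the same shape as rc-status output.
SAMPLE = """\
Runlevel: unionfs
 sshd        [  started  ]
 ntpd        [  started  ]
 cupsd       [  started  ]
 gmond       [  stopped  ]
"""

if __name__ == "__main__":
    for name in unexpected_services(SAMPLE):
        print("unexpected service:", name)
```

Fed the captured rc-status text from each node (from cron, or collected over ssh), something like this would cover the "cyclic checks" idea without leaving an extra daemon on the nodes.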

>> The only actual research I found on OS interference impacting HPC 
>> computing is titled "A measurement and simulation methodology for 
>> parallel computing performance studies" by Matthew Joseph Sottile. I 
>> would be curious to know if anyone else has dipped into the subject and 
>> come up with conclusive results.
>>     
>
> At medium to large scales it becomes hugely important,
> http://www.sc-conference.org/sc2003/paperpdfs/pap301.pdf
>
> Also look at "whatelse" from http://www.c3.lanl.gov/pal/software.shtml
>
> Ashley Pittman.
>   
Hey, thanks for the links. I've coded my own whatelse (flimsy, but it does 
the trick) and I'll read the article.

Eric



