[Beowulf] anyone using SALT on your clusters?

Douglas Eadline deadline at eadline.org
Fri Jun 28 12:56:00 PDT 2013


> On Fri, Jun 28, 2013 at 09:45:50AM +0100, Jonathan Barber wrote:
>
>> The problem with SSH based approaches is when you have failed nodes -
>> normally they cause the entire command to hang until the attempted
>> connection times out.
>
> Normally what people do is ping the node before trying ssh on it. And
> have reasonable timeouts around both the ssh connect and the command
> execution. There's no fundamental reason why this is any different
> from messaging or subscription-plus-messaging.
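For anyone who wants to try the ping-then-ssh approach Greg describes, here is a minimal Python sketch. It assumes Linux iputils ping (where -W is the reply timeout in seconds) and OpenSSH (ConnectTimeout, BatchMode). The function names, script name and timeout values are hypothetical and not taken from any existing tool.

```python
import subprocess
import sys
from typing import Optional


def ping_cmd(host: str, wait_s: int = 1) -> list[str]:
    # One ICMP echo; with Linux iputils ping, -W is the reply timeout in seconds.
    return ["ping", "-c", "1", "-W", str(wait_s), host]


def ssh_cmd(host: str, command: str, connect_timeout_s: int = 5) -> list[str]:
    # ConnectTimeout bounds only connection setup, not the remote command.
    # BatchMode=yes makes ssh fail instead of waiting at a password prompt.
    return ["ssh", "-o", f"ConnectTimeout={connect_timeout_s}",
            "-o", "BatchMode=yes", host, command]


def run_bounded(argv: list[str], timeout_s: float) -> Optional[subprocess.CompletedProcess]:
    """Run argv, killing it after timeout_s seconds. Return None on timeout or missing binary."""
    try:
        return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None


def run_on_node(host: str, command: str, ping_wait_s: int = 1,
                connect_timeout_s: int = 5, exec_timeout_s: float = 60.0):
    """Ping host first; only if it answers, run command over ssh under a hard wall-clock limit."""
    ping = run_bounded(ping_cmd(host, ping_wait_s), timeout_s=ping_wait_s + 1)
    if ping is None or ping.returncode != 0:
        return None  # node looks down: skip it instead of hanging on ssh
    # The outer timeout covers connect plus remote execution, so a wedged node cannot hang us.
    return run_bounded(ssh_cmd(host, command, connect_timeout_s),
                       timeout_s=connect_timeout_s + exec_timeout_s)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python run_on_node.py node01 uptime
    result = run_on_node(sys.argv[1], sys.argv[2])
    if result is None:
        print("skipped or timed out")
    else:
        print(result.stdout, end="")
```

The ping is only a cheap pre-filter: a node can answer ICMP while sshd or a filesystem is wedged, which is why the timeout around the whole ssh call still matters.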

I have found it very handy to run whatsup-pingd
(https://computing.llnl.gov/linux/whatsup.html) once a minute or so
to maintain lists of "up nodes" and "down nodes". You can even point
pdsh's WCOLL environment variable at the up-nodes file.
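If whatsup-pingd isn't available, a rough stand-in for the same idea might look like the Python sketch below. It is not whatsup-pingd itself; it assumes a plain ICMP ping is a good enough liveness test and Linux iputils ping flags. The file paths /tmp/nodes.up and /tmp/nodes.down and all function names are hypothetical.

```python
import os
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Sequence


def ping_ok(host: str, wait_s: int = 1) -> bool:
    """Return True if a single ICMP echo to host gets a reply (Linux iputils ping flags)."""
    try:
        r = subprocess.run(["ping", "-c", "1", "-W", str(wait_s), host],
                           capture_output=True, timeout=wait_s + 1)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return r.returncode == 0


def refresh_node_lists(nodes: Sequence[str], up_file: str, down_file: str,
                       probe: Callable[[str], bool] = ping_ok) -> tuple[list[str], list[str]]:
    """Probe all nodes and write one hostname per line to up_file and down_file."""
    # Probe in parallel so dead nodes time out concurrently rather than one after another.
    with ThreadPoolExecutor(max_workers=64) as pool:
        alive = list(pool.map(probe, nodes))  # map() preserves input order
    up = [n for n, ok in zip(nodes, alive) if ok]
    down = [n for n, ok in zip(nodes, alive) if not ok]
    for path, names in ((Path(up_file), up), (Path(down_file), down)):
        # Write a temp file, then rename it over the target so readers
        # (e.g. pdsh via WCOLL) never see a half-written list.
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text("".join(f"{n}\n" for n in names))
        os.replace(tmp, path)
    return up, down


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python node_lists.py node01 node02 ...
    while True:
        refresh_node_lists(sys.argv[1:], "/tmp/nodes.up", "/tmp/nodes.down")
        time.sleep(60)  # roughly "once every minute or so"
```

With that running, something like "export WCOLL=/tmp/nodes.up; pdsh uptime" should target only the nodes that answered the last ping, since pdsh falls back to the WCOLL file when no -w option is given.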

--
Doug

>
> -- greg
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

