[Beowulf] anyone using SALT on your clusters?

Jonathan Barber jonathan.barber at gmail.com
Mon Jul 1 06:01:13 PDT 2013

On 28 June 2013 20:29, Greg Lindahl <lindahl at pbm.com> wrote:

> On Fri, Jun 28, 2013 at 09:45:50AM +0100, Jonathan Barber wrote:
> > The problem with SSH based approaches is when you have failed nodes -
> > normally they cause the entire command to hang until the attempted
> > connection times out.
> Normally what people do is ping the node before trying ssh on it. And
> have reasonable timeouts around both the ssh connect and the command
> execution. There's no fundamental reason why this is any different
> from messaging or subscription-plus-messaging.

Pinging the host prior to connecting only determines that the IP stack is
working, not that the OS is capable of handling an ssh connection. Of
course, you could do a TCP SYN ping to determine that the sshd daemon is up,
but this can still return a false positive if the NFS mount the host is
based upon is hosed. At that point your ssh liveness check is going to
start hanging, which could cause your list of live hosts to go out of date,
and now we're back where we started. This is not only an ssh problem - I've
had the same issue with func [1], which is SSL-based but also a push
architecture.
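
To make the liveness-check point concrete, here is a minimal sketch of a
check that goes one step past a SYN ping: it connects to the ssh port and
waits for the server's banner, with a timeout on both steps. The function
name and the defaults are my own assumptions, not taken from any existing
tool. Note that it still only shows that sshd can accept a connection and
greet you; it says nothing about whether a login would then hang on a dead
NFS mount.

```python
import socket


def sshd_banner_alive(host, port=22, timeout=2.0):
    """Return True if something on host:port sends an SSH banner in time.

    Hypothetical helper for illustration only.
    """
    try:
        # The timeout applies to the connect and to later reads on the socket.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(64)
    except OSError:
        # Refused, unreachable, unresolvable, or timed out.
        return False
    # SSH servers open the conversation with a line starting "SSH-".
    return banner.startswith(b"SSH-")
```

A True result here can still be a false positive in exactly the sense
described above, so it narrows the problem rather than removing it.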

WRT timeouts, the problem is determining whether a timeout means that the
host is blocking with no possibility of responding (e.g. the NFS mount
problem) or that the host is busy and had half-completed the command before
it was terminated by the timeout.
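
A rough sketch of what I mean, assuming OpenSSH's ConnectTimeout and
BatchMode options and its convention of exiting with 255 when ssh itself
fails. The function names and the result labels are made up for
illustration. The point is the "unknown" branch: when the overall timeout
fires, the caller cannot tell a wedged host from one that was halfway
through the command.

```python
import subprocess


def ssh_argv(host, command, connect_timeout=5):
    """Build an ssh command line with a connect timeout and no prompts."""
    return ["ssh", "-o", "BatchMode=yes",
            "-o", f"ConnectTimeout={connect_timeout}", host, command]


def classify(argv, run_timeout):
    """Run argv and reduce the result to one of four hypothetical labels."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=run_timeout)
    except subprocess.TimeoutExpired:
        # Hung host or merely slow command: indistinguishable from here.
        return "unknown"
    if proc.returncode == 255:
        # ssh reports its own errors as 255 (a remote command that
        # itself exits 255 would look the same).
        return "unreachable"
    return "ok" if proc.returncode == 0 else "failed"
```

Usage would be something like classify(ssh_argv("node01", "uptime"), 30),
where "node01" is a placeholder host name.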

For me, the practical difference is that in the pub-sub model an agent that
is able to subscribe to the messages is by definition alive, so the list of
live hosts is always up-to-date.
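
As a toy illustration of the live-host list idea (not how Salt or any
particular message broker actually does it), suppose each agent announces
itself periodically and the master keeps only the hosts heard from
recently. The class name, the heartbeat call and the TTL value are all
assumptions of mine.

```python
import time


class LiveHosts:
    """Track hosts by the time their agent was last heard from."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl          # seconds a host stays "live" after a message
        self.last_seen = {}     # host name -> timestamp of last message

    def heartbeat(self, host, now=None):
        # Called whenever a message from the host's agent arrives.
        self.last_seen[host] = time.monotonic() if now is None else now

    def live(self, now=None):
        # Hosts whose last message is no older than the TTL, sorted by name.
        now = time.monotonic() if now is None else now
        return sorted(h for h, t in self.last_seen.items()
                      if now - t <= self.ttl)
```

Nothing here ever blocks on a dead node: a host that stops talking simply
ages out of the list.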

Of course, if it works for you, that's fine by me! If it ain't broke don't
fix it, etc., etc.


p.s. If I sound bitter about NFS - especially on Linux in the past - it's
because I am :)

[1] https://fedorahosted.org/func/

> -- greg
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

Jonathan Barber <jonathan.barber at gmail.com>