[Beowulf] Diskless cluster distribution

Christopher Samuel samuel at unimelb.edu.au
Sun Jan 12 21:54:53 PST 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Patrick,

On 12/01/14 10:22, Patrick Kirby wrote:

> We have been having trouble with the cluster and would like to
> switch to a different cluster software. The problems seem to stem
> from the fact that Pelican is not tolerant of any interruption in
> communication with slaves. I would appreciate suggestions for a
> fault tolerant replacement for Pelican HPC that would be
> appropriate for this project.

Looking at Pelican HPC (which I'd not come across before) it appears
to just be a Linux distro that sets up a cluster for a single user to
be able to run MPI jobs (Open-MPI in this case, without a queuing system).

Now that raises the question - what do you mean by fault tolerance?

Do you mean your MPI application needs to be able to cope with nodes
disappearing?

There is no queuing system by default from the look of it, so I'm
guessing that application-level tolerance is what you're after, but
it'd be good to get some clarification.

All the best,
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlLTf60ACgkQO2KABBYQAh9MNwCfansb/nr8G5u2l6Nwd63xqORr
MBsAnRXZqjldZ6AsBnzQvf+U93T9Laj1
=tu6n
-----END PGP SIGNATURE-----


