[Beowulf] Does anyone here mix CISC and RISC within their clusters.

Christopher Samuel samuel at unimelb.edu.au
Wed Oct 26 23:12:16 PDT 2016


On 27/10/16 16:00, Darren Wise wrote:

> Along with seven dual socket, quadcore AMD x86-64 CISC nodes running
> Ubuntu 16.04 LTS, MPICH and OpenMPI are giving me some strange errors but
> as soon as I opt out the SUN box everything runs smoothly.

You're not just mixing architectures, you're mixing Debian-derived
distros on x86 with RHEL-derived distros on SPARC.  So you are likely
mixing quite different OpenMPI versions as well.
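
A quick way to confirm a version mismatch is something like the sketch
below.  The host names are made up, and it assumes passwordless SSH and
an OpenMPI mpirun on each node's PATH (OpenMPI prints its version on the
first line of `mpirun --version`; MPICH's output looks different, so
treat this as a starting point only).

```shell
#!/bin/sh
# Hypothetical host names; replace with the real node list.
HOSTS="x86node1 sparcnode1"

# Print the first line of `mpirun --version` as reported by one host.
# Assumes passwordless SSH and that mpirun is on the remote PATH.
remote_mpi_version() {
    ssh "$1" 'mpirun --version 2>&1 | head -n 1'
}

# Read "host<TAB>version" lines on stdin and succeed only if every
# version string is identical (nothing is left after dropping the
# first unique version).
versions_match() {
    [ -z "$(cut -f2- | sort -u | sed 1d)" ]
}

# Only contact the hosts when the script is run as: ./script check
if [ "${1:-}" = "check" ]; then
    for h in $HOSTS; do
        printf '%s\t%s\n' "$h" "$(remote_mpi_version "$h")"
    done > mpi_versions.txt
    cat mpi_versions.txt
    versions_match < mpi_versions.txt ||
        echo "MPI versions differ between hosts" >&2
fi
```

If the version strings differ between the x86 nodes and the SPARC box,
that alone could explain odd errors at startup.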

I'd suggest building the same OpenMPI 1.10.x release (the latest) from
source on both and trying that instead.

Ideally I'd suggest building Slurm on all architectures, then building
OpenMPI 1.10.x with the --with-slurm configure flag, and then launching
with srun so Slurm can do the MPI wire-up for you (so you don't have to
bother with SSH issues).  That might be over-engineering it though. :-)
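
Roughly, the build and launch steps might look like the sketch below.
The install prefix, source path and program name are placeholders, not
anything from your setup, and depending on the OpenMPI version a direct
srun launch may also need PMI support (configure's --with-pmi option),
so check the OpenMPI documentation for the release you pick.

```shell
#!/bin/sh
# Sketch only: the prefix is a hypothetical default, override as needed.
PREFIX="${PREFIX:-$HOME/opt/openmpi-1.10}"

# Configure, build and install from an unpacked OpenMPI 1.10.x source
# tree.  $1 = path to the source directory.  Run once per architecture.
build_openmpi() {
    ( cd "$1" &&
      ./configure --prefix="$PREFIX" --with-slurm &&
      make -j 4 &&
      make install )
}

# Launch an MPI program through Slurm rather than mpirun over SSH.
# $1 = number of tasks, remaining arguments = program and its arguments.
launch_with_srun() {
    ntasks="$1"; shift
    srun -n "$ntasks" "$@"
}
```

Usage would be along the lines of `build_openmpi ~/src/openmpi-1.10.x`
on each node type, followed by `launch_with_srun 8 ./my_mpi_program`
from a machine that can submit to Slurm.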

-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

