[Beowulf] slow mpi init/finalize

Michael Di Domenico mdidomenico4 at gmail.com
Wed Oct 11 07:12:02 PDT 2017


i'm seeing issues on a mellanox fdr10 cluster where mpi setup and
teardown take longer than i expect they should on larger rank count
jobs.  i'm only trying to run ~1000 ranks and the startup time is over
a minute.  i tested this with both openmpi and intel mpi, and both
exhibit close to the same behavior.

has anyone else seen this or might know how to fix it?  i expect ~1000
ranks to take some time to set up, but it seems to be taking longer than
i think it should.
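
for anyone who wants to reproduce the measurement, here is a minimal
sketch of how the launch time could be compared across rank counts.  it
only times the whole launcher command from the outside, so the number
includes process launch as well as init/finalize.  the binary name
./mpi_noop is hypothetical (assumed to be a program that calls only
MPI_Init and MPI_Finalize), and the command template and rank counts are
placeholders, not the exact commands used on this cluster.

```python
import shlex
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (wall_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


def launch_argv(template, ranks):
    """Fill the {n} placeholder in a launcher template and split it into argv."""
    return shlex.split(template.format(n=ranks))


if __name__ == "__main__":
    if len(sys.argv) < 2:
        # ASSUMPTION: ./mpi_noop is a hypothetical binary that only calls
        # MPI_Init and MPI_Finalize; substitute your own launcher line.
        print('usage: time_launch.py "mpirun -np {n} ./mpi_noop" [ranks ...]')
    else:
        template = sys.argv[1]
        # Rank counts are illustrative defaults, not values from the post.
        counts = [int(c) for c in sys.argv[2:]] or [64, 128, 256, 512, 1024]
        for ranks in counts:
            seconds, code = time_command(launch_argv(template, ranks))
            print(f"{ranks:5d} ranks: {seconds:7.2f} s (exit code {code})")
```

if the time grows much faster than the rank count, that points at the
wire-up phase rather than a fixed per-job cost.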

