[Beowulf] Puzzling Intel mpi behavior with slurm
Prentice Bisbal
pbisbal at pppl.gov
Fri Apr 6 12:37:28 PDT 2018
See the URL below for a good overview of how Slurm works:
https://slurm.schedmd.com/quickstart.html
The way I understand it, tasks are started on the compute nodes by
slurmd, so ssh is not involved at all.
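As a rough illustration of why the same mpirun command can behave
differently inside and outside a job, here is a minimal Python sketch.
It rests on assumptions not confirmed in this thread: that the launcher
honors the Intel MPI variable I_MPI_HYDRA_BOOTSTRAP, and that it falls
back to the Slurm launch path whenever SLURM_JOB_ID is set. The function
name guess_launch_mechanism is hypothetical.

```python
import os


def guess_launch_mechanism(env=None):
    """Guess how an MPI launcher would reach the remote nodes.

    Heuristic sketch only, not the actual logic of any MPI launcher.
    Assumptions: I_MPI_HYDRA_BOOTSTRAP (Intel MPI) is an explicit
    override, and SLURM_JOB_ID (set by Slurm inside a job) implies
    the tasks are started through Slurm rather than ssh.
    """
    env = os.environ if env is None else env

    # An explicit bootstrap setting wins over any auto-detection.
    bootstrap = env.get("I_MPI_HYDRA_BOOTSTRAP")
    if bootstrap:
        return bootstrap

    # Inside a Slurm job, assume slurmd starts the tasks.
    if "SLURM_JOB_ID" in env:
        return "slurm"

    # Outside any scheduler, assume the launcher falls back to ssh.
    return "ssh"


if __name__ == "__main__":
    print(guess_launch_mechanism())
```

Run on a login node and again inside a batch script, this should report
different mechanisms, which would match the behavior Faraz describes.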
SGE does the same thing with 'tight integration'. The tasks are started
on the compute nodes by sge_execd, which spawns an sge_shepherd process,
which then spawns the actual task.
To really complicate things, you should look at the Process Management
Interface (PMI). This is a middle layer between Slurm (or another
scheduler) and the MPI tasks. It's a standardized abstraction layer that
makes writing MPI implementations and schedulers easier. It also
increases the startup time of MPI jobs, which is not insignificant for
large jobs.
www.mcs.anl.gov/papers/P1760.pdf
Prentice
On 04/05/2018 11:10 AM, Faraz Hussain wrote:
> Here's something quite baffling. I have a cluster running slurm but
> have not set up passwordless ssh for a user yet. So when the user runs
> "mpirun -n 2 -hostfile hosts hostname", it will hang because of the
> ssh issue. That is expected.
>
> Now the baffling thing is the mpirun command works inside a slurm
> script! How can it work if passwordless ssh has not been configured?
> Does slurm use some different authentication (munge?) to login to the
> hosts and execute the hostname command?
>
> Or does slurm have some fancy behind-the-scenes integration with Intel
> MPI?