[Beowulf] MPICH "remote access" speed
Robert G. Brown
rgb at phy.duke.edu
Mon Nov 29 07:40:12 PST 2004
On Mon, 29 Nov 2004 steve_heaton at ozemail.com.au wrote:
> G'day all
> Another newbie question for which I beg your tolerance.
> I'm money poor but time rich. Given the limitations of my hardware (4x
> dual P3 nodes, GigaEther switched interconnect) I'd like to tweak the
> cluster's performance as much as possible :)
> I was wondering if anyone has put RSH v's SSH head-to-head under MPICH
> in terms of performance?
> I've read in a few places that SSH is a bit slower due to the need to
> exchange keys etc. On the other hand I hear that some applications can
> leave the connection open under SSH and that can make them faster than
> having to RSH for every exchange.
The primary point at which a remote shell is executed by any of the PVM
or MPI libraries is when tasks are being spawned, either a remote
task-spawning daemon or a user task. Once the task (or the virtual
compute cluster) is set up, communications are handled strictly by the
library itself, and the efficiency of the remote shell is no longer a
factor.
For four nodes there are virtually no sane patterns of usage where you
should worry or care about the overhead and timing of rsh vs ssh. I've
done timings of the two on fairly current hardware (results should be in
the list archives somewhere -- I recall < 1 sec to execute a task on top
of ssh, order of a few 0.1 sec to execute the same task on top of rsh
but won't swear to it any more) but no longer redo them because I
literally don't have a host where rsh is installed or enabled to test
with. The scripts I use(d) are here:
where I'd recommend the tarball form, if you want to play with or test
them.
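If you want to do a quick-and-dirty version of that comparison yourself,
something like the following sketch times consecutive remote spawns (the
hostname "node1" is a placeholder for one of your own nodes; the local
no-op baseline is just there so the script runs anywhere):

```python
import subprocess
import time

def time_spawns(cmd, n=10):
    """Time n consecutive spawns of cmd; return total elapsed seconds."""
    start = time.monotonic()
    for _ in range(n):
        subprocess.run(cmd, check=True, capture_output=True)
    return time.monotonic() - start

# On a real cluster you would compare, e.g.:
#   time_spawns(["rsh", "node1", "true"])
#   time_spawns(["ssh", "node1", "true"])
# ("node1" is hypothetical).  Here we time a local no-op as a baseline:
local = time_spawns(["true"])
print(f"10 local spawns: {local:.3f} s")
```

The per-spawn overhead is just the total divided by n; run each
measurement a few times and on an idle network if you want numbers you
can trust.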
Practically speaking, the marginal overhead of a few tenths of a second
per remote shell'd task is totally ignorable (a couple or three seconds
at MOST) on a four node cluster running any sort of task that takes more
than a minute to complete. Or a forty node cluster, for that matter,
running a task that takes more than a few minutes to complete. When you
get to four hundred node clusters, the marginal overhead starts to add
up to a few minutes. With e.g. pvm, this overhead is associated with
building the cluster, not running tasks on it, and the few minutes is
paid only once for (possibly) running many tasks, and remains pretty
much negligible compared to hours to days of runtime over the lifetime
of the virtual machine.
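The arithmetic behind those scaling claims is trivial to check; assuming
an illustrative figure of roughly 0.3 seconds of ssh spawn overhead per
node (your mileage will vary):

```python
# Back-of-the-envelope check of the scaling claim above.
per_spawn = 0.3  # rough ssh spawn overhead per node, in seconds (illustrative)

for nodes in (4, 40, 400):
    total = nodes * per_spawn
    print(f"{nodes:4d} nodes: ~{total:.0f} s of total spawn overhead")
```

That is, around a second for four nodes, seconds for forty, and a couple
of minutes for four hundred, all negligible against hours-to-days of
runtime.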
For MPI I believe that whether or not it remains negligible depends on
the flavor of MPI used and whether or not they rely on rsh to actually
spawn the remote node tasks per instance or rather start up a virtual
cluster and spawn the tasks via a "permanent" daemon. I >>think<< LAM
is the latter and MPICH is the former, but I don't use MPI enough to be
certain. Somebody on list will likely address this.
In conclusion, at your current cluster size don't worry about it at all
on the basis of PERFORMANCE -- you probably won't be able to see any
performance difference at all, let alone any performance difference that
matters. That might change (or might not!) as you scale your cluster up
to order of 100+ nodes.
The issue to consider aside from raw performance and scaling is just
how accessible your cluster is. If it is inside a solid firewall, rsh
is secure enough, maybe (at least compared to the already open door
associated with pvm or mpi itself). If the nodes can be "seen" (pinged)
by hosts outside your administrative control, I personally think ssh is
essential just to prevent or retard password snooping and several other
forms of attack that rsh openly invites, EVEN though pvm or mpi may
still represent potentially exploitable avenues into the nodes.
rsh-based attacks are in nearly any script-kiddie's rootkit repertoire;
pvm-based attacks may exist out there but they aren't so common (as far
as I know, anyway).
A final advantage to ssh that can pay you back a bit for the extra
overhead is its superior handling of the environment and ability to e.g.
forward ports and X11 connections. With ssh you can easily drill
private holes through nearly any firewall that permits port 22 access at
all, which can be a tremendous boon when you try to administer the nodes
from far away through the eye of a needle (enabling, e.g. rsync access
to controlled ports even though the firewall ordinarily blocks them).
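As a concrete illustration of that trick, the sketch below just builds
and prints the two commands involved: forwarding a local port to a
node's rsync daemon port (873) through the one host reachable on port
22, then syncing through the tunnel. All hostnames, the module name,
and the paths are hypothetical placeholders; adapt them to your setup.

```python
# Hedged sketch of the port-forwarding trick: tunnel an internal node's
# rsync daemon port out through the gateway, then rsync via the tunnel.
# "node1", "gateway.example.org", "nodedata" etc. are placeholders.
tunnel = ["ssh", "-f", "-N", "-L", "8730:node1:873",
          "admin@gateway.example.org"]
sync = ["rsync", "-av", "rsync://localhost:8730/nodedata/",
        "/local/backup/"]

print(" ".join(tunnel))
print(" ".join(sync))
```

The -f/-N flags background the ssh session without running a remote
command, so the tunnel persists while rsync talks to localhost:8730 as
if it were the node's rsync daemon.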
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu