Beowulf vs Dual

Simen Thoresen simen-tt at online.no
Sat Apr 21 07:19:31 PDT 2001


>
>note that there are cluster interconnects that claim latencies 
>in the 2-4 us range.  a simple ping-pong with small UDP packets
>over cheap-o 100bT shows around 120 us latency.
>
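For reference, the kind of small-packet UDP ping-pong mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the test the quoted poster ran: it bounces a datagram off an echo thread on the loopback interface, and the names `udp_echo_server` and `udp_pingpong` are made up for this example. To measure a real 100bT link, the echo side would have to run on a second machine.

```python
import socket
import threading
import time


def udp_echo_server(sock, count):
    """Echo `count` datagrams back to whoever sent them."""
    for _ in range(count):
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)


def udp_pingpong(iterations=1000, payload=b"x"):
    """Return the mean one-way latency in microseconds (half the round trip)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    addr = server.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5.0)  # fail instead of hanging if a datagram is lost

    echo = threading.Thread(
        target=udp_echo_server, args=(server, iterations), daemon=True
    )
    echo.start()

    start = time.perf_counter()
    for _ in range(iterations):
        client.sendto(payload, addr)  # ping
        client.recvfrom(64)           # pong
    elapsed = time.perf_counter() - start

    echo.join(timeout=5.0)
    client.close()
    server.close()
    return elapsed / iterations / 2 * 1e6


if __name__ == "__main__":
    print("one-way latency: %.2f us" % udp_pingpong())
```

The absolute number from a loopback run says little about a network; the point is only the shape of the measurement: many round trips, one timer around the whole loop, divide by two for the one-way figure.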
Just to butt in here,
SCI claims a hardware latency of 1.5us. We reproducibly show an MPI latency (with the barrier command) of 5us for same-ring nodes, or 5.7us for non-local nodes (which is the worst-case latency for a 2D cluster of any size):

Local nodes
[demo@idefix demo]$ mpimon /opt/scali/examples/bin/barrier -- if1-1 if1-2
Barrier size  2 iterations 8192 [2 procs - Resolution 1.29us]
2 nodes    5.05 us
2 nodes    4.98 us

Non-local nodes
[demo@idefix demo]$ mpimon /opt/scali/examples/bin/barrier -- if1-1 if3-2
Barrier size  2 iterations 8192 [2 procs - Resolution 1.01us]
2 nodes    5.38 us
2 nodes    5.27 us

...and SMP (software only) latency
[demo@idefix demo]$ mpimon /opt/scali/examples/bin/barrier -- if1-1 2
Barrier size  2 iterations 8192 [2 procs - Resolution 1.01us]
2 nodes    0.68 us
2 nodes    0.68 us

Scali's MPI (Scali is our MPI developer) does not run programs in threads, but uses the same library for communication between same-system processes as for SCI-connected processes, so there might be some hand-waving going on here.
The difference from the previous example (0.82us) is probably mainly down to processor speed, but compared to any interconnect, the SMP latency is still king.
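To make the shape of such a barrier measurement concrete, here is a rough sketch of a software-only barrier timing loop. It is an assumption-laden illustration and not Scali's barrier program, whose source is not shown here: it uses Python threads and `threading.Barrier` rather than MPI processes, and the helper names `timer_resolution_us` and `barrier_latency_us` are invented for this example. Interpreter overhead means the result will be far above the sub-microsecond SMP figure quoted above.

```python
import threading
import time


def timer_resolution_us(samples=1000):
    """Smallest observed non-zero step of time.perf_counter(), in microseconds."""
    smallest = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = time.perf_counter()
        smallest = min(smallest, t1 - t0)
    return smallest * 1e6


def barrier_latency_us(parties=2, iterations=8192):
    """Mean time per barrier in microseconds, as seen by thread 0."""
    barrier = threading.Barrier(parties)
    result = []

    def worker(rank):
        barrier.wait()  # line everyone up before the timed loop
        start = time.perf_counter()
        for _ in range(iterations):
            barrier.wait()
        elapsed = time.perf_counter() - start
        if rank == 0:
            result.append(elapsed / iterations * 1e6)

    threads = [
        threading.Thread(target=worker, args=(rank,)) for rank in range(parties)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


if __name__ == "__main__":
    print("Resolution %.2fus" % timer_resolution_us())
    print("2 nodes %8.2f us" % barrier_latency_us())
```

Averaging over thousands of iterations is what makes a figure below the timer resolution meaningful: the timer is read only twice, around the whole loop.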

Yours,
-Simen (Dolphin ICS systems administrator) 
--
Simen Thoresen, Beowulf-cleaner and random artist - close and personal.

Er det ikke rart? (Isn't it strange?)
The gnu RART-project on http://valinor.dolphinics.no:1080/~simentt/rart