[Beowulf] Questions regarding interconnects

Vincent Diepeveen diep at xs4all.nl
Fri Mar 25 17:16:12 PST 2005

Hello Patrick,

>Do more research.

I hope you don't mind if I grab the opportunity to ask you a technical
question. Feel free to answer when you have time after Easter.

Suppose we have nodes A, B, .... Each node is a dual, connected with myri.
Each node has a single card inside.

Each node has 3 threads. One thread is just busy shipping large messages
using MPI, so it eats a lot of bandwidth from the card. One such message is
really a lot of megabytes.

Now let's suppose thread B.1 is busy receiving a really huge message of
several megabytes. That takes considerable time.

In the meantime a small message of 4 bytes arrives for thread B.2,
which is latency critical.

3 Questions.
  Q1: does B.2 have to wait for B.1 to receive the entire message?
  Q2: in case B.2 doesn't need to wait for B.1 to receive the entire message,
      but can receive its own message in between, what is the switch time
      latency of the myri hardware? So what is the worst case time it takes
      to receive the message? (I understand on average it is 50% better,
      but I work with worst cases in my software; suppose that n processors
      are ready with an explosion that wipes everything out, I want to
      interrupt the stuff then.)
  Q3: is there a difference between the currently offered myri cards here,
      and if so, which ones can do this and which ones can't?
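
To put rough numbers on Q1 and Q2, here is a minimal back-of-the-envelope
sketch. It only models time on the wire: if the small message has to queue
behind the whole large message, the worst case is the full transfer time;
if the large message is split into fragments and the small one can slip in
between, the worst case is one fragment. The link rate, message size and
fragment size are assumptions picked for illustration, not myri
specifications, and the function names are hypothetical.

```python
def wire_time_us(num_bytes, link_mb_per_s):
    """Microseconds needed to move num_bytes over the link.

    1 MB/s equals 1 byte per microsecond, so the division yields
    microseconds. Per-message overhead and switch latency are ignored.
    """
    return num_bytes / link_mb_per_s


def worst_case_wait_us(big_msg_bytes, fragment_bytes, link_mb_per_s, interleaved):
    """Worst-case extra delay for a small message that shows up just after
    a large transfer (or one of its fragments) has started on the link."""
    blocking_bytes = fragment_bytes if interleaved else big_msg_bytes
    return wire_time_us(blocking_bytes, link_mb_per_s)


# ASSUMED values for illustration only, not measured on any card.
LINK_MB_PER_S = 250.0       # assumed usable link rate
BIG_MSG_BYTES = 8_000_000   # "several megabytes"
FRAGMENT_BYTES = 4096       # assumed fragment size

if __name__ == "__main__":
    blocked = worst_case_wait_us(BIG_MSG_BYTES, FRAGMENT_BYTES, LINK_MB_PER_S, False)
    sliced = worst_case_wait_us(BIG_MSG_BYTES, FRAGMENT_BYTES, LINK_MB_PER_S, True)
    print("wait behind whole message (us):", blocked)
    print("wait behind one fragment  (us):", sliced)
```

With these assumed numbers the difference is roughly 32 milliseconds
against roughly 16 microseconds, which is why the answer to Q1 matters so
much for the worst case.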

Many thanks for answering,

At 04:02 PM 3/25/2005 -0500, Patrick Geoffray wrote:
>Hi Vincent,

>> The overhead of the MPI implementation layer *receiving* bytes is just so
>> huge. A card's theoretical one-way pingpong latency is just irrelevant
>> next to that, because those one-way pingpong programs on all cards eat
>> 100% system time, effectively losing a full cpu.
>You are mistaken about the MPI receive overhead. You are also mistaken
>in your belief that one-sided operations are a silver bullet. RMA
>operations may be more appropriate to an application design, but they
>share many constraints with message passing: you have to poll to know
>when it's done, you have to tell the other side where to write
>(equivalent to posting a recv). They have drawbacks like usually not
>scaling in space (each sender should write to a different location).
>Patrick Geoffray
>Myricom, Inc.
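
The constraints listed above can be sketched with a toy model. This is a
plain in-memory stand-in, not MPI or any Myricom API: the `Window` class
and its `put` and `poll` methods are hypothetical names made up for
illustration. It shows the three points from the reply: the writer must
know its offset in advance, the target must poll for completion, and the
buffer needs one slot per sender.

```python
class Window:
    """Toy receive buffer exposed to one-sided writes (hypothetical)."""

    def __init__(self, n_senders, slot_bytes):
        self.slot_bytes = slot_bytes
        # One slot per sender so that writers never overwrite each other;
        # this is the "not scaling in space" cost.
        self.data = bytearray(n_senders * slot_bytes)
        # One completion flag per slot; the target has to poll these.
        self.ready = [False] * n_senders

    def put(self, sender, payload):
        """What a remote sender would do: write at its agreed offset,
        then raise its flag. The offset had to be agreed on beforehand,
        much like posting a recv."""
        if len(payload) > self.slot_bytes:
            raise ValueError("payload larger than slot")
        offset = sender * self.slot_bytes
        self.data[offset:offset + len(payload)] = payload
        self.ready[sender] = True

    def poll(self):
        """What the target does: scan the flags, collect the completed
        slots as (sender, slot contents) pairs, and clear their flags."""
        done = []
        for sender, flag in enumerate(self.ready):
            if flag:
                offset = sender * self.slot_bytes
                slot = bytes(self.data[offset:offset + self.slot_bytes])
                done.append((sender, slot))
                self.ready[sender] = False
        return done


if __name__ == "__main__":
    window = Window(n_senders=4, slot_bytes=8)
    window.put(2, b"\x01\x02\x03\x04")   # 4-byte message from sender 2
    print(window.poll())
    print(len(window.data), "bytes reserved for 4 senders")
```

The buffer grows as senders times slot size, so in this model 1024
senders with 1 MiB slots would already reserve 1 GiB on every target.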
