[Beowulf] Re: Re: Home beowulf - NIC latencies

Mon Feb 21 00:02:44 PST 2005

Vincent Diepeveen wrote:
> A problem of MPI compared to DSM-type forms of parallelism has been
> described very well by Chrilly Donninger with respect to his chess program
> Hydra, which runs on a few nodes with MPI:
> 
> For every write :
> 
> MPI_Isend(....);
> MPI_Test(&Reg,&flg,&Stat);
> while(!flg) {
>     // Important: read in messages and process them while waiting for
>     // completion. Otherwise our own input buffer can overflow
>     // and we get a deadlock.
>     Hydra_MsgPending();
>     MPI_Test(&Reg,&flg,&Stat);
> }
> 
> The above is simply dead slow and delays the software.

You are effectively waiting for the send completion, and that can 
require synchronization with the receive side if the message size is 
large enough.

> In a DSM model like Quadrics you don't have all these delays.

You don't have these delays with message passing if you do it 
differently. You can post multiple sends and wait on all of them at the 
same time, or post a send and then compute the next step before waiting. 
RMA would remove the synchronization with the remote side, but you need 
to know where to Put the data over there.

> Can Myri memory on the card (4MB and 8MB in the $1500 version) get used to
> directly write to the RAM on a remote network card?

The memory on the NIC is not related to Remote Memory Access. The SRAM 
is used to host the firmware code and some data such as the routes, 
physical addresses, the name of the captain, whatever. More memory means 
that you can fit more routes (on a 7-hop topology, you need to store 
8 bytes, 7 routing bytes and a length, for every route, and you need 8 
different routes for each destination, per link, to have an effective 
route dispersion scheme) or do something special (you can write your own 
firmware if you are crazy or you know what you are doing). 2MB is fine 
for most cases.
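
As a rough back-of-the-envelope check of the numbers above, the route 
storage can be sketched in C as follows. The function name and the 
destination and link counts are assumptions for illustration only; the 
actual firmware layout may differ, and the firmware code itself also 
takes SRAM.

```c
#include <stddef.h>

/* From the figures above: 7 routing bytes plus a length byte per route,
   and 8 routes per destination for route dispersion. */
enum { ROUTE_BYTES = 8, ROUTES_PER_DEST = 8 };

/* Hypothetical helper: bytes of SRAM needed for the route table, given
   the number of destinations and the number of links on the NIC. */
size_t route_table_bytes(size_t destinations, size_t links)
{
    return (size_t)ROUTE_BYTES * ROUTES_PER_DEST * destinations * links;
}
```

For example, route_table_bytes(1024, 2) gives 131072 bytes (128 KB) for an 
assumed 1024 destinations on a 2-link card.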

> If so which library can i download for that for myri cards?

GM supports RMA (PUT and GET) but do not expect the same latency as 
Quadrics. MX does not offer one-sided operations yet, 
but the latency is much better.

Patrick
-- 

Patrick Geoffray
Myricom, Inc.
http://www.myri.com